Lip-Reading.com

How to Use AI Lip Reading on Replicate: A Complete Guide for Beginners

Have you ever needed to understand what someone is saying in a video without sound? Maybe you have a video with poor audio quality, or perhaps you're working with footage that has no sound at all. You can now get a text transcript from a video using AI lip reading technology, which is quicker and more precise than human lip readers. This guide walks you through how to use this service on a platform called Replicate.

What is Replicate?

Replicate is a website that hosts various AI tools that anyone can use. Think of it as an app store, but for AI services. You don't need to be tech-savvy or know how to code: you just upload your video and the AI does the rest!

Creating Your Replicate Account

  • Visit the Replicate website (replicate.com)
  • Click on the "Sign In" button in the top right corner
  • Replicate currently supports signing in with GitHub only. If you don't have a GitHub account, create one first; it is as quick as signing up for any other website.
  • Once your GitHub account is created, go back to Replicate and sign in with it.
  • Once verified, you're ready to start using the service

Setting Up Payment

Replicate charges per video processed. Before you can use the lip reading service, you'll need to add payment information:

  • Click on your profile picture or icon in the top right corner
  • Select "Billing" from the menu
  • Click "Add Payment Method"
  • Enter your credit card information

- The service uses secure payment processing
- You only pay for what you use
- Videos typically cost between $0.10 and $0.50 each to process
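
If you plan to process several videos, the price range above gives a rough budget. Here is a minimal sketch of that arithmetic in Python. The function name is made up for this guide, and the two prices are just the typical range quoted above, not an official price list, so your actual bill may differ.

```python
# Typical per-video price range quoted in this guide, in cents.
# Working in whole cents avoids rounding surprises with decimals.
LOW_CENTS_PER_VIDEO = 10   # $0.10
HIGH_CENTS_PER_VIDEO = 50  # $0.50


def estimate_cost_range(video_count):
    """Return (low, high) estimated total cost in dollars for video_count videos."""
    if video_count < 0:
        raise ValueError("video_count cannot be negative")
    low = video_count * LOW_CENTS_PER_VIDEO / 100
    high = video_count * HIGH_CENTS_PER_VIDEO / 100
    return (low, high)


low, high = estimate_cost_range(20)
print(f"20 videos: about ${low:.2f} to ${high:.2f}")
# prints: 20 videos: about $2.00 to $10.00
```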

Using the Lip Reading Service

  • Go to the lip reading model page
  • Look for the upload section (usually a box where you can drag and drop files)
  • Click "Choose File" or drag your video file into the upload area
    Note: links to a URL are not supported. The upload only works with a file from your computer or phone.
  • Wait for the upload to complete
  • Click "Run" to start the lip reading process
  • The system will process your video and then give you the text transcript

Example of the Replicate dashboard showing a processed video transcript

Video Requirements for Best Results

  • Length: Between 2 and 40 seconds
  • Maximum resolution: 1080p
  • File types supported: MP4, MOV, MKV, or WebM
  • Only one person's face should be visible at a time (this is required)
  • The person's face should be well-lit and clearly visible
  • The face can be front-view or profile (but at least half the lips must be visible)
  • Avoid videos where the mouth is covered by masks, hands, or other objects
  • The closer the camera is to the face, the better (while keeping the full face in frame)
  • Upload the file directly from your computer or phone; links to a URL are not supported
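
If you are comfortable with a little Python, the measurable requirements above (length, resolution, file type, no links) can be checked before you upload. The sketch below is only an illustration: the function name is made up for this guide, you supply the duration and frame height yourself (for example from your video player's file info), and "1080p" is assumed to mean a frame height of at most 1080 pixels. It cannot check the face, lighting, or lip visibility requirements.

```python
from pathlib import Path

# Limits taken from the requirements list in this guide.
SUPPORTED_EXTENSIONS = {".mp4", ".mov", ".mkv", ".webm"}
MIN_SECONDS = 2
MAX_SECONDS = 40
MAX_HEIGHT_PIXELS = 1080  # assumption: "1080p" means height <= 1080 pixels


def check_video(filename, duration_seconds, height_pixels):
    """Return a list of problems found; an empty list means no problem was found."""
    problems = []

    # The service needs an uploaded file, not a web link.
    if filename.lower().startswith(("http://", "https://")):
        problems.append("links are not supported; upload a local file instead")

    extension = Path(filename).suffix.lower()
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported file type: {extension or 'none'}")

    if not MIN_SECONDS <= duration_seconds <= MAX_SECONDS:
        problems.append(
            f"length must be between {MIN_SECONDS} and {MAX_SECONDS} seconds"
        )

    if height_pixels > MAX_HEIGHT_PIXELS:
        problems.append("resolution is higher than 1080p")

    return problems


print(check_video("interview.mp4", 12.5, 720))  # prints: []
for problem in check_video("holiday.avi", 95, 2160):
    print("Problem:", problem)
```

An empty list only means the file passes these basic checks; you still need to confirm by eye that one well-lit face is visible and the mouth is not covered.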