How to Use AI Lip Reading on Replicate: A Complete Guide for Beginners

Have you ever needed to understand what someone is saying in a video without sound? Maybe you have a video with poor audio quality, or perhaps you're working with footage that has no sound at all. You can now get text transcripts from videos by using AI lip reading technology. It's quicker and more precise than human lip readers. This guide will walk you through how to use this service on a platform called Replicate.

What is Replicate?

Replicate is a website that hosts various AI tools that anyone can use. Think of it as an app store, but for AI services. You don't need to be tech-savvy or know how to code: you just upload your video and the AI does the rest!

Creating Your Replicate Account

Visit the Replicate website (replicate.com)
Click on the "Sign In" button in the top right corner
You can currently only sign in with GitHub. If you don't have a GitHub account, you will need to create one. It's quick, like signing up for any regular website account.
Once your GitHub account is created, go back to Replicate and sign in with it
Once verified, you're ready to start using the service

Setting Up Payment

Replicate charges per video processed. Before you can use the lip reading service, you'll need to add payment information:

Click on your profile picture or icon in the top right corner
Select "Billing" from the menu
Click "Add Payment Method"
Enter your credit card information

- The service uses secure payment processing
- You only pay for what you use
- Videos typically cost between $0.10 and $0.50 each to process
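
To budget for a batch of videos, the typical price range above can be turned into a quick estimate. This is a minimal sketch: the two price constants are simply the "typical" figures quoted above, the function name is made up for illustration, and your actual bill depends on what Replicate charges.

```python
# Assumption: per-video prices are the "typical" range quoted in this guide,
# not an official price list.
LOW_CENTS = 10   # $0.10 per video
HIGH_CENTS = 50  # $0.50 per video


def estimate_cost_range(num_videos: int) -> tuple[float, float]:
    """Return the (low, high) estimated cost in dollars for a batch of videos."""
    if num_videos < 0:
        raise ValueError("num_videos must be non-negative")
    # Multiply in whole cents first, then convert to dollars.
    return (num_videos * LOW_CENTS / 100, num_videos * HIGH_CENTS / 100)


low, high = estimate_cost_range(20)
print(f"20 videos: roughly ${low:.2f} to ${high:.2f}")
```

For 20 videos this prints an estimate of $2.00 to $10.00.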

Using the Lip Reading Service

Go to the lip reading model page
Look for the upload section (usually a box where you can drag and drop files)
Click "Choose File" or drag your video file into the upload area
Important: links to a URL are not supported. The service only works if you upload the file from your computer or phone.
Wait for the upload to complete
Click "Run" to start the lip reading process
The system will process your video and then provide the text transcript

Example of the Replicate dashboard showing a processed video transcript
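
You don't need any code to follow the steps above, but readers who are comfortable with Python could run the same upload from a script using Replicate's Python client. The sketch below rests on several assumptions: the model reference is a placeholder (this guide does not name the model), the input name "video" is a guess that should be checked against the model page, and the client expects your API token in the REPLICATE_API_TOKEN environment variable. The helper names are made up for illustration.

```python
from urllib.parse import urlparse

# Hypothetical placeholder: replace with the "owner/name:version" reference
# shown on the lip reading model page.
MODEL_REF = "owner/lip-reading-model:version-id"


def is_url(source: str) -> bool:
    """Return True if the string looks like a web link rather than a local file."""
    return urlparse(source).scheme in ("http", "https")


def transcribe(video_path: str, model_ref: str = MODEL_REF):
    """Upload a local video file and return whatever the model outputs."""
    if is_url(video_path):
        # Mirrors the rule above: only files from your own device work.
        raise ValueError("Links are not supported; use a file saved on your device.")
    import replicate  # third-party client, installed with: pip install replicate

    # Assumption: the model's file input is called "video".
    with open(video_path, "rb") as video_file:
        return replicate.run(model_ref, input={"video": video_file})
```

Calling transcribe("my_clip.mp4") would upload the file and return the model's output. Passing a web link raises an error before anything is sent.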

Video Requirements for Best Results

Length: Between 2 and 40 seconds
Maximum resolution: 1080p
File types supported: MP4, MOV, MKV, or WebM
Only one person's face should be visible at a time (this is required)
The person's face should be well-lit and clearly visible
The face can be front-view or profile (but at least half the lips must be visible)
Avoid videos where the mouth is covered by masks, hands, or other objects
The closer the camera is to the face, the better (while keeping the full face in frame)
Important: links to a URL are not supported. Upload the file directly from your computer or phone.
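
The measurable requirements above (length, resolution, and file type) can be checked before you upload. The sketch below is only an illustration: the function name is hypothetical, it treats the 2 and 40 second limits as inclusive and reads "1080p" as a frame height of at most 1080 pixels (both assumptions), and it expects you to supply the duration and height yourself instead of reading them from the file. It cannot check the face, lighting, or lip visibility rules.

```python
from pathlib import Path

ALLOWED_EXTENSIONS = {".mp4", ".mov", ".mkv", ".webm"}
MIN_SECONDS, MAX_SECONDS = 2, 40  # assumed inclusive
MAX_HEIGHT = 1080  # "1080p" read as a maximum frame height in pixels


def check_video(filename: str, duration_seconds: float, height_pixels: int) -> list[str]:
    """Return a list of problems; an empty list means the basic checks passed."""
    problems = []
    if Path(filename).suffix.lower() not in ALLOWED_EXTENSIONS:
        problems.append("file type must be MP4, MOV, MKV, or WebM")
    if not MIN_SECONDS <= duration_seconds <= MAX_SECONDS:
        problems.append("length must be between 2 and 40 seconds")
    if height_pixels > MAX_HEIGHT:
        problems.append("resolution must be 1080p or lower")
    return problems


print(check_video("interview.MOV", 12.5, 720))
print(check_video("clip.avi", 55, 2160))
```

The first call prints an empty list because the file meets all three checks. The second prints three problems, one for each rule it breaks.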
