Best Practices for Real Human videos

📔 Six things you must check BEFORE you upload your video to LipDub AI.

Are your files supported?
Lip Diversity
Video Length
Audio Length
Number of faces visible in the video
Understand the difference between Single-Actor and Multi-Actor Projects.

1. Are your files supported by LipDub AI?

Supported video formats and file specifications

If you have your own dub audio, whether you recorded it or created it on an audio platform such as ElevenLabs, make sure your audio files are supported by clicking below:

🎙️

Audio requirements for Lipdub

You can also create your own audio on LipDub AI using text-to-speech, but this is only available for Single-Actor Projects. (click below)

Dialogue Replacement

2. Does your video show the actor talking?

No, my actor has a closed mouth for the entire video

Do you have other footage with the same lighting, camera angles, and similar color grade where the same actor(s) is speaking that you can use as training footage?

What is training footage? (click below)

🎥

What is ‘Additional Footage for Training’ & How to capture it On-Set?

Yes. I do have additional footage for training —> please use Multi-Actor Project. See bottom of page.

No. I don’t have additional footage for training —> please select a different video you’d like to use for lipsync.

Why? - LipDub studies how the actor’s lips look in each viseme (the mouth shape that goes with a speech sound). The network then creates a fine-tuned model for that actor. This fine-tuning process is what allows LipDub AI to create seamless results. If the video only shows the actor with a closed mouth, the result will have poor articulation. (see below)

In this example, the user uploaded a video in which the actor never moves their lips. The result turned out poor because the original video gave LipDub no lip movement to learn from.

Yes my actor on screen is moving their mouth

Next Steps: Please continue reading.

3. Is your video at least 30 seconds long?

No, my video is <30 seconds long.

Next Steps: Two options:

Why? - For the highest quality results, please upload at least 30 seconds to 1 min of footage of the actor moving their mouth and making different visemes.
- There are diminishing returns when fine-tuning the AI model, so there is no need to provide more than 5 min of footage.
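The length guidance above can be sketched as a quick pre-upload check. This is only an illustration: the function name, the constants, and the returned labels are hypothetical and are not part of any LipDub AI API. The thresholds (30 seconds minimum, 5 minutes before diminishing returns) come from this page.

```python
# Thresholds taken from this guide; all names here are hypothetical.
MIN_SECONDS = 30                      # below this, quality is likely to suffer
DIMINISHING_RETURNS_SECONDS = 5 * 60  # beyond this, extra footage adds little


def assess_footage_length(speaking_seconds: float) -> str:
    """Classify how much footage of the actor speaking is available."""
    if speaking_seconds < 0:
        raise ValueError("duration cannot be negative")
    if speaking_seconds < MIN_SECONDS:
        return "too_short"
    if speaking_seconds <= DIMINISHING_RETURNS_SECONDS:
        return "ok"
    return "more_than_needed"


if __name__ == "__main__":
    for seconds in (12, 45, 420):
        print(seconds, "->", assess_footage_length(seconds))
```

Note that the check is about footage in which the actor is actually speaking, not the total running time of the file.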

Yes, my video is at least 30 seconds long.

Next Steps: Please continue reading.

4. Is the audio that you’d like to use for lip-sync longer than your video?

Yes, my dub audio is longer than the length of my original video.

Next Steps: Please select a dub audio that is less than or equal to the length of the video.

Why? - LipDub only modifies the lip area. It does not create new video frames. So if your dub audio is longer than your video, we will cut the audio short and return a video of the original length.
- Example: If your dub audio is 2 min long and your original video is 30 seconds long, LipDub will only output a 30-second video.
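To see how much of your dub audio would be lost, the rule above can be worked through in a few lines. This is a minimal sketch with a hypothetical helper name. It does not call LipDub AI, and it assumes you already know both durations in seconds.

```python
def audio_overrun_seconds(video_seconds: float, audio_seconds: float) -> float:
    """Seconds of dub audio that fall past the end of the video (0 if none)."""
    if video_seconds <= 0 or audio_seconds <= 0:
        raise ValueError("durations must be positive")
    return float(max(0.0, audio_seconds - video_seconds))


if __name__ == "__main__":
    # The example from this section: 2 min of dub audio, 30 s of video.
    overrun = audio_overrun_seconds(video_seconds=30, audio_seconds=120)
    print(f"{overrun:.0f} s of audio would be cut")  # prints "90 s of audio would be cut"
```

If the result is greater than zero, shorten the dub audio or choose a longer video before uploading.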

Here’s a rough idea of where the mask is that LipDub AI pastes back on top of your original video.

No, my dub audio is shorter than my video or I don’t know the length of my audio because I want to create it using the Text-to-Speech feature on LipDub AI

Next Steps: Please continue reading.

5. How many faces are visible in your video?

Only 1 face is visible for my entire video.

Next Steps: Please use Single-Actor Project type. (click below)

📽️

In-depth Video Guide - Single-Actor

This is an example of a single-actor video. No other faces are visible in frame.

This video is lip-syncing the girl in the background. The user selected the Single-Actor Project type, but because other faces are visible, a Single-Actor Project does not know which actor to lip-sync, so it may lip-sync the wrong person. This video is considered a multi-actor video. (See next steps below)

Multiple faces are visible in my video

Next Steps: Please use Multi-Actor Project type. (click below)

📽️

In-depth Video Guides - Multi-Actor
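The decisions in steps 2 and 5 can be summarized as a small decision function. This is a sketch of the guidance on this page, not LipDub AI's own logic. The function name, parameters, and return labels are hypothetical.

```python
def recommend_project_type(
    faces_visible: int,
    actor_speaks: bool,
    has_training_footage: bool = False,
) -> str:
    """Suggest a project type from the checklist answers on this page."""
    if faces_visible < 1:
        raise ValueError("at least one face must be visible")
    if not actor_speaks:
        # Step 2: a closed mouth for the whole video needs extra training footage,
        # which this guide routes to a Multi-Actor Project.
        return "multi-actor" if has_training_footage else "choose-another-video"
    # Step 5: one visible face -> Single-Actor, otherwise Multi-Actor.
    return "single-actor" if faces_visible == 1 else "multi-actor"


if __name__ == "__main__":
    print(recommend_project_type(faces_visible=1, actor_speaks=True))
    print(recommend_project_type(faces_visible=3, actor_speaks=True))
    print(recommend_project_type(faces_visible=1, actor_speaks=False))
```

Count every face that appears anywhere in the video, including people in the background, when filling in `faces_visible`.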

6. What is the difference between Single-Actor and Multi-Actor project types?

Single Actor - video guide

Single Actor videos. (where only 1 person is visible in the whole video)

Pros	Cons
Easy to use - Everything is automatic. Simply upload your video and click generate.	Lack of control - Because everything is automatic, the network may detect a face in the background and lip-sync the wrong person.
Audio features - automatic translation, text-to-speech, SRT upload.	Can only upload ONE video per project - If you want to lip-sync a 2nd video with the same actor in the same lighting, you must create a new Single-Actor Project and train an AI model again for that 2nd video.
Can generate multiple times using the same trained model - This means you will not be charged for training a model for each subsequent result of the same video.
Matches shorter audio files - If you have a 1 min video and upload only a 10-second audio file for lip-sync, the output video will be 10 seconds long. (LipDub assumes you don't need the remaining 50 seconds of silence.)

Multi-Actor - video guide

Multi Actor videos. (where multiple people’s faces are visible)

Pros	Cons
More control - Users can select each face detection that they want to lip-sync.	No audio features - Automatic translation, text-to-speech, and SRT upload are not yet available. Users must upload the dub audio that they want to use for lip-sync.
Can upload MULTIPLE videos - If you want to lip-sync five videos with the same actor in the same lighting, you can simply upload all videos to one Advanced flow project.	More prone to user error - Because it is more in-depth and requires more user input, the chance of user error goes up.
Can generate multiple times using the same trained model - This means you will not be charged for training a model for each subsequent result of the same video.	Will not match shorter audio files - If you have a 1 min video and upload only a 10-second audio file for lip-sync, the output video will be 1 min long. Why? - This is common in multi-speaker videos: one person speaks for the first 10 seconds and is then silent for the rest of the video, and another person starts to speak later. So, unlike the Single-Actor flow, LipDub does not assume that it can cut the remaining 50 seconds.
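The difference in how the two project types handle audio length can be worked through as follows. This is a minimal sketch of the behavior described on this page. The function and constant names are hypothetical, and no LipDub AI API is called.

```python
SINGLE_ACTOR = "single-actor"
MULTI_ACTOR = "multi-actor"


def output_video_seconds(
    project_type: str, video_seconds: float, audio_seconds: float
) -> float:
    """Expected output length, following the rules described in this guide."""
    if project_type not in (SINGLE_ACTOR, MULTI_ACTOR):
        raise ValueError(f"unknown project type: {project_type!r}")
    if audio_seconds >= video_seconds:
        # No new frames are created, so the output never exceeds the video.
        return float(video_seconds)
    if project_type == SINGLE_ACTOR:
        # Single-Actor trims the video to match the shorter audio.
        return float(audio_seconds)
    # Multi-Actor keeps the full video; another speaker may talk later.
    return float(video_seconds)


if __name__ == "__main__":
    # 1 min video with a 10-second dub audio, as in the tables above.
    print(output_video_seconds(SINGLE_ACTOR, 60, 10))  # prints 10.0
    print(output_video_seconds(MULTI_ACTOR, 60, 10))   # prints 60.0
```

If you need the full-length video back from a single-speaker clip with short audio, this difference is worth keeping in mind when choosing the project type.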