Best Practices for Real Human Videos
Read this if you upload videos of human actors.
- Are your files supported?
- Lip Diversity
- Video Length
- Audio Length
- Number of faces visible in the video
- Understand the difference between Single-Actor Projects and Multi-Actor Projects.
1. Are your files supported by LipDub AI?
If you have your own dub audio that you recorded, or that you created with an audio platform such as ElevenLabs, check that your audio files are supported by clicking below:
You can also create your own audio in LipDub AI using text-to-speech, but this feature is only available for Single-Actor Projects. (click below)
2. Does your video show the actor talking?
No, the actor is not talking in my video.
- Do you have other footage with the same lighting, camera angles, and similar color grade in which the same actor(s) is speaking, that you can use as training footage?
- Yes, I have additional footage for training → please use a Multi-Actor Project (see the bottom of the page).
- No, I don’t have additional footage for training → please select a different video to use for lip-sync.
- Why? - LipDub studies what the actor’s lips look like for each viseme (the mouth shape that corresponds to a speech sound). The network then creates a fine-tuned model for that actor, and this fine-tuning process is what allows LipDub AI to create seamless results. If the video only shows the actor with a closed mouth, the result will have poor articulation. (see below)
In this example, the user uploaded a video in which the actor does not move their lips, so the result turned out poor.
Yes, the actor is talking in my video.
- Next Steps: Please continue reading.
3. Is your video at least 30 seconds long?
No, my video is shorter than 30 seconds.
- Next Steps: You have two options:
- Use a Multi-Actor (advanced flow) Project and upload additional footage of the actor speaking for training. (see bottom of page)
- Continue with your short clip. You may still get fantastic results, but there is a risk of sub-par output.
- Why? - For the highest-quality results, please upload at least 30 seconds to 1 minute of footage of the actor moving their mouth and forming different visemes.
- Fine-tuning the AI model has diminishing returns, so there is no need to provide more than 5 minutes of footage.
Yes, my video is at least 30 seconds long.
- Next Steps: Please continue reading.
4. Is your dub audio shorter than or equal to the length of your video?
No, my dub audio is longer than my video.
- Next Steps: Please select a dub audio that is shorter than or equal to the length of the video.
- Why? - LipDub only modifies the lip area; it does not create new video frames. So if your dub audio is longer than your video, LipDub cuts the audio short and returns a video with the original length.
- Example: If your dub audio is 2 minutes long and your original video is 30 seconds long, LipDub outputs only a 30-second video.
Yes, my dub audio is shorter than or equal to the length of my video.
- Next Steps: Please continue reading.
5. How many faces are visible in your video?
Only 1 face is visible for my entire video.
- Next Steps: Please use the Single-Actor Project type. (click below)
This is an example of a single-actor video. No other faces are visible in the frame.
More than 1 face is visible in my video.
In this video, LipDub lip-synced the girl in the background. The user selected the Single-Actor Project type, but because other faces are visible, a Single-Actor Project cannot tell which actor to lip-sync, so it may lip-sync the wrong person. This video counts as a multi-actor video. (see next steps below)
- Next Steps: Please use the Multi-Actor Project type. (click below)
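Questions 2 to 5 form a short decision procedure. The Python sketch below encodes that checklist so you can sort a batch of clips before uploading. The `VideoChecklist` fields and the `recommend` helper are hypothetical names for illustration only, not part of LipDub AI, and the thresholds (30 seconds, more than one visible face) come from this guide rather than from any official rule.

```python
from dataclasses import dataclass


@dataclass
class VideoChecklist:
    # Hypothetical record of your answers to questions 2-5 of this guide.
    faces_visible: int                # question 5
    actor_is_talking: bool            # question 2
    has_extra_training_footage: bool  # same actor, lighting, angles, color grade
    video_seconds: float              # question 3
    audio_seconds: float              # question 4


def recommend(c: VideoChecklist) -> tuple[str, list[str]]:
    """Return (suggested project type, warnings) following this guide's checklist.

    A sketch of the guide's advice, not an official LipDub AI tool.
    """
    notes: list[str] = []
    if c.audio_seconds > c.video_seconds:
        # Question 4: LipDub does not create new frames, so longer audio is cut.
        notes.append("Dub audio is longer than the video and will be cut short.")
    if c.faces_visible > 1:
        # Question 5: a Single-Actor Project may lip-sync the wrong face.
        return "Multi-Actor", notes
    if not c.actor_is_talking:
        # Question 2: without visible speech the model cannot learn the actor's visemes.
        if c.has_extra_training_footage:
            return "Multi-Actor", notes
        return "Choose a different video", notes
    if c.video_seconds < 30:
        # Question 3: short clips risk sub-par output unless extra footage is added.
        if c.has_extra_training_footage:
            return "Multi-Actor", notes
        notes.append("Clip is under 30 seconds; results may be sub-par.")
    return "Single-Actor", notes
```

For example, `recommend(VideoChecklist(1, True, False, 20, 45))` returns `"Single-Actor"` with two warnings: the audio will be cut and the clip is under 30 seconds.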
6. What is the difference between Single-Actor and Multi-Actor project types?
Single-Actor - video guide
Single-Actor videos (only 1 person is visible in the whole video)
Pros
- Easy to use - Everything is automatic. Simply upload your video and click Generate.
- Audio features - automatic translation, text-to-speech, and SRT upload.
- Can generate multiple times using the same trained model - you are not charged for training a new model for each subsequent result from the same video.
- Matches shorter audio files - if you have a 1-minute video and upload only 10 seconds of audio for lip-sync, the output video will be 10 seconds long. (LipDub assumes you don't need the remaining 50 seconds of silence.)
Cons
- Lack of control - because everything is automatic, the network may detect a face in the background and lip-sync the wrong person.
- Only ONE video per project - if you want to lip-sync a 2nd video with the same actor in the same lighting, you must create a new Single-Actor Project and train an AI model again for that 2nd video.
Multi-Actor - video guide
Multi-Actor videos (multiple people’s faces are visible)
Pros
- More control - you can select each detected face that you want to lip-sync.
- Can upload MULTIPLE videos - if you want to lip-sync five videos with the same actor in the same lighting, you can simply upload all of them to one Multi-Actor (advanced flow) Project.
- Can generate multiple times using the same trained model - you are not charged for training a new model for each subsequent result from the same video.
Cons
- No audio features - automatic translation, text-to-speech, and SRT upload are not yet available. You must upload the dub audio you want to use for lip-sync.
- More prone to user error - because it is more in-depth and requires more user input, the chance of user error goes up.
- Does not match shorter audio files - if you have a 1-minute video and upload only 10 seconds of audio for lip-sync, the output video will still be 1 minute long. Why? - This is common in multi-speaker videos: one person speaks for the first 10 seconds and is silent for the rest of the video, and then another person starts speaking later. So, unlike the Single-Actor flow, LipDub does not assume it can cut the remaining 50 seconds.
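The audio-length rules in this guide (section 4, plus the Single-Actor and Multi-Actor notes on shorter audio files) can be summarized as one small function. This is a sketch that only restates the behavior described above, assuming durations in seconds; `ProjectType` and `expected_output_seconds` are hypothetical names, not a LipDub AI API.

```python
from enum import Enum


class ProjectType(Enum):
    # Hypothetical labels for the two project types described in this guide.
    SINGLE_ACTOR = "single-actor"
    MULTI_ACTOR = "multi-actor"


def expected_output_seconds(video_s: float, audio_s: float, project: ProjectType) -> float:
    """Estimate the output video length from the rules in this guide (not LipDub's code)."""
    if video_s <= 0 or audio_s <= 0:
        raise ValueError("durations must be positive")
    if project is ProjectType.SINGLE_ACTOR:
        # Longer audio is cut to the video length; shorter audio trims the video to the audio.
        return min(video_s, audio_s)
    # Multi-Actor: the full video length is kept, because another speaker may talk
    # after this actor goes silent; audio longer than the video is still cut short.
    return video_s
```

With the guide's examples, a 1-minute video with 10 seconds of audio gives a 10-second result in a Single-Actor Project and a 60-second result in a Multi-Actor Project, while a 30-second video with 2 minutes of audio gives 30 seconds in either.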