Text-to-Speech

In an EasyDub project, you can now replace the dialogue in your videos with AI-generated speech, so you can customize content directly in the platform.

A few things to note:

Character limit: You can add up to 1,000 characters of text.

Language: Currently, only English is supported (more languages are on the way!).

Video input: Static images won’t lip sync. Upload a video that shows the speaker articulating so the model can be trained properly.

Output length: You can only generate a video that matches the length of the video you uploaded.

📔 Example: If your input video is 30 seconds but your text-to-speech audio is 1.5 minutes, the output will only be 30 seconds.

LipDub AI does not create new video frames.

Video requirements: To avoid a truncated output, upload a video that is at least 2 minutes long. That way, any text up to the 1,000-character limit (approximately 2 minutes of audio) will fit within the video.
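To sanity-check a script before pasting it in, the length rules above can be modeled in a few lines. This is a minimal, hypothetical Python sketch and is not part of LipDub or EasyDub: the function names are made up, and it assumes speech length scales linearly with character count (about 2 minutes per 1,000 characters, as stated above). Real speaking rates vary with the text and voice.

```python
# Hypothetical pre-check helper; not a LipDub or EasyDub API.
MAX_CHARS = 1_000              # character limit stated in this article
AUDIO_SECONDS_AT_LIMIT = 120   # ~2 minutes of audio per 1,000 characters (from this article)


def estimate_audio_seconds(text: str) -> float:
    """Rough speech duration, ASSUMING a constant speaking rate."""
    return len(text) * AUDIO_SECONDS_AT_LIMIT / MAX_CHARS


def check_script(text: str, video_seconds: float) -> dict:
    """Estimate how much of the generated speech fits inside the uploaded video."""
    if len(text) > MAX_CHARS:
        raise ValueError(f"Text is {len(text)} characters; the limit is {MAX_CHARS}.")
    audio_seconds = estimate_audio_seconds(text)
    # No new frames are created, so speech past the end of the video is cut off.
    kept_seconds = min(audio_seconds, video_seconds)
    return {
        "estimated_audio_seconds": audio_seconds,
        "seconds_kept": kept_seconds,
        "seconds_cut": audio_seconds - kept_seconds,
        "fits": audio_seconds <= video_seconds,
    }


if __name__ == "__main__":
    # The example above: a 30-second video with ~1.5 minutes of speech.
    print(check_script("a" * 750, video_seconds=30))
```

Under this assumption, a 750-character script produces about 90 seconds of speech, so with a 30-second video roughly 60 seconds would be cut. With a 2-minute video, even a full 1,000-character script fits.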

This is the first iteration of our text-to-speech feature, and we’re actively working to enhance it, including adding support for more languages.

Voice Clone for the Original Actor

📔 Select the “Clone the Speakers voice” option, and LipDub will automatically clone the original speaker’s voice so the generated speech sounds like them.

HARD REQUIREMENT

The video that you upload must have audio. LipDub cannot clone a voice from a video that has no audio.

For best results, upload a video that is at least 1 minute long. This gives the platform enough data to create a similar-sounding voice.

SOFT REQUIREMENT

For best results, make sure the voice in the original video is isolated, with no background noise or music that could interfere with the voice clone.

FAQ:

What if I upload a video in which four actors are speaking?

- EasyDub projects are built for single-person videos, so uploading a video like this may reduce the quality of the result.


Text-to-speech is available in Single Actor projects.
