Text-to-Speech

Text-to-speech is available in Single Actor projects


Inside an EasyDub project, you can now replace dialogue in your videos with AI-generated speech, making it easier to customize content directly in the platform.

 A few things to note:
 

Character limit: You can add up to 1,000 characters of text.

Language: Currently, only English is supported (more languages are on the way!).

Video input: Static images won’t lip sync. You need to upload a video showing speaker articulation to properly train the model.

Output length: You can only generate a video that matches the length of the video you uploaded.

 📔 Example: If your input video is 30 seconds but your text-to-speech audio is 1.5 minutes, the output will only be 30 seconds.

LipDub AI does not create new video frames.
 
Video requirements: To avoid mismatched output, upload a video that is at least 2 minutes long. That way, any text up to 1,000 characters (approximately 2 minutes of audio) will fit within the video.
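
To make the limits above concrete, here is a minimal sketch of a pre-upload check. It is not part of LipDub: the function names are hypothetical, and the duration estimate simply scales the "1,000 characters is approximately 2 minutes" figure linearly, which is an assumption (real speech pace will vary).

```python
MAX_CHARS = 1_000                 # character limit stated above
APPROX_SECONDS_AT_LIMIT = 120.0   # "approximately 2 minutes" for 1,000 characters


def estimate_speech_seconds(text: str) -> float:
    """Rough estimate only: assumes speech length grows linearly with character count."""
    return len(text) / MAX_CHARS * APPROX_SECONDS_AT_LIMIT


def check_script(text: str, video_seconds: float) -> list[str]:
    """Return a list of likely problems; an empty list means none were found."""
    problems = []
    if len(text) > MAX_CHARS:
        problems.append(
            f"Text is {len(text)} characters; the limit is {MAX_CHARS}."
        )
    estimated = estimate_speech_seconds(text)
    if estimated > video_seconds:
        # No new video frames are created, so speech past the end of the video is lost.
        problems.append(
            f"Estimated speech (~{estimated:.0f} s) is longer than the video "
            f"({video_seconds:.0f} s); the output would stop at the video length."
        )
    return problems
```

For example, `check_script("a" * 500, 30)` flags one problem, because roughly 60 seconds of estimated speech will not fit in a 30-second video, while 1,000 characters against a 2-minute video passes.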
 

This is just the first iteration of our text-to-speech feature, and we’re actively working to enhance it in the near future, including support for more languages.

 Voice clone for the original actor.
 
📔 Select the “Clone the Speakers voice” option and LipDub will automatically clone the voice of the original speaker, so the generated speech sounds like them!
 

HARD REQUIREMENT

  • The video that you upload must have audio. LipDub cannot voice clone a video without any audio.
  • For best results, please upload a video that is at least 1 minute long. This ensures the platform has enough data to create a similar-sounding voice.

SOFT REQUIREMENT

  • For best results, please ensure the original video is voice isolated (i.e., no background noise or music that may interfere with the voice clone).
 

FAQ:

What if I upload a video with 4 actors that are speaking?

    • LipDub EasyDub projects are built for single-person videos, so uploading a video like this may impact the quality of the result.