
Upload to Multi-Actor Projects

Watch to learn how to use Multi-Actor projects

NOTE: If you are dubbing one actor but have additional training footage, you'll need to click on “Multi-Actor projects”. There is no separate section for uploading training footage in “Single-Actor projects”.

Step 1: Upload to Multi-Actor Projects

 

Hi, I'm Jordan. I'm the Associate Product Manager for onboarding at Lipdub, and I'm here to help you get onboarded to the platform. Thank you so much for choosing us. Now let's get started. If you have any questions after watching this video, I will link our help center in the description below. Check that out, as it goes into much more detail. This will just be a brief introduction. Also, don't hesitate to reach out to support@lipdub.ai. We're happy to help.

Okay, so when you get started with Lipdub, your projects will probably be pretty empty. So let's create a new one here. You can name your project. We'll call this Project 1.1. What is the source language of the video you'd like to lip dub? So this is for my content—I believe it is Russian. And what language do you want to lip dub into? For me, it's English.

Now you'll be asked to select this checkbox, which confirms that you have the legal rights to lip dub and lip sync the actors found in this content. So now we can select “Create Project.” When you first open your project, you'll see this UI, and if you look on the left-hand side of the screen, every project is broken up into different scenes. Within every scene, there are four main steps you need to go through before you can actually generate a lip dub video result. This is what I'm going to walk you through.

Before I walk you through these four steps, let me just briefly explain what a scene is. Lipdub studies every frame where a person's face appears on screen. A person's face on screen can vary widely. For example, if I upload a full feature film like Mission: Impossible, the person's face may be on a mountain, in a deep dark cave, or in a nightclub with flashing lights. These are all very different lighting environments and backgrounds, and visually the actor's face will look very different even though it's the same actor. That’s why content is broken up into different scenes.

For films and complex content, this is by no means a limiting factor. You can freely upload a video with multiple scene changes, lighting changes, and complex content all in one scene. It just increases the chance of artifacts appearing in your result. In my situation, though, the content I'm working on is just one scene, so I'm going to keep it that way. Even though it changes camera angles and the camera moves a bit, it's still all within the same scene, same location, and same lighting. So I will just keep it all under one scene.

Let me upload the footage that I want to lip dub. Great, so now I’ve uploaded the one video that I want to lip sync. If I want to add more videos, I could. I do want to mention that there are two sections within the upload video step. There's "Lipdub Footage," where you upload the videos you want to lip sync, and then there's "Additional Footage for Training." This is a really important distinction because we assume you do not want the videos in the second section to be lip synced. They're purely additional data, used only for training, to help the program study the frames of the actors' faces when we create a specific model for every actor’s face on screen that you want to lip sync. It’s by no means necessary if you have enough data in your lip dub footage, but it’s encouraged if you don’t.

Let’s say the lip dub footage is only 15 seconds long. That is below our recommended amount. The recommended amount of data is one to two minutes for each actor at each camera angle. The reason for providing extra data on top of the clip you want to lip dub is that Lipdub studies every frame in which an actor appears on screen and how their face looks. So if I only give Lipdub a 15-second commercial to sync, that’s not a lot of data, especially if the actor only says one line. That might be only two seconds of actual talking. If I upload one minute of footage, it adds more context for what that actor’s mouth should look like and makes it easier to recreate when we go to generate a result.
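To make the one-to-two-minute guideline concrete, here is a minimal planning sketch in Python. It is not part of the Lipdub platform or any Lipdub API: the `(actor, angle, seconds)` tuple format, the function name `footage_gaps`, and the sample durations are all assumptions made up for illustration. It simply adds up how much footage you have for each actor at each angle and reports where you fall short of the one-minute lower bound.

```python
from collections import defaultdict

# Lower bound of the one-to-two-minute recommendation from the walkthrough.
RECOMMENDED_MIN_SECONDS = 60


def footage_gaps(clips, minimum=RECOMMENDED_MIN_SECONDS):
    """Return {(actor, angle): seconds_missing} for under-covered pairs.

    `clips` is an iterable of (actor, angle, seconds) tuples. This is a
    hypothetical planning format, not something the platform exports.
    """
    totals = defaultdict(float)
    for actor, angle, seconds in clips:
        totals[(actor, angle)] += seconds
    return {key: minimum - total for key, total in totals.items() if total < minimum}


# Illustrative durations only.
clips = [
    ("Charlie", "front", 15),  # the short clip to be lip dubbed
    ("Charlie", "front", 30),  # additional footage for training
    ("Fred", "front", 90),     # already above the recommended minimum
]
gaps = footage_gaps(clips)
print(gaps)
```

In this made-up example, Charlie's front-facing footage totals 45 seconds, so it would be worth adding at least 15 more seconds of training footage for him, while Fred needs nothing extra.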

Before uploading your footage, it’s really important to understand the limitations and where Lipdub struggles to lip dub the face. Sometimes side profiles can be really difficult to lip dub, or sometimes big beards and high texture can be difficult to replicate to 100% accuracy—especially in closeups. These limitations and preferred video metadata for the Lipdub platform are all detailed in our help center, which I will link below.

So that’s the first step: upload footage. Now we wait for this to finish processing. Okay, I’m back. It’s been about five to ten minutes. My one-minute video has now finished uploading to the platform. I don’t see any errors. I can play the video back, and it is still in the source language, just as I uploaded it. I can now move on to the second step.

Step 2: Label Actors

 

So once you've uploaded your footage, this is the next step: label actors. As soon as you upload your videos to Lipdub, the platform will automatically detect the face of any person who appears on screen.

You can imagine if there is a crowd scene and there are a thousand faces in the background and you just have the main speaker, it can take quite a long time for the platform to identify each and every person. That’s why uploading videos with lots of faces in the background could take longer than a video with only a few clearly identifiable faces and no background clutter.

It’s important to know that facial tracking and automatic facial detection sometimes get it wrong. The platform might even miss some facial trackings. That’s why it’s crucial for you to quickly double-check and QC the footage—confirming, for example, that all the facial trackings belong to the correct person. If everything looks right for one actor, you can move on and do the same for the others.

Now, I want to make something clear: the ML (machine learning) technology doesn’t always get it 100% accurate. Sometimes a facial tracking is missed, and it’s up to you as the end user to assign it to the correct person. That’s why we’ve given you the functionality to manually assign them.

For example, you can go to the "unassigned tracks" section from the dropdown menu. You’ll be able to select the tracks of a particular person’s face and assign them. If the dropdown is empty, it’s likely because you haven’t labeled any actors yet. So go ahead and label one—say, call him Charlie. Once you label the identity as Charlie, it won’t automatically merge all of his other face tracks, but you can select them from the dropdown and assign them to Charlie. That will group all of Charlie’s faces together.

If you label someone else—say, Fred—you’ll then be able to assign Fred’s face tracks in the same way. Select the face in "unassigned tracks," assign it, and Fred’s name will appear in the dropdown list. Confirm the assignment, and the facial track will move from the unassigned section to Fred’s grouping in the UI.

Now, let’s say you accidentally assign Charlie’s face to Fred’s group. You’ll see Charlie’s face under Fred’s group, which is incorrect. To fix it, simply remove the selected face from Fred’s group, and it will return to the unassigned section. Then, reassign it to Charlie to correct the grouping.
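If it helps to think of the labeling screen as a data structure, the behavior described above can be sketched roughly like this. This is a toy model only: the class `FaceTrackBoard`, its method names, and the track IDs are hypothetical and do not correspond to any real Lipdub API. It shows the three rules from the walkthrough: you label an actor before assigning to them, assigning moves a track out of the unassigned section, and removing a track sends it back there.

```python
class FaceTrackBoard:
    """Toy model of the Label Actors screen (hypothetical, not a Lipdub API)."""

    def __init__(self, track_ids):
        self.unassigned = set(track_ids)  # the "unassigned tracks" section
        self.groups = {}                  # actor label -> set of track IDs

    def label_actor(self, name):
        # Labeling creates an empty group; it does not merge tracks automatically.
        self.groups.setdefault(name, set())

    def assign(self, track_id, name):
        if name not in self.groups:
            raise KeyError(f"label {name!r} before assigning tracks to it")
        if track_id not in self.unassigned:
            raise ValueError(f"track {track_id!r} is not in the unassigned section")
        self.unassigned.remove(track_id)
        self.groups[name].add(track_id)

    def remove(self, track_id, name):
        # Removing a track from a group returns it to the unassigned section.
        self.groups[name].remove(track_id)
        self.unassigned.add(track_id)


board = FaceTrackBoard(["t1", "t2", "t3"])
board.label_actor("Charlie")
board.label_actor("Fred")
board.assign("t1", "Charlie")
board.assign("t2", "Fred")
board.assign("t3", "Fred")     # mistake: t3 is actually Charlie's face
board.remove("t3", "Fred")     # t3 goes back to unassigned
board.assign("t3", "Charlie")  # corrected grouping
print(board.groups, board.unassigned)
```

The point of the sketch is the fix-up path at the end: a wrong assignment is never edited in place, it goes back through the unassigned section first.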

It’s also important to know that when you move on to the third step—"Train AI"—you will only see the actors you have labeled. If someone isn’t labeled, you won’t be able to train the model on them. That’s why it’s essential to name and label all your actors before continuing.

To do that, label the actor and click "Add." Now all the face tracks under that grouping will be recognized as one identity—say, Mabel—and you’ll be able to train the AI to recognize her.

So that’s the "label actors" step. Now you understand that as soon as you upload your video, just do a quick QC of all the tracked faces. If there are more than three, quickly go through the ones that matter. You don’t need to review all of them—just the ones you plan to lip sync.

If you only want to lip sync three characters and there are a hundred face detections, you only need to label and name the three faces you’re using.
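As a quick way to picture that check before moving on to "Train AI," here is a small sketch. It is purely illustrative: the function `unlabeled_actors` and the two name lists are assumptions you would fill in by hand from your own project, not data pulled from the platform. It compares the actors you plan to lip sync against the ones you have labeled and lists anyone still missing a label.

```python
def unlabeled_actors(planned, labeled):
    """Return the planned actors that have no label yet, in planned order."""
    labeled_set = set(labeled)
    return [name for name in planned if name not in labeled_set]


# Hypothetical project: three actors to lip sync, two labeled so far.
planned = ["Charlie", "Fred", "Mabel"]
labeled = ["Charlie", "Fred"]
missing = unlabeled_actors(planned, labeled)
print(missing)
```

Any other detected faces, such as people in the background, never appear in `planned`, so they are ignored, which matches the advice to label only the faces you are actually using.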

Perfect. That is the second step: label actors.