Upload a video file and an audio file to generate lip-synced output.
Note: The first run downloads the required model weights (~2 GB).
Adjust face bounding box position
MuseTalk generates lip-synchronized videos from input video and audio files.