Video Translation Guidelines

Let's take a deeper dive into video translation requirements and best practices.

Written by Avi Yaffe
Updated over a month ago

Why is Heygen's Video Translate feature awesome?

Video Translate is a tool that translates the speech in your videos in real time. The following diagram compares the old workflow for translating a video with the new one using Heygen:

For optimal performance and to ensure the highest quality of lip sync translation, it is important to follow the guidelines provided below.


Optimal Conditions for Use

Speaker Positioning and Camera Angle

Ideal Speaker Count: Video Translate performs best when there are up to two speakers visible on screen.

Speaker Orientation: Speakers should face the camera, turned no more than 45 degrees away from it. This positioning helps the product capture the speech and lip movements it needs for accurate translation and lip sync.

Camera Distance and Framing

Proximity to Camera: For best results, speakers should be within 10 feet of the camera. At this distance the product can capture clear audio and facial expressions, which improves translation accuracy and lip sync quality.

Camera Framing: Close-up shots of the speakers are preferred. Close-ups capture the detailed visual cues that are essential for high-quality lip sync.


Limitations and Considerations

Camera Movement and Shot Composition

Dynamic Shot Cuts: Video Translate does not perform well on footage with frequent cuts between camera shots. Frequent cuts interrupt the continuous audio and visual cues that lip sync depends on. For best results, keep a steady shot focused on the speakers.

Speaker Overlap and Audio Clarity

Multiple Speakers Talking Over Each Other: Accuracy drops when multiple speakers talk at the same time. For best results, make sure only one speaker talks at a time so that Video Translate can lip sync each speaker accurately.

Additional Recommendations

Background Noise: Minimize background noise to ensure the audio captured is as clear as possible. Excessive noise can interfere with speech recognition and lip sync accuracy.
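
If you review many videos before submitting them, the guidelines above can be captured as a simple pre-flight checklist. The Python sketch below is only an illustration: the Footage class, its field names, and the check_guidelines function are hypothetical and are not part of any Heygen product or API. It assumes you measure or estimate the values yourself, and it only encodes the thresholds stated in this article.

```python
from dataclasses import dataclass

# Thresholds taken from the guidelines in this article.
MAX_SPEAKERS = 2
MAX_ANGLE_DEGREES = 45
MAX_DISTANCE_FEET = 10


@dataclass
class Footage:
    """Hypothetical, manually filled-in description of a source video."""
    visible_speakers: int
    max_angle_from_camera_degrees: float
    max_distance_from_camera_feet: float
    has_frequent_shot_cuts: bool = False
    has_overlapping_speech: bool = False
    has_heavy_background_noise: bool = False


def check_guidelines(footage: Footage) -> list[str]:
    """Return one warning per guideline the footage does not meet."""
    warnings = []
    if footage.visible_speakers > MAX_SPEAKERS:
        warnings.append("More than two speakers are visible on screen.")
    if footage.max_angle_from_camera_degrees > MAX_ANGLE_DEGREES:
        warnings.append("A speaker is turned more than 45 degrees from the camera.")
    if footage.max_distance_from_camera_feet > MAX_DISTANCE_FEET:
        warnings.append("A speaker is more than 10 feet from the camera.")
    if footage.has_frequent_shot_cuts:
        warnings.append("The video has frequent camera shot cuts.")
    if footage.has_overlapping_speech:
        warnings.append("Speakers talk over each other.")
    if footage.has_heavy_background_noise:
        warnings.append("The audio has excessive background noise.")
    return warnings


if __name__ == "__main__":
    # Example: a two-person interview shot slightly too far from the camera.
    interview = Footage(
        visible_speakers=2,
        max_angle_from_camera_degrees=30,
        max_distance_from_camera_feet=12,
    )
    for warning in check_guidelines(interview):
        print(warning)
```

An empty list means the footage meets every guideline listed here. It does not guarantee a particular translation or lip sync result.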


By following these guidelines, you give Video Translate the conditions it needs to produce clear, accurate translations with high-quality lip sync.
