TTS and Audio

ACT3 AI includes full audio management for dialogue, music, voice-over, and sound effects. You can generate dialogue using AI text-to-speech, import recorded audio, and sync everything to your timeline.

TTS Engine: Azure Neural TTS

ACT3 AI uses Azure Neural TTS for voice generation. Azure Neural TTS provides natural-sounding voices across dozens of languages and accents. The generated audio is used for character dialogue and drives both the lip sync duration and NVIDIA Audio2Face facial animation.

Text-to-Speech (TTS)

TTS generates spoken dialogue from your script text without recording anything. Select a voice, paste the dialogue, and ACT3 AI produces a professional audio track ready for lip sync and video assembly.

Generating TTS Audio

Open a scene or shot in the editor
Navigate to the Audio → Dialogue track
Select Generate TTS
Paste or type the dialogue text
Choose a voice from the library
Set speed and expressiveness
Click Generate — the audio appears on your timeline
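ACT3 AI sends your text to Azure Neural TTS for you, so you never write markup yourself. The sketch below shows how the speed and expressiveness settings might map onto the engine by building an Azure SSML document by hand. The mapping is an assumption: speed is treated as SSML `<prosody rate>` and expressiveness as Azure's `mstts:express-as` style degree. `build_ssml` is a hypothetical helper and the voice name is only an example. This is not ACT3 AI's actual request format.

```python
from typing import Optional
from xml.sax.saxutils import escape, quoteattr

SSML_NS = "http://www.w3.org/2001/10/synthesis"
MSTTS_NS = "https://www.w3.org/2001/mstts"


def build_ssml(text: str, voice: str, rate_percent: int = 0,
               style: Optional[str] = None, style_degree: float = 1.0) -> str:
    """Wrap one line of dialogue in Azure-style SSML (illustrative sketch)."""
    # Azure voice names begin with their locale, e.g. "en-US-JennyNeural" -> "en-US".
    locale = "-".join(voice.split("-")[:2])
    # Escape &, < and > so characters in the script cannot break the XML.
    body = escape(text)
    if rate_percent != 0:
        # Assumed mapping of "speed": a relative rate such as "-10%" or "+15%".
        body = f'<prosody rate="{rate_percent:+d}%">{body}</prosody>'
    if style is not None:
        # Assumed mapping of "expressiveness": Azure's styledegree (0.01 to 2).
        degree = min(max(style_degree, 0.01), 2.0)
        body = (f"<mstts:express-as style={quoteattr(style)} "
                f'styledegree="{degree:g}">{body}</mstts:express-as>')
    return (f'<speak version="1.0" xmlns="{SSML_NS}" '
            f'xmlns:mstts="{MSTTS_NS}" xml:lang="{locale}">'
            f"<voice name={quoteattr(voice)}>{body}</voice></speak>")


if __name__ == "__main__":
    print(build_ssml("We need to talk, Carter.", "en-US-JennyNeural",
                     rate_percent=-10, style="sad", style_degree=1.5))
```

With Microsoft's Azure Speech SDK for Python (`azure-cognitiveservices-speech`), you can pass a string like this to `SpeechSynthesizer.speak_ssml_async(ssml).get()`. Inside ACT3 AI, you only need the Generate TTS dialog.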

TTS generation consumes credits based on the length of the text, so short lines cost very few credits.

Voice Selection

The TTS voice library includes voices across:

Gender expression (male, female, neutral)
Age range (young adult, middle-aged, elderly)
Accent and regional variation
Tone (authoritative, warm, casual, dramatic)
Specializations (narration, dialogue, commercial)

You can preview each voice before committing to it. Save preferred voices to your project so that each character keeps a consistent voice.

Importing Audio

Import pre-recorded audio for any track type:

Open the Audio panel in the Editor
Click Add Track → Import
Select your audio file (WAV, MP3, AAC supported)
The file is added to your Asset Library and placed on the timeline

Use WAV files during editing for the highest quality. WAV is typically uncompressed, while MP3 and AAC use lossy compression. Export and distribute as MP3 or AAC for smaller file sizes.
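Before importing, you can check a WAV file's basic properties with Python's standard `wave` module. The sketch below is a convenience outside ACT3 AI, and `describe_wav` and `make_silent_wav` are hypothetical names. `wave` only reads PCM WAV, so it does not cover MP3 or AAC. For files it cannot parse, it raises `wave.Error`, which quickly exposes a mislabeled or damaged WAV.

```python
import io
import wave
from typing import BinaryIO, Union


def describe_wav(source: Union[str, BinaryIO]) -> dict:
    """Return basic properties of a PCM WAV file (path or file-like object)."""
    with wave.open(source, "rb") as w:
        rate = w.getframerate()
        return {
            "channels": w.getnchannels(),
            "sample_rate_hz": rate,
            "bit_depth": w.getsampwidth() * 8,  # sample width is given in bytes
            "duration_s": w.getnframes() / rate,
        }


def make_silent_wav(seconds: float, rate: int = 48000) -> io.BytesIO:
    """Build an in-memory 16-bit mono WAV of silence, for demonstration."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 2 bytes = 16-bit samples
        w.setframerate(rate)
        w.writeframes(b"\x00\x00" * int(seconds * rate))
    buf.seek(0)  # rewind so the buffer can be read back
    return buf


if __name__ == "__main__":
    print(describe_wav(make_silent_wav(2.5)))
    # With a real file: describe_wav("carter_line_01.wav")  (hypothetical name)
```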

Audio Track Types

Track Type	Description
Dialogue	Character speech, synchronized with lip sync
Voice-Over	Narration overlaid on video
Music	Background score, licensed tracks, or AI-generated music
Sound Effects (SFX)	Environmental sounds, impacts, ambient
Ambient	Background audio setting the location atmosphere

Multi-Track Mixing

The Audio Mixing Panel lets you control:

Volume — Per-track level control
Fade In / Out — Smooth audio transitions
Panning — Left-right stereo placement
Ducking — Automatically lower music when dialogue plays
Timeline Alignment — Drag tracks to sync with video events
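These controls rest on a few standard audio formulas. The sketch below shows textbook versions of four of them: decibel-to-gain conversion for volume, a linear fade envelope, a constant-power pan law, and an on/off duck. ACT3 AI's actual fade curves, pan law, and ducking behavior are not documented here, so treat these as illustrative assumptions. The function names are hypothetical. A production ducker also ramps the gain smoothly (attack and release) instead of switching instantly.

```python
import math
from typing import Tuple


def db_to_gain(db: float) -> float:
    """Convert a level change in decibels to a linear amplitude multiplier."""
    return 10 ** (db / 20)


def fade_gain(t: float, clip_len: float, fade_in: float, fade_out: float) -> float:
    """Linear fade-in/fade-out envelope at time t (seconds) within a clip."""
    gain = 1.0
    if fade_in > 0 and t < fade_in:
        gain = min(gain, t / fade_in)
    if fade_out > 0 and t > clip_len - fade_out:
        gain = min(gain, (clip_len - t) / fade_out)
    return max(gain, 0.0)


def constant_power_pan(pan: float) -> Tuple[float, float]:
    """Left/right gains for pan in [-1 (left), +1 (right)].

    cos/sin keeps left^2 + right^2 == 1, so loudness stays even across the
    stereo field (each side is about -3 dB at center).
    """
    angle = (pan + 1) * math.pi / 4
    return math.cos(angle), math.sin(angle)


def ducked_music_gain(dialogue_active: bool, duck_db: float = -12.0) -> float:
    """Music multiplier: attenuated by duck_db while dialogue is playing."""
    return db_to_gain(duck_db) if dialogue_active else 1.0


if __name__ == "__main__":
    print(db_to_gain(-6))            # about 0.5: -6 dB roughly halves amplitude
    print(constant_power_pan(0.0))   # equal gains of about 0.707 at center
    print(fade_gain(1.0, 10, 2, 2))  # halfway through a 2-second fade-in
```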

Audio in the Timeline

Audio tracks appear as colored bands below the video tracks in the Timeline view. You can:

Trim the start and end of each audio clip
Move clips to different timecodes
Layer multiple tracks simultaneously
Preview audio playback synchronized with the video preview
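Here is a minimal, hypothetical model of an audio clip on a timeline that makes trimming and moving concrete. ACT3 AI's internal data model is not documented here. `AudioClip`, its fields, and `to_timecode` are illustrative names. Trimming changes which part of the source file plays, while moving changes only where the clip sits on the timeline.

```python
from dataclasses import dataclass


@dataclass
class AudioClip:
    """Hypothetical clip: where it sits and which part of its source plays."""
    timeline_start: float  # seconds from the start of the timeline
    source_in: float       # first second of the source file that plays
    source_out: float      # point in the source file where playback stops

    @property
    def duration(self) -> float:
        return self.source_out - self.source_in

    @property
    def timeline_end(self) -> float:
        return self.timeline_start + self.duration

    def trim_start(self, seconds: float) -> None:
        """Trim the head: skip more of the source and keep the tail in place."""
        seconds = min(seconds, self.duration)
        self.source_in += seconds
        self.timeline_start += seconds

    def trim_end(self, seconds: float) -> None:
        """Trim the tail: stop the source earlier."""
        self.source_out -= min(seconds, self.duration)

    def move_to(self, new_start: float) -> None:
        """Slide the whole clip to a new timeline position (not before 0)."""
        self.timeline_start = max(0.0, new_start)


def to_timecode(seconds: float, fps: int = 24) -> str:
    """Format seconds as HH:MM:SS:FF (non-drop-frame) at an integer frame rate."""
    total_frames = round(seconds * fps)
    frames = total_frames % fps
    h, rem = divmod(total_frames // fps, 3600)
    m, s = divmod(rem, 60)
    return f"{h:02d}:{m:02d}:{s:02d}:{frames:02d}"


if __name__ == "__main__":
    clip = AudioClip(timeline_start=10.0, source_in=0.0, source_out=4.0)
    clip.trim_start(0.5)
    print(to_timecode(clip.timeline_start), to_timecode(clip.timeline_end))
```

The same arithmetic helps with sound effect placement. For example, an action at frame 312 of a 24 fps shot sits 312 / 24 = 13.0 seconds into that shot.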

Music and Sound Effects

For background music:

Import licensed music files that you have the rights to use
Use ACT3 AI's built-in sound library for common ambient sounds and SFX
Generate adaptive background scores using the AI music tool (select mood, tempo, and duration)
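When you set tempo and duration in the AI music tool, choosing a duration that ends on a bar line helps keep the cue from cutting off mid-phrase. The sketch below is plain tempo arithmetic and is not part of ACT3 AI. It assumes a constant tempo, a fixed meter (4/4 by default), and a music tool that accepts any duration you enter. `bar_aligned_duration` is a hypothetical name.

```python
import math
from typing import Tuple


def bar_aligned_duration(scene_seconds: float, bpm: float,
                         beats_per_bar: int = 4) -> Tuple[int, float]:
    """Smallest whole number of bars covering the scene, and its length in seconds."""
    bar_seconds = beats_per_bar * 60.0 / bpm
    # Subtract a tiny epsilon so an exact fit (e.g. 48 s at 120 BPM) is not rounded up.
    bars = math.ceil(scene_seconds / bar_seconds - 1e-9)
    return bars, bars * bar_seconds


if __name__ == "__main__":
    # A 47-second scene at 120 BPM in 4/4: bars last 2 s, so request 24 bars (48 s).
    print(bar_aligned_duration(47, 120))
```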

For sound effects:

Browse the built-in SFX library by category (door sounds, footsteps, weather, impacts, etc.)
Upload custom sounds from your own library
Drag SFX directly onto the timeline aligned with the action

Multi-Language TTS and Dubbing

ACT3 AI supports multi-lingual TTS through Azure Neural TTS. When generating a voice line, select the target language and accent. Supported languages include English (multiple accents), Spanish, French, German, Portuguese, Japanese, Korean, Chinese, and dozens more.

For dubbing existing video into another language:

Generate translated dialogue using TTS in the target language
Place the translated audio track in the timeline
Re-run lip sync on the same digital actor — the mouth animation updates to match the new language's phoneme patterns
Export the dubbed version as a separate file without re-rendering the video

This allows you to produce multi-language versions of the same production efficiently.
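A translated line rarely matches the original exactly in length, so dubbed dialogue can overrun the space the original line occupied on the timeline. One option is to raise the TTS speed for that line. The helper below estimates the percentage increase needed. It is a rough sketch that assumes duration scales inversely with speaking rate, which is only approximately true. `rate_to_fit` is a hypothetical name, not an ACT3 AI function. Large speed-ups can sound rushed, so for big overruns, shortening the translated text is often the better fix.

```python
import math


def rate_to_fit(translated_seconds: float, slot_seconds: float) -> int:
    """Percent speed-up needed for a translated line to fit the original slot.

    Returns 0 when the line already fits. Assumes speaking time scales
    inversely with the rate setting (an approximation).
    """
    if translated_seconds <= slot_seconds:
        return 0
    speedup = translated_seconds / slot_seconds - 1
    # Round before ceil so float noise (e.g. 20.000000000000018) is not bumped up.
    return math.ceil(round(speedup * 100, 6))


if __name__ == "__main__":
    # A 3.4 s translation replacing a 3.0 s original needs roughly +14% speed.
    print(rate_to_fit(3.4, 3.0))
```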

Voice Cloning

Upload a short audio sample (30–60 seconds of clear speech) to create a custom voice that matches your recording. Generated lines in the cloned voice are indistinguishable from the original for most use cases. Voice cloning requires explicit consent from the person whose voice is being cloned.
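Before uploading a cloning sample, a quick script can confirm that it meets the 30–60 second guideline. The sketch below is a hypothetical pre-flight check and is not part of ACT3 AI. It reads PCM WAV with Python's standard `wave` module, checks the duration, and, for 16-bit files, flags full-scale samples as likely clipping. It cannot judge whether the speech is clear, which still needs a listen. `check_clone_sample` is an illustrative name.

```python
import array
import io
import sys
import wave
from typing import BinaryIO, List, Union


def check_clone_sample(source: Union[str, BinaryIO],
                       min_s: float = 30.0, max_s: float = 60.0) -> List[str]:
    """Return problems found in a voice-cloning sample; an empty list means none.

    Checks only what the file itself reveals (length and digital clipping).
    Whether the speech is clear still needs a human listen.
    """
    problems = []
    with wave.open(source, "rb") as w:
        duration = w.getnframes() / w.getframerate()
        width = w.getsampwidth()
        raw = w.readframes(w.getnframes())
    if not min_s <= duration <= max_s:
        problems.append(f"duration {duration:.1f} s is outside {min_s:g}-{max_s:g} s")
    if width == 2:  # the clipping check below handles 16-bit PCM only
        samples = array.array("h", raw)
        if sys.byteorder == "big":
            samples.byteswap()  # WAV sample data is little-endian
        if samples and max(abs(s) for s in samples) >= 32767:
            problems.append("audio reaches full scale (likely clipping)")
    return problems


if __name__ == "__main__":
    # Demo: 10 s of silence in memory, which should be flagged as too short.
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(16000)
        w.writeframes(b"\x00\x00" * 16000 * 10)
    buf.seek(0)
    print(check_clone_sample(buf))
```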

Credit Usage

Audio imports are free
TTS generation consumes credits based on text length
AI-generated music consumes credits based on duration
Audio mixing and export are included in the video render cost

Best Practices

Use WAV format for all source audio during the editing phase
Normalize dialogue tracks to a consistent level to avoid volume inconsistencies
Keep music lower in the mix during dialogue scenes; a common guideline is to place music 12 to 18 dB below the dialogue level
For longer projects, name tracks clearly (e.g., "Carter Dialogue Act 2," "Office Ambience")
Lock audio tracks once approved to prevent accidental edits during final production
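Normalizing applies one gain change to a whole track so that its level hits a chosen target. The sketch below uses peak normalization, the simplest form, on floating-point samples in the range -1.0 to 1.0. It is illustrative only. ACT3 AI's normalization method is not specified here, and the -3 dBFS default target is an arbitrary assumption. Loudness-based normalization (for example ITU-R BS.1770 / LUFS) usually matches perceived dialogue levels better than peak normalization does. `peak_dbfs` and `normalize_peak` are hypothetical names.

```python
import math
from typing import List, Sequence


def peak_dbfs(samples: Sequence[float]) -> float:
    """Peak level in dBFS, where 1.0 is full scale (0 dBFS)."""
    peak = max((abs(s) for s in samples), default=0.0)
    return 20 * math.log10(peak) if peak > 0 else float("-inf")


def normalize_peak(samples: Sequence[float], target_dbfs: float = -3.0) -> List[float]:
    """Scale every sample by one gain so the loudest peak lands on target_dbfs."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0:
        return list(samples)  # silence: nothing to scale
    gain = 10 ** (target_dbfs / 20) / peak
    return [s * gain for s in samples]


if __name__ == "__main__":
    line = [0.02, -0.31, 0.18, -0.05]  # made-up dialogue samples
    print(round(peak_dbfs(normalize_peak(line)), 6))  # lands on the -3 dBFS target
```

The same decibel arithmetic explains the music guideline above. Music 12 dB below dialogue has about 10^(-12/20) ≈ 0.25 of its amplitude (roughly one quarter), and at 18 dB below it has about 0.126 (roughly one eighth).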

Troubleshooting

Audio is out of sync with video — Check that frame rates match between your audio export settings and your project settings.

TTS voice sounds robotic — Try a different voice preset, reduce speaking speed, or break long lines into shorter phrases.

Music bleeds into dialogue scenes — Use ducking in the Mixing Panel to automatically lower music when speech is detected.

Import failed — Verify the file format is WAV, MP3, or AAC and that the file is not corrupted.
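For the first issue above (audio out of sync with video), the sketch below shows why a frame-rate mismatch matters by computing how far audio and video drift apart when frames timed at one rate are played at another. It is plain arithmetic under a simplifying assumption: the picture is stretched frame for frame while the audio keeps its original length. It does not model ACT3 AI's export pipeline. `sync_drift_seconds` is a hypothetical name. The rate usually labeled 29.97 fps is exactly 30000/1001, so confusing it with 30 fps causes a 0.1% drift, about 0.6 seconds over a 10-minute piece.

```python
from fractions import Fraction
from typing import Union

Rate = Union[int, Fraction]

# The rate usually labeled "29.97 fps" is exactly 30000/1001 frames per second.
NTSC_29_97 = Fraction(30000, 1001)


def sync_drift_seconds(duration_s: float, timed_fps: Rate, playback_fps: Rate) -> float:
    """Audio/video offset after duration_s when frames timed at one rate play at another.

    N frames made at timed_fps last N / playback_fps on playback, so the picture
    is stretched by timed_fps / playback_fps while the audio keeps its length.
    """
    stretch = Fraction(timed_fps) / Fraction(playback_fps)
    return float(Fraction(duration_s) * abs(stretch - 1))


if __name__ == "__main__":
    # 10 minutes made at 30 fps but played at 29.97 fps drifts by about 0.6 s.
    print(sync_drift_seconds(600, 30, NTSC_29_97))
```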
