TTS and Audio

ACT3 AI includes full audio management for dialogue, music, voice-over, and sound effects. You can generate dialogue using AI text-to-speech, import recorded audio, and sync everything to your timeline.

TTS Engine: Azure Neural TTS

ACT3 AI generates voices with Azure Neural TTS, which provides natural-sounding voices across dozens of languages and accents. The generated audio serves as character dialogue and drives both the lip sync duration and NVIDIA Audio2Face facial animation.

Text-to-Speech (TTS)

TTS generates spoken dialogue from your script text without recording anything. Select a voice, paste the dialogue, and ACT3 AI produces a professional audio track ready for lip sync and video assembly.

Generating TTS Audio

  1. Open a scene or shot in the editor
  2. Navigate to the Audio → Dialogue track
  3. Select Generate TTS
  4. Paste or type the dialogue text
  5. Choose a voice from the library
  6. Set speed and expressiveness
  7. Click Generate — the audio appears on your timeline

TTS generation consumes credits based on the length of the text, so short lines cost very few credits.
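
ACT3 AI does not document how it passes the voice, speed, and expressiveness settings to Azure, so the following is only a sketch of how such settings are commonly expressed in Azure's SSML markup. The function name build_ssml and its parameters are hypothetical. The voice name and style shown are examples from Azure's public voice list, and style support varies by voice.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text, voice="en-US-JennyNeural", lang="en-US",
               rate="medium", style=None):
    """Return an SSML string for one dialogue line (hypothetical helper).

    rate  -- SSML prosody rate, e.g. "medium", "slow", "+10%", "-10%"
    style -- optional Azure speaking style, e.g. "cheerful"; only some
             voices support styles
    """
    # Escape the dialogue so characters like & and < stay valid XML.
    body = f"<prosody rate={quoteattr(rate)}>{escape(text)}</prosody>"
    if style is not None:
        body = (f"<mstts:express-as style={quoteattr(style)}>"
                f"{body}</mstts:express-as>")
    return (
        '<speak version="1.0" '
        'xmlns="http://www.w3.org/2001/10/synthesis" '
        'xmlns:mstts="https://www.w3.org/2001/mstts" '
        f"xml:lang={quoteattr(lang)}>"
        f"<voice name={quoteattr(voice)}>{body}</voice>"
        "</speak>"
    )


if __name__ == "__main__":
    # A slightly slower, styled read of a short line.
    print(build_ssml("We need to leave. Now.", rate="-10%", style="cheerful"))
```

Breaking a long speech into several short calls like this one mirrors the advice to split long lines into shorter phrases when a voice sounds flat.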

Voice Selection

The TTS voice library includes voices across:

  • Gender expression (male, female, neutral)
  • Age range (young adult, middle-aged, elderly)
  • Accent and regional variation
  • Tone (authoritative, warm, casual, dramatic)
  • Specializations (narration, dialogue, commercial)

Each voice can be previewed before committing. Save preferred voices to your project for consistent character voice identity.

Importing Audio

Import pre-recorded audio for any track type:

  1. Open the Audio panel in the Editor
  2. Click Add Track → Import
  3. Select your audio file (WAV, MP3, AAC supported)
  4. The file is added to your Asset Library and placed on the timeline

Use WAV files for highest quality during editing. Export and distribute as MP3 or AAC for smaller file sizes.

Audio Track Types

Track Type            Description
Dialogue              Character speech, synchronized with lip sync
Voice-Over            Narration overlaid on video
Music                 Background score, licensed tracks, or AI-generated music
Sound Effects (SFX)   Environmental sounds, impacts, ambient
Ambient               Background audio setting the location atmosphere

Multi-Track Mixing

The Audio Mixing Panel lets you control:

  • Volume — Per-track level control
  • Fade In / Out — Smooth audio transitions
  • Panning — Left-right stereo placement
  • Ducking — Automatically lower music when dialogue plays
  • Timeline Alignment — Drag tracks to sync with video events
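
Ducking, as listed above, can be pictured as a gain envelope applied to the music track. ACT3 AI's actual speech detector and curve shape are not documented, so this sketch assumes the speech intervals are already known and uses a simple linear ramp. The names ducking_gain_db, duck_db, and ramp are hypothetical.

```python
def ducking_gain_db(t, speech_intervals, duck_db=-12.0, ramp=0.25):
    """Return the gain in dB to apply to music at time t (seconds).

    speech_intervals -- list of (start, end) times where dialogue plays
    duck_db          -- how far music is lowered under dialogue (assumed)
    ramp             -- seconds to fade down before and up after speech;
                        must be greater than zero
    """
    gain = 0.0  # 0 dB means the music is left untouched
    for start, end in speech_intervals:
        if start <= t <= end:
            return duck_db  # fully ducked while dialogue plays
        if start - ramp < t < start:
            # Fading down: fraction goes 0 -> 1 as speech approaches.
            fraction = (t - (start - ramp)) / ramp
        elif end < t < end + ramp:
            # Fading back up: fraction goes 1 -> 0 after speech ends.
            fraction = 1.0 - (t - end) / ramp
        else:
            continue
        # If ramps from two lines overlap, keep the lower (quieter) gain.
        gain = min(gain, duck_db * fraction)
    return gain


if __name__ == "__main__":
    dialogue = [(1.0, 2.0)]
    for t in (0.0, 0.75, 1.5, 2.25, 3.0):
        print(t, ducking_gain_db(t, dialogue, ramp=0.5))
```

A linear ramp in dB is only one possible choice; real mixers often expose separate attack and release times.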

Audio in the Timeline

Audio tracks appear as colored bands below the video tracks in the Timeline view. You can:

  • Trim the start and end of each audio clip
  • Move clips to different timecodes
  • Layer multiple tracks simultaneously
  • Preview audio playback synchronized with the video preview

Music and Sound Effects

For background music:

  • Import licensed music files you own the rights to
  • Use ACT3 AI's built-in sound library for common ambient sounds and SFX
  • Generate adaptive background scores using the AI music tool (select mood, tempo, and duration)

For sound effects:

  • Browse the built-in SFX library by category (door sounds, footsteps, weather, impacts, etc.)
  • Upload custom sounds from your own library
  • Drag SFX directly onto the timeline aligned with the action

Multi-Language TTS and Dubbing

ACT3 AI supports multilingual TTS through Azure Neural TTS. When generating a voice line, select the target language and accent. Supported languages include English (multiple accents), Spanish, French, German, Portuguese, Japanese, Korean, Chinese, and dozens more.

For dubbing existing video into another language:

  1. Generate translated dialogue using TTS in the target language
  2. Place the translated audio track in the timeline
  3. Re-run lip sync on the same digital actor — the mouth animation updates to match the new language's phoneme patterns
  4. Export the dubbed version as a separate file without re-rendering the video

This allows you to produce multi-language versions of the same production efficiently.

Voice Cloning

Upload a short audio sample (30–60 seconds of clear speech) to create a custom voice that matches your recording. For most use cases, generated lines in the cloned voice closely match the original speaker. Voice cloning requires explicit consent from the person whose voice is being cloned.

Credit Usage

  • Audio imports are free
  • TTS generation consumes credits based on text length
  • AI-generated music consumes credits based on duration
  • Audio mixing and export are included in the video render cost

Best Practices

  • Use WAV format for all source audio during the editing phase
  • Normalize dialogue tracks to a consistent level to avoid volume inconsistencies
  • Keep music tracks lower in the mix during dialogue scenes; a common guideline is to set music 12 to 18 dB below dialogue
  • For longer projects, name tracks clearly (e.g., "Carter Dialogue Act 2," "Office Ambience")
  • Lock audio tracks once approved to prevent accidental edits during final production
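
To make the music-versus-dialogue level guideline above concrete, the sketch below converts a decibel offset into the linear amplitude multiplier it corresponds to, using the standard relation gain = 10^(dB/20). The function names are illustrative and are not part of ACT3 AI.

```python
import math


def db_to_gain(db: float) -> float:
    """Convert a level change in decibels to a linear amplitude multiplier."""
    return 10 ** (db / 20)


def gain_to_db(gain: float) -> float:
    """Convert a linear amplitude multiplier (> 0) back to decibels."""
    return 20 * math.log10(gain)


if __name__ == "__main__":
    # Music set 12 dB and 18 dB below dialogue.
    for offset in (-12, -18):
        print(f"{offset} dB -> x{db_to_gain(offset):.3f}")
```

In other words, music sitting 12 dB under dialogue has roughly a quarter of the dialogue's amplitude, and at 18 dB under it has roughly an eighth.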

Troubleshooting

Audio is out of sync with video — Check that frame rates match between your audio export settings and your project settings.
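
As a rough illustration of why a frame-rate mismatch matters, the sketch below computes how far a cue drifts when a frame position authored at one rate is interpreted at another. The function names and the 24 and 25 fps values are examples only, not ACT3 AI settings.

```python
def frames_to_seconds(frame: int, fps: float) -> float:
    """Time in seconds at which a given frame number occurs."""
    return frame / fps


def drift_seconds(frame: int, source_fps: float, project_fps: float) -> float:
    """How much earlier (positive) or later (negative) a cue lands when a
    frame number authored at source_fps is read at project_fps."""
    return frames_to_seconds(frame, source_fps) - frames_to_seconds(frame, project_fps)


if __name__ == "__main__":
    # A cue at frame 1440 is 60 s into a 24 fps edit,
    # but only 57.6 s into a 25 fps timeline.
    print(round(drift_seconds(1440, 24, 25), 3))
```

A drift that grows steadily over the length of the clip, rather than a constant offset, is the usual sign of this kind of mismatch.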

TTS voice sounds robotic — Try a different voice preset, reduce speaking speed, or break long lines into shorter phrases.

Music bleeds into dialogue scenes — Use ducking in the Mixing Panel to automatically lower music when speech is detected.

Import failed — Verify the file format is WAV, MP3, or AAC and that the file is not corrupted.