Resemble AI Open-Sources DramaBox Expressive TTS Model

Prompt-driven model uses screenplay-style stage directions and clones voices from 10-second clips.

Andrés Martínez, AI Content Writer
May 16, 2026 · 2 min read

Resemble AI has released DramaBox, an open-source text-to-speech model that takes screenplay-style prompts and turns them into voiced performances. Dialogue goes inside quotes. Stage directions, like sighs, whispers, or a cracking voice, sit outside the quotes, and the model treats them as performance cues rather than words to speak. The model card is live on Hugging Face.
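To make the prompt convention concrete, here is a minimal Python sketch that splits a screenplay-style prompt into spoken dialogue (inside double quotes) and stage directions (everything outside them). It illustrates the format only; `split_prompt` and the sample prompt are hypothetical and are not part of DramaBox's code or API.

```python
import re
from typing import List, Tuple


def split_prompt(prompt: str) -> List[Tuple[str, str]]:
    """Split a prompt into ("dialogue" | "direction", text) pairs.

    Hypothetical helper for illustration; DramaBox's own parsing may differ.
    """
    segments = []
    # Splitting on a capturing group keeps the quoted spans at odd indices.
    parts = re.split(r'"([^"]*)"', prompt)
    for index, part in enumerate(parts):
        text = part.strip()
        if not text:
            continue
        kind = "dialogue" if index % 2 == 1 else "direction"
        segments.append((kind, text))
    return segments


# Made-up sample prompt: directions outside the quotes, lines inside.
example = 'She sighs. "I told you not to come." Her voice cracks. "Not tonight."'
for kind, text in split_prompt(example):
    print(f"{kind:9} | {text}")
```

Only the segments tagged `dialogue` would be spoken; the `direction` segments are the performance cues.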

Voice cloning is optional. A 10-second reference clip locks the timbre while the prompt still controls emotion and pacing. Skip the reference and DramaBox invents a voice from the speaker description. Output is 48 kHz stereo, and every clip carries a Resemble Perth watermark that the company says survives MP3 compression and routine edits with near-100% detection accuracy.
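For anyone preparing audio around these numbers, the sketch below uses Python's standard `wave` module to read a WAV file's sample rate, channel count, and duration. It is not part of DramaBox. Treating 10 seconds as a minimum reference length is an assumption made here for illustration, and the helper names and the generated silent file are hypothetical.

```python
import os
import tempfile
import wave

REPORTED_RATE_HZ = 48_000  # output sample rate reported for DramaBox
REPORTED_CHANNELS = 2      # stereo
MIN_REFERENCE_S = 10.0     # assumption: treat the 10-second figure as a minimum


def wav_info(path: str) -> dict:
    """Return sample rate, channel count, and duration of a PCM WAV file."""
    with wave.open(path, "rb") as wf:
        rate = wf.getframerate()
        return {
            "sample_rate_hz": rate,
            "channels": wf.getnchannels(),
            "duration_s": wf.getnframes() / rate,
        }


def write_silence(path: str, seconds: float,
                  rate: int = REPORTED_RATE_HZ,
                  channels: int = REPORTED_CHANNELS) -> None:
    """Write a silent 16-bit WAV so the example runs without real audio."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wf.setframerate(rate)
        wf.writeframes(b"\x00" * (int(seconds * rate) * channels * 2))


demo_path = os.path.join(tempfile.mkdtemp(), "reference.wav")
write_silence(demo_path, seconds=10)
info = wav_info(demo_path)
print(info)
print("long enough for a reference clip:", info["duration_s"] >= MIN_REFERENCE_S)
print("matches reported output format:",
      info["sample_rate_hz"] == REPORTED_RATE_HZ
      and info["channels"] == REPORTED_CHANNELS)
```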

Under the hood it is an IC-LoRA fine-tune of Lightricks' LTX-2.3, the 3.3B-parameter audio branch of the LTX-2 video model, conditioned on Gemma 3 12B text embeddings at 4-bit quantization. Resemble's launch post pegs generation at around 2.5 seconds on a warm H100 with roughly 24 GB peak VRAM. The release is English-only, which Resemble calls deliberate, trading language coverage for quality on directable speech.
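As a rough sanity check on the reported memory figure, the arithmetic below estimates weight storage alone from the parameter counts above. The 16-bit precision for the LTX-2.3 audio branch is an assumption made here, since the article does not state it, and the estimate ignores activations, quantization metadata, and any other components.

```python
def weight_gb(params: float, bits_per_param: int) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return params * bits_per_param / 8 / 1e9


# 12B-parameter text encoder at 4-bit quantization, as reported.
gemma_gb = weight_gb(12e9, 4)

# 3.3B-parameter audio branch; 16-bit weights are an ASSUMPTION, not a reported figure.
ltx_gb = weight_gb(3.3e9, 16)

total_gb = gemma_gb + ltx_gb
print(f"text encoder weights: ~{gemma_gb:.1f} GB")
print(f"audio model weights:  ~{ltx_gb:.1f} GB")
print(f"weights combined:     ~{total_gb:.1f} GB vs ~24 GB reported peak")
```

Under these assumptions the weights account for roughly half of the reported peak, which leaves the remainder for everything the estimate leaves out.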

Code is on GitHub under the LTX-2 Community License, and the trainer supports stacking your own LoRA on top of the DramaBox checkpoint. Performance and watermark robustness claims are self-reported; no third-party evaluations have surfaced yet.


Bottom Line

DramaBox generates a clip in roughly 2.5 seconds on a warm H100 with about 24 GB of peak VRAM (company-reported), and it watermarks every output by default.

Quick Facts

  • Base model: LTX-2.3 audio branch, 3.3B parameters
  • Text encoder: Gemma 3 12B at 4-bit quantization
  • Voice cloning reference length: 10 seconds
  • Output: 48 kHz stereo with Resemble Perth watermark
  • Peak VRAM: approximately 24 GB on H100 (company-reported)
  • Language support: English only at launch
Tags: AI, text-to-speech, open source, Resemble AI, voice cloning, speech synthesis, Lightricks

Andrés reports on the AI stories that matter right now. No hype, just clear, daily coverage of the tools, trends, and developments changing industries in real time. He makes the complex feel routine.

Resemble AI Releases DramaBox Open-Source TTS Model | aiHola