Hume AI Open Sources TADA Text-to-Speech Model with Zero Hallucinations
Hume AI released TADA (Text Audio Dual Alignment), its first open-source text-to-speech (TTS) model. The key innovation is a tokenization architecture that aligns text tokens directly with audio tokens, eliminating the content hallucinations that plague other TTS systems.
Performance
TADA achieves a real-time factor of 0.09: generating one second of audio takes about 0.09 seconds of compute, which Hume reports is roughly 5x faster than comparable LLM-based TTS systems. It supports up to 700 seconds of audio context and is light enough to run on mobile devices.
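Real-time factor (RTF) is generation time divided by the duration of audio produced, so lower is faster and anything below 1.0 is faster than real time. A quick sketch of the arithmetic, using the 0.09 figure from the release (the 60-second clip is an illustrative example, not a benchmark from the announcement):

```python
def real_time_factor(generation_seconds: float, audio_seconds: float) -> float:
    """RTF = compute time spent / audio duration produced; < 1.0 means faster than real time."""
    return generation_seconds / audio_seconds

rtf = 0.09  # reported figure for TADA

# At RTF 0.09, a hypothetical 60-second clip takes about 5.4 s to generate.
audio_seconds = 60.0
generation_seconds = rtf * audio_seconds
print(round(generation_seconds, 2))  # 5.4

# Equivalently, synthesis runs about 11x faster than real time.
print(round(1 / rtf, 1))  # 11.1
```

Note that "11x faster than real time" and "5x faster than comparable systems" are different comparisons: the first is against the clock, the second against other models' own RTFs.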
Models Available
Hume released two variants: tada-1b (1 billion parameters) and tada-3b-ml (3 billion parameters, multilingual). The model weights, code, and an accompanying research paper are all open source, available on GitHub and Hugging Face.
Why It Matters
Open source TTS models have lagged behind proprietary options in quality and reliability. TADA's zero-hallucination approach and fast inference speed make it a practical option for developers building voice applications without relying on paid APIs.