
ChindaTTS
Natural Thai-English Text-to-Speech, now live
Production neural TTS for Thai + English — voice assistants, IVR, announcements and narration. Natural Thai prosody, correct numbers and dates, clean long-form, and voice cloning, behind an API that is drop-in compatible with iApp TTS v3.
What it does
🎙️ 3 natural voices
Kaitom (flagship), Kaimook (female) and Kai Daeng (male), each with a distinct, natural central-Thai delivery.
🎭 8 speaking styles
Neutral, friendly, cheerful, calm, serious, sad, excited and empathetic. Pick the tone per request.
🌏 Thai + English code-switch
Mix Thai and English in one sentence. Pure-English passages auto-switch to a native-English mode, per sentence.
🔢 Reads everything correctly
Numbers, dates, currency, percentages, phone/ID runs, abbreviations, emails, URLs and ALL-CAPS emphasis, spoken the way a person would.
🧬 Voice cloning
Clone a voice from ~10–20 seconds of consented audio. Stateless: the sample is used once and never stored.
⚡ Streaming, long-form
First audio in under ~1 second, and full-length narration up to ~100 seconds per request, with no run-on and no cut-off.
Hear it for yourself
Real, unedited ChindaTTS samples. For the full interactive demo — every voice, every style, and live voice cloning — visit the showcase at one.iapp.co.th.
The three main voices
Kaitom (flagship)
Kai Daeng (male)
Kaimook (female)
Numbers, dates and money — read correctly
Thai + English code-switching
Speaking styles — same engine, different delivery
Cheerful
Calm
Empathetic
ChindaTTS Prime (optional)
ChindaTTS Prime adds a speech-recognition gate that re-rolls a garbled or cut-off take. It selects a better take and does not alter the audio, so the sound is the same; the gain is stability on hard inputs, especially expressive styles.
| Metric — CER (lower is better) | ChindaTTS | ChindaTTS Prime |
|---|---|---|
| Everyday text | 3.2% | 3.2% |
| Numbers / dates / currency | 1.8% | 1.4% |
| Code-switch | 7.9% | 7.6% |
| Expressive styles | 8.3% | 3.9% |
| Bad-take rate (styles) | 15.0% | 6.7% |
| Worst-case p90 (styles) | 31.8% | 11.4% |
Everyday text already sits at the recognizer noise floor, so Prime's value shows up on expressive styles, where it cuts the bad-take rate by more than half (15.0% to 6.7%). Cost: about 9% more processing time on easy text, about 2 GB more VRAM and a second container. If the recognizer is down, Prime falls back to standard ChindaTTS.
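For readers curious how a recognition gate of this kind can work in principle, here is a minimal Python sketch. It is not the ChindaTTS Prime implementation: `synthesize` and `transcribe` are hypothetical callables you would supply, and the 10% threshold and three-take limit are illustrative assumptions. The idea is to score each take by character error rate (CER) against the input text, stop as soon as one is good enough, and otherwise return the best take seen, untouched.

```python
from typing import Callable, Tuple


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: Levenshtein distance divided by reference length."""
    if not reference:
        return 0.0 if not hypothesis else 1.0
    prev = list(range(len(hypothesis) + 1))
    for i, r in enumerate(reference, start=1):
        curr = [i]
        for j, h in enumerate(hypothesis, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(reference)


def gated_synthesis(
    text: str,
    synthesize: Callable[[str], bytes],   # hypothetical TTS call
    transcribe: Callable[[bytes], str],   # hypothetical recognizer call
    threshold: float = 0.10,              # assumed acceptance threshold
    max_takes: int = 3,                   # assumed re-roll budget
) -> Tuple[bytes, float]:
    """Re-roll until a take passes the gate; return the best take and its CER."""
    best_audio, best_cer = b"", float("inf")
    for _ in range(max_takes):
        audio = synthesize(text)
        score = cer(text, transcribe(audio))
        if score < best_cer:
            best_audio, best_cer = audio, score
        if score <= threshold:
            break  # good enough, no further re-rolls
    return best_audio, best_cer
```

The gate only chooses among takes, which matches the "same sound" claim above. A real gate would likely normalize the text first (for example, spelling out numbers) before comparing it with the transcript.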
Speed (per GPU, ~6–7× real-time)
| Request | Audio | Processing |
|---|---|---|
| ~100 chars | ~9 s | ~1.5 s |
| ~300 chars | ~25 s | ~4 s |
| ~1000 chars | ~90 s | ~14 s |
Streaming delivers first audio in under ~1 second; throughput scales by adding GPUs.
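The real-time figure in the heading can be checked against the table with a few lines of Python. The rows are copied from the table above; `gpus_needed` is a hypothetical back-of-envelope helper that assumes one stream per GPU, requests processed back to back, and full utilization, so treat its result as a lower bound.

```python
import math

# (characters, audio seconds, processing seconds), copied from the table above
ROWS = [(100, 9.0, 1.5), (300, 25.0, 4.0), (1000, 90.0, 14.0)]


def real_time_factor(audio_s: float, processing_s: float) -> float:
    """Seconds of audio produced per second of processing."""
    return audio_s / processing_s


def gpus_needed(audio_seconds_per_hour: float, rtf: float = 6.0) -> int:
    """Rough lower bound on GPUs, assuming each GPU yields rtf * 3600 s of audio per hour."""
    return math.ceil(audio_seconds_per_hour / (rtf * 3600))


for chars, audio_s, proc_s in ROWS:
    print(f"~{chars} chars: {real_time_factor(audio_s, proc_s):.2f}x real-time")
# ~100 chars: 6.00x real-time
# ~300 chars: 6.25x real-time
# ~1000 chars: 6.43x real-time
```

Under these assumptions, a workload of 100,000 seconds of audio per hour at 6× real-time would need `gpus_needed(100_000)`, which is 5 GPUs; real deployments need headroom for bursts.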
Run / deployment
One current-gen GPU (~8 GB; ~10 GB with Prime, 12–16 GB recommended), 16 GB RAM, Linux, one container (+1 for Prime). Available as a managed API or on-premise. Production-ready for pilots — voices, cloning, streaming, the iApp v3 API and Prime are all live and tested.
Good to know
- Thai + Latin-script only; other scripts (CJK, Arabic) are dropped.
- Up to ~1,200 characters (~100 s) per request; longer text → multiple requests or a batch job.
- Tone is set via the fixed style menu; rate via the `speed` parameter.
- One stream per GPU; concurrency scales with more GPUs.
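Since longer text has to go out as multiple requests, a client needs some way to split it. The sketch below is one simple approach, not part of the ChindaTTS API: `split_text` and `MAX_CHARS` are hypothetical names, and the 1,200 figure is the approximate limit quoted above. It packs whitespace-separated pieces greedily into chunks. Thai normally puts spaces between phrases and sentences, not between words, so whitespace boundaries tend to fall at natural pauses.

```python
from typing import List

MAX_CHARS = 1200  # approximate per-request limit quoted above


def split_text(text: str, max_chars: int = MAX_CHARS) -> List[str]:
    """Greedily pack whitespace-separated pieces into chunks of at most max_chars."""
    chunks: List[str] = []
    current = ""
    for piece in text.split():
        # A single unbroken run longer than the limit is hard-cut.
        # For Thai this may split mid-word, so it is a last resort.
        while len(piece) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(piece[:max_chars])
            piece = piece[max_chars:]
        candidate = f"{current} {piece}" if current else piece
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent as its own request and the audio concatenated in order. Note that `str.split()` collapses runs of whitespace, including newlines, so paragraph breaks are not preserved.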
Ready to give your product a Thai voice?
Hear every voice and style on the live showcase, then talk to us about a pilot.
