ChindaTTS - Natural Thai speech: tone, numbers, code-switching

ChindaTTS

Natural Thai-English Text-to-Speech, now live

Production neural TTS for Thai + English — voice assistants, IVR, announcements and narration. Natural Thai prosody, correct numbers and dates, clean long-form, and voice cloning, behind an API that is drop-in compatible with iApp TTS v3.

🔊 Hear it live at one.iapp.co.th
Talk to us about a pilot


What it does

🎙️ 3 natural voices

Kaitom (flagship), Kaimook (female) and Kai Daeng (male) — each with a distinct, natural central-Thai delivery.

🎭 8 speaking styles

Neutral, friendly, cheerful, calm, serious, sad, excited and empathetic — pick the tone per request.

🌏 Thai + English code-switch

Mix Thai and English in one sentence. Pure-English passages auto-switch to a native-English mode, per sentence.

🔢 Reads everything correctly

Numbers, dates, currency, percentages, phone/ID runs, abbreviations, emails, URLs and ALL-CAPS emphasis — spoken the way a person would.

🧬 Voice cloning

Clone a voice from ~10–20 seconds of consented audio. Stateless — the sample is used once and never stored.

⚡ Streaming, long-form

First audio in under ~1 second, and full-length narration up to ~100 seconds per request — no run-on, no cut-off.


Hear it for yourself

Real, unedited ChindaTTS samples. For the full interactive demo — every voice, every style, and live voice cloning — visit the showcase at one.iapp.co.th.

The three main voices

Kaitom (flagship)

Kai Daeng (male)

Kaimook (female)

Numbers, dates and money — read correctly

Thai + English code-switching

Speaking styles — same engine, different delivery

Cheerful

Calm

Empathetic


ChindaTTS Prime (optional)

ChindaTTS Prime adds a speech-recognition gate that detects a garbled or cut-off take and regenerates it. It selects a better take; it does not alter the audio. The sound is the same; the gain is stability on hard inputs, especially expressive styles.

Metric (CER, lower is better)    ChindaTTS    ChindaTTS Prime
Everyday text                    3.2%         3.2%
Numbers / dates / currency       1.8%         1.4%
Code-switch                      7.9%         7.6%
Expressive styles                8.3%         3.9%
Bad-take rate (styles)           15.0%        6.7%
Worst-case p90 (styles)          31.8%        11.4%

Everyday text already sits at the recognizer noise floor; Prime's value shows up on expressive styles, where it cuts the bad-take rate by more than half (15.0% to 6.7%). Cost: about 9% more processing time on easy text, about 2 GB more VRAM, and a second container. If the recognizer is down, Prime falls back to standard ChindaTTS.
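To make the gating idea concrete, here is a minimal sketch of a recognition-gated re-roll loop. It is an illustration under stated assumptions, not ChindaTTS Prime's actual implementation: `synthesize` and `transcribe` are hypothetical callables that you supply, and the `max_cer` threshold and `max_takes` limit are made-up values. CER is computed here as character-level edit distance divided by reference length, ignoring whitespace.

```python
from typing import Callable, Optional


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate, ignoring whitespace differences."""
    ref = "".join(reference.split())
    hyp = "".join(hypothesis.split())
    if not ref:
        return 0.0 if not hyp else 1.0
    return edit_distance(ref, hyp) / len(ref)


def gated_synthesis(
    text: str,
    synthesize: Callable[[str], bytes],   # hypothetical TTS call
    transcribe: Callable[[bytes], str],   # hypothetical recognizer call
    max_cer: float = 0.10,                # assumed threshold, not a product value
    max_takes: int = 3,                   # assumed retry budget
) -> bytes:
    """Return the take with the lowest CER; the audio itself is never edited."""
    best_audio: Optional[bytes] = None
    best_cer = float("inf")
    for _ in range(max_takes):
        audio = synthesize(text)
        try:
            score = cer(text, transcribe(audio))
        except Exception:
            # Recognizer unavailable: fall back to ungated output.
            return audio if best_audio is None else best_audio
        if score < best_cer:
            best_audio, best_cer = audio, score
        if best_cer <= max_cer:
            break
    return best_audio
```

The loop only chooses between takes, which matches the description above: a better take is selected, and if the recognizer fails the caller still gets standard output.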

Speed (per GPU, ~6–7× real-time)

Request        Audio    Processing
~100 chars     ~9 s     ~1.5 s
~300 chars     ~25 s    ~4 s
~1000 chars    ~90 s    ~14 s

Streaming delivers first audio in under ~1 second; throughput scales by adding GPUs.
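The real-time figure follows directly from the table: divide seconds of audio produced by seconds of processing. The short sketch below reproduces that arithmetic and adds a rough capacity estimate. The helper names are hypothetical, and the GPU estimate assumes throughput scales linearly with GPU count, which is a simplification.

```python
import math

# (approx. characters, audio seconds, processing seconds) from the speed table
ROWS = [(100, 9.0, 1.5), (300, 25.0, 4.0), (1000, 90.0, 14.0)]


def real_time_factor(audio_s: float, processing_s: float) -> float:
    """Seconds of audio produced per second of processing."""
    return audio_s / processing_s


def estimate_processing_seconds(audio_s: float, rtf: float = 6.0) -> float:
    """Rough processing time for a clip, assuming a constant real-time factor."""
    return audio_s / rtf


def gpus_for_load(audio_seconds_per_second: float, rtf: float = 6.0) -> int:
    """GPUs needed for a sustained load, assuming linear scaling (an assumption)."""
    return math.ceil(audio_seconds_per_second / rtf)


factors = [round(real_time_factor(a, p), 2) for _, a, p in ROWS]
print(factors)  # [6.0, 6.25, 6.43]
```

For example, a sustained demand of 20 seconds of audio per wall-clock second at a factor of 6 would need `gpus_for_load(20)` = 4 GPUs under this simplified model.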

Run / deployment

One current-gen GPU (about 8 GB of VRAM, or about 10 GB with Prime; 12–16 GB recommended), 16 GB RAM, Linux, one container (plus one for Prime). Available as a managed API or on-premise. Production-ready for pilots: voices, cloning, streaming, the iApp v3 API and Prime are all live and tested.

Good to know

  • Thai + Latin-script only; other scripts (CJK, Arabic) are dropped.
  • Up to ~1,200 characters (~100 s) per request; longer text → multiple requests or a batch job.
  • Tone is set via the fixed style menu; rate via the speed parameter.
  • One stream per GPU; concurrency scales with more GPUs.
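Because each request is limited to roughly 1,200 characters, longer text has to be split on the client side before it is sent. The sketch below shows one possible way to do that. It is not part of the ChindaTTS API: `split_for_tts` is a hypothetical helper, and the boundary rules (prefer a sentence end, then a space, then a hard cut) are an assumption chosen for illustration. Thai text often has no sentence punctuation, so spaces between phrases act as the fallback boundary.

```python
MAX_CHARS = 1200  # approximate per-request limit stated above


def split_for_tts(text: str, max_chars: int = MAX_CHARS) -> list[str]:
    """Split text into chunks of at most max_chars, preferring natural breaks."""
    chunks: list[str] = []
    rest = text.strip()
    while len(rest) > max_chars:
        window = rest[:max_chars]
        # Prefer the last sentence end inside the window.
        cut = max(window.rfind(". "), window.rfind("! "), window.rfind("? "))
        if cut != -1:
            cut += 1  # keep the punctuation mark with its chunk
        else:
            cut = window.rfind(" ")  # otherwise break at the last space
        if cut <= 0:
            cut = max_chars  # no boundary found: hard cut
        chunks.append(rest[:cut].strip())
        rest = rest[cut:].strip()
    if rest:
        chunks.append(rest)
    return chunks
```

Each chunk can then be sent as its own request and the resulting audio played or concatenated in order. A hard cut may land mid-word in unspaced Thai text, so a production splitter would likely want a proper Thai word or sentence segmenter.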

Ready to give your product a Thai voice?

Hear every voice and style on the live showcase, then talk to us about a pilot.

🔊 Open the live showcase
Contact sales

ChindaTTS launch - by iApp Technology