Introducing ChindaTTS — Production Thai-English Text-to-Speech, Now Live

· 4 min read
Kobkrit Viriyayudhakorn
CEO @ iApp Technology

ChindaTTS — Natural Thai speech: tone, numbers, code-switching

Today we are launching ChindaTTS, our production neural Thai + English text-to-speech engine. It speaks natural Thai with correct prosody, reads numbers, dates, money and code-switched text the way a person would, handles long-form narration cleanly, and can clone a voice from a short consented sample — all behind an API that is drop-in compatible with iApp TTS v3.

👉 Hear every voice and style live at one.iapp.co.th — our new One iApp showcase for Text, Voice and Vision.

Why ChindaTTS

General-purpose TTS has always treated Thai as an afterthought — wrong tones, mangled final consonants, robotic delivery, and a complete inability to mix Thai with English. ChindaTTS was built for Thai from the ground up. The result is speech that sounds like a real Thai speaker, even when the sentence is full of numbers, English loanwords, and brand names.

What it does

  • Thai + English + code-switch in one request. The text frontend reads numbers, dates, currency, percentages, phone/ID runs, abbreviations, emails, URLs, ALL-CAPS emphasis, and English tech loanwords naturally.
  • Correct central-Thai accent. Pure-English passages automatically switch to a native-English mode, per sentence.
  • Long-form, spoken in full — no run-on, no cut-off — up to ~100 seconds per request.
  • 3 voices (Kaitom, Kaimook, Kai Daeng) × 8 speaking styles (neutral, friendly, cheerful, calm, serious, sad, excited, empathetic).
  • Voice cloning from ~10–20 seconds of consented audio; stateless — used once, never stored.
  • Streaming — first audio in under ~1 second; drop-in iApp v3 endpoint.
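
To make the per-sentence switching in the list above concrete, here is a minimal sketch of one way to route sentences to a Thai or an English mode. It is not ChindaTTS's actual frontend: the function names, the naive sentence splitter, and the rule "any Thai character means Thai mode" are all assumptions chosen for illustration. The only fixed fact it relies on is that Thai script occupies the Unicode range U+0E00 to U+0E7F.

```python
import re

THAI_CHAR = re.compile(r"[\u0E00-\u0E7F]")  # Thai Unicode block
LATIN_CHAR = re.compile(r"[A-Za-z]")


def split_sentences(text):
    """Naive splitter (assumption): break after . ! ? plus whitespace, or on newlines."""
    parts = re.split(r"(?<=[.!?])\s+|\n+", text)
    return [p.strip() for p in parts if p and p.strip()]


def route_sentence(sentence):
    """Return the hypothetical mode name for one sentence."""
    if THAI_CHAR.search(sentence):
        return "thai"      # Thai or code-switched text stays in Thai mode
    if LATIN_CHAR.search(sentence):
        return "english"   # pure-English sentence
    return "thai"          # digits or punctuation only: default to Thai


def route_text(text):
    """Pair every sentence with the mode it would be read in."""
    return [(route_sentence(s), s) for s in split_sentences(text)]


if __name__ == "__main__":
    sample = "Welcome to the demo. ราคา 1,250 บาท ต่อเดือน\nPlease call support today."
    for mode, sentence in route_text(sample):
        print(mode, "|", sentence)
```

A sentence that mixes Thai with English loanwords is kept in Thai mode here, so only passages with no Thai characters at all switch to the English mode.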

Hear it for yourself

These are real, unedited ChindaTTS samples. For the full interactive demo — every voice, every style, and live voice cloning — visit one.iapp.co.th.

The three main voices

  • Kaitom (flagship)
  • Kai Daeng (male)
  • Kaimook (female)

Numbers, dates and money — read correctly

Thai + English code-switching

Speaking styles (same engine, different delivery)

  • Cheerful
  • Calm
  • Empathetic

ChindaTTS Prime (optional)

ChindaTTS Prime adds a speech-recognition gate that re-rolls a garbled or cut-off take. It selects a better take; it does not alter the audio. The sound is the same, and the gain is stability on hard inputs, especially expressive styles.
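
The gate described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the Prime implementation: `synthesize` and `transcribe` are hypothetical callables supplied by the caller, and the CER threshold and the maximum number of takes are made-up defaults. A real gate would presumably also normalize the text (numbers, dates, English words) before comparing it with the transcript.

```python
def edit_distance(a, b):
    """Character-level Levenshtein distance, computed one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: edit distance divided by reference length."""
    if not reference:
        return 0.0 if not hypothesis else 1.0
    return edit_distance(reference, hypothesis) / len(reference)


def gated_synthesis(text, synthesize, transcribe, max_cer=0.10, max_takes=3):
    """Re-roll takes until one passes the gate; return the best take seen.

    synthesize(text, attempt) -> audio   (hypothetical TTS call)
    transcribe(audio) -> str             (hypothetical speech recognizer)
    max_cer and max_takes are assumed values, not published settings.
    """
    best_audio, best_score = None, float("inf")
    for attempt in range(max_takes):
        audio = synthesize(text, attempt)
        score = cer(text, transcribe(audio))
        if score < best_score:
            best_audio, best_score = audio, score
        if score <= max_cer:
            break  # good enough: stop re-rolling
    return best_audio, best_score
```

The function only ever chooses among complete takes and never edits the waveform, which matches the "same sound, more stable" behavior described above.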

Metric (CER, lower is better)    ChindaTTS    ChindaTTS Prime
Everyday text                    3.2%         3.2%
Numbers / dates / currency       1.8%         1.4%
Code-switch                      7.9%         7.6%
Expressive styles                8.3%         3.9%
Bad-take rate (styles)           15.0%        6.7%
Worst-case p90 (styles)          31.8%        11.4%

Everyday text already sits at the recognizer noise floor; Prime's value shows up on expressive styles, cutting the bad-take rate by more than half.
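
As a quick check on the table figures, the relative improvement per row can be worked out directly. The snippet below uses only the numbers reported above; the helper name is ours.

```python
# (ChindaTTS, ChindaTTS Prime) values in percent, copied from the table above.
results = {
    "Everyday text": (3.2, 3.2),
    "Numbers / dates / currency": (1.8, 1.4),
    "Code-switch": (7.9, 7.6),
    "Expressive styles": (8.3, 3.9),
    "Bad-take rate (styles)": (15.0, 6.7),
    "Worst-case p90 (styles)": (31.8, 11.4),
}


def relative_reduction(base, prime):
    """Relative drop from base to prime, in percent of base."""
    return (base - prime) / base * 100.0


if __name__ == "__main__":
    for name, (base, prime) in results.items():
        print(f"{name}: {relative_reduction(base, prime):.1f}% lower with Prime")
```

For the bad-take rate this gives (15.0 − 6.7) / 15.0 ≈ 55%, which is the "more than half" cited above; everyday text comes out at 0%, consistent with it already sitting at the recognizer noise floor.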

Speed (per GPU, ~6–7× real-time)

Request        Audio    Processing
~100 chars     ~9 s     ~1.5 s
~300 chars     ~25 s    ~4 s
~1000 chars    ~90 s    ~14 s

Streaming delivers first audio in under ~1 second, and throughput scales by adding GPUs.
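
The "~6–7× real-time" figure follows from the table: the real-time factor is audio duration divided by processing time. The sketch below reproduces that arithmetic and adds a rough capacity estimate. The `gpus_needed` helper assumes throughput scales linearly with GPU count and ignores batching and queueing, which is our simplification rather than a published sizing rule.

```python
import math

# (characters, audio seconds, processing seconds), from the table above.
rows = [(100, 9.0, 1.5), (300, 25.0, 4.0), (1000, 90.0, 14.0)]


def real_time_factor(audio_s, processing_s):
    """Seconds of audio produced per second of processing."""
    return audio_s / processing_s


def estimate_processing_seconds(audio_s, rtf=6.0):
    """Rough processing time for a clip, using a conservative factor of 6."""
    return audio_s / rtf


def gpus_needed(audio_seconds_per_second, rtf=6.0):
    """GPUs required to keep up with a sustained load (linear-scaling assumption)."""
    return math.ceil(audio_seconds_per_second / rtf)


if __name__ == "__main__":
    for chars, audio_s, proc_s in rows:
        print(f"~{chars} chars: {real_time_factor(audio_s, proc_s):.2f}x real-time")
```

The three rows work out to 6.00×, 6.25× and about 6.43×. Under the linear-scaling assumption, a sustained load of 20 seconds of audio requested per second would need about 4 GPUs at a factor of 6.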

Where it fits

ChindaTTS fits voice assistants and IVR, in-app and announcement narration, accessibility, e-learning, notifications, and any product that needs a natural Thai voice that also handles English without breaking stride.

Try ChindaTTS

ChindaTTS is production-ready for pilots today — voices, cloning, streaming, the iApp v3 API, and Prime are all live and tested. We can't wait to hear what you build with it.