Introducing Kaitom Voice V3 - Next-Generation Thai Text-to-Speech

We're excited to announce Kaitom Voice V3, the next generation of our Thai Text-to-Speech API. This major update brings significant improvements to speech quality, introduces smart text normalization, and simplifies integration with a modern JSON-based API.
What's New in V3
Kaitom Voice V3 represents a complete overhaul of our TTS engine, delivering the most natural-sounding Thai speech synthesis we've ever created.
Smart Text Normalization
V3 automatically handles complex text elements that previously required pre-processing:
| Type | Input | Spoken Output |
|---|---|---|
| Numbers | 1,234.56 | "หนึ่งพันสองร้อยสามสิบสี่จุดห้าหก" |
| Dates | 27/01/2569 | "วันที่ยี่สิบเจ็ดมกราคมพ.ศ.สองพันห้าร้อยหกสิบเก้า" |
| Currency | ฿1,500 | "หนึ่งพันห้าร้อยบาท" |
| Time | 14:30 | "สิบสี่นาฬิกาสามสิบนาที" |
| Percentages | 25% | "ยี่สิบห้าเปอร์เซ็นต์" |
Automatic Language Detection
No more specifying language modes! V3 automatically detects and handles Thai-English mixed content:
Hello and Welcome! ยินดีต้อนรับสู่ iApp Technology ผู้นำด้าน AI ของประเทศไทย
Simplified JSON API
We've modernized the API with a clean JSON interface:
curl -X POST 'https://api.iapp.co.th/v3/store/audio/tts' \
--header 'apikey: YOUR_API_KEY' \
--header 'Content-Type: application/json' \
--data '{"text": "สวัสดีครับ น้องไข่ต้ม เวอร์ชั่น 3"}' \
--output 'output.pcm'
Streaming Audio Output
- 24 kHz mono PCM streamed — start playback as soon as bytes arrive
- Real-time factor ~0.3–0.5 — 10 s of audio synthesized in 3–5 s
- Up to ~1,000 Thai characters per request (longer text auto-chunks server-side)
- Wrap with a 44-byte WAV header on the client to play or save as .wav
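The client-side WAV wrapping described in the list above can be sketched with Python's standard library. This is a minimal sketch, not an official client: the 24 kHz sample rate and mono channel count come from the list above, but the 16-bit little-endian sample width is an assumption, since the bit depth is not stated here. The helper names `wav_header`, `pcm_to_wav`, and `pcm_duration_seconds` are hypothetical.

```python
import struct

SAMPLE_RATE = 24_000  # from the announcement: 24 kHz
CHANNELS = 1          # from the announcement: mono
SAMPLE_WIDTH = 2      # ASSUMPTION: 16-bit samples; the bit depth is not stated


def wav_header(data_size: int) -> bytes:
    """Build the canonical 44-byte RIFF/WAVE header for uncompressed PCM."""
    byte_rate = SAMPLE_RATE * CHANNELS * SAMPLE_WIDTH
    block_align = CHANNELS * SAMPLE_WIDTH
    return struct.pack(
        "<4sI4s4sIHHIIHH4sI",
        b"RIFF", 36 + data_size, b"WAVE",      # RIFF chunk descriptor
        b"fmt ", 16, 1, CHANNELS, SAMPLE_RATE,  # format chunk, 1 = PCM
        byte_rate, block_align, SAMPLE_WIDTH * 8,
        b"data", data_size,                     # data chunk header
    )


def pcm_to_wav(pcm: bytes) -> bytes:
    """Prefix raw PCM bytes with a WAV header so ordinary players can open them."""
    return wav_header(len(pcm)) + pcm


def pcm_duration_seconds(num_bytes: int) -> float:
    """Duration of a PCM payload under the assumed sample format."""
    return num_bytes / (SAMPLE_RATE * CHANNELS * SAMPLE_WIDTH)
```

Under these assumptions, one second of audio is 48,000 bytes, so a saved `output.pcm` can be converted by reading its bytes and writing `pcm_to_wav(pcm)` to `output.wav`. If the actual stream uses a different sample width, only `SAMPLE_WIDTH` needs to change.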
🎤 NEW: Thai Voice Cloning
V3 introduces voice cloning for Thai as a separate endpoint. Provide an 8–12 second clean Thai voice clip plus its literal transcript, and the synthesized speech will mimic that voice:
curl -X POST 'https://api.iapp.co.th/v3/store/audio/tts/clone' \
--header 'apikey: YOUR_API_KEY' \
--form 'text=สวัสดีครับ วันนี้ทดสอบการโคลนเสียง' \
--form 'speed=1.0' \
--form 'ref_text=ฮัลโหล สวัสดีครับ ผมชื่อไข่ต้ม' \
--form 'ref_audio=@reference.wav' \
--output 'output.pcm'
Voice cloning currently supports Thai language only. See the interactive cloning demo to record yourself and try it in the browser.
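A Python version of the cloning request might look like the sketch below, which also checks the reference clip length before uploading. The endpoint, header, and form field names are taken from the curl example above. The helper names, the assumption that the reference clip is a WAV file, the client-side length check, and the 60-second timeout are illustrative choices, not documented API behavior.

```python
import io
import wave

CLONE_URL = "https://api.iapp.co.th/v3/store/audio/tts/clone"
MIN_REF_SECONDS = 8.0   # from the announcement: 8–12 second reference clip
MAX_REF_SECONDS = 12.0


def wav_duration_seconds(wav_bytes: bytes) -> float:
    """Return the duration of a WAV clip held in memory."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as clip:
        return clip.getnframes() / clip.getframerate()


def is_valid_reference_length(seconds: float) -> bool:
    """True when the clip falls inside the recommended 8–12 second window."""
    return MIN_REF_SECONDS <= seconds <= MAX_REF_SECONDS


def clone_voice(api_key: str, text: str, ref_text: str,
                ref_audio_path: str, speed: float = 1.0) -> bytes:
    """Send a cloning request and return the raw audio bytes (hypothetical helper)."""
    import requests  # third-party; imported lazily so the checks above work without it

    with open(ref_audio_path, "rb") as f:
        ref_bytes = f.read()

    duration = wav_duration_seconds(ref_bytes)
    if not is_valid_reference_length(duration):
        raise ValueError(f"reference clip is {duration:.1f} s; expected 8-12 s")

    response = requests.post(
        CLONE_URL,
        headers={"apikey": api_key},
        # Same multipart fields as the curl example above.
        data={"text": text, "speed": str(speed), "ref_text": ref_text},
        files={"ref_audio": ("reference.wav", ref_bytes, "audio/wav")},
        timeout=60,  # ASSUMPTION: an arbitrary client-side timeout
    )
    response.raise_for_status()
    return response.content
```

Checking the clip length locally avoids uploading a reference recording that falls outside the recommended range.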
Quick Start
Python
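import requests

url = "https://api.iapp.co.th/v3/store/audio/tts"
headers = {
    "apikey": "YOUR_API_KEY",
    "Content-Type": "application/json"
}
data = {"text": "สวัสดีครับ น้องไข่ต้ม เวอร์ชั่น 3"}

response = requests.post(url, headers=headers, json=data)
response.raise_for_status()  # stop here on an HTTP error instead of saving an error body

# The response body is raw PCM; add a WAV header on the client to get a playable .wav
with open("output.pcm", "wb") as f:
    f.write(response.content)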
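Because the audio is streamed, the response can also be consumed chunk by chunk instead of being buffered in full. The following is a minimal sketch using the `stream=True` option of `requests`. The helper names `feed_chunks` and `synthesize_stream`, the 4096-byte chunk size, and the timeout are assumptions made for illustration.

```python
from typing import Callable, Iterable

TTS_URL = "https://api.iapp.co.th/v3/store/audio/tts"


def feed_chunks(chunks: Iterable[bytes], on_chunk: Callable[[bytes], object]) -> int:
    """Pass each non-empty chunk to `on_chunk` as it arrives; return total bytes seen."""
    total = 0
    for chunk in chunks:
        if chunk:  # skip empty keep-alive chunks
            on_chunk(chunk)
            total += len(chunk)
    return total


def synthesize_stream(api_key: str, text: str,
                      on_chunk: Callable[[bytes], object]) -> int:
    """Stream synthesized PCM to `on_chunk` (hypothetical helper)."""
    import requests  # third-party; imported lazily so feed_chunks works without it

    with requests.post(
        TTS_URL,
        headers={"apikey": api_key},
        json={"text": text},  # requests sets the JSON Content-Type header
        stream=True,          # do not download the whole body up front
        timeout=60,           # ASSUMPTION: an arbitrary client-side timeout
    ) as response:
        response.raise_for_status()
        return feed_chunks(response.iter_content(chunk_size=4096), on_chunk)
```

For example, opening `output.pcm` in binary write mode and passing the file's `write` method as `on_chunk` saves the audio incrementally; an audio player's write callback could be passed the same way to begin playback before synthesis finishes.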
JavaScript
const response = await fetch("https://api.iapp.co.th/v3/store/audio/tts", {
  method: "POST",
  headers: {
    "apikey": "YOUR_API_KEY",
    "Content-Type": "application/json"
  },
  body: JSON.stringify({ text: "สวัสดีครับ น้องไข่ต้ม เวอร์ชั่น 3" })
});
const blob = await response.blob();
V3 vs V2 Comparison
| Feature | V2 | V3 |
|---|---|---|
| API Format | Form Data | JSON |
| Language Mode | Required (TH / TH_MIX_EN) | Auto-detected |
| Text Normalization | Basic | Smart (numbers, dates, currency) |
| Max Characters | Unlimited | 10,000 |
| Audio Quality | Standard | 24 kHz streamed PCM |
Pricing
V3 is currently in Alpha and is FREE to use until 31 May 2026 for both endpoints:
- /v3/store/audio/tts (default Kaitom voice): FREE
- /v3/store/audio/tts/clone (Thai voice cloning): FREE
Pricing for general availability will be announced before the alpha period ends.