
Introducing Kaitom Voice V3 - Next-Generation Thai Text-to-Speech

5 min read
Kobkrit Viriyayudhakorn
CEO @ iApp Technology


We're excited to announce Kaitom Voice V3, the next generation of our Thai Text-to-Speech API. This major update brings significant improvements to speech quality, introduces smart text normalization, and simplifies integration with a modern JSON-based API.

What's New in V3

Kaitom Voice V3 represents a complete overhaul of our TTS engine, delivering the most natural-sounding Thai speech synthesis we've ever created.

Smart Text Normalization

V3 automatically handles complex text elements that previously required pre-processing:

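| Type | Input | Spoken output | English meaning |
|---|---|---|---|
| Numbers | 1,234.56 | "หนึ่งพันสองร้อยสามสิบสี่จุดห้าหก" | one thousand two hundred thirty-four point five six |
| Dates | 27/01/2569 | "วันที่ยี่สิบเจ็ดมกราคมพ.ศ.สองพันห้าร้อยหกสิบเก้า" | the twenty-seventh of January, B.E. (Buddhist Era) 2569 |
| Currency | ฿1,500 | "หนึ่งพันห้าร้อยบาท" | one thousand five hundred baht |
| Time | 14:30 | "สิบสี่นาฬิกาสามสิบนาที" | fourteen hours, thirty minutes |
| Percentages | 25% | "ยี่สิบห้าเปอร์เซ็นต์" | twenty-five percent |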
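To make the table concrete, here is a toy, client-side sketch of the kind of rules involved: Thai place-value reading for integers, digit-by-digit reading after the decimal point, and fixed templates for dates, times, currency, and percentages. It is not iApp's implementation (V3 does this server-side, so you do not need it), and all names are hypothetical. It assumes dates are DD/MM/YYYY with the year already in the Buddhist Era, and it reads a trailing 1 as "เอ็ด" only after a non-zero tens digit.

```python
"""Toy Thai text normalizer reproducing the five table rows.

Hypothetical helper names; this is NOT iApp's implementation.
"""

DIGITS = ["ศูนย์", "หนึ่ง", "สอง", "สาม", "สี่", "ห้า", "หก", "เจ็ด", "แปด", "เก้า"]
# Place names for units .. hundred-thousands; millions are handled recursively.
PLACES = ["", "สิบ", "ร้อย", "พัน", "หมื่น", "แสน"]
MONTHS = ["มกราคม", "กุมภาพันธ์", "มีนาคม", "เมษายน", "พฤษภาคม", "มิถุนายน",
          "กรกฎาคม", "สิงหาคม", "กันยายน", "ตุลาคม", "พฤศจิกายน", "ธันวาคม"]


def _read_below_million(n: int) -> str:
    """Read 1 <= n < 1,000,000 using Thai place values."""
    digits = [int(d) for d in reversed(str(n))]  # digits[i] sits at place i
    words = []
    for place in range(len(digits) - 1, -1, -1):
        d = digits[place]
        if d == 0:
            continue
        if place == 1 and d == 1:
            word = ""  # 10-19 read "สิบ...", not "หนึ่งสิบ..."
        elif place == 1 and d == 2:
            word = "ยี่"  # 20-29 read "ยี่สิบ..."
        elif place == 0 and d == 1 and len(digits) > 1 and digits[1] != 0:
            word = "เอ็ด"  # trailing one after a tens digit: 11, 21, ...
        else:
            word = DIGITS[d]
        words.append(word + PLACES[place])
    return "".join(words)


def read_int(n: int) -> str:
    """Read a non-negative integer in Thai."""
    if n == 0:
        return DIGITS[0]
    if n >= 1_000_000:
        high, low = divmod(n, 1_000_000)
        return read_int(high) + "ล้าน" + (_read_below_million(low) if low else "")
    return _read_below_million(n)


def read_number(text: str) -> str:
    """'1,234.56': integer part by place value, decimals digit by digit."""
    whole, _, frac = text.replace(",", "").partition(".")
    spoken = read_int(int(whole))
    if frac:
        spoken += "จุด" + "".join(DIGITS[int(c)] for c in frac)
    return spoken


def normalize_token(token: str) -> str:
    """Expand one date, time, currency, percentage, or number token."""
    if "/" in token:  # DD/MM/YYYY, year assumed to be B.E. already
        day, month, year = (int(p) for p in token.split("/"))
        return "วันที่" + read_int(day) + MONTHS[month - 1] + "พ.ศ." + read_int(year)
    if ":" in token:  # HH:MM on a 24-hour clock
        hour, minute = (int(p) for p in token.split(":"))
        spoken = read_int(hour) + "นาฬิกา"
        return spoken + (read_int(minute) + "นาที" if minute else "")
    if token.startswith("฿"):
        return read_number(token[1:]) + "บาท"
    if token.endswith("%"):
        return read_number(token[:-1]) + "เปอร์เซ็นต์"
    return read_number(token)
```

A production normalizer also needs context to decide, for example, whether a number is a year, a quantity, or a phone number; this sketch only covers the five patterns in the table.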

Automatic Language Detection

No more specifying language modes! V3 automatically detects and handles Thai-English mixed content:

Hello and Welcome! ยินดีต้อนรับสู่ iApp Technology ผู้นำด้าน AI ของประเทศไทย
(English: "Hello and Welcome! Welcome to iApp Technology, Thailand's leader in AI.")
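One simple way to picture what auto-detection has to do is to split mixed input into Thai-script and Latin-script runs by Unicode block (Thai occupies U+0E00 to U+0E7F). This is an illustrative assumption, not how V3 works internally; the helper names are hypothetical, and spaces, digits, and punctuation are treated as neutral and attached to the surrounding run.

```python
from typing import List, Optional, Tuple


def script_of(ch: str) -> Optional[str]:
    """Classify one character as 'th', 'en', or None (neutral)."""
    if "\u0e00" <= ch <= "\u0e7f":  # Thai Unicode block
        return "th"
    if ch.isascii() and ch.isalpha():  # basic Latin letters
        return "en"
    return None  # spaces, digits, punctuation


def split_by_script(text: str) -> List[Tuple[str, str]]:
    """Split mixed Thai/English text into (script, segment) runs."""
    runs: List[List] = []  # each item is [script or None, accumulated text]
    for ch in text:
        script = script_of(ch)
        if not runs:
            runs.append([script, ch])
        elif script is None or script == runs[-1][0]:
            runs[-1][1] += ch  # neutral or same script: extend the current run
        elif runs[-1][0] is None:
            runs[-1][0] = script  # leading neutrals adopt the first real script
            runs[-1][1] += ch
        else:
            runs.append([script, ch])  # script changed: start a new run
    return [(s or "neutral", t.strip()) for s, t in runs if t.strip()]
```

A character-level split like this cannot see, for instance, an English loanword spelled in Thai script, so treat it as an illustration of the problem rather than a detector.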

Simplified JSON API

We've modernized the API with a clean JSON interface:

curl -X POST 'https://api.iapp.co.th/v3/store/audio/tts' \
--header 'apikey: YOUR_API_KEY' \
--header 'Content-Type: application/json' \
--data '{"text": "สวัสดีครับ น้องไข่ต้ม เวอร์ชั่น 3"}' \
--output 'output.pcm'

Streaming Audio Output

  • 24 kHz mono PCM streamed — start playback as soon as bytes arrive
  • Real-time factor ~0.3–0.5 — 10 s of audio synthesized in 3–5 s
  • Up to ~1,000 Thai characters per request (longer text auto-chunks server-side)
  • Wrap with a WAV header (44-byte) on the client to play or save as .wav
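The last bullet can be sketched in Python with only the standard library: build the canonical 44-byte RIFF/WAVE header for PCM data and prepend it to the response body. The sample width is an assumption: the post states 24 kHz mono PCM but not the bit depth, so the code assumes 16-bit little-endian samples (a common default); check the API reference before relying on it. The function names are hypothetical.

```python
import struct

SAMPLE_RATE = 24_000  # from the post: 24 kHz
CHANNELS = 1  # from the post: mono
BITS_PER_SAMPLE = 16  # ASSUMPTION: 16-bit little-endian PCM


def wav_header(data_len: int, sample_rate: int = SAMPLE_RATE,
               channels: int = CHANNELS, bits: int = BITS_PER_SAMPLE) -> bytes:
    """Return the canonical 44-byte WAV header for data_len bytes of PCM."""
    block_align = channels * bits // 8  # bytes per sample frame
    byte_rate = sample_rate * block_align  # bytes per second
    return struct.pack(
        "<4sI4s4sIHHIIHH4sI",
        b"RIFF", 36 + data_len, b"WAVE",  # RIFF size excludes its first 8 bytes
        b"fmt ", 16, 1, channels,  # 16-byte fmt chunk, format tag 1 = PCM
        sample_rate, byte_rate, block_align, bits,
        b"data", data_len,  # data chunk header; samples follow
    )


def pcm_to_wav(pcm: bytes) -> bytes:
    """Prepend a header so ordinary players accept the raw PCM response."""
    return wav_header(len(pcm)) + pcm


def pcm_duration_seconds(pcm: bytes, sample_rate: int = SAMPLE_RATE,
                         channels: int = CHANNELS, bits: int = BITS_PER_SAMPLE) -> float:
    """Audio length implied by the byte count."""
    return len(pcm) / (sample_rate * channels * bits // 8)
```

Under the 16-bit assumption one second of audio is 24,000 × 2 = 48,000 bytes, so `pcm_duration_seconds` gives a quick sanity check on a saved response, e.g. `open("output.wav", "wb").write(pcm_to_wav(pcm_bytes))` once the body is in `pcm_bytes`.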

🎤 NEW: Thai Voice Cloning

V3 introduces voice cloning for Thai as a separate endpoint. Provide an 8–12 second clean Thai voice clip plus its literal transcript, and the synthesized speech will mimic that voice:

curl -X POST 'https://api.iapp.co.th/v3/store/audio/tts/clone' \
--header 'apikey: YOUR_API_KEY' \
--form 'text=สวัสดีครับ วันนี้ทดสอบการโคลนเสียง' \
--form 'speed=1.0' \
--form 'ref_text=ฮัลโหล สวัสดีครับ ผมชื่อไข่ต้ม' \
--form 'ref_audio=@reference.wav' \
--output 'output.pcm'

Voice cloning currently supports Thai language only. See the interactive cloning demo to record yourself and try it in the browser.
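The same cloning request can be sketched in Python. Since the post asks for an 8 to 12 second reference clip, the sketch checks the clip's duration with the standard-library `wave` module before uploading, which assumes the reference is an uncompressed PCM WAV like `reference.wav` in the curl example. The helper names are hypothetical and the form fields mirror the curl call; `requests` (third-party) is imported inside the upload function so the duration check runs without it.

```python
import wave
from typing import BinaryIO, Union

MIN_REF_SECONDS, MAX_REF_SECONDS = 8.0, 12.0  # clip length recommended in the post
CLONE_URL = "https://api.iapp.co.th/v3/store/audio/tts/clone"


def reference_clip_seconds(src: Union[str, BinaryIO]) -> float:
    """Duration of a PCM WAV file (path or binary file object), from its header."""
    with wave.open(src, "rb") as clip:
        return clip.getnframes() / clip.getframerate()


def check_reference_clip(src: Union[str, BinaryIO]) -> float:
    """Raise ValueError unless the clip is within the recommended length."""
    seconds = reference_clip_seconds(src)
    if not MIN_REF_SECONDS <= seconds <= MAX_REF_SECONDS:
        raise ValueError(f"reference clip is {seconds:.1f} s; use 8-12 s of clean Thai speech")
    return seconds


def clone_voice(api_key: str, text: str, ref_audio_path: str, ref_text: str,
                out_path: str = "output.pcm", speed: float = 1.0) -> None:
    """POST a multipart cloning request and save the raw PCM reply."""
    import requests  # third-party; imported here so the checks above need only stdlib

    check_reference_clip(ref_audio_path)
    with open(ref_audio_path, "rb") as ref:
        response = requests.post(
            CLONE_URL,
            headers={"apikey": api_key},
            data={"text": text, "speed": str(speed), "ref_text": ref_text},
            files={"ref_audio": ref},  # like curl's --form 'ref_audio=@reference.wav'
        )
    response.raise_for_status()
    with open(out_path, "wb") as out:
        out.write(response.content)  # raw PCM, as in the curl example
```

`ref_text` should be the literal transcript of the clip, exactly as the curl example's `ref_text` field.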

Quick Start

Python

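import requests

url = "https://api.iapp.co.th/v3/store/audio/tts"
headers = {
    "apikey": "YOUR_API_KEY",
    "Content-Type": "application/json",
}
data = {"text": "สวัสดีครับ น้องไข่ต้ม เวอร์ชั่น 3"}

# json= serializes the dict into the JSON request body
response = requests.post(url, headers=headers, json=data)
response.raise_for_status()  # fail loudly instead of saving an error body as audio

# The body is raw 24 kHz mono PCM with no WAV header, so save it as .pcm
with open("output.pcm", "wb") as f:
    f.write(response.content)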

JavaScript

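// Top-level await: run this in an ES module (or wrap it in an async function)
const response = await fetch("https://api.iapp.co.th/v3/store/audio/tts", {
  method: "POST",
  headers: {
    "apikey": "YOUR_API_KEY",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({ text: "สวัสดีครับ น้องไข่ต้ม เวอร์ชั่น 3" }),
});
if (!response.ok) {
  throw new Error(`TTS request failed with HTTP ${response.status}`);
}
// The body is raw 24 kHz mono PCM; add a WAV header before playing it in an <audio> element
const blob = await response.blob();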
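To act on "start playback as soon as bytes arrive" from Python, the response can be consumed incrementally with `requests`' `stream=True` and `Response.iter_content()`, writing bytes into a WAV file as they come in. This is a hypothetical helper that assumes 16-bit mono samples at 24 kHz; because HTTP chunk boundaries can split a 2-byte sample, it carries any leftover byte over to the next chunk. Python's `wave` module rewrites the length fields in the header as data is written and when the file is closed.

```python
import wave
from typing import BinaryIO, Iterable, Union

SAMPLE_RATE = 24_000  # from the post
SAMPLE_WIDTH = 2  # ASSUMPTION: 16-bit samples


def save_stream_as_wav(chunks: Iterable[bytes], dest: Union[str, BinaryIO]) -> int:
    """Write streamed raw PCM chunks into a WAV file; return PCM bytes written."""
    written = 0
    carry = b""  # leftover byte(s) of a sample split across chunk boundaries
    with wave.open(dest, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(SAMPLE_RATE)
        for chunk in chunks:
            data = carry + chunk
            usable = len(data) - len(data) % SAMPLE_WIDTH  # whole samples only
            wav.writeframes(data[:usable])
            carry = data[usable:]
            written += usable
    return written  # a trailing partial sample, if any, is dropped
```

In use, `chunks` would be something like `requests.post(url, headers=headers, json=data, stream=True).iter_content(chunk_size=4096)`; to play while downloading, the same aligned chunks would go to an audio output library as well as, or instead of, the file.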

V3 vs V2 Comparison

| Feature | V2 | V3 |
|---|---|---|
| API format | Form data | JSON |
| Language mode | Required (TH / TH_MIX_EN) | Auto-detected |
| Text normalization | Basic | Smart (numbers, dates, currency) |
| Max characters | Unlimited | 10,000 |
| Audio quality | Standard | 24 kHz streamed PCM |

Pricing

V3 is currently in Alpha and is FREE to use until 31 May 2026 for both endpoints:

  • /v3/store/audio/tts (default Kaitom voice) — FREE
  • /v3/store/audio/tts/clone (Thai voice cloning) — FREE

Pricing for general availability will be announced before the alpha period ends.

Use Cases

E-Learning & Education

Convert educational content into audio lessons with proper pronunciation of numbers, dates, and technical terms.

Chatbots & Virtual Assistants

Create natural-sounding voice responses for Thai chatbots with automatic language handling.

Content Creation

Generate professional voiceovers for videos and podcasts with high-quality audio output.

Accessibility

Make digital content accessible to visually impaired users with clear, natural speech.

IVR Systems

Build interactive voice response systems with smart text normalization for phone numbers, amounts, and dates.

Migration Guide

Migrating from V2 to V3 is straightforward:

Before (V2):

curl -X POST 'https://api.iapp.co.th/v3/store/speech/text-to-speech/kaitom' \
--header 'apikey: YOUR_API_KEY' \
--form 'text="สวัสดีครับ"' \
--form 'language="TH"'

After (V3):

curl -X POST 'https://api.iapp.co.th/v3/store/audio/tts' \
--header 'apikey: YOUR_API_KEY' \
--header 'Content-Type: application/json' \
--data '{"text": "สวัสดีครับ"}' \
--output 'output.pcm'

Key changes:

  1. New endpoint: /v3/store/audio/tts
  2. Content-Type header: application/json (V2 used form data)
  3. Request body: JSON, {"text": "..."}
  4. No language parameter: Thai-English mixed text is detected automatically
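One practical difference when migrating: the comparison table lists V2 as unlimited but V3 at 10,000 characters, so very long V2 inputs may need splitting on the client. A minimal sketch, assuming the limit counts Unicode code points (as Python's `len` does) and that breaking at spaces is acceptable; Thai uses spaces between phrases and sentences rather than between words, so those are natural cut points. The helper name is hypothetical.

```python
from typing import List


def split_for_v3(text: str, max_chars: int = 10_000) -> List[str]:
    """Split text into chunks of at most max_chars, preferring to cut at spaces."""
    chunks = []
    rest = text.strip()
    while len(rest) > max_chars:
        cut = rest.rfind(" ", 0, max_chars + 1)  # last space within the limit
        if cut <= 0:
            cut = max_chars  # no usable space: fall back to a hard cut
        chunks.append(rest[:cut].rstrip())
        rest = rest[cut:].lstrip()
    if rest:
        chunks.append(rest)
    return chunks
```

Each chunk would then be sent as its own `{"text": ...}` request. Because every response is raw PCM in the same format, the per-chunk audio can simply be concatenated before a single WAV header is added.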

Try It Now

Ready to experience the next generation of Thai TTS?

What's Next

We're continuously improving Kaitom Voice. Upcoming features include:

  • Additional voice options (coming soon)
  • SSML support for fine-grained control
  • Voice cloning for English and other languages

Feedback

We'd love to hear your feedback on Kaitom Voice V3! Join our community:


Kaitom Voice V3 is available now for all iApp API users. Existing V1 and V2 APIs will continue to be supported.