Introducing Kaitom Voice V3 - Next-Generation Thai Text-to-Speech

January 27, 2026 · 5 min read

CEO @ iApp Technology

We're excited to announce Kaitom Voice V3, the next generation of our Thai Text-to-Speech API. This major update brings significant improvements to speech quality, introduces smart text normalization, and simplifies integration with a modern JSON-based API.

What's New in V3

Kaitom Voice V3 represents a complete overhaul of our TTS engine, delivering the most natural-sounding Thai speech synthesis we've ever created.

Smart Text Normalization

V3 automatically handles complex text elements that previously required pre-processing:

Type	Input	Spoken Output
Numbers	`1,234.56`	"หนึ่งพันสองร้อยสามสิบสี่จุดห้าหก"
Dates	`27/01/2569`	"วันที่ยี่สิบเจ็ดมกราคมพ.ศ.สองพันห้าร้อยหกสิบเก้า"
Currency	`฿1,500`	"หนึ่งพันห้าร้อยบาท"
Time	`14:30`	"สิบสี่นาฬิกาสามสิบนาที"
Percentages	`25%`	"ยี่สิบห้าเปอร์เซ็นต์"

Automatic Language Detection

No more specifying language modes! V3 automatically detects and handles Thai-English mixed content:

Hello and Welcome! ยินดีต้อนรับสู่ iApp Technology ผู้นำด้าน AI ของประเทศไทย

Simplified JSON API

We've modernized the API with a clean JSON interface:

curl -X POST 'https://api.iapp.co.th/v3/store/audio/tts' \
    --header 'apikey: YOUR_API_KEY' \
    --header 'Content-Type: application/json' \
    --data '{"text": "สวัสดีครับ น้องไข่ต้ม เวอร์ชั่น 3"}'

Streaming Audio Output

24 kHz mono PCM, streamed: start playback as soon as bytes arrive
Real-time factor of about 0.3–0.5: 10 s of audio is synthesized in 3–5 s
Up to ~1,000 Thai characters per request (longer text is chunked automatically on the server)
Wrap the stream in a 44-byte WAV header on the client to play it or save it as .wav
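
The WAV-wrapping step in the last item can be sketched with Python's standard `wave` module. This is a minimal sketch that assumes the PCM samples are 16-bit signed little-endian. The post gives the 24 kHz sample rate and the mono layout but not the bit depth, so check the API documentation before relying on it. The function names `pcm_to_wav` and `pcm_duration_seconds` are illustrative and not part of the API.

```python
import io
import wave

SAMPLE_RATE = 24_000   # from the post: 24 kHz
CHANNELS = 1           # from the post: mono
SAMPLE_WIDTH = 2       # ASSUMPTION: 16-bit samples (bit depth is not stated in the post)


def pcm_to_wav(pcm: bytes) -> bytes:
    """Return the raw PCM bytes wrapped in a WAV container."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(pcm)
    return buf.getvalue()


def pcm_duration_seconds(pcm: bytes) -> float:
    """Audio length implied by the byte count under the assumptions above."""
    return len(pcm) / (SAMPLE_RATE * CHANNELS * SAMPLE_WIDTH)
```

Under the same assumption, passing the raw response body through `pcm_to_wav` and writing the result to a `.wav` file should give a file that ordinary audio players can open.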

🎤 NEW: Thai Voice Cloning

V3 introduces voice cloning for Thai as a separate endpoint. Provide an 8–12 second clean Thai voice clip plus its literal transcript, and the synthesized speech will mimic that voice:

curl -X POST 'https://api.iapp.co.th/v3/store/audio/tts/clone' \
    --header 'apikey: YOUR_API_KEY' \
    --form 'text=สวัสดีครับ วันนี้ทดสอบการโคลนเสียง' \
    --form 'speed=1.0' \
    --form 'ref_text=ฮัลโหล สวัสดีครับ ผมชื่อไข่ต้ม' \
    --form 'ref_audio=@reference.wav' \
    --output 'output.pcm'

Voice cloning currently supports Thai language only. See the interactive cloning demo to record yourself and try it in the browser.
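
Because the reference clip should be 8–12 seconds long, it can help to check its duration before uploading. The following is a minimal sketch using Python's standard `wave` module. It assumes the clip is an uncompressed PCM WAV file that `wave` can read. The helper names `clip_seconds` and `is_valid_reference` are hypothetical, and the check says nothing about how clean the recording is.

```python
import wave

MIN_SECONDS = 8.0    # from the post: clips should be 8-12 seconds long
MAX_SECONDS = 12.0


def clip_seconds(path: str) -> float:
    """Duration of a PCM WAV file, computed from its frame count and sample rate."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_valid_reference(path: str) -> bool:
    """True if the clip length falls inside the 8-12 second window."""
    return MIN_SECONDS <= clip_seconds(path) <= MAX_SECONDS
```

Calling `is_valid_reference("reference.wav")` before the upload avoids spending a request on a clip that is too short or too long.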

Quick Start

Python

import requests

url = "https://api.iapp.co.th/v3/store/audio/tts"
headers = {
    "apikey": "YOUR_API_KEY",
    "Content-Type": "application/json"
}
data = {"text": "สวัสดีครับ น้องไข่ต้ม เวอร์ชั่น 3"}

response = requests.post(url, headers=headers, json=data)
response.raise_for_status()  # stop on HTTP errors instead of saving an error body

# The response body is raw PCM without a WAV header, so save it as .pcm
with open("output.pcm", "wb") as f:
    f.write(response.content)

JavaScript

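const response = await fetch("https://api.iapp.co.th/v3/store/audio/tts", {
    method: "POST",
    headers: {
        "apikey": "YOUR_API_KEY",
        "Content-Type": "application/json"
    },
    body: JSON.stringify({ text: "สวัสดีครับ น้องไข่ต้ม เวอร์ชั่น 3" })
});
if (!response.ok) {
    throw new Error(`TTS request failed: ${response.status}`);
}
// Raw PCM bytes; add a WAV header before handing them to an audio player
const blob = await response.blob();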

V3 vs V2 Comparison

Feature	V2	V3
API Format	Form Data	JSON
Language Mode	Required (`TH` / `TH_MIX_EN`)	Auto-detected
Text Normalization	Basic	Smart (numbers, dates, currency, time, percentages)
Max Characters	Unlimited	10,000
Audio Quality	Standard	24 kHz streamed PCM

Pricing

V3 pricing:

/v3/store/audio/tts (default Kaitom voice) — 1 IC per 400 chars
/v3/store/audio/tts/clone (Thai voice cloning) — 1 IC per 400 chars
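
The rate above can be turned into a rough cost estimate. This sketch makes two assumptions that the post does not confirm: that a partially filled 400-character block is billed as a full IC, and that characters are counted as Unicode code points (which is what Python's `len` returns). The function name `estimate_ic` is illustrative.

```python
import math

CHARS_PER_IC = 400  # from the post: 1 IC per 400 characters


def estimate_ic(text: str) -> int:
    """Estimated IC cost of one request, ASSUMING partial blocks round up."""
    return math.ceil(len(text) / CHARS_PER_IC)
```

Under these assumptions a 1,000-character request would cost 3 IC on either endpoint.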

Use Cases

E-Learning & Education

Convert educational content into audio lessons with proper pronunciation of numbers, dates, and technical terms.

Chatbots & Virtual Assistants

Create natural-sounding voice responses for Thai chatbots with automatic language handling.

Content Creation

Generate professional voiceovers for videos and podcasts with high-quality audio output.

Accessibility

Make digital content accessible to visually impaired users with clear, natural speech.

IVR Systems

Build interactive voice response systems with smart text normalization for phone numbers, amounts, and dates.

Migration Guide

Migrating from V2 to V3 is straightforward:

Before (V2):

curl -X POST 'https://api.iapp.co.th/v3/store/speech/text-to-speech/kaitom' \
    --header 'apikey: YOUR_API_KEY' \
    --form 'text="สวัสดีครับ"' \
    --form 'language="TH"'

After (V3):

curl -X POST 'https://api.iapp.co.th/v3/store/audio/tts' \
    --header 'apikey: YOUR_API_KEY' \
    --header 'Content-Type: application/json' \
    --data '{"text": "สวัสดีครับ"}'

Key changes:

New endpoint: /v3/store/audio/tts
Content-Type: application/json
Request body: JSON format {"text": "..."}
No language parameter needed
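
The key changes above can be sketched as a small helper that turns V2-style form fields into a V3 request. This is a minimal sketch using only Python's standard library. The helper name `build_v3_request` and the shape of the `v2_fields` dictionary are assumptions made for illustration, and the function only builds the request without sending it.

```python
import json
import urllib.request

V3_URL = "https://api.iapp.co.th/v3/store/audio/tts"  # endpoint from the post


def build_v3_request(v2_fields: dict, api_key: str) -> urllib.request.Request:
    """Turn V2-style form fields into a V3 JSON request (not sent here)."""
    body = {"text": v2_fields["text"]}  # the V2 "language" field is dropped
    return urllib.request.Request(
        V3_URL,
        data=json.dumps(body, ensure_ascii=False).encode("utf-8"),
        headers={"apikey": api_key, "Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send the request and give back the audio stream.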

Try It Now

Ready to experience the next generation of Thai TTS?

Interactive Demo - Try V3 directly in your browser
API Documentation - Complete technical reference
Get API Key - Start building today

What's Next

We're continuously improving Kaitom Voice. Upcoming features include:

Additional voice options (coming soon)
SSML support for fine-grained control
Voice cloning for English and other languages

Feedback

We'd love to hear your feedback on Kaitom Voice V3! Join our community:

Discord: discord.gg/kYcpmdEcS2
Email: sale@iapp.co.th
Phone: 086-322-5858

Kaitom Voice V3 is available now for all iApp API users. Existing V1 and V2 APIs will continue to be supported.
