🗣️ Thai Text-to-Speech V3 (Kaitom Voice)
⚠️ Alpha Version Notice: This API is currently in alpha testing. The service may experience intermittent availability. For production use, please use TTS V2 (Stable) instead. V3 is FREE to use until 31 May 2026 during the alpha testing period.
Welcome to Thai Text-to-Speech API V3 featuring the all-new Kaitom Voice (น้องไข่ต้ม เวอร์ชั่น 3). This next-generation version delivers significantly improved speech naturalness with advanced text normalization, voice cloning support, and automatic Thai-English language handling.
What's New in V3
- Smart Text Normalization - Automatically handles numbers, dates, currency, and special characters
- Automatic Language Detection - No need to specify a language mode; V3 handles Thai-English mixing automatically
- Extended Character Limit - Support for up to 10,000 characters per request
- Simplified JSON API - Clean JSON request body instead of form data
- High-Quality Audio - 24 kHz, 16-bit mono audio streamed as raw PCM, ready to wrap as WAV for professional applications
Try Demo — Default Voice (Kaitom)
V3 automatically handles Thai-English mixed text. No language mode selection needed.
Speed: natural range 0.8–1.2 (default 1.0).
Try Demo — Voice Cloning (Thai)
🎁 FREE until 31 May 2026 · ⚠️ ALPHA
Upload an 8–12 second clean Thai voice clip together with its literal transcript, and the model will speak your Thai text in that voice. Voice cloning is currently Thai-only.
📋 How to use this demo
- Step 1: Record yourself (max 10 seconds) speaking any short Thai sentence — OR upload an existing Thai audio clip.
- Step 2: Type the exact Thai words you spoke into the "Reference Transcript" box. (Word-for-word — not a description.)
- Step 3: Type the new Thai text you want the cloned voice to say.
- Step 4: Click Generate Cloned Voice.
💡 Speak a natural Thai sentence, such as "สวัสดีครับ ผมชื่อไข่ต้ม วันนี้อากาศดีมาก" ("Hello, my name is Kaitom. The weather is very nice today."). Recording stops automatically at 10 seconds.
⚠️ The reference transcript must match your recording word-for-word. Do not write a description such as "เสียงผู้ชายพูดทักทาย" ("a man's voice giving a greeting"); write the actual sentence you spoke. Clone quality depends on the transcript matching the audio exactly.
Speed: natural range 0.8–1.2 (default 1.0).
Getting Started
Prerequisites
- An API key from iApp Technology
- Text input in Thai and/or English
- Maximum text length: 10,000 characters
- Output format: raw 16-bit mono PCM at 24 kHz (wrap with a WAV header for playback)
Quick Start
- Fast processing with high-quality output
- Improved natural speech generation
- Automatic Thai-English mixed text support
- No language mode selection required
Key Features
- Next-generation speech synthesis engine
- Smart text normalization (numbers, dates, currency)
- Automatic Thai-English language handling
- Emoji and special character support
- Extended 10,000 character limit
Security & Compliance
- GDPR and PDPA compliant
- No data retention after processing
Please visit the API Key Management page to view your existing API key or request a new one.
- V2: Looking for the POST-based API with language mode selection? See Text-to-Speech V2
- V1: Looking for the legacy GET-based API with Kaitom V1 or Cee voice? See Text-to-Speech V1
API Endpoints
| Endpoint | Method | Content-Type | Description | Cost (Alpha) |
|---|---|---|---|---|
| /v3/store/audio/tts | POST | application/json | Default voice (Kaitom), Thai/English mixed text | FREE until 31 May 2026 |
| /v3/store/audio/tts/clone | POST | multipart/form-data | Voice cloning: synthesize Thai text in a custom voice | FREE until 31 May 2026 |
Quick Example
Default Voice — Sample Request
curl -X POST 'https://api.iapp.co.th/v3/store/audio/tts' \
--header 'apikey: YOUR_API_KEY' \
--header 'Content-Type: application/json' \
--data '{"text": "สวัสดีครับ น้องไข่ต้ม เวอร์ชั่น 3", "speed": 1.0}' \
--output 'output.pcm'
Voice Clone — Sample Request
curl -X POST 'https://api.iapp.co.th/v3/store/audio/tts/clone' \
--header 'apikey: YOUR_API_KEY' \
--form 'text=สวัสดีครับ วันนี้ทดสอบการโคลนเสียง' \
--form 'speed=1.0' \
--form 'ref_text=ฮัลโหล สวัสดีครับ ผมชื่อไข่ต้ม' \
--form 'ref_audio=@reference.wav' \
--output 'output.pcm'
Sample Response
The response body is raw signed 16-bit little-endian PCM, mono, 24 kHz, streamed as application/octet-stream. Wrap it in a WAV header to play or save as .wav:
ffmpeg -f s16le -ar 24000 -ac 1 -i output.pcm output.wav
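If ffmpeg is not available, Python's standard-library wave module can add the same header. A minimal sketch, assuming the body has already been saved to a file such as output.pcm; the helper name pcm_to_wav is illustrative and not part of the API:

```python
import wave

SAMPLE_RATE = 24_000  # Hz, per the response format described above
SAMPLE_WIDTH = 2      # bytes per sample (signed 16-bit little-endian)
CHANNELS = 1          # mono

def pcm_to_wav(pcm_path: str, wav_path: str) -> int:
    """Wrap raw s16le / mono / 24 kHz PCM in a WAV header; return the frame count."""
    with open(pcm_path, "rb") as src:
        pcm = src.read()
    with wave.open(wav_path, "wb") as wf:
        wf.setnchannels(CHANNELS)
        wf.setsampwidth(SAMPLE_WIDTH)
        wf.setframerate(SAMPLE_RATE)
        wf.writeframes(pcm)  # wave writes the RIFF/fmt/data header for us
    return len(pcm) // (SAMPLE_WIDTH * CHANNELS)
```

Usage: pcm_to_wav("output.pcm", "output.wav") produces the same file as the ffmpeg command above.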
API Reference
1. Default Voice Endpoint (Kaitom)
- Endpoint: POST https://api.iapp.co.th/v3/store/audio/tts
- Content-Type: application/json
- Headers: apikey: your API key (required)
Request Body
| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
| text | string | yes | — | Thai and/or English text, up to 10,000 characters. Text longer than about 1,000 Thai characters is automatically chunked server-side. |
| speed | float | no | 1.0 | Speech rate. Natural range 0.8–1.2; lower is slower. |
{
"text": "สวัสดีครับ ยินดีต้อนรับสู่ iApp",
"speed": 1.0
}
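For input longer than the 10,000-character request limit, one option is to split the text client-side and send one request per chunk. A minimal sketch; split_for_tts and build_payloads are hypothetical helpers, and splitting at spaces is an assumption (Thai usually places spaces between phrases rather than words, so a long run without spaces is hard-split at the limit):

```python
MAX_CHARS = 10_000        # per-request limit stated in this document
SPEED_RANGE = (0.8, 1.2)  # natural-sounding speed range stated in this document

def split_for_tts(text: str, limit: int = MAX_CHARS) -> list[str]:
    """Split text into chunks of at most `limit` characters, preferring spaces."""
    chunks: list[str] = []
    rest = text.strip()
    while len(rest) > limit:
        cut = rest.rfind(" ", 0, limit + 1)  # last space at or before the limit
        if cut <= 0:                         # no usable space: hard-split
            cut = limit
        chunks.append(rest[:cut].strip())
        rest = rest[cut:].strip()
    if rest:
        chunks.append(rest)
    return chunks

def build_payloads(text: str, speed: float = 1.0) -> list[dict]:
    """Return one JSON body per chunk; warn if speed is outside the natural range."""
    lo, hi = SPEED_RANGE
    if not lo <= speed <= hi:
        print(f"warning: speed {speed} is outside the natural range {lo}-{hi}")
    return [{"text": chunk, "speed": speed} for chunk in split_for_tts(text)]
```

Each payload can then be posted as the JSON body above, and the returned PCM byte strings concatenated in order, since they share the same format.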
2. Voice Cloning Endpoint (Thai-only)
- Endpoint: POST https://api.iapp.co.th/v3/store/audio/tts/clone
- Content-Type: multipart/form-data
- Headers: apikey: your API key (required)
Form Fields
| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
| text | string | yes | — | Thai text to synthesize |
| speed | float | no | 1.0 | Speech rate. Natural range 0.8–1.2. |
| ref_text | string | yes | — | Literal Thai transcript of ref_audio (not a description) |
| ref_audio | file | yes | — | WAV or MP3, 8–12 s of clean mono Thai speech |
Constraints:
- The reference clip must be ≤ 15 seconds. Longer clips are silently trimmed; if ref_text still includes words from the trimmed portion, the output speeds up and distorts.
- ref_text must match what is spoken in ref_audio exactly, word-for-word.
- Voice clone requests are processed serially on each server, so expect queueing latency under concurrent load.
- Voice cloning currently supports Thai only.
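Because over-long references are trimmed silently, it can help to check a clip before uploading it. A minimal sketch for WAV references using Python's wave module (MP3 would need a third-party decoder); check_reference is a hypothetical helper, and its thresholds are the 8–12 s recommendation and 15 s limit stated above:

```python
import wave

RECOMMENDED = (8.0, 12.0)  # seconds, recommended reference length
HARD_LIMIT = 15.0          # seconds; longer clips are trimmed server-side

def check_reference(path: str) -> tuple[float, list[str]]:
    """Return (duration_seconds, warnings) for a WAV reference clip."""
    with wave.open(path, "rb") as wf:
        duration = wf.getnframes() / wf.getframerate()
        channels = wf.getnchannels()
    warnings: list[str] = []
    if duration > HARD_LIMIT:
        warnings.append(f"{duration:.1f}s exceeds {HARD_LIMIT:.0f}s and will be trimmed; "
                        "ref_text would no longer match the audio")
    elif not RECOMMENDED[0] <= duration <= RECOMMENDED[1]:
        warnings.append(f"{duration:.1f}s is outside the recommended 8-12s range")
    if channels != 1:
        warnings.append(f"clip has {channels} channels; mono is recommended")
    return duration, warnings
```

An empty warning list means the clip meets the stated length and channel guidance; whether ref_text matches the audio still has to be checked by ear.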
Response (both endpoints)
- Content-Type: application/octet-stream
- Body: raw signed 16-bit little-endian PCM, mono, 24 kHz, streamed as bytes arrive
- Duration: duration_seconds = byte_length / 48000 (24,000 samples per second × 2 bytes per sample)
- To save as a playable file, wrap the bytes with a WAV header (see the JavaScript (Fetch API) example below) or use ffmpeg:
ffmpeg -f s16le -ar 24000 -ac 1 -i out.pcm out.wav
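When consuming the stream incrementally (for example, to start playback before the response finishes), a network chunk can end in the middle of a 2-byte sample. A minimal sketch of one way to handle that; align_samples and pcm_duration are hypothetical helpers, not part of the API:

```python
from typing import Iterable, Iterator

BYTES_PER_SECOND = 24_000 * 2 * 1  # sample rate x bytes per sample x channels

def pcm_duration(byte_length: int) -> float:
    """Duration in seconds of raw s16le mono 24 kHz PCM."""
    return byte_length / BYTES_PER_SECOND

def align_samples(chunks: Iterable[bytes], sample_width: int = 2) -> Iterator[bytes]:
    """Re-chunk a byte stream so every yielded piece holds whole samples."""
    leftover = b""
    for chunk in chunks:
        data = leftover + chunk
        cut = len(data) - len(data) % sample_width
        leftover = data[cut:]  # carry a split sample over to the next chunk
        if cut:
            yield data[:cut]
    if leftover:
        # A trailing partial sample means the stream was truncated
        raise ValueError(f"stream ended with {len(leftover)} stray byte(s)")
```

With requests, pass response.iter_content(chunk_size=None) from a stream=True request to align_samples and feed each piece to your audio sink.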
Code Examples
Python
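import requests
import wave
url = "https://api.iapp.co.th/v3/store/audio/tts"
headers = {
    "apikey": "YOUR_API_KEY",
    "Content-Type": "application/json"
}
data = {
    "text": "สวัสดีครับ น้องไข่ต้ม เวอร์ชั่น 3 มาพร้อมเสียงที่เป็นธรรมชาติมากขึ้น",
    "speed": 1.0
}
response = requests.post(url, headers=headers, json=data, timeout=60)
response.raise_for_status()
# The body is raw 16-bit mono PCM at 24 kHz; the wave module adds the WAV header
with wave.open("output.wav", "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(2)
    wf.setframerate(24000)
    wf.writeframes(response.content)
print("Audio saved to output.wav")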
JavaScript (Node.js)
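const axios = require("axios")
const fs = require("fs")
const url = "https://api.iapp.co.th/v3/store/audio/tts"
const config = {
  headers: {
    "apikey": "YOUR_API_KEY",
    "Content-Type": "application/json"
  },
  responseType: "arraybuffer"
}
const data = {
  text: "สวัสดีครับ น้องไข่ต้ม เวอร์ชั่น 3 มาพร้อมเสียงที่เป็นธรรมชาติมากขึ้น",
  speed: 1.0
}
// Build a 44-byte WAV header for 16-bit PCM (the API returns mono, 24 kHz)
function wavHeader(dataLength, sampleRate = 24000, channels = 1) {
  const h = Buffer.alloc(44)
  h.write("RIFF", 0); h.writeUInt32LE(36 + dataLength, 4)
  h.write("WAVE", 8); h.write("fmt ", 12)
  h.writeUInt32LE(16, 16); h.writeUInt16LE(1, 20)
  h.writeUInt16LE(channels, 22); h.writeUInt32LE(sampleRate, 24)
  h.writeUInt32LE(sampleRate * channels * 2, 28); h.writeUInt16LE(channels * 2, 32)
  h.writeUInt16LE(16, 34); h.write("data", 36)
  h.writeUInt32LE(dataLength, 40)
  return h
}
axios.post(url, data, config)
  .then((response) => {
    const pcm = Buffer.from(response.data)
    fs.writeFileSync("output.wav", Buffer.concat([wavHeader(pcm.length), pcm]))
    console.log("Audio saved to output.wav")
  })
  .catch((error) => console.error(error))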
JavaScript (Fetch API)
The endpoint streams raw PCM. Wrap it in a WAV header before playback:
async function playThaiTTS(text) {
  const resp = await fetch("https://api.iapp.co.th/v3/store/audio/tts", {
    method: "POST",
    headers: {
      "apikey": "YOUR_API_KEY",
      "Content-Type": "application/json"
    },
    body: JSON.stringify({ text, speed: 1.0 })
  });
  if (!resp.ok) throw new Error(`TTS failed: ${resp.status}`);
  const pcm = new Uint8Array(await resp.arrayBuffer());
  const wav = pcmToWav(pcm, 24000, 1);
  const url = URL.createObjectURL(new Blob([wav], { type: "audio/wav" }));
  new Audio(url).play();
}
function pcmToWav(pcm, sampleRate, channels) {
  const byteRate = sampleRate * channels * 2;
  const buf = new ArrayBuffer(44 + pcm.byteLength);
  const v = new DataView(buf);
  const write = (o, s) => [...s].forEach((c, i) => v.setUint8(o + i, c.charCodeAt(0)));
  write(0, "RIFF"); v.setUint32(4, 36 + pcm.byteLength, true);
  write(8, "WAVE"); write(12, "fmt ");
  v.setUint32(16, 16, true); v.setUint16(20, 1, true);
  v.setUint16(22, channels, true); v.setUint32(24, sampleRate, true);
  v.setUint32(28, byteRate, true); v.setUint16(32, channels * 2, true);
  v.setUint16(34, 16, true); write(36, "data");
  v.setUint32(40, pcm.byteLength, true);
  new Uint8Array(buf, 44).set(pcm);
  return buf;
}
Python — Voice Cloning
import requests, wave
with open("reference.wav", "rb") as f:
    files = {"ref_audio": ("reference.wav", f, "audio/wav")}
    data = {
        "text": "สวัสดีครับ วันนี้ทดสอบการโคลนเสียง",
        "speed": "1.0",
        "ref_text": "ฮัลโหล สวัสดีครับ ผมชื่อไข่ต้ม",  # must match reference.wav word-for-word
    }
    # Send while the file is still open so requests can read it
    r = requests.post(
        "https://api.iapp.co.th/v3/store/audio/tts/clone",
        headers={"apikey": "YOUR_API_KEY"},
        data=data, files=files, stream=True, timeout=60,
    )
r.raise_for_status()
# The body is raw 16-bit mono PCM at 24 kHz; write it into a WAV container as it streams
with wave.open("cloned.wav", "wb") as wf:
    wf.setnchannels(1); wf.setsampwidth(2); wf.setframerate(24000)
    for chunk in r.iter_content(chunk_size=None):
        if chunk:
            wf.writeframes(chunk)
print("Saved cloned.wav")
PHP
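<?php
$curl = curl_init();
$data = json_encode([
    "text" => "สวัสดีครับ น้องไข่ต้ม เวอร์ชั่น 3 มาพร้อมเสียงที่เป็นธรรมชาติมากขึ้น",
    "speed" => 1.0
]);
curl_setopt_array($curl, array(
    CURLOPT_URL => 'https://api.iapp.co.th/v3/store/audio/tts',
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_ENCODING => '',
    CURLOPT_MAXREDIRS => 10,
    CURLOPT_TIMEOUT => 0,
    CURLOPT_FOLLOWLOCATION => true,
    CURLOPT_HTTP_VERSION => CURL_HTTP_VERSION_1_1,
    CURLOPT_CUSTOMREQUEST => 'POST',
    CURLOPT_POSTFIELDS => $data,
    CURLOPT_HTTPHEADER => array(
        'apikey: YOUR_API_KEY',
        'Content-Type: application/json'
    ),
));
$pcm = curl_exec($curl);
$status = curl_getinfo($curl, CURLINFO_HTTP_CODE);
curl_close($curl);
if ($pcm === false || $status !== 200) {
    exit("TTS request failed (HTTP $status)\n");
}
// The body is raw 16-bit mono PCM at 24 kHz; prepend a 44-byte WAV header
$sampleRate = 24000;
$header = "RIFF" . pack("V", 36 + strlen($pcm)) . "WAVE"
    . "fmt " . pack("VvvVVvv", 16, 1, 1, $sampleRate, $sampleRate * 2, 2, 16)
    . "data" . pack("V", strlen($pcm));
file_put_contents("output.wav", $header . $pcm);
echo "Audio saved to output.wav";
?>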
Swift
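import Foundation
// Prepend a 44-byte WAV header to raw 16-bit mono PCM (the API returns 24 kHz)
func wavData(fromPCM pcm: Data, sampleRate: UInt32 = 24000) -> Data {
    var d = Data()
    func u32(_ v: UInt32) { withUnsafeBytes(of: v.littleEndian) { d.append(contentsOf: $0) } }
    func u16(_ v: UInt16) { withUnsafeBytes(of: v.littleEndian) { d.append(contentsOf: $0) } }
    d.append("RIFF".data(using: .ascii)!); u32(36 + UInt32(pcm.count))
    d.append("WAVEfmt ".data(using: .ascii)!); u32(16); u16(1); u16(1)
    u32(sampleRate); u32(sampleRate * 2); u16(2); u16(16)
    d.append("data".data(using: .ascii)!); u32(UInt32(pcm.count))
    d.append(pcm)
    return d
}
let url = URL(string: "https://api.iapp.co.th/v3/store/audio/tts")!
var request = URLRequest(url: url, timeoutInterval: 60)
request.httpMethod = "POST"
request.addValue("YOUR_API_KEY", forHTTPHeaderField: "apikey")
request.addValue("application/json", forHTTPHeaderField: "Content-Type")
let body: [String: Any] = ["text": "สวัสดีครับ น้องไข่ต้ม เวอร์ชั่น 3", "speed": 1.0]
request.httpBody = try? JSONSerialization.data(withJSONObject: body)
let done = DispatchSemaphore(value: 0)
let task = URLSession.shared.dataTask(with: request) { data, response, error in
    defer { done.signal() }
    guard let pcm = data, (response as? HTTPURLResponse)?.statusCode == 200 else {
        print("Request failed: \(String(describing: error))")
        return
    }
    try? wavData(fromPCM: pcm).write(to: URL(fileURLWithPath: "output.wav"))
    print("Audio saved to output.wav")
}
task.resume()
done.wait() // keep a command-line script alive until the request finishes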
Kotlin
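import okhttp3.MediaType.Companion.toMediaType
import okhttp3.OkHttpClient
import okhttp3.Request
import okhttp3.RequestBody.Companion.toRequestBody
import java.io.File
fun main() {
    val client = OkHttpClient()
    val mediaType = "application/json".toMediaType()
    val body = """{"text": "สวัสดีครับ น้องไข่ต้ม เวอร์ชั่น 3"}""".toRequestBody(mediaType)
    val request = Request.Builder()
        .url("https://api.iapp.co.th/v3/store/audio/tts")
        .post(body)
        .addHeader("apikey", "YOUR_API_KEY")
        .addHeader("Content-Type", "application/json")
        .build()
    client.newCall(request).execute().use { response ->
        check(response.isSuccessful) { "TTS failed: HTTP ${response.code}" }
        // Body is raw 16-bit mono PCM at 24 kHz; convert with ffmpeg as shown in the Response section
        File("output.pcm").writeBytes(response.body!!.bytes())
    }
}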
Java
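import okhttp3.*;
import java.nio.file.Files;
import java.nio.file.Paths;
public class ThaiTts {
    public static void main(String[] args) throws Exception {
        OkHttpClient client = new OkHttpClient().newBuilder().build();
        MediaType mediaType = MediaType.parse("application/json");
        RequestBody body = RequestBody.create(mediaType,
            "{\"text\": \"สวัสดีครับ น้องไข่ต้ม เวอร์ชั่น 3\"}");
        Request request = new Request.Builder()
            .url("https://api.iapp.co.th/v3/store/audio/tts")
            .method("POST", body)
            .addHeader("apikey", "YOUR_API_KEY")
            .addHeader("Content-Type", "application/json")
            .build();
        try (Response response = client.newCall(request).execute()) {
            if (!response.isSuccessful()) throw new RuntimeException("TTS failed: HTTP " + response.code());
            // Body is raw 16-bit mono PCM at 24 kHz; convert with ffmpeg as shown in the Response section
            Files.write(Paths.get("output.pcm"), response.body().bytes());
        }
    }
}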
Dart
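import 'dart:convert';
import 'dart:io';
import 'dart:typed_data';
import 'package:http/http.dart' as http;
// Prepend a 44-byte WAV header to raw 16-bit mono PCM (the API returns 24 kHz)
Uint8List pcmToWav(Uint8List pcm, {int sampleRate = 24000}) {
  final h = ByteData(44);
  void tag(int offset, String s) =>
      s.codeUnits.asMap().forEach((i, c) => h.setUint8(offset + i, c));
  tag(0, 'RIFF'); h.setUint32(4, 36 + pcm.length, Endian.little);
  tag(8, 'WAVE'); tag(12, 'fmt ');
  h.setUint32(16, 16, Endian.little); h.setUint16(20, 1, Endian.little);
  h.setUint16(22, 1, Endian.little); h.setUint32(24, sampleRate, Endian.little);
  h.setUint32(28, sampleRate * 2, Endian.little); h.setUint16(32, 2, Endian.little);
  h.setUint16(34, 16, Endian.little); tag(36, 'data');
  h.setUint32(40, pcm.length, Endian.little);
  return Uint8List.fromList([...h.buffer.asUint8List(), ...pcm]);
}
void main() async {
  var url = Uri.parse('https://api.iapp.co.th/v3/store/audio/tts');
  var headers = {
    'apikey': 'YOUR_API_KEY',
    'Content-Type': 'application/json'
  };
  var body = jsonEncode({
    'text': 'สวัสดีครับ น้องไข่ต้ม เวอร์ชั่น 3',
    'speed': 1.0
  });
  var response = await http.post(url, headers: headers, body: body);
  if (response.statusCode == 200) {
    File('output.wav').writeAsBytesSync(pcmToWav(response.bodyBytes));
    print('Audio saved to output.wav');
  } else {
    print('Error: ${response.statusCode}');
  }
}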
Go
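package main
import (
	"bytes"
	"encoding/json"
	"io"
	"log"
	"net/http"
	"os"
)
func main() {
	url := "https://api.iapp.co.th/v3/store/audio/tts"
	jsonData, err := json.Marshal(map[string]any{
		"text":  "สวัสดีครับ น้องไข่ต้ม เวอร์ชั่น 3",
		"speed": 1.0,
	})
	if err != nil {
		log.Fatal(err)
	}
	req, err := http.NewRequest("POST", url, bytes.NewBuffer(jsonData))
	if err != nil {
		log.Fatal(err)
	}
	req.Header.Set("apikey", "YOUR_API_KEY")
	req.Header.Set("Content-Type", "application/json")
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		log.Fatalf("TTS failed: %s", resp.Status)
	}
	// The body is raw 16-bit mono PCM at 24 kHz; convert it with ffmpeg to get a WAV file
	audioData, err := io.ReadAll(resp.Body)
	if err != nil {
		log.Fatal(err)
	}
	if err := os.WriteFile("output.pcm", audioData, 0644); err != nil {
		log.Fatal(err)
	}
}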