
🗣️ Thai Text-to-Speech V3 (Kaitom Voice)

⚠️ ALPHA VERSION

⚠️ Alpha Version Notice: This API is currently in alpha testing. The service may experience intermittent availability. For production use, please use TTS V2 (Stable) instead. V3 is FREE to use until 31 May 2026 during the alpha testing period.

Version v3.0 (Alpha) | Speech | FREE: no credits charged until 31 May 2026

Welcome to Thai Text-to-Speech API V3 featuring the all-new Kaitom Voice (น้องไข่ต้ม เวอร์ชั่น 3). This next-generation version delivers significantly improved speech naturalness with advanced text normalization, voice cloning support, and automatic Thai-English language handling.


What's New in V3

  • Smart Text Normalization - Automatically handles numbers, dates, currency, and special characters
  • Automatic Language Detection - No need to specify language mode, V3 handles Thai-English mixing automatically
  • Extended Character Limit - Support for up to 10,000 characters per request
  • Simplified JSON API - Clean JSON request body instead of form data
  • High-Quality Audio - 24 kHz, 16-bit audio output (raw PCM that wraps directly into WAV) for professional applications

Try Demo — Default Voice (Kaitom)


V3 automatically handles Thai-English mixed text. No language mode selection needed.

Speed slider: 0.5x to 1.5x. Natural range: 0.8 – 1.2. Default 1.0.

Try Demo — Voice Cloning (Thai)

🎁 FREE until 31 May 2026 ⚠️ ALPHA

Upload an 8–12 second clean Thai voice clip with its literal transcript, and the model will speak your Thai text in that voice. Voice cloning is currently Thai-only.


⚠️ Voice Cloning Demo (Thai-only, Alpha): Clone any Thai voice from a short reference clip. FREE to use until 31 May 2026.

📋 How to use this demo

  1. Record yourself (max 10 seconds) speaking any short Thai sentence, or upload an existing Thai audio clip.
  2. Type the exact Thai words you spoke into the "Reference Transcript" box (word-for-word, not a description).
  3. Type the new Thai text you want the cloned voice to say.
  4. Click Generate Cloned Voice.

💡 Speak a natural Thai sentence such as: "สวัสดีครับ ผมชื่อไข่ต้ม วันนี้อากาศดีมาก". Recording will stop automatically at 10 seconds.


⚠️ The reference transcript must match your recording word-for-word. Do not write a description like "เสียงผู้ชายพูดทักทาย"; write the actual sentence you spoke. The clone quality depends on the transcript matching the audio exactly.


Getting Started

  1. Prerequisites

    • An API key from iApp Technology
    • Text input in Thai and/or English
    • Maximum text length: 10,000 characters
    • Output format: raw 16-bit PCM, mono, 24 kHz (wrap in a WAV header to play or save as .wav)
  2. Quick Start

    • Fast processing with high-quality output
    • Improved natural speech generation
    • Automatic Thai-English mixed text support
    • No language mode selection required
  3. Key Features

    • Next-generation speech synthesis engine
    • Smart text normalization (numbers, dates, currency)
    • Automatic Thai-English language handling
    • Emoji and special character support
    • Extended 10,000 character limit
  4. Security & Compliance

    • GDPR and PDPA compliant
    • No data retention after processing
How to get API Key?

Please visit the API Key Management page to view your existing API key or request a new one.

Previous Versions
  • V2: Looking for the POST-based API with language mode selection? See Text-to-Speech V2
  • V1: Looking for the legacy GET-based API with Kaitom V1 or Cee voice? See Text-to-Speech V1

API Endpoints

| Endpoint | Method | Content-Type | Description | Cost (Alpha) |
| --- | --- | --- | --- | --- |
| /v3/store/audio/tts | POST | application/json | Default voice (Kaitom): Thai/English mixed text | FREE until 31 May 2026 |
| /v3/store/audio/tts/clone | POST | multipart/form-data | Voice cloning: synthesize Thai text in a custom voice | FREE until 31 May 2026 |

Quick Example

Default Voice — Sample Request

curl -X POST 'https://api.iapp.co.th/v3/store/audio/tts' \
--header 'apikey: YOUR_API_KEY' \
--header 'Content-Type: application/json' \
--data '{"text": "สวัสดีครับ น้องไข่ต้ม เวอร์ชั่น 3", "speed": 1.0}' \
--output 'output.pcm'

Voice Clone — Sample Request

curl -X POST 'https://api.iapp.co.th/v3/store/audio/tts/clone' \
--header 'apikey: YOUR_API_KEY' \
--form 'text=สวัสดีครับ วันนี้ทดสอบการโคลนเสียง' \
--form 'speed=1.0' \
--form 'ref_text=ฮัลโหล สวัสดีครับ ผมชื่อไข่ต้ม' \
--form 'ref_audio=@reference.wav' \
--output 'output.pcm'

Sample Response

The response body is raw signed 16-bit little-endian PCM, mono, 24 kHz, streamed as application/octet-stream. Wrap it in a WAV header to play or save as .wav:

ffmpeg -f s16le -ar 24000 -ac 1 -i output.pcm output.wav

API Reference

1. Default Voice Endpoint (Kaitom)

  • Endpoint: POST https://api.iapp.co.th/v3/store/audio/tts
  • Content-Type: application/json
  • Headers:
    • apikey: Your API key (required)

Request Body

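| Field | Type | Required | Default | Notes |
| --- | --- | --- | --- | --- |
| text | string | yes | | Recommended up to ~1,000 Thai characters per call. Longer text (up to the 10,000-character limit) is auto-chunked server-side. |
| speed | float | no | 1.0 | Natural range 0.8–1.2. Lower = slower. |

{
  "text": "สวัสดีครับ ยินดีต้อนรับสู่ iApp",
  "speed": 1.0
}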
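
A minimal sketch of validating the request body on the client before sending it. The helper name `build_tts_payload` and the choice to reject empty text are assumptions for illustration; the 10,000-character limit and the default speed of 1.0 come from this page. How the server counts characters is not stated here, so this sketch simply counts Python code points.

```python
MAX_CHARS = 10_000  # per-request limit stated on this page


def build_tts_payload(text: str, speed: float = 1.0) -> dict:
    """Return the JSON body for POST /v3/store/audio/tts.

    Hypothetical helper: these checks run client-side only and may not
    match the server's own validation exactly.
    """
    if not text.strip():
        raise ValueError("text must not be empty")
    if len(text) > MAX_CHARS:
        raise ValueError(f"text has {len(text)} characters; limit is {MAX_CHARS}")
    return {"text": text, "speed": float(speed)}
```

The returned dict can be passed as the `json=` argument of `requests.post`, as in the Python example under Code Examples.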

2. Voice Cloning Endpoint (Thai-only)

  • Endpoint: POST https://api.iapp.co.th/v3/store/audio/tts/clone
  • Content-Type: multipart/form-data
  • Headers:
    • apikey: Your API key (required)

Form Fields

| Field | Type | Required | Default | Notes |
| --- | --- | --- | --- | --- |
| text | string | yes | | Thai text to synthesize |
| speed | float | no | 1.0 | Speech rate |
| ref_text | string | yes | | Literal Thai transcript of ref_audio (not a description) |
| ref_audio | file | yes | | WAV or MP3, 8–12 s of clean mono Thai speech |

Constraints:

  1. Reference clip must be ≤ 15 seconds. Longer clips are silently trimmed; if ref_text still includes words from the trimmed-off portion, the output speeds up and distorts.
  2. ref_text must accurately match what is spoken in ref_audio, word-for-word.
  3. Voice clone requests are processed serially per server. Expect queued latency under concurrent load.
  4. Voice cloning currently supports Thai language only.
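
Because over-long clips are trimmed silently, it can help to check the reference clip's length before uploading. The following is a sketch using only Python's standard `wave` module, so it handles PCM WAV files but not MP3. The function names and the returned status strings are assumptions for illustration; the 15-second limit and the 8–12 second recommendation come from this page.

```python
import wave

MAX_REF_SECONDS = 15.0              # hard limit stated above
RECOMMENDED_SECONDS = (8.0, 12.0)   # recommended range stated above


def wav_duration_seconds(path: str) -> float:
    """Read the duration of a PCM WAV file from its header."""
    with wave.open(path, "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def check_reference_clip(path: str) -> str:
    """Return "ok" or "outside recommended range"; raise if the clip is too long."""
    duration = wav_duration_seconds(path)
    if duration > MAX_REF_SECONDS:
        raise ValueError(
            f"clip is {duration:.1f} s; the server would trim it to {MAX_REF_SECONDS:.0f} s"
        )
    low, high = RECOMMENDED_SECONDS
    if not low <= duration <= high:
        return "outside recommended range"
    return "ok"
```

A clip that raises here should be shortened, with ref_text shortened to match, before it is sent to the clone endpoint.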

Response (both endpoints)

  • Content-Type: application/octet-stream
  • Body: raw signed 16-bit little-endian PCM, mono, 24 kHz — streamed as bytes arrive
  • Compute duration: duration_seconds = byte_length / 48000
  • To save as a playable file, wrap with a WAV header (see Browser example below) or use ffmpeg -f s16le -ar 24000 -ac 1 -i out.pcm out.wav
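
The duration formula and the WAV wrapping can also be done with Python's standard library alone. This is a minimal sketch: the function names are illustrative, and the constants restate the format described above (24 kHz, 16-bit, mono), which is where the divisor 48000 comes from (24000 samples per second × 2 bytes per sample).

```python
import io
import wave

SAMPLE_RATE = 24_000  # Hz
SAMPLE_WIDTH = 2      # bytes per sample (16-bit)
CHANNELS = 1          # mono


def pcm_duration_seconds(pcm: bytes) -> float:
    """Duration of a raw PCM buffer in the format described above."""
    return len(pcm) / (SAMPLE_RATE * SAMPLE_WIDTH * CHANNELS)


def pcm_to_wav_bytes(pcm: bytes) -> bytes:
    """Prepend a WAV header to raw PCM and return the complete file contents."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(CHANNELS)
        wf.setsampwidth(SAMPLE_WIDTH)
        wf.setframerate(SAMPLE_RATE)
        wf.writeframes(pcm)
    return buf.getvalue()
```

For example, `open("output.wav", "wb").write(pcm_to_wav_bytes(response.content))` would save a playable file from a completed `requests` response.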

Code Examples

Python

import wave

import requests

url = "https://api.iapp.co.th/v3/store/audio/tts"
headers = {
    "apikey": "YOUR_API_KEY",
    "Content-Type": "application/json",
}
data = {
    "text": "สวัสดีครับ น้องไข่ต้ม เวอร์ชั่น 3 มาพร้อมเสียงที่เป็นธรรมชาติมากขึ้น"
}

response = requests.post(url, headers=headers, json=data, timeout=60)
response.raise_for_status()

# The body is raw 16-bit mono PCM at 24 kHz, so add a WAV header before saving.
with wave.open("output.wav", "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(2)
    wf.setframerate(24000)
    wf.writeframes(response.content)
print("Audio saved to output.wav")

JavaScript (Node.js)

const axios = require("axios")
const fs = require("fs")

const url = "https://api.iapp.co.th/v3/store/audio/tts"
const config = {
  headers: {
    "apikey": "YOUR_API_KEY",
    "Content-Type": "application/json"
  },
  responseType: "arraybuffer"
}
const data = {
  text: "สวัสดีครับ น้องไข่ต้ม เวอร์ชั่น 3 มาพร้อมเสียงที่เป็นธรรมชาติมากขึ้น"
}

axios.post(url, data, config)
  .then((response) => {
    // The body is raw 16-bit mono PCM at 24 kHz, not a WAV file.
    // Convert with: ffmpeg -f s16le -ar 24000 -ac 1 -i output.pcm output.wav
    fs.writeFileSync("output.pcm", response.data)
    console.log("Audio saved to output.pcm")
  })
  .catch((error) => console.error(error))

JavaScript (Fetch API)

The endpoint streams raw PCM. Wrap it in a WAV header before playback:

async function playThaiTTS(text) {
  const resp = await fetch("https://api.iapp.co.th/v3/store/audio/tts", {
    method: "POST",
    headers: {
      "apikey": "YOUR_API_KEY",
      "Content-Type": "application/json"
    },
    body: JSON.stringify({ text, speed: 1.0 })
  });
  if (!resp.ok) throw new Error(`TTS failed: ${resp.status}`);
  const pcm = new Uint8Array(await resp.arrayBuffer());
  const wav = pcmToWav(pcm, 24000, 1);
  const url = URL.createObjectURL(new Blob([wav], { type: "audio/wav" }));
  new Audio(url).play();
}

// Build a 44-byte RIFF/WAVE header for 16-bit PCM and prepend it to the samples.
function pcmToWav(pcm, sampleRate, channels) {
  const byteRate = sampleRate * channels * 2;
  const buf = new ArrayBuffer(44 + pcm.byteLength);
  const v = new DataView(buf);
  const write = (o, s) => [...s].forEach((c, i) => v.setUint8(o + i, c.charCodeAt(0)));
  write(0, "RIFF"); v.setUint32(4, 36 + pcm.byteLength, true);
  write(8, "WAVE"); write(12, "fmt ");
  v.setUint32(16, 16, true); v.setUint16(20, 1, true);
  v.setUint16(22, channels, true); v.setUint32(24, sampleRate, true);
  v.setUint32(28, byteRate, true); v.setUint16(32, channels * 2, true);
  v.setUint16(34, 16, true); write(36, "data");
  v.setUint32(40, pcm.byteLength, true);
  new Uint8Array(buf, 44).set(pcm);
  return buf;
}

Python — Voice Cloning

import requests, wave

# Keep the reference file open while the multipart request is sent.
with open("reference.wav", "rb") as f:
    files = {"ref_audio": ("reference.wav", f, "audio/wav")}
    data = {
        "text": "สวัสดีครับ วันนี้ทดสอบการโคลนเสียง",
        "speed": "1.0",
        "ref_text": "ฮัลโหล สวัสดีครับ ผมชื่อไข่ต้ม",
    }
    r = requests.post(
        "https://api.iapp.co.th/v3/store/audio/tts/clone",
        headers={"apikey": "YOUR_API_KEY"},
        data=data, files=files, stream=True, timeout=60,
    )
r.raise_for_status()

# Stream the raw PCM into a WAV file as chunks arrive.
with wave.open("cloned.wav", "wb") as wf:
    wf.setnchannels(1); wf.setsampwidth(2); wf.setframerate(24000)
    for chunk in r.iter_content(chunk_size=None):
        if chunk:
            wf.writeframes(chunk)
print("Saved cloned.wav")

PHP

<?php
$curl = curl_init();

$data = json_encode([
    "text" => "สวัสดีครับ น้องไข่ต้ม เวอร์ชั่น 3 มาพร้อมเสียงที่เป็นธรรมชาติมากขึ้น"
]);

curl_setopt_array($curl, array(
    CURLOPT_URL => 'https://api.iapp.co.th/v3/store/audio/tts',
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_ENCODING => '',
    CURLOPT_MAXREDIRS => 10,
    CURLOPT_TIMEOUT => 0,
    CURLOPT_FOLLOWLOCATION => true,
    CURLOPT_HTTP_VERSION => CURL_HTTP_VERSION_1_1,
    CURLOPT_CUSTOMREQUEST => 'POST',
    CURLOPT_POSTFIELDS => $data,
    CURLOPT_HTTPHEADER => array(
        'apikey: YOUR_API_KEY',
        'Content-Type: application/json'
    ),
));

$response = curl_exec($curl);
if ($response === false) {
    die("Request failed: " . curl_error($curl));
}
curl_close($curl);

// The body is raw 16-bit mono PCM at 24 kHz, not a WAV file.
// Convert with: ffmpeg -f s16le -ar 24000 -ac 1 -i output.pcm output.wav
file_put_contents("output.pcm", $response);
echo "Audio saved to output.pcm";
?>

Swift

import Foundation

let url = URL(string: "https://api.iapp.co.th/v3/store/audio/tts")!
var request = URLRequest(url: url, timeoutInterval: Double.infinity)
request.addValue("YOUR_API_KEY", forHTTPHeaderField: "apikey")
request.addValue("application/json", forHTTPHeaderField: "Content-Type")
request.httpMethod = "POST"

let body: [String: Any] = [
    "text": "สวัสดีครับ น้องไข่ต้ม เวอร์ชั่น 3"
]
request.httpBody = try? JSONSerialization.data(withJSONObject: body)

let task = URLSession.shared.dataTask(with: request) { data, response, error in
    guard let data = data else {
        print(String(describing: error))
        return
    }
    // The body is raw 16-bit mono PCM at 24 kHz, not a WAV file.
    // Save it as-is, then add a WAV header or convert with ffmpeg before playback.
    try? data.write(to: URL(fileURLWithPath: "output.pcm"))
    print("Audio saved to output.pcm")
}
task.resume()

Kotlin

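import okhttp3.MediaType.Companion.toMediaType
import okhttp3.OkHttpClient
import okhttp3.Request
import okhttp3.RequestBody.Companion.toRequestBody
import java.io.File

fun main() {
    val client = OkHttpClient()
    val mediaType = "application/json".toMediaType()
    val body = """{"text": "สวัสดีครับ น้องไข่ต้ม เวอร์ชั่น 3"}""".toRequestBody(mediaType)

    val request = Request.Builder()
        .url("https://api.iapp.co.th/v3/store/audio/tts")
        .post(body)
        .addHeader("apikey", "YOUR_API_KEY")
        .addHeader("Content-Type", "application/json")
        .build()

    client.newCall(request).execute().use { response ->
        // The body is raw 16-bit mono PCM at 24 kHz, not a WAV file.
        File("output.pcm").writeBytes(response.body!!.bytes())
    }
}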

Java

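import java.nio.file.Files;
import java.nio.file.Paths;
import okhttp3.MediaType;
import okhttp3.OkHttpClient;
import okhttp3.Request;
import okhttp3.RequestBody;
import okhttp3.Response;

public class ThaiTtsExample {
    public static void main(String[] args) throws Exception {
        OkHttpClient client = new OkHttpClient().newBuilder().build();
        MediaType mediaType = MediaType.parse("application/json");
        RequestBody body = RequestBody.create(mediaType,
                "{\"text\": \"สวัสดีครับ น้องไข่ต้ม เวอร์ชั่น 3\"}");
        Request request = new Request.Builder()
                .url("https://api.iapp.co.th/v3/store/audio/tts")
                .method("POST", body)
                .addHeader("apikey", "YOUR_API_KEY")
                .addHeader("Content-Type", "application/json")
                .build();
        try (Response response = client.newCall(request).execute()) {
            // The body is raw 16-bit mono PCM at 24 kHz, not a WAV file.
            Files.write(Paths.get("output.pcm"), response.body().bytes());
        }
    }
}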

Dart

import 'dart:convert';
import 'dart:io';
import 'package:http/http.dart' as http;

void main() async {
  var url = Uri.parse('https://api.iapp.co.th/v3/store/audio/tts');
  var headers = {
    'apikey': 'YOUR_API_KEY',
    'Content-Type': 'application/json'
  };
  var body = jsonEncode({
    'text': 'สวัสดีครับ น้องไข่ต้ม เวอร์ชั่น 3'
  });

  var response = await http.post(url, headers: headers, body: body);

  if (response.statusCode == 200) {
    // The body is raw 16-bit mono PCM at 24 kHz, not a WAV file.
    // Convert with: ffmpeg -f s16le -ar 24000 -ac 1 -i output.pcm output.wav
    File('output.pcm').writeAsBytesSync(response.bodyBytes);
    print('Audio saved to output.pcm');
  } else {
    print('Error: ${response.statusCode}');
  }
}

Go

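package main

import (
	"bytes"
	"encoding/json"
	"io"
	"log"
	"net/http"
	"os"
)

func main() {
	url := "https://api.iapp.co.th/v3/store/audio/tts"

	data := map[string]string{
		"text": "สวัสดีครับ น้องไข่ต้ม เวอร์ชั่น 3",
	}
	jsonData, err := json.Marshal(data)
	if err != nil {
		log.Fatal(err)
	}

	req, err := http.NewRequest("POST", url, bytes.NewBuffer(jsonData))
	if err != nil {
		log.Fatal(err)
	}
	req.Header.Set("apikey", "YOUR_API_KEY")
	req.Header.Set("Content-Type", "application/json")

	client := &http.Client{}
	resp, err := client.Do(req)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		log.Fatalf("TTS failed: %s", resp.Status)
	}

	// The body is raw 16-bit mono PCM at 24 kHz, not a WAV file.
	// Convert with: ffmpeg -f s16le -ar 24000 -ac 1 -i output.pcm output.wav
	audioData, err := io.ReadAll(resp.Body)
	if err != nil {
		log.Fatal(err)
	}
	if err := os.WriteFile("output.pcm", audioData, 0644); err != nil {
		log.Fatal(err)
	}
}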

Features & Capabilities

Smart Text Normalization

V3 automatically normalizes various text elements:

| Type | Input | Output (spoken) |
| --- | --- | --- |
| Numbers | 1,234.56 | "หนึ่งพันสองร้อยสามสิบสี่จุดห้าหก" |
| Dates | 27/01/2569 | "วันที่ยี่สิบเจ็ดมกราคมสองพันห้าร้อยหกสิบเก้า" |
| Currency | ฿1,500 | "หนึ่งพันห้าร้อยบาท" |
| Time | 14:30 | "สิบสี่นาฬิกาสามสิบนาที" |
| Percentages | 25% | "ยี่สิบห้าเปอร์เซ็นต์" |

Automatic Language Handling

V3 automatically detects and handles mixed Thai-English text without requiring language mode selection:

Hello and Welcome! ยินดีต้อนรับสู่ iApp Technology

Error Codes

| Status Code | Description | Solution |
| --- | --- | --- |
| 400 | Bad Request - Invalid JSON format | Check JSON syntax |
| 402 | Insufficient Credits | Add credits |
| 413 | Payload Too Large | Text exceeds 10,000 characters |
| 429 | Rate Limit Exceeded | Check API key limits |
| 503 | Service Unavailable | Temporary issue, retry later |
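
Of the codes above, 429 and 503 are the ones where simply trying again can succeed. The following is a sketch of a retry loop with exponential backoff. The helper name, the attempt count, and the delays are assumptions for illustration, not documented service behaviour. The HTTP call is passed in as a function so the loop itself stays independent of any HTTP library.

```python
import time
from typing import Callable, Tuple

RETRYABLE = {429, 503}  # rate limited or temporarily unavailable, per the table above


def call_with_retry(
    send: Callable[[], Tuple[int, bytes]],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    """Call send() until it returns HTTP 200, retrying only on 429 and 503.

    send must return (status_code, body_bytes). Other errors are raised
    immediately, because resending the same request would not fix them.
    """
    for attempt in range(max_attempts):
        status, body = send()
        if status == 200:
            return body
        if status not in RETRYABLE or attempt == max_attempts - 1:
            raise RuntimeError(f"TTS request failed with HTTP {status}")
        sleep(base_delay * 2 ** attempt)  # waits 1 s, 2 s, 4 s, ... by default
    raise RuntimeError("max_attempts must be at least 1")


# Example wiring with the requests library (not executed here):
#   def send():
#       r = requests.post(URL, headers=HEADERS, json={"text": "..."}, timeout=60)
#       return r.status_code, r.content
#   pcm = call_with_retry(send)
```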

Migration Guide

From V2 to V3

| Aspect | V2 | V3 |
| --- | --- | --- |
| Content-Type | multipart/form-data | application/json |
| Request Body | Form data | JSON body |
| Language Mode | Required (TH / TH_MIX_EN) | Auto-detected |
| Max Characters | No limit | 10,000 |
| Audio Quality | Standard WAV | 24 kHz, 16-bit PCM (wrap in WAV client-side) |
| Endpoint | /v3/store/speech/text-to-speech/kaitom | /v3/store/audio/tts |

V2 Request (Old):

curl -X POST 'https://api.iapp.co.th/v3/store/speech/text-to-speech/kaitom' \
--header 'apikey: YOUR_API_KEY' \
--form 'text="สวัสดีครับ"' \
--form 'language="TH"'

V3 Request (New):

curl -X POST 'https://api.iapp.co.th/v3/store/audio/tts' \
--header 'apikey: YOUR_API_KEY' \
--header 'Content-Type: application/json' \
--data '{"text": "สวัสดีครับ"}'

From V1 to V3

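| Aspect | V1 | V3 |
| --- | --- | --- |
| Method | GET | POST |
| Request | Query parameters | JSON body |
| Voice Options | Kaitom V1, Cee | Kaitom V3 |
| Output Format | MP3/WAV | Raw PCM, 24 kHz (wrap in WAV client-side) |
| Endpoint | /v3/store/speech/text-to-speech/kaitom/v1 | /v3/store/audio/tts |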

Limitations

  • Thai and English language support (default voice); Thai-only for voice cloning
  • Default voice: Kaitom V3 (male). Voice cloning lets you use any reference voice.
  • Recommended single-request length: ~1,000 Thai characters (longer text is auto-chunked, adding latency)
  • Output: raw 16-bit PCM mono @ 24 kHz (wrap into WAV client-side)
  • Voice cloning reference audio: ≤ 15 seconds, recommended 8–12 s mono clean speech
  • Rate limit: 100 requests/sec per API key
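
Since text beyond roughly 1,000 characters is chunked server-side anyway, one option is to split long text on the client and send one request per chunk, so that the first chunk can start playing sooner. A minimal sketch follows; the function name and the whitespace-based splitting rule are assumptions, not the server's chunking algorithm.

```python
def split_text(text: str, max_chars: int = 1000) -> list[str]:
    """Split text into chunks of at most max_chars, preferring whitespace breaks.

    Thai usually puts spaces between phrases or sentences rather than
    between words, so a space is a reasonably safe place to cut.
    """
    chunks = []
    remaining = text.strip()
    while len(remaining) > max_chars:
        # Find the last space or newline at or before the size limit.
        cut = max(
            remaining.rfind(" ", 0, max_chars + 1),
            remaining.rfind("\n", 0, max_chars + 1),
        )
        if cut <= 0:
            # No whitespace in range: fall back to a hard cut, which may
            # separate a Thai vowel or tone mark from its base consonant.
            cut = max_chars
        chunks.append(remaining[:cut].strip())
        remaining = remaining[cut:].strip()
    if remaining:
        chunks.append(remaining)
    return chunks
```

Each chunk's PCM output can be appended to the previous one, because every response uses the same sample format.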

Best Practices

  1. Use proper punctuation for natural pauses and intonation
  2. Keep sentences conversational for the most natural output
  3. Test with small text segments before processing large texts
  4. Pre-transliterate brand names or acronyms to Thai script if you need a specific pronunciation (e.g. send ไอแอป instead of iApp)
  5. Monitor character count to stay within the 10,000 character limit
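
Practice 4 can be automated with a small substitution table applied before each request. This is a sketch: the helper name and the table are hypothetical, and only the iApp entry comes from this page.

```python
import re

# Hypothetical pronunciation table; extend it with your own brand names.
PRONUNCIATIONS = {"iApp": "ไอแอป"}


def apply_pronunciations(text: str, mapping: dict[str, str] = PRONUNCIATIONS) -> str:
    """Replace each key in mapping with its Thai-script spelling (case-sensitive)."""
    if not mapping:
        return text
    # Try longer keys first so a long name wins over a shorter one it contains.
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    return pattern.sub(lambda m: mapping[m.group(0)], text)
```

For example, `apply_pronunciations("ยินดีต้อนรับสู่ iApp Technology")` leaves "Technology" untouched and rewrites only the brand name.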

Pricing

| AI API Service Name | Endpoint | Price (Alpha) | On-Premise |
| --- | --- | --- | --- |
| Thai TTS V3 (Kaitom Voice) | /v3/store/audio/tts | FREE until 31 May 2026 | Contact |
| Thai Voice Cloning V3 | /v3/store/audio/tts/clone | FREE until 31 May 2026 | Contact |

See Also