Thai Voice Cloning is Here — Clone Any Thai Voice in 10 Seconds

We're excited to announce Thai Voice Cloning, the newest capability inside our Kaitom Voice V3 TTS API. Hand the model a clean 8–12 second Thai voice clip plus its literal transcript, and the API will speak any Thai text you want — in that exact voice.
This is the first production-grade Thai voice cloning API built specifically for the Thai language, and it is FREE during the alpha period (until 31 May 2026).
Why Thai Voice Cloning Matters
General-purpose voice cloning tools have been around for a while in English, but Thai has always been a stepchild — wrong tones, mispronounced final consonants, robotic prosody, and complete inability to handle Thai-English code mixing.
Kaitom Voice V3's cloning endpoint was trained on Thai natively. That means:
- Correct Thai tones (ไม้เอก, โท, ตรี, จัตวา) preserved per the speaker.
- Final consonants pronounced naturally instead of dropped.
- Numbers, dates, and currency amounts read out following Thai conventions automatically.
- Mixed Thai–English content handled in a single utterance.
You get a voice that actually sounds like the person you cloned — not a robot wearing their hat.
How It Works

The reference clip captures the speaker's timbre and prosody; the V3 engine applies that to whatever Thai text you provide.
Try It in 60 Seconds
The fastest path is the in-browser demo: record yourself, type Thai text, and hear it back in your own voice.
API Quick Start
cURL
curl -X POST 'https://api.iapp.co.th/v3/store/audio/tts/clone' \
--header 'apikey: YOUR_API_KEY' \
--form 'text=สวัสดีครับ วันนี้ทดสอบการโคลนเสียงด้วย AI' \
--form 'speed=1.0' \
--form 'ref_text=ฮัลโหล สวัสดีครับ ผมชื่อไข่ต้ม' \
--form 'ref_audio=@reference.wav' \
--output 'output.pcm'
Python
import requests

url = "https://api.iapp.co.th/v3/store/audio/tts/clone"
headers = {"apikey": "YOUR_API_KEY"}

# Keep the reference clip open while the multipart request is sent.
with open("reference.wav", "rb") as ref:
    files = {"ref_audio": ref}
    data = {
        "text": "สวัสดีครับ วันนี้ทดสอบการโคลนเสียงด้วย AI",
        "ref_text": "ฮัลโหล สวัสดีครับ ผมชื่อไข่ต้ม",
        "speed": "1.0",
    }
    r = requests.post(url, headers=headers, data=data, files=files)

# Fail on an HTTP error instead of saving an error body as audio.
r.raise_for_status()

# The body is raw PCM, so write it out unchanged.
with open("output.pcm", "wb") as f:
    f.write(r.content)
The response is raw signed 16-bit little-endian PCM, mono, 24 kHz, streamed as bytes arrive. Wrap it in a 44-byte WAV header on the client, or convert with ffmpeg:
ffmpeg -f s16le -ar 24000 -ac 1 -i output.pcm output.wav
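If you would rather not shell out to ffmpeg, the WAV header can be added in Python with the standard-library wave module. The sketch below assumes the response format described above (signed 16-bit, mono, 24 kHz); the helper name pcm_to_wav_bytes is ours, not part of the API.

```python
import io
import wave


def pcm_to_wav_bytes(pcm: bytes, sample_rate: int = 24000) -> bytes:
    """Wrap raw signed 16-bit little-endian mono PCM in a WAV container."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)            # mono
        wav.setsampwidth(2)            # 16-bit samples are 2 bytes each
        wav.setframerate(sample_rate)  # 24 kHz by default
        wav.writeframes(pcm)           # header sizes are patched on close
    return buf.getvalue()
```

Passing r.content from the Python quick start to pcm_to_wav_bytes returns bytes you can write straight to output.wav.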
Request Fields
| Field | Type | Required | Notes |
|---|---|---|---|
| text | string | yes | Thai text to synthesize |
| ref_text | string | yes | Literal transcript of ref_audio (word-for-word, not a description) |
| ref_audio | file | yes | WAV or MP3, 8–12 s of clean mono Thai speech |
| speed | float | no | 0.8–1.2, default 1.0 |
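Checking these fields on the client before sending can save a round trip. The following is a minimal sketch that assumes only the constraints in the table; build_clone_form is a hypothetical helper name, and how the server itself responds to out-of-range values is not documented here.

```python
def build_clone_form(text: str, ref_text: str, speed: float = 1.0) -> dict:
    """Validate the text fields and return them as form data for the clone endpoint."""
    if not text.strip():
        raise ValueError("text is required")
    if not ref_text.strip():
        raise ValueError("ref_text is required")
    if not 0.8 <= speed <= 1.2:
        raise ValueError("speed must be between 0.8 and 1.2")
    # Form fields travel as strings, matching the quick-start examples.
    return {"text": text, "ref_text": ref_text, "speed": str(float(speed))}
```

The returned dictionary can be passed as the data argument of requests.post, with ref_audio supplied separately through files.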
Tips for a Great Clone
The clone quality is bounded by the reference clip. A few minutes of curating pays off:
- Record clean audio. No background music, no traffic noise, no overlapping voices.
- Use a single speaker. Don't include "uh", coughs, or interjections from someone else.
- Match the transcript exactly. ref_text must say what ref_audio says, character-for-character. Mismatches cause prosody drift and speed-up artifacts.
- Stay in the 8–12 s window. Shorter clips lose timbre; clips longer than 15 s are silently trimmed and the trailing portion of ref_text becomes garbage.
- Use one consistent speaking style. If the reference is calm and the target text is shouted, expect ambiguous results.
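Two of the tips above, clip length and mono audio, can be checked mechanically before you upload. Here is a rough sketch using the standard-library wave module; it only reads WAV files (not MP3), and reference_clip_issues is a hypothetical helper name. Noise, extra speakers, and transcript mismatches still need a human ear.

```python
import wave


def reference_clip_issues(wav_file, min_s: float = 8.0, max_s: float = 12.0) -> list:
    """Return a list of problems found in a WAV reference clip (empty if none).

    wav_file may be a path or a binary file-like object.
    """
    issues = []
    with wave.open(wav_file, "rb") as wav:
        channels = wav.getnchannels()
        duration = wav.getnframes() / wav.getframerate()
    if channels != 1:
        issues.append(f"expected mono audio, found {channels} channels")
    if duration < min_s:
        issues.append(f"clip is {duration:.1f} s, shorter than {min_s:.0f} s")
    elif duration > max_s:
        issues.append(f"clip is {duration:.1f} s, longer than {max_s:.0f} s")
    return issues
```

An empty list means the clip passes these two checks; anything else is worth fixing before spending a clone request on it.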
What You Can Build
Personalized Audiobooks
Authors can narrate their own books in Thai, then clone the voice to extend coverage to chapters they didn't have time to record.
Brand Voice at Scale
Record a 10-second sample of your brand voice talent once, then synthesize unlimited Thai marketing videos, IVR prompts, and product demos in that exact voice.
Voice Preservation
Help patients with degenerative speech conditions preserve their voice — record a sample now, retain the ability to "speak" later.
Localization for Thai Markets
Bring international content to Thai audiences with the same speaker the audience already recognizes. No casting calls, no studio days.
Accessibility & E-Learning
Generate audio versions of educational content using a teacher's own voice, complete with correct Thai number, date, and currency reading.
Pricing
Voice cloning is FREE until 31 May 2026 during the alpha. After that, pricing for general availability will be announced — alpha users will get a heads-up first.
| Endpoint | Method | Cost (Alpha) |
|---|---|---|
| /v3/store/audio/tts (default Kaitom voice) | POST | |
| /v3/store/audio/tts/clone (Thai voice cloning) | POST | Free until 31 May 2026 |
Responsible Use
Voice cloning is powerful and easy to misuse. iApp Technology requires that you:
- Have explicit consent from the speaker before cloning their voice.
- Disclose synthetic audio in contexts where authenticity matters (news, public statements, customer service identity).
- Never impersonate real people for fraud, harassment, or misinformation.
Misuse violates our Terms of Service and Thai law (PDPA, Computer Crime Act). We log all clone requests and cooperate with law enforcement on reported abuse.
Limitations During Alpha
- Thai language only — English and Chinese cloning are on the roadmap.
- Clone requests are processed serially per server. Expect queued latency under heavy concurrent load.
- Maximum reference length: 15 s (longer clips are trimmed).
- Output is raw PCM; wrap it in a WAV header on the client before playback.
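Because clone requests queue up on the server, it can help to cap how many you have in flight from the client side. The sketch below is one way to do that with a small thread pool; run_bounded and send_request are hypothetical names, and the default cap of 2 is an arbitrary assumption rather than a documented limit.

```python
from concurrent.futures import ThreadPoolExecutor


def run_bounded(send_request, jobs, max_in_flight: int = 2) -> list:
    """Call send_request(job) for every job, at most max_in_flight at a time.

    Results are returned in the same order as jobs. An exception raised by
    send_request propagates to the caller.
    """
    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        return list(pool.map(send_request, jobs))
```

Here send_request would be your own function that posts one clone request and returns the audio bytes; pair it with a generous timeout, since queued requests may take longer than usual.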
Get Started Today
- Try the interactive demo — record and clone in your browser
- Read the full API docs — complete technical reference
- Get an API key — start building immediately
Feedback
This is alpha — your feedback shapes what GA looks like.
- Discord: discord.gg/kYcpmdEcS2
- Email: sale@iapp.co.th
- Phone: 086-322-5858
Thai Voice Cloning is part of Kaitom Voice V3 and is available now to all iApp API users at no charge during the alpha period.