
Thai Voice Cloning is Here — Clone Any Thai Voice in 10 Seconds

Kobkrit Viriyayudhakorn
CEO @ iApp Technology

Thai Voice Cloning - Kaitom Voice V3

We're excited to announce Thai Voice Cloning, the newest capability inside our Kaitom Voice V3 TTS API. Hand the model a clean 8–12 second Thai voice clip plus its literal transcript, and the API will speak any Thai text you want — in that exact voice.

This is the first production-grade Thai voice cloning API built specifically for the Thai language, and it is FREE during the alpha period (until 31 May 2026).

Why Thai Voice Cloning Matters

General-purpose voice cloning tools have been available for English for a while, but Thai has always been a stepchild: wrong tones, mispronounced final consonants, robotic prosody, and a complete inability to handle Thai-English code mixing.

Kaitom Voice V3's cloning endpoint was trained on Thai natively. That means:

  • Correct Thai tones (ไม้เอก, โท, ตรี, จัตวา) preserved per the speaker.
  • Final consonants pronounced naturally instead of dropped.
  • Numbers, dates, currency read out in Thai conventions automatically.
  • Mixed Thai–English content handled in a single utterance.

You get a voice that actually sounds like the person you cloned — not a robot wearing their hat.

How It Works

[Figure: Thai Voice Cloning Pipeline by iApp Technology]

The reference clip captures the speaker's timbre and prosody; the V3 engine applies that to whatever Thai text you provide.

Try It in 60 Seconds

The fastest path is the in-browser demo — record yourself, type Thai text, hear it back in your own voice:

👉 Open the interactive demo

API Quick Start

cURL

curl -X POST 'https://api.iapp.co.th/v3/store/audio/tts/clone' \
--header 'apikey: YOUR_API_KEY' \
--form 'text=สวัสดีครับ วันนี้ทดสอบการโคลนเสียงด้วย AI' \
--form 'speed=1.0' \
--form 'ref_text=ฮัลโหล สวัสดีครับ ผมชื่อไข่ต้ม' \
--form 'ref_audio=@reference.wav' \
--output 'output.pcm'

Python

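import requests

url = "https://api.iapp.co.th/v3/store/audio/tts/clone"
headers = {"apikey": "YOUR_API_KEY"}

# Keep the reference file open while the request is being sent.
with open("reference.wav", "rb") as ref:
    files = {"ref_audio": ref}
    data = {
        "text": "สวัสดีครับ วันนี้ทดสอบการโคลนเสียงด้วย AI",
        "ref_text": "ฮัลโหล สวัสดีครับ ผมชื่อไข่ต้ม",
        "speed": "1.0",
    }
    r = requests.post(url, headers=headers, data=data, files=files)

# Fail loudly on an HTTP error instead of saving an error body as audio.
r.raise_for_status()

# The body is raw PCM with no header, so write the bytes as-is.
with open("output.pcm", "wb") as f:
    f.write(r.content)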

The response is raw signed 16-bit little-endian PCM, mono, 24 kHz, streamed as the bytes arrive. Wrap it in a 44-byte WAV header on the client, or convert it with ffmpeg:

ffmpeg -f s16le -ar 24000 -ac 1 -i output.pcm output.wav
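If you would rather stay in Python than shell out to ffmpeg, the standard-library wave module can write the header for you. The following is a minimal sketch, not an official client: pcm_to_wav is a helper name we made up for illustration, and the defaults simply mirror the format described above (16-bit, mono, 24 kHz).

```python
import io
import wave


def pcm_to_wav(pcm: bytes, sample_rate: int = 24000,
               channels: int = 1, sample_width: int = 2) -> bytes:
    """Wrap raw signed 16-bit little-endian PCM in a WAV container.

    Hypothetical helper; defaults follow the response format described
    in this post (mono, 24 kHz, 2 bytes per sample).
    """
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)  # 2 bytes = 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)            # header sizes are fixed up on close
    return buf.getvalue()
```

To convert the file saved by the quick start, read output.pcm as bytes, pass it to pcm_to_wav, and write the result to output.wav.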

Request Fields

Field | Type | Required | Notes
text | string | yes | Thai text to synthesize
ref_text | string | yes | Literal transcript of ref_audio (word-for-word, not a description)
ref_audio | file | yes | WAV or MP3, 8–12 s of clean mono Thai speech
speed | float | no | 0.8–1.2, default 1.0
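Catching a bad field before you upload saves a round trip. Here is a rough client-side check based on the table above. It is only a sketch: validate_clone_request and its messages are our own invention, not part of the API, and it assumes the speed range reads as 0.8 to 1.2. The server may apply different or additional rules.

```python
from pathlib import Path

ALLOWED_SUFFIXES = {".wav", ".mp3"}  # formats listed for ref_audio
SPEED_MIN, SPEED_MAX = 0.8, 1.2      # assumed reading of the speed range


def validate_clone_request(text: str, ref_text: str,
                           ref_audio_path: str, speed: float = 1.0) -> list[str]:
    """Return a list of problems; an empty list means the fields look usable.

    Hypothetical pre-flight check; it only looks at the file name,
    not at the audio itself.
    """
    problems = []
    if not text.strip():
        problems.append("text is empty")
    if not ref_text.strip():
        problems.append("ref_text is empty")
    if Path(ref_audio_path).suffix.lower() not in ALLOWED_SUFFIXES:
        problems.append("ref_audio should be a WAV or MP3 file")
    if not SPEED_MIN <= speed <= SPEED_MAX:
        problems.append(f"speed {speed} is outside {SPEED_MIN} to {SPEED_MAX}")
    return problems
```

Call it with the same values you are about to post and skip the request if the returned list is not empty.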

Tips for a Great Clone

The clone quality is bounded by the reference clip. A few minutes of curating pays off:

  1. Record clean audio. No background music, no traffic noise, no overlapping voices.
  2. Use a single speaker. Don't include "uh", coughs, or interjections from someone else.
  3. Match the transcript exactly. ref_text must say what ref_audio says, character-for-character. Mismatches cause prosody drift and speed-up artifacts.
  4. Stay in the 8–12 s window. Shorter clips lose timbre; clips longer than 15 s are silently trimmed, so the trailing portion of ref_text no longer has any matching audio.
  5. Use one consistent speaking style. If the reference is calm and the target text is shouted, expect unpredictable results.
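Some of these tips can be checked mechanically before you upload. The sketch below uses the standard-library wave module to confirm that a WAV reference clip is mono and falls in the 8–12 s window. check_reference_wav is a hypothetical helper of ours, it handles WAV input only (not MP3), and it cannot judge noise, speaker count, or transcript accuracy.

```python
import wave


def check_reference_wav(path_or_file, min_s: float = 8.0,
                        max_s: float = 12.0) -> list[str]:
    """Inspect a WAV reference clip; return warnings (empty list = looks fine).

    Hypothetical helper. Accepts a path string or a binary file object.
    """
    warnings = []
    with wave.open(path_or_file, "rb") as wav:
        duration = wav.getnframes() / wav.getframerate()
        channels = wav.getnchannels()
    if channels != 1:
        warnings.append(f"clip has {channels} channels; mono is recommended")
    if duration < min_s:
        warnings.append(f"clip is {duration:.1f} s; shorter than {min_s} s")
    elif duration > max_s:
        warnings.append(f"clip is {duration:.1f} s; longer than {max_s} s")
    return warnings
```

For example, check_reference_wav("reference.wav") returns an empty list when the clip is mono and between 8 and 12 seconds long.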

What You Can Build

Personalized Audiobooks

Authors can narrate their own books in Thai, then clone the voice to extend coverage to chapters they didn't have time to record.

Brand Voice at Scale

Record a 10-second sample of your brand voice talent once, then synthesize unlimited Thai marketing videos, IVR prompts, and product demos in that exact voice.

Voice Preservation

Help patients with degenerative speech conditions preserve their voice — record a sample now, retain the ability to "speak" later.

Localization for Thai Markets

Bring international content to Thai audiences with the same speaker the audience already recognizes. No casting calls, no studio days.

Accessibility & E-Learning

Generate audio versions of educational content using a teacher's own voice, complete with correct Thai number, date, and currency reading.

Pricing

Voice cloning is FREE until 31 May 2026 during the alpha. After that, pricing for general availability will be announced — alpha users will get a heads-up first.

Endpoint | Method | Cost (Alpha)
/v3/store/audio/tts (default Kaitom voice) | POST | 1 IC per 400 chars (free during alpha, until 2026-05-31)
/v3/store/audio/tts/clone (Thai voice cloning) | POST | 1 IC per 400 chars (free during alpha, until 2026-05-31)
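For budgeting beyond the free period, the listed rate of 1 IC per 400 characters can be turned into a rough estimate. Treat this as a sketch under two assumptions that are not confirmed here: that a partial block of 400 characters rounds up to a full IC, and that characters are counted the way Python's len counts them (Unicode code points, so Thai vowel and tone marks each count separately). estimate_ic is a name we made up.

```python
import math

CHARS_PER_IC = 400  # listed rate: 1 IC per 400 chars


def estimate_ic(text: str) -> int:
    """Estimate the IC cost of synthesizing `text`.

    ASSUMPTION: partial 400-character blocks round up, and characters
    are counted as Unicode code points. Actual billing may differ.
    """
    return math.ceil(len(text) / CHARS_PER_IC)
```

Under these assumptions, a 401-character script is estimated at 2 IC, while during the alpha the actual charge is zero.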

Responsible Use

Voice cloning is powerful and easy to misuse. iApp Technology requires that you:

  • Have explicit consent from the speaker before cloning their voice.
  • Disclose synthetic audio in contexts where authenticity matters (news, public statements, customer service identity).
  • Do not impersonate real people for fraud, harassment, or misinformation.

Misuse violates our Terms of Service and Thai law (PDPA, Computer Crime Act). We log all clone requests and cooperate with law enforcement on reported abuse.

Limitations During Alpha

  • Thai language only — English and Chinese cloning are on the roadmap.
  • Clone requests are processed serially per server. Expect queued latency under heavy concurrent load.
  • Maximum reference length: 15 s (longer clips are trimmed).
  • Output is PCM (raw); wrap with WAV header on the client to play.

Get Started Today

Feedback

This is alpha — your feedback shapes what GA looks like.


Thai Voice Cloning is part of Kaitom Voice V3 and is available now to all iApp API users at no charge during the alpha period.