🗣️ (Alpha) iApp Text-to-Speech (TTS) 🆕
🗣️ AI-powered text-to-speech synthesis API
Welcome to the iApp TTS v3 API, a text-to-speech service that converts text into natural-sounding speech. The API uses an AI model to generate WAV audio from text input and runs with GPU acceleration for fast responses.
Getting Started
Prerequisites
- Text input in Thai language
- Maximum tokens: 1400
- Output format: WAV
Quick Start
- Fast processing with GPU acceleration
- Natural speech generation
- High-quality speech output
Key Features
- Natural speech synthesis using state-of-the-art AI
- Voice style tuning via the temperature and top_p parameters
- High-speed response times
- Simple REST API interface
API Usage
Endpoints
GET /v3/audio/health
- Check that the service is available
POST /v3/audio/tts
- Generate speech from text; the WAV audio is returned in the response body
API Request Examples
Using cURL:
# Health check
curl https://api.iapp.co.th/v3/audio/health
# Generate speech and save to file
curl -X POST https://api.iapp.co.th/v3/audio/tts \
-H "Content-Type: application/json" \
-d '{"text":"Hello, this is a test.","temperature":0.2,"top_p":0.95}' \
--output test.wav
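The health check from the cURL example can also be run from Python before sending synthesis requests. The sketch below uses only the standard library. It assumes that any HTTP 2xx status from the health endpoint means the service is up, because the response body format is not documented here; the function name `is_service_healthy` is hypothetical.

```python
import http.client
import urllib.request

# Health-check URL from the cURL example.
HEALTH_URL = "https://api.iapp.co.th/v3/audio/health"


def is_service_healthy(url: str = HEALTH_URL, timeout: float = 5.0) -> bool:
    """Return True if the health endpoint answers with an HTTP 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            # Assumption: any 2xx status means the service is available.
            return 200 <= resp.status < 300
    except (OSError, http.client.HTTPException):
        # urllib.error.URLError and HTTPError (raised for 4xx/5xx) subclass
        # OSError, as do connection-refused and timeout errors.
        return False
```

For example, `if not is_service_healthy(): ...` lets a client skip or retry synthesis when the service is unreachable, instead of failing on the first TTS request.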
Using Python:
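import requests

# Text-to-speech request
response = requests.post(
    "https://api.iapp.co.th/v3/audio/tts",
    json={
        "text": "สวัสดีครับ",  # "Hello" in Thai
        "temperature": 0.2,
        "top_p": 0.95,
        "max_new_tokens": 1400,
    },
    timeout=60,  # seconds; increase for long inputs
)
# Raise on HTTP errors instead of saving an error body as audio
response.raise_for_status()

# Save the WAV audio returned in the response body
with open("output.wav", "wb") as f:
    f.write(response.content)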
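Because the API returns WAV audio, a client can check that the response body really is a WAV file before saving it, so that an error response is never written to disk as `output.wav`. This minimal sketch uses Python's standard `wave` module and assumes the API returns uncompressed PCM WAV, the encoding `wave` reads reliably; `wav_duration_seconds` is a hypothetical helper name.

```python
import io
import wave


def wav_duration_seconds(audio: bytes) -> float:
    """Return the duration of a WAV payload in seconds.

    Raises ValueError if the bytes do not start with a RIFF/WAVE header,
    e.g. when the server returned a JSON error body instead of audio.
    wave.Error may still be raised for WAV encodings the module cannot read.
    """
    if len(audio) < 12 or audio[:4] != b"RIFF" or audio[8:12] != b"WAVE":
        raise ValueError("response is not a WAV file")
    with wave.open(io.BytesIO(audio), "rb") as wav:
        # Duration = number of audio frames / frames per second
        return wav.getnframes() / wav.getframerate()
```

For example, calling `wav_duration_seconds(response.content)` before writing the file both validates the payload and reports how long the generated speech is.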
Request Parameters
Parameter | Type | Description | Default |
---|---|---|---|
text | string | Text to convert to speech | Required |
temperature | float | Generation temperature (higher = more random) | 0.2 |
top_p | float | Nucleus (top-p) sampling threshold; lower values restrict generation to more likely tokens | 0.95 |
max_new_tokens | integer | Maximum number of tokens to generate | 1400 |
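The parameter table maps directly onto the JSON request body. The following sketch builds that body with the documented defaults and runs a few client-side sanity checks before sending. The range checks are assumptions (the usual meanings of temperature and top-p sampling, plus the 1400-token maximum listed under Prerequisites), not limits confirmed by the API; the class name `TTSRequest` is hypothetical.

```python
from dataclasses import asdict, dataclass

# Assumed upper bound, taken from the 1400-token maximum in Prerequisites.
MAX_NEW_TOKENS_LIMIT = 1400


@dataclass
class TTSRequest:
    """JSON body for POST /v3/audio/tts, using the documented defaults."""

    text: str
    temperature: float = 0.2
    top_p: float = 0.95
    max_new_tokens: int = 1400

    def to_payload(self) -> dict:
        # Client-side checks only; the server may enforce different limits.
        if not self.text.strip():
            raise ValueError("text is required")
        if self.temperature < 0:
            raise ValueError("temperature must be non-negative")
        if not 0.0 < self.top_p <= 1.0:
            raise ValueError("top_p must be in (0, 1]")
        if not 1 <= self.max_new_tokens <= MAX_NEW_TOKENS_LIMIT:
            raise ValueError(f"max_new_tokens must be in 1..{MAX_NEW_TOKENS_LIMIT}")
        return asdict(self)
```

The result can be passed straight to `requests.post(..., json=TTSRequest(text="สวัสดีครับ", temperature=0.4).to_payload())`, so invalid values fail locally instead of costing a round trip.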
Best Practices
- Use proper punctuation for better speech synthesis
- Keep sentences natural and conversational
- For long text, consider breaking it into smaller segments
- Adjust the temperature and top_p parameters to control voice style:
  - Lower temperature (0.1-0.5): more consistent, stable voice
  - Higher temperature (0.6-1.0): more expressive but less predictable
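One way to follow the advice on long text is to split it into chunks, synthesize each chunk separately, and join the resulting WAV segments. The sketch below splits at whitespace (Thai text commonly uses spaces between sentences and phrases) and concatenates PCM WAV segments with the standard `wave` module. The 200-character chunk size is an arbitrary assumption, because the API's limit is expressed in tokens and the character-to-token ratio is not documented. The function names are hypothetical, and `synthesize` stands for any function that sends one chunk to `POST /v3/audio/tts` and returns the WAV bytes.

```python
import io
import wave
from typing import Callable, Iterable, List


def split_text(text: str, max_chars: int = 200) -> List[str]:
    """Greedily pack whitespace-separated pieces into chunks of <= max_chars.

    A single piece longer than max_chars is kept whole rather than cut.
    """
    chunks: List[str] = []
    current = ""
    for piece in text.split():
        candidate = f"{current} {piece}" if current else piece
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


def concat_wavs(segments: Iterable[bytes]) -> bytes:
    """Join PCM WAV segments that share channels, sample width, and rate."""
    params = None
    frames: List[bytes] = []
    for seg in segments:
        with wave.open(io.BytesIO(seg), "rb") as reader:
            p = reader.getparams()
            if params is None:
                params = p
            elif p[:3] != params[:3]:  # (nchannels, sampwidth, framerate)
                raise ValueError("segments have different audio formats")
            frames.append(reader.readframes(reader.getnframes()))
    if params is None:
        raise ValueError("no segments to concatenate")
    out = io.BytesIO()
    with wave.open(out, "wb") as writer:
        writer.setnchannels(params.nchannels)
        writer.setsampwidth(params.sampwidth)
        writer.setframerate(params.framerate)
        writer.writeframes(b"".join(frames))
    return out.getvalue()


def synthesize_long_text(
    text: str, synthesize: Callable[[str], bytes], max_chars: int = 200
) -> bytes:
    """Synthesize each chunk with the supplied function and join the audio."""
    return concat_wavs(synthesize(chunk) for chunk in split_text(text, max_chars))
```

Passing the synthesis call in as a function keeps the splitting and joining logic testable without network access. Concatenation assumes every segment comes back with the same channel count, sample width, and sample rate, and raises `ValueError` otherwise.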