🗣️ (Alpha) iApp Text-to-Speech (TTS) + Voice Cloning 🆕
🗣️ AI-powered text-to-speech synthesis API
Welcome to the iApp TTSv3 API, a text-to-speech synthesis service that converts English text into natural-sounding speech and can optionally clone a voice from a source recording. The API uses an AI model with GPU acceleration to generate WAV audio from text input.
Demo key is limited to 10 requests per day per IP
Getting Started
Prerequisites
- Text input in English only
- Maximum tokens: 1400
- Output format: WAV
- Source voice file: WAV format (Optional)
Quick Start
- Fast processing with GPU acceleration
- Natural speech generation
- High-quality speech output
Key Features
- Voice clone from source voice file
- Text input in English only
- Natural speech synthesis using state-of-the-art AI
- Advanced voice quality tuning via parameters
- High-speed response times
- Simple REST API interface
API Usage
Endpoints
POST /tts
- Generate speech from text and download as a file
API Request Examples
Using cURL with a source voice file to clone a voice:
# Health check
curl http://localhost:8000/health
# Generate speech and save it to a file.
# The file field is named source_vc_file, matching the parameter table.
curl -X POST http://localhost:8000/tts \
  -H "Content-Type: multipart/form-data" \
  -F "text=Hello, this is a test." \
  -F "source_vc_text=Transcription of source_voice_clone.wav file" \
  -F "temperature=0.7" \
  -F "top_p=0.95" \
  -F "source_vc_file=@source_voice_clone.wav" \
  --output test.wav
Using Python with a source voice file to clone a voice:
import requests

# Send the text-to-speech request with the source voice file attached
with open("source_voice_clone.wav", "rb") as source_file:
    response = requests.post(
        "http://localhost:8000/tts",
        files={
            "source_vc_file": ("source_voice_clone.wav", source_file, "audio/wav")
        },
        data={
            "text": "Hello, this is a test.",
            "temperature": 0.9,
            "top_p": 0.95,
            "max_new_tokens": 1400,
            "source_vc_text": "Transcription of source_voice_clone.wav file"
        },
    )

# Stop on HTTP errors so an error body is not saved as audio
response.raise_for_status()

# Save the audio response to a file
with open("output.wav", "wb") as out_file:
    out_file.write(response.content)
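The Python example writes the response body straight to output.wav. The following is a minimal sketch of a guard that checks the body before saving it. It assumes the endpoint returns raw WAV bytes in the response body on success; the documentation lists WAV as the output format but does not describe error responses. The names looks_like_wav and save_audio are hypothetical helpers, not part of the API.

```python
def looks_like_wav(body: bytes) -> bool:
    """Return True if the bytes start with a RIFF/WAVE container header."""
    # A WAV file is a RIFF container: bytes 0-3 are b"RIFF" and
    # bytes 8-11 are b"WAVE" (bytes 4-7 hold the chunk size).
    return len(body) >= 12 and body[:4] == b"RIFF" and body[8:12] == b"WAVE"


def save_audio(body: bytes, path: str) -> None:
    """Write the body to path, refusing anything that is not WAV-shaped."""
    if not looks_like_wav(body):
        raise ValueError("response body does not look like WAV audio")
    with open(path, "wb") as out_file:
        out_file.write(body)
```

With the requests example, this could be called as save_audio(response.content, "output.wav") in place of the final write.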
Request Parameters (form-data)
Parameter | Type | Description | Default |
---|---|---|---|
text | string | Text to convert to speech | Required |
temperature | float | Generation temperature (higher = more random) | 0.2 |
top_p | float | Top-p sampling parameter | 0.95 |
max_new_tokens | integer | Maximum number of tokens to generate | 1400 |
source_vc_file | file | Source voice file (WAV format) | Optional |
source_vc_text | string | Transcription of the speech in the source voice file | Required if source_vc_file is provided |
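The parameter table can be turned into a small request builder. The sketch below is one possible way to do that, not the service's own client: build_tts_multipart is a hypothetical helper, the defaults are copied from the table, and the upload filename is a placeholder. It enforces the rule that source_vc_text must accompany source_vc_file, and it encodes plain fields as (None, value) tuples so that requests sends multipart/form-data even when no file is attached.

```python
def build_tts_multipart(text, temperature=0.2, top_p=0.95, max_new_tokens=1400,
                        source_vc_file=None, source_vc_text=None):
    """Build a dict for the files= argument of requests.post."""
    if not text or not text.strip():
        raise ValueError("text is required")
    if source_vc_file is not None and not source_vc_text:
        raise ValueError("source_vc_text is required when source_vc_file is provided")

    # (None, value) marks a plain form field rather than a file upload.
    parts = {
        "text": (None, text),
        "temperature": (None, str(temperature)),
        "top_p": (None, str(top_p)),
        "max_new_tokens": (None, str(max_new_tokens)),
    }
    if source_vc_file is not None:
        parts["source_vc_text"] = (None, source_vc_text)
        # Placeholder filename; source_vc_file is an open binary file or bytes.
        parts["source_vc_file"] = ("source_voice_clone.wav", source_vc_file, "audio/wav")
    return parts
```

A text-only request would then look like requests.post("http://localhost:8000/tts", files=build_tts_multipart("Hello, this is a test.")).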
Best Practices
- Use proper punctuation for better speech synthesis
- Keep sentences natural and conversational
- For long text, consider breaking it into smaller segments
- Adjust temperature and top_p parameters to control voice style:
  - Lower temperature (0.1-0.5): More consistent, stable voice
  - Higher temperature (0.6-1.0): More expressive but less predictable
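The advice to break long text into smaller segments can be sketched as follows. This is an illustrative approach only: split_text is a hypothetical helper, and the 400-character budget is an arbitrary stand-in, because the API's limit is expressed in tokens and the tokenizer is not documented. Sentences are kept whole, so a single sentence longer than the budget becomes its own oversized segment.

```python
import re
from typing import List


def split_text(text: str, max_chars: int = 400) -> List[str]:
    """Group whole sentences into segments of at most max_chars characters."""
    # Split after '.', '!' or '?' when followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    segments: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would exceed the budget: close the segment.
            segments.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        segments.append(current)
    return segments
```

Each segment would then be sent as its own POST /tts request, with the same temperature and top_p values so the voice stays as consistent as possible across segments.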