🗣️ (Alpha) iApp Text-to-Speech (TTS) + Voice Cloning 🆕

🗣️ AI-powered text-to-speech synthesis API

Welcome to the iApp TTSv3 API, a text-to-speech synthesis service that converts English text into natural-sounding speech. The API uses an AI model to generate WAV audio from text input and can optionally clone a voice from a source voice file.

Demo key is limited to 10 requests per day per IP

Getting Started

Prerequisites

  • Text input in English only
  • Maximum tokens: 1400
  • Output format: WAV
  • Source voice file: WAV format (Optional)
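Because the source voice file must be WAV, it can help to check the file locally before uploading it. The following is a minimal sketch, not part of the API: `describe_wav` is a hypothetical helper name, and it relies on Python's standard `wave` module, which reads uncompressed PCM WAV files only. The service may accept WAV variants that this check rejects.

```python
import wave


def describe_wav(path):
    """Return basic properties of a WAV file, or raise ValueError if unreadable.

    Hypothetical local pre-check; the API itself defines what it accepts.
    """
    try:
        with wave.open(path, "rb") as wav:
            frames = wav.getnframes()
            rate = wav.getframerate()
            return {
                "channels": wav.getnchannels(),
                "sample_rate": rate,
                "duration_seconds": frames / rate if rate else 0.0,
            }
    except (wave.Error, EOFError) as exc:
        # wave.Error covers non-RIFF or unsupported files; EOFError covers truncated ones
        raise ValueError(f"{path!r} is not a readable WAV file: {exc}") from exc
```

Calling `describe_wav("source_voice_clone.wav")` before the request surfaces a wrong file format early instead of after an upload.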

Quick Start

  • Fast processing with GPU acceleration
  • Natural speech generation
  • High-quality speech output

Key Features

  • Voice clone from source voice file
  • Text input in English only
  • Natural speech synthesis using state-of-the-art AI
  • Advanced voice quality tuning via parameters
  • High-speed response times
  • Simple REST API interface

API Usage

Endpoints

  • POST /tts - Generate speech from text and download as a file

API Request Examples

Using cURL with source voice file to clone voice:

# Health check
curl http://localhost:8000/health

# Generate speech and save to file
curl -X POST http://localhost:8000/tts \
  -H "Content-Type: multipart/form-data" \
  -F "text=Hello, this is a test." \
  -F "source_vc_text=Transcription of source_voice_clone.wav file" \
  -F "temperature=0.7" \
  -F "top_p=0.95" \
  -F "source_vc_file=@source_voice_clone.wav" \
  --output test.wav

Using Python with source voice file to clone voice:

import requests

# Open the source voice file and send it with the text-to-speech request
with open("source_voice_clone.wav", "rb") as f:
    response = requests.post(
        "http://localhost:8000/tts",
        files={
            "source_vc_file": ("source_voice_clone.wav", f, "audio/wav")
        },
        data={
            "text": "Hello, this is a test.",
            "temperature": 0.9,
            "top_p": 0.95,
            "max_new_tokens": 1400,
            "source_vc_text": "Transcription of source_voice_clone.wav file"
        },
    )

# Stop here if the server returned an error instead of audio
response.raise_for_status()

# Save the audio response to a file
with open("output.wav", "wb") as out:
    out.write(response.content)

Request Parameters (form-data)

| Parameter | Type | Description | Default |
|---|---|---|---|
| text | string | Text to convert to speech | Required |
| temperature | float | Generation temperature (higher = more random) | 0.2 |
| top_p | float | Top-p sampling parameter | 0.95 |
| max_new_tokens | integer | Maximum number of tokens to generate | 1400 |
| source_vc_file | file | Source voice file (WAV format) | Optional |
| source_vc_text | string | Transcription of the speech in the source voice file | Required if source_vc_file is provided |
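The parameter rules above can be checked on the client before a request is sent. The sketch below is illustrative only: `build_tts_form` and its `has_source_vc_file` flag are hypothetical names, and the defaults are simply copied from the table. The server's own validation may differ.

```python
def build_tts_form(text, temperature=0.2, top_p=0.95, max_new_tokens=1400,
                   source_vc_text=None, has_source_vc_file=False):
    """Build the form fields for POST /tts, applying the documented rules.

    Hypothetical helper: defaults mirror the parameter table.
    """
    if not text or not text.strip():
        raise ValueError("text is required")
    if has_source_vc_file and not source_vc_text:
        raise ValueError(
            "source_vc_text is required when source_vc_file is provided"
        )

    form = {
        "text": text,
        "temperature": temperature,
        "top_p": top_p,
        "max_new_tokens": max_new_tokens,
    }
    # Only send the transcription when a source voice file accompanies it
    if has_source_vc_file:
        form["source_vc_text"] = source_vc_text
    return form
```

The returned dictionary can be passed as the `data=` argument of `requests.post`, with the voice file itself supplied separately through `files=`.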

Best Practices

  • Use proper punctuation for better speech synthesis
  • Keep sentences natural and conversational
  • For long text, consider breaking it into smaller segments
  • Adjust temperature and top_p parameters to control voice style:
    • Lower temperature (0.1-0.5): More consistent, stable voice
    • Higher temperature (0.6-1.0): More expressive but less predictable
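One way to break long text into smaller segments is to split on sentence boundaries and pack sentences into chunks. This is a rough sketch under assumptions: `split_into_segments` is a hypothetical helper, and `max_chars` is an arbitrary character budget used as a stand-in, since the API's 1400-token limit is not measured in characters.

```python
import re


def split_into_segments(text, max_chars=300):
    """Group whole sentences into segments of at most max_chars characters.

    A single sentence longer than max_chars is kept whole rather than cut.
    """
    # Split after ., ! or ? followed by whitespace, keeping the punctuation
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

    segments = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would exceed the budget: close the segment
            segments.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        segments.append(current)
    return segments
```

Each segment can then be sent as its own `text` value. Keeping the same source voice file and temperature across segments should help the voice stay consistent between the resulting audio files.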