
🇹🇭 Thai Speech-to-Text (ASR) Base

1 IC per 60 seconds
✅ Active 🎙️ Speech

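Welcome to Thai ASR Base, our standard Thai automatic speech recognition service. This version offers fast processing while maintaining good accuracy for general-purpose use cases.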

iApp Speech to Text API


Getting Started

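  1. Prerequisites

    • An iApp Technology API key
    • An audio file in a supported format
    • Supported formats: MP3, WAV, AAC, M4A
    • Maximum duration: 30 minutes
    • Maximum file size: 1GB
  2. Quick Start

    • Fast processing, with a 0.3-second response time
    • High accuracy (85.48% WER-based accuracy)
    • Thai language support
  3. Key Features

    • Text extraction from audio files
    • Speaker diarization
    • Fast processing
    • Flexible JSON response format
  4. Security and Compliance

    • GDPR and PDPA compliant
    • No data retained after processing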
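The file requirements above can be checked locally before uploading. The following is a minimal Python sketch, assuming that "1GB" means 1 GiB and that the file extension is a good enough proxy for the format; the function name `check_audio_file` is hypothetical. The standard library can only read the duration of WAV files, so the 30-minute check is skipped for the other formats.

```python
import os
import wave

# Limits taken from the prerequisites list
SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".aac", ".m4a"}
MAX_DURATION_SECONDS = 30 * 60
MAX_SIZE_BYTES = 1024 ** 3  # assumption: "1GB" read as 1 GiB


def check_audio_file(path: str) -> list[str]:
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    ext = os.path.splitext(path)[1].lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {ext or '(none)'}")
    size = os.path.getsize(path)
    if size > MAX_SIZE_BYTES:
        problems.append(f"file too large: {size} bytes")
    # The stdlib can only read the duration of WAV files
    if ext == ".wav":
        with wave.open(path, "rb") as wav:
            duration = wav.getnframes() / wav.getframerate()
        if duration > MAX_DURATION_SECONDS:
            problems.append(f"audio too long: {duration:.1f} s")
    return problems
```

An empty list only means the file passed the checks this sketch performs; the API may still reject it for other reasons.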
How do I get an API key?

Visit the API Key Management page to view your existing API keys or request a new one.
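However you obtain the key, every request sends it in the `apikey` header. A common practice is to keep it out of source code. This Python sketch reads it from an environment variable; the name `IAPP_API_KEY` and both helper functions are hypothetical choices, not something the API requires.

```python
import os


def get_api_key(var_name: str = "IAPP_API_KEY") -> str:
    """Return the API key stored in `var_name` (a hypothetical variable name)."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable to your iApp API key")
    return key


def auth_headers(var_name: str = "IAPP_API_KEY") -> dict[str, str]:
    """Build the `apikey` authentication header used by the endpoint."""
    return {"apikey": get_api_key(var_name)}
```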

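API Endpoints

Endpoint | Method | Description | Cost
--- | --- | --- | ---
/v3/store/speech/speech-to-text/base (legacy: /asr/v3) | POST | Converts Thai speech to text (base model) | 1 IC per 60 seconds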

API Reference

Endpoint

POST https://api.iapp.co.th/v3/store/speech/speech-to-text/base

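Request Headers

  • apikey (required): your API key, used for authentication
  • All other headers, including the multipart Content-Type and its boundary, are generated automatically by FormData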

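Request Parameters

Parameter | Type | Description
--- | --- | ---
file (required) | File (.mp3, .wav, .aac, .m4a) | Audio file to transcribe (up to 30 minutes)
chunk_size | Integer | Audio chunk size (recommended: 7)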
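In the code examples, curl or a FormData library assembles the request body, which is why `apikey` is the only header you set yourself. To make that encoding visible, here is a standard-library-only Python sketch of a multipart/form-data request using the `file` and `chunk_size` fields from the table. The helper names and the `application/octet-stream` part type are assumptions; the request is built but not sent.

```python
import urllib.request
import uuid

URL = "https://api.iapp.co.th/v3/store/speech/speech-to-text/base"


def encode_multipart(fields: dict[str, str],
                     files: dict[str, tuple[str, bytes, str]]) -> tuple[bytes, str]:
    """Encode form fields and files; return the body and its Content-Type value."""
    boundary = uuid.uuid4().hex
    lines: list[bytes] = []
    for name, value in fields.items():
        lines += [
            f"--{boundary}".encode(),
            f'Content-Disposition: form-data; name="{name}"'.encode(),
            b"",
            value.encode(),
        ]
    for name, (filename, content, content_type) in files.items():
        lines += [
            f"--{boundary}".encode(),
            f'Content-Disposition: form-data; name="{name}"; filename="{filename}"'.encode(),
            f"Content-Type: {content_type}".encode(),
            b"",
            content,
        ]
    # Closing boundary, followed by a final CRLF
    lines += [f"--{boundary}--".encode(), b""]
    return b"\r\n".join(lines), f"multipart/form-data; boundary={boundary}"


def build_request(api_key: str, filename: str, audio: bytes,
                  chunk_size: int = 7) -> urllib.request.Request:
    """Build (but do not send) a POST request for the base ASR endpoint."""
    body, content_type = encode_multipart(
        {"chunk_size": str(chunk_size)},
        {"file": (filename, audio, "application/octet-stream")},
    )
    return urllib.request.Request(
        URL,
        data=body,
        method="POST",
        headers={"apikey": api_key, "Content-Type": content_type},
    )
```

Passing the returned request to `urllib.request.urlopen` would send it. In practice the libraries shown in the code examples handle this encoding for you.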

Code Examples

Curl

curl -X POST https://api.iapp.co.th/v3/store/speech/speech-to-text/base \
-H "apikey: YOUR_API_KEY" \
-F "file=@/path/to/audio.wav" \
-F "chunk_size=7"

Python

import requests

url = "https://api.iapp.co.th/v3/store/speech/speech-to-text/base"

# use_asr_pro: '0' = base model, '1' = iApp ASR PRO
payload = {'use_asr_pro': '0', 'chunk_size': '7'}
headers = {
    'apikey': '{YOUR_API_KEY}'
}

# Open the audio file in a context manager so it is closed after the upload
with open('{YOUR_UPLOADED_FILE_PATH}', 'rb') as audio:
    files = [
        ('file', ('{YOUR_UPLOADED_FILE}', audio, 'application/octet-stream'))
    ]
    response = requests.post(url, headers=headers, data=payload, files=files)

print(response.text)

Javascript

const axios = require("axios")
const FormData = require("form-data")
const fs = require("fs")
let data = new FormData()
data.append("file", fs.createReadStream("YOUR_UPLOADED_FILE"))
data.append("use_asr_pro", "0") // Set to '1' to use iApp ASR PRO
data.append("chunk_size", "7")

let config = {
method: "post",
maxBodyLength: Infinity,
url: "https://api.iapp.co.th/v3/store/speech/speech-to-text/base",
headers: {
apikey: "{YOUR_API_KEY}",
...data.getHeaders(),
},
data: data,
}

axios
.request(config)
.then((response) => {
console.log(JSON.stringify(response.data))
})
.catch((error) => {
console.log(error)
})

PHP

<?php

$curl = curl_init();

curl_setopt_array($curl, array(
CURLOPT_URL => 'https://api.iapp.co.th/v3/store/speech/speech-to-text/base',
CURLOPT_RETURNTRANSFER => true,
CURLOPT_ENCODING => '',
CURLOPT_MAXREDIRS => 10,
CURLOPT_TIMEOUT => 0,
CURLOPT_FOLLOWLOCATION => true,
CURLOPT_HTTP_VERSION => CURL_HTTP_VERSION_1_1,
CURLOPT_CUSTOMREQUEST => 'POST',
CURLOPT_POSTFIELDS => array('file'=> new CURLFILE('{YOUR_UPLOADED_FILE}'),
'use_asr_pro' => '0',
'chunk_size' => '7'),
CURLOPT_HTTPHEADER => array(
'apikey: {YOUR_API_KEY}'
),
));

$response = curl_exec($curl);

curl_close($curl);
echo $response;

Swift

let parameters = [
[
"key": "file",
"src": "{YOUR_UPLOADED_FILE}",
"type": "file"
],
[
"key": "use_asr_pro",
"value": "0",
"type": "text"
],
[
"key": "chunk_size",
"value": "7",
"type": "text"
]] as [[String: Any]]

let boundary = "Boundary-\(UUID().uuidString)"
var body = Data()
var error: Error? = nil
for param in parameters {
if param["disabled"] != nil { continue }
let paramName = param["key"]!
body += Data("--\(boundary)\r\n".utf8)
body += Data("Content-Disposition:form-data; name=\"\(paramName)\"".utf8)
if param["contentType"] != nil {
body += Data("\r\nContent-Type: \(param["contentType"] as! String)".utf8)
}
let paramType = param["type"] as! String
if paramType == "text" {
let paramValue = param["value"] as! String
body += Data("\r\n\r\n\(paramValue)\r\n".utf8)
} else {
let paramSrc = param["src"] as! String
let fileURL = URL(fileURLWithPath: paramSrc)
if let fileContent = try? Data(contentsOf: fileURL) {
body += Data("; filename=\"\(fileURL.lastPathComponent)\"\r\n".utf8)
body += Data("Content-Type: application/octet-stream\r\n".utf8)
body += Data("\r\n".utf8)
body += fileContent
body += Data("\r\n".utf8)
}
}
}
body += Data("--\(boundary)--\r\n".utf8);
let postData = body


var request = URLRequest(url: URL(string: "https://api.iapp.co.th/v3/store/speech/speech-to-text/base")!,timeoutInterval: Double.infinity)
request.addValue("{YOUR_API_KEY}", forHTTPHeaderField: "apikey")
request.addValue("multipart/form-data; boundary=\(boundary)", forHTTPHeaderField: "Content-Type")

request.httpMethod = "POST"
request.httpBody = postData

let task = URLSession.shared.dataTask(with: request) { data, response, error in
guard let data = data else {
print(String(describing: error))
return
}
print(String(data: data, encoding: .utf8)!)
}

task.resume()

Kotlin

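import okhttp3.MediaType.Companion.toMediaType
import okhttp3.MultipartBody
import okhttp3.OkHttpClient
import okhttp3.Request
import okhttp3.RequestBody.Companion.asRequestBody
import java.io.File

val client = OkHttpClient()
// Multipart form: the audio file plus the text fields
val body = MultipartBody.Builder().setType(MultipartBody.FORM)
.addFormDataPart("file","{YOUR_UPLOADED_FILE}",
File("{YOUR_UPLOADED_FILE_PATH}").asRequestBody("application/octet-stream".toMediaType()))
.addFormDataPart("use_asr_pro","0")
.addFormDataPart("chunk_size","7")
.build()
val request = Request.Builder()
.url("https://api.iapp.co.th/v3/store/speech/speech-to-text/base")
.post(body)
.addHeader("apikey", "{YOUR_API_KEY}")
.build()
// use {} closes the response when done
client.newCall(request).execute().use { response ->
println(response.body?.string())
}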

Java

import java.io.File;
import okhttp3.MediaType;
import okhttp3.MultipartBody;
import okhttp3.OkHttpClient;
import okhttp3.Request;
import okhttp3.RequestBody;
import okhttp3.Response;

OkHttpClient client = new OkHttpClient().newBuilder()
.build();
// Multipart form: the audio file plus the text fields
RequestBody body = new MultipartBody.Builder().setType(MultipartBody.FORM)
.addFormDataPart("file","{YOUR_UPLOADED_FILE}",
RequestBody.create(MediaType.parse("application/octet-stream"),
new File("{YOUR_UPLOADED_FILE_PATH}")))
.addFormDataPart("use_asr_pro","0")
.addFormDataPart("chunk_size","7")
.build();
Request request = new Request.Builder()
.url("https://api.iapp.co.th/v3/store/speech/speech-to-text/base")
.method("POST", body)
.addHeader("apikey", "{YOUR_API_KEY}")
.build();
Response response = client.newCall(request).execute();
System.out.println(response.body().string());

Dart

import 'package:http/http.dart' as http;

var headers = {
'apikey': '{YOUR_API_KEY}'
};
var request = http.MultipartRequest('POST', Uri.parse('https://api.iapp.co.th/v3/store/speech/speech-to-text/base'));
request.fields.addAll({
'use_asr_pro': '0',
'chunk_size': '7'
});
request.files.add(await http.MultipartFile.fromPath('file', '{YOUR_UPLOADED_FILE_PATH}'));
request.headers.addAll(headers);

http.StreamedResponse response = await request.send();

if (response.statusCode == 200) {
print(await response.stream.bytesToString());
}
else {
print(response.reasonPhrase);
}

Accuracy and Performance

Overall Accuracy

Benchmark results on the Thai test set of the Mozilla Common Voice dataset. We evaluated iApp ASR on two different versions of the test set.

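  • Test conditions

    1. Unseen dataset
    2. Thai only
    3. Speaker diversity: male, female, children
  • Mozilla Common Voice 16.1 Thai test set (version 1)

    Access the dataset on Hugging Face

    Results:

    • Test set size: 11,038 test examples
    • Average word error rate: 1.11%
    • Average word error rate (without spaces): 0.66%
  • Mozilla Common Voice 17.0 Thai test set (version 2)

    Access the dataset on Hugging Face

    Results:

    • Test set size: 11,042 samples
    • Word error rate (WER): 14.52%
    • Character error rate (CER): 5.87%
    • WER-based accuracy: 85.48%
    • CER-based accuracy: 94.13%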
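The accuracy figures above are the complement of the error rates: 100% minus 14.52% WER gives 85.48%, and 100% minus 5.87% CER gives 94.13%. Both rates are an edit distance (substitutions + deletions + insertions) divided by the length of the reference. The Python sketch below computes them; splitting words on whitespace is an assumption, since Thai is written without spaces between words and the page does not say how its references were segmented.

```python
def edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Levenshtein distance: minimum substitutions + deletions + insertions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (or match at no cost)
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate; assumes words are already separated by whitespace."""
    ref = reference.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref, hypothesis.split()) / len(ref)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate over the raw strings (spaces included)."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


def accuracy(error_rate: float) -> float:
    """Accuracy as reported on this page: 1 minus the error rate."""
    return 1 - error_rate
```

For example, `accuracy(0.1452)` is 0.8548, matching the WER-based accuracy reported for version 2.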

Processing Speed

  • Average response time: 0.3 seconds
  • 15× faster than other providers (according to iApp's benchmark)
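Response times you observe include upload and network time, so they may differ from the figure above. One way to measure them from your own environment is to time repeated calls, as in this Python sketch; `call` stands for whatever function sends your request (for example, a wrapper around the Python code example) and `measure_latency` is a hypothetical helper.

```python
import statistics
import time
from typing import Callable


def measure_latency(call: Callable[[], object], runs: int = 5) -> dict[str, float]:
    """Time `call` several times and summarize wall-clock latency in seconds."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        call()
        samples.append(time.perf_counter() - start)
    return {
        "mean": statistics.mean(samples),
        "median": statistics.median(samples),
        "max": max(samples),
    }
```

The median is less sensitive than the mean to a single slow upload, which is why both are reported.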

Detailed benchmark results (Google Sheets):

iApp ASR Base Benchmark Results

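Pricing

AI API Service Name | Endpoint | Cost | On-Premise
--- | --- | --- | ---
Thai Speech-to-Text (ASR) | iapp-asr-v3-en [Base model] | 1 IC / 60 seconds | Contact us
 | iapp-asr-v3-th-en [Base model] | 1 IC / 60 seconds |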
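Because billing is 1 IC per 60 seconds of audio, the cost of a file can be estimated from its duration. The page does not say how partial minutes are billed, so this Python sketch (with a hypothetical `estimate_ic` helper) supports two assumptions: rounding up to whole 60-second blocks (the default) or pro-rating. A 30-minute file, the maximum allowed, comes to 30 IC under either rule.

```python
import math

SECONDS_PER_IC = 60  # pricing: 1 IC per 60 seconds of audio


def estimate_ic(duration_seconds: float, round_up: bool = True) -> float:
    """Estimate the IC cost of one audio file.

    round_up=True assumes every started 60-second block is billed in full;
    round_up=False pro-rates. Which rule iApp applies is an assumption.
    """
    if duration_seconds < 0:
        raise ValueError("duration must be non-negative")
    blocks = duration_seconds / SECONDS_PER_IC
    return math.ceil(blocks) if round_up else blocks
```

For a 90-second clip the two assumptions give 2 IC and 1.5 IC respectively.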