iApp ASR Pro Evaluation on Mozilla Common Voice 17.0
This report presents the evaluation results of the automatic speech recognition system iApp ASR Pro on the Mozilla Common Voice 17.0 dataset, comparing its performance against ASR services from both international and domestic providers.
Dataset and Evaluation Metrics
This evaluation was conducted on the Mozilla Common Voice 17.0 dataset, a widely used corpus of diverse audio samples for testing ASR accuracy. We used Word Error Rate (WER) and Character Error Rate (CER) as the main metrics for evaluating each ASR service.
WER and CER Calculation
We used the jiwer library, a standard tool for ASR evaluation, to calculate WER and CER. Each metric is described as follows:
- Word Error Rate (WER): Measures the error rate at the word level, calculated as the sum of word substitutions, deletions, and insertions divided by the total number of words in the reference transcript (see the formula after this list).
- Character Error Rate (CER): Measures the error rate at the character level, calculated in the same way as WER but over characters, providing a finer-grained accuracy measurement.
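Concretely, with $S$ substitutions, $D$ deletions, $I$ insertions, and $N$ words in the reference transcript, the standard formula is:

$$\mathrm{WER} = \frac{S + D + I}{N}$$

CER uses the same formula over characters instead of words. Where this report quotes accuracy percentages, the natural reading is $100\% \times (1 - \mathrm{WER})$ at the word level and $100\% \times (1 - \mathrm{CER})$ at the character level.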
WER and CER were calculated with jiwer in the following steps; a short sketch follows the list:
- Normalize the reference transcript and the ASR output to the same format (e.g., lowercase, punctuation removed).
- Calculate WER and CER by comparing each ASR's output against the normalized reference.
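As a concrete illustration, here is a minimal Python sketch of this pipeline. The normalization choices (lowercasing, punctuation removal) mirror the steps above; the sample sentences are placeholders rather than data from this evaluation.

```python
import string
import jiwer

def normalize(text: str) -> str:
    """Lowercase, strip punctuation, and collapse whitespace so the
    reference and the ASR output share the same format."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(text.split())

# Placeholder transcripts; the real evaluation uses Common Voice 17.0
# references and each ASR service's output.
reference = "The quick brown fox jumps over the lazy dog."
hypothesis = "the quick brown fox jumped over a lazy dog"

ref, hyp = normalize(reference), normalize(hypothesis)
print(f"WER: {jiwer.wer(ref, hyp):.4f}")  # word-level error rate
print(f"CER: {jiwer.cer(ref, hyp):.4f}")  # character-level error rate
```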
Accuracy Evaluation Table
The table below summarizes the word error rate (WER), character error rate (CER), and accuracy of each ASR service.
Key findings:
- iApp ASR PRO has the highest accuracy, with a word-level accuracy of 92.41% and a character-level accuracy of 97.81% (equivalently, the lowest WER and CER).
- Google ASR and the Thai Local Competitor showed similar accuracy levels, with word-level accuracy of 88.11% and 88.64%, respectively.
- iApp ASR Base showed reliable performance with a word-level accuracy of 85.48%, making it a good alternative to the more accurate iApp ASR PRO.
CER Evaluation
The chart below shows a comparison of the character error rate (CER) between iApp ASR Pro and other ASR services.
Summary:
- iApp ASR Pro vs. Thai Local Competitor: iApp ASR Pro wins on CER in 47.3% of cases, draws in 29.2%, and loses in 23.6%, demonstrating superiority in many cases (see the sketch after this list for how such per-utterance rates can be computed).
- iApp ASR Pro vs. Google ASR: iApp ASR Pro shows a win rate of 44.9%, a draw rate of 31.0%, and a loss rate of 24.1%.
- Further comparisons show how competitive these ASR services are with one another.
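For readers who want to reproduce this kind of comparison, the sketch below shows one way per-utterance win/draw/loss rates could be computed. It assumes the evaluation keeps one CER score per utterance for each system, which is our reading of the charts rather than a documented detail of the pipeline.

```python
from typing import Sequence

def win_draw_loss(cer_a: Sequence[float], cer_b: Sequence[float]) -> tuple[float, float, float]:
    """Percentage of utterances where system A beats, ties, or loses to
    system B. A lower error rate counts as a win."""
    assert len(cer_a) == len(cer_b), "need one score per utterance for both systems"
    n = len(cer_a)
    wins = sum(a < b for a, b in zip(cer_a, cer_b))
    draws = sum(a == b for a, b in zip(cer_a, cer_b))
    losses = n - wins - draws
    return 100 * wins / n, 100 * draws / n, 100 * losses / n

# Placeholder per-utterance CERs, not the actual evaluation data.
pro = [0.02, 0.00, 0.10, 0.05]
competitor = [0.04, 0.00, 0.08, 0.07]
w, d, l = win_draw_loss(pro, competitor)
print(f"win {w:.1f}% / draw {d:.1f}% / loss {l:.1f}%")
```

The same function applies unchanged to per-utterance WER, as used in the next section.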
WER Evaluation
This chart shows a comparison of the word error rate (WER) between iApp ASR Pro and other ASR services.
Notes:
- iApp ASR Pro vs. Thai Local Competitor: iApp ASR Pro has a win rate on WER of 38.2%, a draw rate of 38.5%, and a loss rate of 23.3%.
- iApp ASR Pro vs. Google ASR: iApp ASR Pro has a win rate of 35.5%, a draw rate of 41.1%, and a loss rate of 23.5%.
- These results illustrate the performance differences among the various ASR services.
Conclusion
This evaluation shows that iApp ASR PRO achieves the highest accuracy. Google ASR and the Thai Local Competitor deliver similar, solid accuracy, while iApp ASR Base trails slightly but remains a capable option. This information can help users choose the ASR service that best fits their specific needs.
You can test iApp ASR PRO at https://ai.iapp.co.th/product/speech_to_text_asr