We are thrilled to unveil our latest benchmarking results for Arabic Speech Recognition (SR) services. In this evaluation, we compared our Arabic SR solutions with those from providers such as Google, Azure, AWS, Whisper, and Speechmatics. The assessment used two datasets: a publicly available dataset featuring diverse native Arabic speakers, and a dataset of customer service representative phone calls.
Creating an effective SR engine demands sophisticated algorithms and models capable of converting complex audio into text. This conversion requires an in-depth understanding of language nuances, including accents and dialects.
A primary hurdle for SR technology is the variability of regional dialects, especially in Arabic. Systems trained primarily on standardized linguistic data often fail to accurately transcribe speech that diverges from the norm.
While Modern Standard Arabic (MSA) serves as the formal language in most official settings across the Middle East and North Africa (MENA), the everyday spoken language differs greatly. Regional dialects vary widely in pronunciation, grammar, and vocabulary. To handle these variations, SR systems must be trained on extensive datasets covering a variety of dialects, which improves both accuracy and functionality.
Our accuracy tests employed the Word Error Rate (WER) method, a common metric for evaluating SR systems. WER calculates the percentage of discrepancies in the SR output compared to the accurate "ground truth" transcription, factoring in substitutions, deletions, and insertions relative to the total word count of the ground truth. The lower the WER, the better the engine.
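To make the metric concrete, here is a minimal Python sketch of a WER calculation. It is an illustration only, not the scoring tool used in this benchmark: the function name `word_error_rate` is hypothetical, words are split on whitespace, and no text normalization (for example, handling of Arabic diacritics or punctuation) is applied, although real evaluations typically normalize both transcripts before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference words.

    Assumption: words are separated by whitespace and compared exactly.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the word-level edit distance between the reference
    # words processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    ref = "the cat sat on the mat"
    hyp = "the cat sit on mat"
    # One substitution (sat -> sit) and one deletion (the) over 6 reference words.
    print(f"WER: {word_error_rate(ref, hyp):.2%}")
```

Because insertions are counted against the length of the ground truth, WER can exceed 100% when the engine outputs many extra words.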
The benchmark was conducted using the following datasets:
1. Arabic Mediaspeech Dataset
Context: Publicly available set from Al Arabiya, France 24 Arabic, BBC News.
Subset: Random 1-hour subset used for tests (results as of April 15, 2024).
Results:
2. Customer Service Representative Phone Calls
Context: Real-life telephone conversations in the Egyptian dialect.
Technique: Fine-tuning was done for a specific domain and customer.
Results:
The following models were used for the tests:
Our tests highlight the critical role of fine-tuning in improving SR system accuracy. By training on extensive datasets that include a range of dialects and refining acoustic models to better handle these variations, SR systems can improve transcription accuracy for non-standard dialects. This is essential for reliable SR performance in practical applications where audio quality and background noise may vary.
At SESTEK, we have been developing SR engines for different languages for the last 20 years. We have vast expertise in the customer service vertical, and we are happy with our near-zero error rate for the Arabic language.
This benchmark also underscores the substantial benefits that fine-tuning offers for specific dialects, revealing notable variability in accuracy across different SR providers. As we continue to confront the unique complexities of the Arabic language, the need for ongoing technological enhancements remains clear. Through dedicated fine-tuning and advancements, we aim to set new standards in Arabic speech recognition accuracy.
Disclaimer: The speech recognition process involves calculating and optimizing millions of parameters over a vast search space. It is highly stochastic (it follows a pattern that may be analyzed statistically but not predicted precisely). A vendor's SR engine can outperform others on a specific recording, but the same engine can perform differently on other recordings.
Author: Debi Çakar, SESTEK Product Team