Speech Recognition Jun 03 · 2 min read

Speech Recognition Accuracy Test 2024 – Arabic Edition

We are pleased to share our latest benchmarking results for Arabic Speech Recognition (SR) services. We compared our Arabic SR solutions with those from providers such as Google, Azure, AWS, Whisper, and Speechmatics. The evaluation used two datasets: a publicly available set featuring diverse native Arabic speakers, and a set of customer service representative phone calls.

 

The Dialect Challenge in Speech-to-Text

Building an effective SR engine requires algorithms and models capable of converting complex audio into text. This conversion depends on a detailed understanding of language nuances, including accents and dialects.

A primary hurdle for SR technology is the variability of regional dialects, especially in Arabic. Systems trained primarily on standardized linguistic data often fail to accurately transcribe speech that diverges from the norm. 

While Modern Standard Arabic (MSA) serves as the formal language in most official settings across the Middle East and North Africa (MENA), the everyday spoken language differs greatly. Regional dialects vary widely in pronunciation, grammar, and vocabulary. To handle these variations accurately, SR systems must be trained on extensive datasets that cover a variety of dialects.

Our accuracy tests used Word Error Rate (WER), a common metric for evaluating SR systems. WER counts the substitutions, deletions, and insertions in the SR output relative to the accurate "ground truth" transcription, then divides that count by the total number of words in the ground truth. The lower the WER, the better the engine.
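To make the metric concrete, here is a minimal Python sketch of a WER calculation. It is an illustration only, not the tooling used for this benchmark: the function name `word_error_rate` is our own, words are split on whitespace, and text normalization (for example, handling punctuation or diacritics) is assumed to have been done beforehand.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference word count.

    Illustrative sketch: assumes both transcripts are already normalized
    and that words are separated by whitespace.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] is the word-level edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,          # deletion: reference word missing from output
                curr[j - 1] + 1,      # insertion: extra word in output
                prev[j - 1] + cost,   # match (cost 0) or substitution (cost 1)
            )
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("the") over 6 reference words.
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because insertions are counted against the reference length, WER can exceed 1.0 (100%) when the output contains many extra words.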

 

Test Dataset

The benchmark was conducted using the following datasets:

 

1. Arabic MediaSpeech Dataset

Context: Publicly available dataset drawn from Al Arabiya, France 24 Arabic, and BBC News.

Subset: Random 1-hour subset used for tests (results as of April 15, 2024). 

Results:

[Figure: Speech Recognition accuracy rate]
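For readers who want to build a comparable test set, the sketch below shows one way a random subset of roughly one hour could be drawn from a larger corpus. The exact sampling procedure used for this benchmark is not described here, so everything in the sketch is an assumption: the function name `sample_subset`, the utterance IDs, and the idea of adding shuffled utterances until the target duration is reached.

```python
import random


def sample_subset(durations, target_seconds=3600.0, seed=0):
    """Pick utterances in random order until their total length reaches the target.

    durations: dict mapping a (hypothetical) utterance ID to its length in seconds.
    Returns the list of chosen IDs and their combined duration in seconds.
    """
    rng = random.Random(seed)      # fixed seed keeps the subset reproducible
    ids = sorted(durations)        # sort first so the result depends only on the seed
    rng.shuffle(ids)

    chosen, total = [], 0.0
    for utt_id in ids:
        if total >= target_seconds:
            break
        chosen.append(utt_id)
        total += durations[utt_id]
    return chosen, total


# Hypothetical corpus: 1,000 utterances of 10 seconds each.
corpus = {f"utt{i:04d}": 10.0 for i in range(1000)}
subset_ids, subset_seconds = sample_subset(corpus)
print(len(subset_ids), subset_seconds / 3600.0)
```

Fixing the seed means the same subset can be sent to every engine, so that differences in WER reflect the engines and not the sample.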

 

2. Customer-Service Representative Phone Calls

Context: Real-life telephone conversations in the Egyptian dialect. 

Technique: The model was fine-tuned for a specific domain and customer.

Results:

[Figure: Speech Recognition accuracy rate]

 

The following models were used in the tests:

  • AssemblyAI Uni-1 (nano)
  • Google's latest-short
  • Speechmatics enhanced
  • Whisper Large-v3

 

The Impact of Fine-Tuning

Our tests highlight the critical role of fine-tuning in improving SR accuracy. By training on extensive datasets that include a range of dialects and refining acoustic models to better handle these variations, SR systems can improve transcription accuracy for non-standard varieties of a language. This is essential for reliable SR performance in practical applications, where audio quality and background noise may vary.

 

Wrapping Up

At SESTEK, we have been developing SR engines for different languages for the last 20 years. We have extensive expertise in the customer service vertical, and we are happy with our near-zero error rate for the Arabic language.

This benchmark also underscores the substantial benefits that fine-tuning offers for specific dialects, revealing notable variability in accuracy across different SR providers. As we continue to confront the unique complexities of the Arabic language, the need for ongoing technological enhancements remains clear. Through dedicated fine-tuning and advancements, we aim to set new standards in Arabic speech recognition accuracy. 

 

Disclaimer: The speech recognition process involves calculating and optimizing millions of parameters over a vast search space. It is highly stochastic (that is, its outcomes can be analyzed statistically but not predicted precisely). A vendor's SR engine can perform better than others on a specific recording, yet the same engine can perform differently on other recordings.

 

Author: Debi Çakar, SESTEK Product Team

 
