The Pillar of a Successful Conversational Journey: Speech Recognition

October 10, 2020

Conversational technologies transform the customer journey. By allowing customers to use their own words to interact with systems, conversational technologies offer the most natural communication method. And the conversational journey starts with speech recognition technology.

Speech Recognition (SR), also known as automatic speech recognition (ASR), captures spoken words and phrases and converts them to a machine-readable format. This is the first step in letting users control devices and systems by speaking instead of using conventional tools such as keystrokes or buttons.

Why is Speech Recognition important?

Because speech recognition is the first step, its accuracy is key to a successful conversational journey. If you cannot accurately convert speech into text, you cannot understand what your customers are saying, and you will not be able to solve their problems. Accurate SR increases the efficiency of self-service applications and allows companies to deliver improved customer experiences. Since SR is the core technology that empowers conversational solutions, the success of a conversational system depends on the capabilities of its SR technology. In other words, to ensure a smooth conversation between machines and customers, a comprehensive Speech Recognition solution is crucial.

To offer an effective conversational product, make sure that your SR solution:

  • has a high recognition accuracy
  • offers advanced natural language support
  • supports multiple languages and accents
  • easily integrates with multiple technologies like AI, natural language processing (NLP), and machine learning (ML)
  • has a flexible structure that supports omnichannel deployment

How Sestek SR stands out

20 Years of Know-How

Sestek SR is the product of Sestek’s 20 years of experience in building highly accurate speech solutions. Since day one, we have been working hard to make our technology more accurate and robust. As an R&D company, we have long invested in the latest technologies, such as neural networks (NN), to improve the recognition accuracy of Sestek Speech Recognition.

End-to-end Conversational Journey

Sestek SR is the core technology behind our main products such as voice IVRs, virtual assistants, and conversational analytics. Moreover, Sestek SR is a component of our omnichannel automation solutions. This means that once you implement Sestek SR, you can benefit from the technology on any channel where you want to build conversational solutions for your customers.

Tailor-Made for Different Verticals

Each business has different priorities when it comes to offering the best customer service. To build the right conversational journey, each business needs specific solutions rather than a one-size-fits-all approach.

Sestek Speech Recognition’s highly customizable structure enables us to build a tailor-made conversational solution for each company. The technology can be trained with specific language models according to industry and vertical needs.

Difficult to Build, Difficult to Implement

Building highly accurate speech solutions in house can take significant time and effort. Collaborating with an experienced vendor saves more than money; it can also raise awareness within your organization. But this requires a close relationship with your technology provider. Your technology provider needs to understand your needs quickly and offer intelligent guidance with proven processes and advanced tools. Sestek offers end-to-end professional services, including strategy building, application design, deployment, testing, and optimization. Our team’s expertise rests on hands-on experience in speech tech, gained from 20 years of developing conversational solutions. This may be our most significant differentiator from our global competitors’ deploy-and-forget approach.

SR Accuracy Test

Sestek SR is the product of our continuous R&D efforts. We optimize our product with the latest technologies and methods in a way that increases recognition accuracy.

Recently, we developed a new model based on a neural network. To measure the success of this model, we tested the accuracy of our speech-to-text engine and compared it with Google’s and IBM’s SR engines.

For manual testing, we used two sets of random data from call center recordings and two sets of recordings of medical articles. For automated testing, we used three YouTube videos.

In the manual test, we listened to the recordings, labeled each automatically transcribed word or phrase as correct or wrong, and calculated the final word error rate for each data set. WER (word error rate) is a common metric for SR engines: it is the ratio of the total number of word errors (substitutions, deletions, and insertions) to the total number of words in the reference. The smaller the ratio, the more accurate the engine.
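The WER definition above can be sketched in a few lines of Python. This is only an illustration, not the scoring tool used in our tests: the function name `word_error_rate` is hypothetical, and the sketch assumes simple lowercasing and whitespace tokenization, whereas real evaluations usually apply more careful text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / reference word count."""
    # Assumption: lowercase and split on whitespace; no punctuation handling.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Word-level edit distance, computed row by row.
    # prev[j] is the distance between the reference words processed so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("the") over 6 words.
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because insertions count as errors, the ratio can exceed 1 when the hypothesis is much longer than the reference.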

The first table shows the results of the manual calculation, and the second shows the results calculated automatically against the reference text. Here are the results:

Manual Measurement

WER (Word Error Rate)    Google    IBM      Sestek Current    Sestek New
Agent Conversation       9.0%      11.9%    5.5%              4.0%
Customer Conversation    4.9%      6.5%     5.0%              4.2%
Medical Text 1           3.4%      4.0%     3.4%              2.0%
Medical Text 2           3.2%      3.0%     5.8%              4.3%

Automated Measurement

WER (Word Error Rate)    Google    IBM      Sestek Current    Sestek New
Videos from YouTube      18%       11.5%    9.3%              7.1%

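As seen above, our new model lowers the word error rate of our current engine by roughly 27% on average in relative terms, with the reduction ranging from 16% to 41% depending on the data set.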
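One way to read the improvement figure is as a relative reduction in WER between the "Sestek Current" and "Sestek New" columns. The post does not state how the figure was derived, so the sketch below is an assumption about the calculation. The WER values are copied from the two tables above; the names `results` and `relative_reduction` are illustrative.

```python
# WER values in percent, copied from the tables: (Sestek Current, Sestek New).
results = {
    "Agent Conversation": (5.5, 4.0),
    "Customer Conversation": (5.0, 4.2),
    "Medical Text 1": (3.4, 2.0),
    "Medical Text 2": (5.8, 4.3),
    "Videos from YouTube": (9.3, 7.1),
}


def relative_reduction(old_wer: float, new_wer: float) -> float:
    """Relative WER reduction in percent: how much of the old error is gone."""
    return (old_wer - new_wer) / old_wer * 100


reductions = {
    name: relative_reduction(old, new) for name, (old, new) in results.items()
}
# Assumption: an unweighted mean over the five data sets.
average = sum(reductions.values()) / len(reductions)

for name, value in reductions.items():
    print(f"{name}: {value:.1f}% relative WER reduction")
print(f"Average: {average:.1f}%")
```

An unweighted mean treats every data set equally regardless of its size. Weighting by the number of reference words would be another reasonable choice, but the word counts are not published here.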

With these numbers, we are not suggesting that we are definitely better or that the rest are definitely worse. The speech recognition process includes calculating and optimizing millions of parameters over a vast search space, and it is highly stochastic (what we engineers call a process that can be analyzed statistically but cannot be predicted precisely). A vendor’s SR engine can perform better than others on one recording, but the same engine can perform worse on another.

We are simply suggesting that our SR technology can easily compete with billion-dollar vendors such as Google and IBM.

Learn More

Speech recognition is among the leading technologies used in conversational automation. The performance of this technology plays a crucial role in the success of conversational customer services. By offering an easy-to-use and advanced conversational system, businesses can improve customer experience. That is why choosing the right speech recognition technology is an important decision. Sestek offers an effective solution not only with its advanced technical features and high accuracy rates but also with 20 years of know-how and distinctive professional services. Click here to test our Speech Recognition technology for the following languages: Turkish, English, Flemish, French, and Russian.

Author: Aylin Tan, Product Management Specialist, Sestek