Voice: Still the Most Natural, the Most Comfortable and the Safest

As we discussed in our previous post (From Single-Use Bots to Intelligent One-for-All Bots), the scope and scale of bots are expanding day by day. Bots whose capabilities have grown and diversified through artificial intelligence and machine learning also make services more inclusive and accessible for people of different ages, cultural backgrounds, genders, races, and socioeconomic statuses, as well as for people with disabilities or temporary and permanent impairments.

There is no doubt that algorithms are among the main actors shaping our present and future experiences, and they affect our view of the past as well. An artificial intelligence application that deciphers damaged or missing Ancient Greek inscriptions, and does so more successfully than human experts (DeepMind AI beats humans at deciphering damaged ancient Greek tablets), is just one example among many. It was also an artificial intelligence application that suggested that "Samson and Delilah", attributed to Peter Paul Rubens and on display at London's National Gallery for 40 years, is likely a fake (Was famed Samson and Delilah really painted by Rubens? No, says AI).

As these examples show, algorithms are already more adept at distinguishing the fake from the real than the human beings who created them. On the other hand, bots equipped with superhuman abilities can pretend to be human, and this expands people's options for presenting themselves as something they are not. For example, machine learning and artificial intelligence technologies make it possible to clone a person's voice from recordings of their speech and to imitate it. In this way, people who have lost their voice, or who have passed away, can continue to speak in their own voice, albeit through a digital interface. Likewise, an actor can increase his income by cloning his voice and working on several projects at once (dubbing artist and actor Tim Heller explains that by cloning his voice he can animate cartoon characters, narrate books and documentaries, speak in video games, and voice movie trailers simultaneously), and a singer can have a cloned voice sing in languages he or she does not speak.

What if, behind the familiar and trusted voice on the phone, there is a hand you wouldn't want to shake?

Every new technique we develop lets us go beyond what we previously achieved, with far less effort and fewer resources. However, the impact of new developments is not limited to our good and harmless abilities. The criminal world also develops and diversifies its methods by exploiting new techniques (FBI says profits from cybercrime hit $3.5 billion in 2019). News of people being deceived or defrauded by strangers over the phone or by e-mail no longer surprises any of us. But deepfake technology, which can make a cloned image or voice move as desired and say whatever is wanted, raises the possibility of encountering a bot with the appearance and voice of someone we already know, and that is a frightening prospect.

When someone is not nearby, hearing their voice over the phone has long been considered the surest way to confirm that we are communicating with the person we know. The case of a Hong Kong bank manager shows otherwise. In early 2020, the manager received a call from a man whose voice he recognized: the executive of a company he had spoken with before. The caller had good news: his company was about to make an acquisition, so the bank needed to authorize transfers totaling $35 million. A lawyer named Martin Zelner had been hired to coordinate the procedures, and the bank manager could see e-mails from the executive and from Zelner in his inbox confirming which amounts were to be sent where. Believing everything looked legitimate, the bank manager began making the transfers. What he did not know was that he had been tricked as part of an elaborate scheme in which the scammers used deepfake technology to clone the executive's voice.

The first recorded case thought to involve voice cloning took place in March 2019. Scammers used AI-based software to imitate the voice of a CEO and request a fraudulent transfer of EUR 220,000 (related article from The Wall Street Journal), in what cybercrime experts described as an unusual case of AI being used in hacking.

It may be frightening that criminal actors build ever more sophisticated and complicated schemes on top of new technologies, but refraining from using these technologies is neither a healthy nor a logical reaction. We would very likely still be living in caves if our ancestors had focused only on how unsafe life outside was, rather than developing appropriate protective methods and tools. At the same time, we cannot protect personal computers against attacks by examining files one by one, and we cannot protect ourselves against increasingly sophisticated fraud simply by being more vigilant or by relying on outdated security methods. For instance, had the examples above involved multi-factor authentication, or a voice biometrics solution with passive authentication and effective fraud prevention capabilities, one that operates independently of language and accent and can tell whether a voice has been digitally reproduced, the attacks would very likely have been stopped before they produced undesirable results, as the sketch below suggests.
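
To make the idea concrete, here is a minimal sketch of how a call-handling system might layer independent factors before approving a high-risk transfer. All names, scores, and thresholds here are illustrative assumptions, not any real product's interface:

    from dataclasses import dataclass

    @dataclass
    class CallChecks:
        voiceprint_score: float  # similarity to the enrolled voiceprint (0..1)
        playback_score: float    # likelihood the audio is a replayed recording (0..1)
        synthetic_score: float   # likelihood the audio is machine-generated (0..1)
        otp_confirmed: bool      # out-of-band one-time-passcode confirmation

    def approve_transfer(c: CallChecks) -> bool:
        """Approve only when every factor passes; any single failure blocks."""
        if c.voiceprint_score < 0.90:   # the voice must closely match enrollment
            return False
        if c.playback_score > 0.20:     # reject suspected replayed audio
            return False
        if c.synthetic_score > 0.20:    # reject suspected cloned or synthetic audio
            return False
        return c.otp_confirmed          # even a matching voice needs a second factor

    # A convincing clone may match the voiceprint yet fail the synthetic check:
    print(approve_transfer(CallChecks(0.97, 0.05, 0.85, True)))  # False

The point of the layering is that a cloned voice defeats only the similarity check; the liveness checks and the out-of-band factor must each be beaten independently.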

In fact, a voice, like a fingerprint or an iris, is unique to each person, which paves the way for voice to be used as a powerful authentication tool. Unlike PINs, passwords, and answers to challenge questions, voice biometrics cannot be compromised without the knowledge of the voice's owner. This is one of the factors that makes voice verification much more secure. However, this kind of analysis cannot be performed by listening manually. The conversational biometrics solutions we developed at Sestek analyze the voice against more than 100 parameters. These solutions include playback manipulation detection, meaning the system determines whether the party on the other end of the phone is actually speaking or is playing a voice sample; when a recorded sample is played, the synthetic voice detection feature flags and reports the situation. The system can also recognize known fraudsters through its biometric blacklist detection feature, and a voice change detection algorithm determines whether the speaker's voice changes during a conversation (a sketch of this screening order follows below). ING Turkey, one of our customers using this technology, shortened the average call time by 19 seconds for calls requiring identity verification; this saving reduced operational costs and increased both customer and representative satisfaction.
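
As a hypothetical illustration of that screening order, the sketch below runs the fraud checks before the identity check. Every function is a placeholder standing in for one of the detection capabilities named above; none of the names or signatures reflect an actual API:

    def detect_playback(audio: bytes) -> bool:
        """Placeholder: flags a replayed recording (playback manipulation)."""
        return False

    def detect_synthetic_voice(audio: bytes) -> bool:
        """Placeholder: flags machine-generated (cloned) speech."""
        return False

    def matches_blacklist(audio: bytes) -> bool:
        """Placeholder: compares the caller against known-fraudster voiceprints."""
        return False

    def voice_changed_mid_call(audio: bytes) -> bool:
        """Placeholder: detects a different speaker taking over the call."""
        return False

    def matches_voiceprint(audio: bytes, enrolled: bytes) -> bool:
        """Placeholder: scores the caller against the enrolled voiceprint."""
        return True

    def screen_call(audio: bytes, enrolled: bytes) -> str:
        """Reject on any fraud signal first; only then verify the claimed identity."""
        if detect_playback(audio):
            return "rejected: replayed recording"
        if detect_synthetic_voice(audio):
            return "rejected: synthetic voice"
        if matches_blacklist(audio):
            return "rejected: known fraudster"
        if voice_changed_mid_call(audio):
            return "rejected: speaker changed mid-call"
        if not matches_voiceprint(audio, enrolled):
            return "rejected: voiceprint mismatch"
        return "verified"

    print(screen_call(b"caller-audio", b"enrolled-voiceprint"))  # verified

Running the fraud checks before the voiceprint comparison matters: a well-made clone could pass the similarity check, so playback and synthetic-voice detection act as the gate that a recording or a clone cannot slip through.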

For details on our conversational biometrics solutions, please visit https://www.sestek.com/conversational-biometrics/