It’s a common scenario: standing in the heart of a crowded cocktail party or walking down a busy urban street, you find yourself deep in conversation with whoever is next to you. The brain’s subconscious ability to tune out background noise during a conversation is just one of its many powers. It is this skill that Mitsubishi Electric has managed to emulate with "Maisart", its original artificial intelligence (AI) technology. Inspired by the workings of the human brain, this innovative tool separates speech from background noise – paving the way to a dizzying number of possibilities for the future of voice-activated devices.
Voice-activated devices are steadily making daily life more convenient. The ability to dictate and send text messages, control music, or even surf the web while driving or engaged in other activities saves time and effort. Current speech recognition technologies have a major drawback, however: suboptimal voice filtering. For example, if someone in a car wants to ask a smartphone for directions, everyone else in the car must remain quiet – a tall order for long trips with kids in the back seat. That’s because the microphones in typical voice-activated devices pick up everything as a single blend of sound, including irrelevant noises and voices in the background.
To address this problem, Mitsubishi Electric engineers created the world’s first technology that separates, then reconstructs, the speech of up to three people talking into a single microphone at the same time with a high degree of accuracy. In a demonstration with two speakers talking at the same time in quiet conditions, speech recognition accuracy was more than 90%, compared to 51% with conventional technologies. The technology has a clear competitive edge thanks to its accuracy and the fact that it requires only one microphone, unlike voice recognition devices on the market today, which require multiple microphones.
"Cocktail Party Effect"
When developing the technology, Mitsubishi Electric’s researchers took a cue from the human auditory system, which possesses an extraordinary ability to focus on a single conversation in a noisy throng and filter out other stimuli – a phenomenon known as the "cocktail party effect."
The speech-separation technology uses a method exclusive to Mitsubishi Electric called "Deep Clustering." This method first clusters voices together using deep learning, a common AI technique. Deep Clustering then separates mixed voices by identifying their unique qualities, including different gender and spoken language combinations. Finally, each person’s speech is reconstructed by resynthesizing the previously separated speech components.
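The three steps described above can be illustrated with a small Python sketch. It is a toy under stated assumptions, not Mitsubishi Electric’s implementation: the deep-learning step is skipped, and the `embeddings` are hand-written stand-ins for what a trained network would produce for each time-frequency cell of the recording. The function names `kmeans` and `separate`, the use of k-means, and the simple on/off masks are likewise illustrative choices. The final resynthesis into audio (converting each masked spectrogram back into a waveform) is left out.

```python
import math

def kmeans(points, k, iters=20):
    """Plain k-means; returns one cluster label per point."""
    # Deterministic farthest-point initialisation keeps the toy reproducible.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest center.
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        # Move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels

def separate(mixture, embeddings, num_speakers):
    """Split a magnitude spectrogram into one masked copy per speaker."""
    n_frames, n_bins = len(mixture), len(mixture[0])
    flat = [embeddings[t][f] for t in range(n_frames) for f in range(n_bins)]
    labels = kmeans(flat, num_speakers)
    outputs = []
    for s in range(num_speakers):
        # Binary mask: keep a cell only if it was clustered with speaker s.
        outputs.append([[mixture[t][f] if labels[t * n_bins + f] == s else 0.0
                         for f in range(n_bins)] for t in range(n_frames)])
    return outputs

# Toy mixture: 2 time frames x 4 frequency bins of magnitudes.
mixture = [[0.9, 0.1, 0.8, 0.2],
           [0.7, 0.3, 0.6, 0.4]]

# Hypothetical embeddings (assumed network output): cells dominated by one
# speaker sit near (1, 0), cells dominated by the other sit near (0, 1).
embeddings = [[(0.98, 0.05), (0.10, 0.95), (0.90, 0.10), (0.02, 1.00)],
              [(1.00, 0.00), (0.05, 0.90), (0.95, 0.08), (0.12, 0.97)]]

speaker_a, speaker_b = separate(mixture, embeddings, 2)
print(speaker_a)
print(speaker_b)
```

Each cell of the mixture ends up in exactly one speaker’s output, so adding the two outputs together recovers the original mixture.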
While its speech-separation technology has yet to be commercialized, Mitsubishi Electric believes its potential to make lives more convenient is enormous. It could make everyday voice-activated devices more precise – no more asking the kids to keep it down when telling a smart speaker to stream a movie. It could make transcribing business meetings a piece of cake, especially the heated ones with multiple people all talking at once. And it could help people with hearing difficulties stay on top of fast-moving conversations.
That’s something to look forward to – voice recognition technologies that make sense of people’s busy, and sometimes boisterous, lives.
The content is true and accurate as of the time of publication. Information related to products and services included in this article may differ by country or region.