

TOKYO, February 13, 2019 - Mitsubishi Electric Corporation (TOKYO: 6503) announced today that it has developed what the company believes to be the world's first technology capable of highly accurate multilingual speech recognition without being informed which language is being spoken. The novel technology, Seamless Speech Recognition, incorporates Mitsubishi Electric's proprietary Maisart®* compact AI technology and is built on a single system that can simultaneously identify and understand spoken languages. In tests separately involving 5 and 10 languages, all conducted in low-noise environments, the system achieved recognition with over 90 percent and 80 percent accuracy, respectively, without being informed which language was being spoken. The technology also can understand multiple people speaking either the same or different languages simultaneously.

  * Maisart: Mitsubishi Electric's AI creates the State-of-the-ART in technology

Seamless Speech Recognition technology

The Seamless Speech Recognition technology uses Mitsubishi Electric's proprietary deep-learning method for unprecedented flexibility and accuracy. Adopting an end-to-end deep-learning framework where a deep network is trained using only input and output samples, the technology builds a single system that simultaneously identifies and understands spoken languages without having to rely on expert knowledge such as phoneme systems and pronunciation lexicons. Simultaneous learning using multilingual speech data increases its robustness.
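The release does not say how a single network reports both the language and the transcript. One approach described in the end-to-end speech recognition literature is to share one character vocabulary across all languages and prepend a language tag to each target sequence, so that the network predicts the language as its first output symbol. The sketch below illustrates only that assumed target encoding; the tag format and the `lang_token`, `build_vocab`, and `encode` helpers are hypothetical and not taken from Mitsubishi Electric's system.

```python
from typing import Dict, List, Tuple


def lang_token(code: str) -> str:
    """Hypothetical tag format for a language, e.g. 'ja' -> '[JA]'."""
    return f"[{code.upper()}]"


def build_vocab(corpus: List[Tuple[str, str]]) -> Dict[str, int]:
    """Build one shared vocabulary: language tags plus the union of all characters.

    corpus is a list of (language_code, transcript) pairs.
    """
    langs = sorted({lang_token(code) for code, _ in corpus})
    chars = sorted({ch for _, text in corpus for ch in text})
    return {tok: i for i, tok in enumerate(langs + chars)}


def encode(code: str, text: str, vocab: Dict[str, int]) -> List[int]:
    """Encode a transcript as [language tag id] followed by its character ids."""
    return [vocab[lang_token(code)]] + [vocab[ch] for ch in text]


# Toy multilingual corpus (illustrative data only).
corpus = [("en", "hello"), ("ja", "こんにちは"), ("de", "hallo")]
vocab = build_vocab(corpus)
target = encode("ja", "こんにちは", vocab)
```

Under this assumption no phoneme inventory or pronunciation lexicon is needed: the training targets are built directly from the transcripts, and language identification becomes part of the same output sequence.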

The new system uses Mitsubishi Electric's proprietary Hybrid CTC/Attention Method for end-to-end speech recognition, which significantly improves the accuracy of the speech recognition process. The method combines two representative approaches to end-to-end speech recognition, connectionist temporal classification (CTC) and attention-based decoding, retaining their advantages while alleviating their drawbacks. In particular, the hybrid method benefits from CTC's ability to predict accurate alignments between input speech signals and output characters, and from the attention method's ability to capture interdependencies across time in the acoustic and language characteristics of speech.
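The release gives no formula for how the two methods are combined. A minimal sketch, assuming the weighted sum of log-probabilities that is commonly used to describe hybrid CTC/attention decoding: each candidate transcript receives a CTC score and an attention score, and the two are interpolated with a weight. The function names, the weight of 0.3, and the candidate scores are all illustrative assumptions.

```python
import math
from typing import List, Tuple

Hypothesis = Tuple[str, float, float]  # (text, CTC log-prob, attention log-prob)


def hybrid_score(ctc_logp: float, att_logp: float, ctc_weight: float = 0.3) -> float:
    """Interpolate CTC and attention log-probabilities (the weight is an assumed value)."""
    if not 0.0 <= ctc_weight <= 1.0:
        raise ValueError("ctc_weight must be in [0, 1]")
    return ctc_weight * ctc_logp + (1.0 - ctc_weight) * att_logp


def rescore(hypotheses: List[Hypothesis], ctc_weight: float = 0.3) -> List[Hypothesis]:
    """Return the hypotheses sorted best-first by hybrid score."""
    return sorted(
        hypotheses,
        key=lambda h: hybrid_score(h[1], h[2], ctc_weight),
        reverse=True,
    )


# Made-up scores: the attention decoder slightly prefers a misaligned
# transcript, while CTC strongly prefers the well-aligned one.
candidates = [
    ("hello world", math.log(0.60), math.log(0.30)),
    ("hollow word", math.log(0.10), math.log(0.40)),
]

best_hybrid = rescore(candidates)[0][0]
best_attention_only = rescore(candidates, ctc_weight=0.0)[0][0]
```

With these toy numbers the attention score alone ranks "hollow word" first, while the hybrid score ranks "hello world" first, which illustrates how CTC's alignment evidence can correct an attention-only decision.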

Speech Recognition Accuracy

Technology                  Works without spoken language being specified   5 languages   10 languages
New technology              Yes                                             >90%          >80%
Conventional technology**   No                                              87%           72%

Note: Assumes ideal recording conditions
** Combination of multiple systems built and trained separately for each language, with manual selection in advance of the spoken language

Speech recognition technology has made it possible to operate devices such as smartphones and car navigation systems by voice. But since conventional speech recognition systems are developed separately for each language, users have to select the language they want to speak beforehand. It is possible to run a language identification step before speech recognition, but this degrades usability because of the delay the identification step adds, and it increases recognition errors, both because the language is sometimes misidentified and because the per-language recognizers are sub-optimal when trained on insufficient monolingual data. The accuracy of conventional speech recognition systems also suffers greatly when multiple speakers overlap, which limits their applicability.

Mitsubishi Electric's Seamless Speech Recognition technology is expected to help realize speech interfaces that are highly suited to a wide variety of situations, such as a multilingual family using the same home appliance or international travelers querying an airport terminal's guidance system in their mother tongues. Going forward, Mitsubishi Electric will work to further improve the accuracy and applicability of automatic speech recognition in real environments, including cars, homes, public facilities and more.

