Baidu Tech Blog

At the NIPS conference today in Montreal, Baidu's Silicon Valley AI Lab (SVAIL) unveiled new results for Deep Speech. These include the ability to accurately recognize both English and Mandarin with a single learning algorithm.

The Deep Speech system, which was announced last year, initially focused on improving English speech recognition accuracy in noisy environments (for example, restaurants, cars and public transportation).

Over the past year, SVAIL researchers have improved Deep Speech’s performance in English and also trained it to transcribe Mandarin. The Mandarin version achieves high accuracy in many scenarios and is ready to be deployed on a large scale in real-world applications, such as web searches on mobile devices.

Andrew Ng, Chief Scientist at Baidu, said: “SVAIL has demonstrated that our end-to-end deep learning approach can be used to recognize very different languages. Key to our approach is our use of high-performance computing techniques, which have resulted in a 7x speedup compared with this time last year. Because of this efficiency, experiments that previously took weeks now run in days. This enables us to iterate more quickly.”

Commenting on Deep Speech’s high-performance computing architecture, Dr. Bill Dally, Chief Scientist at NVIDIA, added: “I am very impressed by the efficiency Deep Speech achieves by using batching to deploy DNNs for speech recognition on GPUs. Deep Speech also achieves remarkable throughput while training RNNs on clusters of 16 GPUs.”

Deep Speech has also improved rapidly across a range of English accents, including Indian-accented English and the accents of European countries where English is not the first language.

“I had a glimpse of Deep Speech’s potential when I previewed it in its infancy last year,” said Dr. Ian Lane, Assistant Research Professor of Engineering, Carnegie Mellon University. “Today, after a relatively short time, Deep Speech has made significant progress. Using a single end-to-end system, it handles not only English but also Mandarin, and is on its way to being released into production. I’m intrigued by Baidu’s Batch Dispatch process and its capacity to shape the way large deep neural networks are deployed on GPUs in the cloud.”

Relevant Links:

Deep Speech paper on arXiv.org

Deep Speech Press Release

A “Behind the Scenes at SVAIL” Video

December 9th, 2015