Baidu Tech Blog

Tech blog for Baidu Research

Deep Speech 3: Even more end-to-end speech recognition

October 31st, 2017

Accurate speech recognition systems are vital to many businesses, whether they power a virtual assistant taking commands, analyze video reviews to understand user feedback, or improve customer service. However, today’s world-class speech recognition systems can only function with user data from third-party providers or by recruiting graduates from the world’s top speech and language technology [...]

Deep Voice 3: 2000-Speaker Neural Text-to-Speech

October 24th, 2017

Today, we are excited to announce Deep Voice 3, the latest milestone of Baidu Research’s Deep Voice project. Deep Voice 3 teaches machines to speak by imitating thousands of human voices from people across the globe. The Deep Voice project was started to revolutionize human-technology interactions by applying modern deep learning techniques to artificial speech [...]

Mixed Precision Training

October 11th, 2017

In this blog post, we introduce a new technique for training deep learning models, called “Mixed Precision Training”. In this joint work with NVIDIA, we train deep learning models using IEEE half-precision floating-point numbers. Most deep learning models today are trained using 32-bit single-precision floating-point numbers (FP32). Through this technique, [...]

Globally Normalized Reader

September 26th, 2017

Resources: Code, Paper, Type Swaps dataset, EMNLP 2017 Presentation (starting at 2:11:29). By Jonathan Raiman and John Miller. We present the Globally Normalized Reader, an approach to extractive question answering that matches the performance of previous methods at significantly lower computational complexity. Many popular models like Bidirectional Attention Flow (Seo et al., 2016) use expensive attention mechanisms, and others [...]

A Spatial-Temporal Modeling Framework for Large-scale Video Understanding

August 21st, 2017

By Xiao Liu and Shilei Wen. This blog post discusses a novel approach to video recognition and classification that won Baidu first place at the ActivityNet Challenge this year. Artificial intelligence technologies are no longer limited to recognizing still, individual images; they can now also identify various activities in videos. Developing an automatic system for activity [...]

Baidu Research Announces Next Generation Open Source Deep Learning Benchmark Tool

June 28th, 2017

Baidu Research today unveiled the next generation of DeepBench, the open source deep learning benchmark that now includes measurement for inference. The announcement was made at the O’Reilly AI Conference in New York. In September of 2016, Baidu released the initial version of DeepBench, which became the first tool to be opened up to the [...]

Learning to Speak via Interaction

June 7th, 2017

In early April, our team at Baidu Research successfully taught an AI agent to navigate a virtual maze using natural language commands issued by a virtual teacher. Today, we are excited to announce that our AI agent successfully learned to speak by interacting with a virtual teacher. Speaking, along with other abilities of human beings, [...]

Deep Voice 2: Multi-Speaker Neural Text-to-Speech

May 24th, 2017

In February, Baidu Silicon Valley AI Lab published Deep Voice 1, a system for generating synthetic human voices entirely with deep neural networks. Unlike alternative neural text-to-speech (TTS) systems, Deep Voice 1 runs in real time, synthesizing audio as fast as it needs to be played, making it usable for interactive applications like media and [...]

An AI agent with human-like language acquisition in a virtual environment

March 29th, 2017

Despite tremendous progress, artificial intelligence is still limited in many ways. For example, in computer games, if an AI agent is not pre-programmed with game rules, it must try millions of times before figuring out the right moves to win. Humans can accomplish the same feat in a much shorter time, because we are good [...]

Introducing SwiftScribe: A Breakthrough in AI-Powered Transcription Software

March 13th, 2017

Today we are proud to announce the beta launch of Baidu’s first AI-powered transcription software, SwiftScribe. We set out to develop SwiftScribe to fix a pain point: the time-consuming process of manually transcribing word by word. Now, through the integration of Baidu’s state-of-the-art speech recognition technology and easy editing tools, SwiftScribe will allow people [...]