Baidu Tech Blog

Tech blog for Baidu Research

Neural Voice Cloning with a Few Samples

February 20th, 2018

At Baidu Research, we aim to revolutionize human-machine interfaces with the latest artificial intelligence techniques. Our Deep Voice project, started a year ago, focuses on teaching machines to generate speech from text that sounds more human-like. Beyond single-speaker speech synthesis, we demonstrated that a single system could learn to reproduce thousands of [...]

Baidu Research Showcased at Top Artificial Intelligence Conferences

January 31st, 2018

The AAAI (Association for the Advancement of Artificial Intelligence) is one of the world’s premier artificial intelligence conferences, with annual summits since 1979. This year, Baidu Research will be presenting four papers discussing research on how artificial intelligence is affecting a variety of fields, from language translation to job-skill popularity. AAAI 2018 will take place in [...]

Baidu Research Announces the Hiring of Three World-Renowned AI Scientists

January 18th, 2018

Today, we are excited to announce the hiring of three world-renowned artificial intelligence scientists, Dr. Kenneth Church, Dr. Jun Huan and Dr. Hui Xiong, and the establishment of two additional AI labs, the Business Intelligence Lab and the Robotics and Autonomous Driving Lab, as part of Baidu’s push to strengthen fundamental AI research and development. [...]

PaddlePaddle Fluid: Elastic Deep Learning on Kubernetes

December 12th, 2017

Two open source communities (PaddlePaddle, the deep learning framework that originated at Baidu, and Kubernetes®, the most famous containerized application scheduler) are announcing the Elastic Deep Learning (EDL) feature in PaddlePaddle’s new release, codenamed Fluid. Fluid EDL includes a Kubernetes controller, PaddlePaddle auto-scaler, which changes the number of processes of distributed jobs according to the idle hardware resources in the [...]

Deep Learning Scaling is Predictable, Empirically

December 7th, 2017

Our digital world and data are growing faster today than at any time in the past, even faster than our computing power. Deep learning helps us quickly make sense of immense data, and offers users the best AI-powered products and experiences. To continually improve user experience, our challenge, then, is to quickly improve our deep learning models for [...]

Deep Speech 3: Even more end-to-end speech recognition

October 31st, 2017

Accurate speech recognition systems are vital to many businesses, whether they power a virtual assistant that takes commands, analyze video reviews to understand user feedback, or improve customer service. However, today’s world-class speech recognition systems can only function with user data from third-party providers or by recruiting graduates from the world’s top speech and language technology [...]

Deep Voice 3: 2000-Speaker Neural Text-to-Speech

October 24th, 2017

Today, we are excited to announce Deep Voice 3, the latest milestone of Baidu Research’s Deep Voice project. Deep Voice 3 teaches machines to speak by imitating thousands of human voices from people across the globe. The Deep Voice project was started to revolutionize human-technology interactions by applying modern deep learning techniques to artificial speech [...]

Mixed Precision Training

October 11th, 2017

In this blog post, we introduce a new technique for training deep learning models, called “Mixed Precision Training”. In this joint work with NVIDIA, we train deep learning models using IEEE half-precision floating-point numbers. Most deep learning models today are trained using 32-bit single-precision floating-point numbers (FP32). Through this technique, [...]

Globally Normalized Reader

September 26th, 2017

By Jonathan Raiman and John Miller. Presented at EMNLP 2017. We present the Globally Normalized Reader, an approach to extractive question answering with the same performance and significantly lower computational complexity than previous methods. Many popular models like Bidirectional Attention Flow (Seo et al., 2016) use expensive attention mechanisms, and others [...]

A Spatial-Temporal Modeling Framework for Large-scale Video Understanding

August 21st, 2017

By Xiao Liu and Shilei Wen. This blog post discusses a novel approach to video recognition and classification that won Baidu first place at the ActivityNet Challenge this year. Artificial intelligence technologies are no longer limited to recognizing still, individual images; they can now also identify various activities in videos. Developing an automatic system for activity [...]