Deep Voice 2: Multi-Speaker Neural Text-to-Speech

May 24th, 2017

In February, Baidu Silicon Valley AI Lab published Deep Voice 1, a system for generating synthetic human voices entirely with deep neural networks. Unlike alternative neural text-to-speech (TTS) systems, Deep Voice 1 runs in real time, synthesizing audio as fast as it needs to be played – making it usable for interactive applications like media and [...]

An AI agent with human-like language acquisition in a virtual environment

March 29th, 2017

Despite tremendous progress, artificial intelligence is still limited in many ways. For example, in computer games, if an AI agent is not pre-programmed with game rules, it must try millions of times before figuring out the right moves to win. Humans can accomplish the same feat in a much shorter time, because we are good [...]

Introducing SwiftScribe: A Breakthrough in AI-Powered Transcription Software

March 13th, 2017

Today we are proud to announce the beta launch of Baidu’s first AI-powered transcription software, SwiftScribe. We set out to develop SwiftScribe to fix a pain point – the time-consuming process of manually transcribing word by word. Now, through the integration of Baidu’s state-of-the-art speech recognition technology and easy editing tools, SwiftScribe will allow people [...]

Gram CTC: Speech Recognition with Word Piece Targets

March 2nd, 2017

Deep Speech presented an end-to-end neural architecture using the CTC loss for speech recognition in multiple languages. Today, we present Gram CTC, which extends the CTC loss function to automatically discover and predict word pieces instead of characters. Models using Gram CTC achieve state-of-the-art results on the Fisher-Swbd benchmark with a single model, demonstrating that end-to-end learning [...]

Deep Voice: Real-Time Neural Text-to-Speech for Production

February 28th, 2017

Baidu Research presents Deep Voice, a production-quality text-to-speech system constructed entirely from deep neural networks. The biggest obstacle to building such a system thus far has been the speed of audio synthesis – previous approaches have taken minutes or hours to generate only a few seconds of speech. We solve this challenge and show that [...]

Bringing HPC Techniques to Deep Learning

February 21st, 2017

Summary: Neural networks have grown in scale over the past several years, and training can require a massive amount of data and computational resources. To provide the required amount of compute power, we scale models to dozens of GPUs using a technique common in high-performance computing (HPC) but underused in deep learning. This technique, the [...]

PaddlePaddle and Kubernetes Join Forces, Helping Developers Efficiently Train Deep Learning Models

February 7th, 2017

The Kubernetes community announced today that PaddlePaddle, the open source deep learning framework originally developed by Baidu, is now compatible with Kubernetes, the cluster management system, making PaddlePaddle the only deep learning framework that officially supports Kubernetes to date. The compatibility will allow developers to conveniently train large models on all major global cloud service providers [...]

Baidu’s Melody: AI-Powered Conversational Bot for Doctors and Patients

October 18th, 2016

Baidu has launched Melody, an AI-powered conversational bot designed to provide relevant information to doctors to assist with recommendations and treatment options. Melody incorporates advanced deep learning and natural language processing (NLP) technologies developed by Baidu. Melody integrates with Baidu Doctor, an app that Baidu launched in China in 2015. Andrew Ng, chief scientist, Baidu, said: [...]

SVAIL Tech Notes: Optimizing RNNs with Differentiable Graphs

June 15th, 2016

This week we posted a new Tech Note in which Jesse Engel discusses a new technique for speeding up the training of deep recurrent neural networks. This is Part II of a multi-part series detailing some of the techniques we've used here at Baidu's Silicon Valley AI Lab (SVAIL) to accelerate the training of recurrent neural networks. While Part [...]

Adam Coates Speaks to TechEmergence about Future of Speech Recognition

May 6th, 2016

Adam Coates sat down recently with Daniel Faggella from TechEmergence at our Sunnyvale office for an interview about AI, speech recognition and natural language processing. During the interview, Coates, Director of Baidu Silicon Valley AI Lab, talked about Baidu's work in AI. He also shared his thoughts around consumer artificial intelligence applications in terms of its impact, [...]

