Baidu Tech Blog

Tech blog for Baidu Research

A Spatial-Temporal Modeling Framework for Large-scale Video Understanding

August 21st, 2017

By Xiao Liu and Shilei Wen. This blog discusses a novel approach to video recognition and classification that won Baidu first place at the ActivityNet Challenge this year. Artificial intelligence technologies are no longer limited to recognizing still, individual images; they can now also identify various activities in videos. Developing an automatic system for activity [...]

Baidu Research Announces Next Generation Open Source Deep Learning Benchmark Tool

June 28th, 2017

Baidu Research today unveiled the next generation of DeepBench, the open source deep learning benchmark that now includes measurement for inference. The announcement was made at the O’Reilly AI Conference in New York. In September of 2016, Baidu released the initial version of DeepBench, which became the first tool to be opened up to the [...]

Learning to Speak via Interaction

June 7th, 2017

In early April, our team at Baidu Research successfully taught an AI agent to navigate a virtual maze using natural language commands issued by a virtual teacher. Today, we are excited to announce that our AI agent successfully learned to speak by interacting with a virtual teacher. Speaking, along with other abilities of human beings, [...]

Deep Voice 2: Multi-Speaker Neural Text-to-Speech

May 24th, 2017

In February, Baidu Silicon Valley AI Lab published Deep Voice 1, a system for generating synthetic human voices entirely with deep neural networks. Unlike alternative neural text-to-speech (TTS) systems, Deep Voice 1 runs in real time, synthesizing audio as fast as it needs to be played – making it usable for interactive applications like media and [...]

An AI agent with human-like language acquisition in a virtual environment

March 29th, 2017

Despite tremendous progress, artificial intelligence is still limited in many ways. For example, in computer games, if an AI agent is not pre-programmed with game rules, it must try millions of times before figuring out the right moves to win. Humans can accomplish the same feat in a much shorter time, because we are good [...]

Introducing SwiftScribe: A Breakthrough in AI-Powered Transcription Software

March 13th, 2017

Today we are proud to announce the beta launch of Baidu’s first AI-powered transcription software, SwiftScribe. We set out to develop SwiftScribe to fix a pain point – the time-consuming process of manually transcribing audio word by word. Now, through the integration of Baidu’s state-of-the-art speech recognition technology and easy editing tools, SwiftScribe will allow people [...]

Gram CTC: Speech Recognition with Word Piece Targets

March 2nd, 2017

Deep Speech presented an end-to-end neural architecture using the CTC loss for speech recognition in multiple languages. Today, we present Gram CTC, which extends the CTC loss function to automatically discover and predict word pieces instead of characters. Models using Gram CTC achieve state-of-the-art results on the Fisher-Swbd benchmark with a single model, demonstrating that end-to-end learning [...]

Deep Voice: Real-Time Neural Text-to-Speech for Production

February 28th, 2017

Baidu Research presents Deep Voice, a production-quality text-to-speech system constructed entirely from deep neural networks. The biggest obstacle to building such a system thus far has been the speed of audio synthesis – previous approaches have taken minutes or hours to generate only a few seconds of speech. We solve this challenge and show that [...]

Bringing HPC Techniques to Deep Learning

February 21st, 2017

Summary: Neural networks have grown in scale over the past several years, and training can require a massive amount of data and computational resources. To provide the required amount of compute power, we scale models to dozens of GPUs using a technique common in high-performance computing (HPC) but underused in deep learning. This technique, the [...]

PaddlePaddle and Kubernetes Join Forces, Helping Developers Efficiently Train Deep Learning Models

February 7th, 2017

The Kubernetes community announced today that PaddlePaddle, the open source deep learning framework originally developed by Baidu, is now compatible with Kubernetes, the cluster management system, making PaddlePaddle the only deep learning framework that officially supports Kubernetes to date. The compatibility will allow developers to conveniently train large models on all major global cloud service providers [...]