Baidu Research

PaddlePaddle Receives Full Upgrade to Facilitate Integrated Innovation and Lower Thresholds

2021-05-27

New upgrade encompasses PaddlePaddle v2.1, Large-Scale Graph Query Engine, 4 Pre-trained Models, PaddleFlow, and a 1.5B Initiative

We are excited to announce a major upgrade of Baidu’s deep learning platform PaddlePaddle, adding new features and enhanced capabilities including the PaddlePaddle open-source framework v2.1, a large-scale graph query engine, four pre-trained models, PaddleFlow, and an RMB 1.5 billion initiative.

The latest version of PaddlePaddle was unveiled at Wave Summit 2021, Baidu’s bi-annual deep learning developer conference jointly hosted with the National Engineering Laboratory for Deep Learning Technology and Applications, held on May 20, 2021.

Following its release in 2016, PaddlePaddle has grown into one of the most widely used deep learning platforms in the world. PaddlePaddle’s developer user base has grown 70% year over year to 3.2 million, and its enterprise user base has reached 120,000. Owing to these accomplishments, Baidu is home to the largest AI developer community among Chinese companies.

At Wave Summit 2021, Baidu CTO Haifeng Wang outlined the current top AI trends under two themes: integrated innovation and lower thresholds.

• Deep learning with knowledge graphs has significantly improved the performance and interpretability of models at comparable parameter counts. Meanwhile, multi-modal semantic understanding across language, speech, and vision is achievable through knowledge graphs and natural language semantics.

• Deep learning platforms are coordinating closely with various hardware and software to meet diverse requirements for computing power, power consumption, and latency.

• From an industrial point of view, AI is becoming deeply integrated with industry.

• With the permeation of AI across various industries, it is critical for platforms to keep lowering their thresholds to accelerate intelligent transformation.

PaddlePaddle v2.1

We have made usability and performance improvements to the new PaddlePaddle v2.1 framework, including:

• Optimization of Automatic Mixed Precision: Optimized the computational performance of operators (Ops) in mixed precision, tripling the training speed of models such as ResNet50 and BERT.

• Inplace Operation: Added inplace operation, including 12 inplace APIs, to reduce memory usage by 17% and cut the overhead of calling C++ from Python, yielding a 10% increase in training speed.

• High-level APIs: Added high-level APIs to support data pre-processing, GPU-based computation, mixed precision training, and model sharing.

• Custom operators: Provided a new custom operator solution that simplifies operator creation and deployment.
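
Mixed precision training generally keeps some values in 16-bit floating point, where very small gradients can round to zero, and loss scaling is the usual remedy. The following is a minimal, framework-free sketch of that idea using only Python's standard library. It is not PaddlePaddle's implementation; the helper names `to_fp16` and `fp16_gradient` and the chosen values are illustrative assumptions.

```python
import struct


def to_fp16(value: float) -> float:
    """Round a Python float to IEEE half precision and back."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


def fp16_gradient(grad: float, loss_scale: float = 1.0) -> float:
    """Store a scaled gradient in fp16, then unscale it in full precision."""
    stored = to_fp16(grad * loss_scale)  # what a half-precision buffer would hold
    return stored / loss_scale           # unscaling is done in float64 here


# Illustrative gradient, smaller than half of fp16's smallest subnormal (about 6e-8).
tiny_grad = 1e-8

unscaled = fp16_gradient(tiny_grad)                   # rounds to zero in fp16
scaled = fp16_gradient(tiny_grad, loss_scale=1024.0)  # survives the round trip

print(unscaled, scaled)
```

Without scaling, the gradient is lost entirely; with a scale of 1024 it is recovered to within about 1% of its true value.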

Large-scale graph query engine

The adoption of large-scale graph learning has surged in recent years due to its wide-ranging applications across knowledge graphs, search engines, and recommendation systems. One significant new capability of PaddlePaddle is the Large-Scale Graph Query Engine, which supports distributed graph data storage and query processing on a trillion-edge graph and scales linearly.

NetEase Cloud Music, a leading freemium music streaming service in China, has applied our technology to its “host recommendation system”. Using our graph query engine and distributed training, NetEase Cloud Music is able to train its 10-trillion-edge graph model, which further improves the effective play rate of recommended hosts.
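
To make the idea of distributed graph storage and query processing concrete, here is a toy sketch in which nodes are hash-partitioned across shards and a neighbor query touches only the shard that owns the node. It is a simplified illustration under our own assumptions, not the engine's actual design; `ShardedGraph` and its methods are hypothetical names, and all shards live in one process here.

```python
from collections import defaultdict
from typing import Dict, List


class ShardedGraph:
    """Toy adjacency store that hash-partitions source nodes across shards."""

    def __init__(self, num_shards: int) -> None:
        self.num_shards = num_shards
        # One adjacency map per shard; a real system would place each on its own server.
        self.shards: List[Dict[int, List[int]]] = [
            defaultdict(list) for _ in range(num_shards)
        ]

    def shard_of(self, node: int) -> int:
        """Pick the shard that owns a node (simple modulo hashing)."""
        return node % self.num_shards

    def add_edge(self, src: int, dst: int) -> None:
        """Store the edge on the shard that owns its source node."""
        self.shards[self.shard_of(src)][src].append(dst)

    def neighbors(self, node: int) -> List[int]:
        """Answer a one-hop query by consulting a single shard."""
        return list(self.shards[self.shard_of(node)].get(node, []))

    def two_hop(self, node: int) -> List[int]:
        """Answer a two-hop query by fanning out to the neighbors' shards."""
        reached = set()
        for mid in self.neighbors(node):
            reached.update(self.neighbors(mid))
        return sorted(reached)


graph = ShardedGraph(num_shards=4)
for src, dst in [(1, 2), (1, 3), (2, 4), (5, 1), (6, 2)]:
    graph.add_edge(src, dst)

print(graph.neighbors(1))
print(graph.two_hop(1))
```

Because each node's edges live on exactly one shard, adding shards spreads both storage and query load, which is the intuition behind scaling out as the graph grows.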

Four pre-trained NLP models

In 2019, Baidu researchers developed and open-sourced ERNIE, a continual pre-training framework which incrementally builds and learns pre-training tasks through constant multi-task learning. Since then, our researchers have developed multiple variants of ERNIE for language generation and multimodal understanding tasks.

PaddlePaddle has open-sourced four of Baidu’s home-grown pre-trained models:

• ERNIE-Gram is an explicit n-gram masking language model that enhances the integration of coarse-grained information into pre-training. ERNIE-Gram significantly outperformed other pre-trained models in five Chinese NLP tasks.

• ERNIE-Doc is a document-level pre-trained language model with a retrospective feed mechanism and an enhanced recurrence mechanism to capture the contextual information of a complete document. ERNIE-Doc achieved state-of-the-art results in 13 long document understanding tasks.

• ERNIE-ViL is a model that incorporates structured knowledge obtained from scene graphs to learn joint vision-language representations. ERNIE-ViL achieved top results on five cross-modal downstream tasks.

• ERNIE-UNIMO is a unified-modal pre-trained architecture which can effectively adapt to both single-modal and multi-modal understanding and generation tasks. ERNIE-UNIMO greatly improved performance on 13 single-modal and multi-modal downstream tasks.
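
One way to picture explicit n-gram masking is that a whole n-gram is hidden behind a single mask symbol and predicted as one unit, rather than token by token. The sketch below illustrates only that masking step on a toy sentence. It is a simplification under our own assumptions, not the ERNIE-Gram code; `mask_ngrams`, the span format, and the example sentence are hypothetical.

```python
from typing import List, Sequence, Tuple

MASK = "[MASK]"


def mask_ngrams(
    tokens: Sequence[str], spans: Sequence[Tuple[int, int]]
) -> Tuple[List[str], List[str]]:
    """Replace each [start, end) span with one MASK symbol.

    Returns the masked sequence and the n-gram targets, one per MASK.
    Spans are assumed to be sorted and non-overlapping.
    """
    masked: List[str] = []
    targets: List[str] = []
    cursor = 0
    for start, end in spans:
        masked.extend(tokens[cursor:start])          # copy tokens before the span
        masked.append(MASK)                          # one symbol for the whole n-gram
        targets.append(" ".join(tokens[start:end]))  # the n-gram to be predicted
        cursor = end
    masked.extend(tokens[cursor:])                   # copy the remaining tokens
    return masked, targets


tokens = "deep learning platforms lower the threshold".split()
masked, targets = mask_ngrams(tokens, spans=[(0, 2), (4, 6)])

print(masked)
print(targets)
```

Here the two-word units "deep learning" and "the threshold" each collapse to a single mask, so the prediction target is the coarse-grained n-gram rather than its individual tokens.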

Enhanced inference toolkits

PaddlePaddle has issued a series of updates to its inference toolkits to help developers deploy AI models more easily, including:

• PaddleSlim: Optimized pruning compression technology; added tools for unstructured sparsity; will support OFA (Once for All) to maintain the accuracy of the compressed model.

• Paddle Lite: Released LiteKit, a toolkit for mobile developers that significantly reduces the development costs of edge AI.

• Paddle Serving: Added a new asynchronous mode called “Pipeline” to better address the challenge of combining multiple models in business applications.

• Paddle.js: Will support multiple backend engines such as WebGL and WebGPU, as well as major image segmentation and classification models; added WebGL Pack to improve performance.
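
For readers unfamiliar with unstructured sparsity, the general technique zeroes individual weights, typically those with the smallest magnitudes, rather than removing whole channels or layers. The following is a minimal sketch of magnitude-based pruning in plain Python. It does not use or reproduce the PaddleSlim API; `prune_by_magnitude` and the sample weights are hypothetical.

```python
from typing import List


def prune_by_magnitude(weights: List[List[float]], sparsity: float) -> List[List[float]]:
    """Zero the smallest-magnitude fraction of individual weights.

    Ties at the threshold are all pruned, so the achieved sparsity
    can slightly exceed the requested value.
    """
    magnitudes = sorted(abs(w) for row in weights for w in row)
    num_pruned = int(len(magnitudes) * sparsity)
    if num_pruned == 0:
        return [row[:] for row in weights]  # nothing to prune, return a copy
    threshold = magnitudes[num_pruned - 1]  # largest magnitude that gets pruned
    return [[0.0 if abs(w) <= threshold else w for w in row] for row in weights]


weights = [[0.9, -0.05, 0.3], [0.01, -0.7, 0.2]]
pruned = prune_by_magnitude(weights, sparsity=0.5)

print(pruned)
```

Half of the six weights are set to zero while the matrix keeps its shape, which is what distinguishes unstructured sparsity from structured pruning.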

PaddlePaddle has also released a deployment tool chain and an inference deployment navigation map, which have verified more than 300 deployment paths.

Hardware

Thirty-one categories of processors from semiconductor manufacturers are already compatible with PaddlePaddle or are in the process of becoming compatible.

Other updates

Baidu’s biocomputing framework PaddleHelix 1.0 added ChemRL, a model which ranked first on a well-recognized benchmark leaderboard for molecular property predictions (HIV and PCBA).

Baidu's quantum machine learning toolkit Paddle Quantum achieved an average 20% speedup after synchronizing with PaddlePaddle v2.X, added noisy models for near-term quantum algorithm research, and provided advanced quantum feature extraction methods such as quantum kernel methods.

PaddlePaddle Enterprise: PaddleFlow

AI is leaving laboratories and driving industrial transformation across a variety of sectors. To make it easier for enterprises to apply deep learning to real-world problems, we debuted PaddlePaddle Enterprise last year at Wave Summit 2020.

PaddlePaddle Enterprise consists of EasyDL, a no-code toolkit for AI application developers without programming skills to build customized models with a form interface, as well as Baidu ML (BML), a full-featured AI development platform for AI algorithm developers that offers a one-stop solution for data processing, model training and evaluation, and service deployment.

This time, we are excited to announce the addition of PaddleFlow to our PaddlePaddle Enterprise portfolio as the core engine connecting to EasyDL and BML. PaddleFlow is a cloud-native machine learning system that provides the functionality developers need to build AI platforms, including resource management and scheduling, task execution, and service deployment, via developer-friendly interfaces such as a REST API, a command-line client, and an SDK. Simple deployment and integration are its major selling points.

EasyDL and BML have received major upgrades. EasyDL has been updated with 200+ features spanning data processing, training and evaluation, inference, and performance optimization. Meanwhile, BML has been supplemented with new development modes such as Notebook modeling, visual drag-and-drop modeling, preset model development, and pipeline modeling.

RMB 1.5 billion initiative

We are also excited to announce the new “Age of Discovery” initiative. Over the next three years, we are investing RMB 500 million worth of capital and resources to support 500 academic institutions, educate 5,000 AI tutors, and jointly foster 500,000 students with AI expertise. An additional RMB 1 billion will go toward the intelligent transformation of 100,000 enterprises and the training of millions of AI professionals. PaddlePaddle researchers and developers will collaborate with the open source community to build a deep learning open source ecosystem and break the boundaries of AI technology.