PaddleOCR, an Easy-to-Use and Open-Source OCR System, Rolls out Major Upgrade With Improved Accuracy and New Annotation Functions

2022-05-17




 

We are excited to announce that PaddleOCR, PaddlePaddle's multilingual optical character recognition (OCR) toolkit, has received a major upgrade. PaddleOCR is an easy-to-use, open-source OCR repository that provides ultra-lightweight OCR systems and recognition models for more than 80 languages.


The new features and improvements in this release are PP-OCRv3, PPOCRLabelv2, and the e-book "Dive Into OCR", each described in its own section below.


OCR has become a critical enabling technology by converting images of printed text into searchable digital files. It is widely used in application scenarios such as office automation (OA) systems, factory automation, online education, and map production. We developed and released PaddleOCR to make OCR more accessible in real-world applications. Main features include:

 

As of today, PaddleOCR has earned over 21K stars on GitHub and continues to grow. We aim to benefit our developer community by providing multilingual, awesome, leading, and practical OCR tools that help users train better models and apply them in practice.

 

Learn more about PaddleOCR on GitHub: https://github.com/PaddlePaddle/PaddleOCR.

 

PP-OCRv3

 

PP-OCR is an ultra-lightweight OCR system developed by the PaddleOCR team for industrial OCR applications, balancing accuracy and speed. PP-OCRv3 builds on PP-OCRv2 and applies nine optimization strategies to its text detection and recognition models.

 

With a comparable speed,


[Figure: Image 2.png]

 

Specifically, the detection network is still optimized based on DBNet, while the base model of the recognition network is changed from CRNN to SVTR, which has been accepted at IJCAI 2022. The block diagram of the PP-OCRv3 system is shown below (strategies in the pink box are newly introduced in PP-OCRv3).


[Figure: block diagram of the PP-OCRv3 system]


The optimization strategies for the text detection and recognition models in PP-OCRv3 are described below.

 

Text detection:

 

The PP-OCRv3 detection model upgrades the CML (Collaborative Mutual Learning) distillation strategy proposed in PP-OCRv2. As shown in the figure below, the main idea of CML is to combine 1) the traditional distillation strategy, in which a Teacher model guides the Student model, and 2) the DML (Deep Mutual Learning) strategy, which lets the student networks learn from each other.
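To make the two ingredients of CML concrete, here is a minimal sketch of how a mutual-learning term and a teacher-guided term could be combined. It is a toy on classification-style logits, not the PP-OCRv3 detection loss: the function names, the use of KL divergence for both terms, and the weights `w_dml` and `w_distill` are all assumptions made for illustration.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def dml_loss(logits_a, logits_b):
    """Mutual-learning term: symmetric KL between two students' predictions."""
    p, q = softmax(logits_a), softmax(logits_b)
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))


def distill_loss(student_logits, teacher_logits):
    """One-way term: the student is pulled toward the teacher's output."""
    return kl_divergence(softmax(teacher_logits), softmax(student_logits))


def cml_style_loss(student_a, student_b, teacher, w_dml=1.0, w_distill=1.0):
    """Hypothetical weighted sum of both terms; the weights are illustrative."""
    mutual = dml_loss(student_a, student_b)
    guided = distill_loss(student_a, teacher) + distill_loss(student_b, teacher)
    return w_dml * mutual + w_distill * guided
```

The mutual term is symmetric, so neither student is treated as the reference, while the guided term only flows from the teacher to each student. A real training setup would also include a supervised loss against the ground-truth labels.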

 

PP-OCRv3 further improves the teacher model and the student model separately. For the teacher model, a PAN module with a large receptive field, named LK-PAN, is proposed and the DML distillation strategy is adopted; for the student model, an FPN module with a residual attention mechanism, named RSE-FPN, is proposed.


[Figure: CML distillation framework of the PP-OCRv3 detection model]
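The idea of a residual attention mechanism can be sketched in a few lines. The example below is a simplified, per-channel gate in the spirit of squeeze-and-excitation with a shortcut connection; it is not the RSE-FPN implementation. The parameters `gate_weights` and `gate_biases` are hypothetical stand-ins for learned layers, and a real module would typically mix information across channels.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_means(feature_map):
    """Squeeze step: global average of every channel (each channel is H x W)."""
    means = []
    for channel in feature_map:
        values = [v for row in channel for v in row]
        means.append(sum(values) / len(values))
    return means


def residual_channel_attention(feature_map, gate_weights, gate_biases):
    """Scale each channel by a gate in (0, 1), then add the input back."""
    means = channel_means(feature_map)
    gates = [
        sigmoid(w * m + b)
        for w, m, b in zip(gate_weights, means, gate_biases)
    ]
    # Residual form: output = input + input * gate, so the original
    # features survive even when a gate is close to zero.
    return [
        [[v + v * g for v in row] for row in channel]
        for channel, g in zip(feature_map, gates)
    ]


# Two channels of size 2 x 2; zero weights and biases give a gate of 0.5.
features = [
    [[1.0, 1.0], [1.0, 1.0]],
    [[2.0, 0.0], [0.0, 2.0]],
]
attended = residual_channel_attention(features, [0.0, 0.0], [0.0, 0.0])
```

The shortcut is the point of the design: plain channel attention can suppress a channel entirely, whereas the residual form can only re-weight features on top of what was already there.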

 

Text recognition:

 

The recognition module of PP-OCRv3 is optimized based on the text recognition algorithm SVTR. SVTR drops the RNN and instead introduces a Transformer structure to mine the context information of the text-line image more effectively, thereby improving text recognition.
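To illustrate how a Transformer-style layer gathers context without an RNN, here is a minimal sketch of scaled dot-product self-attention over a sequence of feature vectors. It is not SVTR's actual architecture: the query, key, and value projections are replaced by the identity for brevity, and the function name `self_attention` is chosen only for this example.

```python
import math


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def self_attention(features):
    """Scaled dot-product self-attention with identity Q/K/V projections.

    `features` is a list of equal-length vectors, one per position in the
    text-line image. Returns (outputs, weights), where weights[i][j] is how
    much position i attends to position j.
    """
    dim = len(features[0])
    scale = math.sqrt(dim)
    weights = [
        softmax([dot(q, k) / scale for k in features])
        for q in features
    ]
    outputs = [
        [sum(w * v[d] for w, v in zip(row, features)) for d in range(dim)]
        for row in weights
    ]
    return outputs, weights


# Three positions with 2-dimensional features.
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(feats)
```

Every position sees every other position in a single step, whereas an RNN has to pass information along the sequence one position at a time.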

 

The recognition accuracy of SVTR_Tiny is 5.3% higher than that of the PP-OCRv2 recognition model, but its prediction is nearly 11 times slower: it takes nearly 100 ms to predict a text line on a CPU. Therefore, as shown in the figure below, PP-OCRv3 adopts the following six optimization strategies to accelerate the recognition model.


[Figure: optimization strategies of the PP-OCRv3 recognition model]

 

More explanations can be found at the link 

 

PPOCRLabelv2

 

PPOCRLabel is a semi-automatic graphical annotation tool for OCR, with a built-in PP-OCR model that automatically detects and re-recognizes text in images. The newly released PPOCRLabelv2 includes the following features:

 


1652690403(1).jpg
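Annotations produced by a labeling tool usually end up in a plain-text label file that training scripts read back. The sketch below parses one such line. The layout it assumes (an image path, a tab, then a JSON list of boxes with `transcription`, `points`, and `difficult` fields) is an assumption for illustration, as are the helper name `parse_label_line` and the sample path; check the files your own PPOCRLabel version exports before relying on it.

```python
import json


def parse_label_line(line):
    """Split one 'image_path<TAB>json_list' line into a path and its boxes."""
    image_path, payload = line.rstrip("\n").split("\t", 1)
    boxes = json.loads(payload)
    return image_path, [
        {
            "text": box["transcription"],
            "points": [tuple(pt) for pt in box["points"]],
            # Assumed optional flag; default to False when it is missing.
            "difficult": box.get("difficult", False),
        }
        for box in boxes
    ]


# Hypothetical sample line; the path and text are made up for the example.
sample = (
    'imgs/receipt_01.jpg\t[{"transcription": "TOTAL 12.50", '
    '"points": [[10, 20], [110, 20], [110, 48], [10, 48]], '
    '"difficult": false}]'
)
path, boxes = parse_label_line(sample)
```

Keeping the parser this small makes it easy to add checks, for example that every box has exactly four corner points, before the labels are fed into training.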

 

E-book: Dive Into OCR

 

"Dive Into OCR" is a textbook that combines OCR theory and practice, written by the PaddleOCR community. The main features are as follows:



1652690456(1).jpg

Overview of PaddleOCR Features

 

PaddleOCR supports a variety of cutting-edge OCR algorithms, builds the industry-oriented models and solutions PP-OCR and PP-Structure on top of them, and covers the whole workflow of data production, model training, compression, inference, and deployment.


[Figure: overview of PaddleOCR features]