PaddleOCR, an Easy-to-Use and Open-Source OCR System, Rolls out Major Upgrade With Improved Accuracy and New Annotation Functions





We are excited to announce that PaddleOCR, PaddlePaddle's multilingual optical character recognition (OCR) toolkit, has received a major upgrade. PaddleOCR is an easy-to-use, open-source OCR repository that provides ultra-lightweight OCR systems and over 80 multilingual recognition models.

New features and improvements of PaddleOCR are listed below:

OCR has become a critical enabling technology, converting images of printed text into searchable digital files. It is widely used in scenarios such as office automation (OA) systems, factory automation, online education, and map production. We developed and released PaddleOCR to make OCR more accessible in real-world applications. Main features include:


As of today, PaddleOCR has racked up over 21K stars on GitHub and continues to grow. We aim to benefit our developer community by providing awesome multilingual OCR tools that are leading and practical, helping users train better models and put them into practice.


Learn more about PaddleOCR on GitHub:




PP-OCR is an ultra-lightweight OCR system developed by the PaddleOCR team for industrial OCR applications, balancing accuracy and speed. PP-OCRv3 builds on PP-OCRv2 with nine optimization strategies for the text detection and recognition models.


With a comparable speed,



Specifically, the detection network is still based on DBNet, while the base model of the recognition network is changed from CRNN to SVTR, a text recognition algorithm accepted at IJCAI 2022. The block diagram of the PP-OCRv3 system is as follows (strategies in pink boxes are newly introduced in PP-OCRv3).
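For readers unfamiliar with DBNet, its core idea is differentiable binarization. Instead of only thresholding the predicted text probability map P with a fixed value, the network also predicts a threshold map T. It then forms an approximate binary map B = 1 / (1 + exp(-k(P - T))), where k = 50 in the DBNet paper.

The plain-Python sketch below illustrates that formula on small nested lists. The function names and the 0.3 inference threshold are assumptions for illustration. They are not PaddleOCR's API or its configuration.

```python
import math


def differentiable_binarization(prob_map, thresh_map, k=50.0):
    """Approximate binary map as in DBNet: B = 1 / (1 + exp(-k * (P - T))).

    prob_map and thresh_map are 2-D lists of floats in [0, 1] with the same shape.
    k is the amplification factor (50 in the DBNet paper).
    """
    return [
        [1.0 / (1.0 + math.exp(-k * (p - t))) for p, t in zip(prow, trow)]
        for prow, trow in zip(prob_map, thresh_map)
    ]


def hard_binarize(prob_map, thresh=0.3):
    """Fixed-threshold binarization of the probability map (0.3 is an assumed value)."""
    return [[1 if p > thresh else 0 for p in row] for row in prob_map]


P = [[0.9, 0.2], [0.6, 0.1]]
T = [[0.3, 0.3], [0.3, 0.3]]
B = differentiable_binarization(P, T)  # values near 1 where P >> T, near 0 where P << T
print(hard_binarize(P))  # [[1, 0], [1, 0]]
```

Because the sigmoid is smooth, gradients can flow through the threshold map during training. The DBNet paper notes that inference can simply apply a fixed threshold to the probability map, which is what hard_binarize mimics.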


The nine optimization strategies for the text detection and recognition models in PP-OCRv3 are as follows.


Text detection:


The PP-OCRv3 detection model upgrades the CML (Collaborative Mutual Learning) distillation strategy proposed in PP-OCRv2. As shown in the figure below, CML combines two ideas: 1) traditional distillation, in which a teacher model guides the student models, and 2) DML (Deep Mutual Learning), in which the student networks learn from each other.
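To make the CML objective more concrete, here is a heavily simplified sketch of how its loss terms might combine. It works on per-pixel text probability maps flattened to 1-D lists and has three parts:

- a ground-truth loss for each student;
- a symmetric KL term between the two students (the DML part);
- a distillation term that pulls each student toward the teacher's output.

The term weights, the use of the teacher's raw probabilities as soft targets, and all function names are assumptions. PaddleOCR's actual CML loss uses DB-specific loss terms and its own weighting.

```python
import math

EPS = 1e-7  # clamp probabilities to avoid log(0)


def _clamp(p):
    return min(max(p, EPS), 1 - EPS)


def bce(pred, target):
    """Mean binary cross-entropy between predicted probabilities and (soft) targets."""
    total = 0.0
    for p, t in zip(pred, target):
        p = _clamp(p)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(pred)


def bernoulli_kl(p_list, q_list):
    """Mean KL(p || q), treating each pixel as a Bernoulli distribution."""
    total = 0.0
    for p, q in zip(p_list, q_list):
        p, q = _clamp(p), _clamp(q)
        total += p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))
    return total / len(p_list)


def cml_loss(student1, student2, teacher, gt, w_gt=1.0, w_dml=1.0, w_distill=1.0):
    """Hypothetical simplified CML objective: ground truth + mutual learning + distillation."""
    gt_loss = bce(student1, gt) + bce(student2, gt)
    # DML: the two students learn from each other (symmetric KL).
    dml_loss = 0.5 * (bernoulli_kl(student1, student2) + bernoulli_kl(student2, student1))
    # Traditional distillation: the teacher's output guides both students.
    distill_loss = bce(student1, teacher) + bce(student2, teacher)
    return w_gt * gt_loss + w_dml * dml_loss + w_distill * distill_loss
```

In a real training loop these terms would be computed with an autodiff framework such as PaddlePaddle so that gradients reach each student. Plain floats are used here only to show how the pieces fit together.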


PP-OCRv3 further optimizes the teacher and student models separately. For the teacher model, it introduces LK-PAN, a PAN module with a large receptive field, and adopts the DML distillation strategy. For the student model, it introduces RSE-FPN, an FPN module with a residual attention mechanism.
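One way to picture RSE-FPN's residual attention is as a squeeze-and-excitation (SE) channel-attention block with a shortcut connection. The sketch below assumes that interpretation. It pools each channel, computes per-channel weights through two small fully connected layers, rescales the channels, and adds the input back.

The weight matrices are hypothetical placeholders. The convolutions surrounding the block in RSE-FPN, as well as LK-PAN's large-kernel convolutions, are omitted.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def residual_se(feature, w_reduce, w_expand):
    """Residual squeeze-and-excitation over a C x H x W feature map (nested lists).

    w_reduce: (C // r) x C weights; w_expand: C x (C // r) weights (hypothetical).
    1. Squeeze: global average pool each channel to one scalar.
    2. Excite: FC + ReLU, then FC + sigmoid, giving one weight per channel.
    3. Rescale each channel and add the input back (residual shortcut).
    """
    pooled = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature]
    hidden = [max(0.0, sum(w * x for w, x in zip(w_row, pooled))) for w_row in w_reduce]
    weights = [sigmoid(sum(w * h for w, h in zip(w_row, hidden))) for w_row in w_expand]
    return [
        [[x * weights[c] + x for x in row] for row in ch]
        for c, ch in enumerate(feature)
    ]


feature = [[[1.0, 2.0]], [[3.0, 4.0]]]  # 2 channels, each 1 x 2
print(residual_se(feature, [[0.0, 0.0]], [[0.0], [0.0]]))  # [[[1.5, 3.0]], [[4.5, 6.0]]]
```

With all-zero weights, every channel's attention weight is sigmoid(0) = 0.5, so each value is scaled by 1.5. The shortcut ensures the original features still pass through even when the attention suppresses a channel.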



Text recognition:


The recognition module of PP-OCRv3 is optimized based on the SVTR text recognition algorithm. SVTR abandons the RNN and instead introduces a Transformer structure, which mines the contextual information of text-line images more effectively and improves text recognition ability.
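The key difference from an RNN is that self-attention lets every position in a text-line feature sequence attend to every other position in a single step. SVTR's global mixing blocks are built on this operation. The single-head sketch below shows only that core step. It leaves out SVTR's local mixing, the multi-head split, positional information, and learned weights, and the projection matrices are hypothetical inputs.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def matvec(mat, vec):
    return [sum(m * v for m, v in zip(row, vec)) for row in mat]


def self_attention(seq, wq, wk, wv):
    """Single-head scaled dot-product self-attention.

    seq: T feature vectors of size d, one per horizontal position of a text line.
    wq, wk, wv: d x d projection matrices (hypothetical, untrained).
    Each output mixes all T positions at once, whereas an RNN passes context step by step.
    """
    q = [matvec(wq, x) for x in seq]
    k = [matvec(wk, x) for x in seq]
    v = [matvec(wv, x) for x in seq]
    d = len(q[0])
    out = []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        attn = softmax(scores)
        out.append([sum(a * vj[c] for a, vj in zip(attn, v)) for c in range(d)])
    return out
```

As a sanity check, zero query and key projections make every attention score equal. Each output then becomes the average of the value vectors, which confirms that the softmax weights sum to one.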


The recognition accuracy of SVTR_Tiny is 5.3% higher than that of the PP-OCRv2 recognition model, but its prediction speed is nearly 11 times slower: it takes nearly 100 ms to predict a single text line on a CPU. Therefore, as shown in the figure below, PP-OCRv3 adopts the following six optimization strategies to accelerate the recognition model.



More details can be found at the link 




PPOCRLabel is a semi-automatic graphical annotation tool for OCR, with a built-in PP-OCR model that automatically detects and re-recognizes text in images. The newly released PPOCRLabelv2 includes the following features:
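PPOCRLabel saves its annotations to a text file. To our understanding, each line holds an image path, a tab, and a JSON list of boxes with "transcription" and "points" fields, plus flags such as "difficult". Assuming that format, a minimal loader could look like the sketch below. The function names are hypothetical and are not part of PPOCRLabel.

```python
import json


def parse_label_line(line):
    """Split one annotation line into (image_path, [(transcription, points), ...])."""
    image_path, annotation_json = line.rstrip("\n").split("\t", 1)
    boxes = json.loads(annotation_json)
    return image_path, [(b["transcription"], b["points"]) for b in boxes]


def load_labels(lines):
    """Map each image path to its list of (transcription, points), skipping blank lines."""
    labels = {}
    for line in lines:
        if line.strip():
            path, boxes = parse_label_line(line)
            labels[path] = boxes
    return labels


example = ('imgs/1.jpg\t[{"transcription": "PaddleOCR", '
           '"points": [[10, 10], [90, 10], [90, 30], [10, 30]], "difficult": false}]')
print(parse_label_line(example)[0])  # imgs/1.jpg
```

Each "points" entry is a list of four [x, y] corners, which also covers rotated or skewed text boxes.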




E-book: Dive Into OCR


"Dive Into OCR" is a textbook that combines OCR theory and practice, written by the PaddleOCR community. The main features are as follows:


Overview of PaddleOCR Features


PaddleOCR supports a variety of cutting-edge OCR algorithms and, building on them, develops the industry-oriented models and solutions PP-OCR and PP-Structure. It covers the whole workflow of data production, model training, compression, inference, and deployment.