2022-05-17
We are excited to announce that PaddleOCR, PaddlePaddle's multilingual optical character recognition (OCR) toolkit, has received a major upgrade. PaddleOCR is an easy-to-use, open-source OCR repository that provides ultra-lightweight OCR systems and recognition models for more than 80 languages.
New features and improvements of PaddleOCR are listed below:
PP-OCRv3, with accuracy improved by 5% to 11% in English and multilingual scenarios;
PPOCRLabelv2 with new annotation functions for tables, irregular text images, and key information extraction tasks;
A new interactive e-book, "Dive into OCR."
OCR has become a critical enabling technology by converting printed images into searchable digital files. It is widely used in application scenarios such as office automation (OA) systems, factory automation, online education, and map production. We developed and released PaddleOCR to make OCR more accessible in real-world applications. Its main features include:
Ultra-lightweight OCR system: detection (3.6M) + direction classifier (1.4M) + recognition (12M) = 17.0M
Support recognition models for more than 80 languages, including English, Chinese, French, German, Arabic, Korean, and Japanese
Semi-automatic data annotation tool PPOCRLabel: supports rectangular box, irregular text, table, and key information annotation modes
Data synthesis tool, i.e., Style-Text: easy to synthesize a large number of images which are similar to the target scene image
Support PIP installation, easy to use
Support Linux, Windows, MacOS and other systems
Apache-2.0 license
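As a rough illustration of the pip-based workflow, the sketch below follows the Python quickstart pattern from the PaddleOCR 2.x README. The helper names `run_ocr` and `keep_confident`, the image path, and the 0.8 score threshold are hypothetical. The layout of the returned result has differed between releases, so the `[box, (text, score)]` shape assumed here should be checked against the installed version.

```python
# Assumed setup (not run here): pip install paddlepaddle paddleocr


def run_ocr(img_path, lang="en"):
    """Run detection, direction classification, and recognition on one image."""
    # Imported lazily so keep_confident() is usable without PaddleOCR installed.
    from paddleocr import PaddleOCR

    # Constructor and call follow the 2.x quickstart; models download on first use.
    ocr = PaddleOCR(use_angle_cls=True, lang=lang)
    return ocr.ocr(img_path, cls=True)


def keep_confident(lines, min_score=0.8):
    """Return (text, score) pairs whose recognition score is at least min_score.

    Each line is assumed to look like [box, (text, score)], where box holds
    the four [x, y] corner points of the detected text region.
    """
    kept = []
    for _box, (text, score) in lines:
        if score >= min_score:
            kept.append((text, float(score)))
    return kept
```

With PaddleOCR installed, `keep_confident(run_ocr("example.jpg"))` would list the confidently recognized text lines of a local image; `example.jpg` is a placeholder path, not a file shipped with the toolkit.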
As of today, PaddleOCR has earned over 21K stars on GitHub, and the number continues to grow. We aim to benefit our developer community by providing multilingual, awesome, leading, and practical OCR tools that help users train better models and put them into practice.
Learn more about PaddleOCR on GitHub: https://github.com/PaddlePaddle/PaddleOCR.
PP-OCRv3
PP-OCR is an ultra-lightweight OCR system developed by the PaddleOCR team. It targets industrial OCR applications and balances accuracy against speed. PP-OCRv3 builds on PP-OCRv2 and upgrades both the text detection and the text recognition models.
With a comparable speed,
The precision of English models is improved by 11%,
The precision of Chinese models is further improved by 5% compared with PP-OCRv2,
The average recognition accuracy of 80 multilingual models is improved by more than 5%.
Specifically, the detection network is still optimized based on DBNet, while the base model of the recognition network is changed from CRNN to SVTR, which has been accepted at IJCAI 2022. The block diagram of the PP-OCRv3 system is as follows (strategies in the pink box are newly introduced in PP-OCRv3).
There are nine optimization strategies for text detection and recognition models in PP-OCRv3, which are as follows.
Text detection:
LK-PAN: A PAN structure with a large receptive field;
DML: Deep Mutual Learning strategy for teacher model;
RSE-FPN: An FPN structure with residual attention mechanism;
The PP-OCRv3 detection model upgrades the CML (Collaborative Mutual Learning) distillation strategy proposed in PP-OCRv2. As shown in the figure below, CML combines 1) the traditional distillation strategy, in which the teacher model guides the student model, and 2) the DML strategy, which allows the student networks to learn from each other.
PP-OCRv3 further improves the teacher model and the student model separately. For the teacher model, a PAN module with a large receptive field, named LK-PAN, is proposed, and the DML distillation strategy is adopted. For the student model, an FPN module with a residual attention mechanism, named RSE-FPN, is proposed.
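To make the distillation idea more concrete, here is a minimal sketch of how a CML-style student loss could combine a task loss, a DML term between two students, and a one-way teacher-to-student term. It is a simplification, not PaddleOCR's implementation: it works on plain class logits rather than detection probability maps, and the function names and the weights `w_dml` and `w_distill` are hypothetical.

```python
import math


def softmax(logits):
    """Convert logits to a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def dml_loss(logits_a, logits_b):
    """Mutual learning term: symmetric KL between two networks' predictions."""
    p, q = softmax(logits_a), softmax(logits_b)
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))


def distill_loss(teacher_logits, student_logits):
    """One-way term: the student is pulled toward the teacher's prediction."""
    return kl_divergence(softmax(teacher_logits), softmax(student_logits))


def cml_student_loss(task_loss, student_logits, peer_logits, teacher_logits,
                     w_dml=1.0, w_distill=1.0):
    """Weighted sum of the three terms; the weights are illustrative assumptions."""
    return (task_loss
            + w_dml * dml_loss(student_logits, peer_logits)
            + w_distill * distill_loss(teacher_logits, student_logits))
```

The symmetric term is what lets the two students learn from each other, while the one-way term carries the guidance from the teacher.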
Text recognition:
SVTR_LCNet: A lightweight text recognition network;
GTC: Guided training of CTC by Attention;
TextConAug: A data augmentation strategy for mining textual context information;
TextRotNet: Self-supervised strategy for a better pretrained model;
UDML: Unified deep mutual learning strategy;
UIM: Unlabeled data mining strategy.
The recognition module of PP-OCRv3 is optimized based on the text recognition algorithm SVTR. SVTR drops the RNN and instead introduces a Transformer structure, which mines the context information of the text line image more effectively and thereby improves text recognition.
The recognition accuracy of SVTR_Tiny exceeds that of the PP-OCRv2 recognition model by 5.3%, but its prediction speed is nearly 11 times slower: predicting a single text line takes nearly 100 ms on a CPU. Therefore, as shown in the figure below, PP-OCRv3 adopts the six optimization strategies listed above to accelerate the recognition model.
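The GTC strategy in the list above trains a CTC head under the guidance of an attention branch. For readers unfamiliar with CTC, the sketch below shows what greedy CTC decoding of per-frame class probabilities might look like. It is a generic illustration rather than PaddleOCR's decoder. The function name, the toy character set, and the choice of index 0 for the blank class are assumptions.

```python
BLANK = 0  # assumed index of the CTC blank class


def ctc_greedy_decode(frame_probs, charset):
    """Decode per-frame class probabilities into a string.

    frame_probs: one probability list per time step, with class 0 as blank
                 and class i (i >= 1) mapping to charset[i - 1].
    Steps: take the most likely class per frame, merge consecutive repeats,
    then drop blanks.
    """
    decoded = []
    prev = BLANK
    for probs in frame_probs:
        best = max(range(len(probs)), key=probs.__getitem__)
        # Emit a character only when it is not blank and differs from the
        # previous frame's class; a blank between two equal characters
        # therefore keeps both of them.
        if best != BLANK and best != prev:
            decoded.append(charset[best - 1])
        prev = best
    return "".join(decoded)
```

Because decoding is a single pass over the frames with no recurrent state, this kind of head is cheap at inference time, which matters for a model that targets CPU deployment.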
More explanations can be found at the link
PPOCRLabelv2
PPOCRLabel is a semi-automatic graphical annotation tool for OCR, with a built-in PP-OCR model that automatically detects and re-recognizes text in images. The newly released PPOCRLabelv2 includes the following features:
New annotation modes for tables, irregular text images (such as seals and curved text), and key information extraction tasks;
New functions: box locking, batch processing, image rotation, dataset segmentation;
Improved user experience: support for box rotation, plus installation and launch through a WHL package.
E-book: Dive Into OCR
"Dive Into OCR" is a textbook that combines OCR theory and practice, written by the PaddleOCR community. The main features are as follows:
OCR full-stack technology covering text detection, recognition, and document analysis;
Close integration of theory and practice, bridging the gap to code implementation, with accompanying instructional videos;
Jupyter Notebook format, so code can be modified and rerun for instant results.
Overview of PaddleOCR Features
PaddleOCR supports a variety of cutting-edge OCR algorithms, builds the industry-oriented models and solutions PP-OCR and PP-Structure on top of them, and covers the whole pipeline of data production, model training, compression, inference, and deployment.