PP-OCRv3: the ultra-lightweight OCR system from a project with 20,000+ stars improves accuracy by 5% to 11%!
2022-05-15 07:42:34【Paddle paddle】
If you are an engineer working on OCR, you have almost certainly heard of the PaddleOCR project.
The project has accumulated more than 20,000 stars;
It frequently lands on GitHub Trending and at the top of the Papers with Code monthly list;
It was selected for 《Top Trending Libraries of 2021》, jointly chosen by Medium and Papers with Code, standing out from millions of projects to reach the top 10!
It was rated among the top 5 most active projects in the 《2021 China Open Source Annual Report》!
Figure (animated): PaddleOCR's influence
Figure: PP-OCRv3 results
The latest PaddleOCR release brings four major upgrades:
Release of the ultra-lightweight OCR system PP-OCRv3: accuracy in Chinese-English, English-only, and multilingual scenarios improves by a further 5% to 11%!
Release of the semi-automatic annotation tool PPOCRLabelv2: adds annotation for table images, key information extraction from images, and irregular text images.
Release of the OCR industrial deployment toolset: supports 22 combinations of training and deployment hardware/software environments and modes, covering 90% of enterprises' training and deployment environment needs.
Release of the industry's first interactive open-source OCR e-book, 《Do It Yourself OCR》: covers cutting-edge theory and code practice across the full OCR technology stack, with accompanying teaching videos.
The sections below explain each of these upgrades in turn.
PP-OCRv3 optimization strategies explained
PP-OCR is an ultra-lightweight OCR system developed by the PaddleOCR team. It targets industrial OCR applications and balances accuracy against speed. Recently, the PaddleOCR team upgraded the detection and recognition modules of PP-OCRv2 in 9 respects, creating a new ultra-lightweight OCR system with better results: PP-OCRv3.
At comparable speed, accuracy improves substantially across scenarios:
Chinese scenarios: the Chinese model improves by more than 5% over PP-OCRv2;
English and digit scenarios: the English and digit model improves by 11% over PP-OCRv2;
Multilingual scenarios: recognition is optimized for 80+ languages, with average accuracy improving by more than 5%.
The overall framework of the upgraded PP-OCRv3 is shown in the figure below (the pink boxes mark strategies new in PP-OCRv3). The detection module is still optimized on the basis of the DB algorithm; the recognition module no longer uses CRNN and instead adopts SVTR, a recent text recognition algorithm from IJCAI 2022 (paper title: SVTR: Scene Text Recognition with a Single Visual Model), adapted for industrial use.
The specific optimization strategies are as follows:
1. Detection module
LK-PAN: a PAN structure with a large receptive field
DML: a mutual learning strategy for the teacher model
RSE-FPN: an FPN structure with a residual attention mechanism
2. Recognition module
SVTR_LCNet: a lightweight text recognition network
GTC: a training strategy in which Attention guides CTC
TextConAug: a data augmentation strategy that mines text context information
TextRotNet: a self-supervised pre-trained model
UDML: a unified mutual learning strategy
UIM: an unlabeled data mining scheme
See the end of the article for a detailed explanation of each optimization strategy.
Major updates in PPOCRLabelv2
PPOCRLabel is the first open-source semi-automatic OCR data annotation tool, and it significantly reduces the time developers spend annotating OCR data. In 2021, the project received the Wave Summit 2021 Excellent Open Source Project Award and the Qizhi Community Excellent Project Award. After a year of updates and iteration, and drawing on real deployment needs in industry, PPOCRLabel is officially released as PPOCRLabelv2, with the following updates:
New annotation types: table annotation, key information annotation, and irregular text image annotation (seals, curved text, etc.)
New functions: box locking, image rotation, dataset splitting, batch processing, etc.
Ease of use: whl package installation added, and several aspects of the annotation experience improved
Animated demos (swipe sideways): table annotation, KIE annotation, irregular text image annotation, image rotation, batch processing, undo
OCR industrial deployment toolset
Real industrial applications face a wide variety of hardware and software environments and scenario requirements. Building on PaddlePaddle's complete, integrated training and inference capabilities, this upgrade releases the OCR industrial deployment toolset. It supports 22 combinations of training and deployment hardware/software environments and modes, including 3 training methods, 6 training environments, 3 model compression strategies, and 10 inference deployment methods, as shown in the following table:
Its distinctive capabilities are as follows:
1. Distributed training
PaddlePaddle's distributed training architecture offers distinctive technologies such as 4D hybrid parallelism and end-to-end adaptive distributed training. When training the PP-OCRv3 recognition model, the speedup on 4 machines reaches 3.52x with almost no loss of accuracy.
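As a quick reading of the speedup figure above, the sketch below converts it into a parallel scaling efficiency. It assumes the 3.52x speedup is measured against a single-machine baseline and that the "4" refers to four workers; the helper name is hypothetical.

```python
def scaling_efficiency(speedup: float, num_workers: int) -> float:
    """Return speedup divided by worker count (1.0 means perfect linear scaling)."""
    if num_workers <= 0:
        raise ValueError("num_workers must be positive")
    return speedup / num_workers


# Assumption: 3.52x is relative to one machine, measured on four machines.
efficiency = scaling_efficiency(3.52, 4)
print(f"scaling efficiency: {efficiency:.0%}")
```

Under those assumptions, each added machine contributes roughly 88% of what ideal linear scaling would give.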
2. Model compression
PaddleSlim, PaddlePaddle's model compression tool, is fully featured, covering model pruning, quantization, distillation, and NAS. After pruning and quantization, the PP-OCR model shrinks from 8.1M to 3.5M, and the average prediction time on mobile devices drops by 36%.
3. Service deployment
PaddleServing, PaddlePaddle's service deployment engine, provides high-performance, reliable model serving capabilities. For serving PP-OCR models, using the fully asynchronous Pipeline Serving improves throughput by more than 2x.
4. Mobile / edge deployment
Paddle Lite, PaddlePaddle's lightweight inference engine, has been adapted to 20+ AI accelerator chips, so OCR models can be deployed quickly and efficiently on mobile devices, embedded devices, and IoT devices.
5. PaddlePaddle on the cloud
This is a deployment toolbox for the PaddlePaddle framework and its model suites. It supports two approaches, standardized Docker deployment and Kubernetes cluster deployment, to meet the training and deployment needs of OCR models in different scenarios and environments.
The 《Do It Yourself OCR》 e-book
《Do It Yourself OCR》 is a teaching resource that combines cutting-edge OCR theory with code practice. It was created by the PaddleOCR team together with colleagues from industry, academia, and research, including Bai Xiang, professor at Huazhong University of Science and Technology and IAPR Fellow; Chen Zhizhi, a young researcher at Fudan University; Huang Wenhui, senior expert in the vision field at the China Mobile Research Institute; and researchers from the Big Data and Artificial Intelligence Laboratory of the Industrial and Commercial Bank of China, as well as OCR developers. Its main features are as follows:
Covers the full OCR technology stack, from text detection and recognition to document analysis
Closely combines theory and practice, bridging the gap to code implementation, with accompanying teaching videos
Interactive learning in notebooks: modify the code freely and see the results immediately
Join the PaddleOCR technical exchange group
Benefits of joining the group
1. Get links to PaddleOCR live classes that explain this upgrade in detail.
2. Get a 10G OCR learning gift pack compiled by the PaddleOCR team, including:
The 《Do It Yourself OCR》 e-book, with accompanying explanation videos and notebook projects
66 cutting-edge OCR papers from top conferences, packaged for download, including CVPR, AAAI, IJCAI, ICCV, etc.
Videos of PaddleOCR live classes from previous releases
Videos of outstanding developer projects shared by the OCR community
Preview of the gift pack contents
STEP 1: Scan the QR code with WeChat and fill in the questionnaire
STEP 2: Join the exchange group to receive the benefits
If you find the project useful, please visit GitHub and give it a star ~
Official website address ：https://www.paddlepaddle.org.cn
PaddleOCR project address:
Note: the test sample images are from the internet.
PP-OCRv3 optimization strategies in detail
1. Detection module optimization strategies
The PP-OCRv3 detection module upgrades the CML (Collaborative Mutual Learning) text detection distillation strategy from PP-OCRv2. As shown in the figure below, the core idea of CML combines standard distillation, in which a traditional Teacher guides a Student, with DML mutual learning between Student networks, so that the Student networks learn from each other while the Teacher network guides them. PP-OCRv3 further optimizes the teacher model and the student model separately. For the teacher model, it proposes LK-PAN, a PAN structure with a large receptive field, and introduces the DML (Deep Mutual Learning) distillation strategy; for the student model, it proposes RSE-FPN, an FPN structure with a residual attention mechanism. Ablation experiments are shown in the table below.
Test environment: Intel Gold 6148 CPU, with MKLDNN acceleration enabled during prediction.
(1) LK-PAN: a PAN structure with a large receptive field
LK-PAN (Large Kernel PAN) is a lightweight PAN structure with a larger receptive field. Its core change is to enlarge the convolution kernels in the path augmentation of the PAN structure from 3*3 to 9*9. Larger kernels widen the receptive field covered by each position of the feature map, making it easier to detect large-font text and text with extreme aspect ratios. With LK-PAN, the teacher model's hmean improves from 83.2% to 85.0%.
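To illustrate why a larger kernel widens the receptive field, the sketch below computes the receptive field of a stack of convolution layers and the number of weights per input/output channel pair. It is a generic back-of-the-envelope calculation, not the LK-PAN implementation, and the function names are hypothetical.

```python
def receptive_field(kernel_sizes, strides=None):
    """Receptive field, in input pixels along one axis, of stacked conv layers."""
    strides = strides or [1] * len(kernel_sizes)
    rf, jump = 1, 1
    for kernel, stride in zip(kernel_sizes, strides):
        rf += (kernel - 1) * jump  # each layer extends the field by (k - 1) * jump
        jump *= stride             # striding spreads later layers further apart
    return rf


def weights_per_channel_pair(kernel_size):
    """Number of weights in one k x k kernel slice (one input, one output channel)."""
    return kernel_size * kernel_size


for k in (3, 9):
    print(f"{k}x{k} conv: receptive field {receptive_field([k])}, "
          f"{weights_per_channel_pair(k)} weights per channel pair")

# Four stacked stride-1 3x3 layers are needed to match a single 9x9 layer.
print("four 3x3 layers:", receptive_field([3, 3, 3, 3]))
```

The trade-off is visible in the weight counts: a single 9*9 kernel reaches the wider field in one layer, at the cost of more weights per channel pair than a 3*3 kernel.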
(2) DML: a mutual learning strategy for the teacher model
DML is a mutual learning distillation method in which two models with the same structure learn from each other, which effectively improves the accuracy of the text detection model. With the DML strategy, the teacher model's hmean improves from 85% to 86%. Replacing the teacher model in PP-OCRv2's CML with this higher-accuracy teacher further improves the student model's hmean from 83.2% to 84.3%.
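A minimal sketch of the mutual learning idea, assuming (as in the original Deep Mutual Learning formulation) that each model is pulled toward the other's predicted distribution with a KL-divergence term. The detection maps are reduced here to tiny probability vectors and all names are hypothetical; the loss actually used in PaddleOCR may differ.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def mutual_learning_loss(logits_a, logits_b):
    """Symmetric KL term: each model is penalized for disagreeing with the other."""
    p_a, p_b = softmax(logits_a), softmax(logits_b)
    return 0.5 * (kl_divergence(p_a, p_b) + kl_divergence(p_b, p_a))


# Two same-structure models that disagree slightly on one sample.
print(mutual_learning_loss([2.0, 0.5, -1.0], [1.5, 0.7, -0.8]))
```

In training, a term like this would be added to each model's ordinary supervised loss, so the two models fit the labels while also converging toward each other.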
(3) RSE-FPN: an FPN structure with a residual attention mechanism
RSE-FPN (Residual Squeeze-and-Excitation FPN) introduces a residual structure and a channel attention structure: the convolution layers in the FPN are replaced with RSEConv layers, which are channel attention structures with a residual connection, further improving the representational power of the feature maps. Updating the FPN of the student model in PP-OCRv2's CML to RSE-FPN further improves the student model's hmean from 84.3% to 85.4%.
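The sketch below shows one way to read "channel attention with a residual structure": a squeeze-and-excitation gate reweights each channel, and the result is added back onto the input. It is a toy pure-Python version under several assumptions: the convolution that precedes the attention is omitted, a plain sigmoid is used for the gate, and the function names and weights are hypothetical rather than taken from PaddleOCR.

```python
import math


def squeeze_excite(feature, w_reduce, w_expand):
    """feature: list of channels, each a flat list of floats."""
    # Squeeze: global average pooling gives one value per channel.
    pooled = [sum(channel) / len(channel) for channel in feature]
    # Excitation: FC -> ReLU -> FC -> sigmoid gives one gate per channel.
    hidden = [max(0.0, sum(w * p for w, p in zip(row, pooled))) for row in w_reduce]
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w_expand]
    return [[gate * value for value in channel] for gate, channel in zip(gates, feature)]


def rse_block(feature, w_reduce, w_expand):
    """Residual SE: add the attention-reweighted features back onto the input."""
    attended = squeeze_excite(feature, w_reduce, w_expand)
    return [[v + a for v, a in zip(channel, att)]
            for channel, att in zip(feature, attended)]


# Two channels of two values each, squeezed through one hidden unit.
feature = [[1.0, 3.0], [2.0, 2.0]]
w_reduce = [[0.5, -0.25]]    # shape: hidden x channels
w_expand = [[1.0], [-1.0]]   # shape: channels x hidden
print(rse_block(feature, w_reduce, w_expand))
```

Because of the residual addition, the block can never zero out a channel entirely: the gate only decides how much extra weight each channel receives on top of the original features.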
2. Recognition module optimization strategies
The PP-OCRv3 recognition module is optimized on the basis of the text recognition algorithm SVTR. SVTR no longer uses an RNN structure; by introducing a Transformer structure, it mines the context information of text line images more effectively, improving text recognition ability. Directly replacing the PP-OCRv2 recognition model with SVTR_Tiny raises recognition accuracy from 74.8% to 80.1% (+5.3%), but prediction becomes nearly 11 times slower: predicting one text line on CPU takes nearly 100ms. Therefore, as shown in the figure below, PP-OCRv3 applies the following 6 optimization strategies to accelerate the recognition model. Ablation experiments are shown in the table below:
Note: for the speed tests, experiments 01-03 use an input image size of (3,32,320), and experiments 04-08 use (3,48,320). In actual prediction, images are variable-length inputs, so the speed will vary. Test environment: Intel Gold 6148 CPU, with MKLDNN acceleration enabled during prediction.
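The recognition accuracy figures quoted in the text of this section can be tallied as a chain of ablation steps. The sketch below only restates those numbers and recomputes the per-step gains; the list layout and labels are illustrative and are not the original table.

```python
# (label, recognition accuracy in %) as quoted in the text of this section.
steps = [
    ("SVTR_LCNet, input height 48", 73.98),
    ("+ GTC", 75.8),
    ("+ TextConAug", 76.3),
    ("+ TextRotNet", 76.9),
    ("+ UDML", 78.4),
    ("+ UIM", 79.4),
]
PP_OCRV2_ACC = 74.8   # PP-OCRv2 recognition model
SVTR_TINY_ACC = 80.1  # SVTR_Tiny dropped in directly (accurate but slow)


def step_gains(rows):
    """Accuracy gain of each row over the previous one, rounded to 2 decimals."""
    return [round(curr - prev, 2) for (_, prev), (_, curr) in zip(rows, rows[1:])]


gains = step_gains(steps)
for (label, acc), gain in zip(steps[1:], gains):
    print(f"{label:<14} {acc:6.2f}%  (+{gain:.2f})")

final_acc = steps[-1][1]
print(f"final vs PP-OCRv2:  +{round(final_acc - PP_OCRV2_ACC, 2)}")
print(f"final vs SVTR_Tiny: {round(final_acc - SVTR_TINY_ACC, 2)}")
```

Read this way, the six strategies recover most of the accuracy of SVTR_Tiny while starting from a network chosen for speed.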
(1) SVTR_LCNet: a lightweight text recognition network
SVTR_LCNet is a lightweight network for text recognition that fuses a Transformer network (SVTR) with the lightweight CNN network PP-LCNet. With this network, and with the normalized height of the input image raised from 32 to 48, recognition accuracy reaches 73.98% at comparable prediction speed, close to the PP-OCRv2 recognition model that uses a distillation strategy.
(2) GTC: a training strategy in which Attention guides CTC
GTC (Guided Training of CTC) uses an Attention module and its loss to guide the training of the CTC loss, so that the model learns a representation fusing multiple kinds of text features. It is an effective strategy for improving text recognition. With this strategy, recognition accuracy rises further to 75.8% (+1.82%).
(3) TextConAug: a data augmentation strategy that mines text context information
TextConAug is a data augmentation strategy that mines text context information. It enriches the context information in the training data and increases its diversity. With this strategy, recognition accuracy rises further to 76.3% (+0.5%).
(4) TextRotNet: a self-supervised pre-trained model
TextRotNet is a pre-trained model obtained by self-supervised training on a large amount of unlabeled text line data. It is used to initialize the weights of SVTR_LCNet, helping the text recognition model converge to a better solution. With this strategy, recognition accuracy rises further to 76.9% (+0.6%).
(5) UDML: a unified mutual learning strategy
UDML (Unified-Deep Mutual Learning) is the strategy used in PP-OCRv2 that proved very effective for improving text recognition models. In PP-OCRv3, for two different SVTR_LCNet and Attention structures, the PP-LCNet feature maps, the outputs of the SVTR module, and the outputs of the Attention module are all supervised between the two at the same time. With this strategy, recognition accuracy rises further to 78.4% (+1.5%).
(6) UIM: an unlabeled data mining scheme
UIM (Unlabeled Images Mining) is a very simple scheme for mining unlabeled data. The core idea is to run a high-accuracy text recognition model on unlabeled data to obtain pseudo-labels, then select the samples with high prediction confidence as training data for the small model. With this strategy, recognition accuracy rises further to 79.4% (+1%).
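The selection step of UIM can be sketched in a few lines. This is a minimal illustration, assuming the high-accuracy model returns a text string and a confidence score per image; the function name, the sample data, and the 0.95 threshold are hypothetical and not taken from PaddleOCR.

```python
def mine_pseudo_labels(predictions, min_confidence=0.95):
    """Keep (image_id, text) pairs whose prediction confidence is high enough.

    predictions: iterable of (image_id, predicted_text, confidence) tuples
    produced by a high-accuracy recognition model on unlabeled images.
    """
    selected = []
    for image_id, text, confidence in predictions:
        if confidence >= min_confidence:
            selected.append((image_id, text))
    return selected


# Hypothetical outputs of the large model on three unlabeled text line images.
teacher_predictions = [
    ("img_001.jpg", "PaddleOCR", 0.99),
    ("img_002.jpg", "0CR", 0.62),       # low confidence, likely wrong: dropped
    ("img_003.jpg", "PP-OCRv3", 0.97),
]
pseudo_labeled = mine_pseudo_labels(teacher_predictions)
print(pseudo_labeled)
```

The retained pairs would then be mixed into the training set of the small model; a stricter threshold trades fewer samples for cleaner pseudo-labels.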
After these 9 optimizations to text detection and text recognition, PP-OCRv3's end-to-end Hmean in Chinese scenarios improves by 5% over PP-OCRv2 at comparable speed, a substantial gain. The specific metrics are shown in the table below:
In English and digit scenarios, the English and digit model trained separately on the basis of PP-OCRv3 improves by 11% over the PP-OCRv2 English and digit model, as shown in the following table:
In multilingual scenarios, models trained on the basis of PP-OCRv3 improve recognition accuracy by more than 5% on average over PP-OCRv2 across the four language families that have evaluation sets, as shown in the following table. In addition, the PaddleOCR team has updated the recognition models for the 80+ supported languages on the basis of PP-OCRv3.