
20,000+ Star Ultra-Lightweight OCR System PP-OCRv3: Accuracy Improved by 5%–11%!

2022-05-15 07:42:34 · PaddlePaddle


If you work on OCR, you have almost certainly heard of the PaddleOCR project:

  • The project has passed 20,000 stars;

  • It frequently tops GitHub Trending and the monthly Papers with Code rankings;

  • It was selected by Medium and Papers with Code for 《Top Trending Libraries of 2021》, standing out from millions of projects to rank in the Top 10!

  • It was rated one of the Top 5 most active projects in the 《2021 China Open Source Annual Report》!

[Animation: PaddleOCR's influence]

[Figure: PP-OCRv3 results on sample images [1]]

This latest PaddleOCR release brings four major upgrades:

  • Release of the ultra-lightweight OCR system PP-OCRv3: accuracy in Chinese-English, English-only, and multilingual scenarios improves by a further 5%–11%!

  • Release of the semi-automatic annotation tool PPOCRLabelv2: adds annotation of table images, key information extraction images, and irregular text images.

  • Release of the OCR industrial deployment toolkit: supports 22 training and deployment software/hardware environments and modes, covering 90% of enterprises' training and deployment requirements.

  • Release of the industry's first interactive open-source OCR e-book, 《Dive Into OCR》: covers cutting-edge theory and code practice across the full OCR technology stack, with accompanying teaching videos.

// Latest PaddleOCR release //


https://github.com/PaddlePaddle/PaddleOCR

Below, we walk through each of these upgrades in turn.

PP-OCRv3 optimization strategies explained

PP-OCR is an ultra-lightweight OCR system developed in-house by the PaddleOCR team for industrial OCR applications, balancing accuracy against speed. Recently, the PaddleOCR team upgraded the detection and recognition modules of PP-OCRv2 in 9 respects, creating a new ultra-lightweight OCR system with better accuracy: PP-OCRv3.

In terms of accuracy, at comparable speed, every scenario improves substantially:

  • Chinese scenarios: compared with PP-OCRv2, the Chinese model improves by more than 5%;

  • English and digit scenarios: compared with PP-OCRv2, the English-and-digits model improves by 11%;

  • Multilingual scenarios: recognition is optimized for 80+ languages, with average accuracy improving by more than 5%.


The overall framework of the upgraded PP-OCRv3 is shown in the figure below (the new PP-OCRv3 strategies are in the pink boxes). The detection module is still optimized on top of the DB algorithm; the recognition module no longer uses CRNN and is updated to SVTR, a recent text recognition algorithm accepted at IJCAI 2022 (paper: SVTR: Scene Text Recognition with a Single Visual Model), with adaptations for industrial use.


The specific optimization strategies are:

1. Detection module

  • LK-PAN: PAN structure with a large receptive field

  • DML: deep mutual learning strategy for the teacher model

  • RSE-FPN: FPN structure with a residual attention mechanism

2. Recognition module

  • SVTR_LCNet: Lightweight text recognition network

  • GTC: attention-guided CTC training strategy

  • TextConAug: data augmentation strategy that mines text context information

  • TextRotNet: self-supervised pre-trained model

  • UDML: unified deep mutual learning strategy

  • UIM: Unlabeled data mining scheme

A detailed explanation of each strategy is given at the end of the article.

PPOCRLabelv2: several major updates

PPOCRLabel is the first open-source semi-automatic OCR data annotation tool, and it significantly reduces the time developers spend labeling OCR data. In 2021, the project won the Wave Summit 2021 Excellent Open Source Project Award and the Qizhi Community Excellent Project Award. After a year of updates and iteration driven by real industrial deployment needs, PPOCRLabelv2 is now officially released with the following updates:

  • New annotation types: table annotation, key information annotation, and annotation of irregular text images (seals, curved text, etc.)

  • New features: box locking, image rotation, dataset splitting, batch processing, and more

  • Ease of use: new whl package installation, plus improvements to several parts of the annotation experience
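Dataset splitting, one of the new features listed above, divides annotated images into training, validation, and test sets. The sketch below assumes PaddleOCR's detection label layout (image path, a tab, then a JSON list of boxes with `transcription` and `points`) and a 6:2:2 ratio; the function names are hypothetical and this is not PPOCRLabel's own code.

```python
import json
import random


def parse_label_line(line):
    """Split one label line into (image_path, list_of_box_dicts).

    Assumed layout: <image_path><TAB><JSON list of {"transcription", "points"}>.
    """
    path, annotation = line.rstrip("\n").split("\t", 1)
    return path, json.loads(annotation)


def split_dataset(lines, ratios=(6, 2, 2), seed=0):
    """Shuffle label lines reproducibly and split them into train/val/test."""
    lines = list(lines)
    random.Random(seed).shuffle(lines)
    total = sum(ratios)
    n_train = len(lines) * ratios[0] // total
    n_val = len(lines) * ratios[1] // total
    return (lines[:n_train],
            lines[n_train:n_train + n_val],
            lines[n_train + n_val:])  # remainder goes to test


if __name__ == "__main__":
    sample = [
        f'imgs/{i}.jpg\t[{{"transcription": "TEXT{i}", '
        f'"points": [[0, 0], [10, 0], [10, 5], [0, 5]]}}]'
        for i in range(10)
    ]
    train, val, test = split_dataset(sample)
    print(len(train), len(val), len(test))  # 6 2 2
    print(parse_label_line(train[0])[0])
```

Shuffling with a fixed seed keeps the split reproducible, so re-running the script does not silently move images between sets.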

[Animations: table annotation, KIE annotation, irregular text image annotation, image rotation, batch processing, and undo]

OCR industrial deployment toolkit

Real industrial applications face a wide range of software/hardware environments and scenario requirements. Building on PaddlePaddle's complete training-to-inference capabilities, this upgrade releases the OCR industrial deployment toolkit. It supports 22 training and deployment software/hardware environments and modes: 3 training methods, 6 training environments, 3 model compression strategies, and 10 inference deployment methods, as shown in the table below:


Its key capabilities are:

1. Distributed training

PaddlePaddle's distributed training architecture offers distinctive technologies such as 4D hybrid parallelism and end-to-end adaptive distributed training. In PP-OCRv3 recognition model training, 4-machine training reaches a 3.52× speedup with almost no loss of accuracy.
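The 3.52× figure can also be read as a scaling efficiency, that is, the fraction of ideal linear speedup that was achieved. A small worked example (the 100-hour single-machine job is hypothetical):

```python
def scaling_efficiency(speedup, num_machines):
    """Fraction of ideal linear speedup reached by distributed training."""
    return speedup / num_machines


def distributed_hours(single_machine_hours, speedup):
    """Wall-clock time after distributing a job with the given speedup."""
    return single_machine_hours / speedup


print(f"{scaling_efficiency(3.52, 4):.0%}")    # 88%
print(f"{distributed_hours(100, 3.52):.1f} h")  # 28.4 h for a hypothetical 100 h job
```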

2. Model compression

PaddlePaddle's model compression tool PaddleSlim is fully featured, covering pruning, quantization, distillation, and NAS. After pruning and quantization, the PP-OCR model shrinks from 8.1 MB to 3.5 MB, and average prediction time on mobile devices drops by 36%.
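Quantization is one of the compression steps named above. To show roughly why it shrinks a model, here is a minimal symmetric per-tensor int8 scheme in NumPy. It is a generic textbook version, not PaddleSlim's algorithm, and the random weights are placeholders.

```python
import numpy as np


def quantize_int8(w):
    """Symmetric per-tensor int8 quantization: w is approximated by scale * q."""
    max_abs = float(np.max(np.abs(w)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale


def dequantize(q, scale):
    """Map int8 codes back to approximate float32 weights."""
    return q.astype(np.float32) * np.float32(scale)


rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.1, size=(64, 64)).astype(np.float32)  # placeholder weights
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
print(w.nbytes, q.nbytes)                                    # 16384 4096
print(float(np.max(np.abs(w - w_hat))) <= scale / 2 + 1e-6)  # True
```

Storing int8 codes instead of float32 cuts weight storage by 4×, at the cost of a rounding error of at most half a quantization step per weight.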

3. Service deployment

PaddlePaddle's service deployment engine, Paddle Serving, provides high performance, reliable functionality, and solid serving capabilities. For serving PP-OCR models, it uses fully asynchronous Pipeline Serving, which can more than double throughput.
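The benefit of an asynchronous pipeline is that stages run concurrently: detection of the next image overlaps with recognition of the current one instead of waiting for it. The asyncio sketch below only illustrates that overlap. `detect` and `recognize` are hypothetical stand-ins that sleep, and nothing here reflects Paddle Serving's actual API.

```python
import asyncio


async def detect(image_id):
    await asyncio.sleep(0.01)  # hypothetical stand-in for text detection
    return [f"{image_id}-box{i}" for i in range(2)]


async def recognize(box):
    await asyncio.sleep(0.01)  # hypothetical stand-in for text recognition
    return f"text({box})"


async def det_stage(images, queue):
    for image_id in images:
        boxes = await detect(image_id)
        await queue.put((image_id, boxes))
    await queue.put(None)  # sentinel: no more images


async def rec_stage(queue, results):
    while (item := await queue.get()) is not None:
        image_id, boxes = item
        # recognize all boxes of one image concurrently
        results[image_id] = await asyncio.gather(*(recognize(b) for b in boxes))


async def run_pipeline(images):
    queue = asyncio.Queue(maxsize=4)
    results = {}
    # Both stages run at the same time, connected by a bounded queue.
    await asyncio.gather(det_stage(images, queue), rec_stage(queue, results))
    return results


if __name__ == "__main__":
    out = asyncio.run(run_pipeline(["img0", "img1", "img2"]))
    print(out["img1"])  # ['text(img1-box0)', 'text(img1-box1)']
```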

4. Mobile / Edge deployment

PaddlePaddle's lightweight inference engine Paddle Lite supports 20+ AI accelerator chips, enabling fast and efficient deployment of OCR models on mobile, embedded, and IoT devices.

5. Cloud deployment

A deployment toolbox for the PaddlePaddle framework and its model suites supports two approaches, standardized Docker deployment and Kubernetes cluster deployment, meeting OCR model training and deployment requirements across different scenarios and environments.

《Dive Into OCR》 e-book

《Dive Into OCR》 was created by the PaddleOCR team together with Professor Bai Xiang of Huazhong University of Science and Technology (IAPR Fellow), Chen Zhizhi (young researcher at Fudan University), Huang Wenhui (senior vision expert at China Mobile Research Institute), researchers from the Big Data and Artificial Intelligence Laboratory of the Industrial and Commercial Bank of China, other colleagues from industry, academia, and research, and OCR developers. It is a teaching resource that combines cutting-edge OCR theory with code practice. Its main features are:

  • Covers the full OCR technology stack, from text detection and recognition to document analysis

  • Tightly combines theory and practice, bridging the gap to code implementation, with accompanying teaching videos

  • Interactive learning with Notebooks: modify the code freely and see results immediately


Join PaddleOCR

Technical exchange group

Benefits of joining the group

1. Get the link to the PaddleOCR live class that explains this upgrade in detail.

2. Get a 10 GB OCR learning package curated by the PaddleOCR team, including:

  • The 《Dive Into OCR》 e-book, with accompanying explanation videos and notebook projects

  • A bundle of 66 cutting-edge OCR papers from top conferences, including CVPR, AAAI, IJCAI, ICCV, and more

  • Recordings of previous PaddleOCR live classes

  • Project-sharing videos from outstanding OCR community developers

[Image: preview of the package contents]

How to join

  • STEP 1: Scan the QR code with WeChat and fill out the questionnaire

  • STEP 2: Join the exchange group to receive the benefits

[Image: WeChat QR code]

If you find it useful, please visit GitHub and give the project a star to follow and bookmark it~

https://github.com/PaddlePaddle/PaddleOCR


Read more

  • Official website: https://www.paddlepaddle.org.cn

  • PaddleOCR repositories:

    GitHub: https://github.com/PaddlePaddle/PaddleOCR

    Gitee: https://gitee.com/paddlepaddle/PaddleOCR

  • Note: [1] The test sample images are from the Internet.

PP-OCRv3 optimization strategies in detail

1. Detection module optimization strategies

The PP-OCRv3 detection module upgrades the CML (Collaborative Mutual Learning) text detection strategy used in PP-OCRv2. As shown in the figure below, CML combines standard distillation, in which a Teacher guides the Students, with DML mutual learning between the Student networks, so that the Students learn from each other while also being guided by the Teacher. PP-OCRv3 further optimizes both the teacher and the student models. For the teacher model, it proposes LK-PAN, a PAN structure with a large receptive field, and introduces the DML (Deep Mutual Learning) distillation strategy; for the student model, it proposes RSE-FPN, an FPN structure with a residual attention mechanism. The ablation results are shown in the table below.


Test environment: Intel Gold 6148 CPU, with MKLDNN acceleration enabled during inference.
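To make the CML structure concrete, here is a toy NumPy version of its three loss groups on DB-style text probability maps: each student against the ground truth, the two students against each other, and each student against the teacher's binarized map. The binary cross-entropy form, the 0.5 threshold, and the equal weights are assumptions for illustration, not PaddleOCR's configuration.

```python
import numpy as np

EPS = 1e-7


def bce(pred, target):
    """Mean binary cross-entropy of predicted probabilities against (soft) targets."""
    pred = np.clip(pred, EPS, 1 - EPS)
    return float(np.mean(-(target * np.log(pred) + (1 - target) * np.log(1 - pred))))


def cml_loss(student_a, student_b, teacher, gt, thresh=0.5):
    """Toy CML objective for two students and a frozen teacher (all H x W maps)."""
    # 1) each student is supervised by the ground-truth text map
    gt_loss = bce(student_a, gt) + bce(student_b, gt)
    # 2) DML between students: each treats the other's map as a soft target
    #    (in real training the target side would not receive gradients)
    dml_loss = bce(student_a, student_b) + bce(student_b, student_a)
    # 3) distillation: students mimic the teacher's binarized map
    teacher_map = (teacher > thresh).astype(np.float32)
    distill_loss = bce(student_a, teacher_map) + bce(student_b, teacher_map)
    return gt_loss + dml_loss + distill_loss


rng = np.random.default_rng(0)
gt = (rng.random((16, 16)) > 0.5).astype(np.float32)
teacher = 0.9 * gt + 0.05  # confident, mostly correct teacher
good = 0.98 * gt + 0.01    # students that agree with the ground truth
noisy = rng.random((16, 16))  # students that do not
print(cml_loss(good, good, teacher, gt) < cml_loss(noisy, noisy[::-1], teacher, gt))  # True
```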

(1) LK-PAN: PAN structure with a large receptive field

LK-PAN (Large Kernel PAN) is a lightweight PAN structure with a larger receptive field. Its core change is to enlarge the convolution kernels in the path augmentation of the PAN structure from 3×3 to 9×9. Larger kernels enlarge the receptive field covered by each position of the feature map, making it easier to detect large-font text and text with extreme aspect ratios. With LK-PAN, the teacher model's hmean improves from 83.2% to 85.0%.
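The receptive-field gain can be checked with the standard recurrence for stacked convolutions: each layer adds `(kernel - 1) * jump`, where `jump` is the product of all earlier strides. The backbone below is a hypothetical stand-in that reaches stride 8, not the actual PP-OCRv3 detector.

```python
def receptive_field(layers):
    """Receptive field (in input pixels) of stacked convs given (kernel, stride) pairs."""
    rf, jump = 1, 1
    for kernel, stride in layers:
        rf += (kernel - 1) * jump  # growth is scaled by the accumulated stride
        jump *= stride
    return rf


# Hypothetical backbone: three stride-2 3x3 convs -> feature map at stride 8,
# followed by one stride-1 conv in the PAN path (3x3 vs. 9x9).
backbone = [(3, 2), (3, 2), (3, 2)]
print(receptive_field(backbone + [(3, 1)]))  # 31
print(receptive_field(backbone + [(9, 1)]))  # 79
```

On a stride-8 feature map, swapping one 3×3 conv for a 9×9 conv adds 64 input pixels of context instead of 16. The cost is that a 9×9 kernel has 9 times the weights of a 3×3 kernel with the same channel counts.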

(2) DML: deep mutual learning strategy for the teacher model

DML (Deep Mutual Learning) is a distillation method in which two models with the same structure learn from each other. It can effectively improve the accuracy of text detection models. Applying DML to the teacher model raises its hmean from 85% to 86%. Replacing the teacher model in PP-OCRv2's CML with this higher-accuracy teacher further improves the student model's hmean from 83.2% to 84.3%.
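A minimal NumPy sketch of the DML objective, written for per-sample class probabilities: each network minimizes its own supervised loss plus the KL divergence from its peer's prediction to its own. For detection, the same idea would apply per pixel of the probability map; the helper names are hypothetical.

```python
import numpy as np


def softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)  # numerically stable
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def kl_div(p, q, eps=1e-8):
    """Batch-mean KL(p || q) between rows of probability vectors."""
    return float(np.mean(np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)))


def cross_entropy(p, labels, eps=1e-8):
    """Batch-mean negative log-likelihood of the true labels."""
    return float(-np.mean(np.log(p[np.arange(len(labels)), labels] + eps)))


def dml_losses(logits_1, logits_2, labels):
    """Per-network DML objectives: supervised loss + KL(peer || self)."""
    p1, p2 = softmax(logits_1), softmax(logits_2)
    loss_1 = cross_entropy(p1, labels) + kl_div(p2, p1)
    loss_2 = cross_entropy(p2, labels) + kl_div(p1, p2)
    return loss_1, loss_2


rng = np.random.default_rng(0)
labels = np.array([0, 2, 1])
l1, l2 = dml_losses(rng.normal(size=(3, 4)), rng.normal(size=(3, 4)), labels)
print(round(l1, 3), round(l2, 3))
```

When the two networks agree exactly, the KL terms vanish and each loss reduces to plain cross-entropy; disagreement adds a penalty that pulls the peers toward each other.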

(3) RSE-FPN: FPN structure with a residual attention mechanism

RSE-FPN (Residual Squeeze-and-Excitation FPN) introduces a residual structure and channel attention: the convolution layers in the FPN are replaced by RSEConv layers, which are channel attention blocks with a residual connection, further improving the representational power of the feature maps. Replacing the FPN of the student model in PP-OCRv2's CML with RSE-FPN further improves the student model's hmean from 84.3% to 85.4%.
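A rough NumPy sketch of a residual squeeze-and-excitation layer: a convolution produces a feature map `f`, global average pooling followed by two small fully connected layers produces per-channel weights, and the output is `f + f * weights`. The 1×1 convolution, ReLU, sigmoid gate, and reduction ratio of 4 are assumptions; PaddleOCR's RSEConv layer may differ in these details.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def rse_conv(x, w_conv, w_fc1, w_fc2):
    """Toy residual SE convolution on a (C_in, H, W) feature map.

    w_conv: (C_out, C_in) weights of a 1x1 convolution (simplification).
    w_fc1: (C_out // r, C_out) and w_fc2: (C_out, C_out // r) SE weights.
    """
    f = np.einsum("oc,chw->ohw", w_conv, x)                    # 1x1 conv
    squeeze = f.mean(axis=(1, 2))                              # global average pool
    weights = sigmoid(w_fc2 @ np.maximum(w_fc1 @ squeeze, 0))  # per-channel gate in (0, 1)
    return f + f * weights[:, None, None]                      # residual + channel attention


rng = np.random.default_rng(0)
c_in, c_out, r = 16, 8, 4
x = rng.normal(size=(c_in, 10, 10))
out = rse_conv(x, rng.normal(size=(c_out, c_in)),
               rng.normal(size=(c_out // r, c_out)), rng.normal(size=(c_out, c_out // r)))
print(out.shape)  # (8, 10, 10)
```

The residual path means that even when the attention gate is near zero for a channel, that channel's features still pass through unchanged rather than being suppressed.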

2. Recognition module optimization strategies

The PP-OCRv3 recognition module is optimized on top of the text recognition algorithm SVTR. SVTR no longer uses an RNN; by introducing Transformer structures, it mines the context information of text line images more effectively and thus improves text recognition. Directly replacing the PP-OCRv2 recognition model with SVTR_Tiny raises recognition accuracy from 74.8% to 80.1% (+5.3%), but makes prediction nearly 11 times slower: predicting a single text line on CPU takes nearly 100 ms. Therefore, as shown in the figure below, PP-OCRv3 applies the following 6 optimization strategies to accelerate the recognition model. The ablation results are shown in the table below:


Note: for the speed tests, experiments 01–03 use an input size of (3, 32, 320) and experiments 04–08 use (3, 48, 320). In actual prediction, input images have variable width, so speed will vary. Test environment: Intel Gold 6148 CPU, with MKLDNN acceleration enabled during inference.


(1) SVTR_LCNet: lightweight text recognition network

SVTR_LCNet is a lightweight text recognition network that fuses a Transformer-based network with the lightweight CNN PP-LCNet. Using this network and increasing the normalized input image height from 32 to 48, recognition accuracy reaches 73.98% at comparable prediction speed, close to the PP-OCRv2 recognition model trained with the distillation strategy.
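For the input-size change, a common preprocessing scheme for text-line recognizers resizes each image to a fixed height while keeping its aspect ratio, normalizes pixels to [-1, 1], and pads to a fixed width. The sketch below assumes that scheme with the (3, 48, 320) shape from the note above; it uses nearest-neighbour resizing to stay dependency-free and is not PaddleOCR's exact preprocessing code.

```python
import math

import numpy as np


def resize_norm_img(img, img_h=48, max_w=320):
    """Resize an (H, W, 3) uint8 text line to height img_h, keep aspect ratio,
    normalize to [-1, 1], and right-pad with zeros to width max_w (CHW output)."""
    h, w = img.shape[:2]
    new_w = min(max_w, int(math.ceil(img_h * w / h)))  # wider lines are squeezed to max_w
    rows = (np.arange(img_h) * h / img_h).astype(int)  # nearest-neighbour sampling
    cols = (np.arange(new_w) * w / new_w).astype(int)
    resized = img[rows][:, cols].astype(np.float32)
    norm = (resized / 255.0 - 0.5) / 0.5
    out = np.zeros((3, img_h, max_w), dtype=np.float32)
    out[:, :, :new_w] = norm.transpose(2, 0, 1)
    return out


img = np.random.default_rng(0).integers(0, 256, size=(32, 100, 3)).astype(np.uint8)
print(resize_norm_img(img).shape)  # (3, 48, 320)
```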

(2) GTC: attention-guided CTC training strategy

GTC (Guided Training of CTC) uses an Attention module and its loss to guide CTC training, fusing multiple representations of text features. It is an effective strategy for improving text recognition. With this strategy, recognition accuracy further improves to 75.8% (+1.82%).
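As we understand GTC, the attention branch only guides training and inference uses the CTC head alone, so the extra branch adds no inference cost. Assuming that, the recognizer's output is read with standard CTC decoding. A minimal greedy decoder (argmax per time step, merge repeats, drop blanks) looks like this, with a toy charset whose index 0 is the blank:

```python
import numpy as np


def ctc_greedy_decode(probs, charset, blank=0):
    """Greedy CTC decoding of a (T, num_classes) probability matrix."""
    best_path = probs.argmax(axis=-1)
    chars, prev = [], blank
    for idx in best_path:
        # emit only non-blank symbols that differ from the previous time step
        if idx != blank and idx != prev:
            chars.append(charset[idx])
        prev = idx
    return "".join(chars)


charset = ["<blank>", "a", "b", "c"]
path = [1, 1, 0, 1, 2, 2, 0, 3]           # "a a _ a b b _ c"
probs = np.eye(len(charset))[path]        # one-hot stand-in for model output
print(ctc_greedy_decode(probs, charset))  # aabc
```

The blank between the two `a` runs is what lets CTC output a genuinely doubled character.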

(3) TextConAug: data augmentation strategy that mines text context information

TextConAug is a data augmentation strategy that mines text context information. It enriches the context information in the training data and increases its diversity. With this strategy, recognition accuracy further improves to 76.3% (+0.5%).
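One plausible form of context augmentation, assumed here for illustration, concatenates two text-line samples side by side and joins their labels, so the model sees characters next to new neighbours. The 25-character limit and the function name are hypothetical.

```python
import numpy as np


def text_con_aug(sample_a, sample_b, max_text_len=25):
    """Join two (image, label) text-line samples side by side.

    Images are (H, W, C) arrays that must share the same height.
    If the joined label would be too long, sample_a is returned unchanged.
    """
    img_a, text_a = sample_a
    img_b, text_b = sample_b
    if img_a.shape[0] != img_b.shape[0]:
        raise ValueError("resize both images to the same height first")
    if len(text_a) + len(text_b) > max_text_len:
        return sample_a
    return np.concatenate([img_a, img_b], axis=1), text_a + text_b


a = (np.zeros((48, 120, 3), dtype=np.uint8), "Paddle")
b = (np.zeros((48, 60, 3), dtype=np.uint8), "OCR")
img, text = text_con_aug(a, b)
print(img.shape, text)  # (48, 180, 3) PaddleOCR
```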

(4) TextRotNet: self-supervised pre-trained model

TextRotNet is a pre-trained model trained with self-supervision on a large amount of unlabeled text line data. It is used to initialize the weights of SVTR_LCNet, helping the text recognition model converge to a better solution. With this strategy, recognition accuracy further improves to 76.9% (+0.6%).
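The name suggests a RotNet-style pretext task. Assuming that, unlabeled text lines are rotated by 0°, 90°, 180°, and 270°, and a network learns to predict which rotation was applied, which needs no text annotation. A minimal sketch of building such a batch (in practice the rotated copies would also be resized to a common input shape):

```python
import numpy as np


def rotation_pretext_batch(img):
    """Build a RotNet-style self-supervised batch from one unlabeled image.

    Returns four copies rotated by 0, 90, 180 and 270 degrees and their
    rotation-class labels 0..3; the labels come for free, without text annotation.
    """
    images = [np.rot90(img, k) for k in range(4)]
    labels = np.arange(4)
    return images, labels


line = np.arange(32 * 100).reshape(32, 100)  # stand-in for a grayscale text line
images, labels = rotation_pretext_batch(line)
print([im.shape for im in images], labels.tolist())
# [(32, 100), (100, 32), (32, 100), (100, 32)] [0, 1, 2, 3]
```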

(5) UDML: unified deep mutual learning strategy

UDML (Unified-Deep Mutual Learning) was used in PP-OCRv2 and proved very effective for improving text recognition models. In PP-OCRv3, for two different models, each built from SVTR_LCNet and an Attention structure, the PP-LCNet feature maps, the SVTR module outputs, and the Attention module outputs of the two models are all supervised simultaneously. With this strategy, recognition accuracy further improves to 78.4% (+1.5%).
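A toy NumPy version of the UDML term described above, adding a feature-map distance to symmetric KL divergences on the CTC and attention outputs of the two models. The dict keys, the MSE and symmetric KL forms, and the unit weights are assumptions, not PaddleOCR's loss configuration.

```python
import numpy as np


def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


def sym_kl(logits_a, logits_b, eps=1e-8):
    """Symmetric KL divergence between two sets of per-step class logits."""
    p, q = softmax(logits_a), softmax(logits_b)
    kl_pq = np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)
    kl_qp = np.sum(q * (np.log(q + eps) - np.log(p + eps)), axis=-1)
    return float(np.mean(kl_pq + kl_qp))


def udml_loss(outs_a, outs_b, w_feat=1.0, w_ctc=1.0, w_att=1.0):
    """Toy UDML term between two models' intermediate and final outputs.

    outs_* are dicts with hypothetical keys "backbone_feat", "ctc_logits", "att_logits".
    """
    feat = float(np.mean((outs_a["backbone_feat"] - outs_b["backbone_feat"]) ** 2))
    return (w_feat * feat
            + w_ctc * sym_kl(outs_a["ctc_logits"], outs_b["ctc_logits"])
            + w_att * sym_kl(outs_a["att_logits"], outs_b["att_logits"]))


rng = np.random.default_rng(0)
outs = {"backbone_feat": rng.normal(size=(8, 4)),
        "ctc_logits": rng.normal(size=(5, 10)),
        "att_logits": rng.normal(size=(5, 10))}
print(udml_loss(outs, outs))  # 0.0 when the two models agree exactly
```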

(6) UIM: unlabeled data mining scheme

UIM (Unlabeled Images Mining) is a very simple scheme for mining unlabeled data. The core idea is to run a high-accuracy text recognition model on unlabeled data to obtain pseudo-labels, then select the samples with high prediction confidence as training data for the small model. With this strategy, recognition accuracy further improves to 79.4% (+1%).
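The selection step of UIM amounts to a confidence filter over the high-accuracy model's predictions. A minimal sketch; the 0.95 threshold and the sample tuples are hypothetical.

```python
def mine_pseudo_labels(predictions, threshold=0.95):
    """Keep confident teacher predictions on unlabeled images as pseudo-labels.

    predictions: iterable of (image_path, predicted_text, confidence) tuples
    produced by a high-accuracy recognition model.
    """
    return [(path, text) for path, text, conf in predictions
            if text and conf >= threshold]


preds = [("u1.jpg", "PaddleOCR", 0.99), ("u2.jpg", "Pacldle", 0.61), ("u3.jpg", "", 0.97)]
print(mine_pseudo_labels(preds))  # [('u1.jpg', 'PaddleOCR')]
```

A higher threshold yields fewer but cleaner pseudo-labels; the right trade-off depends on how noisy the high-accuracy model is on the unlabeled pool.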

After these 9 optimizations to text detection and text recognition, PP-OCRv3 at comparable speed improves end-to-end Hmean in Chinese scenarios by 5% over PP-OCRv2, a substantial gain. The detailed metrics are shown in the table below:
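Hmean, the metric quoted throughout this section, is the harmonic mean of precision and recall. For reference, a one-line helper follows. How a predicted box counts as a true positive in the end-to-end metric (IoU threshold, text match) is not covered here.

```python
def hmean(precision, recall):
    """Harmonic mean (F1) of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


print(round(hmean(0.9, 0.8), 4))  # 0.8471
```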


In English and digit scenarios, the English-and-digits model trained separately on PP-OCRv3 improves by 11% over the PP-OCRv2 English-and-digits model, as shown in the table below:


In multilingual scenarios, models trained on PP-OCRv3 improve recognition accuracy by more than 5% on average over PP-OCRv2 across the four language families that have evaluation sets, as shown in the table below. The PaddleOCR team has also updated the recognition models for the 80+ supported languages based on PP-OCRv3.


Follow the 【PaddlePaddle】 WeChat official account for more technical content~

Copyright notice: author [PaddlePaddle]. Please include the original link when reposting, thank you.
https://en.chowdera.com/2022/131/202205102127486880.html
