[Intensive Reading] Object Detection Series (XI) RetinaNet: The Pinnacle of the One-Stage Detector
2022-05-15 07:34:46【chaibubble】
Object detection series:
- (1) R-CNN: the pioneering work of CNN-based object detection
- (2) SPP-Net: sharing convolutional computation
- (3) Fast R-CNN: happy end-to-end training
- (4) Faster R-CNN: Fast R-CNN with an RPN
- (5) YOLO: another way to approach object detection
- (6) SSD: balancing efficiency and accuracy
- (7) R-FCN: a position-sensitive Faster R-CNN
- (8) YOLOv2: better, faster, stronger
- (9) YOLOv3: gathering ideas from many schools into one
- (10) FPN: feature pyramids for multi-scale detection
- (11) RetinaNet: the pinnacle of the one-stage detector
- (12) CornerNet: the beginning of anchor-free detection
- (13) CenterNet: no anchor, no NMS
- (14) FCOS: object detection by way of image segmentation
Object detection extended series:
- (1) Selective Search: the selective search algorithm
- (2) OHEM: online hard example mining
- (3) Differences in the loss functions of Faster R-CNN, YOLO, SSD, YOLOv2, and YOLOv3
Brief introduction: the pinnacle of the one-stage detector
Before RetinaNet, a common pattern in object detection was that two-stage methods (the classic Faster R-CNN, R-FCN, FPN, etc.) achieved higher accuracy but took more time, while one-stage methods (the classic YOLOv2, YOLOv3, and SSD) were more efficient but less accurate. This was the natural consequence of the two families' different designs. RetinaNet improved on this trade-off to a certain extent: it is a one-stage method that reaches higher accuracy than two-stage methods while taking less time. The RetinaNet paper is "Focal Loss for Dense Object Detection".
RetinaNet principle
Design concept
Two-stage methods complete detection in two steps: first generate region proposals, then classify the boxes and refine them by regression. One-stage methods do the classification and bbox regression in a single step. This difference is exactly what creates the characteristics above. The speed gap is easy to understand: a two-stage method has two steps, and the second-stage subnetwork has to run repeatedly over proposals, so it is inevitably slower. But what is the essential reason that two-stage methods are accurate while one-stage methods fall short? It is the severe imbalance between positive and negative samples caused by dense anchor boxes. How imbalanced? YOLOv2 has 845 anchors and SSD has 8732. More anchors provide more hypotheses, which matters greatly for the model's recall (especially on small objects), but an ordinary natural image does not contain that many objects, so the downstream task inevitably sees sample imbalance. And because the method is one-stage, the contradiction between the benefit of more anchors and the worsening sample imbalance cannot be resolved structurally. Then why aren't two-stage methods affected? They have as many or even more anchors: FPN has about 200k, not on the same order of magnitude as the thousands in SSD. The reason is the two-stage structure: the region proposals can be filtered however one likes. FPN eliminates the severe positive/negative imbalance in three ways:
- FPN takes anchors with IoU > 0.7 against the ground truth as positive samples and anchors with IoU < 0.3 as negative samples, which widens the gap between positives and negatives;
- FPN's RPN keeps only 1000-2000 proposals in its final output, controlling the number of samples;
- In each training minibatch, FPN keeps the ratio of positive to negative samples at 1:3.
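The three sampling rules above can be sketched in a few lines of Python. This is a minimal illustrative sketch under assumed inputs (a list of per-anchor IoU values against the ground truth), not FPN's actual implementation; the function name and defaults are our own:

```python
import random

def sample_minibatch(ious, batch_size=256, pos_fraction=0.25, seed=0):
    """Label anchors by IoU with the ground truth and subsample to a
    fixed positive:negative ratio (1:3 when pos_fraction=0.25).

    ious: one (max) IoU value per anchor.
    Returns (positives, negatives) as lists of anchor indices.
    Illustrative sketch of the RPN-style rules described above.
    """
    rng = random.Random(seed)
    pos = [i for i, iou in enumerate(ious) if iou > 0.7]   # IoU > 0.7 -> positive
    neg = [i for i, iou in enumerate(ious) if iou < 0.3]   # IoU < 0.3 -> negative
    # anchors with 0.3 <= IoU <= 0.7 are ignored entirely
    n_pos = min(len(pos), int(batch_size * pos_fraction))
    pos = rng.sample(pos, n_pos)
    neg = rng.sample(neg, min(len(neg), batch_size - n_pos))
    return pos, neg
```

Note how an anchor with IoU 0.5 belongs to neither set: widening the gap between the two thresholds is itself part of the balancing.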
Precisely because a one-stage method has no way to re-screen its samples, the improvement has to be made elsewhere to reduce the impact. That place is the loss function, because sample imbalance ultimately shows up in the loss and in optimization. So RetinaNet proposes focal loss, which solves the problem that, when positive and negative regions are extremely imbalanced, the detection loss is easily dominated by the huge number of negative samples.
Focal loss
Focal loss is an improved cross-entropy loss. For binary classification, the ordinary cross-entropy loss is:

CE(p,y)=\begin{cases}-\log(p) & \text{if } y=1\\-\log(1-p) & \text{otherwise}\end{cases}
Put another way (with y\in\{0,1\}), the same loss is:

CE(p,y)=-y\log(p)-(1-y)\log(1-p)
Define p_{t} :

p_{t}=\begin{cases}p & \text{if } y=1\\1-p & \text{otherwise}\end{cases}
Then CE(p,y)=CE(p_{t})=-\log(p_{t}) . The usual way to balance the cross-entropy loss is to introduce a factor \alpha\in[0,1] for the positive class, so that correspondingly the factor for the negative class is 1-\alpha . One detail worth noting: the paper defines the factor \alpha_{t} in the same way as p_{t} , that is:

\alpha_{t}=\begin{cases}\alpha & \text{if } y=1\\1-\alpha & \text{otherwise}\end{cases}
With this \alpha_{t} , the balanced cross-entropy loss can be written as:

CE(p_{t})=-\alpha_{t}\log(p_{t})
Expanded, this formula is actually:

CE(p,y)=\begin{cases}-\alpha\log(p) & \text{if } y=1\\-(1-\alpha)\log(1-p) & \text{otherwise}\end{cases}
However, \alpha is a fixed coefficient: it has no way to distinguish hard samples from easy ones. So, on top of balanced cross entropy, focal loss makes a further improvement by introducing the modulating factor (1-p_{t})^{\gamma } , where \gamma is a hyperparameter. The focal loss expression is therefore:

FL(p_{t})=-(1-p_{t})^{\gamma}\log(p_{t})
So why is this form better than balanced cross entropy? Suppose \gamma is 1, so the factor becomes 1-p_{t} . Expanding FL(p_{t}) in the same way as above gives:

FL(p,y)=\begin{cases}-(1-p)\log(p) & \text{if } y=1\\-p\log(1-p) & \text{otherwise}\end{cases}
Because p is produced by a sigmoid (or softmax), it lies in (0,1), so this factor is positive and does not change the sign of the original loss. Here is how it distinguishes hard samples from easy ones, where p is the predicted probability of the positive class:
- For a positive sample that the model finds easy, p approaches 1, so 1-p approaches 0 and the loss becomes smaller; for a hard positive, the opposite happens and the loss becomes larger.
- For a negative sample that the model finds easy, p approaches 0, so the factor p approaches 0 and the loss becomes smaller; for a hard negative, the opposite happens and the loss becomes larger.
This makes the loss value depend on how hard the sample is for the model. Beyond that, focal loss has the hyperparameter \gamma , which turns the factor into a power. Because the base is a number less than 1, raising it to a power amplifies the original linear scaling of 1-p_{t} . For example, if 1-p_{t} is 0.1 and 0.9, then with \gamma=2 they become 0.01 and 0.81: a 9-fold gap becomes an 81-fold gap. This suppresses easy samples and promotes hard ones, in the same spirit as the IoU > 0.7 and IoU < 0.3 thresholds. With this factor, hard and easy samples are distinguished, but the problem of too many negatives is still not solved, so focal loss finally adds the balancing factor back. The form of focal loss used in the experiments is:

FL(p_{t})=-\alpha_{t}(1-p_{t})^{\gamma}\log(p_{t})
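To make the down-weighting concrete, here is a minimal plain-Python sketch of binary focal loss next to ordinary cross entropy. The function names and scalar interface are our own for illustration; real implementations operate on tensors:

```python
import math

def cross_entropy(p, y):
    """Plain cross entropy CE(p_t) = -log(p_t), for comparison."""
    p_t = p if y == 1 else 1.0 - p
    return -math.log(p_t)

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Binary focal loss FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t).

    p: predicted probability of the positive class (from a sigmoid, 0 < p < 1)
    y: ground-truth label, 1 for positive, 0 for negative.
    alpha=0.25, gamma=2 are the values the paper found to work best.
    """
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
```

With alpha=0.25 and gamma=2, an easy positive (p=0.9) contributes only 0.25·0.1² = 0.0025 times its cross-entropy loss, while a hard positive (p=0.1) keeps the much larger fraction 0.25·0.9² = 0.2025, which is exactly the suppression of easy samples described above.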
The experiments explore the choice of \alpha and \gamma . In (a), for the balanced cross-entropy loss, \alpha=0.75 works best, which is consistent with the analysis above: \alpha>0.5 suppresses negative samples. But for focal loss, \alpha=0.25 and \gamma=2 work best, probably because the introduction of (1-p_{t})^{\gamma } affects the choice of \alpha . Focal loss is the most important part of RetinaNet; the network structure, anchors, and the rest of the loss were all used before RetinaNet, so we only mention them briefly.
Network structure
This is the RetinaNet network structure. It is essentially FPN, but used as a one-stage structure rather than two-stage, so the second stage is omitted. In the YOLO article, we discussed the difference between RPN and YOLO: if the RPN no longer merely separates object from background but predicts which class an object belongs to, then the RPN alone can complete the whole detection task. That idea is used in RetinaNet: RetinaNet is equivalent to discarding the Fast R-CNN part of FPN and changing FPN's RPN to predict categories directly. RetinaNet therefore also has multiple subnetworks, corresponding to the levels of the feature pyramid. We will not go into further detail here.
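As a small concrete check of the subnetwork design: per the paper, the classification subnet outputs A·K sigmoid scores per spatial position (A anchors, K classes) and the box subnet outputs A·4 regression offsets. A tiny sketch with assumed defaults (A=9, K=80 for COCO); the function name is ours:

```python
def head_output_channels(num_anchors=9, num_classes=80):
    """Output channel counts for RetinaNet's two heads at each pyramid
    level: the classification subnet predicts num_anchors * num_classes
    sigmoid scores per spatial position, and the box subnet predicts
    num_anchors * 4 regression offsets. Illustrative sketch only.
    """
    return {"cls": num_anchors * num_classes, "box": num_anchors * 4}
```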
Anchor Box
RetinaNet's anchor-box strategy is similar to FPN's. There are 5 feature maps of different scales, with base areas from 32^{2} to 512^{2} , and each level has three aspect ratios, so FPN has 15 kinds of anchor. RetinaNet adds one more factor on top of this: on each level's feature map there are also three scale multipliers, {2^{0},2^{\frac{1}{3}},2^{\frac{2}{3}}} of that level's base scale, so RetinaNet's anchors become 45 kinds. Thanks to the introduction of focal loss, the choice of anchors becomes rather free: there is simply no fear of having too many. ╮( ̄▽  ̄)╭
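The 45 anchor shapes can be enumerated directly. This is an illustrative sketch of the counting above (5 levels × 3 scales × 3 aspect ratios); the base size per level is an assumption derived from the 32²-512² areas mentioned, and the function name is ours:

```python
def retinanet_anchor_sizes():
    """Enumerate the 45 anchor shapes: 5 pyramid levels with base sizes
    32..512, 3 scale multipliers {2^0, 2^(1/3), 2^(2/3)}, and 3 aspect
    ratios {1:2, 1:1, 2:1} per level.
    Returns a list of (width, height) tuples. Illustrative sketch only.
    """
    anchors = []
    for base in (32, 64, 128, 256, 512):          # one base size per level
        for scale in (2 ** 0, 2 ** (1 / 3), 2 ** (2 / 3)):
            size = base * scale
            for ratio in (0.5, 1.0, 2.0):         # h/w aspect ratios
                # keep the area size^2 fixed while changing the ratio
                w = size / ratio ** 0.5
                h = size * ratio ** 0.5
                anchors.append((round(w, 1), round(h, 1)))
    return anchors
```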
RetinaNet Performance evaluation
This is RetinaNet's overall result. With a ResNet-101 backbone and an input resolution of 800, RetinaNet's AP exceeds FPN's. Although it is slower than FPN, this is the first time a one-stage model, using the same input resolution and backbone, has exceeded the AP of two-stage methods. And when the resolution drops to 500, RetinaNet still performs excellently.
copyright notice
Author: chaibubble. Please include the original link when reprinting. Thank you.
https://en.chowdera.com/2022/131/202205102135065236.html