Looking back at ResNet - a key step in the history of deep learning
I first learned the ResNet architecture in 2020, when I was just getting started with deep learning. I didn't pay much attention to ResNet at the time. Only later, when I began working on research projects in CV, NLP, time series, and graphs, did I realize how far-reaching ResNet's impact on the entire field of deep learning is.
Before ResNet, what was wrong with neural networks?
- When a network has many layers and is very deep, it is prone to the fatal problems of vanishing or exploding gradients, which can prevent training from making progress at all.
- When a network has many layers and is very deep, it may fail to learn genuinely useful information: the fit to the target gets worse and worse, drifting farther and farther from it.
As shown in the figure above: the left side depicts earlier deep models (such as VGG), where adding more layers actually makes the model drift away from the target function, so training performance degrades and can even be worse than a much shallower network. The right side shows what ResNet aims for: as the model deepens, it keeps approaching the target function. Even if the fit improves only slowly in later layers, the fitting error never grows as depth increases.
H(x) = F(x) + x
where H(x) is the observed output of each layer, F(x) is a layer of the neural network, and x is the layer's input (called the identity). This residual connection is also known as a skip connection or a shortcut.
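As a minimal sketch of how the formula maps to code (my own illustration, not the full block from the paper; a small MLP stands in for F):

```python
import torch
import torch.nn as nn

# Minimal sketch of H(x) = F(x) + x; a small MLP stands in for F here.
class ResidualConnection(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x):
        return self.f(x) + x  # the skip connection adds the identity back

x = torch.randn(4, 8)
h = ResidualConnection(8)(x)
print(h.shape)
```

Note that adding the identity requires F(x) and x to have the same shape; the Conv1x1 trick discussed later handles the case where they do not.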
Explaining ResNet's effectiveness from a function-fitting point of view
ResNet's effectiveness can be explained intuitively: even if the F(x) layer learns nothing (or even learns something harmful), the model can still inherit the information from the input x. The right side of the first figure makes this intuitive: each layer of the model is guaranteed to fully contain the information learned by the layers before it. Therefore, as the model deepens, its output does not drift away from the target; at worst, it keeps learning on top of what came before.
Explaining ResNet's effectiveness from a residual point of view
The functional explanation above is intuitive, but Kaiming He's paper does not explain it that way. Since H(x) = F(x) + x, we have F(x) = H(x) - x. Here F(x) is the difference between the layer's observed output and its input, which we call the "residual".
The training target is thus converted from fitting the original mapping to fitting the residual. Fitting residuals is beneficial: even if F(x) learns nothing useful (or even learns something harmful), the output will not gradually drift away from the target; in other words, the model's deviation will not grow. At the same time, the skip connection also avoids the training problems of vanishing or exploding gradients.
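The gradient claim can be seen concretely in a small experiment (my own illustration, not from the paper): through the skip path the derivative of each layer is (I + ∂F/∂x), so the product across layers stays close to the identity instead of shrinking toward zero.

```python
import torch
import torch.nn as nn

# Illustration: gradient flow through a deep stack of deliberately weak
# layers, with and without skip connections.
torch.manual_seed(0)
depth, dim = 50, 16

def input_grad_norm(use_skip):
    layers = [nn.Linear(dim, dim) for _ in range(depth)]
    x = torch.randn(1, dim, requires_grad=True)
    h = x
    for layer in layers:
        out = 0.1 * torch.tanh(layer(h))  # weak layer: shrinks gradients
        h = h + out if use_skip else out  # with or without the identity path
    h.sum().backward()
    return x.grad.norm().item()

vanilla = input_grad_norm(use_skip=False)
skip = input_grad_norm(use_skip=True)
print(f'plain stack: {vanilla:.3e}  with skips: {skip:.3e}')
```

Without skips the gradient at the input is the product of fifty small Jacobians and essentially vanishes; with skips it stays on the order of one.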
(This idea is very similar to boosting in ensemble learning, e.g. GBDT, the gradient-boosted decision tree. Both essentially fit residuals, but there is a difference: GBDT fits residuals of the labels, while ResNet fits residuals of the feature maps.)
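To make the analogy concrete, here is a toy boosting sketch (illustration only; a real GBDT uses decision trees as its weak learners): each round fits a piecewise-constant "stump" to the current residual of the labels.

```python
import numpy as np

# Toy boosting sketch: each round fits a piecewise-constant "stump"
# to the residual of the labels, then adds it with shrinkage.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 200)
y = np.sin(3 * x)

pred = np.zeros_like(y)
mse_before = np.mean(y ** 2)          # pred starts at zero
bins = np.digitize(x, np.linspace(-1, 1, 8))
for _ in range(5):
    residual = y - pred               # GBDT fits residuals of the *labels*
    step = np.zeros_like(y)
    for b in np.unique(bins):
        step[bins == b] = residual[bins == b].mean()
    pred += 0.5 * step                # shrinkage, like a learning rate

mse_after = np.mean((y - pred) ** 2)
print(f'MSE before: {mse_before:.3f}  after: {mse_after:.3f}')
```

Each round shrinks the remaining label residual, just as each ResNet layer only needs to model what the previous layers have not yet captured in the feature map.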
The figure below shows the difference between a traditional convolutional neural network and ResNet:
Depending on how the convolution is applied, there are two main kinds of ResNet blocks:
- Stacking multiple ResNet blocks that keep the height and width constant (left in the figure)
- A ResNet block that halves the height and width (stride=2) and doubles the number of channels; a Conv1x1 is then introduced on the shortcut path to transform the channel count so that the two paths can finally be added together (right in the figure)
Here we design a network. It starts with a conv7x7 convolution that does not change the channel count. Each shortcut module contains two conv layers, and each ResNet block contains two shortcut modules. Apart from the conv7x7 stem, the first ResNet block does not transform the channel count. In each subsequent ResNet block, only the first shortcut module doubles the channel count; in those channel-doubling modules, the shortcut path needs a conv1x1 convolution to transform the number of channels.
```python
import torch
import torch.nn as nn
from torch.nn import functional as F

# Define the shortcut module
class Residual(nn.Module):
    def __init__(self, input_channels, num_channel, use_conv1x1=False, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels=input_channels, out_channels=num_channel,
                               kernel_size=3, stride=stride, padding=1)
        self.conv2 = nn.Conv2d(in_channels=num_channel, out_channels=num_channel,
                               kernel_size=3, stride=1, padding=1)
        if use_conv1x1:
            self.conv3 = nn.Conv2d(in_channels=input_channels, out_channels=num_channel,
                                   kernel_size=1, stride=stride)
        else:
            self.conv3 = None
        # batch normalization has its own parameters, so unlike ReLU
        # we cannot share a single instance across the two positions
        self.bn1 = nn.BatchNorm2d(num_features=num_channel)
        self.bn2 = nn.BatchNorm2d(num_features=num_channel)
        self.relu = nn.ReLU(inplace=True)  # in-place: no extra memory for the output

    def forward(self, x):
        y = self.conv1(x)
        y = self.bn1(y)
        y = self.relu(y)
        y = self.conv2(y)
        y = self.bn2(y)
        if self.conv3:
            x = self.conv3(x)
        y += x
        return F.relu(y)

## residual test
resblk1 = Residual(3, 3, use_conv1x1=False, stride=1)
x = torch.rand(4, 3, 6, 6)
y = resblk1(x)
print(y.shape)

# typically the feature map halves in height/width and the channels double
resblk2 = Residual(3, 6, use_conv1x1=True, stride=2)
x = torch.rand(4, 3, 6, 6)
y = resblk2(x)
print(y.shape)
## residual test

# Define a ResNet block
def resnet_block(input_channels, num_channels, num_residuals, first_block=False):
    blks = []
    for i in range(num_residuals):
        # the first residual in the block (it changes the channel count),
        # unless this is the very first block
        if i == 0 and not first_block:
            blks.append(Residual(input_channels=input_channels, num_channel=num_channels,
                                 use_conv1x1=True, stride=2))
        else:
            blks.append(Residual(input_channels=num_channels, num_channel=num_channels,
                                 use_conv1x1=False, stride=1))
    return blks

b1 = nn.Sequential(
    nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=1),
    nn.BatchNorm2d(64),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
)
b2 = nn.Sequential(*resnet_block(64, 64, 2, first_block=True))
b3 = nn.Sequential(*resnet_block(64, 128, 2, first_block=False))
b4 = nn.Sequential(*resnet_block(128, 256, 2, first_block=False))
b5 = nn.Sequential(*resnet_block(256, 512, 2, first_block=False))
# the * unpacks the list into positional arguments

net = nn.Sequential(
    b1, b2, b3, b4, b5,
    nn.AdaptiveAvgPool2d((1, 1)),
    nn.Flatten(),
    nn.Linear(512, 10)
)

x = torch.rand((1, 1, 224, 224))
for i, layer in enumerate(net):
    x = layer(x)
    print('layer:', i, layer.__class__.__name__, 'output shape:', x.shape)
```
```
torch.Size([4, 3, 6, 6])
torch.Size([4, 6, 3, 3])
layer: 0 Sequential output shape: torch.Size([1, 64, 55, 55])
layer: 1 Sequential output shape: torch.Size([1, 64, 55, 55])
layer: 2 Sequential output shape: torch.Size([1, 128, 28, 28])
layer: 3 Sequential output shape: torch.Size([1, 256, 14, 14])
layer: 4 Sequential output shape: torch.Size([1, 512, 7, 7])
layer: 5 AdaptiveAvgPool2d output shape: torch.Size([1, 512, 1, 1])
layer: 6 Flatten output shape: torch.Size([1, 512])
layer: 7 Linear output shape: torch.Size([1, 10])
```
Beyond the comments in the code, keep the following in mind when programming:
- F.relu() is a function call, generally used on the final output inside forward(); nn.ReLU() is a module, typically used when defining the network.
- A cosine learning-rate schedule usually works better than a fixed learning rate.
- Can test-set accuracy be higher than training-set accuracy? It actually can: if the training set uses heavy data augmentation, the training data contains extra noise, so the test set may score higher.
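The cosine schedule mentioned above can be sketched with PyTorch's built-in CosineAnnealingLR (T_max=50 and lr=0.1 are arbitrary illustration values):

```python
import torch
from torch.optim.lr_scheduler import CosineAnnealingLR

# Sketch of a cosine learning-rate schedule; the model, T_max, and base lr
# are placeholders for illustration.
model = torch.nn.Linear(10, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
sched = CosineAnnealingLR(opt, T_max=50)

lrs = []
for _ in range(50):
    opt.step()          # in real training: forward, loss, backward, then step
    sched.step()
    lrs.append(opt.param_groups[0]['lr'])
print(f'start: {lrs[0]:.4f}  end: {lrs[-1]:.6f}')
```

The learning rate follows half a cosine wave from the base value down toward zero, which gives large steps early and a gentle landing late in training.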
Author: [Mr.zwX]. When reprinting, please include a link to the original. Thank you.