[Paper notes] LSNet: extremely light-weight Siamese network for change detection in remote sensing images
2022-05-15 07:20:49【m0_ sixty-one million eight hundred and ninety-nine thousand on】
Paper
Title: LSNET: EXTREMELY LIGHT-WEIGHT SIAMESE NETWORK FOR CHANGE DETECTION OF REMOTE SENSING IMAGE
Submitted to: CVPR 2022
Paper: https://arxiv.org/abs/2201.09156
Code: https://github.com/qaz670756/LSNet
The idea of the paper is fairly simple and comes down to two modifications. The first is a lightweight backbone: CGB modules are used to build a lightweight Siamese backbone. The second is an improved pyramid feature fusion: starting from denseFPN, redundant connections are removed and a bottom-up fusion path is added. The large reduction in parameters and computation comes mainly from the lightweight backbone, which replaces ordinary convolutions with depthwise separable convolutions.
Experimental results
Official training parameters:
{
    "patch_size": 256,
    "augmentation": true,
    "num_gpus": 1,
    "num_workers": 8,
    "num_channel": 3,
    "EF": false,
    "epochs": 101,
    "batch_size": 12,
    "learning_rate": 1e-3,
    "model_name": "denseFPN",
    "loss_function": "contra_hybrid",
    "dataset_dir": "data/Real/subset/",
    "weight_dir": "./outputs/",
    "log_dir": "./log/"
}
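As a quick sanity check, the configuration above is valid JSON and can be read with Python's standard `json` module. The sketch below is illustrative only: `CONFIG_TEXT` simply repeats the listed values, and `pixels_per_batch` is a hypothetical derived quantity, not something the official training script is known to compute.

```python
import json

# Copy of the training parameters listed in these notes.
CONFIG_TEXT = """
{
    "patch_size": 256,
    "augmentation": true,
    "num_gpus": 1,
    "num_workers": 8,
    "num_channel": 3,
    "EF": false,
    "epochs": 101,
    "batch_size": 12,
    "learning_rate": 1e-3,
    "model_name": "denseFPN",
    "loss_function": "contra_hybrid",
    "dataset_dir": "data/Real/subset/",
    "weight_dir": "./outputs/",
    "log_dir": "./log/"
}
"""

cfg = json.loads(CONFIG_TEXT)

# Hypothetical helper value: pixels per image of a bi-temporal pair, per batch.
pixels_per_batch = cfg["batch_size"] * cfg["patch_size"] ** 2

print(cfg["learning_rate"], cfg["epochs"], pixels_per_batch)
```

Note that `model_name` is listed as "denseFPN" in this configuration even though the notes describe diffFPN; the value is reproduced here unchanged.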
Abstract
Siamese networks have gradually become the mainstream approach to change detection in remote sensing images (RSI). However, as architectures, modules, and training procedures grow more elaborate, the models become increasingly complex and hard to apply in practice.
This paper proposes an extremely lightweight Siamese network for RSI change detection (Light-Weight Siamese Network, LSNet). It replaces standard convolutions with depthwise separable dilated (atrous) convolutions and removes redundant dense connections, keeping only the effective feature flows during Siamese feature fusion, which greatly reduces parameters and computation. On the CDD dataset, compared with the first-place model, LSNet reduces parameters and computation by 90.35% and 91.34% respectively, with only a 1.5% drop in accuracy.
Introduction
Conventional RSI change detection methods rely on hand-crafted features and time-consuming pre- and post-processing, and they struggle to distinguish semantic changes from background noise.
An image pair can be fed directly into a Siamese convolutional network without preprocessing, and end-to-end supervised learning then separates semantically changed regions from unchanged ones.
- The paper presents a lightweight Siamese network, LSNet, that is highly efficient (see Figure 1). The backbone is built from Context Guide Blocks (CGB), whose core components are depthwise separable dilated convolution and global feature aggregation. Compared with a ResNet-50 backbone, the LSNet backbone needs only 3.97% of the parameters and 32.56% of the computation.
- A differential feature pyramid network (diffFPN) is proposed for progressive feature difference extraction and resolution recovery (removing redundant connections while preserving the feature flow), finally separating changed image regions from unchanged ones.
Method
LSNet consists of a Siamese backbone (LightSiamese Backbone) and a differential feature pyramid network (diffFPN). The backbone is built from Context Guide Blocks (CGB), and diffFPN performs efficient fusion of the Siamese feature pairs.
Light-Siamese backbone
Images T1 and T2 pass through a Siamese backbone with shared weights. The backbone consists of 4 composite layers (from top to bottom, they contain 3/3/8/12 CGB modules). Each CGB counts as two layers, so the backbone has 52 layers in total and produces 4 groups of feature outputs.
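The layer count can be checked with a line of arithmetic. This is only a sketch of the counting rule stated in the notes (two layers per CGB); the name `LAYERS_PER_CGB` is introduced here for illustration.

```python
# CGB modules per composite layer, as given in the notes and in the code listing.
num_blocks = (3, 3, 8, 12)

# Counting rule from the notes: every CGB is counted as two layers.
LAYERS_PER_CGB = 2

total_layers = LAYERS_PER_CGB * sum(num_blocks)
print(total_layers)  # 52
```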
The basic component, the Context Guide Block (CGB), is shown on the right of Figure 2. The input X passes through parallel dilated convolutions to capture local context at different ranges (receptive fields). The dilated convolutions are computed in a depthwise separable manner: the channels are split into groups and each convolution operates only within its own group. (Depthwise separable convolution greatly reduces the amount of computation, but the speedup has an upper limit, because the bottleneck shifts to memory access bandwidth.)
This is followed by channel interaction and global information extraction.
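To make the saving concrete, the sketch below counts the weights of a standard 3x3 convolution against a depthwise one (groups equal to channels, as in `f_loc` and `f_sur` in the code listing) and computes the effective kernel size of a dilated 3x3 kernel. The helper names and the choice of 64 channels are assumptions made for illustration; the dilation values (1, 2, 4, 8) are the ones that appear in the code listing.

```python
def conv_params(c_in, c_out, k, groups=1, bias=False):
    """Weight count of a 2D convolution with a k x k kernel."""
    assert c_in % groups == 0 and c_out % groups == 0
    weights = (c_in // groups) * c_out * k * k
    return weights + (c_out if bias else 0)


def effective_kernel(k, dilation):
    """Spatial extent covered by a k x k kernel with the given dilation."""
    return k + (k - 1) * (dilation - 1)


channels = 64  # illustrative channel count, not taken from the paper

standard = conv_params(channels, channels, 3)                    # dense 3x3 conv
depthwise = conv_params(channels, channels, 3, groups=channels)  # one filter per channel

print(standard, depthwise, standard // depthwise)

# Dilation changes the receptive field but not the number of weights.
print([effective_kernel(3, d) for d in (1, 2, 4, 8)])
```

With 64 channels the depthwise variant uses 64 times fewer weights, and a dilation of 8 stretches a 3x3 kernel over a 17x17 window at no extra parameter cost.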
Differential feature pyramid network
SNUNet proposed a densely connected pyramid feature fusion method, shown in Figure 3(a).
This denseFPN structure has 2 problems:
- Redundant connections. (Shallow features such as T_1,0 and T_2,0 are repeatedly fed into d_1,0, d_2,0, and d_3,0, which is inefficient.)
- Unreasonable feature flow. (In denseFPN, the output layers d_0,0 and d_1,0 contain incomplete features from the backbone.)
Therefore, the paper proposes the diffFPN structure, which removes the redundant connections and adds a bottom-up fusion path so that all three output layers contain the complete backbone features.
Experiment and Results
Dataset and evaluation metrics
Dataset: CDD
Common metrics: precision, recall, F1-score, overall accuracy
Efficiency metrics: F1-G and F1-F (quantifying the effect of each unit of parameters and computation on the F1 score), and F1-Eff (evaluating the overall efficiency of the model)
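For reference, the four common metrics can be computed from the pixel-level confusion counts of the "changed" class. This is a minimal sketch using the standard definitions; the function name and the example counts are hypothetical, and the efficiency metrics (F1-G, F1-F, F1-Eff) are not reproduced here because the notes do not give their formulas.

```python
def change_detection_metrics(tp, fp, fn, tn):
    """Standard binary metrics; assumes tp + fp > 0 and tp + fn > 0."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    overall_accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, f1, overall_accuracy


# Hypothetical pixel counts, chosen only to illustrate the computation.
p, r, f1, oa = change_detection_metrics(tp=80, fp=20, fn=20, tn=880)
print(round(p, 4), round(r, 4), round(f1, 4), round(oa, 4))
```

Because unchanged pixels usually dominate in change detection, overall accuracy tends to be much higher than F1, which is why F1 is the metric the efficiency comparisons build on.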
Accuracy and efficiency comparison
Comparison of parameters and computation between the two modules (CDD dataset). As the table shows:
- Compared with ResNet-50, the LightSiamese-52 backbone needs only about 1/25 of the parameters and 1/3 of the computation.
- The denseFPN structure has an unreasonable feature flow, whereas diffFPN adds only 0.0709M parameters while reducing computation by 1.0884 GFLOPs, a cut of more than half.
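The "1/25 and 1/3" figures agree with the 3.97% and 32.56% quoted in the Introduction, as a short check shows. The variable names are introduced here for illustration only.

```python
# Backbone cost relative to ResNet-50, as quoted in the Introduction.
param_fraction = 3.97 / 100
flops_fraction = 32.56 / 100

# Invert the fractions to get "one part in N".
print(round(1 / param_fraction, 1))  # 25.2
print(round(1 / flops_fraction, 1))  # 3.1
```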
Performance comparison of multiple methods on the CDD dataset. LSNet's metrics remain competitive, ranking in the top three.
Efficiency comparison of the methods. The method using diffFPN has the highest F1-P and F1-G.
Combining Table 2 and Figure 3: compared with SNUNet, LSNet reduces parameters and computation by 90.35% and 91.34% respectively, while accuracy drops by only 1.5%.
Visual results of LSNet. The results are fairly accurate, but the edge details need further refinement. Panel (e) shows that the edges of changed regions receive a higher probability than their interiors, which suggests that the network uses the structure of a region as its discriminative feature, improving robustness to color and texture changes.
Conclusion
To detect RSI changes efficiently, a lightweight Siamese network is proposed. It builds a lightweight Siamese backbone (LightSiamese Backbone) from Context Guide Blocks (CGB) and adds a feature pair fusion module (diffFPN). Results on the challenging CDD dataset show that, compared with other mainstream methods, this method achieves competitive results with limited parameters and computation, which demonstrates its effectiveness.
Core code
Context Guide Block
import torch
import torch.nn as nn

# Note: GlobalContextExtractor is not shown in these notes;
# build_norm_layer is defined further below.


class ContextGuidedBlock(nn.Module):
    """Context Guided Block for CGNet.

    This class consists of four components: local feature extractor,
    surrounding feature extractor, joint feature extractor and global
    context extractor.

    Args:
        in_channels (int): Number of input feature channels.
        out_channels (int): Number of output feature channels.
        dilation (int): Dilation rate for surrounding context extractor.
            Default: 2.
        reduction (int): Reduction for global context extractor. Default: 16.
        skip_connect (bool): Add input to output or not. Default: True.
        downsample (bool): Downsample the input to 1/2 or not. Default: False.
        conv_cfg (dict): Config dict for convolution layer.
            Default: None, which means using conv2d.
        norm_cfg (dict): Config dict for normalization layer.
            Default: dict(type='BN', requires_grad=True).
        act_cfg (dict): Config dict for activation layer.
            Default: dict(type='PReLU').
        with_cp (bool): Use checkpoint or not. Using checkpoint will save some
            memory while slowing down the training speed. Default: False.
    """

    def __init__(self,
                 in_channels,
                 out_channels,
                 dilation=2,
                 reduction=16,
                 skip_connect=True,
                 downsample=False,
                 conv_cfg=None,
                 norm_cfg=dict(type='BN', requires_grad=True),
                 act_cfg=dict(type='PReLU'),
                 with_cp=False):
        super(ContextGuidedBlock, self).__init__()
        self.with_cp = with_cp
        self.downsample = downsample

        # channels = out_channels if downsample else out_channels // 2
        channels = out_channels // 2
        if 'type' in act_cfg and act_cfg['type'] == 'PReLU':
            act_cfg['num_parameters'] = channels
        kernel_size = 3 if downsample else 1
        stride = 2 if downsample else 1
        padding = (kernel_size - 1) // 2

        # self.channel_shuffle = ChannelShuffle(2 if in_channels==in_channels//2*2 else in_channels)

        # Channel-reducing projection (3x3 with stride 2 when downsampling, else 1x1)
        self.conv1x1 = nn.Sequential(
            nn.Conv2d(in_channels, channels, kernel_size=kernel_size, stride=stride, padding=padding),
            build_norm_layer(channels),
            nn.PReLU(num_parameters=channels)
        )

        # Local feature extractor: depthwise 3x3 convolution
        self.f_loc = nn.Conv2d(channels, channels, kernel_size=3,
                               padding=1, groups=channels, bias=False)
        # Surrounding context extractor: depthwise dilated 3x3 convolution
        self.f_sur = nn.Conv2d(channels, channels, kernel_size=3, padding=dilation,
                               dilation=dilation, groups=channels, bias=False)

        self.bn = build_norm_layer(2 * channels)
        self.activate = nn.PReLU(2 * channels)

        # original bottleneck in CGNet: A light weight context guided network for semantic segmentation
        # is removed for saving computation amount
        # if downsample:
        #     self.bottleneck = build_conv_layer(
        #         conv_cfg,
        #         2 * channels,
        #         out_channels,
        #         kernel_size=1,
        #         bias=False)

        self.skip_connect = skip_connect and not downsample
        self.f_glo = GlobalContextExtractor(out_channels, reduction, with_cp)
        # self.f_glo = CoordAtt(out_channels,out_channels,groups=reduction)

    def forward(self, x):

        def _inner_forward(x):
            # x = self.channel_shuffle(x)
            out = self.conv1x1(x)
            loc = self.f_loc(out)
            sur = self.f_sur(out)

            joi_feat = torch.cat([loc, sur], 1)  # the joint feature
            joi_feat = self.bn(joi_feat)
            joi_feat = self.activate(joi_feat)
            if self.downsample:
                pass
                # joi_feat = self.bottleneck(joi_feat)  # channel = out_channels
            # f_glo is employed to refine the joint feature
            out = self.f_glo(joi_feat)

            if self.skip_connect:
                return x + out
            else:
                return out

        return _inner_forward(x)


def cgblock(in_ch, out_ch, dilation=2, reduction=8, skip_connect=False):
    return nn.Sequential(
        ContextGuidedBlock(in_ch, out_ch,
                           dilation=dilation,
                           reduction=reduction,
                           downsample=False,
                           skip_connect=skip_connect))
light_siamese_backbone
class light_siamese_backbone(nn.Module):
    def __init__(self, in_ch=None, num_blocks=None, cur_channels=None,
                 filters=None, dilations=None, reductions=None):
        super(light_siamese_backbone, self).__init__()
        norm_cfg = {'type': 'BN', 'eps': 0.001, 'requires_grad': True}
        act_cfg = {'type': 'PReLU', 'num_parameters': 32}

        self.inject_2x = InputInjection(1)  # down-sample for Input, factor=2
        self.inject_4x = InputInjection(2)  # down-sample for Input, factor=4

        # stage 0
        self.stem = nn.ModuleList()
        for i in range(num_blocks[0]):
            self.stem.append(
                ContextGuidedBlock(
                    cur_channels[0], filters[0],
                    dilations[0], reductions[0],
                    skip_connect=(i != 0),
                    downsample=False,
                    norm_cfg=norm_cfg,
                    act_cfg=act_cfg)  # CG block
            )
            cur_channels[0] = filters[0]

        cur_channels[0] += in_ch
        self.norm_prelu_0 = nn.Sequential(
            build_norm_layer(cur_channels[0]),
            nn.PReLU(cur_channels[0]))

        # stage 1
        self.level1 = nn.ModuleList()
        for i in range(num_blocks[1]):
            self.level1.append(
                ContextGuidedBlock(
                    cur_channels[0] if i == 0 else filters[1],
                    filters[1], dilations[1], reductions[1],
                    downsample=(i == 0),
                    norm_cfg=norm_cfg,
                    act_cfg=act_cfg))  # CG block

        cur_channels[1] = 2 * filters[1] + in_ch
        self.norm_prelu_1 = nn.Sequential(
            build_norm_layer(cur_channels[1]),
            nn.PReLU(cur_channels[1]))

        # stage 2
        self.level2 = nn.ModuleList()
        for i in range(num_blocks[2]):
            self.level2.append(
                ContextGuidedBlock(
                    cur_channels[1] if i == 0 else filters[2],
                    filters[2], dilations[2], reductions[2],
                    downsample=(i == 0),
                    norm_cfg=norm_cfg,
                    act_cfg=act_cfg))  # CG block

        cur_channels[2] = 2 * filters[2]
        self.norm_prelu_2 = nn.Sequential(
            build_norm_layer(cur_channels[2]),
            nn.PReLU(cur_channels[2]))

        # stage 3
        self.level3 = nn.ModuleList()
        for i in range(num_blocks[3]):
            self.level3.append(
                ContextGuidedBlock(
                    cur_channels[2] if i == 0 else filters[3],
                    filters[3], dilations[3], reductions[3],
                    downsample=(i == 0),
                    norm_cfg=norm_cfg,
                    act_cfg=act_cfg))  # CG block

        cur_channels[3] = 2 * filters[3]
        self.norm_prelu_3 = nn.Sequential(
            build_norm_layer(cur_channels[3]),
            nn.PReLU(cur_channels[3]))

    def forward(self, x):
        # x = torch.cat([xA, xB], dim=0)
        # stage 0
        inp_2x = x  # self.inject_2x(x)
        inp_4x = self.inject_2x(x)
        for layer in self.stem:
            x = layer(x)
        x = self.norm_prelu_0(torch.cat([x, inp_2x], 1))
        x0_0A, x0_0B = x[:x.shape[0] // 2, :, :, :], x[x.shape[0] // 2:, :, :, :]

        # stage 1
        for i, layer in enumerate(self.level1):
            x = layer(x)
            if i == 0:
                down1 = x
        x = self.norm_prelu_1(torch.cat([x, down1, inp_4x], 1))
        x1_0A, x1_0B = x[:x.shape[0] // 2, :, :, :], x[x.shape[0] // 2:, :, :, :]

        # stage 2
        for i, layer in enumerate(self.level2):
            x = layer(x)
            if i == 0:
                down1 = x
        x = self.norm_prelu_2(torch.cat([x, down1], 1))
        x2_0A, x2_0B = x[:x.shape[0] // 2, :, :, :], x[x.shape[0] // 2:, :, :, :]

        # stage 3
        for i, layer in enumerate(self.level3):
            x = layer(x)
            if i == 0:
                down1 = x
        x = self.norm_prelu_3(torch.cat([x, down1], 1))
        x3_0A, x3_0B = x[:x.shape[0] // 2, :, :, :], x[x.shape[0] // 2:, :, :, :]

        return [x0_0A, x0_0B, x1_0A, x1_0B, x2_0A, x2_0B, x3_0A, x3_0B]


class InputInjection(nn.Module):
    """Downsampling module for CGNet."""

    def __init__(self, num_downsampling):
        super(InputInjection, self).__init__()
        self.pool = nn.ModuleList()
        for i in range(num_downsampling):
            self.pool.append(nn.AvgPool2d(3, stride=2, padding=1))

    def forward(self, x):
        for pool in self.pool:
            x = pool(x)
        return x


def build_norm_layer(ch):
    layer = nn.BatchNorm2d(ch, eps=0.01)
    for param in layer.parameters():
        param.requires_grad = True
    return layer
diffFPN
class diffFPN(nn.Module):
    def __init__(self, cur_channels=None, mid_ch=None,
                 dilations=None, reductions=None,
                 bilinear=True):
        super(diffFPN, self).__init__()
        # lateral convs for unifying channels
        self.lateral_convs = nn.ModuleList()
        for i in range(4):
            self.lateral_convs.append(
                cgblock(cur_channels[i] * 2, mid_ch * 2 ** i, dilations[i], reductions[i])
            )
        # top_down_convs
        self.top_down_convs = nn.ModuleList()
        for i in range(3, 0, -1):
            self.top_down_convs.append(
                cgblock(mid_ch * 2 ** i, mid_ch * 2 ** (i - 1), dilation=dilations[i], reduction=reductions[i])
            )

        # diff convs
        self.diff_convs = nn.ModuleList()
        for i in range(3):
            self.diff_convs.append(
                cgblock(mid_ch * (3 * 2 ** i), mid_ch * 2 ** i, dilations[i], reductions[i])
            )
        for i in range(2):
            self.diff_convs.append(
                cgblock(mid_ch * (3 * 2 ** i), mid_ch * 2 ** i, dilations[i], reductions[i])
            )
        self.diff_convs.append(
            cgblock(mid_ch * 3, mid_ch * 2,
                    dilation=dilations[0], reduction=reductions[0])
        )
        self.up2x = up(32, bilinear)

    def forward(self, output):
        # Fuse each pair of Siamese features (A, B) along the channel axis
        tmp = [self.lateral_convs[i](torch.cat([output[i * 2], output[i * 2 + 1]], dim=1))
               for i in range(4)]

        # top_down_path
        for i in range(3, 0, -1):
            tmp[i - 1] += self.up2x(self.top_down_convs[3 - i](tmp[i]))

        # x0_1
        tmp = [self.diff_convs[i](torch.cat([tmp[i], self.up2x(tmp[i + 1])], dim=1)) for i in [0, 1, 2]]
        x0_1 = tmp[0]

        # x0_2
        tmp = [self.diff_convs[i](torch.cat([tmp[i - 3], self.up2x(tmp[i - 2])], dim=1)) for i in [3, 4]]
        x0_2 = tmp[0]

        # x0_3
        x0_3 = self.diff_convs[5](torch.cat([tmp[0], self.up2x(tmp[1])], dim=1))

        return x0_1, x0_2, x0_3
LSNet_diffFPN
class LSNet_diffFPN(nn.Module):
    # SNUNet-CD with ECAM
    def __init__(self, in_ch=3, mid_ch=32, out_ch=2, bilinear=True):
        super(LSNet_diffFPN, self).__init__()
        torch.nn.Module.dump_patches = True
        n1 = 32  # the initial number of channels of feature map
        filters = (n1, n1 * 2, n1 * 4, n1 * 8, n1 * 16)
        num_blocks = (3, 3, 8, 12)
        dilations = (1, 2, 4, 8)
        reductions = (4, 8, 16, 32)
        cur_channels = [0, 0, 0, 0]
        cur_channels[0] = in_ch

        self.backbone = light_siamese_backbone(in_ch=in_ch, num_blocks=num_blocks,
                                               cur_channels=cur_channels,
                                               filters=filters, dilations=dilations,
                                               reductions=reductions)

        self.head = cam_head(mid_ch=mid_ch, out_ch=out_ch)

        self.FPN = diffFPN(cur_channels=cur_channels, mid_ch=mid_ch,
                           dilations=dilations, reductions=reductions, bilinear=bilinear)

        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
            elif isinstance(m, (nn.BatchNorm2d, nn.GroupNorm)):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)

    def forward(self, x, debug=False):
        output = self.backbone(x)
        x0_1, x0_2, x0_3 = self.FPN(output)
        out = self.head(x0_1, x0_2, x0_3)

        if debug:
            print_flops_params(self.backbone, [x], 'backbone')
            print_flops_params(self.FPN, [output], 'diffFPN')
            print_flops_params(self.head, [x0_1, x0_2, x0_3], 'head')

        return (x0_1, x0_2, x0_3, x0_3, out,)
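The channel bookkeeping in the listings above is easy to lose track of, because `cur_channels` is filled in by the backbone constructor and then reused by diffFPN. The following pure-Python sketch traces those numbers for the default arguments (`in_ch=3`, `mid_ch=32`, `n1=32`). It only mirrors the arithmetic in the listings and does not build any network; the names `lateral_in` and `lateral_out` are introduced here for illustration.

```python
in_ch, mid_ch, n1 = 3, 32, 32
filters = (n1, n1 * 2, n1 * 4, n1 * 8, n1 * 16)

# Output channels of each backbone stage after concatenation,
# following the cur_channels updates in light_siamese_backbone.
cur_channels = [
    filters[0] + in_ch,      # stage 0: stem output + input image
    2 * filters[1] + in_ch,  # stage 1: last block + first block + downsampled input
    2 * filters[2],          # stage 2: last block + first block
    2 * filters[3],          # stage 3: last block + first block
]

# diffFPN lateral convs take the A and B features concatenated on the channel axis.
lateral_in = [2 * c for c in cur_channels]
lateral_out = [mid_ch * 2 ** i for i in range(4)]

print(cur_channels)  # [35, 131, 256, 512]
print(lateral_in)    # [70, 262, 512, 1024]
print(lateral_out)   # [32, 64, 128, 256]
```

The odd values 35 and 131 come from concatenating the 3-channel input image back into the first two stages.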