HyperStyle: complete face inversion using a hypernetwork
2022-05-15 07:32:05【Ericam_】
Paper: HyperStyle: StyleGAN Inversion with HyperNetworks for Real Image Editing
- CVPR 2022
- Topic: StyleGAN inversion
Abstract:
Inverting real images into StyleGAN's latent space is a well-studied problem. However, existing methods still perform only moderately on real-world scenes, because there is an inherent trade-off between reconstruction and editability: the regions of latent space that can accurately represent real images typically suffer from degraded semantic control.
Recently, some works have addressed this trade-off by fine-tuning the generator so that the target image is placed in a well-behaved, editable region of the latent space. However, this fine-tuning scheme requires a lengthy optimization for every new image.
In this work, we bring this approach into the realm of encoder-based inversion and propose HyperStyle, a hypernetwork that learns to modulate StyleGAN's weights. A naive modulation approach would require training a hypernetwork with over 3 billion parameters; through careful network design, the parameter count is reduced to be in line with existing encoders. HyperStyle yields reconstructions comparable to those of latent optimization techniques while keeping the near real-time inference of encoders. Finally, we demonstrate HyperStyle's effectiveness in several applications beyond the inversion task, including editing images from outside the training domain.
To take advantage of pretrained models, most works avoid changing the generator weights when performing inversion.
Some works have explored doing so:
- Adjusting the generator for each image to obtain a more accurate inversion
- Inverting BigGAN by sampling random noise vectors, selecting the vector that best matches the real image, and then progressively optimizing the generator weights
- First obtaining a latent code through inversion (which approximately reconstructs the target image), then fine-tuning the generator weights to recover image-specific details
However, these methods require a lengthy optimization, usually a few minutes per image. By comparison, HyperStyle trains a hypernetwork on a large collection of images, which can then invert any given image in near real time.
Summary: more accurate inversion + good editability + near real-time inference (like an encoder)
Network structure
(1) First, the image is passed through an encoder to produce an initial inversion.
(2) The original input and the initial inversion are concatenated and fed into the HyperStyle network. The input therefore has 6 channels, and the ResNet backbone outputs a 16*16*512 feature map.
hyperstyle.py
# x: 3*h*w, y_hat: 3*h*w -> x_input: 6*h*w (concatenated along the channel dimension)
x_input = torch.cat([x, y_hat], dim=1)
The structure of the ResNet backbone is:
from torch import nn
from torch.nn import BatchNorm2d, PReLU
from torchvision.models import resnet34

# Inside the backbone module's __init__:
# one convolution layer followed by the four ResNet-34 stages
self.conv1 = nn.Conv2d(opts.input_nc, 64, kernel_size=7, stride=2, padding=3, bias=False)
self.bn1 = BatchNorm2d(64)
self.relu = PReLU(64)
resnet_basenet = resnet34(pretrained=True)
blocks = [
    resnet_basenet.layer1,
    resnet_basenet.layer2,
    resnet_basenet.layer3,
    resnet_basenet.layer4
]
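The 16*16*512 output size mentioned in step (2) can be checked with a little shape arithmetic. The sketch below is only an illustration under stated assumptions: a 256*256 input resolution, no max-pooling between conv1 and layer1 (the snippet above lists none), and the standard ResNet-34 stage strides of 1, 2, 2, 2. The helper names `conv_out` and `trace_backbone` are made up for this example.

```python
def conv_out(size: int, kernel: int, stride: int, padding: int) -> int:
    """Spatial output size of a convolution (standard floor formula)."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_backbone(size: int = 256, channels: int = 6):
    """Return (stage name, (channels, height, width)) for each backbone stage.

    Assumes a square input of `size` pixels (256 is an assumption) and
    6 channels, i.e. the input image concatenated with its initial inversion.
    """
    shapes = [("input", (channels, size, size))]

    # conv1 from the snippet above: kernel 7, stride 2, padding 3, 64 filters.
    size = conv_out(size, kernel=7, stride=2, padding=3)
    shapes.append(("conv1", (64, size, size)))

    # ResNet-34 stages: (name, output channels, stride of the first 3x3 conv).
    stages = [("layer1", 64, 1), ("layer2", 128, 2),
              ("layer3", 256, 2), ("layer4", 512, 2)]
    for name, out_channels, stride in stages:
        size = conv_out(size, kernel=3, stride=stride, padding=1)
        shapes.append((name, (out_channels, size, size)))
    return shapes


for name, shape in trace_backbone():
    print(f"{name:>6}: {shape}")
```

Under these assumptions the last stage has 512 channels at 16*16 resolution, which matches the feature map size quoted in the text.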
(3) The Refinement Blocks refine the generator parameters, but they introduce too many trainable parameters. To solve this problem, a Shared Refinement Block is introduced, as shown in the figure below:
Two fully connected layers are shared across the non-toRGB layers whose kernels have shape 3*3*512*512. This effectively reduces the number of trainable parameters.
P.S. The shape 3*3*512*512 means: kernel size * kernel size * input depth * output depth
The specific layers can be seen in: blocks
Why are the toRGB layers left out of the weight sharing?
Because experiments showed that including them would harm the GAN's editing ability.
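To get a feeling for the numbers behind the parameter problem, here is a small arithmetic sketch. Only the 3*3*512*512 kernel shape comes from the text above; the 512-dimensional feature vector, the "naive" fully connected head, the two 512-to-512 layers, and the layer count of 10 are hypothetical values chosen purely for illustration, not the paper's actual accounting.

```python
def kernel_params(kernel_size: int, in_depth: int, out_depth: int) -> int:
    """Number of weights in a conv kernel: k * k * input depth * output depth."""
    return kernel_size * kernel_size * in_depth * out_depth


def head_params(num_layers: int, params_per_head: int, shared: bool) -> int:
    """Total parameters of the prediction heads, with or without sharing."""
    return params_per_head if shared else num_layers * params_per_head


per_layer = kernel_params(3, 512, 512)
print(per_layer)  # 2359296 weights in one 3*3*512*512 kernel

# Hypothetical naive head: one fully connected layer mapping a 512-d feature
# vector (assumed size) to an offset for every weight of a single kernel.
FEATURE_DIM = 512
naive_head = FEATURE_DIM * per_layer
print(naive_head)  # 1207959552, i.e. over a billion for a single layer

# Hypothetical comparison: two 512->512 fully connected layers (with biases)
# per generator layer versus one shared pair, for an assumed 10 layers.
two_fc = 2 * (512 * 512 + 512)
print(head_params(10, two_fc, shared=False), head_params(10, two_fc, shared=True))
```

The point of the sketch is only the scaling: predicting every kernel weight directly is infeasible, and sharing the fully connected layers makes their cost independent of the number of generator layers.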
(4) The generator is divided into 3 levels: coarse, medium, and fine. Each level controls a different level of detail in the generated image. Because the initial inversion tends to already capture the coarse details, the hypernetwork is applied only to the medium and fine generator layers.
(5) Iterative refinement: the generator weights are updated over several passes to obtain a more accurate inversion.
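The iterative refinement in step (5) can be sketched as a simple loop. This is a toy illustration, not the actual implementation: the update rule (accumulating the predicted offsets and applying them multiplicatively to the original weights) is an assumption made for this example, since the text above does not spell it out, and `hypernet` and `generate` are hypothetical stand-ins that operate on plain floats instead of images and weight tensors.

```python
from typing import Callable, List, Tuple


def iterative_refinement(
    x: float,
    y_init: float,
    theta: float,
    hypernet: Callable[[float, float], float],
    generate: Callable[[float], float],
    num_steps: int = 3,
) -> Tuple[float, List[float]]:
    """Refine the reconstruction by repeatedly updating the generator weights.

    x        : target "image" (a float in this toy example)
    y_init   : initial inversion produced by the encoder
    theta    : original generator "weights"
    hypernet : predicts a weight offset from (target, current reconstruction)
    generate : renders a reconstruction from the modulated weights
    """
    delta_total = 0.0  # offsets accumulated over all steps (assumed rule)
    y_hat = y_init
    history = [y_hat]
    for _ in range(num_steps):
        delta_total += hypernet(x, y_hat)        # new offset from current state
        theta_hat = theta * (1.0 + delta_total)  # modulated weights (assumed rule)
        y_hat = generate(theta_hat)              # updated reconstruction
        history.append(y_hat)
    return y_hat, history


# Toy stand-ins: the "generator" returns its weight, and the "hypernetwork"
# proposes an offset proportional to the remaining reconstruction error.
y_final, history = iterative_refinement(
    x=2.0,
    y_init=1.0,
    theta=1.0,
    hypernet=lambda x, y_hat: 0.5 * (x - y_hat),
    generate=lambda theta_hat: theta_hat,
)
print(history)  # [1.0, 1.5, 1.75, 1.875]
```

In this toy setting each pass halves the remaining error, which mirrors the idea that a few forward passes bring the reconstruction closer to the target without any per-image optimization.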
Training Losses
Because training is similar to training an encoder, a pixel-wise L2 loss and the LPIPS perceptual loss are used.
For the facial domain, an identity-based similarity loss is additionally used (computed with a pre-trained face recognition network).
For non-facial domains, a MoCo-based similarity loss is used.
$$\mathcal{L}_{2}(x, \hat{y})+\lambda_{\text{LPIPS}} \mathcal{L}_{\text{LPIPS}}(x, \hat{y})+\lambda_{\text{sim}} \mathcal{L}_{\text{sim}}(x, \hat{y})$$
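The objective above is a weighted sum of three terms. The following minimal sketch shows how the terms combine. The pixel-wise L2 term is computed directly on flat lists of pixel values; the LPIPS and similarity terms need pretrained networks, so they are passed in as already-computed numbers here. All numeric values, including the two lambda weights, are arbitrary placeholders and not the values used in the paper.

```python
from typing import Sequence


def l2_loss(x: Sequence[float], y_hat: Sequence[float]) -> float:
    """Pixel-wise L2 loss: mean squared difference between two images."""
    assert len(x) == len(y_hat), "images must have the same number of pixels"
    return sum((a - b) ** 2 for a, b in zip(x, y_hat)) / len(x)


def total_loss(l2: float, lpips: float, sim: float,
               lambda_lpips: float, lambda_sim: float) -> float:
    """L2 + lambda_LPIPS * L_LPIPS + lambda_sim * L_sim."""
    return l2 + lambda_lpips * lpips + lambda_sim * sim


# Toy "images" flattened to four pixels each.
x = [0.0, 1.0, 0.5, 0.25]
y_hat = [0.0, 0.5, 0.5, 0.75]

l2 = l2_loss(x, y_hat)  # (0 + 0.25 + 0 + 0.25) / 4 = 0.125
loss = total_loss(l2, lpips=0.5, sim=0.25, lambda_lpips=0.5, lambda_sim=0.5)
print(l2, loss)  # 0.125 0.5
```

The lambda weights control how strongly the perceptual and similarity terms influence training relative to the raw pixel error.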
Summary
We introduce HyperStyle, a novel StyleGAN inversion method that leverages recent advances in hypernetworks to achieve optimization-level reconstructions at encoder-like inference times. In a sense, HyperStyle learns to efficiently optimize the generator for a given target image. This mitigates the reconstruction-editability trade-off and allows existing editing techniques to be applied effectively to a wide range of inputs. In addition, HyperStyle generalizes well: even out-of-domain images that neither the hypernetwork nor the generator saw during training can be inverted well. Looking ahead, it is highly desirable to extend generalization further beyond the training domain, including robustness to misaligned images and unstructured domains. The former may be addressed by StyleGAN3, while the latter likely requires training on a richer set of images.
Generated images
Comparison of inversion quality:
Comparison of editability:
Copyright notice
Author: [Ericam_]. Please include the original link when reprinting, thank you.
https://en.chowdera.com/2022/135/202205142330300645.html