DCGAN
Title: Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks
Abstract
CNNs have been used in multiple supervised tasks, but rarely in unsupervised learning.
By combining CNNs with GANs, this paper introduces DCGANs, showing convincing evidence of their power in generating data.
Introduction
- Target (in a nutshell): build good image representations
- Method: train GANs, then reuse parts of the generator and discriminator networks as feature extractors
- Targets (in detail):
- Evaluate a set of constraints on the architectural topology of convolutional GANs that make them stable to train in most settings (DCGAN)
- Use the trained discriminators for image classification tasks, showing competitive performance with other unsupervised algorithms
- Visualize the filters learnt by GANs, and show that specific filters have learned to draw specific objects
- Show that generators have interesting vector arithmetic properties, allowing easy manipulation of semantic qualities of generated samples
Related Work
Representation Learning from Unlabeled Data
- K-means
- auto-encoder
- Deep belief networks
Generating Natural Images
- Non-parametric models
- Matching against a database of existing images, often matching patches of images; used in texture synthesis, super-resolution, and in-painting
- Parametric models
- Variational sampling approaches to generating images: suffer from blurry samples
- An iterative forward diffusion process
- Generative Adversarial Networks
- Laplacian pyramid extension to GANs: generated objects still look wobbly
- A recurrent network and a deconvolution network approach
Visualizing the internals of CNNs
Gradient descent on the inputs lets us inspect the ideal image that activates certain subsets of filters.
Approach and Model Architecture
Historical attempts to train GANs with CNNs were unsuccessful, owing to an unstable training process among other reasons. This paper modifies the CNN architecture, resulting in stable training across a range of datasets and allowing higher-resolution and deeper generative models.
All convolutional net
Replace deterministic spatial pooling functions with strided convolutions in the discriminator and fractional-strided convolutions in the generator, letting the network learn its own spatial downsampling and upsampling.
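A minimal sketch of this idea (PyTorch is assumed here; the paper does not prescribe a framework): a stride-2 convolution learns downsampling, and a fractional-strided (transposed) convolution learns upsampling, in place of fixed pooling or interpolation.

```python
import torch
import torch.nn as nn

# Discriminator-style downsampling: a stride-2 convolution halves the
# spatial size while learning how to do so.
down = nn.Conv2d(3, 64, kernel_size=4, stride=2, padding=1)
x = torch.randn(1, 3, 64, 64)
print(down(x).shape)  # torch.Size([1, 64, 32, 32])

# Generator-style upsampling: a fractional-strided (transposed)
# convolution doubles the spatial size instead of fixed interpolation.
up = nn.ConvTranspose2d(64, 3, kernel_size=4, stride=2, padding=1)
y = torch.randn(1, 64, 32, 32)
print(up(y).shape)  # torch.Size([1, 3, 64, 64])
```

The kernel size 4 / stride 2 / padding 1 combination keeps the spatial dimensions at exact halves and doubles, which is why it appears throughout DCGAN-style architectures.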
Trend towards eliminating fully connected layers on top of convolutional features
Global average pooling increased model stability but hurt convergence speed; instead, the highest convolutional features are connected directly to the input of the generator and the output of the discriminator.
Batch Normalization
Applying batch normalization directly to all layers resulted in sample oscillation and model instability, so it is omitted from the generator output layer and the discriminator input layer.
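The placement rule can be sketched as follows (PyTorch assumed, and the layer widths here are illustrative, not the paper's): batch norm after every layer except the generator's output.

```python
import torch
import torch.nn as nn

gen = nn.Sequential(
    # project 100-d noise to 4x4 feature maps; batchnorm on hidden layers
    nn.ConvTranspose2d(100, 256, 4, 1, 0), nn.BatchNorm2d(256), nn.ReLU(),
    nn.ConvTranspose2d(256, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.ReLU(),
    # output layer: NO batchnorm; tanh keeps samples in [-1, 1]
    nn.ConvTranspose2d(128, 3, 4, 2, 1), nn.Tanh(),
)
z = torch.randn(8, 100, 1, 1)
print(gen(z).shape)  # torch.Size([8, 3, 16, 16])
```

Normalizing the output layer would constrain the statistics of the generated samples themselves, which is what causes the oscillation the paper reports.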
ReLU activation
In contrast to the original GAN paper (which used maxout activations), the generator uses ReLU in all layers except the output layer, which uses tanh; the discriminator uses LeakyReLU in all layers.
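On the discriminator side the same conventions look like this (again a PyTorch sketch with illustrative layer widths): LeakyReLU everywhere, and no batch norm on the input layer.

```python
import torch
import torch.nn as nn

disc = nn.Sequential(
    # input layer: no batchnorm, LeakyReLU with negative slope 0.2
    nn.Conv2d(3, 64, 4, 2, 1), nn.LeakyReLU(0.2),
    nn.Conv2d(64, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.LeakyReLU(0.2),
    # final layer maps features to a single real/fake probability
    nn.Conv2d(128, 1, 4, 1, 0), nn.Sigmoid(),
)
x = torch.randn(8, 3, 16, 16)
print(disc(x).shape)  # torch.Size([8, 1, 1, 1])
```

The leaky slope keeps a small gradient flowing for negative pre-activations, which the paper found helpful especially for higher-resolution modeling.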
Training details
- Pre-processing: scale input images to the range of the tanh activation function, $[-1, 1]$
- Batch size: mini-batches of 128
- Weight initialization: zero-centered normal distribution with standard deviation 0.02
- LeakyReLU: negative slope 0.2
- Optimizer: Adam with $\beta_1 = 0.5$
- Learning rate: 0.0002
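The hyper-parameters above translate directly into a few lines of setup (PyTorch assumed; `net` is a stand-in for either network):

```python
import torch
import torch.nn as nn

net = nn.Sequential(nn.Conv2d(3, 64, 4, 2, 1), nn.LeakyReLU(0.2))

def init_weights(m):
    # zero-centered normal distribution with standard deviation 0.02
    if isinstance(m, (nn.Conv2d, nn.ConvTranspose2d)):
        nn.init.normal_(m.weight, mean=0.0, std=0.02)

net.apply(init_weights)

# Adam with beta1 = 0.5 and learning rate 0.0002
opt = torch.optim.Adam(net.parameters(), lr=2e-4, betas=(0.5, 0.999))

# pre-processing: scale [0, 255] pixel values into tanh's range [-1, 1]
img = torch.randint(0, 256, (128, 3, 64, 64)).float()
img = img / 127.5 - 1.0
```

Lowering Adam's default `beta1` from 0.9 to 0.5 reduces momentum, which the paper found necessary to stabilize training.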
Using the GAN as a feature extractor (discriminator)
Omitted.
Manipulating the generator representation (generator)
Omitted. (Adding and subtracting vectors in the generator's representation space manipulates semantics, reducing the amount of data needed to model a complex data distribution.)
Original paper: DCGAN