DCGAN
Title: Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks
Abstract
CNNs have been used in multiple supervised tasks, but rarely in unsupervised learning.
By combining CNNs with GANs, this paper introduces DCGANs, showing convincing evidence of their power in generating data.
Introduction
- Target (in a nutshell): build good image representations
- Method: train GANs, then reuse parts of the generator and discriminator networks as feature extractors
- Targets (in detail):
- Evaluate a set of constraints on the architectural topology of convolutional GANs that make them stable to train in most settings (DCGAN)
- Use the trained discriminators for image classification tasks, showing competitive performance with other unsupervised algorithms
- Visualize the filters learnt by GANs, and show that specific filters have learned to draw specific objects
- Show that generators have interesting vector arithmetic properties, allowing easy manipulation of semantic qualities of generated samples
Related Work
Representation Learning from Unlabeled Data
- K-means
- auto-encoder
- Deep belief networks
Generating Natural Images
- Non-parametric models
- Matching against a database of existing images, often matching patches of images; used in texture synthesis, super-resolution, and in-painting
- Parametric models
- Variational sampling approaches to generating images: suffer from blurry samples
- An iterative forward diffusion process
- Generative Adversarial Networks
- Laplacian pyramid extension to GANs: generated objects still look wobbly
- A recurrent network and a deconvolution network approach
Visualizing the internals of CNNs
Gradient descent on the inputs lets us inspect the ideal image that activates certain subsets of filters.
Approach and Model Architecture
Historical attempts to train GANs with CNNs were unsuccessful, owing to an unstable training process among other reasons. This paper modifies the CNN architecture, resulting in stable training across a range of datasets and allowing higher-resolution and deeper generative models.
All convolutional net
Replace deterministic spatial pooling functions with strided convolutions in the discriminator and fractional-strided convolutions in the generator, letting the network learn its own spatial downsampling and upsampling.
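A minimal sketch of this idea (PyTorch is assumed here; the paper does not prescribe a framework): a stride-2 convolution learns downsampling, and a fractional-strided (transposed) convolution learns upsampling, in place of fixed pooling or interpolation.

```python
import torch
import torch.nn as nn

# Discriminator-style downsampling: a stride-2 convolution halves the
# spatial size while learning how to do so.
down = nn.Conv2d(3, 64, kernel_size=4, stride=2, padding=1)
x = torch.randn(1, 3, 64, 64)
print(down(x).shape)  # torch.Size([1, 64, 32, 32])

# Generator-style upsampling: a fractional-strided (transposed)
# convolution doubles the spatial size instead of fixed interpolation.
up = nn.ConvTranspose2d(64, 3, kernel_size=4, stride=2, padding=1)
y = torch.randn(1, 64, 32, 32)
print(up(y).shape)  # torch.Size([1, 3, 64, 64])
```

The kernel size 4 / stride 2 / padding 1 combination keeps the spatial dimensions at exact halves and doubles, which is why it appears throughout DCGAN-style architectures.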
Trend towards eliminating fully connected layers on top of convolutional features
Global average pooling increased model stability but hurt convergence speed; instead, the highest convolutional features are connected directly to the input of the generator and the output of the discriminator.
Batch Normalization
Applying batch normalization directly to all layers resulted in sample oscillation and model instability, so it is omitted from the generator output layer and the discriminator input layer.
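The placement rule can be sketched as follows (PyTorch assumed, and the layer widths here are illustrative, not the paper's): batch norm after every layer except the generator's output.

```python
import torch
import torch.nn as nn

gen = nn.Sequential(
    # project 100-d noise to 4x4 feature maps; batchnorm on hidden layers
    nn.ConvTranspose2d(100, 256, 4, 1, 0), nn.BatchNorm2d(256), nn.ReLU(),
    nn.ConvTranspose2d(256, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.ReLU(),
    # output layer: NO batchnorm; tanh keeps samples in [-1, 1]
    nn.ConvTranspose2d(128, 3, 4, 2, 1), nn.Tanh(),
)
z = torch.randn(8, 100, 1, 1)
print(gen(z).shape)  # torch.Size([8, 3, 16, 16])
```

Normalizing the output layer would constrain the statistics of the generated samples themselves, which is what causes the oscillation the paper reports.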
ReLU activation
In contrast to the original GAN paper (which used maxout activations), the generator uses ReLU in all layers except the output layer, which uses tanh; the discriminator uses LeakyReLU in all layers.
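On the discriminator side the same conventions look like this (again a PyTorch sketch with illustrative layer widths): LeakyReLU everywhere, and no batch norm on the input layer.

```python
import torch
import torch.nn as nn

disc = nn.Sequential(
    # input layer: no batchnorm, LeakyReLU with negative slope 0.2
    nn.Conv2d(3, 64, 4, 2, 1), nn.LeakyReLU(0.2),
    nn.Conv2d(64, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.LeakyReLU(0.2),
    # final layer maps features to a single real/fake probability
    nn.Conv2d(128, 1, 4, 1, 0), nn.Sigmoid(),
)
x = torch.randn(8, 3, 16, 16)
print(disc(x).shape)  # torch.Size([8, 1, 1, 1])
```

The leaky slope keeps a small gradient flowing for negative pre-activations, which the paper found helpful especially for higher-resolution modeling.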
Training details
- Pre-processing: scale input images to the range of the tanh activation function, $[-1, 1]$
- Batch size: mini-batches of 128
- Weight initialization: zero-centered normal distribution with standard deviation 0.02
- LeakyReLU: negative slope 0.2
- Optimizer: Adam with $\beta_1 = 0.5$
- Learning rate: 0.0002
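The hyper-parameters above translate directly into a few lines of setup (PyTorch assumed; `net` is a stand-in for either network):

```python
import torch
import torch.nn as nn

net = nn.Sequential(nn.Conv2d(3, 64, 4, 2, 1), nn.LeakyReLU(0.2))

def init_weights(m):
    # zero-centered normal distribution with standard deviation 0.02
    if isinstance(m, (nn.Conv2d, nn.ConvTranspose2d)):
        nn.init.normal_(m.weight, mean=0.0, std=0.02)

net.apply(init_weights)

# Adam with beta1 = 0.5 and learning rate 0.0002
opt = torch.optim.Adam(net.parameters(), lr=2e-4, betas=(0.5, 0.999))

# pre-processing: scale [0, 255] pixel values into tanh's range [-1, 1]
img = torch.randint(0, 256, (128, 3, 64, 64)).float()
img = img / 127.5 - 1.0
```

Lowering Adam's default `beta1` from 0.9 to 0.5 reduces momentum, which the paper found necessary to stabilize training.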
Using the GAN as a feature extractor (discriminator)
Omitted.
Manipulating the generator representation (generator)
Omitted. (Adding and subtracting vectors in the generator's representation space manipulates semantics, reducing the amount of data needed to model a complex data distribution.)
Original paper: DCGAN