GAN
Abstract
- Training target: In the space of arbitrary functions $G$ and $D$, a unique solution exists, with $G$ recovering the training data distribution and $D$ equal to $\frac{1}{2}$ everywhere.
- Advantage: There is no need for any Markov chains or unrolled approximate inference networks during either training or generation of samples.
Introduction
- Difficulties: Deep generative models have had less of an impact, due to the difficulty of approximating many intractable probabilistic computations that arise in maximum likelihood estimation and related strategies.
- Principle:
- Generative model: A multilayer perceptron
- Discriminative model: A multilayer perceptron
Related work
- Generative models
Adversarial nets
Core Function
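This is the minimax value function from the paper, where $D(\mathbf{x})$ is the probability that $\mathbf{x}$ came from the data rather than from the generator distribution $p_g$:

$$\min_G \max_D V(D, G) = \mathbb{E}_{\mathbf{x} \sim p_{data}(\mathbf{x})}[\log D(\mathbf{x})] + \mathbb{E}_{\mathbf{z} \sim p_z(\mathbf{z})}[\log(1 - D(G(\mathbf{z})))]$$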
Points worth noting during training
- Optimizing $D$ to completion in the inner loop of training is computationally prohibitive, and on finite datasets would result in overfitting. Instead, we alternate between $k$ steps of optimizing $D$ and one step of optimizing $G$.
- At the beginning of training, $G$ is so poor that $D$ can reject its samples with high confidence, so $\log(1 - D(G(z)))$ saturates. Rather than training $G$ to minimize $\log(1 - D(G(z)))$, we can train $G$ to maximize $\log(D(G(z)))$, which provides much stronger gradients early in learning (see the sketch below).
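A minimal PyTorch sketch contrasting the two generator objectives; the modules and shapes below are illustrative assumptions, not the paper's setup:

```python
import torch
import torch.nn as nn

# Toy stand-ins for the paper's multilayer perceptrons; shapes are
# illustrative assumptions, not the paper's architecture.
G = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
D = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())

z = torch.randn(64, 10)   # minibatch of noise samples z
fake = G(z)
eps = 1e-8                # avoid log(0)

# Saturating minimax loss: minimize log(1 - D(G(z))).
# Early in training D(G(z)) is close to 0, so this gradient vanishes.
loss_saturating = torch.log(1 - D(fake) + eps).mean()

# Non-saturating alternative: maximize log D(G(z)), i.e. minimize its
# negation. Same fixed point of the game, much stronger early gradients.
loss_non_saturating = -torch.log(D(fake) + eps).mean()
```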
Theoretical Results
Algorithm
Minibatch stochastic gradient descent training of generative adversarial nets. The number of steps to apply to the discriminator, $k$, is a hyperparameter. We used $k = 1$, the least expensive option, in our experiments.
for number of training iterations do
    for k steps do
        Sample minibatch of m noise samples $\{z^{(1)}, \ldots, z^{(m)}\}$ from noise prior $p_g(\mathbf{z})$
        Sample minibatch of m examples $\{x^{(1)}, \ldots, x^{(m)}\}$ from data generating distribution $p_{data}(\mathbf{x})$
        Update the discriminator by ascending its stochastic gradient:
        $$\nabla_{\theta_d} \frac{1}{m} \sum_{i=1}^{m} \left[ \log D\left(x^{(i)}\right) + \log\left(1 - D\left(G\left(z^{(i)}\right)\right)\right) \right]$$
    end for
    Sample minibatch of m noise samples $\{z^{(1)}, \ldots, z^{(m)}\}$ from noise prior $p_g(\mathbf{z})$
    Update the generator by descending its stochastic gradient:
    $$\nabla_{\theta_g} \frac{1}{m} \sum_{i=1}^{m} \log\left(1 - D\left(G\left(z^{(i)}\right)\right)\right)$$
end for
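A compact PyTorch rendering of this loop; the toy MLPs, the 2-D Gaussian stand-in for $p_{data}$, and the hyperparameters are illustrative assumptions, not the paper's experimental setup:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
noise_dim, data_dim, m, k = 10, 2, 64, 1   # k = 1 as in the paper's experiments

G = nn.Sequential(nn.Linear(noise_dim, 64), nn.ReLU(), nn.Linear(64, data_dim))
D = nn.Sequential(nn.Linear(data_dim, 64), nn.ReLU(), nn.Linear(64, 1), nn.Sigmoid())
opt_d = torch.optim.SGD(D.parameters(), lr=0.05, momentum=0.9)
opt_g = torch.optim.SGD(G.parameters(), lr=0.05, momentum=0.9)
eps = 1e-8

def sample_data(m):
    """Stand-in for p_data(x): a fixed Gaussian blob."""
    return torch.randn(m, data_dim) * 0.5 + torch.tensor([2.0, 2.0])

for it in range(1000):
    # --- k steps of discriminator ascent ---
    for _ in range(k):
        z = torch.randn(m, noise_dim)   # z ~ noise prior
        x = sample_data(m)              # x ~ p_data(x)
        # Ascend log D(x) + log(1 - D(G(z))) <=> descend its negation.
        # detach() keeps this update from touching the generator.
        loss_d = -(torch.log(D(x) + eps)
                   + torch.log(1 - D(G(z).detach()) + eps)).mean()
        opt_d.zero_grad()
        loss_d.backward()
        opt_d.step()

    # --- one step of generator descent ---
    z = torch.randn(m, noise_dim)
    # Descend log(1 - D(G(z))); in practice the non-saturating
    # -log D(G(z)) from the note above is usually preferred.
    loss_g = torch.log(1 - D(G(z)) + eps).mean()
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()
```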
Global Optimality of $p_g = p_{data}$
Proposition 1
For $G$ fixed, the optimal discriminator $D$ is
$$D^*_G(\mathbf{x}) = \frac{p_{data}(\mathbf{x})}{p_{data}(\mathbf{x}) + p_g(\mathbf{x})}$$
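The reasoning step behind this, following the paper's proof: for fixed $G$, the value function can be written as

$$V(G, D) = \int_{\mathbf{x}} \left[ p_{data}(\mathbf{x}) \log D(\mathbf{x}) + p_g(\mathbf{x}) \log(1 - D(\mathbf{x})) \right] d\mathbf{x}$$

and for any $(a, b) \neq (0, 0)$, the map $y \mapsto a \log y + b \log(1 - y)$ attains its maximum on $[0, 1]$ at $y = \frac{a}{a + b}$, which yields $D^*_G$ pointwise.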
Theorem 1
The global minimum of the virtual training criterion $C(G)$ is achieved if and only if $p_g = p_{data}$. At that point, $C(G)$ achieves the value $-\log 4$.
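The proof rewrites $C(G) = \max_D V(G, D)$ in terms of the Jensen-Shannon divergence:

$$C(G) = -\log 4 + 2 \cdot JSD(p_{data} \,\|\, p_g)$$

Since the Jensen-Shannon divergence is non-negative and zero only when the two distributions are equal, the global minimum $-\log 4$ is attained exactly when $p_g = p_{data}$.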
Proposition 2
If $G$ and $D$ have enough capacity, and at each step of Algorithm 1 the discriminator is allowed to reach its optimum given $G$, and $p_g$ is updated so as to improve the criterion
$$\mathbb{E}_{\mathbf{x} \sim p_{data}}[\log D^*_G(\mathbf{x})] + \mathbb{E}_{\mathbf{x} \sim p_g}[\log(1 - D^*_G(\mathbf{x}))]$$
then $p_g$ converges to $p_{data}$.
Experiments
- Generator: a mixture of rectifier linear (ReLU) and sigmoid activations
- Discriminator: maxout activations + dropout (both architectures are sketched below)
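A rough PyTorch sketch of these architecture choices; the layer sizes are illustrative assumptions (MNIST-like shapes), and maxout is implemented by hand since PyTorch has no built-in maxout layer:

```python
import torch
import torch.nn as nn

class Maxout(nn.Module):
    """Maxout unit: k linear pieces, output is their elementwise max."""
    def __init__(self, d_in, d_out, k=2):
        super().__init__()
        self.k, self.d_out = k, d_out
        self.linear = nn.Linear(d_in, d_out * k)

    def forward(self, x):
        # (batch, d_out * k) -> (batch, d_out, k) -> max over the k pieces
        return self.linear(x).view(-1, self.d_out, self.k).max(dim=2).values

# Generator: ReLU hidden layers, sigmoid output (e.g. pixels in [0, 1]).
generator = nn.Sequential(
    nn.Linear(100, 1200), nn.ReLU(),
    nn.Linear(1200, 1200), nn.ReLU(),
    nn.Linear(1200, 784), nn.Sigmoid(),
)

# Discriminator: maxout hidden layers with dropout, sigmoid probability output.
discriminator = nn.Sequential(
    Maxout(784, 240, k=5), nn.Dropout(0.5),
    Maxout(240, 240, k=5), nn.Dropout(0.5),
    nn.Linear(240, 1), nn.Sigmoid(),
)
```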
Original address: GAN