GAN
Abstract
- Training target: In the space of arbitrary functions $G$ and $D$, a unique solution exists, with $G$ recovering the training data distribution and $D$ equal to $\frac{1}{2}$ everywhere.
- Advantage: There is no need for any Markov chains or unrolled approximate inference networks during either training or generation of samples.
Introduction
- Difficulties: Deep generative models have had less of an impact, due to the difficulty of approximating many intractable probabilistic computations that arise in maximum likelihood estimation and related strategies.
- Principle:
- Generative model: A multilayer perceptron
- Discriminative model: A multilayer perceptron
Related work
- Generative models
Adversarial nets
Core Function
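This is the minimax value function from the paper, where $D(\mathbf{x})$ is the probability that $\mathbf{x}$ came from the data rather than from the generator distribution $p_g$:

$$\min_G \max_D V(D, G) = \mathbb{E}_{\mathbf{x} \sim p_{data}(\mathbf{x})}[\log D(\mathbf{x})] + \mathbb{E}_{\mathbf{z} \sim p_z(\mathbf{z})}[\log(1 - D(G(\mathbf{z})))]$$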
Points worth noting during training
- Optimizing $D$ to completion in the inner loop of training is computationally prohibitive, and on finite datasets would result in overfitting. Instead, we alternate between $k$ steps of optimizing $D$ and one step of optimizing $G$.
- At the beginning of training, $G$ is so poor that $D$ can reject its samples with high confidence, so $\log(1 - D(G(z)))$ saturates. Rather than training $G$ to minimize $\log(1 - D(G(z)))$, we can train $G$ to maximize $\log(D(G(z)))$, which provides much stronger gradients early in learning (see the sketch below).
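A minimal PyTorch sketch contrasting the two generator objectives; the modules and shapes below are illustrative assumptions, not the paper's setup:

```python
import torch
import torch.nn as nn

# Toy stand-ins for the paper's multilayer perceptrons; shapes are
# illustrative assumptions, not the paper's architecture.
G = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
D = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())

z = torch.randn(64, 10)   # minibatch of noise samples z
fake = G(z)
eps = 1e-8                # avoid log(0)

# Saturating minimax loss: minimize log(1 - D(G(z))).
# Early in training D(G(z)) is close to 0, so this gradient vanishes.
loss_saturating = torch.log(1 - D(fake) + eps).mean()

# Non-saturating alternative: maximize log D(G(z)), i.e. minimize its
# negation. Same fixed point of the game, much stronger early gradients.
loss_non_saturating = -torch.log(D(fake) + eps).mean()
```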
Theoretical Results
Algorithm
Minibatch stochastic gradient descent training of generative adversarial nets. The number of steps to apply to the discriminator, $k$, is a hyperparameter. We used $k = 1$, the least expensive option, in our experiments.
for number of training iterations do
    for k steps do
        Sample minibatch of m noise samples $\{z^{(1)}, \ldots, z^{(m)}\}$ from noise prior $p_g(\mathbf{z})$
        Sample minibatch of m examples $\{x^{(1)}, \ldots, x^{(m)}\}$ from data generating distribution $p_{data}(\mathbf{x})$
        Update the discriminator by ascending its stochastic gradient:
        $$\nabla_{\theta_d} \frac{1}{m} \sum_{i=1}^{m} \left[ \log D\left(x^{(i)}\right) + \log\left(1 - D\left(G\left(z^{(i)}\right)\right)\right) \right]$$
    end for
    Sample minibatch of m noise samples $\{z^{(1)}, \ldots, z^{(m)}\}$ from noise prior $p_g(\mathbf{z})$
    Update the generator by descending its stochastic gradient:
    $$\nabla_{\theta_g} \frac{1}{m} \sum_{i=1}^{m} \log\left(1 - D\left(G\left(z^{(i)}\right)\right)\right)$$
end for
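A compact PyTorch rendering of this loop; the toy MLPs, the 2-D Gaussian stand-in for $p_{data}$, and the hyperparameters are illustrative assumptions, not the paper's experimental setup:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
noise_dim, data_dim, m, k = 10, 2, 64, 1   # k = 1 as in the paper's experiments

G = nn.Sequential(nn.Linear(noise_dim, 64), nn.ReLU(), nn.Linear(64, data_dim))
D = nn.Sequential(nn.Linear(data_dim, 64), nn.ReLU(), nn.Linear(64, 1), nn.Sigmoid())
opt_d = torch.optim.SGD(D.parameters(), lr=0.05, momentum=0.9)
opt_g = torch.optim.SGD(G.parameters(), lr=0.05, momentum=0.9)
eps = 1e-8

def sample_data(m):
    """Stand-in for p_data(x): a fixed Gaussian blob."""
    return torch.randn(m, data_dim) * 0.5 + torch.tensor([2.0, 2.0])

for it in range(1000):
    # --- k steps of discriminator ascent ---
    for _ in range(k):
        z = torch.randn(m, noise_dim)   # z ~ noise prior
        x = sample_data(m)              # x ~ p_data(x)
        # Ascend log D(x) + log(1 - D(G(z))) <=> descend its negation.
        # detach() keeps this update from touching the generator.
        loss_d = -(torch.log(D(x) + eps)
                   + torch.log(1 - D(G(z).detach()) + eps)).mean()
        opt_d.zero_grad()
        loss_d.backward()
        opt_d.step()

    # --- one step of generator descent ---
    z = torch.randn(m, noise_dim)
    # Descend log(1 - D(G(z))); in practice the non-saturating
    # -log D(G(z)) from the note above is usually preferred.
    loss_g = torch.log(1 - D(G(z)) + eps).mean()
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()
```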
Global Optimality of $p_g = p_{data}$
Proposition 1
For $G$ fixed, the optimal discriminator $D$ is
$$D^*_G(\mathbf{x}) = \frac{p_{data}(\mathbf{x})}{p_{data}(\mathbf{x}) + p_g(\mathbf{x})}$$
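The reasoning step behind this, following the paper's proof: for fixed $G$, the value function can be written as

$$V(G, D) = \int_{\mathbf{x}} \left[ p_{data}(\mathbf{x}) \log D(\mathbf{x}) + p_g(\mathbf{x}) \log(1 - D(\mathbf{x})) \right] d\mathbf{x}$$

and for any $(a, b) \neq (0, 0)$, the map $y \mapsto a \log y + b \log(1 - y)$ attains its maximum on $[0, 1]$ at $y = \frac{a}{a + b}$, which yields $D^*_G$ pointwise.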
Theorem 1
The global minimum of the virtual training criterion $C(G)$ is achieved if and only if $p_g = p_{data}$. At that point, $C(G)$ achieves the value $-\log 4$.
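The proof rewrites $C(G) = \max_D V(G, D)$ in terms of the Jensen-Shannon divergence:

$$C(G) = -\log 4 + 2 \cdot JSD(p_{data} \,\|\, p_g)$$

Since the Jensen-Shannon divergence is non-negative and zero only when the two distributions are equal, the global minimum $-\log 4$ is attained exactly when $p_g = p_{data}$.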
Proposition 2
If $G$ and $D$ have enough capacity, and at each step of Algorithm 1 the discriminator is allowed to reach its optimum given $G$, and $p_g$ is updated so as to improve the criterion
$$\mathbb{E}_{\mathbf{x} \sim p_{data}}[\log D^*_G(\mathbf{x})] + \mathbb{E}_{\mathbf{x} \sim p_g}[\log(1 - D^*_G(\mathbf{x}))]$$
then $p_g$ converges to $p_{data}$.
Experiments
- Generator: a mixture of rectifier linear (ReLU) and sigmoid activations
- Discriminator: maxout activations + dropout (both architectures are sketched below)
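A rough PyTorch sketch of these architecture choices; the layer sizes are illustrative assumptions (MNIST-like shapes), and maxout is implemented by hand since PyTorch has no built-in maxout layer:

```python
import torch
import torch.nn as nn

class Maxout(nn.Module):
    """Maxout unit: k linear pieces, output is their elementwise max."""
    def __init__(self, d_in, d_out, k=2):
        super().__init__()
        self.k, self.d_out = k, d_out
        self.linear = nn.Linear(d_in, d_out * k)

    def forward(self, x):
        # (batch, d_out * k) -> (batch, d_out, k) -> max over the k pieces
        return self.linear(x).view(-1, self.d_out, self.k).max(dim=2).values

# Generator: ReLU hidden layers, sigmoid output (e.g. pixels in [0, 1]).
generator = nn.Sequential(
    nn.Linear(100, 1200), nn.ReLU(),
    nn.Linear(1200, 1200), nn.ReLU(),
    nn.Linear(1200, 784), nn.Sigmoid(),
)

# Discriminator: maxout hidden layers with dropout, sigmoid probability output.
discriminator = nn.Sequential(
    Maxout(784, 240, k=5), nn.Dropout(0.5),
    Maxout(240, 240, k=5), nn.Dropout(0.5),
    nn.Linear(240, 1), nn.Sigmoid(),
)
```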
Original address: GAN