This post summarizes a set of connected tricks and methods I explored with the help of my co-authors. Following the previous post about the stability properties of GANs, the overall aim was to improve our ability to train generative models stably and accurately, though we went through a lot of variations and experiments with different methods along the way. I’ll try to explain why I think these things worked, but we’re still exploring that ourselves.

The basic problem is that generative neural network models seem to either be stable but fail to properly capture higher-order correlations in the data distribution (which manifests as blurriness in the image domain), or be very unstable to train because they have to learn both the distribution and the loss function at the same time, leading to issues like non-stationarity and positive feedback loops. The way GANs capture higher-order correlations is to say: if there is any statistic that distinguishes generated samples from real examples, the discriminator will exploit it. That is, they try to make each sample individually indistinguishable from real examples, rather than merely matching aggregate statistics. The cost is the instability that comes from not having a joint loss function – the discriminator can make a move that disproportionately harms the generator, and vice versa.
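As a reference point (not the post's own notation), the standard GAN value function from Goodfellow et al. makes the "no joint loss" issue concrete:

\[
\min_G \max_D \; V(D, G) \;=\; \mathbb{E}_{x \sim p_{\mathrm{data}}}\big[\log D(x)\big] \;+\; \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]
\]

The discriminator ascends V while the generator descends it (in practice the generator usually maximizes \(\log D(G(z))\) instead), so the alternating updates follow a vector field that is not the gradient of any single scalar objective – one way to see where the non-stationarity and feedback effects come from.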
via http://www.araya.org/archives/1306