What specific features should visual neurons encode, given the infinity of real-world images and the limited number of neurons available to represent them? We investigated neuronal selectivity in monkey inferotemporal cortex via the vast hypothesis space of a generative deep neural network, avoiding assumptions about features or semantic categories. A genetic algorithm searched this space for stimuli that maximized neuronal firing. This led to the evolution of rich synthetic images of objects with complex combinations of shapes, colors, and textures, sometimes resembling animals or familiar people, other times revealing novel patterns that did not map to any clear semantic category. These results expand our conception of the dictionary of features encoded in the cortex, and the approach can potentially reveal the internal representations of any system whose input can be captured by a generative model.
That something else, call it imagination or call it dreaming, does not require validation with immediate reality. The closest incarnation we have today is the generative adversarial network (GAN). A GAN consists of two networks, a generator and a discriminator. One can consider a discriminator as a neural network that acts in concert with the objective function. That is, it validates an internal generator network with reality. The generator is an automation that recreates an approximation of reality. A GAN works using back-propagation and it does perform unsupervised learning. So perhaps unsupervised learning doesn’t require an objective function; however, it may still need back-propagation.
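To make the generator/discriminator split concrete, here is a minimal sketch of a one-dimensional toy GAN in plain Python. Everything in it is an assumption made for illustration: the names (`train_toy_gan`, `sigmoid`), the one-parameter generator `G(z) = z + b`, the logistic discriminator `D(x) = sigmoid(w*x + c)`, and the Gaussian "reality" with mean 3. The gradients are written out by hand; they are what back-propagation would compute automatically in a deeper network. A toy like this is not guaranteed to settle, and may well oscillate around the target.

```python
import math
import random


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def train_toy_gan(steps=2000, lr=0.05, real_mean=3.0, seed=0):
    """Alternate discriminator and generator updates on 1-D data.

    Hypothetical toy setup (not from the text):
      generator      G(z) = z + b, with z ~ N(0, 1)
      discriminator  D(x) = sigmoid(w * x + c)
    """
    rng = random.Random(seed)
    b = 0.0          # generator parameter
    w, c = 0.0, 0.0  # discriminator parameters
    for _ in range(steps):
        x_real = rng.gauss(real_mean, 1.0)   # a sample of "reality"
        x_fake = rng.gauss(0.0, 1.0) + b     # a generated sample

        # Discriminator step: gradient ascent on
        #   log D(x_real) + log(1 - D(x_fake)).
        d_real = sigmoid(w * x_real + c)
        d_fake = sigmoid(w * x_fake + c)
        w += lr * ((1.0 - d_real) * x_real - d_fake * x_fake)
        c += lr * ((1.0 - d_real) - d_fake)

        # Generator step: gradient ascent on log D(G(z)),
        # using the freshly updated discriminator.
        d_fake = sigmoid(w * x_fake + c)
        b += lr * (1.0 - d_fake) * w
    return b, w, c


b, w, c = train_toy_gan()
print(f"generator shift b = {b:.3f}, discriminator w = {w:.3f}, c = {c:.3f}")
```

Note that the generator never sees a real sample directly: its only training signal is the discriminator's output, which is the sense in which the discriminator stands in for an objective function.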
“The video, called “Alternative Face v1.1”, is the work of Mario Klingemann, a German artist. It plays audio from an NBC interview with Ms Conway through the mouth of Ms Hardy’s digital ghost. The video is wobbly and pixelated; a competent visual-effects shop could do much better. But Mr Klingemann did not fiddle with editing software to make it. Instead, he took only a few days to create the clip on a desktop computer using a generative adversarial network (GAN), a type of machine-learning algorithm. His computer spat it out automatically after being force fed old music videos of Ms Hardy. It is a recording of something that never happened.”
This post summarizes a bunch of connected tricks and methods I explored with the help of my co-authors. Following the previous post, about the stability properties of GANs, the overall aim was to improve our ability to train generative models stably and accurately, but we went through a lot of variations and experiments with different methods on the way. I’ll try to explain why I think these things worked, but we’re still exploring it ourselves as well. The basic problem is that generative neural network models seem to either be stable but fail to properly capture higher-order correlations in the data distribution (which manifests as blurriness in the image domain), or they are very unstable to train due to having to learn both the distribution and the loss function at the same time, leading to issues like non-stationarity and positive feedback loops. The way GANs capture higher-order correlations is to say ‘if there’s any distinguishable statistic from real examples, the discriminator will exploit that’. That is, they try to make things individually indistinguishable from real examples, rather than in the aggregate. The cost of that is the instability arising from not having a joint loss function: the discriminator can make a move that disproportionately harms the generator, and vice versa.
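The instability from not having a joint loss function can be illustrated with a standard toy, which is an assumption here and not one of the experiments from this post: two players share the payoff f(x, y) = x * y, one descending it and the other ascending it, both updating simultaneously. The function name `simultaneous_gradient_steps` and the step size are hypothetical. Each update multiplies the squared distance from the equilibrium (0, 0) by exactly 1 + lr², so the players spiral outward instead of converging.

```python
import math


def simultaneous_gradient_steps(x, y, lr=0.1, steps=100):
    """Simultaneous gradient play on f(x, y) = x * y.

    x takes descent steps on f, y takes ascent steps on f.
    Returns the distance from the equilibrium (0, 0) after each step.
    """
    radii = []
    for _ in range(steps):
        grad_x = y  # df/dx
        grad_y = x  # df/dy
        # Both players move at once, each using the other's old value.
        x, y = x - lr * grad_x, y + lr * grad_y
        radii.append(math.hypot(x, y))
    return radii


radii = simultaneous_gradient_steps(1.0, 1.0)
print(f"distance after 1 step: {radii[0]:.4f}, after 100 steps: {radii[-1]:.4f}")
```

Neither player is making a mistake by its own criterion; the growth comes purely from each one optimizing against a target that the other keeps moving, which is a small-scale version of the non-stationarity described above.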