Introduction to Machine Learning
CS 675
Dr. Przemyslaw Musialski
Department of Computer Science
New Jersey Institute of Technology

Generating Data (is exciting)
https://arxiv.org/pdf/1708.05509.pdf

Autoencoder
ENCODER -> Latent Space Z -> DECODER
We train the two networks by minimizing the reconstruction loss:
$E = \sum_i \| x_i - \hat{x}_i \|^2$
This is an autoencoder. It gets that name because it automatically finds the best way to encode the input so that the decoded version is as close as possible to the input.

Outline
• Motivation for Variational Autoencoders (VAE)
• Mechanics of VAE
• Separability of VAE
• The math behind everything
• Generative models
• GANs: Generative Adversarial Networks

Variational Autoencoders

VAE Likelihood
$p_\theta(x) = \int p_\theta(x \mid z)\, p_\theta(z)\, dz$, with prior $z \sim N(0, I)$ and $p_\theta(x \mid z)$ given by a neural network.
This integral is difficult to approximate in high dimensions through sampling: for most values of z, p(x|z) is close to 0.

VAE Likelihood
$p_\theta(x) = \int p_\theta(x \mid z)\, q_\phi(z \mid x)\, dz$, where $q_\phi(z \mid x)$ is another neural network.
$q_\phi(z \mid x)$ is a proposal distribution: it is likely to produce values of z for which p(x|z) is non-zero.

Blending Latent Variables
(Figure sequence: encoder -> latent space, parameterized by mean $\mu$ and SD $\sigma$ -> decoder.) We train with the 1st sample again, and the latent space starts to re-organize: the 3 is pushed away. And again, many times. Now let's test again. Training on 3's again, many times.

VAE Architecture
The encoder $q_\phi(z \mid x)$ outputs a mean $\mu$ and an SD $\sigma$; we sample z from $N(\mu, \sigma)$; the decoder is $p_\theta(x \mid z)$.

VAE Loss
$-\mathbb{E}_{z \sim q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big] + \mathrm{KL}\big(q_\phi(z \mid x)\,\|\,p_\theta(z)\big)$
The first term is the reconstruction loss; the KL term says the proposal distribution should resemble a Gaussian.

Training VAE
Apply stochastic gradient descent. The sampling step is not differentiable, so we use a re-parameterization trick: move sampling to the input layer, so that the sampling step is independent of the model.

Reparameterization Trick
$z = \mu + \sigma \odot \epsilon$, with $\epsilon \sim N(0, I)$
(Figure: the encoder outputs $\mu$ and $\sigma$; $\epsilon$ enters as an extra input; z feeds the decoder. A minimal PyTorch sketch follows at the end of this VAE section.)

Parameter space of AE vs. VAE (figures)
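The sketch below pulls the VAE pieces above together in PyTorch: an encoder with $\mu$ and $\log\sigma^2$ heads, the reparameterization trick, and the two-term loss. The 784-dimensional input (flattened MNIST-like images), the layer sizes, and the closed-form Gaussian KL term are illustrative assumptions, not details from the slides.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    # Sizes are assumptions: 784-dim inputs, 256 hidden units, 2-D latent space.
    def __init__(self, x_dim=784, h_dim=256, z_dim=2):
        super().__init__()
        self.enc = nn.Linear(x_dim, h_dim)
        self.mu = nn.Linear(h_dim, z_dim)        # mean head
        self.log_var = nn.Linear(h_dim, z_dim)   # log(sigma^2) head (the "SD" output)
        self.dec = nn.Sequential(
            nn.Linear(z_dim, h_dim), nn.ReLU(), nn.Linear(h_dim, x_dim))

    def forward(self, x):
        h = F.relu(self.enc(x))
        mu, log_var = self.mu(h), self.log_var(h)
        # Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, I).
        # Sampling is moved to an input (eps), so gradients flow through mu and sigma.
        eps = torch.randn_like(mu)
        z = mu + torch.exp(0.5 * log_var) * eps
        return self.dec(z), mu, log_var

def vae_loss(x, x_hat_logits, mu, log_var):
    # Reconstruction term: -E_{z ~ q(z|x)} log p(x|z), here as per-pixel BCE.
    rec = F.binary_cross_entropy_with_logits(x_hat_logits, x, reduction="sum")
    # KL( q(z|x) || p(z) ) in closed form for a diagonal Gaussian vs. N(0, I).
    kl = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp())
    return rec + kl
```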
Outline
• Review of AE and VAE
• GANs: motivation, formalism, training
• Game theory: minimax
• Challenges: big samples, mode collapse

Generating Data
A Generative Adversarial Network, or GAN, is based on a clever idea: two different networks are pitted against one another. The goal is for one network to create new samples that differ from the training data, yet come so close to it that the other network cannot tell which samples are synthetic and which belong to the original training set.

Adversarial
Two athletes: Messi and Ronaldo. Adversaries!

Generative Adversarial Networks (GANs)
David the spam filterer vs. Gary the marketer. David looks at each email and decides: "yes, it is spam" or "no, it is not spam."

David checks the results: he has discarded a valid email and allowed some spams through. David and Gary both learn what went wrong: which emails were spam for real, and which were not.

Gary finds more sophisticated words than "spam," since David's rule so far is just "the email contains the word spam." They learn from the outcome again: this time David discarded a valid email but allowed fewer spams through, and he learns from what went wrong.

Understanding the confusion matrix through this example (spam is the POSITIVE class):
• TRUE/FALSE: whether the prediction matches the true label.
• POSITIVE/NEGATIVE: the predicted class.
Example: label "it was not spam," prediction "yes, it is spam" -> the prediction does not match the label and the predicted class is positive, so this is a False Positive.

                  predicted spam    predicted not spam
actual spam             TP                 FN
actual not spam         FP                 TN

True positive (TP): the discriminator sees a spam and predicts correctly. No further action is needed for the discriminator; the generator must do a better job.

False negative (FN): the discriminator sees a spam email but predicts that it is not spam. The spam slipped through, so the discriminator must learn more about spam.

False positive (FP): the discriminator flags a valid email as spam. The discriminator is punished and must learn what valid email looks like.

True negative (TN): the discriminator sees a valid email and correctly predicts that it is not spam. No action is needed for the discriminator or the generator. (A small sketch tallying these four cases follows.)
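To make the four cases concrete, here is a small sketch that tallies them for a batch of decisions; the function name `confusion_counts`, the boolean encoding (True = spam), and the toy data are hypothetical, chosen only to mirror the table above.

```python
# True = spam (the positive class); preds are the filter's decisions.
def confusion_counts(labels, preds):
    tp = sum(l and p for l, p in zip(labels, preds))          # spam caught
    fn = sum(l and not p for l, p in zip(labels, preds))      # spam let through
    fp = sum(not l and p for l, p in zip(labels, preds))      # valid email discarded
    tn = sum(not l and not p for l, p in zip(labels, preds))  # valid email allowed
    return {"TP": tp, "FN": fn, "FP": fp, "TN": tn}

# Toy batch: two spams (one caught, one missed) and two valid emails (one discarded).
print(confusion_counts([True, True, False, False],
                       [True, False, True, False]))
# {'TP': 1, 'FN': 1, 'FP': 1, 'TN': 1}
```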
The Discriminator
The discriminator is very simple. It takes a sample as input, and its output is a single value that reports the network's confidence that the input comes from the training set rather than being a fake. There are not many restrictions on what the discriminator can be.

The Generator
The generator takes a bunch of random numbers as input and produces a sample. If we build our generator to be deterministic, then the same input will always produce the same output. In that sense we can think of the input values as latent variables; but here the latent variables were not discovered by analyzing the input, as they were for the VAE.

Is it really noise? Remember our Variational Autoencoder: the encoder $q_\phi(z \mid x)$ produced a mean $\mu$ and SD $\sigma$, we sampled z from $N(\mu, \sigma)$, and the decoder $p_\theta(x \mid z)$ turned z into a sample. Strip away the encoder and the sampling step, and what remains (z -> decoder) has exactly the shape of a generator. So the random noise is not really "random": it represents (an image, in the example below) a point in the latent space.

Learning
The process, known as a learning round, accomplishes three jobs:
1. The discriminator learns to identify features that characterize a real sample.
2. The discriminator learns to identify features that reveal a fake sample.
3. The generator learns how to avoid including the features that the discriminator has learned to spot.

Training GANs
Sample a mini-batch of training images x and generator codes z; update G using backprop; update D using backprop. (A minimal sketch of one learning round follows the three cases below.)

Training the GAN
False negative (input: real / D says: fake): here we feed reals to the discriminator; the generator is not involved in this step at all. The error function only involves the discriminator, and if it makes a mistake the error drives a backpropagation step through the discriminator, updating its weights so that it gets better at recognizing reals.

True negative (input: fake / D says: fake):
• We start with random numbers going into the generator.
• The generator's output is a fake.
• The error function takes a large value if this fake is correctly identified as fake, meaning that the generator got caught.
• Backprop goes through the discriminator (which is frozen) to the generator.
• We update the generator so it can better learn how to fool the discriminator.

False positive (input: fake / D says: real): here we generate a fake and punish the discriminator if it classifies it as real.
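Below is a minimal sketch of one learning round in PyTorch, covering the three cases above (reals -> D, fakes -> D, fakes -> G). The models `G` and `D`, the optimizers, and `z_dim` are assumed to exist elsewhere, and `D` is assumed to end in a sigmoid so its output is a confidence in (0, 1); none of these details are prescribed by the slides.

```python
import torch
import torch.nn.functional as F

def learning_round(G, D, opt_G, opt_D, reals, z_dim=100):
    n = reals.size(0)
    ones, zeros = torch.ones(n, 1), torch.zeros(n, 1)   # labels: real / fake

    # Discriminator update: punish mistakes on reals (false negatives)
    # and on fakes (false positives). detach() keeps G out of this step.
    fakes = G(torch.randn(n, z_dim)).detach()
    d_loss = (F.binary_cross_entropy(D(reals), ones) +
              F.binary_cross_entropy(D(fakes), zeros))
    opt_D.zero_grad(); d_loss.backward(); opt_D.step()

    # Generator update: the discriminator's weights are held fixed (only opt_G
    # steps), but gradients pass through D to G, which is rewarded when D
    # labels its fakes "real".
    g_loss = F.binary_cross_entropy(D(G(torch.randn(n, z_dim))), ones)
    opt_G.zero_grad(); g_loss.backward(); opt_G.step()
    return d_loss.item(), g_loss.item()
```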
Generative Adversarial Networks
(Diagram, repeated across several slides.) The generative model, a neural network, maps noise $z \sim p_{noise}$ to a generated distribution $\hat{p}(x)$; the discriminator, another neural network, tries to tell samples from the generator apart from samples from the true distribution $p(x)$. Training alternates between the two players:
• Adjust the generator's weights by $\nabla_{\theta_G} L_G$ (the generator loss).
• Adjust the discriminator's weights by $\nabla_{\theta_D} L_D$ (the discriminator loss).

Building GANs: DCGAN
Deep Convolutional GAN (DCGAN), Alec Radford et al., 2016:
• Eliminate fully connected layers.
• Max pooling is bad here! Replace all max pooling with strided convolutions.
• Use transposed convolutions for upsampling.
• Use batch normalization.
(A minimal generator sketch along these lines closes this section.)

DCGAN on MNIST: generated digits (figure).

DCGAN interpretable vector arithmetic, with parallels to the autoencoder case:
https://arxiv.org/pdf/1511.06434v2.pdf

Evolution of GANs
Five years of improvement in artificially generated faces:
https://twitter.com/goodfellow_ian/status/969776035649675265?lang=en

Challenges
There is also no proof that GANs will converge. They do seem to perform very well most of the time once we find the right parameters, but there is no guarantee beyond that.

GAN Lab
Play with Generative Adversarial Networks (GANs) in your browser:
https://poloclub.github.io/ganlab/

Large-scale GAN samples (BigGAN):
https://arxiv.org/pdf/1809.11096.pdf
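Referring back to the DCGAN guidelines above, here is a minimal generator sketch in PyTorch along those lines, sized for 28x28 MNIST-like images. The channel counts and kernel sizes are illustrative assumptions, not the architecture from the paper.

```python
import torch
import torch.nn as nn

# No fully connected or pooling layers; transposed convolutions upsample,
# with batch normalization after each upsampling step.
generator = nn.Sequential(
    nn.ConvTranspose2d(100, 128, kernel_size=7, stride=1, padding=0),  # (N,100,1,1) -> (N,128,7,7)
    nn.BatchNorm2d(128), nn.ReLU(),
    nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1),   # -> (N,64,14,14)
    nn.BatchNorm2d(64), nn.ReLU(),
    nn.ConvTranspose2d(64, 1, kernel_size=4, stride=2, padding=1),     # -> (N,1,28,28)
    nn.Tanh(),  # image values in [-1, 1]
)

z = torch.randn(16, 100, 1, 1)   # 16 latent codes
print(generator(z).shape)        # torch.Size([16, 1, 28, 28])
```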