
Moving Lands

This gallery showcases a selection of the video pieces produced by the Loss Landscape (LL) project. The creations below belong to Phase 1; Phase 2 is currently ongoing. In-depth writings about these and other visualizations are in preparation.

LATENT visualizes the initial stages of training a Wasserstein GAN with gradient penalty (WGAN-GP) on the CelebA dataset. The first part of the video shows the first 1K steps of training, and the final part shows steps 10K to 11K. The middle part shows part of the loss landscape of the generator after the first 820 training steps. The morphology and dynamics of the generator’s landscape around the minimizer are very diverse and change quickly, expressing the complexity of the generator’s task and the array of possible routes ahead. Loss landscape generated with real data: WGAN-GP, CelebA dataset, SGD/Adam, batch size 64, training mode, 300K loss points, weight range of 1, latent space dimension 200; the generator’s landscape is sometimes inverted for visual purposes, and the critic’s loss is log-scaled (original loss values) and adapted for visualization.
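
As a rough illustration of the objective behind this training run (not the exact code used for LATENT), here is a minimal PyTorch sketch of a WGAN-GP critic loss with gradient penalty; the function names and the default `lambda_gp=10.0` are illustrative:

```python
import torch

def gradient_penalty(critic, real, fake):
    """WGAN-GP term: push the critic's gradient norm towards 1 on
    random interpolations between real and generated images (4D tensors)."""
    alpha = torch.rand(real.size(0), 1, 1, 1, device=real.device)
    interp = (alpha * real + (1 - alpha) * fake).requires_grad_(True)
    scores = critic(interp)
    grads = torch.autograd.grad(
        outputs=scores, inputs=interp,
        grad_outputs=torch.ones_like(scores),
        create_graph=True,
    )[0]
    grad_norm = grads.view(grads.size(0), -1).norm(2, dim=1)
    return ((grad_norm - 1) ** 2).mean()

def critic_loss(critic, real, fake, lambda_gp=10.0):
    # Wasserstein estimate plus the gradient penalty; `fake` should be
    # detached from the generator graph when updating the critic.
    return (critic(fake).mean() - critic(real).mean()
            + lambda_gp * gradient_penalty(critic, real, fake))
```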

ICARUS visualizes mode connectivity: optima of complex loss functions are connected by simple curves over which training and test accuracy are nearly constant. This visualization uses real data and shows the training process that connects two optima through a pathway generated with a Bézier curve. To create ICARUS, 15 GPUs ran for more than 2 weeks to produce over 50 million loss values; the entire process took over 4 weeks of work end to end.
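
As a rough sketch of the curve-finding idea (not the authors’ actual implementation, which operates layer-wise), the following PyTorch fragment trains the middle control point of a quadratic Bézier curve between two flattened weight vectors `w1` and `w2`, assuming a differentiable helper `loss_at(w)` that evaluates the training loss at weights `w`:

```python
import torch

def bezier_point(w1, theta, w2, t):
    """Quadratic Bézier curve between two optima; only the
    middle control point `theta` is trainable."""
    return (1 - t) ** 2 * w1 + 2 * t * (1 - t) * theta + t ** 2 * w2

def find_path(w1, w2, loss_at, steps=1000, lr=0.01):
    # Initialize the bend point at the midpoint of the straight segment.
    theta = ((w1 + w2) / 2).clone().requires_grad_(True)
    opt = torch.optim.SGD([theta], lr=lr)
    for _ in range(steps):
        t = torch.rand(())  # sample a uniform position along the curve
        loss = loss_at(bezier_point(w1, theta, w2, t))
        opt.zero_grad()
        loss.backward()
        opt.step()
    return theta
```

Minimizing the expected loss over uniformly sampled `t` is what bends the path away from the high-loss barrier that the straight line between the optima would cross.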

ICARUS takes its name from Greek mythology. As Wikipedia puts it: “In Greek mythology, Icarus is the son of the master craftsman Daedalus, the creator of the Labyrinth. Icarus and his father attempt to escape from Crete by means of wings that his father constructed from feathers and wax.” The analogy is this: the loss landscape is another labyrinth, and our escape is to find a sufficiently low valley, the target of our optimizer. Yet this is no ordinary labyrinth. The loss landscape is high-dimensional, and unlike a typical labyrinth it contains shortcuts that can link some of those optima. Just as Daedalus and Icarus use special wings to escape Crete, the authors of the paper combine simple curves (in this video, a Bézier curve) with their custom training process to escape the isolation between the optima, demonstrating that even though straight lines connecting the optima must pass through regions of very high loss, there are other pathways along which training and test accuracy remain nearly constant. In addition, the morphology of the connected optima in this video resembles a set of wings: wings that come to life in the strategies applied by these modern, Icarus-like scientists as they pursue new ways to escape the isolation of the optima found in these loss landscapes.

Visualization data generated through a collaboration between Pavel Izmailov (@Pavel_Izmailov), Timur Garipov (@tim_garipov) and Javier Ideami (@ideami). Based on the NeurIPS 2018 paper “Loss Surfaces, Mode Connectivity, and Fast Ensembling of DNNs” by Timur Garipov, Pavel Izmailov, Dmitrii Podoprikhin, Dmitry Vetrov and Andrew Gordon Wilson: https://arxiv.org/abs/1802.10026 | Creative visualization and artwork produced by Javier Ideami.

UNCERTAIN DESCENT visualizes SWA-Gaussian (SWAG), a simple baseline for Bayesian uncertainty in deep learning (NeurIPS 2019, arXiv:1902.02476).

Real-data visualization using PCA directions. From the authors of the paper: “Machine learning models are used to make decisions, and representing uncertainty is crucial for decision making, especially in safety-critical applications. Deep learning models trained by minimizing the loss on the training dataset tend to provide overconfident and miscalibrated predictions, because they ignore uncertainty over the parameters of the model. In Bayesian machine learning we account for this uncertainty: we form a distribution over the weights of the model, known as the posterior. This distribution captures different models that all explain the training data well but provide different predictions on the test data. For neural networks the posterior distribution is very complex: there is no way to compute it exactly, and we have to approximate it. A key challenge for approximate inference methods is to capture the geometry of the posterior distribution or, equivalently, the loss landscape.

The idea of our method, SWAG, is to extract information about the posterior geometry from the SGD trajectory. We start by pre-training a neural network with SGD, Adam or any other optimizer to get a good initial solution; this part is the same as the standard training of the model. Starting from the pre-trained solution, we run SGD with a high constant learning rate. In this setting, instead of converging to a single solution, SGD bounces around different solutions that all explain the training data well. We then construct a Gaussian distribution that captures these different solutions traversed by SGD, and use it as our approximation to the posterior. It turns out that this simple procedure captures the local geometry of the posterior remarkably well.”
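
A minimal sketch of that procedure, assuming a pre-trained PyTorch `model`, a data `loader` and a `loss_fn`; this keeps only the diagonal covariance, whereas the paper’s SWAG also maintains a low-rank term:

```python
from itertools import cycle
import torch

def swag_collect(model, loader, loss_fn, snapshots=20, lr=0.05, steps_per=100):
    """After standard pre-training, run SGD with a high constant learning
    rate and accumulate first/second moments of the visited weights."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    flat = lambda: torch.cat([p.detach().flatten() for p in model.parameters()])
    mean = torch.zeros_like(flat())
    sq_mean = torch.zeros_like(mean)
    batches = cycle(loader)
    for n in range(1, snapshots + 1):
        for _ in range(steps_per):            # SGD keeps bouncing around
            x, y = next(batches)
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
        w = flat()
        mean += (w - mean) / n                # running first moment
        sq_mean += (w * w - sq_mean) / n      # running second moment
    var = (sq_mean - mean * mean).clamp(min=0)
    return mean, var

# Sampling a network from the SWAG posterior approximation:
#   w_sample = mean + var.sqrt() * torch.randn_like(mean)
```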

Based on the paper by Wesley Maddox, Timur Garipov, Pavel Izmailov, Dmitry Vetrov and Andrew Gordon Wilson. Visualization is a collaboration between Pavel Izmailov, Timur Garipov and Javier Ideami (ideami@losslandscape.com). NeurIPS 2019, arXiv:1902.02476 | losslandscape.com.

LR COASTER visualizes a learning-rate stress test during the training of a convnet. We ride along with the minimizer while exploring its nearby surroundings. I use extreme changes in the learning rate to illustrate how the morphology and dynamics of the loss landscape respond to those changes. The resolution (300K loss values calculated per frame) allows us to explore the changes in morphology in detail. More details and related analysis about this and other visualizations will be published in the future.
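
As a hypothetical illustration of such a stress test (not the schedule used in the video), one could sweep the learning rate over two orders of magnitude with a helper like this:

```python
import math

def stress_lr(optimizer, step, base_lr=0.01, spike=100.0, period=500):
    """Sweep the learning rate geometrically between base_lr and
    base_lr * spike, so the landscape's response can be observed."""
    phase = (math.sin(2 * math.pi * step / period) + 1) / 2  # in [0, 1]
    lr = base_lr * spike ** phase                            # 0.01 .. 1.0
    for group in optimizer.param_groups:
        group["lr"] = lr
    return lr
```

Calling it once per training step drives the oscillation; the period and spike factor here are placeholders.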

SENTINEL visualizes the optimization process of a convnet in training mode, moving from a high loss value, through the creation of an edge horizon, to the final convexity and minimum. We ride along with the minimizer while exploring its nearby surroundings. More details and related analysis about this and other visualizations will be published in the future.

WALTZ-RES visualizes the difference in morphology and dynamics between two ResNet-25 networks, one with skip connections and one without. This fragment of the visualization covers the first two and a half epochs of the training process. We ride along with the minimizer while exploring its nearby surroundings. More details and related analysis about this and other visualizations will be published in the future.
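
The only architectural difference between the two networks is the shortcut around each block. A generic PyTorch sketch (illustrative, not the exact ResNet-25 used here):

```python
import torch.nn as nn

class Block(nn.Module):
    """A basic residual block; set use_skip=False to get the
    plain (no-shortcut) counterpart compared in WALTZ-RES."""
    def __init__(self, channels, use_skip=True):
        super().__init__()
        self.use_skip = use_skip
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
        )
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.body(x)
        if self.use_skip:
            out = out + x   # the skip connection that smooths the landscape
        return self.act(out)
```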

EDGE HORIZON visualizes a loss landscape at extreme resolution, using 1 million loss points captured during the training of a convnet. The morphology of the landscape during training is influenced by the parameters of the network. More details and related analysis about this and other visualizations will be published in the future.
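
Grids like this are typically produced by evaluating the loss on a plane spanned by two random directions in weight space, in the spirit of the Li et al. paper mentioned in the preparation phase below. A simplified sketch, assuming a helper `loss_at(model)` that returns the current training loss, and using per-tensor rather than per-filter normalization:

```python
import torch

def loss_grid(model, loss_at, res=1000, span=1.0):
    """Evaluate the loss on a res x res plane around the current weights,
    spanned by two normalized random directions. res=550 gives roughly
    the 300K points per frame of LR COASTER; res=1000 gives ~1M points."""
    w = [p.detach().clone() for p in model.parameters()]
    dirs = []
    for _ in range(2):
        d = [torch.randn_like(p) for p in w]
        # scale each random direction tensor to match the weight norms
        d = [di * pi.norm() / (di.norm() + 1e-10) for di, pi in zip(d, w)]
        dirs.append(d)
    xs = torch.linspace(-span, span, res)
    grid = torch.empty(res, res)
    with torch.no_grad():
        for i, a in enumerate(xs):
            for j, b in enumerate(xs):
                for p, w0, d0, d1 in zip(model.parameters(), w, dirs[0], dirs[1]):
                    p.copy_(w0 + a * d0 + b * d1)
                grid[i, j] = loss_at(model)
        for p, w0 in zip(model.parameters(), w):  # restore original weights
            p.copy_(w0)
    return grid
```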

GOBLIN takes us on a journey through the loss landscape of a convnet during its training process: from above the edge horizon, laterally through it, and down to the perspective from below its dynamic convexity. We ride along with the minimizer while exploring its nearby surroundings. More details and related analysis about this and other visualizations will be published in the future.

DOWN UNDER goes deep below the loss landscape of a convnet during training (while training mode is active), giving us a perspective from below as the minimizer’s dynamics transform the nearby surroundings on its journey towards its final destination. We ride along with the minimizer while exploring its nearby surroundings. More details and related analysis about this and other visualizations will be published in the future.

GENTLY follows the gentle change in the surroundings of the minimizer during its gradual descent. We ride along with the minimizer while exploring its nearby surroundings. More details and related analysis about this and other visualizations will be published in the future.

Preparation phase

The gallery above will be expanded with more creations and associated writings over time. Before I began creating my own landscapes, there was a preparation phase in which I worked with existing data from other sources. An example of that phase is the landscape right below, which uses data from the excellent paper “Visualizing the Loss Landscape of Neural Nets” by Hao Li, Zheng Xu, Gavin Taylor, Christoph Studer and Tom Goldstein. I also produced simulations, such as the last video of the gallery, which was created before the Loss Landscape project began. All the Loss Landscape videos use real data and real networks except the very last one on this page, which was also the very first loss landscape video I created.

LL is led by Javier Ideami, A.I. researcher, multidisciplinary creative director, engineer and entrepreneur. Contact Ideami at ideami@ideami.com.
