Demystifying the Lottery Ticket Hypothesis in Deep Learning

Why lottery tickets are the next big thing in training neural networks


Published in Towards Data Science · 4 min read · Mar 3, 2022


Training neural networks is expensive. OpenAI’s GPT-3 has been estimated to cost $4.6M to train, even using the lowest-cost cloud GPUs on the market. It’s no wonder that Frankle and Carbin’s 2019 Lottery Ticket Hypothesis started a gold rush in research, drawing attention from top academic minds and tech giants like Facebook and Microsoft. In the paper, they demonstrate the existence of winning (lottery) tickets: subnetworks of a neural network that can be trained to perform as well as the original network at a fraction of the size. In this post, I’ll cover how this works, why it is revolutionary, and the state of the research.

Traditional wisdom says that neural networks are best pruned after training, not at the start. By pruning weights, neurons, or other components, the resulting neural network is smaller, faster, and consumes fewer resources during inference. When done right, accuracy is unaffected while the network shrinks many times over.
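The standard post-training approach usually comes down to magnitude: keep the largest weights, zero out the rest. Here is a minimal numpy sketch of one-shot magnitude pruning (an illustrative function, not any particular library's API):

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Return a boolean mask keeping the largest-magnitude weights,
    zeroing out the smallest `sparsity` fraction (one-shot, global)."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return np.ones(weights.shape, dtype=bool)
    threshold = np.partition(flat, k - 1)[k - 1]  # k-th smallest magnitude
    return np.abs(weights) > threshold

w = np.array([[0.5, -0.01],
              [0.003, -2.0]])
mask = magnitude_prune(w, 0.5)   # keeps the two largest-magnitude weights
pruned = w * mask                # pruned weights are simply zeroed
```

In practice the pruned network is then fine-tuned briefly to recover any lost accuracy.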

Flipping traditional wisdom on its head, we can ask whether we could have pruned the network before training and achieved the same result. In other words, was the information in the pruned components necessary for the network to learn, even if not to represent what it learned?

The Lottery Ticket Hypothesis focuses on pruning weights and offers empirical evidence that certain pruned subnetworks can be trained from the start to achieve performance similar to the full network. How? Iterative Magnitude Pruning (IMP).

When a task like this was tried historically, the pruned network’s weights would be reinitialized randomly, and performance would drop off quickly.

The key difference here is that the weights are returned to their original initialization. When trained, the results matched the original network’s performance in the same training time, even at high levels of pruning.
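Putting the pieces together, IMP can be sketched in a few lines. Below is a toy numpy example that uses gradient-descent linear regression as a stand-in for a network; all names, sizes, and hyperparameters are illustrative, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a network: noiseless linear regression trained by
# gradient descent. Only 5 of 20 weights actually matter, so a sparse
# "winning ticket" exists by construction.
X = rng.normal(size=(200, 20))
w_true = np.zeros(20)
w_true[:5] = rng.uniform(1.0, 2.0, size=5)
y = X @ w_true

w_init = rng.normal(size=20) * 0.1      # the original initialization

def train(w, mask, steps=200, lr=0.1):
    """Gradient descent on masked weights; pruned weights stay at zero."""
    for _ in range(steps):
        grad = X.T @ (X @ (w * mask) - y) / len(y)
        w = w - lr * grad * mask
    return w * mask

mask = np.ones(20, dtype=bool)

# IMP: train, prune the smallest 20% of surviving weights, rewind the
# survivors to their ORIGINAL initial values, and repeat.
for _ in range(5):
    w = train(w_init.copy(), mask)
    threshold = np.quantile(np.abs(w[mask]), 0.2)
    mask &= np.abs(w) > threshold

final = train(w_init.copy(), mask)      # the "winning ticket" run
loss = np.mean((X @ final - y) ** 2)
```

The rewind step is the crucial difference from earlier pruning work: each round restarts from `w_init`, not from freshly random weights.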


This suggests that these lottery tickets exist as an intersection of a specific subnetwork and its initial weights. They are “winning the lottery,” so to speak, as the match of that architecture and those weights performs as well as the entire network. Does this hold for bigger models?

For bigger models, this does not hold with the same approach. To probe sensitivity to noise, Frankle and Carbin duplicated the pruned networks and trained the copies on differently ordered data. IMP succeeds where linear mode connectivity exists: a rare phenomenon in which multiple training runs converge to the same local minimum. For small networks, this happens naturally. For large networks, it does not. So what to do?

Starting with a smaller learning rate and increasing it over time makes IMP work for large models, because sensitivity to early noise from the data is lessened. The other finding is that rewinding the pruned network’s weights to their values at an early training iteration, rather than to their values at initialization, also works: for example, the weights from the 10th iteration of a 1,000-iteration training run.
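The first fix is just learning-rate warmup. A generic linear warmup schedule (illustrative, not the paper's exact schedule) looks like this:

```python
def warmup_lr(step, base_lr=0.1, warmup_steps=1000):
    """Linear warmup: start near zero and ramp up to base_lr, damping
    the effect of early-training noise. Generic sketch, not the
    schedule from the paper."""
    return base_lr * min(1.0, (step + 1) / warmup_steps)

# Learning rate grows linearly, then plateaus at base_lr.
lrs = [warmup_lr(s) for s in (0, 499, 999, 5000)]
```

The rewinding fix is a one-line change to IMP: checkpoint the weights at iteration k (e.g., k=10) and rewind survivors to that checkpoint instead of to iteration 0.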

These results have held steady across architectures as different as transformers, LSTMs, CNNs, and reinforcement learning architectures.

While the paper demonstrated that these lottery tickets exist, it does not yet provide an efficient way to identify them: IMP requires repeatedly training the full network. Hence the gold rush into their properties and whether they can be identified before training. They are also inspiring work on heuristics for pruning early, since current heuristics focus on pruning after training.

One Ticket to Win Them All (2019) shows that lottery tickets encode information that generalizes across datasets and optimizers. The authors successfully transfer lottery tickets between networks trained on different datasets (e.g., from CIFAR-10 to ImageNet).

A key indicator was the relative size of the two training datasets: a ticket found on a larger dataset than the destination network’s performed better; otherwise, it performed similarly or worse.


Drawing Early-Bird Tickets (2019): This paper shows that lottery tickets can be found early in training. At each training iteration, the authors compute a pruning mask. If the distance between this mask and the previous one (measured by Hamming distance) falls below a threshold, training stops and the network is pruned.
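The stopping criterion is simple to sketch: compare consecutive binary masks and stop once they stabilize. A minimal numpy version (variable names are my own, not the paper's):

```python
import numpy as np

def mask_distance(mask_a, mask_b):
    """Fraction of positions where two binary pruning masks disagree
    (normalized Hamming distance)."""
    return np.mean(mask_a != mask_b)

prev_mask = np.array([1, 1, 0, 0, 1], dtype=bool)
curr_mask = np.array([1, 1, 0, 1, 1], dtype=bool)

dist = mask_distance(prev_mask, curr_mask)   # 0.2: one of five entries differs
stop_and_prune = dist < 0.1                  # below threshold -> draw the ticket
```

Once the distance stays under the threshold, the "early-bird" ticket is drawn and the full network need not be trained to completion.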

Pruning Neural Networks Without Any Data by Iteratively Conserving Synaptic Flow (2020): This paper prunes at initialization using no data, outperforming existing state-of-the-art pruning-at-initialization algorithms. The technique aims to maximize critical compression: the maximum pruning possible without impacting performance. The key failure mode to avoid is pruning away an entire layer, which severs the network. The authors’ algorithm avoids this by scoring each weight by its contribution to the total “synaptic flow” through the network and re-evaluating the scores after every pruning step.
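The synaptic-flow score needs no data: it runs the absolute-valued network forward on an all-ones input and backpropagates the scalar output. A hand-rolled numpy sketch for a two-layer linear net (shapes and names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

# Two-layer linear "network": 4 -> 8 -> 3.
W1 = rng.normal(size=(8, 4))
W2 = rng.normal(size=(3, 8))

ones_in = np.ones(4)
h = np.abs(W1) @ ones_in                 # forward pass through |W1|
R = np.sum(np.abs(W2) @ h)               # total synaptic flow

# Score each weight by |w * dR/dw|, via manual backprop through the
# absolute-valued network.
dR_dh = np.abs(W2).T @ np.ones(3)        # gradient flowing back to h
score_W1 = np.abs(W1) * np.outer(dR_dh, ones_in)
score_W2 = np.abs(W2) * np.outer(np.ones(3), h)

# Conservation: each layer's scores sum to the same total flow R, which
# is why iteratively pruning the lowest-scoring weights (and rescoring)
# resists severing any single layer.
```

Each pruning step removes the globally lowest-scoring weights, then recomputes the scores before pruning again.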

The existence of small subnetworks in neural architectures that can be trained to perform as well as the entire neural network is opening a world of possibilities for efficient training. In the process, researchers are learning a lot about how neural networks learn and what is necessary for learning. And who knows? One day soon we may be able to prune our networks before training, saving time, compute, and energy.
