
When we observe the physical world, we find random fluctuations everywhere. We can generate truly random numbers by measuring these fluctuations, known as noise. When we measure this noise, a process known as sampling, we obtain numbers. For example, if we measure the electric current of TV static over time, we will generate a truly random sequence.

We can visualize such a sequence by drawing a path that changes direction according to each number, known as a random walk. Notice the lack of pattern at all scales: at each point in the sequence, the next move is always unpredictable. Random processes are said to be non-deterministic, since they are impossible to determine in advance. Machines, on the other hand, are deterministic: their operation is predictable and repeatable.

In 1946, John von Neumann was involved in running computations for the military, specifically in the design of the hydrogen bomb, using a computer called the ENIAC. He planned to repeatedly calculate approximations of the processes involved in nuclear fission. This required quick access to randomly generated numbers that could be repeated if needed. However, the ENIAC had very limited internal memory, so storing long random sequences was not possible. Instead, von Neumann developed an algorithm to mechanically simulate the scrambling aspect of randomness, as follows. First, select a truly random number, called the seed. This number can come from the measurement of noise, or from the current time in milliseconds. Next, provide this seed as input to a simple calculation: multiply the seed by itself, and then output the middle digits of the result. Then use this output as the next seed, and repeat the process as many times as needed. This is known as the middle-square method, and it is just the first in a long line of pseudo-random number generators. The randomness of the sequence depends only on the randomness of the initial seed: same seed, same sequence. So what are the differences between a randomly generated versus a pseudo-randomly
generated sequence? Let's represent each sequence as a random walk. They seem similar, until we speed things up: the pseudo-random sequence must eventually repeat. This occurs when the algorithm reaches a seed it has previously used, and the cycle repeats. The length before a pseudo-random sequence repeats is called the period, and the period is strictly limited by the length of the initial seed. For example, if we use a two-digit seed, then the algorithm can produce at most 100 numbers before reusing a seed and repeating the cycle. A three-digit seed can expand past 1,000 numbers, and a four-digit seed past 10,000 numbers, before repeating. And if we use a seed large enough, the sequence can expand into trillions and trillions of digits before repeating.

The key difference is this: when you generate numbers pseudo-randomly, there are many sequences which cannot occur. For example, if Alice generates a truly random sequence of twenty shifts, it is equivalent to a uniform selection from the stack of all possible sequences of shifts. This stack contains 26 to the power of 20 pages, which is astronomical in size: if we stood at the bottom and shined a light upwards, a person at the top would not see the light for around 200 million years. Compare this to Alice generating a 20-shift pseudo-random sequence using a four-digit random seed. This is equivalent to a uniform selection from 10,000 possible initial seeds, meaning she can only generate 10,000 different sequences, a vanishingly small fraction of all possible sequences. When we move from random to pseudo-random shifts, we shrink the key space into a much, much smaller seed space. So for a pseudo-random sequence to be indistinguishable from a randomly generated sequence, it must be impractical for a computer to try all seeds and look for a match. This leads to an important distinction in computer science between what is possible versus what is possible in a reasonable amount of
time. We use the same logic when we buy a bike lock. We know that anybody can simply try all possible combinations until they find a match and it opens, but it would take them days to do so. So, for 8 hours, we assume it is practically safe. With pseudo-random generators, the security increases as the length of the seed increases: if the most powerful computer would take hundreds of years to run through all seeds, then we can safely assume it is practically secure, instead of perfectly secure. As computers get faster, the seed size must increase accordingly. Pseudo-randomness frees Alice and Bob from having to share their entire random shift sequence in advance. Instead, they share a relatively short random seed and expand it into the same random-looking sequence when needed. But what happens if they can never meet to share this random seed?
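Von Neumann's middle-square method described earlier can be sketched in a few lines of Python. This is a minimal sketch: the zero-padding and the exact choice of "middle" digits are one reasonable reading of the procedure, not a fixed standard.

```python
def middle_square(seed, n_digits=4):
    """One step of von Neumann's middle-square method:
    square the seed, then keep the middle digits of the result."""
    squared = str(seed ** 2).zfill(2 * n_digits)  # pad so the "middle" is well defined
    mid = len(squared) // 2
    half = n_digits // 2
    return int(squared[mid - half : mid + half])

# Same seed, same sequence: the generator is fully deterministic.
seed = 1234  # stand-in for a truly random seed (e.g. sampled noise)
sequence = []
for _ in range(5):
    seed = middle_square(seed)
    sequence.append(seed)
print(sequence)  # [5227, 3215, 3362, 3030, 1809]
```

Running it twice with the same starting seed reproduces the identical sequence, which is exactly what made the method useful on the memory-starved ENIAC.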
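The claim that a two-digit seed yields a period of at most 100 can be checked directly. The sketch below uses a two-digit middle-square variant (my own illustrative construction, not taken from the video) and measures the cycle length for every possible seed; it also computes how small a four-digit seed space is relative to the full space of 20-shift sequences.

```python
def middle_square_2(seed):
    """Two-digit middle-square step: square, pad to 4 digits, keep the middle 2."""
    return int(str(seed ** 2).zfill(4)[1:3])

def cycle_length(seed):
    """Number of steps in the cycle the generator eventually falls into."""
    seen = {}
    step = 0
    while seed not in seen:
        seen[seed] = step
        seed = middle_square_2(seed)
        step += 1
    return step - seen[seed]

# With only 100 possible two-digit states, every sequence must cycle within 100 steps.
assert all(cycle_length(s) <= 100 for s in range(100))

# The seed space is a vanishing fraction of the key space:
# 10**4 four-digit seeds versus 26**20 possible 20-shift sequences.
print(10**4 / 26**20)
```

In practice most middle-square cycles are far shorter than the theoretical maximum, which is one reason the method was soon replaced by better generators.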
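The closing idea, expanding one short shared seed into the same long sequence on both sides, can be illustrated with Python's seedable generator. This is a sketch: `random.Random` stands in for whatever pseudo-random generator Alice and Bob agree on, and the seed value is hypothetical.

```python
import random

SHARED_SEED = 8675  # hypothetical short seed Alice and Bob exchanged in advance

# Each side independently expands the seed into a sequence of shifts (0-25).
alice = random.Random(SHARED_SEED)
bob = random.Random(SHARED_SEED)

alice_shifts = [alice.randrange(26) for _ in range(20)]
bob_shifts = [bob.randrange(26) for _ in range(20)]

assert alice_shifts == bob_shifts  # same seed, same sequence
```

Neither side ever transmits the 20 shifts themselves, only the short seed, which is the whole point; the open question the transcript ends on is how to share even that seed without meeting.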