
Origin of Markov chains

Introduction to Markov chains. Created by Brit Cruise.

Want to join the conversation?

  • jmullercuber:
    Could Markov chains be considered a basis of some (random) cellular automaton? I mean, each Markov chain represents a cell, the state of the cell is that of the chain, and the probabilities of switching a state could be replaced with an algorithm. Then you could arrange lots of chains on a grid, and get an automaton?
    (16 votes)
    • Peter Collingridge:
      Interesting idea. Generally cellular automata are deterministic, and the state of each cell depends on the states of multiple cells in the previous generation, whereas Markov chains are stochastic, and each state depends only on a single previous state (which is why it's a chain).

      You could address the first point by creating a stochastic cellular automaton (I'm sure they must exist; see the sketch at the end of this answer), or by setting all the probabilities to 1. The second point is a bit more tricky. You could create a state for each possible universe of cell states (so if you had a 3x3 grid and each cell could be on or off, you'd have 2^9 = 512 states) and then create a Markov chain to represent the entire universe, but I'm not sure how useful that would be.

      If you created a grid purely of Markov chains as you suggest, then each point in the cellular automaton would be independent of every other point, and all the interesting emergent behaviours of cellular automata come from the fact that the states of the cells depend on one another.

      I suspect a cellular automaton is a more general system than a Markov chain in that each state has multiple inputs (but then it is less general in that all the probabilities are typically equal to 1).
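      Here's a toy sketch of a stochastic cellular automaton in Python (the update probabilities are made up; each cell turns on with a probability that depends on how many of its three neighbors are on):

      ```python
      import random

      # Made-up probability of a cell being on next step, keyed by how many
      # of (left neighbor, self, right neighbor) are currently on.
      P_ON = {0: 0.05, 1: 0.3, 2: 0.7, 3: 0.95}

      def step(cells):
          n = len(cells)
          return [1 if random.random() < P_ON[cells[i - 1] + cells[i] + cells[(i + 1) % n]] else 0
                  for i in range(n)]

      cells = [random.randint(0, 1) for _ in range(40)]  # random start on a ring of 40 cells
      for _ in range(10):
          print("".join(".#"[c] for c in cells))
          cells = step(cells)
      ```

      Setting every probability to 0 or 1 turns this back into a deterministic automaton, which is the "all the probabilities equal to 1" point above.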
      (15 votes)
  • bhatti.anwaar:
    How is the "central limit theorem" a dangerous idea, and how did it correspond to the idea of eugenics, as implied by the paper?
    (6 votes)
    • brit cruise:
      If you read more into eugenics, you'll see that the binomial distribution underlies (and, one could argue, was used to justify) some of the troubling implications of the movement. More generally, the topic of free will and determinism is a passionate/touchy subject, as it's often tied to one's belief system(s).
      (14 votes)
  • David Cian:
    I've got a question. I was doing this experiment at home with the 2 glasses, and in each of the two glasses there were 2 red candies and 2 white candies. But I got to the point where I had 4 red candies in the red glass and 4 white candies in the white glass. It was red's turn, which meant I could only pick red candies, thus messing up my probabilities, since the probability of picking a red candy neared 1. Can someone explain what I'm doing wrong, please?
    (6 votes)
    • brit cruise:
      Cool experiment! The problem is that you should never move a bead from one cup to another. Watch the video again closely. The bead you choose should always go back into the same cup, and based on the color of that bead you move to another cup.

      For example, if you were to pick a white candy from glass A, you should immediately put it back, output white, and then move to the glass that represents white.
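      Here's a minimal sketch of that procedure in Python (the particular candy mixes are made up; the point is just that nothing ever moves between glasses):

      ```python
      import random

      # Each glass keeps its own fixed mix of candies; a drawn candy always goes back.
      glasses = {
          "red":   ["red", "red", "red", "white"],      # glass visited after a red outcome
          "white": ["red", "white", "white", "white"],  # glass visited after a white outcome
      }

      glass = random.choice(["red", "white"])    # start at a random glass
      for _ in range(10):
          candy = random.choice(glasses[glass])  # draw a candy (and put it right back)
          print(candy)                           # output its color...
          glass = candy                          # ...and move to the glass of that color
      ```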
      (12 votes)
  • MontDor1:
    Claude Shannon's original paper, and the book published one year after, were much more narrowly targeted than today's information theory (in fact, Shannon originally called it communication theory). Why does this mathematical theory have such a huge range of applications across academic disciplines today? In other words, did Shannon not see the possible connections, or did he just ignore them (apart from his three-tier heuristic, which was actually Weaver's)? Further, did Shannon get it wrong in not giving information a real physical series of values? Electrons do have some mass, after all, and they may be able to "collapse the wave function" when bouncing off something, or a photon, a micro-subatomic "wavicle," may do the same.
    (4 votes)
  • Jonathan Ziesmer:
    Why is it called a Markov Chain? Why not a Markov Loop?
    (5 votes)
    • Squishy:
      Near the end of the video, some more complex Markov chains were shown. These look more like connected chains than loops: a loop might imply moving around the same circle over and over again, whereas the actual movement is more like moving through a chain. The last Markov chain, the one with the proteins, actually had no loops.
      (3 votes)
  • MassMediaBrainwash:
    So, if we found the most dependent variable in the world and found its ratio, we could conceivably understand the workings of the entire universe?
    (3 votes)
  • Anna:
    Okay, but this is only a binary Markov chain. What if there were 3 or 4 or 5 different possible outcomes, and they were dependent rather than independent?
    (4 votes)
  • Jeff Lima:
    Is there a way to calculate the probability of each outcome (that is, mathematically, and not by running a bunch of trials)?
    (2 votes)
    • Noble Mushtak:
      Yes, you can work this out algebraically.

      Look at the Markov chain in the exploration and define:
      s0 = the probability of staying at 0 if you're at 0
      c1 = the probability of changing from 1 to 0
      s1 = the probability of staying at 1 if you're at 1
      c0 = the probability of changing from 0 to 1

      At equilibrium, the probability p0 of being at 0 must be unchanged by taking one more step, so p0 = p0*s0 + p1*c1. Since s0 = 1 - c0, this simplifies to p0*c0 = p1*c1, which means the ratio of 0s to 1s is
      c1 : c0
      Proof:
      http://www.khanacademy.org/math/applied-math/informationtheory/moderninfotheory/v/markov_chains?qa_expand_key=ag5zfmtoYW4tYWNhZGVteXJqCxIIVXNlckRhdGEiTHVzZXJfaWRfa2V5X2h0dHA6Ly9ub3VzZXJpZC5raGFuYWNhZGVteS5vcmcvYmU3ODJjNTc3OWU2ZGMxMmYyZjVjZjhkNWY0ZjAyZWMMCxIIRmVlZGJhY2sYsrtkDA

      By knowing the ratio of 0s to 1s, we can find the probability of having a 0 and the probability of having a 1. Let's say the ratio of 0s to 1s is 3:2. This means that for every 5 outcomes there are three 0s and two 1s, so the probability of having a 0 is 3/5 and the probability of having a 1 is 2/5. In general, if the ratio of 0s to 1s is a:b, the probability of having a 0 is a/(a+b) and the probability of having a 1 is b/(a+b). Substituting c1 for a and c0 for b:
      Probability of having a 0: c1/(c0+c1)
      Probability of having a 1: c0/(c0+c1)

      I hope this helps!
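      If you want to check this empirically, here's a minimal Python sketch (the values of c0 and c1 are placeholders, not anything taken from the exploration):

      ```python
      import random

      def zero_frequency(c0, c1, steps=200_000):
          """Simulate the two-state chain; c0 = P(0->1), c1 = P(1->0)."""
          state, zeros = 0, 0
          for _ in range(steps):
              if state == 0:
                  state = 1 if random.random() < c0 else 0
              else:
                  state = 0 if random.random() < c1 else 1
              zeros += state == 0
          return zeros / steps

      c0, c1 = 0.1, 0.5
      print(zero_frequency(c0, c1))  # empirical frequency of 0s
      print(c1 / (c0 + c1))          # predicted: c1/(c0+c1) ~ 0.833
      ```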
      (5 votes)
  • Noble Mushtak:
    Is the summary (at the end) of this question correct?

    Let's say we have a Markov chain like the one seen in the Markov Chain Exploration.

    Let's say you've set the Markov chain to have the following probabilities:
    Probability of 0-->1 is c0 (change from 0)
    Probability of 0-->0 is s0 (stay at 0)
    Probability of 1-->0 is c1 (change from 1)
    Probability of 1-->1 is s1 (stay at 1)

    The probability that we start at state 0 is 1/2 and the probability that we start at state 1 is 1/2. After infinite trials, though, the starting probabilities become irrelevant: the chain settles into an equilibrium in which the probability p0 of being at 0 and the probability p1 of being at 1 no longer change from one step to the next.

    At equilibrium, the probability of being at 0 must equal the probability of moving to 0 on the next step:
    p0 = p0*s0 + p1*c1
    Since s0 = 1 - c0, this simplifies to
    p0*c0 = p1*c1
    so the ratio of 0s to 1s is
    c1 : c0
    This ratio might not be fully simplified if you plug in the probabilities, but this is as simplified as we can get algebraically.

    Let's summarize this question.
    If we are using a Markov chain like that found in the Exploration and
    the probability of moving from a 0 to a 1 is c0,
    the probability of moving from a 0 to a 0 is s0,
    the probability of moving from a 1 to a 0 is c1, and
    the probability of moving from a 1 to a 1 is s1, then
    the ratio of 0s to 1s after an infinite number of trials is
    c1 : c0
    (the stay probabilities enter only through s0 = 1 - c0 and s1 = 1 - c1).
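    To double-check the algebra, here's a quick numpy sketch (with made-up values for c0 and c1) that solves the equilibrium equations directly:

    ```python
    import numpy as np

    c0, c1 = 0.1, 0.5                  # hypothetical transition probabilities
    P = np.array([[1 - c0, c0],        # row i holds the probabilities of
                  [c1, 1 - c1]])       # moving out of state i

    # Equilibrium: p = p @ P, with p summing to 1.
    # Stack (P.T - I) p = 0 with the normalization row sum(p) = 1.
    A = np.vstack([P.T - np.eye(2), np.ones(2)])
    b = np.array([0.0, 0.0, 1.0])
    p, *_ = np.linalg.lstsq(A, b, rcond=None)

    print(p)               # [p0, p1]
    print(c1 / (c0 + c1))  # matches p0
    ```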
    (3 votes)
  • Siddharth Gaur:
    Is there any limit to the number of things that could happen in a Markov chain, or is it unlimited?
    (2 votes)

Video transcript

Voiceover: When observing the natural world, many of us notice a somewhat beautiful dichotomy. No two things are ever exactly alike, but they all seem to follow some underlying form. Plato believed that the true forms of the universe were hidden from us; through observation of the natural world, we could merely acquire approximate knowledge of them. They were hidden blueprints. The pure forms were only accessible through the abstract reasoning of philosophy and mathematics. The circle, for example, he describes as that which has the distance from its circumference to its center everywhere equal. Yet we will never find a material manifestation of a perfect circle or a perfectly straight line. Interestingly, though, Plato speculated that after an uncountable number of years, the universe will reach an ideal state, returning to its perfect form.

This Platonic focus on abstract, pure forms remained popular for centuries. It wasn't until the 16th century that people tried to embrace the messy variation in the real world and apply mathematics to tease out underlying patterns. Bernoulli refined the idea of expectation. He focused on a method of accurately estimating the unknown probability of some event based on the number of times the event occurs in independent trials. He uses a simple example: suppose that, without your knowledge, 3,000 light pebbles and 2,000 dark pebbles are hidden in an urn, and that, to determine the ratio of light versus dark by experiment, you draw one pebble after another, with replacement, and note how many times a light pebble is drawn versus a dark one. He went on to prove that the observed ratio of light versus dark draws will converge on the actual ratio as the number of trials increases, a result known as the weak law of large numbers. He concluded by saying, "If observations of all events be continued for the entire infinity, it will be noticed that everything in the world is governed by precise ratios and a constant law of change."

This idea was quickly extended, as it was noticed that not only did things converge on an expected average, but the probability of variation away from averages also followed a familiar underlying shape, or distribution. A great example of this is Francis Galton's bean machine. Imagine each collision as a single independent event, such as a coin flip. After 10 collisions, or events, the bean falls into a bucket representing the ratio of left versus right deflections, or heads versus tails. This overall curve, known as the binomial distribution, appears to be an ideal form, as it kept appearing everywhere any time you looked at the variation of a large number of random trials. It seems the average fate of these events is somehow predetermined, a result known today as the central limit theorem.

This was a dangerous philosophical idea to some. Pavel Nekrasov, originally a theologian by training who later took up mathematics, was a strong proponent of the religious doctrine of free will. He didn't like the idea of our having this predetermined statistical fate, and he made the famous claim that independence is a necessary condition for the law of large numbers. Independence, after all, describes these toy examples using beans or dice, where the outcome of previous events doesn't change the probability of the current or future events. However, as we can all relate, most things in the physical world are clearly dependent on prior outcomes, such as the chance of fire or sun, or even our life expectancy.
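Bernoulli's pebble experiment is easy to reproduce. Here's a minimal Python sketch (the 3,000/2,000 counts come from the example above; everything else is illustrative) showing the observed ratio converging on the hidden ratio of 0.6 as the number of trials grows:

```python
import random

# 3,000 light and 2,000 dark pebbles hidden in an urn; true light ratio = 0.6.
urn = ["light"] * 3000 + ["dark"] * 2000

for n in (100, 1_000, 10_000, 100_000):
    light = sum(random.choice(urn) == "light" for _ in range(n))  # draws with replacement
    print(f"{n:>7} draws: observed light ratio = {light / n:.3f}")
```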
When the probability of some event depends, or is conditional, on previous events, we say they are dependent events, or dependent variables. This claim angered another Russian mathematician, Andrey Markov, who maintained a very public animosity towards Nekrasov. In a letter he goes on to say that "this circumstance prompts me to explain in a series of articles that the law of large numbers can apply to dependent variables," using a construction which, he brags, Nekrasov cannot even dream about.

Markov extends Bernoulli's results to dependent variables using an ingenious construction. Imagine a coin flip which isn't independent, but dependent on the previous outcome, so it has a short-term memory of one event. This can be visualized using a hypothetical machine which contains two cups, which we call states. In one state we have a 50-50 mix of light versus dark beads, while in the other state we have more dark than light. One cup we can call state zero: it represents a dark bead having previously occurred. The other we can call state one: it represents a light bead having previously occurred. To run our machine, we simply start in a random state and make a selection. Based on the outcome of that selection, we output either a zero if it's dark or a one if it's light, and we move to either state zero or state one, depending on that outcome.

With this two-state machine, we can identify four possible transitions. If we are in state zero and a dark bead occurs, we loop back to the same state and select again. If a light bead is selected, we jump over to state one, which can also loop back on itself, or jump back to state zero if a dark bead is chosen. The probability of a light versus dark selection is clearly not independent here, since it depends on the previous outcome. But Markov proved that as long as every state in the machine is reachable, when you run these machines in a sequence, they reach equilibrium. That is, no matter where you start, once you begin the sequence, the number of times you visit each state converges to some specific ratio, or probability.

This simple example disproved Nekrasov's claim that only independent events could converge on predictable distributions, and the concept of modeling sequences of random events using states and transitions between states became known as a Markov chain. One of the first and most famous applications of Markov chains was published by Claude Shannon.
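Markov's equilibrium claim is easy to check numerically. Below is a minimal sketch of the two-cup machine (the bead mixes are made up, with state one's cup holding more dark than light). Run from either starting state, the fraction of time spent in state zero converges to the same value:

```python
import random

# Probability of drawing a light bead in each state: state 0 is a 50-50 cup,
# state 1 is a cup with more dark than light beads (made-up mix).
LIGHT = {0: 0.5, 1: 0.25}

def time_in_state_zero(start, steps=200_000):
    state, zeros = start, 0
    for _ in range(steps):
        # A light bead sends the machine to state 1, a dark bead to state 0.
        state = 1 if random.random() < LIGHT[state] else 0
        zeros += state == 0
    return zeros / steps

print(time_in_state_zero(0))  # ~0.6 from either start: the starting state
print(time_in_state_zero(1))  # doesn't matter once equilibrium is reached
```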