Vi Hart visits Khan Academy and talks about the mysteries of Benford's Law with Sal. Created by Sal Khan.
- I heard about an indigenous culture somewhere (S. America?) in which the counting was logarithmic. Specifically, when asked what number was half-way between "1" and "9", the answer given was "3." Sorry I do not have a source, but it may have been Radiolab or something like that. Very interesting.(38 votes)
- Thanks for that - I looked it up! They are the Mundurucu culture in the Amazon(27 votes)
- So powers of two follow Benford's distribution when expressed in decimal, but if I were to express them in binary (1, 10, 100, 1000, ...) everything falls in the '1' bucket. Question: do the "natural" data (like populations, financial data, physical constants) follow Benford's law under every number base?(22 votes)
- Well, in base 2, powers of 2 become the special case, while powers of 10 would now follow Benford's Law, I think (but expressed in a different way). According to Wikipedia there's a more general version of the rule that applies to any/all bases.
It also has examples where the same data still follows the law even if you use different units.(18 votes)
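The general version mentioned in the reply above is usually stated as P(d) = log_b(1 + 1/d) for a leading digit d in base b. Here is a minimal Python sketch of that formula; the function names are illustrative and do not come from the video.

```python
import math


def benford_probability(digit: int, base: int = 10) -> float:
    """Expected share of numbers whose leading digit is `digit` in `base`."""
    if base < 2 or not 1 <= digit < base:
        raise ValueError("need base >= 2 and 1 <= digit < base")
    # General form of Benford's law: log_base(1 + 1/digit)
    return math.log(1 + 1 / digit, base)


def benford_distribution(base: int = 10) -> dict[int, float]:
    """Probabilities for every possible leading digit in the given base."""
    return {digit: benford_probability(digit, base) for digit in range(1, base)}


for base in (2, 8, 10):
    shares = benford_distribution(base)
    print(base, {digit: round(share, 3) for digit, share in shares.items()})
```

In base 2 the only possible leading digit is 1, so the formula assigns it a probability of 1. That agrees with the observation that every binary number falls in the '1' bucket.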
- Usually, Sal is an excellent teacher for me, but I still have a few questions. I understand that this makes sense on a logarithmic scale, but it doesn't seem to work on a linear scale. There are still the same number of numbers between 1 and 2 as there are between 2 and 3. So, why is there a greater probability for a number to land between 1 and 2 than there is a chance to land between 2 and 3? Also, would Benford's distribution apply to non-exponential sets?(10 votes)
- You're on the right train of thought.
When you ask '[W]hy is there a greater probability for a number to land between 1 and 2 than there a chance to land between 2 and 3?', you need to consider that (assuming we are accepting any arbitrary set of numbers obtained from some real-world observations) the sub-range you refer to as 'between 1 and 2' refers just as much to those numbers between 10 and 20 (also 100 and 200 and so on, hence we are working with logarithmic scales), such that the sub-ranges of numbers (those numbers starting with the same digit) being examined at any given scale may always be considered as having an equal probability of 'being picked' as any of the other sub-ranges, but only when considering that individual scale. In this way you are correct, but your confusion stems from realising this truth, whilst not yet seeing the whole picture.
The next step involves thinking about how these probabilities vary with the scale being considered. As we might be talking about any set of numbers obtained from some real-world observations, the scale (think also about the range or upper limit) that such numbers span is, hopefully intuitively, not at all biased to a neat, round number like 1000. By this I mean that in the natural world there is no reason for, for example, the population of a town to tend more towards that tidy number of 1000 or 10,000; for it to do so would be as strange as saying that populations prefer to stop growing at nice round numbers, for the sake of making equal those probabilities of picking each of our sub-ranges (10 to 20, 200 to 300, etc.). Hopefully this is not too convoluted to follow. It is then this complete lack of bias of 'real-world data scales' toward nice round numbers that is responsible for changing up the probabilities of picking those sets of sub-ranges (I believe this is a direct answer to your question quoted above, which now, hopefully, in sufficient context should make more sense to you. Also, this is my first KA comment, so sorry if it's not so concise.)(24 votes)
- If we look at the behavior of the universe, we see a lot of exponential and logarithmic behavior in it. Physical constants are what determine the 'universe', so is there a relation between physical constants and exponential behavior that would make them follow Benford's law?(17 votes)
- Probably. Exponents and logarithms are related to Benford's law; constants are also related to the law (see previous Benford video); so there must be a relation between exponents and constants. Which isn't too surprising -- constants with exponents form some important laws of nature, themselves.(5 votes)
- Basically the reason seems obvious: to get to 2 you must pass through 1, to get to 3 you pass through 1 and 2, etc. Therefore the chances of a random stop being on 1 are greater than on 2, because everyone passes through or stops on 1, and the remainder (slightly fewer) all pass through or stop on 2. Cute relationship with Fibonacci, but hardly surprising, methinks...(4 votes)
- Uh. That reasoning is a bit sketchy. The very first 1 at the beginning of a logarithmic scale isn't the one you must magically pass through to get to everything else. In fact, if you go left on the scale the first thing you hit is 9, which you must "pass through" to get down to 8, etc., all the way down to the first time you hit a one at
- All natural increments follow a logarithmic progression, so a random series based on such a progression is likely to follow Benford's law. Population increase is a natural increment, just like the Fibonacci sequence or stock market data. However, I am still confused about why physical constants follow this law when the choice of scale lies entirely in our hands. Would they follow this law for any choice of scale, or is the scale we use now something special?(2 votes)
- I just can't help but feel this concept is being made more complicated than it needs to be...
All of these examples are in some sense sequential. Population growth, stock market changes, etc. For example: before a country can have a population of 20, it has to have a population of 19, 18, 17, so on and so forth.
And so even on that tiny scale (20), you can immediately see that the chances of starting with a 1 are 11 out of 20. And this persists all the way up to 99 - you have to roll against the 11/20 odds for every number before 90 before you even get a chance at a 9. And once you move into the 100s, 1000s, etc. these odds become even more drastic.
Basic odds (20 marbles, 11 are blue) means that a 1 is more likely. And probability repeated to absurdly high numbers just enforces that trend.(3 votes)
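The 11-out-of-20 figure in the comment above can be checked by direct counting. The sketch below also shows how the share changes with the stopping point, which is the dependence on scale raised in an earlier reply. It assumes every integer from 1 up to a limit is equally likely; that is an illustrative assumption, not a claim about real populations, and the function name is made up for this example.

```python
from fractions import Fraction


def share_starting_with_one(upper_limit: int) -> Fraction:
    """Exact fraction of the integers 1..upper_limit whose first digit is 1."""
    if upper_limit < 1:
        raise ValueError("upper_limit must be at least 1")
    count = sum(1 for n in range(1, upper_limit + 1) if str(n)[0] == "1")
    return Fraction(count, upper_limit)


# The share swings up and down depending on where the counting stops.
for limit in (9, 20, 99, 199, 999, 1999):
    share = share_starting_with_one(limit)
    print(f"1..{limit}: {share} = {float(share):.3f}")
```

Stopping at 20 gives 11/20, as the comment says. Stopping at 99 gives only 1/9, the same share as every other digit. So the uniform count alone does not settle on a single value; the answer depends on where the range ends.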
- Does the logarithmic scale have to be based on log_10? I suspect that plotting powers of eight on an octal logarithmic scale would break Benford's law.
Is there some non-decimal numeral system that doesn't follow Benford's law for any statistics?(3 votes)
- What is the Fibonacci sequence? I have no clue about it...
Could someone help me with an easy explanation?
Thanks in advance...
- The Fibonacci sequence is the one that starts 1,1 and then proceeds with the recursive definition Fₙ = Fₙ₋₁ + Fₙ₋₂
So we have:
F₁ = 1
F₂ = 1
F₃ = F₂ + F₁ = 1 + 1 = 2
F₄ = F₃ + F₂ = 2 + 1 = 3
F₅ = F₄ + F₃ = 3 + 2 = 5
Giving 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, ...
The sequence has all sorts of interesting mathematical properties. Not all of them easy to understand. Use your favourite search engine, and you'll be able to find numerous articles on the Fibonacci numbers.(4 votes)
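The recursive definition above translates directly into a short loop. This is a minimal Python sketch with an illustrative function name. It also tallies how many of the first 1,000 Fibonacci numbers start with 1, since the video ties the sequence to Benford's law.

```python
def fibonacci(count: int) -> list[int]:
    """Return the first `count` Fibonacci numbers, starting 1, 1."""
    terms = []
    previous, current = 1, 1  # F1 and F2
    for _ in range(count):
        terms.append(previous)
        # Each new term is the sum of the two terms before it.
        previous, current = current, previous + current
    return terms


print(fibonacci(10))  # [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]

first_thousand = fibonacci(1000)
share_of_ones = sum(str(term)[0] == "1" for term in first_thousand) / 1000
print(f"Share starting with 1: {share_of_ones:.1%}")
```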
- What do they mean by "most significant digit"?(1 vote)
- The first digit from left-to-right that is not a zero. For example for 879 the most significant digit is 8. For 0.00365 the most significant digit is 3.(4 votes)
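One way to express that definition in code is to scan the number's text form for the first non-zero digit. This is a sketch under that assumption, and the function name is illustrative.

```python
def most_significant_digit(number: float) -> int:
    """Return the first non-zero digit of `number`, reading left to right."""
    # abs() drops the sign; str() gives forms such as '879', '0.00365' or '1e-07'.
    for character in str(abs(number)):
        if character in "123456789":
            return int(character)
    raise ValueError("no non-zero digit found (is the number zero?)")


print(most_significant_digit(879))      # 8
print(most_significant_digit(0.00365))  # 3
```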
SAL: So where we left off in the last video, Vi and myself had posed a mystery to you. We had talked about Benford's law. VI: And we asked, what is up with Benford's law? SAL: This idea that, if you took just random countries and took their population and took the most significant digit in their population and plotted the numbers of countries that their most significant digit is a 1, versus a 2, versus a 3, you just had it was much more likely that it would be a 1. Or that, if you took physical constants of the universe, that they're most likely to have 1 as their most significant digit. VI: I wish we had more graphs, because graphs are fun. SAL: Yes. VI: But if you look at information from the stock market or anything, what's up? SAL: Yes. And it seems to all follow this curve. And what was extremely mysterious-- and this is where we finished off the last video-- was if you look at pure, I would say, compounding phenomenon, like for example, the Fibonacci sequence, or powers of 2, that exactly fits the Benford distribution. It exactly fits this. If you take all the powers of 2, a little bit over 30% of those powers of 2, all of the powers of 2 have 1 as their most significant digit. What is this? 17? Roughly 17% of all of them have 2 as their most significant digit. VI: Yeah. Although in this case, there's an infinite number in every set, so it's harder to graph. SAL: But if you wanted to try it out, you could take the first million powers of 2 and then find the percentage. And that will probably give you a pretty good approximation of things. VI: Yeah. So to me, that's like less mysterious. On the one hand, wow, this fits exactly with mathematics. But that also gives you a really good handle, because you realize, alright, there's something here I can actually take a look at. SAL: You could take a look at it and it starts to become something you can dig deeper in. 
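Sal's suggestion to tally the leading digits of many powers of 2 can be tried directly. The sketch below uses the first 10,000 powers rather than a million so that exact integer arithmetic stays fast; that cutoff and the function name are assumptions made for illustration.

```python
from collections import Counter


def leading_digit_shares(count: int) -> dict[int, float]:
    """Share of each leading digit among 2**0, 2**1, ..., 2**(count - 1)."""
    tally = Counter()
    power = 1
    for _ in range(count):
        tally[int(str(power)[0])] += 1  # first character is the leading digit
        power *= 2
    return {digit: tally[digit] / count for digit in range(1, 10)}


shares = leading_digit_shares(10_000)
for digit, share in shares.items():
    print(f"{digit}: {share:.1%}")
```

The printed shares should land close to the figures in the conversation: a little over 30% for a leading 1 and roughly 17% to 18% for a leading 2.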
And we said, in the last video, we wanted you to pause it and think about why this is happening because, frankly, we had to do the same thing. And a big clue for us was when we looked at a logarithmic scale. And we're looking at one right over here. And just to be clear, what's going on in this logarithmic scale is you see equal spaces on this scale are powers of 10. So on a linear scale, this would be a 1. And maybe this would be a 2 and then a 3. Or if we wanted to say that this is a 2, you would say this is a 1, this is a 10, this would be a 20, then this would be a 30, so on and so forth. But in a logarithmic scale, equal distances are times 10 or, in this case, if we're taking powers of 10. So this is 1:10, then 10:100, then 100:1,000. And you see how the numbers in between fall out, that the space between 1 and 2 is pretty big. And then 2 and 3 is still pretty big, but a little bit smaller. And then 3 and 4 gets smaller and smaller and smaller, until you get to 10. And that's a pretty big clue about what's going on with Benford's law. VI: Yeah. It seems to match up somehow. So there's a connection here. SAL: And it actually turns out-- and this is actually a very big clue-- that this, if you take this area right here as a percentage of this entire area, it's exactly this percentage. It's exactly that percentage there. And if you take this area as a percentage of that entire area, it's exactly this percentage, that roughly 17%, or whatever that number is right over there. So that's a huge clue. VI: Yeah, or at least for powers of 2 or a Fibonacci sequence thing-- for powers, it definitely makes sense. SAL: Yes, for any powers. And so the logic is-- and this is now our biggest clue-- is to actually plot the powers of 2 on a logarithmic scale like this. VI: All right, let's see where they fall. SAL: All right, let's try it out. So 2 to the zeroth power is 1. 2 to the first power is 2. Then you get to 4. Then you get to 8. 
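The "area as a percentage of the entire area" that Sal points to can be computed as a distance on the logarithmic scale. This is a small sketch, assuming a base-10 scale where one full decade (1 to 10, 10 to 100, and so on) has length 1; the function name is illustrative.

```python
import math


def log_scale_width(low: float, high: float) -> float:
    """Distance from `low` to `high` on a base-10 log scale (one decade = 1.0)."""
    return math.log10(high) - math.log10(low)


# Width of each leading-digit block within the decade from 1 to 10.
for digit in range(1, 10):
    width = log_scale_width(digit, digit + 1)
    print(f"{digit} to {digit + 1}: {width:.1%} of the decade")

# The block for a given leading digit has the same width in every decade.
print(log_scale_width(1, 2), log_scale_width(10, 20), log_scale_width(100, 200))
```

The block from 1 to 2 takes up about 30.1% of the decade and the block from 2 to 3 about 17.6%. Those are the two percentages read off the chart in the conversation.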
Then you get to 16, which is going to be someplace around here. Then you want to go to 32, which is going to be someplace around there. That's 30, so this is 32. Then you want to go to 64. And so this is 40, 50, 60. 64 is going to be right over there. And so what you see is, when you plot the powers of 2 on this logarithmic scale, they're equal distance apart. So you keep stepping along. If you were to plot on a linear scale, they'd get farther and farther apart. VI: Yeah. SAL: Actually, twice as far apart every time. But on this scale right over here they are equally spaced. So what's happening is you have something that's just equally stepping along here. You can imagine even just like walking along this. And if your sidewalk is shaped like this logarithmic scale, the probability on any given step, as you do many, many steps, or as you count all the steps, you're going to have many, many more steps that fall into the block that's between 1 and 2, or between 10 and 20, than you will, for example, the block that's between 9 and 10. VI: Yeah, if you just take a random point along here, you're more likely to fall in an area starting with 1. SAL: Right, one of these areas. Exactly, starting with 1, so between 1 and 2, or 10 and 20, or 100 and-- and that's exactly-- VI: So taking equal steps is going to give you that distribution, unless your steps happen to-- because there's special cases, right? So if you're getting-- SAL: Or people walk logarithmically. [LAUGHS] VI: Yeah, if you walk from 1 to 10, if your steps are 10 long-- SAL: Yeah. In special cases, yes. VI: So that's what happens there. SAL: If your steps are 10 long-- VI: You happen to exactly on [INAUDIBLE]. SAL: Right. But if you're anything-- any slight variation away from that exact thing, and then you will get the distribution. VI: Yeah. You're going to end up stepping all over the place. SAL: The Benford's distribution. VI: Benford's distribution. 
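The sidewalk picture can be simulated by taking equal steps of length log10(ratio) and recording only where each step lands within its decade. The sketch below uses floating-point arithmetic, so a step that lands exactly on a block boundary may occasionally be counted in the neighbouring block. The ratios tried (2, 3 and 10) and the function name are illustrative choices.

```python
import math
from collections import Counter


def walk_leading_digits(ratio: float, steps: int) -> Counter:
    """Tally leading digits of ratio**0, ratio**1, ... via equal log-scale steps."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    step_length = math.log10(ratio)
    tally = Counter()
    for k in range(steps):
        # Only the position inside the current decade decides the leading digit.
        position_in_decade = (k * step_length) % 1.0
        digit = min(9, int(10 ** position_in_decade))
        tally[digit] += 1
    return tally


for ratio in (2, 3, 10):
    tally = walk_leading_digits(ratio, 10_000)
    print(ratio, {digit: tally[digit] / 10_000 for digit in range(1, 10)})
```

With a ratio of 10, every step is exactly one decade long, so the walker always lands on a leading 1. That is the special case Vi describes. Ratios of 2 and 3 spread the steps across the decade and give shares close to Benford's distribution.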
SAL: Even though I think we now understand why, it's still fascinating. VI: Yeah. Well, this explains it for these number series. SAL: Yes. VI: So now we have to somehow figure out how to connect that to the real world information. SAL: The general idea, well, so for populations. And we read up a little about it. And Benford's distribution tends to work for things that grow exponentially. VI: Yes. SAL: Like powers of 2. VI: Like powers of 2. SAL: Like powers of 2. And populations grow exponentially. VI: Yeah. And in finance, a lot of things also grow exponentially. SAL: Yes. Or decline exponentially. Either way. [LAUGHTER] SAL: But it tends to operate exponentially. You keep growing by 10% every year. That's an exponential path. What's fascinating is physical constants. And we actually aren't 100% sure why this is happening. VI: No. This is still crazy to me. SAL: We only have theories here. And the general idea-- because, you know, physical constants is sort of dependent on the units you're dealing with. They're depending on a whole bunch of things. Actually, I have a few very loose theories. But I'll let you all think about that more. VI: OK. SAL: All right? And so, hopefully, you all enjoyed this.
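The point about growing by 10% every year, or declining exponentially, can be illustrated with a made-up quantity that compounds once a year. The starting value of 1,000, the 1,000-year span, and the 10% rates are all assumptions chosen for the sketch, not data from the video.

```python
from collections import Counter
from decimal import Decimal


def first_digit(value: float) -> int:
    """Leading non-zero digit of a positive float, read from its exact decimal form."""
    return Decimal(value).as_tuple().digits[0]


def compound_digit_shares(start: float, yearly_factor: float, years: int) -> dict[int, float]:
    """Share of each leading digit while `start` is multiplied by `yearly_factor` yearly."""
    tally = Counter()
    value = start
    for _ in range(years):
        tally[first_digit(value)] += 1
        value *= yearly_factor
    return {digit: tally[digit] / years for digit in range(1, 10)}


growth = compound_digit_shares(1000.0, 1.10, 1000)   # grows 10% per year
decline = compound_digit_shares(1000.0, 0.90, 1000)  # shrinks 10% per year
print("growth: ", {digit: round(share, 3) for digit, share in growth.items()})
print("decline:", {digit: round(share, 3) for digit, share in decline.items()})
```

Both runs should put a leading 1 at roughly 30% of the years, which fits Sal's remark that growth and decline behave the same way here.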