Topic A: Understanding distributions
- [Voiceover] What I want to do in this video is think about all of the different ways that we can represent data. So right over here, we have a list of, and I'm just using this as one form of data, a list of students' scores on, say, the last test, so Amy got 90 percent right, Bill got 95 percent right, Cam got 100 percent right, Efra also got 100 percent right, and Farah got 80 percent right. This is one way to show data. Remember, data is just recorded information, and it could be numeric like this, it could be quantitative, so you're recording actual numbers, or it could even be things like opinions: you could record data on how they liked the test, and they could have scored it based on, I really liked it, I kind of liked it, I didn't like it, or they might have rated it on a scale of zero to five, which would have been numbers, but it's numbers that are measuring people's opinions, as opposed to, here, we have numbers that are measuring their actual scores. So there's all different types of data, and I don't want to get into all of that, but let's just start thinking about different ways to represent this data. So this is one way, you could view this as a table where you have the name, and then you have the score. So you have your name column, and then you have your score column. And I can construct it as a table, so clearly, it looks like a table. Like that, that's one way, one very common way of representing data, just like that. That's actually how most traditional databases record data, in tables like this. But you could also do it in other ways. So you could record it as a... oftentimes called a bar graph, or sometimes, a histogram, so you could put score on the vertical axis here, and then you could have your names over here. And let's see the scores, let's see, maybe we'll make this a 50. Actually, let me just mark them off. So this is 10, 20, 30, that's too big. 10, 20, 30, 40, 50, 60, 70, 80, 90, so that's... And then 100, so that's 100.
One, two, three, four, five, that would be 50 right over there, and then you can go person by person. So Amy got a 90 on the exam, so the bar will go up to 90. So that is Amy, and then you have Bill, got a 95, so it's going to be between 90 and 100, so it's going to be right over there. Bill got a 95, and so it'll look like this. Bill, so that is Bill. And then you have Cam, who got 100 on the exam. So, make sure, you see I'm hand drawing it, so it's not as precise as if I were to do it on a computer. So this right over there, that is Cam's score. Efra got the same score as Cam, so her score is going to be, let me do that in Efra's color, and that's Efra's score right there. She also got 100. So, Efra... Efra, and then finally, Farah got an 80, so 60, 70, 80, so Farah got an 80. So this is Farah's score right over here. So this is another way of representing the data, and here we see it in visual form, but it has the same information. You can look up someone's name, and then figure out their score. Amy scored a 90, Bill scored a 95, Cam scored 100, Efra also scored 100, Farah scored an 80. And there's even other ways you can have some of this information. In fact sometimes, you might not even know their names, and so then it would be less information but (mumbles) a list of scores. The professor might say, "Hey, here are the five scores that people got on the exam." And they were listed 90, 95, 100... 100, and 80. Now, if it was listed like this, and this was all the data you got, this is less information than the data that's in this bar graph, or this histogram, or the data that's given in this table right over here, because here, not only do we know the scores, but we know who got what score. Here, we only know the list of scores, but there's even other ways, and this is not an exhaustive video of all of the different ways you can represent data. You can also represent data by looking at the frequency of scores.
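The table, the bar graph, and the plain list of scores described so far can be sketched in a few lines of Python. This is only an illustration, not something shown in the video: the names `scores_by_name`, `text_bar_chart`, and `score_list` are made up here, and the text bars (one `#` per 5 points) are an arbitrary stand-in for the hand-drawn graph.

```python
# Table form: each name paired with a score (values taken from the video).
scores_by_name = {"Amy": 90, "Bill": 95, "Cam": 100, "Efra": 100, "Farah": 80}


def text_bar_chart(table, points_per_mark=5):
    """Return one text bar per person; the 5-point scale is an assumption."""
    lines = []
    for name, score in table.items():
        bar = "#" * (score // points_per_mark)
        lines.append(f"{name:<6}{bar} {score}")
    return lines


# List form: only the scores, with the names dropped.
# This carries less information than the table or the bars.
score_list = list(scores_by_name.values())

for line in text_bar_chart(scores_by_name):
    print(line)
print(score_list)  # [90, 95, 100, 100, 80]
```

Notice that you can build the list from the table, but you can't rebuild the table from the list, which is the "less information" point made above.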
So, the frequency of scores right over here, so instead of writing the people, you could write the scores. So let's see, you could say this is 80, 85, 90, 95, and 100, and then you could record the frequency that people got these scores. So how many times do we have a score of an 80? Well, Farah is the only person with a score of 80, so you put one data point there. No one got an 85, one person got a 90, so you put a data point there. One person got a 95, so you can put that data point right over there. And then two people got 100. So this is one and two. Let's see the other 100 is in this color, so I'll just do it in the color. You wouldn't necessarily have to color code it like this. So this is another way to represent, and this axis, you could just view it as the number. So this tells you how many 80s there were, how many 90s there are, how many 95s, and how many 100s. So this right over here, has the same data as this list of numbers. It's just another way of looking at it. And once you have your data arranged in any of these ways, we can start to ask interesting questions. We can ask ourselves things like, well, what is the range of data? What is the range in the data? And the range is just the spread between the lowest point and the highest point. So the range in this data, is going to be the difference between the highest score, and the highest scores are 100, and the lowest score, an 80, so the range is going to be the difference between the max, minus the min. The maximum score minus the minimum score. So it's going to be 100 minus 80 is equal to 20, so that gives you a sense of things, it kind of gives you a sense of spread. You could also ask yourself, well, how many people scored below 100? And these are just interesting questions. Below 100, and you can actually answer that question, well, actually, you could have answered either of these questions with any of these different ways of looking at the data. If you say, how many people scored below 100? 
Well, one, two, three. How many people scored below 100? Well, 100 is up here, so it's going to be one, two, three. How many people scored below 100? One, two, three. How many people scored below 100? One, two, three. And so, any way you look at it, you would have gotten three. And you could also ask yourself, what is the most frequent score? So, most frequent. And once again, you could answer that question with any of these ways of representing this data. You could look at our original table and you say, look, there's only one 90, one 95, one 80, there's two 100s. So you'd say look, the most frequent score is 100. You'd see that over here too, you actually have two 100s, there's only one of each of the other scores. Here you also see the two 100s, and here is probably the clearest if you're looking at frequency. Sometimes this might be called a frequency plot. It's often called a frequency plot. And you see here, the most frequent one is the one that has the most dots on it, which is 100. So anyway, that's just a very high-level overview of how you can look at data in different ways to represent data, but the one thing I really want you to get from this is that these are all different ways of representing the same data. And we could probably invent other ways of doing it as well.
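The questions asked above (the range, how many people scored below 100, and the most frequent score) can also be worked out in code. This is a minimal sketch in Python, assuming the five scores from the video; the variable names are made up for illustration, and the dotted printout is only a rough text version of the frequency plot.

```python
from collections import Counter

# The five scores from the video, with the names dropped.
scores = [90, 95, 100, 100, 80]

# Frequency: how many times each score appears.
frequency = Counter(scores)

# Range: the maximum score minus the minimum score.
score_range = max(scores) - min(scores)

# How many people scored below 100?
below_100 = sum(1 for s in scores if s < 100)

# Most frequent score, and how many times it appears.
most_frequent, count = frequency.most_common(1)[0]

# A rough text version of the frequency plot: one dot per person.
for value in range(80, 101, 5):
    print(f"{value:>3} " + "." * frequency[value])

print(score_range)    # 20
print(below_100)      # 3
print(most_frequent)  # 100
```

Each answer comes out the same as counting by hand on the table, the bar graph, the list, or the frequency plot, since they all hold the same scores.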