Analyzing trends in categorical data

Sal solves an example where he is asked to calculate relative frequencies and analyze trends in categorical data. Created by Sal Khan.

Want to join the conversation?

  • Yashpal Saini
    I need more explanation and examples on this topic, and some more methods for cracking such questions. I start well, but partway through I stumble and end up with a wrong calculation. The row way / column way and total calculations get confusing. Help, please.
    (40 votes)
    • Ariana Morris
      It basically condenses the conditional distributions while including total percentages. If you just want to look at one row and see the conditional distribution for that row, you look at the row %. If you want to see the conditional distribution for a column, you compare the column % values within that column. It also includes the percent of each option out of the total, which makes it easy to find the total if you are given a number in a particular category.
      (0 votes)
  • daniel
    The claim "because 55.0% of people who get 7 or more hours are minimal computer users" makes sense to me. But why does the claim "only 35.8% of all people are minimal computer users" support the conclusion?
    (16 votes)
    • Ninja
      It is saying that 55% of the people who get 7+ hours of sleep are minimal computer users. To counter this, some people might say that this is only because most of the people taking the survey are minimal computer users. However, to rule that out and back up the original claim, it points out that most people are not minimal computer users: only 35.8% are. So the point of the second figure is to back up the first, by showing that minimal users making up 55% of the people who get 7+ hours of sleep is not simply because they are the majority.
      (35 votes)
  • EduData
    This seems like a confusing way to present data. I hope a module on "Selecting Appropriate Charts/Tables" will be added in the future. That said, I do understand that data won't always be presented in the way that is most easily legible to the reader.
    (20 votes)
  • nikola23
    @ , when the instructor checks the last answer, I did not think it was supporting his claim, which was that there is indeed an association between minimal computer usage and getting 7+ hours of sleep. Saying that 55% of the people who get 7+ hours of sleep are minimal computer users supports his point, but linking it to the data saying x percent of computer users are minimal users does not suggest a positive association. Am I missing something?

    A better data link to reference would be that 18.3% of total users are minimal users who get 7+ hours of sleep, while 15% of total users are moderate to extreme users who get 7+ hours of sleep. 18.3% > 15%, which suggests that, since more minimal computer users get 7+ hours of sleep by 3.3 percentage points, there is a positive association.
    (18 votes)
    • juraad
      This has to do with how the data was gathered. In this problem we grabbed a bunch of people, estimated their computer usage patterns and sleep patterns, and made a table.

      If, on the other hand, we grabbed some people in Town A and Town B and estimated their sleep patterns, the last claim (replacing "minimal computer user" with "being from Town A") would not be valid.

      In the computer usage example we cannot influence how many people will be in each of the usage groups; the study simply samples the population. In the towns example, we decide how many people from each town we sample. (And, more importantly, the ratio.)
      (2 votes)
  • Dhruva Mukhopadhyay
    I can understand the concept but not the rows and columns.
    (13 votes)
    • ndande
      This is how I've understood it. It's about how the two groups of categories (computer-time and hours-per-night) relate to each other.
      Let's call the computer-time group the row-group (the categories of this group are on the rows of the table) and the hours-per-night group the column-group.

      The row-group has the categories Minimal, Moderate, and Extreme.
      The column-group has the categories 5 or fewer, 5 to 7, and 7 or more.

      Because the categories are grouped, we look at the data from the perspective of each group.
      From the perspective of computer-time (for example, Minimal) we can say that:
      16.3% of the Minimal computer users get 5 or fewer hours of sleep
      32.6% of the Minimal computer users get 5 to 7 hours of sleep
      51.1% of the Minimal computer users get 7 or more hours of sleep

      These values go in the 'Row %' of each hours-per-night category (the column-group). In other words: for this row (Minimal is on a row of the table), what are the values for each column (the hours-per-night categories)? Note that 'Row total' doesn't have a value for 'Column %'. This is because the 'Column %' values describe the other group of categories (hours-per-night), and the totals for those are in the 'Column total'.

      Now, looking from the perspective of hours-per-night (for example, 5 or fewer) we can say that:
      17.5% of the people who get 5 or fewer hours of sleep are Minimal computer users
      32.5% of the people who get 5 or fewer hours of sleep are Moderate computer users
      50.0% of the people who get 5 or fewer hours of sleep are Extreme computer users

      These values go in the 'Column %' of each category in the row-group. In other words: for this column (5 or fewer), what are the values for each computer-time category?
      The same holds for all the categories in the sleeping-time group.

      With two groups of categories, it would have been hard to understand everything without the 'Row %' and 'Column %' to guide us.
      (13 votes)
  • Lina Byrne-Dugan
    Khan Academy should have included "filling out frequency table for independent events" before teaching "analyzing trends in categorical data."
    (13 votes)
  • AJ
    What do Row & Column labels mean?
    (10 votes)
  • Nirish Samuel
    I am unable to read the table that specifies the values of row %, column %, and total %.
    (4 votes)
    • Nasrullah Sami
      Set everything else aside and look at the topmost row and the leftmost column. The topmost row shows how long a computer user sleeps: 5 or fewer, 5 to 7, or 7 or more hours. So whenever you analyze a row %, just remember it is about the amount of sleep. The leftmost column shows the type of computer user: minimal, moderate, or extreme. So whenever you see a column %, the first thought that should cross your mind is that it is about the percentage of people in each computer-use group.
      So, let's analyze the first row, which says minimal computer time. All of the data in this row refers to "minimal computer users". Then we see hours per night as the second variable, and just below that we see row %, column %, total %. Don't worry about this. It's just a label for how each cell is formatted and doesn't show any data. The next entry is 5 or fewer. Just below that (in the 1st row), we see 3 entries: 16.3% (row %), 17.5% (column %), 5.8% (total %). Now let's see what they mean.
      16.3% (row %): We are dealing with minimal computer users, and this figure is the row %. Recall what we discussed earlier: a row % is about sleep. So 16.3% of the minimal computer users sleep 5 or fewer hours.
      17.5% (column %): We are dealing with minimal computer users, and this figure is the column %. A column % is about the users. So 17.5% of the people who sleep 5 or fewer hours are minimal computer users.
      Do you get the hunch that, since there are 3 categories (minimal, moderate, and extreme), if 17.5% OF THE PEOPLE WHO SLEEP 5 OR FEWER HOURS are minimal computer users, then the column % of minimal + the column % of moderate + the column % of extreme (ALL of the 5 or fewer column) will add up to 100%? After all, all the people who sleep 5 or fewer hours make up 100% of that category. Check whether this is the case.
      5.8% (total %): We are dealing with minimal computer users who get 5 or fewer hours of sleep. So 5.8% of all the people in the study are minimal computer users who get 5 or fewer hours of sleep.
      And what % of all the people are minimal computer users? It's 35.8%; see the rightmost column.
      Try to analyze the rest by yourself.
      (6 votes)
  • lindamelmer
    I just can't seem to answer word questions for associations but numbers are fine. I don't understand what I'm doing wrong and it's frustrating because I keep practicing over and over and over and the only thing I keep getting wrong is the " is there an association...". I don't know what else to do about it, I'm in an infinite loop.
    (6 votes)
  • mathslover
    I get really confused and am not able to solve this kind of question. I am stuck on one of the practice questions.

    Q: Does the table show evidence of an association between not taking piano lessons and not attending an Ivy League school?

    1. No, because 91.7% of people who didn't take piano lessons did not attend an Ivy League school. But only 75% of people who did not attend an Ivy League school didn't take piano lessons.

    Why is the above answer incorrect? ONLY 75% of people who did not attend an Ivy League school didn't take piano lessons, and the remaining 25% of people who did not get into an Ivy did take piano lessons, which proves there is no association between the 2 classes.
    (3 votes)

Video transcript

Voiceover: The relative frequency table below shows statistics from a study about the relationship between the amount of time a person spends using a computer before bed and the amount that a person sleeps each night. For computer use, each participant was classified as minimal, moderate, or extreme. Let's look at the frequency table below. Let's see. That's the frequency table, and there are 3 categories of computer time, just like they told us: minimal, moderate, and extreme. This is before they go to bed or at night. Then they have the 3 categories of how much they're sleeping: 5 or fewer hours per night, 5 to 7 hours per night, or 7 or more hours. OK, so that's fair enough. Let's see what they want us to do.
They tell us, "Suppose there were 17 people in the study who were both moderate computer users and got 5 to 7 hours of sleep." Moderate, 5 to 7. There are 17 people in this category right over here. Just to mark that, let me ... I copied and pasted this chart onto my scratch pad so I can write on it. They're telling us about this group, and my pen is really acting up. I don't understand what's going on. This group right over here, there are 17 people. That group right over there is 17 people.
Now what are they asking us? They're saying, "How many people in the study were both extreme computer users and got 5 to 7 hours of sleep? Round to the nearest whole number." So extreme and got 5 to 7. I'll get my scratch pad out. They are asking how many people are in this bucket right over here. I think I have to replace my pen tablet or something. I don't know why it's getting all splotchy like this. How would we think about this? There are 17 people in this group. How many people are in this group?
They tell us that 17 is 34.3% of the row total, so I guess you could say 17 is 34.3% of the moderate computer users, or you could say that 17 is 30% of the people who slept 5 to 7 hours each night, or you could say 17 is 10% of the total number of people. Let's just go with that. We could figure out the total number of people. So 10% of the total is going to be equal to 17, or the total, just dividing both sides by 10%, is equal to 17 divided by 10%, which is the same thing as 17 over 0.1, which, of course, is equal to 170. The total is 170, and they tell us that extreme computer users who sleep 5 to 7 hours per night represent 11.7% of the total. So the answer to their question of how many people were extreme computer users who sleep 5 to 7 hours per night is 11.7% of 170. Let's go back over here. We actually have a little calculator tool here. It's 11.7%, which is 0.117, times 170, and let me make sure that you can see what I'm doing by scrolling over a little bit, is equal to 19.89. If we're rounding to the nearest whole number, that's going to be 20 people.
And then they are going to ask us some questions. They say, "Does the table show evidence of an association between being a minimal computer user and getting 7 hours of sleep or more?" Let's just look at the chart. An association between being a minimal computer user and getting 7 hours of sleep or more. These are the minimal computer users who get 7 or more hours of sleep. There's a couple of ways to read this. You could say that 51% of minimal computer users get 7 or more hours of sleep. You could say that of the people who get 7 or more hours of sleep, 55% are minimal computer users.
And of course, this one just says that minimal computer users who get 7 or more hours of sleep represent 18.3% of all of the people who were surveyed. Let's look at the choices. Actually, before I even look at the choices, let's see if there is an association. If you look at minimal computer users, it looks like a small percentage, only 16%, get 5 or fewer hours. A higher percentage gets 5 to 7 hours, and 51%, the highest percentage, gets 7 or more. So for minimal computer users, it looks like the distribution is definitely weighted towards getting more sleep. If we look at the extreme computer users, it's the opposite trend: 47% have 5 or fewer hours per night, 33% have 5 to 7 hours, and then only 19% get 7 or more. It looks like the moderate is someplace in between. Just looking at each of these rows, it looks like there is a trend where if you use a computer for less time, you are more likely to have more sleep, and likewise, if you use a computer more, you're more likely to have less sleep.
Another way to think about it: when you look at the people who are getting 7 or more hours of sleep, a majority of them are minimal computer users, and there are 3 categories. So for 55% to be minimal computer users really does feel like the minimal computer users are disproportionately represented in this category of 7 or more hours of sleep. You see that the extreme computer users represent only 20% of this category. So it does look like there is an association between minimal computer use and getting 7 or more hours of sleep. But let's look at the actual choices they give us. Does the table show evidence of an association between being a minimal computer user and getting 7 hours of sleep or more?
"Yes, because 35.1% are extreme computer users, and 29.1% of people are moderate." I go with the yes, but this doesn't seem to really back up the claim. This is just giving us some random data about the percentage that are extreme computer users or moderate computer users.
"No ..." Well, I explained why I go with yes, that there does seem to be a trend, and I don't even believe the rest of this statement, "because the total column percentages are essentially equal." We see that the column percentages are not equal for people who are getting 7 hours or more of sleep. We see that right over here: 55, 25, 20. So I won't go with that one either.
"Yes, because 51% of minimal computer users get 7 or more hours of sleep, and only 33% of all computer users get 7 or more hours." Yeah, I mean, that seems pretty good. That looks like a pretty good explanation, so I'll check that, but let's just review all of them.
"No," well, I already said I think it's yes, "because the total percentage of extreme users who get 5 to 7 hours of sleep is the same as the total percentage of moderate computer users who get 5 to 7 hours of sleep." So that's not touching on the point that we're looking for.
"Yes, because 55% of people who get 7 or more hours are minimal computer users, and only 35% of all people are minimal computer users." Actually, I'll go with this as well. Oh yeah, this is a multi-select here, so I could select that one as well. In the earlier choice, we looked at the percentage of minimal computer users who get 7 or more hours of sleep, and we saw that percentage is higher than for the whole population.
Here, we're looking at the people who get 7 or more hours of sleep, and we're saying 55% of them are minimal computer users even though only 35% of all the people are minimal computer users. So I would go with both of these. And so let us check our answer, and we got it right.
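
The two calculations in the transcript can be sketched in Python. This is a minimal sketch rather than part of the original exercise: the function names and the `tolerance` cutoff are assumptions introduced for illustration, and only percentages quoted in the transcript and comments are used. The first part recovers the study size from the one cell whose count is known. The second compares a conditional percentage with the matching overall percentage, which is the comparison behind both selected answers.

```python
def total_from_cell(count, total_pct):
    """Study size implied by one cell's count and that cell's total %."""
    return count / (total_pct / 100)


def count_from_total(total, total_pct):
    """Number of people in a cell, given the study size and the cell's total %."""
    return total * total_pct / 100


def differs_from_overall(conditional_pct, overall_pct, tolerance=1.0):
    """True if a conditional % differs from the overall % by more than tolerance.

    tolerance (in percentage points) is an arbitrary illustrative cutoff
    (an assumption of this sketch), not a statistical test.
    """
    return abs(conditional_pct - overall_pct) > tolerance


# Part 1: the 17 moderate users who sleep 5 to 7 hours are 10% of everyone.
total = total_from_cell(17, 10.0)
# Extreme users who sleep 5 to 7 hours are 11.7% of everyone.
extreme_5_to_7 = count_from_total(total, 11.7)
print(round(total), round(extreme_5_to_7))

# Part 2: minimal computer use vs. 7 or more hours of sleep.
# Row view: 51.1% of minimal users get 7+ hours, vs. 33% of all people.
row_view = differs_from_overall(51.1, 33.0)
# Column view: 55.0% of 7+ hour sleepers are minimal users, vs. 35.8% of all people.
column_view = differs_from_overall(55.0, 35.8)
print(row_view, column_view)
```

Run as written, this should print `170 20` and then `True True`, matching the 20 people found in the video and the two "yes" choices that were selected.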