## Statistics and probability

© 2023 Khan Academy

# Analyzing trends in categorical data

Sal solves an example where he is asked to calculate relative frequencies and analyze trends in categorical data. Created by Sal Khan.

## Want to join the conversation?

- I need more explanation and examples on this topic, and some more methods to crack questions like these. I start well, but partway through I stumble and end up with a wrong calculation. The row-wise / column-wise and total calculations get confusing. Help, please!(35 votes)
- It basically condenses the conditional distributions while including total percentages. If you just want to look at one row and see the conditional distribution for that row, you look at the row %. If you want to see the conditional distribution for a column, you compare the column % within that column. It also includes the percent of each option out of the total which makes it easy to find the total if you are given a number in a particular category.(0 votes)

- The "because 55.0% of people who get 7 or more hours are minimal computer users" claim makes sense to me. But why does the "only 35.8% of all people are minimal computer users" claim support the conclusion?(12 votes)
- It is saying that 55% of people who get 7+ hours of sleep are minimal computer users. To counter this, some people might say that this may simply be because most people taking the survey are minimal computer users. To rule that out and back up the original claim, it says that most people are not minimal computer users; only 35.8% are. So the point was to back up the claim that 55% of people who get 7+ hours of sleep are minimal computer users by showing that this is not just because minimal users are the majority.(24 votes)

- @8:16, when the instructor checks the last answer, I did not think it was supporting his claim, which was that there is indeed an association between minimal computer usage and getting 7+ hours of sleep. Saying that 55% of minimal computer users get 7+ hours of sleep supports his point, but linking it to the data saying x percentage of computer users are minimal users does not suggest a positive association. Am I missing something?

A better data point to reference would be that 18.3% of all respondents are minimal users who get 7+ hours of sleep, while 15% of all respondents are moderate or extreme users who get 7+ hours of sleep. Since 18.3% > 15%, more minimal computer users get 7+ hours of sleep (by 3.3 percentage points), which suggests there is a positive association.(18 votes)

- This has to do with how the data was gathered. In this problem we grabbed a bunch of people, estimated their computer usage patterns and sleep patterns, and made a table.

If on the other hand we grabbed some people in Town A and Town B and estimated their sleep patterns, the last claim (replacing "minimal computer user" with "being from Town A") would not be valid.

In the computer usage example we cannot influence how many people will be in each of the usage groups; it simply samples the population. In the towns example, we decide how many people from each town we sample (and, more importantly, the ratio).(2 votes)
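
The claim debated in this thread compares two percentages: the share of Minimal users among people who sleep 7 or more hours (55.0%) and the share of Minimal users among everyone surveyed (35.8%). If computer use and sleep were unrelated, those two shares would be roughly equal. Below is a minimal Python sketch of that comparison using only the figures quoted in the thread; the ratio it prints (sometimes called a lift) is an extra summary added for illustration, not something the video computes, and the variable names are hypothetical.

```python
# Figures quoted in the thread above.
minimal_share_of_7plus = 55.0  # % of people sleeping 7+ hours who are Minimal users
minimal_share_overall = 35.8   # % of all respondents who are Minimal users

# With no association, the first share would be close to the second.
difference = minimal_share_of_7plus - minimal_share_overall
ratio = minimal_share_of_7plus / minimal_share_overall

print(f"difference: {difference:.1f} percentage points")  # difference: 19.2 percentage points
print(f"ratio: {ratio:.2f}")                              # ratio: 1.54
```

A ratio well above 1 means Minimal users are over-represented among the 7-or-more sleepers compared with the sample as a whole. As the reply above points out, this reading assumes the group sizes come from sampling the population rather than being fixed by the study design.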

- This seems like a confusing way to present data. I hope a module on "Selecting Appropriate Charts/Tables" will be added in the future. That said, I understand that data won't always be presented in the way that is most legible to the reader.(15 votes)
- I can understand the concept but not the rows and columns.(11 votes)
- This is how I've understood it. It's about how the two groups of categories (computer-time and hours-p-night) relate to each other.

Let's call the computer-time group the row-group (the categories of this group are on the rows of the table) and the hours-p-night group the column-group.

The row-group has Minimal, Moderate, Extreme categories.

The column-group has 5 or fewer, 5 to 7, 7 or more categories.

Because the categories are grouped, we look at the data from the perspective of each group.

From the perspective of computer-time (for example Minimal) we can say that:

16.3% of the Minimal Computer users have 5 or fewer hours of sleep

32.6% of the Minimal Computer users have 5 to 7 hours of sleep

51.1% of the Minimal Computer users have 7 or more hours of sleep

And these values go in the 'Row %' of each category in the hours-p-night group (the column-group). In other words, *for this row (Minimal is on the row of the table), what are the values for each column (the values of hours-p-night)?* Note that 'Row total' doesn't have a value for 'Column %'. This is because the 'Column %' values belong to the other group of categories (hours-p-night), and the totals for those are in the 'Column total'.

Now, looking from the perspective of hours-p-night (for example, 5 or fewer), we can say that:

17.5% of people who get 5 or fewer hours of sleep are Minimal Computer users.

32.5% of people who get 5 or fewer hours of sleep are Moderate Computer users.

50.0% of people who get 5 or fewer hours of sleep are Extreme Computer users.

And these values go in the 'Column %' of each category in the row-group: *for this column (5 or fewer), what are the values for each category?*

The same applies to all the categories in the sleeping-time group.

Having two groups of categories, without the 'Row %' and 'Column %' to guide us, would've been hard for us to understand everything.(8 votes)
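
The reply above rests on one property of a two-way table: the row % entries in any single row add up to 100%, and so do the column % entries in any single column. Below is a minimal Python sketch that checks this with the percentages quoted in the reply (the Minimal row and the "5 or fewer" column). The helper name `sums_to_100` and the 0.2-point tolerance for rounding are illustrative assumptions, not something from the video.

```python
# Row % for the Minimal row, as quoted in the reply above:
# 5 or fewer, 5 to 7, 7 or more hours of sleep.
minimal_row_pct = [16.3, 32.6, 51.1]

# Column % for the "5 or fewer" column, as quoted in the reply above:
# Minimal, Moderate, Extreme computer users.
five_or_fewer_col_pct = [17.5, 32.5, 50.0]


def sums_to_100(percentages, tolerance=0.2):
    """Return True if the percentages add up to 100 within a rounding tolerance.

    Published tables round each cell, so the sum can drift slightly from 100;
    the tolerance value is an assumption chosen for illustration.
    """
    return abs(sum(percentages) - 100.0) <= tolerance


print(round(sum(minimal_row_pct), 1))        # 100.0
print(round(sum(five_or_fewer_col_pct), 1))  # 100.0
print(sums_to_100(minimal_row_pct), sums_to_100(five_or_fewer_col_pct))  # True True
```

Mixing directions, for example adding a row % from one cell to a column % from another, carries no such guarantee, so a sum far from 100 is a quick sign that a cell was misread.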

- What do Row & Column labels mean?(8 votes)
- The Row % is the conditional percent in that particular row or how often that particular outcome appeared in that one row. Column % is the same thing, but downward rather than across.(2 votes)

- Khan Academy should have included "filling out frequency tables for independent events" before teaching "analyzing trends in categorical data".(7 votes)
- I am unable to read the table that specifies the values of row %, column % and total %.(4 votes)
- Just ignore everything else and notice the topmost row and the leftmost column. The top row shows how much a computer user sleeps: 5 or fewer, 5-7, or 7 or more hours. So whenever you analyze a row, just remember that its percentages are spread across the amounts of sleep. The leftmost column shows the type of computer user: minimal, moderate or extreme. So whenever you look down a column, the first thought that should cross your mind is that it is about the percentage of people who are each type of computer user.

So, let's analyze the first row, which says minimal computer time. All of the data in this row has the prefix "minimal computer users". Then we see hours per night as the second variable, and just below that we see row %, column %, total %. Don't worry about this; it's just formatting and doesn't show any data. The next entry is 5 or fewer, and just below that (in the 1st row) we see 3 entries: 16.3% (row %), 17.5% (column %), 5.8% (total %). Now let's see what they mean.

16.3% (row %): We are dealing with minimal computer users, and this is the row %, as the name says. Recall what we discussed earlier: a row is related to sleep. So 16.3% of the minimal computer users sleep 5 or fewer hours.

17.5% (column %): We are dealing with minimal computer users, and this is the column %. The column is related to the type of user. So 17.5% of the people who sleep 5 or fewer hours are minimal computer users.

Do you get a hunch? There are 3 categories: minimal, moderate and extreme. If 17.5% of the people who sleep 5 or fewer hours are minimal computer users, then the column % of minimal + the column % of moderate + the column % of extreme (ALL of the 5-or-fewer column) should add up to 100%, because all the people who sleep 5 or fewer hours make up 100% of that category. Check whether this is the case.

5.8% (total %): This cell counts minimal computer users who get 5 or fewer hours of sleep, measured against everyone surveyed. So 5.8% of all the people in the study are minimal computer users who get 5 or fewer hours of sleep.

And what percentage of all people are minimal computer users? It's 35.8%; see the rightmost column.

Try to analyze the rest by yourself.(6 votes)
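
The three numbers in each cell are linked: a cell's total % equals its row % multiplied by that row's share of everyone surveyed. The short Python sketch below uses only figures quoted in this thread (Minimal row % of 16.3% and 51.1%, and Minimal users making up 35.8% of the sample) and reproduces the total % values 5.8% and 18.3% that appear in the table. The function name is hypothetical.

```python
def total_pct_from_row_pct(row_pct, row_share_pct):
    """Convert a row % into a total % (percent of everyone surveyed).

    row_pct: percent of the row's group that falls in this cell.
    row_share_pct: percent of all respondents who belong to the row's group.
    """
    return row_pct * row_share_pct / 100.0


MINIMAL_SHARE = 35.8  # % of all respondents who are Minimal users (rightmost column)

# Minimal users who sleep 5 or fewer hours: 16.3% of Minimal users.
print(round(total_pct_from_row_pct(16.3, MINIMAL_SHARE), 1))  # 5.8
# Minimal users who sleep 7 or more hours: 51.1% of Minimal users.
print(round(total_pct_from_row_pct(51.1, MINIMAL_SHARE), 1))  # 18.3
```

The same arithmetic shows that 5.8% is a share of all respondents, while the 16.3% row % is a share of Minimal users only.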

- I just can't seem to answer word questions about associations, but numbers are fine. I don't understand what I'm doing wrong, and it's frustrating because I keep practicing over and over, and the only thing I keep getting wrong is the "is there an association..." question. I don't know what else to do about it; I'm in an infinite loop.(4 votes)
- I get really confused and am not able to solve this kind of question. I am stuck on one of the practice questions:

Q : Does the table show evidence of an association between not taking piano lessons and not attending an Ivy League school?

1. No, because 91.7% of people who didn't take piano lessons did not attend an Ivy League school. But only 75% of people who did not attend an Ivy League school didn't take piano lessons.

Why is the above answer incorrect? Only 75% of people who did not attend an Ivy League school didn't take piano lessons, and the remaining 25% of people who did not get into an Ivy did take piano lessons, which proves there is no association between the two.(3 votes)

## Video transcript

Voiceover: The relative frequency table below shows statistics from a study about the relationship between the amount of time a person spends using a computer before bed and the amount that a person sleeps each night. For computer use, each participant was classified as minimal, moderate, or extreme. Let's look at the frequency table below. That's the frequency table, and there are 3 categories of computer time, just like they told us: minimal, moderate, and extreme. This is before they go to bed, at night. Then they have the 3 categories of how much they're sleeping: 5 or fewer hours per night, 5 to 7 hours per night, or 7 or more hours. OK, that's fair enough.

Let's see what they want us to do. They tell us, "Suppose there were 17 people in the study who are both moderate computer users and got 5 to 7 hours of sleep." Moderate, 5 to 7. This category right over here, there are 17 people in
this category over here. Just to mark that, I copied and pasted this chart onto my scratch pad so I can write on it. This is the group they're telling us about (my pen is really acting up; I don't understand what's going on). This group right over here has 17 people.

Now what are they asking us? They're saying, "How many people in the study were both extreme computer users and got 5 to 7 hours of sleep? Round to the nearest whole number." So extreme and 5 to 7. I'll get my scratch pad out. They're asking how many people are in this bucket right over here. (I think I have to replace my pen tablet or something; I don't know why it's getting all splotchy like this.)

How would we think about this? There are 17 people in this group. How many people are in the group we want? They tell us that 17 is 34.3% of the row total, so you could say 17 is 34.3% of the moderate computer users, or you could say that
17 is 30% of the people who slept 5 to 7 hours each night, or you could say 17 is 10% of the total number of people. Let's go with that, because it lets us figure out the total number of people.

So 10% of the total is equal to 17. Dividing both sides by 10%, the total is 17 divided by 10%, which is the same thing as 17 over 0.1, which is equal to 170. The total is 170, and they tell us that extreme computer users who sleep 5 to 7 hours per night represent 11.7% of the total. So the answer to their question, how many people were extreme computer users who sleep 5 to 7 hours per night, is 11.7% of 170. We actually have a little calculator tool here. 11.7% is 0.117, and 0.117 times 170 is equal to 19.89. Rounding to the nearest whole number, that's going to be 20 people.

And then they are going
to ask us some questions. They say, "Does the table show evidence of an association between being a minimal computer user and getting 7 hours of sleep or more?" Let's just look at the chart. These are the minimal computer users who get 7 or more hours of sleep, and there are a couple of ways to read this cell. You could say that 51% of minimal computer users get 7 or more hours of sleep. You could say that of the people who get 7 or more hours of sleep, 55% are minimal computer users. And the last number just says that minimal computer users who get 7 or more hours of sleep represent 18.3% of all of the people who were surveyed.

Before I even look at the choices, let's see if there is an association. If you look at minimal computer users, only a small percentage, 16%, get 5 or fewer hours. A higher percentage gets 5 to 7 hours, and 51%, the highest percentage, gets 7 or more. So for minimal computer users, the distribution is definitely weighted toward getting more sleep. By contrast, the extreme computer users show the opposite trend: 47% get 5 or fewer hours per night, 33% get 5 to 7 hours, and only 19% get 7 or more. The moderate users are somewhere in between. Just looking at each of these rows, there is a trend: if you use a computer for less time, you are more likely to get more sleep, and likewise, if you use a computer more, you're more likely to get less sleep. Another way to think about it: when you look at the people who are getting 7 or more hours of sleep, a majority of them are
minimal computer users, and there are 3 categories. For 55% of them to be minimal computer users really does suggest that minimal computer users are disproportionately represented in this category of 7 or more hours of sleep. Extreme computer users, by comparison, make up only 20% of this category. So it does look like there is an association between minimal computer use and getting 7 or more hours of sleep.

But let's look at the actual choices they give us. Does the table show
evidence of an association between being a minimal computer user and getting 7 hours of sleep or more?

"Yes, because 35.1% are extreme computer users, and 29.1% of people are moderate." I'd go with yes, but this choice doesn't really back up the claim. It just gives us the percentages of people who are extreme or moderate computer users.

"No, because the total column percentages are essentially equal." I explained why I'd go with yes, since there does seem to be a trend, and I don't even believe this statement. The column percentages are not equal for the people who are getting 7 or more hours of sleep; right over here they are 55, 25, and 20. So I won't go with that one either.

"Yes, because 51% of minimal computer users get 7 or more hours of sleep, and only 33% of all computer users get 7 or more hours." That seems pretty good: 51% of minimal computer users get 7 or more hours of sleep, while only 33% of all computer users do. That's a good explanation, so I'll check it, but let's review all of them.

"No, because the total percentage of extreme users who get 5 to 7 hours of sleep is the same as the total percentage of moderate computer users who get 5 to 7 hours of sleep." I already said I think it's yes, and this choice isn't touching on the point we're looking for.

"Yes, because 55% of people who get 7 or more hours are minimal computer users, and only 35.8% of all people are minimal computer users." I'll go with this as well; this is a multi-select question, so I can select both. In the earlier choice, we looked at the percentage of minimal computer users who get 7 or more hours of sleep and saw that it is higher than for the whole population. Here, we're looking at the people who get 7 or more hours of sleep, and 55% of them are minimal computer users even though only 35.8% of all the people are minimal computer users. So I would go with both of these. Let's check our answer, and we got it right.
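
The arithmetic in the transcript can be reproduced in a few lines. The Python sketch below uses only numbers read off the table in the video: it backs out the study size from the 17 Moderate users who sleep 5 to 7 hours (10% of the total), estimates the Extreme users who sleep 5 to 7 hours (11.7% of the total), and then makes the row-direction association check (51.1% of Minimal users versus 33% of everyone getting 7 or more hours). Variable names are illustrative, not from the source.

```python
# Numbers read off the table in the video.
moderate_5_to_7_count = 17        # Moderate users who sleep 5 to 7 hours
moderate_5_to_7_total_pct = 10.0  # ...as a % of everyone surveyed
extreme_5_to_7_total_pct = 11.7   # Extreme users who sleep 5 to 7 hours, % of everyone

# Step 1: 17 people are 10% of the total, so total = 17 / 0.10.
total_people = moderate_5_to_7_count / (moderate_5_to_7_total_pct / 100)

# Step 2: the Extreme, 5-to-7-hours cell is 11.7% of that total.
extreme_5_to_7_estimate = total_people * extreme_5_to_7_total_pct / 100

print(round(total_people))                # 170
print(round(extreme_5_to_7_estimate, 2))  # 19.89
print(round(extreme_5_to_7_estimate))     # 20

# Step 3: association check in the row direction.
minimal_7plus_row_pct = 51.1  # % of Minimal users who sleep 7+ hours
overall_7plus_pct = 33.0      # % of all respondents who sleep 7+ hours
print(minimal_7plus_row_pct > overall_7plus_pct)  # True
```

Because the table's percentages are themselves rounded, the back-calculated count (19.89) is not a whole number, which is why the question asks for rounding to the nearest person. Step 3 compares one group's conditional percentage with the overall percentage; the 55% versus 35.8% answer choice looks at the same association from the column direction.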