## Statistics and probability

Lesson 2: Chi-square tests for relationships

# Filling out frequency table for independent events

Given the row and column totals, Sal fills in the cells of a frequency table so that the events are independent. Created by Sal Khan.

## Want to join the conversation?

• I'm kind of confused why I'm completing this question in the very first section when we actually learn about it much later on :/
• In other words, if the pattern/frequency remains the same in every event/condition, then there is nothing in either of those events/conditions that is affecting the pattern or frequency, meaning the events/conditions are independent?
• I don't buy it. Why exactly 20%? It could be an approximate value that would be near 20%...
• In this video, Sal is basically setting up the first half of a chi-square test for independence. To do the test, you find the expected frequencies for each cell based on what would happen if there were no relationship between the two events (thus, 20%), and you compare these to the observed (actual) frequencies. Independence is the "null hypothesis," the 20% is what it predicts, and the chi-square test statistic is based on the differences between the observed and expected frequencies. But you are also correct that in real data the 20% would only hold approximately. In other words, the observed proportions don't have to equal 20% exactly: if they are close enough to 20%, the test statistic is small and we fail to reject the hypothesis that the two events are independent. That is weaker than proving that they are unrelated.
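To make the expected-versus-observed comparison concrete, here is a minimal Python sketch. The 20% grouchy rate follows the discussion above, but the rain/no-rain totals and the observed counts are hypothetical numbers chosen for illustration, and the helper names `expected_counts` and `chi_square` are my own, not anything from the video.

```python
# Hypothetical totals (not from the video): 100 days, mom grouchy on 20 of them.
row_totals = {"grouchy": 20, "not grouchy": 80}
col_totals = {"raining": 35, "not raining": 65}


def expected_counts(row_totals, col_totals):
    """Expected count for each cell if the row and column events are independent."""
    total = sum(row_totals.values())
    if total != sum(col_totals.values()):
        raise ValueError("row and column totals must share the same grand total")
    # Under independence: cell = (row total * column total) / grand total.
    return {
        (row, col): row_total * col_total / total
        for row, row_total in row_totals.items()
        for col, col_total in col_totals.items()
    }


def chi_square(observed, expected):
    """Sum of (observed - expected)^2 / expected over all cells."""
    return sum(
        (observed[cell] - expected[cell]) ** 2 / expected[cell]
        for cell in expected
    )


expected = expected_counts(row_totals, col_totals)

# Hypothetical observed counts with the same row and column totals.
observed = {
    ("grouchy", "raining"): 10,
    ("grouchy", "not raining"): 10,
    ("not grouchy", "raining"): 25,
    ("not grouchy", "not raining"): 55,
}

statistic = chi_square(observed, expected)

for cell in expected:
    print(cell, "expected:", expected[cell], "observed:", observed[cell])
print("chi-square statistic:", round(statistic, 3))
```

With these made-up numbers, every expected "grouchy" cell is 20% of its column total (7 of 35 and 13 of 65). The statistic comes out to about 2.47, which is below 3.841, the usual 5% critical value for one degree of freedom, so this hypothetical data would not give grounds to reject independence.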
• What IS "categorical data" and where do I find a discussion/explanation of the term?
• Here "data" just means counts: how many things fall into each category.
There are a crazy number of possible categories; if you think about anything for a minute, you can easily imagine lots of them.
Here is a toy example: bicycles.

Let's say you were studying bicycles so you wait at school in the morning to watch who arrives by bicycle. You can count things (categories) about the bicycles: how many "gears" they have (single speed, 3 speed, 10 speed, 16 speed etc), what color they are (red, blue, green etc.), handlebar style (straight, under, over), if they have a water bottle (yes vs no).
Also you can count things about the riders: female vs male, long hair vs short hair, age, teacher vs student. You could also count how many people arrive by bicycle vs. on foot vs. by car and so on.
You could ask the bikers how many days they bike to school (every school day? once per week? twice? etc).

So that is how you get "categorical data" - you just count stuff.
There are a bunch of categories here you could add (did the rider wear glasses? a helmet? a jacket? what kind of tires did the bicycles have? did the bicycle have stickers / decorations? did the riders have a tattoo? It just goes on and on :-) ).
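As a rough sketch of "just counting stuff," the Python below tallies a few made-up bicycle observations. The records, the field names (`color`, `gears`, `water_bottle`), and the `tally` helper are all hypothetical and exist only to illustrate the idea.

```python
from collections import Counter

# Hypothetical observations: one record per bicycle seen arriving at school.
observations = [
    {"color": "red", "gears": "single speed", "water_bottle": "yes"},
    {"color": "blue", "gears": "10 speed", "water_bottle": "no"},
    {"color": "red", "gears": "3 speed", "water_bottle": "no"},
    {"color": "green", "gears": "10 speed", "water_bottle": "yes"},
    {"color": "blue", "gears": "10 speed", "water_bottle": "yes"},
    {"color": "red", "gears": "single speed", "water_bottle": "no"},
]


def tally(records, variable):
    """Count how many records fall into each category of one variable."""
    return Counter(record[variable] for record in records)


color_counts = tally(observations, "color")
gear_counts = tally(observations, "gears")

# Counting two variables at once gives the cells of a two-way frequency table.
two_way = Counter(
    (record["color"], record["water_bottle"]) for record in observations
)

print(color_counts)
print(gear_counts)
print(two_way)
```

The two-way tally is the same kind of table that the video fills in: each cell counts the observations that fall into one row category and one column category.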

Now... why does anybody care?
Well, maybe you want to persuade more people to ride bicycles.
Since there are so many things you could count, researchers have to decide which things are worth counting (where a "researcher" might be a scientist, a statistician, or someone in marketing... lots of people get interested in this stuff because they're interested in something else).
Anyway, this was just a toy example but I hope it gives you an idea of "categorical data".
• By the way, I noticed that if Mom is grouchy exactly 1/5 of the time, rain or shine, then the 3 entries of all 3 columns are in the same ratios (perhaps obvious, but maybe worth noting).
• Why does

P(mom grouchy | raining) = P(mom grouchy)

imply that

P(mom grouchy | not raining) = P(mom grouchy)?

It seems like it might be obvious, but I can't tell why.

Edit: never mind, guys, some algebra proves it.
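One way the algebra mentioned in the edit could go is through the law of total probability. In this sketch, G and R are shorthand introduced here (G = mom is grouchy, R = it is raining, R^c = it is not raining), and it assumes 0 < P(R) < 1 so that both conditional probabilities are defined.

```latex
\begin{align*}
P(G) &= P(G \mid R)\,P(R) + P(G \mid R^c)\,P(R^c) \\
     &= P(G)\,P(R) + P(G \mid R^c)\,\bigl(1 - P(R)\bigr)
        && \text{using } P(G \mid R) = P(G) \\
P(G)\,\bigl(1 - P(R)\bigr) &= P(G \mid R^c)\,\bigl(1 - P(R)\bigr) \\
P(G \mid R^c) &= P(G)
        && \text{dividing by } 1 - P(R) > 0
\end{align*}
```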
• Why did we calculate the 20% for raining + grouchy and not for not raining + grouchy? If the probability is the same, it wouldn't matter.
• Can anyone tell me what a frequency polygon is?
• How do I find the frequency distribution?