# Estimating a P-value from a simulation

## Video transcript

- [Instructor] So we have a question here on p-values. It says Evie read an article that said 6% of teenagers were vegetarians, but she thinks it's higher for students at her school. To test her theory, Evie took a random sample of 25 students at her school, and 20% of them were vegetarians.

So just from this first paragraph, some interesting things are being said. It's saying that the true
population proportion, if we believe this article, of teenagers that are vegetarian, we could say that is 6%. Now for her school, there is a null hypothesis that the proportion of students at her school that are vegetarian, so this is at her school, that the true proportion, the null would be the same as the proportion of teenagers as a whole. So that would be the null hypothesis.

And you can see that she's generating an alternative hypothesis, that she thinks it's higher for students at her large school. So her alternative hypothesis would be the proportion, the true population parameter for her school is greater than 6%. And so to see whether
or not you could reject the null hypothesis, you take a sample, and that's exactly what Evie did. She took a random sample of 25 students, and you calculate the sample proportion. And then you figure out what is the probability of getting a sample proportion this high or greater? And if that probability is lower than a threshold, then you will reject your null hypothesis. And that probability we call the p-value.

The p-value is equal to the probability that your sample proportion, as she's doing this for students at her school, is going to be greater than or equal to 20% if you assumed that your null hypothesis was true. So if you assumed that the true proportion at your school was 6% vegetarians, but you took a sample of 25 students where you got 20%, what is the probability of getting 20% or greater for a sample of 25?

Now there are many ways to approach it but it looks like she
is using a simulation. To see how likely a sample like this was to happen by random chance alone, Evie performed a simulation. She simulated 40 samples of n equals 25 students from a large population where 6% of the students were vegetarian. She recorded the proportion of vegetarians in each sample. Here are the sample proportions from her 40 samples.

So what she's doing here with the simulation, this is an approximation of the sampling distribution of the sample proportions if you were to assume that your null hypothesis is true.

And it says below, Evie wants to test her null hypothesis which is that the true proportion at her school is 6% versus the alternative hypothesis that the true proportion at her school is greater than 6% where p is the true proportion of students who are vegetarian at her school. And then they ask us, based on these simulated results, what is the approximate
p-value of the test?

And they say, the sample result, the sample proportion here, was 20%, we saw that right over here. Well if we assumed that this is a reasonably good approximation of our sampling distribution of our sample proportions, there's 40 data points here, and in how many of these samples do we get a sample proportion that is greater than or equal to 20%? Well you could see this is 20% right over here, 20 hundredths, and so you see we have three right over here that meet this constraint. And so that is three out of 40.

So if we think this is a reasonably good approximation, we would say that our p-value is going to be approximately three out of 40, that if the true population proportion for the school were 6%, if the null hypothesis were true, then approximately three out of every 40 times you would expect to get a sample with 20% or larger being vegetarians.

And so three-fortieths is what? Let's see, if I multiply both the numerator and denominator by two and a half, this is approximately equal to, I say two and a half 'cause to go from 40 to 100, and then two and a half times three would be 7.5. So we would say this is approximately 7.5% and this is actually a multiple choice question and if we scroll down, we do indeed see approximately 7.5% right over there.
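
The procedure walked through above can be sketched in a few lines of Python. This is only an illustrative sketch, not Evie's actual simulation: the function names `simulate_sample_proportions` and `approximate_p_value` and the seed are made up for the example, and because the 40 values from the dot plot are not reproduced in the transcript, the code draws its own random samples. Its estimate will therefore vary with the seed and will generally not equal the 3 out of 40 seen in the video.

```python
import random


def simulate_sample_proportions(p_null, n, num_samples, seed=None):
    """Draw num_samples samples of size n from a population where a
    fraction p_null are vegetarian, and return each sample proportion."""
    rng = random.Random(seed)
    proportions = []
    for _ in range(num_samples):
        # Each student is vegetarian with probability p_null (null hypothesis).
        count = sum(1 for _ in range(n) if rng.random() < p_null)
        proportions.append(count / n)
    return proportions


def approximate_p_value(proportions, observed):
    """Fraction of simulated proportions at least as large as the observed one."""
    # The small tolerance guards against floating-point rounding at the boundary.
    at_least = sum(1 for p_hat in proportions if p_hat >= observed - 1e-12)
    return at_least / len(proportions)


# Hypothetical rerun of the setup: 40 samples of n = 25 with p = 0.06.
sims = simulate_sample_proportions(p_null=0.06, n=25, num_samples=40, seed=1)
print("simulated estimate:", approximate_p_value(sims, observed=0.20))

# The count read off the dot plot in the video: 3 of 40 samples at or above 20%.
print("from the video:", 3 / 40)
```

With only 40 simulated samples the estimate is coarse, since it can only move in steps of one-fortieth. Raising `num_samples` would give a steadier approximation of the same probability.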