Question 1

What do you mean when you say "significant" ? Is it important study? I don't understand. Please give couple very simple examples about it.

Accepted Answer

They should say "statistically significant" to avoid this confusion. "Statistically significant" means that a result at least as extreme as the one observed would be unlikely to arise by chance alone: its probability falls below some threshold (by convention, this threshold is often set at 5%).
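Since the question asks for a very simple example, here is a minimal sketch using coin flips rather than the experiment from the video. The coin, the 20 flips, and the function name `upper_tail_probability` are all assumptions made up for illustration. It asks how likely a fair coin is to give at least a certain number of heads, and compares that probability to the 5% threshold.

```python
from math import comb

def upper_tail_probability(heads, flips):
    """Chance of seeing at least `heads` heads in `flips` tosses of a fair coin."""
    favorable = sum(comb(flips, k) for k in range(heads, flips + 1))
    return favorable / 2 ** flips

THRESHOLD = 0.05  # the conventional 5% cutoff mentioned above

for heads in (12, 16):
    p = upper_tail_probability(heads, 20)
    verdict = "statistically significant" if p < THRESHOLD else "not significant"
    print(f"{heads} heads out of 20: p = {p:.4f} ({verdict})")
```

Getting 12 or more heads happens about a quarter of the time with a fair coin, so it is not surprising. Getting 16 or more happens less than 1% of the time, so it would count as statistically significant evidence that the coin is not fair.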

Question 2

I have a question. I've completely understood why we need to re-randomize the datapoints between the groups and recalculate our indicator (the difference between the means), to see how often it can happen that we get our result just by chance. However, now I found myself wondering: how many simulations are needed to correctly estimate the probability of getting a specific value for the indicator just by chance? In other words, how do we know that 150 simulations are enough? Ideally we would want to recalculate the indicator for every possible arrangement of the datapoints in the two groups. But that would be 500 choose 250 in our example if I'm not mistaken, which is roughly 10^149 combinations! So how do we know that 150 simulations are enough?

Accepted Answer

You're absolutely right. Once datasets become moderately large, it's not really possible to run all of the permutations. Instead, we randomly select a large number of permutations. The probability (p-value) we calculate is then an estimate of the actual p-value we would have gotten if we ran all of the permutations. One thing that we can then do is to make a confidence interval for the probability, which will depend on the estimated probability, as well as the number of simulations we ran. Once we run enough simulations, this confidence interval will be pretty narrow, so we'll be fairly confident in our result to the degree that we need.

So how many replications do we really need? It depends on what the p-value is and how certain we need to be. Whenever I do tests of this nature, I usually start off with 1000 or 10000, depending on the complexity of the algorithm - some codes run very fast, others take a bit longer.
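To make the confidence-interval idea concrete, here is a rough sketch. It uses the Wilson score interval, which is one common choice for a proportion and is my assumption, not necessarily the method used in the course. The function name `wilson_interval` and the assumed 2% rate of extreme shuffles are hypothetical.

```python
import math

def wilson_interval(extreme_count, n_sims, z=1.96):
    """Approximate 95% Wilson score interval for a simulated p-value.

    extreme_count: number of shuffles at least as extreme as the observed result
    n_sims: total number of shuffles that were run
    """
    p_hat = extreme_count / n_sims
    denom = 1 + z ** 2 / n_sims
    center = (p_hat + z ** 2 / (2 * n_sims)) / denom
    half_width = z * math.sqrt(
        p_hat * (1 - p_hat) / n_sims + z ** 2 / (4 * n_sims ** 2)
    ) / denom
    return p_hat, center - half_width, center + half_width

# Number of ways to split 500 kids into two groups of 250: far too many to enumerate.
n_splits = math.comb(500, 250)
print("digits in 500 choose 250:", len(str(n_splits)))

# Assume (hypothetically) that about 2% of shuffles are as extreme as the observed result.
for n_sims in (150, 1000, 10000):
    extreme = round(0.02 * n_sims)
    p_hat, low, high = wilson_interval(extreme, n_sims)
    print(f"{n_sims:>6} shuffles: p = {p_hat:.4f}, interval = ({low:.4f}, {high:.4f})")
```

The estimated p-value stays about the same, but the interval around it shrinks as the number of shuffles grows. If the whole interval sits clearly below (or above) your threshold, you have run enough simulations for that decision.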

Question 3

I don't understand why, during the randomization, Sal creates new groups that have a mix of both kids who saw food videos and game videos.

If we wanted to be significant, shouldn't we have repeated the experiment 1000 times, keep the kids who watched the food commercials in one group and the others in their group, in order to be able to compare the means?

I feel like we're comparing apples with pears which doesn't make sense.

Accepted Answer

It's because we are assuming that the two treatments - watching a food commercial or a non-food commercial - have no effect on how much a person eats. Under that assumption, the groups are equivalent, it's only random chance that we got the results we did, and any random shuffling of the kids between the two groups would be an equally likely outcome.

Hence, _we_ perform that shuffling, and calculate the difference between the groups each time. In doing so, we get a whole distribution of these differences.

If our assumption is true, then the _actual_ difference that we observed (wasn't it 8 or 10 or something like that?) should be somewhere in the middle of that distribution. It doesn't have to be in the exact middle, but close enough that we don't think the observed result would be unreasonable. On the other hand, if the observed difference is out in the tails of this distribution of differences based on the shuffling, then we would think that our assumption is a very poor one, and therefore we would conclude that the type of commercial _does_ influence how much someone eats.
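The shuffling described above can be sketched in a few lines. This is only an illustration: the snack amounts are made-up random numbers, not the data from the video, and the function names `shuffled_differences` and `two_sided_p_value` are hypothetical.

```python
import random

def shuffled_differences(group_a, group_b, n_shuffles=1000, seed=0):
    """Re-randomize the pooled values many times and record mean(A) - mean(B)."""
    rng = random.Random(seed)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    diffs = []
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # pretend the group labels do not matter
        mean_a = sum(pooled[:n_a]) / n_a
        mean_b = sum(pooled[n_a:]) / (len(pooled) - n_a)
        diffs.append(mean_a - mean_b)
    return diffs

def two_sided_p_value(diffs, observed):
    """Fraction of shuffled differences at least as far from 0 as the observed one."""
    extreme = sum(1 for d in diffs if abs(d) >= abs(observed))
    return extreme / len(diffs)

# Made-up snack amounts in grams (an assumption, not the real experiment).
data_rng = random.Random(1)
food_ads = [data_rng.gauss(40, 20) for _ in range(50)]
game_ads = [data_rng.gauss(30, 20) for _ in range(50)]

observed = sum(food_ads) / len(food_ads) - sum(game_ads) / len(game_ads)
diffs = shuffled_differences(food_ads, game_ads)
print("observed difference:", round(observed, 2))
print("estimated p-value:", two_sided_p_value(diffs, observed))
```

The shuffled differences cluster around 0 because shuffling destroys any real link between the group label and the amount eaten. The p-value then tells you how far out in the tails the observed difference sits.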

Question 4

At 5:42, Sal looks at the probability that there would be a 10 gram difference.  At 6:07, he says that there is a 2/150 probability that the results are due to chance, indicating that he is adding the data point from 10 and the point from -10.  This makes sense, as they both are indicative of a 10 point difference (regardless of whether it is +10 or -10).  But in one of the problems (the autism/diet one), one of the hints only includes the data points from the positive side.

So, if the result of THIS experiment was, in fact, an 8 gram difference, would it have been significant?  Would it be a 4/150 (approx 2.7%) chance -- adding from the negative side -- or a 9/150 (6%) -- adding from both sides -- chance that the results are insignificant?

Accepted Answer

Since it's not entirely clear which way they did the subtraction, my recommendation would be to go with the 2-sided test: meaning we add from both directions, so we'd get 9, and hence the probability of an 8g difference assuming the two groups have no difference is 6%.

If the question had given us an indication of direction, we could have used that and gotten 4 or 5 (2.7% or 3.3%) instead. That's certainly legitimate, but the television and snacking example doesn't give us that.
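The arithmetic for the one-sided and two-sided counts can be checked with a short sketch. The list of 150 differences below is a stand-in built to match the counts quoted in this thread (5 at +8 or above, 4 at -8 or below); it is not the actual simulation output, and `tail_proportions` is a hypothetical helper name.

```python
def tail_proportions(diffs, observed):
    """Return (upper tail, lower tail, both tails) as fractions of all shuffles."""
    cutoff = abs(observed)
    upper = sum(1 for d in diffs if d >= cutoff)
    lower = sum(1 for d in diffs if d <= -cutoff)
    n = len(diffs)
    return upper / n, lower / n, (upper + lower) / n

# Stand-in for 150 re-randomized differences (assumed, chosen to match the thread's counts).
diffs = [8] * 5 + [-8] * 4 + [0] * 141

upper, lower, both = tail_proportions(diffs, observed=8)
print(f"one-sided (positive): {upper:.1%}")
print(f"one-sided (negative): {lower:.1%}")
print(f"two-sided:            {both:.1%}")
```

With a 5% threshold, either one-sided result would count as significant while the two-sided result (6%) would not, which is why the direction has to be decided before looking at the data.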

Question 5

We should care about these two points, not just one of them, shouldn't we ?

Accepted Answer

I agree. I think we should care about those 2 points. I did not understand why Sal just took care of just one of them.

Question 6

So, the probability of the mean difference being 10 or more is less than 1%, and that makes it statistically significant?

Accepted Answer

Your question relates to a more advanced unit of the AP Statistics course called "Significance Tests", which covers how to test the significance of a statistic. You can try visiting this link to get a better understanding of significance: https://www.khanacademy.org/math/ap-statistics/xfb5d8e68:inference-categorical-proportions#idea-significance-tests

Hope this helps!

Question 7

Let's assume for a moment that watching food commercials makes the kids eat more. Now if we take a treatment group and a control group and perform the experiment a large number of times, shouldn't the mean of the treatment group be higher than that of the control group in most cases?
In my opinion, it should be so if we distribute the kids randomly between the treatment and control groups every time we perform the experiment.

As an analogy, we know a die is biased if it shows the same number most of the time when we throw it. Similarly, if there is a difference between the means of the treatment and control groups in most of the trials, we should say that it is due to the type of commercials.

Accepted Answer

Dr C, I've read all your answers in this topic explaining the intricacies of re-randomization, but it seems to me that I have even more of a brain freeze after those answers... to me the situation is rather illogical... imagine, for example, somebody takes two groups - 250 zebras and 250 lions - for a study, and gives both of the groups grass (or whatever zebras eat) and meat in the same proportion, and after observation it appears that the zebras have eaten only grass and the lions only meat... then some statistician says "What the hell! Let's re-randomize this..." and makes 150 groups of zebras and lions just to show that there is no difference between their consumption of grass and meat... taking that to an extreme, the statistician posits that there is no difference between the eating habits of zebras and lions... of course, this example is a huge amplification, but it seems to me that it's from the same category... so, I still don't understand the aim of this video lesson, though I had almost no problems with any of them on this site... and, of course, I can - and should be - wrong somewhere, but I just don't know where...

Course: AP®︎/College Statistics > Unit 6

Statistical significance of experiment