Question 1

Sal says here "hopefully we're convinced now why we divide by n-1," but the previous video left off with "next time I'll show you further why we divide by n-1." Is there a video in between that I should be watching, or some other information? I can't help feeling quite confused, and this is not the first time in this course I've felt Sal mentioned something that wasn't explained previously.

Accepted Answer

Here is a link to the video I think Sal was referencing:
https://www.khanacademy.org/math/ap-statistics/summarizing-quantitative-data-ap/more-standard-deviation/v/review-and-intuition-why-we-divide-by-n-1-for-the-unbiased-sample-variance

Question 2

Are there any other ways to obtain an unbiased standard deviation from our sample population, instead of just accepting the fact that the sample variance gives you a biased standard deviation?

Accepted Answer

The short answer is "no"--there is no general unbiased estimator of the population standard deviation that works for every distribution (even though the sample variance _is_ an unbiased estimator of the population variance). However, for certain distributions, such as the normal distribution, there are _correction factors_ that, when applied to the sample standard deviation, give you an unbiased estimator. Nevertheless, all of this is definitely beyond the scope of the video and, frankly, not that important in the grand scheme of things (i.e. unless you're a technical mathematician, don't worry about it). But it was a good question!
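If you'd like to see the correction-factor idea in action, here is a minimal simulation sketch, assuming normally distributed data. The factor for that case is usually written c4(n); the function names `c4` and `simulate`, the sample size of 5, sigma = 2, and the trial count are my own choices for illustration, not anything from the video.

```python
import math
import random

def c4(n):
    """Correction factor for normal data: E[s] = c4(n) * sigma."""
    return math.sqrt(2.0 / (n - 1)) * math.gamma(n / 2) / math.gamma((n - 1) / 2)

def simulate(n=5, sigma=2.0, trials=20_000, seed=0):
    """Average the sample variance and sample std dev over many normal samples."""
    rng = random.Random(seed)
    total_var = 0.0
    total_sd = 0.0
    for _ in range(trials):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        x_bar = sum(sample) / n
        s2 = sum((x - x_bar) ** 2 for x in sample) / (n - 1)  # divide by n - 1
        total_var += s2
        total_sd += math.sqrt(s2)
    return total_var / trials, total_sd / trials

mean_var, mean_sd = simulate()
print("average sample variance:", mean_var)        # close to sigma**2 = 4
print("average sample std dev: ", mean_sd)         # noticeably below sigma = 2
print("std dev divided by c4:  ", mean_sd / c4(5))  # close to 2 again
```

The average variance lands near the true value, the average standard deviation comes out too low, and dividing by c4 repairs it, but only because the data were assumed to be normal.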

Question 3

At 8:10, what does 'nonlinear' mean?

Accepted Answer

Take a function y = f(x). When you give it an input value x, you get an output value y through some operation. If the function is linear, then whenever you change x by Δx, the resulting change in y (Δy) has a fixed ratio to Δx.
Graphically, if you plot the values of y = f(x) and connect them, you get a straight line.

Nonlinear functions are those for which Δy divided by Δx is not a fixed value when you change x by Δx. Consequently, if you plot the values of such a function and connect them, you won't get a straight line. You may get a curve.
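To make the "fixed ratio" idea concrete, here is a small sketch that computes Δy / Δx at a few points. The two functions (3x + 1 and the square root) and the helper name `slope` are just examples I picked.

```python
def slope(f, x, dx):
    """Return Δy / Δx for function f when x changes by dx."""
    return (f(x + dx) - f(x)) / dx

def linear(x):
    return 3 * x + 1   # hypothetical linear example

def nonlinear(x):
    return x ** 0.5    # square root, the nonlinear operation from the video

xs = [1.0, 4.0, 9.0]
linear_slopes = [slope(linear, x, 1.0) for x in xs]
nonlinear_slopes = [slope(nonlinear, x, 1.0) for x in xs]
print(linear_slopes)     # the same ratio at every x
print(nonlinear_slopes)  # a different ratio at every x
```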

Question 4

I didn't get it. How does taking the square root of the unbiased sample variance lead to a biased standard deviation? Kindly explain.

Accepted Answer

I'm not an expert in statistics, but here's my crack at it.

An unbiased process that outputs some value is one whose expected value matches the actual value being estimated. Basically, as you perform the unbiased process on more and more samples, the average of the outputs will approach the actual value.

But if you have a set of values whose average is some number and then perform a non-linear operation on them (like taking the square root), their new average is NOT going to match the old average with the same non-linear operation performed on it.

For example, take the following numbers:
2, 2, 2, 2, 12
Their average is 4.
Here are their square roots:
1.41, 1.41, 1.41, 1.41, 3.46
The average value of those square roots is 1.82.
But the square root of the old average value is 2.
They don't match! We've introduced some bias by performing a non-linear operation.
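If you want to check the arithmetic yourself, a few lines of Python reproduce it (the variable names are mine):

```python
import math

values = [2, 2, 2, 2, 12]
mean_of_values = sum(values) / len(values)   # 4.0

roots = [math.sqrt(v) for v in values]
mean_of_roots = sum(roots) / len(roots)      # average of the square roots
root_of_mean = math.sqrt(mean_of_values)     # square root of the average

print(round(mean_of_roots, 2))  # 1.82
print(root_of_mean)             # 2.0
```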

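I imagine it's hard to remove this bias in general because its magnitude depends heavily on the population data. The direction, at least, is predictable: the square root curve bends downward (it is concave), so the average of the square roots can never exceed the square root of the average. That is why the sample standard deviation tends to underestimate the population standard deviation.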

Question 5

My boy started to glitch @2:50

Accepted Answer

Sal.exe has stopped working

Question 6

@ 4:02, why do we divide by 7 instead of 8?  I know he says it is the unbiased sample variance, but what exactly does that mean?

Accepted Answer

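That means he is using an estimator that, averaged over many random samples, equals the variance of the population. Dividing by n (8 here) tends to give a value that is too small, because the deviations are measured from the sample mean rather than the true population mean; dividing by n-1 (7 here) corrects for that. This holds for any population with a finite variance, not only for a normal distribution.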
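Here is a rough simulation sketch of what "unbiased" means in practice. I'm assuming samples of size 8 drawn from an exponential distribution with true variance 1 (chosen on purpose because it is not normal); the function name, seed, and trial count are arbitrary.

```python
import random

def average_variances(n=8, trials=50_000, seed=1):
    """Average the divide-by-n and divide-by-(n-1) variances over many samples."""
    rng = random.Random(seed)
    total_div_n = 0.0
    total_div_n_minus_1 = 0.0
    for _ in range(trials):
        sample = [rng.expovariate(1.0) for _ in range(n)]  # true variance is 1
        x_bar = sum(sample) / n
        ss = sum((x - x_bar) ** 2 for x in sample)  # sum of squared deviations
        total_div_n += ss / n
        total_div_n_minus_1 += ss / (n - 1)
    return total_div_n / trials, total_div_n_minus_1 / trials

biased, unbiased = average_variances()
print("divide by 8:", biased)    # tends to fall short of 1 (about 7/8)
print("divide by 7:", unbiased)  # close to 1
```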

Question 7

How would I know to divide by n-1 or n? I know this question has been asked before, but I don't really see the reason. Could someone please give me a simple answer?

Accepted Answer

The reason is explained on Wikipedia: https://en.wikipedia.org/wiki/Variance#Sample_variance.
The n-1 correction is called Bessel's correction. Even though I couldn't follow the proof, I did understand the rule: divide by n-1 instead of n when you have a sample and are using it to estimate the variance of the whole population. Divide by n (usually written N) when your data is the entire population.
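In symbols, using the standard notation (μ and N for the population mean and size, x̄ and n for the sample mean and size), the two formulas look like this:

```latex
\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2
\qquad\text{versus}\qquad
s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2
```

The left one is the population variance; the right one is the sample variance with Bessel's correction.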

Question 8

It's confusing where all of these terms are coming from: "Population mean," "sample mean," "capital 'N'," "lowercase 'n'." Can someone help me with this? What do these mean/represent? None of these concepts were showcased earlier in Unit 1 or in the first video of this unit, "Sample variance."

Accepted Answer

I like to compare statistics verbiage with a biology class. I usually think of "population" as an entire species and "sample" as a test strip you're putting under the microscope for better attention.
 
There are two different means: one of the population and one of the sample. The population mean is the average over the entire population, whereas the sample mean is the average of the sample you selected from the population. For instance, if you compared the average weight every person in the world could pull versus the average for the 100 strongest people, you would get two very different means. In this case, the sample is skewed because it isn't a randomized selection, but hopefully this gives you a better idea of the difference between a "population mean" and a "sample mean."

In statistics, consider the population as "big" and the sample as "small." In the same way, you can tell "N" and "n" apart by their size: capital N is the number of items in the population, and lowercase n is the number of items in the sample.
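If it helps, here is a tiny sketch of the four terms side by side. The population is made-up data (10,000 hypothetical weights), and the sample size of 25 is arbitrary.

```python
import random

rng = random.Random(42)

# Hypothetical population: 10,000 made-up weights between 40 and 120
population = [rng.randint(40, 120) for _ in range(10_000)]
N = len(population)              # capital N: population size
mu = sum(population) / N         # population mean

sample = rng.sample(population, 25)  # random sample, drawn without replacement
n = len(sample)                  # lowercase n: sample size
x_bar = sum(sample) / n          # sample mean

print(N, mu)
print(n, x_bar)
```

The sample mean will usually be close to the population mean but not equal to it, and it changes each time you draw a different sample.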

Question 9

Instead of squaring the difference from the mean and taking the square root of the sum, isn't it more reasonable to take the mean of the absolute value of the difference from the mean? This way we won't require squares and square roots.

Accepted Answer

That is an alternative method, known as the _mean absolute deviation_. To understand why the variance is more popular, I'd suggest taking a read through an old answer that I wrote up here:

https://www.khanacademy.org/math/probability/descriptive-statistics/variance_std_deviation/v/variance-of-a-population?qa_expand_key=ag5zfmtoYW4tYWNhZGVteXJVCxIIVXNlckRhdGEiN3VzZXJfaWRfa2V5X2h0dHA6Ly9mYWNlYm9va2lkLmtoYW5hY2FkZW15Lm9yZy84Mjg5OTkzOTIMCxIIRmVlZGJhY2sYgfoBDA
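For a quick feel of how the two measures differ, here is a short sketch that computes both on one small data set. The data and the function names are my own, and I'm treating the data as a whole population (dividing by n).

```python
import math

def mean_absolute_deviation(data):
    """Average distance from the mean, using absolute values."""
    m = sum(data) / len(data)
    return sum(abs(x - m) for x in data) / len(data)

def population_std(data):
    """Square root of the average squared distance from the mean."""
    m = sum(data) / len(data)
    return math.sqrt(sum((x - m) ** 2 for x in data) / len(data))

data = [2, 2, 2, 2, 12]
print(mean_absolute_deviation(data))  # 3.2
print(population_std(data))           # 4.0
```

Squaring gives the far-away value 12 more weight, which is why the standard deviation comes out larger than the mean absolute deviation here.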

Sample standard deviation and bias
