# Concept check: Standard deviation

## Introduction

Unlike most questions on Khan Academy, some of these questions aren't graded by a computer. You'll learn the most if you try answering each question yourself before clicking "explain".

## The formula (for reference)

The formula for standard deviation (SD) is
$\Large\text{SD} = \sqrt{\dfrac{\sum\limits_{}^{}{{\lvert x-\bar{x}\rvert^2}}}{n}}$
where $\sum$ means "sum of", $x$ is a value in the data set, $\bar{x}$ is the mean of the data set, and $n$ is the number of values in the data set.
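For readers who like to check formulas by computer, here is a minimal Python sketch of the formula above. The function name `population_sd` and the sample data are made up for illustration; they are not part of the original exercise.

```python
import math

def population_sd(data):
    """Standard deviation as defined above: divide by n (not n - 1)."""
    n = len(data)                                   # number of values
    mean = sum(data) / n                            # x-bar
    squared_distances = [abs(x - mean) ** 2 for x in data]
    return math.sqrt(sum(squared_distances) / n)

# Illustrative data set (an assumption, not from the article).
print(population_sd([2, 4, 4, 4, 5, 5, 7, 9]))  # 2.0
```

Here the mean is $5$, the squared distances add up to $32$, and $\sqrt{32/8} = 2$.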

## Part 1

Consider the simple data set $\{1, 4, \redD{7}, 2, 6\}$.
How does the standard deviation change when $\redD{7}$ is replaced with $\greenD{12}$?

How can we see this in the formula for standard deviation?

### How does the standard deviation change?

The standard deviation increases because the data becomes more spread out: $\greenD{12}$ lies farther from the other values than $\redD{7}$ did.

### How can we see this in the formula?

We can see this in the formula
$\text{SD} = \sqrt{\dfrac{\goldD{\sum\limits_{}^{}{{\lvert x-\bar{x}\rvert^2}}}}{n}}$
because $\goldD{\sum\limits_{}^{}{{\lvert x-\bar{x}\rvert^2}}}$ is the sum of the squares of the distances from each data point to the mean. As the data gets more spread out, the value of $\goldD{\sum\limits_{}^{}{{\lvert x-\bar{x}\rvert^2}}}$ increases.
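To make this concrete, here is the gold sum worked out for both data sets. This is an added worked example; note that the mean also changes, from $4$ to $5$, when $\redD{7}$ is replaced with $\greenD{12}$.

$\begin{aligned} \{1, 4, \redD{7}, 2, 6\}: \quad \goldD{\sum \lvert x-\bar{x}\rvert^2} &= 3^2 + 0^2 + 3^2 + 2^2 + 2^2 = 26 \\ \{1, 4, \greenD{12}, 2, 6\}: \quad \goldD{\sum \lvert x-\bar{x}\rvert^2} &= 4^2 + 1^2 + 7^2 + 3^2 + 1^2 = 76 \end{aligned}$

The sum grows from $26$ to $76$ while $n$ stays at $5$, so the standard deviation grows too.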

### I'm curious, what are the actual standard deviations of the data sets?

The standard deviation of $\{1, 2, 4, 6, \redD{7}\}$ is approximately $2.28$.
The standard deviation of $\{1, 2, 4, 6, \greenD{12}\}$ is approximately $3.90$.
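These two values can be double-checked with Python's standard library. The sketch below assumes `statistics.pstdev` is the right tool because it divides by $n$, as the formula in this article does; the variable names are illustrative.

```python
import statistics

original = [1, 2, 4, 6, 7]    # data set containing 7
replaced = [1, 2, 4, 6, 12]   # same data set with 7 replaced by 12

# pstdev is the population standard deviation (divides by n).
sd_original = statistics.pstdev(original)
sd_replaced = statistics.pstdev(replaced)

print(f"{sd_original:.2f}")  # 2.28
print(f"{sd_replaced:.2f}")  # 3.90
```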

## Part 2

Is it possible to create a data set with 4 data points that has a standard deviation of 0?

If it is possible, do it! Can you create two different data sets? How about three?

### Yes, it's possible!

In fact, there are an infinite number of possible data sets.
Here's one:
$5, 5, 5, 5$
Here's another:
$8, 8, 8, 8$
Any data set where all of the data points are the same has a standard deviation of 0 because the distance from each data point to the mean is 0.
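A quick way to experiment with this idea is sketched below. The third data set and the "almost constant" data set are extra illustrations that are not in the original exercise, and `statistics.pstdev` is used on the assumption that dividing by $n$ matches the formula above.

```python
import statistics

# Three illustrative constant data sets with 4 points each.
constant_sets = [[5] * 4, [8] * 4, [-3.5] * 4]
constant_sds = [statistics.pstdev(data) for data in constant_sets]
print(constant_sds)  # every entry equals 0

# Changing even one point makes the standard deviation positive.
almost_constant_sd = statistics.pstdev([5, 5, 5, 6])
print(almost_constant_sd > 0)  # True
```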

### Show me the calculation for $\{ 5,5,5,5 \}$.

#### Step 1: Find the mean

$\bar{x} = \dfrac{5 + 5 + 5 + 5}{4} = \dfrac{20}{4} = \blueD5$

#### Step 2: Find the square of the distances from each of the data points to the mean

| $x$ | $\lvert x - \bar{x} \rvert^2$ |
| --- | --- |
| $5$ | $\lvert 5 - \blueD{5} \rvert^2 = 0^2 = 0$ |
| $5$ | $\lvert 5 - \blueD{5} \rvert^2 = 0^2 = 0$ |
| $5$ | $\lvert 5 - \blueD{5} \rvert^2 = 0^2 = 0$ |
| $5$ | $\lvert 5 - \blueD{5} \rvert^2 = 0^2 = 0$ |

#### Step 3: Apply the formula

$\begin{aligned} \text{SD} &= \sqrt{\dfrac{\sum\limits_{}^{}{{\lvert x-\bar{x}\rvert^2}}}{n}} \\ &= \sqrt{\dfrac{0 + 0 + 0 + 0}{4}} \\ &= \sqrt{\dfrac{0}{4}} \\ &= \sqrt{0} \\ &= 0 \end{aligned}$

## Part 3

Can standard deviation be negative?

Why or why not?

### No, standard deviation cannot be negative!

To see why, think about the numerator and denominator inside the radical:
$\Large\text{SD} = \sqrt{\dfrac{\blueD{\sum\limits_{}^{}{{\lvert x-\bar{x}\rvert^2}}}}{\maroonD{n}}}$
Notice how $\maroonD{n}$ is always positive. It's the number of data points, and we can't have a negative number of data points.
Also notice that $\blueD{\sum\lvert x - \bar{x} \rvert^2}$ involves a quantity getting squared. Whenever we square something, we get a non-negative number.
Since the numerator is non-negative and the denominator is positive, the fraction inside the radical is non-negative, and the square root of a non-negative number is never negative. The standard deviation can be 0, as in Part 2, but it can't be less than 0.
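The same argument can be written as a chain of inequalities, sketched here with the symbols from the formula:

$\lvert x - \bar{x} \rvert^2 \geq 0 \;\Rightarrow\; \blueD{\sum \lvert x - \bar{x} \rvert^2} \geq 0 \;\Rightarrow\; \dfrac{\blueD{\sum \lvert x - \bar{x} \rvert^2}}{\maroonD{n}} \geq 0 \;\Rightarrow\; \text{SD} \geq 0$

Equality holds exactly when every data point equals the mean, as in Part 2.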

## Part 4

Standard deviation is a measure of spread of a data distribution.
What do you think deviation means?
In everyday language, deviation is how different something is from what might be considered normal.
In statistics, when discussing measures of spread, deviation is the amount by which a single measurement differs from the mean.

## Part 5

Here are the formulas for standard deviation (SD) and mean absolute deviation (MAD), both of which are measures of spread:
$\text{SD} = \sqrt{\dfrac{\sum\limits_{}^{}{{\lvert x-\bar{x}\rvert^2}}}{n}}$
$\text{MAD} = {\dfrac{\sum\limits_{}^{}{{\lvert x-\bar{x}\rvert}}}{n}}$
What are the similarities between the formulas? What are the differences?

### What are the similarities?

The formulas are very similar! They are both based on the distance from each data point to the mean, $\lvert x - \bar{x} \rvert$, and they both divide by the number of data points, $n$.

### What are the differences?

The difference between the two formulas is that when calculating standard deviation, we square the distance from each data point to the mean, and we take the square root as the last step of the formula.
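To see the two formulas side by side, here is a minimal Python sketch that applies both to the data set from Part 1. The function names are illustrative choices, not standard library functions.

```python
import math

def mean_absolute_deviation(data):
    """MAD: average distance from each data point to the mean."""
    n = len(data)
    mean = sum(data) / n
    return sum(abs(x - mean) for x in data) / n

def standard_deviation(data):
    """SD: square each distance, average, then take the square root."""
    n = len(data)
    mean = sum(data) / n
    return math.sqrt(sum(abs(x - mean) ** 2 for x in data) / n)

data = [1, 4, 7, 2, 6]  # data set from Part 1
print(f"MAD = {mean_absolute_deviation(data):.2f}")  # MAD = 2.00
print(f"SD  = {standard_deviation(data):.2f}")       # SD  = 2.28
```

The two functions differ only in the squaring step and the final square root, which mirrors the difference between the formulas.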

### Which one is better?

Standard deviation is more complicated, but it has some nice properties that make it statisticians' preferred measure of spread.

## Part 6

Here's the formula that we've been using to calculate standard deviation:
$\sqrt{\dfrac{\sum\limits_{}^{}{{\lvert x-\bar{x}\rvert^2}}}{n}}$
Here's the formula that statisticians actually use:
$\sqrt{\dfrac{\sum\limits_{}^{}{{( x-\bar{x})^2}}}{n}}$
Are the two formulas equivalent?

### What's the difference between the formulas?

In the formula that we've been using, we take the absolute value of $x - \bar{x}$:
$\sqrt{\dfrac{\sum\limits_{}^{}{{\tealD{\lvert x-\bar{x}\rvert}^2}}}{n}}$
In the formula that statisticians use, they put parentheses around $x - \bar{x}$:
$\sqrt{\dfrac{\sum\limits_{}^{}{{\purpleC{( x-\bar{x})^2}}}}{n}}$

### Are the formulas equivalent?

Yes, both formulas are equivalent!
Squaring gives a non-negative result whether or not we take the absolute value first, so statisticians don't bother with absolute value signs and just use parentheses instead.
For example, let's evaluate $\tealD{\lvert x - \bar{x} \rvert^2}$ and $\purpleC{(x - \bar{x})^2}$ for $x = 2$ and $\bar{x} = 5$:
$\tealD{\lvert x - \bar{x} \rvert^2} = \lvert 2 - 5 \rvert^2 = \lvert-3\rvert^2 = 3^2 = \greenD9$
$\purpleC{(x - \bar{x})^2} = (2 - 5)^2= (-3)^2 = \greenD9$
They're both positive! They're both $\greenD{9}$! They're equivalent!
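The same comparison can be run on a whole data set. The sketch below uses the data set from Part 1, with illustrative function names, and checks that both versions of the formula give the same standard deviation.

```python
import math

def sd_with_absolute_value(data):
    """Formula used in this article: square the absolute distance."""
    n = len(data)
    mean = sum(data) / n
    return math.sqrt(sum(abs(x - mean) ** 2 for x in data) / n)

def sd_with_parentheses(data):
    """Formula statisticians use: square the signed difference."""
    n = len(data)
    mean = sum(data) / n
    return math.sqrt(sum((x - mean) ** 2 for x in data) / n)

# Single-term check from the example above: x = 2, mean = 5.
print(abs(2 - 5) ** 2, (2 - 5) ** 2)  # 9 9

data = [1, 4, 7, 2, 6]  # data set from Part 1
same = math.isclose(sd_with_absolute_value(data), sd_with_parentheses(data))
print(same)  # True
```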