Z-score introduction

A z-score is an example of a standardized score. A z-score measures how many standard deviations a data point is from the mean in a distribution.
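
In symbols, the definition above can be sketched as follows, assuming the usual notation of x for the data point, μ for the population mean, and σ for the population standard deviation (these symbols are an assumption here, chosen to match the population setting used in the video):

```latex
z = \frac{x - \mu}{\sigma}
```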

  • Ryan Giglio
    When I paused to calculate the standard deviation myself, I came up with 1.83, not 1.69. It looks like Sal got 1.69 by taking the sqrt of the biased sample variance instead of the unbiased sample variance, which we were taught to do in the previous videos. Why is this?
    (16 votes)
  • Nessa Avila
    How did you get the number 1.69?
    (10 votes)
  • farheen.tyiiba
    I calculated the standard deviation for this video on my own. However, in your answer, you calculated it as a population standard deviation instead of a sample standard deviation. To calculate the sample standard deviation, we have to divide by (n-1); however, here I see that instead of dividing by 6, it was divided by n, which was 7.
    I am confused right now.
    (5 votes)
  • Mandy Foshee
    When calculating the z-score for 6 with my TI-84 calculator, I got roughly 1.7751, which, when rounded, would be 1.78. Am I wrong about rounding it up?
    (7 votes)
  • Janice
    Can z-scores be negative if the value is less than the mean? Or would it be the absolute value of the z-score since it's measuring just how many standard deviations the value is away from the mean?
    (4 votes)
  • nancy
    Why does Sal say the z-score of -0.59 is "a little bit more than half a standard deviation" below the mean when a standard deviation is 1.69? Wouldn't half be approx. 0.7?
    (3 votes)
    • Bekah
      A 1 in a z-score means 1 standard deviation, not 1 unit. So if the standard deviation of the data set is 1.69, a z-score of 1 would mean that the data point is 1.69 units above the mean. In Sal's example, the z-score of the data point is -0.59, meaning the point is approximately 0.59 standard deviations, or 1 unit, below the mean, which we can easily see since the data point is 2 and the mean is 3.
      (5 votes)
  • Mathstudent
    Why does he do the data point minus the mean? Are we not supposed to do the mean minus the data point?
    (3 votes)
    • alexihussey1
      TL;DR... It gives us the correct sign.

      Long answer:

      We want the size of the difference between the numbers but also the direction the point is from the mean. When finding the standard deviation the direction doesn't matter, since each difference between a point and the mean gets squared, and squaring removes the sign. If we didn't square the differences, the positive and negative ones would cancel each other out when we find the sum of the differences before dividing by (n) or (n-1) and then finding the square root.

      Here we have a mean of (3), and a data point with a value of (2). When we subtract (3) from (2) to find the difference, that gives us a negative answer, (-1), which we then divide by the standard deviation to see how far the data point is from the mean, in terms of standard deviations (the definition of a z-score). If we were to subtract the data point from the mean (which would be (2) from (3), or (3) - (2)), we would get the same absolute difference between the two values, but we might come away thinking our z-score is positive, since we'd get a positive difference of (1) before dividing by the standard deviation, which is always positive.

      I will say that, unless there's a reason that becomes apparent later, it would probably be better practice to do data point minus the mean when finding the standard deviation too, just to be consistent. You'd still get the same result for each difference, since each one is squared and squaring removes the sign.

      But here are some other examples with various negative and positive signs to show that data point minus mean always gives the correct sign, but that the reverse (mean minus data point) doesn't (with decimal places, just to show that's not a factor either, in case you were curious, as I was):

      1. suppose:

      mean = (-5.9), x = (-2.2)

      Say we try to find the difference between the two by doing mean minus point:

      (-5.9) - (-2.2) = (-3.7)

      This would be the correct absolute difference of (3.7), but the negative symbol also implies that our data point, (-2.2), was below our mean, (-5.9), which of course is not true. If we were taking the absolute value of the difference, this wouldn't matter, but here we want the difference and the direction. If we subtract the mean from the point however:

      (-2.2) - (-5.9) = (3.7)

      We get the correct difference and the correct positive sign.

      Let's do the same thing with different values, one positive and one negative:

      2. suppose:

      mean = (-4.25) , x = (10.75)

      and first we try mean minus point:

      (-4.25) - (10.75) = (-15)

      then point minus mean:

      (10.75) - (-4.25) = (15)

      Here's another, with the positive and negative signs on the opposite side:

      3. suppose:

      mean = (1.5) , x = (-9.5)

      (1.5) - (-9.5) = (11)
      (-9.5) - (1.5) = (-11)

      Lastly, if both values are positive:

      4. suppose:

      mean = (12.46) , x = (7.27)

      (12.46) - (7.27) = (5.19)
      (7.27) - (12.46) = (-5.19)
      (4 votes)
  • JV
    Sal wrote next to the 'σ' (sigma) some sort of flipped 2. What does that stand for?
    (3 votes)
  • Dylan
    In the Population Standard Deviation formula, in the denominator, is it N or N-1?
    (3 votes)
  • Lakshmi Bhavana
    How did he get a standard deviation of 1.69? When I did it, I got about 1.8257419... So, what did I do wrong?
    (3 votes)
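
Several of the questions above ask why the video reports 1.69 when their own calculation gave about 1.83. A minimal sketch in Python, assuming the standard-library `statistics` module and illustrative variable names, suggests the gap comes from dividing by N (population) rather than by n - 1 (sample). The video treats the seven turtles as the entire population.

```python
import statistics

# Lengths (cm) of the seven winged turtles from the video
lengths = [2, 2, 3, 2, 5, 1, 6]

# Population standard deviation: sum of squared deviations divided by N = 7
population_sd = statistics.pstdev(lengths)

# Sample standard deviation: sum of squared deviations divided by n - 1 = 6
sample_sd = statistics.stdev(lengths)

print(round(population_sd, 2))  # 1.69
print(round(sample_sd, 2))      # 1.83
```

Both numbers come from the same sum of squared deviations; only the denominator differs.
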
Video transcript

- [Instructor] One of the most commonly used tools in all of statistics is the notion of a Z-score. And one way to think about a Z-score is it's just the number of standard deviations away from the mean that a certain data point is. So let me write that down. Number of standard deviations. I'll write it like this. Number of standard deviations from our population mean for a particular data point.
Now let's make that a little bit concrete. Let's say that you're some type of marine biologist and you've discovered a new species of winged turtles, and there's a total of seven winged turtles; the entire population of these winged turtles is seven. And so you go and you're actually able to measure all the winged turtles, and you care about their length, and you also care about how those lengths are distributed. Lengths of winged turtles. All right, and this is all in centimeters. These are very small turtles, and these are all adults. So there's a two centimeter one, there's another two centimeter one. There's a three centimeter one. There's another two centimeter one. There's a five centimeter one, a one centimeter one, and a six centimeter one.
So we have seven data points, and I encourage you, at any point if you want, to pause this video and see if you can calculate: what is the population mean here? We're assuming that this is the population of all the winged turtles. Well, the mean in this situation is going to be equal to, you could add up all these numbers and divide by seven, and you would then get three. And then using these data points and the mean you can calculate the population standard deviation. And once again, as review, I always encourage you to pause this video and see if you can do it on your own. But I've calculated that ahead of time. The population standard deviation in this situation is approximately, I'll round to the hundredths place, 1.69.
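
The mean and population standard deviation quoted here can be checked step by step. The following is a rough Python sketch of that arithmetic, with variable names chosen for illustration; it assumes, as the video does, that the seven lengths are the whole population, so the sum of squared deviations is divided by 7.

```python
import math

# Lengths (cm) of the seven winged turtles, in the order given
lengths = [2, 2, 3, 2, 5, 1, 6]
n = len(lengths)

# Population mean: 21 / 7
mean = sum(lengths) / n

# Squared distance of each length from the mean
squared_deviations = [(x - mean) ** 2 for x in lengths]

# Population variance divides by n (not n - 1), then take the square root
population_variance = sum(squared_deviations) / n
population_sd = math.sqrt(population_variance)

print(mean)                     # 3.0
print(sum(squared_deviations))  # 20.0
print(round(population_sd, 2))  # 1.69
```
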
So with this information you should be able to calculate the Z-score for each of these data points. Pause this video and see if you can do that. So let me make a new column here. So here I'm gonna put our Z-score. And if you just look at the definition, what you're going to do for each of these data points, let's say each data point is x, you're going to subtract from that the mean and then you're going to divide that by the standard deviation. The numerator right over here's gonna tell you how far you are above or below the mean, but you wanna know how many standard deviations you are from the mean, so then you'll divide by the population standard deviation.
So for example, this first data point right over here, if I wanna calculate the Z-score I will take two. From that I will subtract three and then I will divide by 1.69. And if you've got a calculator out, this is going to be -1 divided by 1.69, which is approximately -0.59. And the Z-score for this data point is going to be the same. That is also going to be -0.59. One way to interpret this is, this is a little bit more than half a standard deviation below the mean, and we could do a similar calculation for data points that are above the mean. Let's say this data point right over here. What is its Z-score? Pause this video and see if you can figure that out. Well, it's going to be six minus our mean, so minus three. All of that over the standard deviation. All of that over 1.69 and this, if you have a calculator, and I calculated it ahead of time, this is going to be approximately 1.77. So more than one, but less than two standard deviations above the mean. I encourage you to pause this video and now try to figure out the Z-scores for these other data points.
Now, an obvious question that some of you might be asking is why do we care how many standard deviations above or below the mean a data point is?
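
For the remaining data points, the same calculation can be sketched in Python. This is only an illustration: it assumes the standard-library `statistics` module, and the names `mu`, `sigma`, and `z_scores` are made up for the example. It uses the unrounded population standard deviation rather than 1.69.

```python
import statistics

# Lengths (cm) of the seven winged turtles, in the order given
lengths = [2, 2, 3, 2, 5, 1, 6]

mu = statistics.mean(lengths)       # population mean
sigma = statistics.pstdev(lengths)  # population standard deviation (unrounded)

# z-score: data point minus the mean, divided by the standard deviation
z_scores = [(x - mu) / sigma for x in lengths]

for x, z in zip(lengths, z_scores):
    print(f"{x} cm -> z = {z:.2f}")
```

With the unrounded standard deviation, the six centimeter turtle comes out at about 1.77, matching the value in the video. Dividing by the rounded 1.69 instead gives about 1.7751, which rounds to 1.78, so a small difference in the last digit is just a rounding effect.
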
In your future statistical life, Z-scores are gonna be a really useful way to think about how usual or how unusual a certain data point is. And that's going to be really valuable once we start making inferences based on our data. So I will leave you there. Just keep in mind it's a very useful idea, but at the heart of it a fairly simple one. If you know the mean and you know the standard deviation, take your data point, subtract the mean from the data point, and then divide by your standard deviation. That gives you your Z-score.