Determining sample size based on confidence and margin of error

AP.STATS: UNC‑4 (EU), UNC‑4.C (LO), UNC‑4.C.1 (EK), UNC‑4.C.2 (EK), UNC‑4.C.3 (EK), UNC‑4.C.4 (EK)
Determining sample size based on confidence level and margin of error.

Want to join the conversation?

  • blobby green style avatar for user Pat
    Can someone help me walk through how Sal determined that 0.5 will maximize p(1-p)?
    (1 vote)
    • mr pink green style avatar for user Ramen23
      So basically, think of it this way. p(1 - p) = p - p^2. If you graph this as a function of p, you will have roots at 0 and 1. By symmetry, this means the vertex is at p = 0.5. Since the parabola opens downward (the coefficient of p^2 is -1, which is negative), the vertex is a maximum point. Plug in 0.5, and you get 0.5 - 0.5^2 = 0.25. You will never get a value that is larger than that.
      (28 votes)
  • blobby green style avatar for user rashed.sabra
    Why do we need to maximize the term "p_hat(1 - p_hat)"?
    (13 votes)
    • leaf red style avatar for user Omster
      You need to maximize this term so that you handle the worst-case scenario, when the margin of error is largest. Once you have maximized the margin of error, you can find n so that the margin of error is guaranteed not to exceed 2%, whatever the sample proportion turns out to be.
      (2 votes)
  • blobby green style avatar for user Collin Mendia
    When we maximize the term, p_hat(1-p_hat), is that value, 0.5, always the same?
    (5 votes)
  • blobby green style avatar for user Mez Cooper
    Don't z-scores tell you how many standard deviations from the mean you are? Why is the z-score 1.96 and not 2 for 95% confidence?
    (2 votes)
    • primosaur seed style avatar for user Ian Pulizzotto
      Yes z is the number of standard deviations from the mean. In a normal distribution, approximately 95% of the data is within 2 standard deviations from the mean.

      So for 95% confidence, 2 is an approximation of the z-score. However, 1.96 is a more precise approximation of this z-score.
      (6 votes)
  • winston baby style avatar for user Yash Singh
    Doesn't the empirical rule say that 95% is two standard deviations? That means that the z* critical value is two, not 1.96. Am I right?
    (3 votes)
  • leaf grey style avatar for user lukasz.kruk
    How do we interpret the 2% margin of error here? Does it mean that if Della's survey yields, let's say, that 30% of the sample supports the tax increase, then we can say that we are 95% sure that between 28% and 32% of the entire population support the tax increase?
    (4 votes)
  • duskpin sapling style avatar for user Nathan Young
    Is there a formula to find the minimum sample size?
    (2 votes)
  • duskpin seed style avatar for user Christy Lindow
    You are interested in estimating the mean weight of the local adult population of female white-tailed deer (does). From past data, you estimate the standard deviation of the weight of all adult female white-tailed deer in this region to be 25 pounds. What sample size would you need in order to estimate the mean weight of all female white-tailed deer, with a 96% confidence level, to within 9 pounds of the actual weight?
    (2 votes)
    • primosaur seed style avatar for user Ian Pulizzotto
      A 96% confidence level leaves (100 - 96)/2 = 2% of the probability in each of the two tails.

      From the normal table, the z-score associated with a right tail of 2% (cumulative probability 98%) is 2.05. By symmetry, the z-score associated with a left tail of 2% is -2.05.

      So the margin of error (the distance from either endpoint of the confidence interval to the center) is 2.05 times the standard deviation of the sample mean.

      The standard deviation of the sample mean is 25/sqrt(n) pounds.

      Since the margin of error needs to be 9 pounds, we have

      2.05*25/sqrt(n) = 9
      1/sqrt(n) = 9/(2.05*25)
      sqrt(n) = 2.05*25/9
      n = (2.05*25/9)^2 = 32.43.

      To make sure not to exceed the needed margin, we round up. So we need sample size at least n = 33.

      Have a blessed, wonderful day!
      (2 votes)
  • blobby green style avatar for user nusrat61
    If the population is 650, how large should the sample be to be significant?
    (2 votes)
  • leaf green style avatar for user HenryL
    I see him use 2 different formulas, one involving the square root of variance over samples, and the other involving the square root of p times 1-p over samples. Can anyone tell me what the differences between the 2 formulas are? I need help.
    (1 vote)
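Several questions in this thread ask whether there is a general formula and how the mean case differs from the proportion case. As a minimal sketch (not something shown in the video), the deer-weight example above can be reproduced by solving z* · σ/√n ≤ E for n. The function name `min_sample_size_mean` and its parameters are made up for illustration, and the critical value comes from Python's standard-library `statistics.NormalDist` rather than from a printed table.

```python
import math
from statistics import NormalDist


def min_sample_size_mean(sigma, margin, confidence):
    """Smallest n with z* * sigma / sqrt(n) <= margin (hypothetical helper)."""
    # Two-sided critical value: leave (1 - confidence)/2 in each tail.
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Solve for n, then round up because n must be a whole number.
    return math.ceil((z_star * sigma / margin) ** 2)


# Deer example from the thread: sigma = 25 lb, margin = 9 lb, 96% confidence.
print(min_sample_size_mean(sigma=25, margin=9, confidence=0.96))  # 33
```

The reply above uses the table value 2.05 and gets 32.43 before rounding; the unrounded critical value gives a slightly larger number, but both round up to 33. The proportion case has the same shape, with √(p(1 − p)) playing the role of σ.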

Video transcript

- [Instructor] We're told Della wants to make a one-sample z interval to estimate what proportion of her community members favor a tax increase for more local school funding. She wants her margin of error to be no more than plus or minus 2% at the 95% confidence level. What is the smallest sample size required to obtain the desired margin of error?

So let's just remind ourselves what the confidence interval will look like and what part of it is the margin of error, and then we can think about what sample size she would need. She wants to estimate the true population proportion that favors a tax increase. She doesn't know what this is, so she's going to take a sample of size n, and in fact this question is all about what n she needs in order to have the desired margin of error. Whatever sample she takes, she's going to calculate a sample proportion. And then the confidence interval that she's going to construct is going to be that sample proportion plus or minus a critical value, z star, times the standard error of her statistic. The critical value is based on the confidence level. We'll talk in a second about what critical value corresponds to a 95% confidence level. The standard error in this case is the standard error of her sample proportion, which is the square root of the sample proportion times one minus the sample proportion, all of that over her sample size.

Now she wants the margin of error to be no more than 2%. The margin of error is this part right over here, the critical value times the standard error. So this part has to be less than or equal to 2%. That green color is kind of too shocking. It's unpleasant, all right. (laughing) Less than or equal to 2% right over here. So how do we figure that out? Well, first let's make sure we incorporate the 95% confidence level. So we could look at a z-table.
Remember, a 95% confidence level means that if we have a normal distribution here, we ask how many standard deviations we need to go above and below the mean in order to capture 95% of the area. So this would be 2.5% that is unshaded at the top, and then this would be 2.5% unshaded at the bottom. If you were to look this up in a z-table, you would not look up 95%. You would look up the percentage that leaves 2.5% unshaded at the top, so you would actually look up 97.5%. But it's good to know in general that at a 95% confidence level, you're looking at a critical value of 1.96. We could of course look it up on a z-table. So this is going to be 1.96 right over here.

But what about p hat? We don't know what p hat is until we actually take the sample, but this whole question is how large of a sample we should take. Well, remember we want this stuff right over here that I'm now squaring off in this less bright color, (laughing) this blue color, to be less than or equal to 2%. This is our margin of error. So what we could do is pick the sample proportion that maximizes this right over here, even though we don't know if that's what it's going to be. Because if we maximize this, we know that we're essentially figuring out the largest thing that this could end up being, and then we'll be safe. If you wanna maximize p hat times one minus p hat, you could do some trial and error here. This is a fairly simple quadratic. The maximum is at p hat equal to 0.5, and I wanna emphasize we don't know what p hat will be. She hasn't even taken the sample yet.
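As a quick numerical check of the two numbers used so far, the sketch below recomputes the 95% critical value and does the "trial and error" on p hat times one minus p hat over a grid of candidate values. The variable names (`z_star`, `worst_p`) and the grid spacing of 0.01 are assumptions made for illustration; the critical value comes from Python's standard-library `statistics.NormalDist` instead of a z-table.

```python
from statistics import NormalDist

confidence = 0.95
# Leave (1 - confidence)/2 = 2.5% in each tail, so look up the 97.5th percentile.
tail = (1 - confidence) / 2
z_star = NormalDist().inv_cdf(1 - tail)
print(round(z_star, 2))  # 1.96

# Trial and error: try p_hat = 0.00, 0.01, ..., 1.00 and keep the one
# that makes p_hat * (1 - p_hat) largest.
candidates = [i / 100 for i in range(101)]
worst_p = max(candidates, key=lambda p: p * (1 - p))
print(worst_p, worst_p * (1 - worst_p))  # 0.5 0.25
```

Because p hat times one minus p hat never exceeds 0.25, planning for p hat = 0.5 covers every sample proportion she could end up with.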
She hasn't taken the random sample and calculated the sample proportion, but we wanna figure out what n to take, and so to be safe she says okay, what sample proportion would maximize my margin of error? Let me just assume that and then calculate n. So let me set up an inequality here. We want 1.96, that's our critical value, times the square root of 0.5, which we're assuming for our sample proportion although of course we don't know what it is until we actually take the sample, times 0.5, that's one minus our sample proportion, all of that over n, to be less than or equal to 2%. We don't want our margin of error to be any larger than 2%. Let me just write this as a decimal, 0.02.

And now we just have to do a little bit of algebra. We could divide both sides by 1.96, which is the same as multiplying by one over 1.96. On the left-hand side we'd have the square root of all of this, which is the square root of 0.5 times 0.5 over the square root of n, so that's just 0.5 over the square root of n. On the right-hand side, 0.02 is the same thing as two over 100, and two over 100 times one over 1.96 is two over 196. So 0.5 over the square root of n needs to be less than or equal to two over 196. Let me scroll down a little bit. This is fancier algebra than we typically do in statistics, or at least in an introductory statistics class.

All right, so we could take the reciprocal of both sides. We get the square root of n over 0.5, and 196 over two. What's 196 divided by two? That is going to be 98. And if we take the reciprocal of both sides, then you're gonna swap the inequality, so it's gonna be greater than or equal to. Now I could multiply both sides of this by 0.5. So 0.5, I said 0.5 but my fingers wrote down 0.4. Let's see, 0.5.
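Since the board work is not visible in a transcript, here is the algebra up to this point written out. This is only a restatement of the steps just described, with n the sample size and 0.5 the assumed worst-case sample proportion.

```latex
\begin{align*}
1.96\,\sqrt{\frac{0.5\,(1 - 0.5)}{n}} &\le 0.02 \\
\frac{0.5}{\sqrt{n}} &\le \frac{0.02}{1.96} = \frac{2}{196} \\
\frac{\sqrt{n}}{0.5} &\ge \frac{196}{2} = 98 \\
\sqrt{n} &\ge 0.5 \cdot 98
\end{align*}
```

The inequality flips in the third line because both sides are positive and taking reciprocals reverses their order.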
And so we get that the square root of n needs to be greater than or equal to 49, or n needs to be greater than or equal to 49 squared. And what's 49 squared? Well, you know 50 squared is 2,500, so it's going to be close to that, and you can already make a pretty good estimate that it's going to be choice D. But if you wanna multiply it out we can. 49 times 49: nine times nine is 81. Nine times four is 36, plus eight is 44. Four times nine, 36. Four times four is 16, plus three, we have 19. And then you add all of that together, so that's 10, and so this is a 14. You do indeed get 2,401.

So that's the minimum sample size that Della should take if she genuinely wants her margin of error to be no more than 2%. Now it might turn out that when she actually takes the sample of size 2,401, her sample proportion is less than 0.5 or greater than 0.5, and then her margin of error will be less than this. But she just wanted it to be no more than that.

Another important thing to appreciate is that the math all worked out very nicely just now, where I got our n to be a whole number. But if I got 2,401.5, then you would have to round up to the nearest whole number, because your sample size is always going to be a whole number. So I will leave you there.
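The whole calculation can be sketched in a few lines of code. This is a minimal illustration under stated assumptions, not something from the video: the function name `min_sample_size_proportion` and its parameters are made up, the default guess of 0.5 is the conservative worst case discussed above, and the tiny tolerance subtracted before rounding up is only there so floating-point noise does not push an exact answer like 2,401 up to 2,402.

```python
import math
from statistics import NormalDist


def min_sample_size_proportion(margin, confidence=0.95, p_guess=0.5, z_star=None):
    """Smallest n with z* * sqrt(p(1-p)/n) <= margin (hypothetical helper)."""
    if z_star is None:
        # Two-sided critical value from the standard normal distribution.
        z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Solving the inequality for n gives n >= (z*/margin)^2 * p * (1 - p).
    n_exact = (z_star / margin) ** 2 * p_guess * (1 - p_guess)
    # Round up to a whole number; the 1e-9 guards against float round-off.
    return math.ceil(n_exact - 1e-9)


# Della's survey: margin of error at most 2%, z* rounded to 1.96 as in the video.
print(min_sample_size_proportion(0.02, z_star=1.96))  # 2401
```

Using the unrounded critical value (about 1.95996) gives roughly 2,400.9 before rounding, which also rounds up to 2,401.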