Reference: Conditions for inference on a proportion


When we want to carry out inferences on one proportion (build a confidence interval or do a significance test), the accuracy of our methods depends on a few conditions. Before doing the actual computations of the interval or test, it's important to check whether or not these conditions have been met; otherwise, the calculations and conclusions that follow aren't actually valid.

The conditions we need for inference on one proportion are:

Random: The data needs to come from a random sample or randomized experiment.
Normal: The sampling distribution of $\hat{p}$ needs to be approximately normal, which requires at least $10$ expected successes and $10$ expected failures.
Independent: Individual observations need to be independent. If sampling without replacement, our sample size shouldn't be more than $10\%$ of the population.

Let's look at each of these conditions a little more in-depth.

The random condition

Random samples give us unbiased data from a population. When samples aren't randomly selected, the data usually has some form of bias, so using data that wasn't randomly selected to make inferences about its population can be risky.

More specifically, sample proportions are unbiased estimators of their population proportion. For example, if we have a bag of candy where $50\%$ of the candies are orange and we take random samples from the bag, some will have more than $50\%$ orange and some will have less. But on average, the proportion of orange candies in each sample will equal $50\%$. We write this property as $\mu_{\hat{p}} = p$, which holds true as long as our sample is random.
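As a rough illustration of $\mu_{\hat{p}} = p$, the Python sketch below simulates the candy example. The bag contents (500 orange, 500 other), the sample size, the number of samples, and the helper name `simulate_sample_proportions` are hypothetical choices made for illustration; only the standard library is used.

```python
import random
import statistics

def simulate_sample_proportions(bag, n, num_samples, seed=0):
    """Draw num_samples simple random samples of size n (without replacement)
    from bag and return the proportion of "orange" candies in each sample."""
    rng = random.Random(seed)
    proportions = []
    for _ in range(num_samples):
        sample = rng.sample(bag, n)  # random sample: every subset equally likely
        proportions.append(sample.count("orange") / n)
    return proportions

# Hypothetical bag: 500 orange and 500 other candies, so p = 0.5.
bag = ["orange"] * 500 + ["other"] * 500
p_hats = simulate_sample_proportions(bag, n=50, num_samples=10_000)

print(min(p_hats), max(p_hats))  # individual samples land above and below 0.5
print(statistics.mean(p_hats))   # but the average of the p-hats is close to 0.5
```

Because every sample is random, no systematic tilt toward or away from orange creeps in; a non-random selection method has no such guarantee.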

This won't necessarily happen if our sample isn't randomly selected though. Biased samples lead to inaccurate results, so they shouldn't be used to create confidence intervals or carry out significance tests.

The normal condition

The sampling distribution of $\hat{p}$ is approximately normal as long as the expected numbers of successes and failures are both at least $10$. This happens when our sample size $n$ is reasonably large. The proof of this is beyond the scope of AP Statistics, but our tutorial on sampling distributions can provide some intuition and verification that this condition indeed works.

So we need:

$$
\begin{aligned}
\text{expected successes: } & np \geq 10 \\
\text{expected failures: } & n(1 - p) \geq 10
\end{aligned}
$$

If we are building a confidence interval, we don't have a value of $p$ to plug in, so we instead count the observed numbers of successes and failures in the sample data to make sure they are both at least $10$. If we are doing a significance test, we use our sample size $n$ and the hypothesized value of $p$ to calculate our expected numbers of successes and failures.
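The two ways of checking the normal condition can be sketched as small Python functions. The function names, the parameter name `p0` for the hypothesized proportion, and the example numbers are hypothetical; the functions only restate the rule above (at least 10 successes and 10 failures, observed for an interval, expected under the hypothesized $p$ for a test).

```python
def normal_condition_ci(successes: int, failures: int) -> bool:
    """Confidence interval: p is unknown, so check the observed counts."""
    return successes >= 10 and failures >= 10

def normal_condition_test(n: int, p0: float) -> bool:
    """Significance test: use n and the hypothesized proportion p0."""
    expected_successes = n * p0
    expected_failures = n * (1 - p0)
    return expected_successes >= 10 and expected_failures >= 10

print(normal_condition_ci(successes=12, failures=88))  # True
print(normal_condition_ci(successes=8, failures=92))   # False: fewer than 10 successes
print(normal_condition_test(n=100, p0=0.05))           # False: only 5 expected successes
print(normal_condition_test(n=100, p0=0.20))           # True: 20 expected successes, 80 failures
```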

The independence condition

To use the formula for standard deviation of $\hat{p}$, we need individual observations to be independent. When we are sampling without replacement, individual observations aren't technically independent since removing each item changes the population.

But the $10\%$ condition says that if we sample $10\%$ or less of the population, we can treat individual observations as independent since removing each observation doesn't significantly change the population as we sample. For instance, if our sample size is $n = 150$, there should be at least $N = 1500$ members in the population.
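The $10\%$ check itself is simple arithmetic. To show why a small sampling fraction lets us treat draws as independent, the sketch below also computes the finite population correction factor $\sqrt{(N - n)/(N - 1)}$, a standard result for sampling without replacement that this article does not use; it is included only as an illustration. Both function names are hypothetical.

```python
import math

def ten_percent_condition(n: int, N: int) -> bool:
    """True if the sample size n is at most 10% of the population size N."""
    return 10 * n <= N  # integer arithmetic avoids floating-point rounding at the boundary

def finite_population_factor(n: int, N: int) -> float:
    """Ratio of the true SD of p-hat under sampling without replacement
    to the independent-trials value sqrt(p(1-p)/n)."""
    return math.sqrt((N - n) / (N - 1))

for n, N in [(150, 1500), (600, 1500)]:
    print(n, N, ten_percent_condition(n, N), round(finite_population_factor(n, N), 3))
# 150 1500 True 0.949   -> treating draws as independent overstates the SD by only about 5%
# 600 1500 False 0.775  -> here the dependence between draws matters much more
```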

This allows us to use the formula for standard deviation of $\hat{p}$:

$$\sigma_{\hat{p}} = \sqrt{\frac{p(1 - p)}{n}}$$

In a significance test, we use the sample size $n$ and the hypothesized value of $p$ in this formula.
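One way to convince yourself of the formula is to simulate it. The Python sketch below assumes independent trials with a made-up $p = 0.3$ and $n = 100$, draws many samples, and compares the spread of the simulated $\hat{p}$ values with $\sqrt{p(1-p)/n}$. The function names and simulation settings are hypothetical.

```python
import math
import random
import statistics

def sd_p_hat(p: float, n: int) -> float:
    """Standard deviation of the sampling distribution of p-hat."""
    return math.sqrt(p * (1 - p) / n)

def simulated_sd_p_hat(p: float, n: int, num_samples: int = 10_000, seed: int = 1) -> float:
    """Simulate num_samples samples of n independent trials and return
    the standard deviation of the resulting p-hat values."""
    rng = random.Random(seed)
    p_hats = []
    for _ in range(num_samples):
        successes = sum(rng.random() < p for _ in range(n))  # n independent trials
        p_hats.append(successes / n)
    return statistics.pstdev(p_hats)

p, n = 0.3, 100
print(sd_p_hat(p, n))            # formula: sqrt(0.3 * 0.7 / 100), about 0.0458
print(simulated_sd_p_hat(p, n))  # simulation: close to the formula
```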

If we are building a confidence interval for $p$, we don't actually know what $p$ is, so we substitute $\hat{p}$ as an estimate for $p$. When we do this, we call it the standard error of $\hat{p}$ to distinguish it from the standard deviation.

So our formula for standard error of $\hat{p}$ is:

$$\sigma_{\hat{p}} \approx \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$$
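To contrast the two cases, here is a small Python sketch: in a test we plug the hypothesized value (called `p0` here) into the formula, while for an interval we plug in $\hat{p}$. The data (62 successes in 100 trials) and the hypothesized value $0.5$ are invented for illustration, and the function names are hypothetical.

```python
import math

def sd_for_test(p0: float, n: int) -> float:
    """Significance test: plug the hypothesized proportion p0 into the formula."""
    return math.sqrt(p0 * (1 - p0) / n)

def standard_error_for_ci(successes: int, n: int) -> float:
    """Confidence interval: p is unknown, so substitute p-hat = successes / n."""
    p_hat = successes / n
    return math.sqrt(p_hat * (1 - p_hat) / n)

# Hypothetical data: 62 successes in n = 100 trials; hypothesized p0 = 0.5.
print(sd_for_test(0.5, 100))           # sqrt(0.5 * 0.5 / 100) = 0.05
print(standard_error_for_ci(62, 100))  # sqrt(0.62 * 0.38 / 100), about 0.0485
```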

Want to join the conversation?


Warren Sunada-Wong
Posted 6 years ago.
Why don't we use the sample standard deviation for the standard error?

At the end, it says the formula for standard error ≈ sqrt(p-hat*(1-p-hat)/n). But since p-hat comes from a sample, why don't we use the sample standard deviation with the n-1 correction to estimate the true standard deviation of the sampling distribution? Shouldn't it be sqrt(p-hat*(1-p-hat)/(n-1))?
(19 votes)
- Schrödinger's Cat
  Posted 4 years ago.
  The appearance of n in the expression for the standard deviation for p-hat is not due to sampling, but due to the number of trials n for the Binomial random variable X~B(n,p), where n is the number of trials and p is the probability of a success in any given trial.
  
  Unfortunately, in this context, the letter p is used for both the probability and the proportion.
  
  So, the random variable p-hat is actually a scaling, by 1/n, of the Binomial random variable X~B(n,p). That is, p-hat = X/n. That's how we get the proportion of successes: divide the number of successes, X, by the number of trials, n.
  
  So, by the properties of scaling a random variable by the factor 1/n, the expected value E(p-hat)=(1/n)E(X) and the variance V(p-hat)=(1/n^2)V(X).
  
  Thus, the standard deviation for p-hat is given by the square root of (1/n^2)V(X).
  
  Recall that the mean and variance of the binomial random variable are np and np(1-p), respectively. Hence the variance for p-hat is
  V(p-hat) = np(1-p)/n^2,
  so the standard deviation for p-hat is
  sqrt(np(1-p)/n^2) = sqrt(p(1-p)/n), as shown in the video.
  
  Hope this helped,
  with kind regards...
  (14 votes)
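The scaling argument in the answer above can be checked exactly, without simulation, by summing over the binomial probabilities. This Python sketch uses arbitrary illustrative values $n = 40$ and $p = 0.25$ and a hypothetical helper name; it shows that the variance of $\hat{p} = X/n$ is $p(1-p)/n$, with $n$ rather than $n - 1$ in the denominator.

```python
from math import comb

def exact_mean_var_p_hat(n: int, p: float):
    """Exact mean and variance of p-hat = X / n for X ~ B(n, p),
    computed by summing over the binomial probability mass function."""
    pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    mean = sum((k / n) * w for k, w in enumerate(pmf))
    var = sum((k / n - mean) ** 2 * w for k, w in enumerate(pmf))
    return mean, var

n, p = 40, 0.25
mean, var = exact_mean_var_p_hat(n, p)
print(mean, p)               # E(p-hat) equals p
print(var, p * (1 - p) / n)  # V(p-hat) equals p(1-p)/n, up to floating-point rounding
```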
marcello834
Posted 5 years ago.
I remember another condition where something (sample size, maybe?) had to be greater than or equal to 30. What was that?
(11 votes)
- G0ingInsqne
  Posted 4 years ago.
  It is the "large enough" condition. When we are working with means, we don't have a proportion p, so we can't calculate np and nq (where q = 1 - p). Instead, we check whether n ≥ 30. If so, the sample meets the large enough condition in place of the success/failure condition.
  (13 votes)
Alex Kubiesa
Posted a year ago.
Here we've approximated the standard deviation of the sample proportion by taking the formula sigma_p_hat = sqrt(p(1-p)/n) and just replacing p by p_hat to get sqrt(p_hat(1-p_hat)/n).

But in one of the videos earlier, we instead used sigma_p_hat = sigma/sqrt(n) and replaced the population standard deviation sigma with the sample standard deviation s to get s/sqrt(n).

These two formulas give different results, because s/sqrt(n) = sqrt(p_hat(1-p_hat)/(n-1)) due to the Bessel correction factor.

Which of these two approximations is best? I'm guessing the second one?
(7 votes)
- daniella
  Posted 2 months ago.
  The choice between the two approximations depends on the context and the specific characteristics of the data. The formula sqrt(p(1-p)/n) is a theoretical approximation based on the assumption of a large sample size and is commonly used in theoretical statistics. On the other hand, s/sqrt(n) with the Bessel correction factor (n-1) is an empirical estimate based on the sample standard deviation (s) and is used when the sample size is small relative to the population size. In general, if the sample size is large and the population size is much larger than the sample size, the first approximation may be more appropriate. If the sample size is small relative to the population size, the second approximation with the Bessel correction factor may be more accurate.
  (1 vote)
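The identity in Alex Kubiesa's post, $s/\sqrt{n} = \sqrt{\hat{p}(1-\hat{p})/(n-1)}$ for data coded as 0s and 1s, can be checked numerically. The Python sketch below uses made-up data (7 successes out of 20) and the standard library's `statistics.stdev`, which divides by $n - 1$. It also shows that the two approximations differ only by the factor $\sqrt{n/(n-1)}$, which approaches 1 as $n$ grows.

```python
import math
import statistics

# Hypothetical 0/1 data: 7 successes out of n = 20 observations.
data = [1] * 7 + [0] * 13
n = len(data)
p_hat = sum(data) / n

s = statistics.stdev(data)                          # sample SD, divides by n - 1
se_from_s = s / math.sqrt(n)                        # s / sqrt(n)
se_bessel = math.sqrt(p_hat * (1 - p_hat) / (n - 1))
se_article = math.sqrt(p_hat * (1 - p_hat) / n)     # formula used in the article

print(se_from_s, se_bessel)  # equal, up to floating-point rounding
print(se_article)            # slightly smaller
print(se_from_s / se_article, math.sqrt(n / (n - 1)))  # ratio is sqrt(n / (n - 1))
```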
Andrea Menozzi
Posted 3 years ago.
It talks about significance tests; these are yet to be explained in this course, right?
(5 votes)
Soerenna Farhoudi
Posted 3 years ago.
Can someone show me this proof for the normal condition or reference a link?
All I can find is information about the 10% rule.
(4 votes)
Josh E
Posted a year ago.
Would I be able to apply this to video game stat distributions?
(3 votes)
- daniella
  Posted 2 months ago.
  Yes, you can apply these concepts to analyze distributions in video games, particularly if you are interested in understanding player behavior or performance based on sampled data. However, you would need to ensure that the assumptions underlying the statistical methods are appropriate for the context of the video game data.
  (1 vote)
Qingyun
Posted 5 years ago.
What is the difference between the standard error of the mean (sigma/sqrt(n)) and the standard error of the sample proportion mentioned above? Thanks!
(3 votes)
- Super-intelligent Shade of the Color Blue
  Posted 4 years ago.
  Same difference. For a Bernoulli distribution, sigma^2 = p(1 - p), so sigma/sqrt(n) = sqrt(p(1 - p)/n).
  (1 vote)
Mohamed Ibrahim
Posted 4 years ago.
Wouldn't "not being independent" also affect the sampling distribution of the sample mean?
(1 vote)
- Bryan
  Posted 4 years ago.
  Yes it would! I'm fairly sure the CLT assumes that the instances in the samples that you're taking the mean of are independent.
  
  Also, the formula for the SD of the sampling distribution of the sample mean would not work if our instances aren't independent.
  (3 votes)
Priscilla Baltezar
Posted 6 years ago.
If our data does not meet the normality, randomness, and independence conditions for statistical inference, what is the consequence? Can you still technically make inferences if you do not meet one or more of these conditions?
(1 vote)
- kjames1
  Posted 5 years ago.
  If you don't check the conditions, or the data doesn't meet all of them (which would rarely happen in a class), then it basically means you can't carry out the significance test, confidence interval, etc.
  (2 votes)
Prisha B
Posted 5 months ago.
The lesson says that the independence condition must be met for us to be able to use the standard deviation formula for the sample proportion. Does this mean that the other two conditions (the normal and random conditions) do not necessarily need to be met?
(1 vote)
- daniella
  Posted 2 months ago.
  While meeting the independence condition is necessary for using the standard deviation formula for the sample proportion, the other conditions (random and normal) are also important for the validity of the inference. The random condition ensures that the sample is representative of the population, while the normal condition ensures that the sampling distribution is approximately normal, which is necessary for constructing accurate confidence intervals or performing significance tests. Therefore, all three conditions should ideally be met for valid inference.
  (2 votes)