Sample statistic bias worked example

Determining bias of sample statistics based on approximate sampling distributions example.

Want to join the conversation?

  • ju lee
    Is the sample median as good an estimator as the sample mean, or is one better than the other?
    (4 votes)
  • Andrea Menozzi
    If I don't know the parameter, how do I know whether the estimator is biased or not? And if I already know the parameter, why would I need to estimate it?
    (4 votes)
    • daniella
      The challenge you've outlined is fundamental in statistics. Estimators and sampling distributions exist precisely to infer population parameters that are unknown. When the parameter is unknown, an estimator's bias is assessed through theory or simulation rather than from a single data set. For instance, the sample mean is an unbiased estimator of the population mean for any distribution with a finite mean. This follows from the linearity of expectation, not from the Central Limit Theorem, which describes the shape of the sampling distribution rather than its center. For other estimators, simulations help: generate many samples from a population whose parameter is known and check whether the estimates average out to that parameter. In practical applications the exact parameter is usually unavailable, but knowing the properties of an estimator tells you whether to expect unbiased estimates or how bias might affect them.
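
The simulation idea in this reply can be sketched in a few lines of Python. This is only an illustration under assumed conditions: the exponential population with mean 2.0, the sample size of 5, and the helper name `estimate_bias` are all hypothetical choices, not part of the original exercise.

```python
import random
import statistics

# Assumed population for illustration: exponential with mean 2.0 (right-skewed).
POPULATION_MEAN = 2.0


def estimate_bias(estimator, true_value, n=5, trials=20_000, seed=0):
    """Approximate E[estimator] - true_value by repeated sampling."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(trials):
        # expovariate takes the rate, which is 1 / mean.
        sample = [rng.expovariate(1 / POPULATION_MEAN) for _ in range(n)]
        estimates.append(estimator(sample))
    return statistics.fmean(estimates) - true_value


if __name__ == "__main__":
    mean_bias = estimate_bias(statistics.fmean, POPULATION_MEAN)
    median_bias = estimate_bias(statistics.median, POPULATION_MEAN)
    print(f"sample mean as estimator of population mean:   bias ~ {mean_bias:+.3f}")
    print(f"sample median as estimator of population mean: bias ~ {median_bias:+.3f}")
```

Under these assumptions the first estimated bias should land close to zero, while the second should be clearly negative: in a right-skewed population the sample median tends to fall below the population mean, so it is a biased estimator of that particular parameter.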
      (1 vote)
  • Mohamed Ibrahim
    What about the mean of the sampling distribution of medians? Shouldn't it be a better way to estimate bias?
    (3 votes)
    • daniella
      Indeed, the mean of the sampling distribution of the sample medians gives the most direct indication of bias. Bias is the tendency of an estimator to overestimate or underestimate the parameter, and it is measured as the difference between the mean of the estimator's sampling distribution and the parameter. If that mean equals the population median, the sample median is an unbiased estimator of the population median; if it deviates noticeably, the estimator is biased. With only 50 simulated samples the mean of the dots will not match exactly, so a close match is what suggests unbiasedness. Checking whether the dots fall roughly evenly below and above the population parameter, as in the video, is a quicker visual version of the same check.
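
For a population as small as the one in the video, the mean of the sampling distribution does not have to be approximated: it can be computed exactly by listing every possible sample. The sketch below assumes that the five balls in a sample are drawn without replacement and that every set of five is equally likely; the function names are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations


def exact_median_distribution(population, n):
    """Count how often each value is the median across all samples of odd size n."""
    counts = Counter()
    for combo in combinations(sorted(population), n):
        # combinations() of a sorted input yields sorted tuples,
        # so the median is simply the middle element.
        counts[combo[n // 2]] += 1
    return counts


def exact_mean(counts):
    """Mean of the sampling distribution, kept as an exact fraction."""
    total = sum(counts.values())
    return Fraction(sum(value * count for value, count in counts.items()), total)


if __name__ == "__main__":
    counts = exact_median_distribution(range(33), 5)
    print("number of possible samples:", sum(counts.values()))
    print("mean of the sampling distribution of medians:", exact_mean(counts))
```

Because the balls are numbered symmetrically around 16, every sample has a mirror-image sample (replace each number x with 32 - x) whose median is the same distance on the other side of 16. The exact mean therefore comes out to 16, matching the population median, under the sampling assumption stated above.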
      (1 vote)
  • Yewon Byun
    So does the term "sampling distribution" apply only when every possible sample is included? For example, is the dot plot above just an approximation of the sampling distribution and not the sampling distribution itself?
    (3 votes)
  • Emily
    I don't think I understand what it means to have a biased estimator. What could cause such bias?
    (1 vote)
  • ilya112358
    How skewed does it have to be to count as biased?
    (1 vote)
  • HannahowensPierre
    Does the unbiased estimator refer to the mean or the sampling distribution in general? For example, would I say "the mean of this sampling distribution is an unbiased estimator" or would I say "this sampling distribution is unbiased" or is mean always an unbiased estimator?
    (1 vote)
    • ShaanPatel
      "Biased" and "unbiased" describe the estimator, which is the sample statistic named in the question (mean, median, mode, midrange), not the sampling distribution itself. The estimator isn't always the mean: sometimes it is the mean, other times the median. The mode and the midrange are rarely used as estimators. To answer your actual question: a statistic is an unbiased estimator of a parameter when the mean of its sampling distribution equals that parameter. So you would say, "The sample mean appears to be an unbiased estimator of the population mean," rather than "This sampling distribution is unbiased." The sampling distribution is the evidence you use to judge the estimator. The sample mean is an unbiased estimator of the population mean, but it is not automatically an unbiased estimator of other parameters, so which statistic and which parameter are meant is determined by the question.

      I hope this helps and clears any confusion!

      BTW: The mode is the value that appears most often. The midrange is the value of (minimum value + maximum value)/2.
      (1 vote)
  • Steven Nguyen
    Is the sample median as good an estimator as the sample mean?
    (1 vote)
    • daniella
      The sample mean and sample median serve as estimators for central tendency but have different properties and are preferable under different conditions. The sample mean is highly efficient (requiring fewer samples to achieve a certain level of precision) and is an unbiased estimator of the population mean. However, it is sensitive to outliers and non-normal distributions.

      The sample median, on the other hand, is a robust estimator that is less affected by outliers and skewed distributions. It might not be as statistically efficient as the mean for normal distributions (requiring more samples to achieve the same precision as the mean), but its robustness makes it valuable in many real-world scenarios where data are not perfectly normal or contain outliers.

      In terms of being "as good," the context dictates preference. For symmetric, outlier-free distributions, the mean can provide more precise estimates. For skewed distributions or those with outliers, the median may offer more representative estimates of central tendency. The choice between using the mean or median depends on the data's nature and the specific requirements of the analysis.
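
The trade-off between efficiency and robustness can be made concrete with a small simulation. This is a rough sketch under assumed conditions: a standard normal population, a "contaminated" version in which about 10% of values come from a much wider normal distribution, samples of size 25, and hypothetical helper names.

```python
import random
import statistics


def clean(rng):
    """One draw from a standard normal population."""
    return rng.gauss(0, 1)


def contaminated(rng, outlier_rate=0.1):
    """Mostly standard normal, but occasionally a draw with 10x the spread."""
    sd = 10 if rng.random() < outlier_rate else 1
    return rng.gauss(0, sd)


def spread_of_estimator(estimator, draw, n=25, trials=5_000, seed=0):
    """Standard deviation of the estimator across many simulated samples."""
    rng = random.Random(seed)
    estimates = [estimator([draw(rng) for _ in range(n)]) for _ in range(trials)]
    return statistics.stdev(estimates)


if __name__ == "__main__":
    for label, draw in [("clean", clean), ("contaminated", contaminated)]:
        sd_mean = spread_of_estimator(statistics.fmean, draw)
        sd_median = spread_of_estimator(statistics.median, draw)
        print(f"{label:>12}: sd of means = {sd_mean:.3f}, sd of medians = {sd_median:.3f}")
```

A smaller spread means a more precise estimator. Under these assumptions the sample mean should vary less than the sample median on the clean data, and the ordering should reverse once outliers are mixed in.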
      (1 vote)
  • davi.692270
    Thanks, this is so cool and helpful.
    (1 vote)

Video transcript

- [Instructor] We're told Alejandro was curious if sample median was an unbiased estimator of population median. He placed ping pong balls numbered from zero to 32, so I guess that would be 33 ping pong balls, in a drum and mixed them well. Note that the median of the population is 16. He then took a random sample of five balls and calculated the median of the sample.

So we have this population of balls, and we know the population parameter: we know that the population median is 16. Then he takes a sample of five balls, so n equals five, and he calculates a sample median. Then he replaced the balls and repeated this process for a total of 50 trials. His results are summarized in the dot plot below, where each dot represents the sample median from a sample of five balls.

So he takes these five balls, puts them back in, then does it again, and again, and every time he calculates the sample median for that sample and plots it on the dot plot. He does this for 50 samples, and each dot here represents that sample statistic. So it shows that in four of those 50 samples we got a sample median of 20, and in five of those samples we got a sample median of 10. What he ends up creating with these dots is really an approximation of the sampling distribution of the sample medians.

Now, to judge whether it is a biased or unbiased estimator for the population median, pause the video and see if you can figure that out.

Alright, now let's do this together. To judge it, let's think about where the true population parameter is, the population median. It's 16, we know that, and so that is right over here.

If we were dealing with a biased estimator for the population parameter, then we would expect our approximation of the sampling distribution to be somewhat skewed, with most of the dots sitting to one side of the true parameter. So for example, if this approximation of the sampling distribution looked something like that, then we'd say, okay, that looks like a biased estimator, or if it was looking something like that, we'd say, okay, that looks like a biased estimator. But in the approximation of the sampling distribution that Alejandro constructed, we see that roughly the same proportion of the sample statistics came out below as came out above the true parameter. It doesn't have to be exact, but it seems roughly the case, so this seems pretty unbiased. So to answer the question: based on these results, it does appear that the sample median is an unbiased estimator of the population median.
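
Alejandro's experiment is easy to re-run in software. The sketch below is one possible version in Python, assuming each sample of five is drawn without replacement and the balls are returned before the next trial; the function names and the text-based dot plot are illustrative choices, and the results will differ from the dot plot in the video because the draws are random.

```python
import random
import statistics
from collections import Counter

POPULATION = range(33)      # balls numbered 0 through 32
POPULATION_MEDIAN = 16


def run_trials(trials=50, n=5, seed=None):
    """Draw `trials` samples of n balls each and return the sample medians."""
    rng = random.Random(seed)
    return [statistics.median(rng.sample(POPULATION, n)) for _ in range(trials)]


def summarize(medians):
    """Count sample medians below, equal to, and above the population median."""
    below = sum(m < POPULATION_MEDIAN for m in medians)
    above = sum(m > POPULATION_MEDIAN for m in medians)
    return below, len(medians) - below - above, above


def dot_plot(medians):
    """Text version of the dot plot: one '*' per sample median."""
    counts = Counter(medians)
    return "\n".join(f"{value:2d} | {'*' * counts[value]}" for value in sorted(counts))


if __name__ == "__main__":
    medians = run_trials()
    below, equal, above = summarize(medians)
    print(dot_plot(medians))
    print(f"below 16: {below}, equal to 16: {equal}, above 16: {above}")
```

If the counts below and above 16 come out roughly balanced across repeated runs, that is the same informal evidence of unbiasedness the instructor reads off the dot plot. Raising `trials` well beyond 50 gives a closer approximation of the sampling distribution.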