Identifying bias in samples and surveys

It's important to identify potential sources of bias when planning a sample survey.
When we say there's potential bias, we should also be able to argue whether the results are more likely to be an overestimate or an underestimate.
Try to identify the source of bias in each scenario, and speculate on the direction of the bias (overestimate or underestimate).

Scenario 1

David hosts a podcast and he is curious how much his listeners like his show. He decides to start with an online poll. He asks his listeners to visit his website and participate in the poll.
The poll shows that 89% of the 200 respondents "love" his show.
question a
What is the most concerning source of bias in this scenario?

question b
Which direction of bias is more likely in this scenario?

Scenario 2

David hosts a podcast and he is curious how much his listeners like his show. He decides to poll the next 100 listeners who send him fan emails.
Not all of them responded, but 94 of the 97 listeners who did said they "loved" his show.
question a
What is the most concerning source of bias in this scenario?

question b
Which direction of bias is more likely in this scenario?

Scenario 3

A senator wanted to know how people in her state felt about internet privacy issues. She conducted a poll by calling 100 people whose names were randomly sampled from the phone book (note that mobile phones and unlisted numbers aren't in phone books). The senator's office called those numbers until they got a response from all 100 people chosen.
The poll showed that 42% of respondents were "very concerned" about internet privacy.
question a
What is the most concerning source of bias in this scenario?

question b
Which direction of bias is more likely in this scenario?

Scenario 4

A senator wanted to know how people in her state felt about internet privacy issues. She conducted a poll by calling people using random digit dialing, where computers randomly generate phone numbers so unlisted and mobile numbers can still be reached. Her office called over 1,000 random phone numbers, most of which went unanswered, until they had reached 100 respondents.
The poll showed that 46% of respondents were "very concerned" about internet privacy.
question a
What is the most concerning source of bias in this scenario?

question b
Which direction of bias is more likely in this scenario?

Scenario 5

A high school wanted to know what percent of its students smoke cigarettes. During the week when students visited the counselors to schedule classes, the counselors asked every student in person whether they smoked cigarettes.
The data showed that 5% of students smoked cigarettes.
question a
What is the most concerning source of bias in this scenario?

question b
Which direction of bias is more likely in this scenario?

Scenario 6

A high school wanted to know what percent of its students smoke cigarettes. Counselors selected a random sample of students to take a survey on drug use. One of the questions reads, "If you are under the age of 18 years, do you illegally smoke cigarettes?"
The data showed that 5% of students smoked cigarettes.
question a
What is the most concerning source of bias in this scenario?

question b
Which direction of bias is more likely in this scenario?

Want to join the conversation?

  • Chandni Patel
    How is voluntary response bias different from nonresponse bias?
    (26 votes)
    • gavin hood
      Voluntary response bias occurs when people in the sampling population can choose whether to respond. In the podcast example, the drawback of letting listeners respond voluntarily is that those who enjoyed the show are more willing to spend time answering than those who didn't. When a large proportion of the selected sample doesn't respond, the effective sample size shrinks and nonresponse bias becomes an issue. If 1,000 people are sampled and only 100 respond, the 90% nonresponse rate could lead to nonresponse bias.
      (71 votes)
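The nonresponse rate in that example is simple arithmetic, sketched below in Python. The function name `nonresponse_rate` is made up for illustration. A high rate is only a warning sign: bias appears when the people who didn't respond would have answered differently from those who did.

```python
def nonresponse_rate(sampled, responded):
    """Fraction of the selected sample that did not respond."""
    if sampled <= 0 or not 0 <= responded <= sampled:
        raise ValueError("need sampled > 0 and 0 <= responded <= sampled")
    return (sampled - responded) / sampled


# Example from the reply above: 1,000 people sampled, 100 respond.
print(f"{nonresponse_rate(1000, 100):.0%}")  # 90%

# Scenario 2: 100 listeners polled, 97 respond.
print(f"{nonresponse_rate(100, 97):.0%}")    # 3%
```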
  • ajahan82
    I have a question.
    A reporter from the newspaper wanted to know how much time students spend on homework in a typical week, so he passed out questionnaires to students in a grade 9 English class, an art class, and a grade 12 math class. After some time, he collected them. Is this biased or not?
    (9 votes)
  • Miriam O
    Why would anybody answer a question where they admitted to doing something illegal?
    (3 votes)
  • Nathaniel Eades
    I disagree with Scenario 4's direction of bias. When the senator is polling people who are still using listed landlines, they are likely avoiding using mobile devices intentionally. Those of us who use mobile devices are generally less concerned with internet safety than those who avoid devices with internet access like mobile phones. Can someone explain what I'm missing here? I just think the bias is actually showing an overestimation of the population's view of internet safety.
    (5 votes)
    • Jeff Dodds
      From the author: Hi! Scenario 4 says, "She conducted a poll by calling people using random digit dialing, where computers randomly generate phone numbers so unlisted and mobile numbers can still be reached."

      That means the sample included folks who use landlines and mobile devices. The big issue here is that they tried calling over 1000 phone numbers and most people didn't answer. It's likely that the folks who didn't answer a call from a strange phone number are more concerned about privacy than the 100 people who did answer the call.

      So if this survey finds that 46% of the 100 respondents are concerned about internet privacy, I'd bet that's an underestimate since the group who didn't answer might care about privacy more in general.
      (10 votes)
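The argument above can be checked with a little arithmetic. The Python sketch below is only illustrative: the true share of concerned people (60%) and the two answer rates (5% and 15%) are made-up assumptions, not figures from the scenario, and the function name is hypothetical. It computes the share of concerned people we would expect among respondents when concerned people are less likely to pick up.

```python
def expected_share_among_respondents(true_share, answer_rate_yes, answer_rate_no):
    """Expected fraction of respondents who belong to the 'yes' group.

    true_share:      fraction of the population in the 'yes' group (assumed)
    answer_rate_yes: probability that a 'yes' person answers the call (assumed)
    answer_rate_no:  probability that a 'no' person answers the call (assumed)
    """
    yes_respondents = true_share * answer_rate_yes
    no_respondents = (1 - true_share) * answer_rate_no
    return yes_respondents / (yes_respondents + no_respondents)


# Hypothetical inputs: 60% of the state is "very concerned", but concerned
# people answer unknown callers 5% of the time versus 15% for everyone else.
true_share = 0.60
estimate = expected_share_among_respondents(true_share, 0.05, 0.15)
direction = "underestimate" if estimate < true_share else "not an underestimate"
print(f"true share: {true_share:.0%}, expected poll result: {estimate:.0%} ({direction})")
```

With these assumed inputs the expected poll result is about 33%, well below the true 60%. If both groups answered at the same rate, the expected result would equal the true share, so the direction of the bias depends entirely on who is less likely to respond.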
  • Sonja Halicki
    Perhaps scenarios 1-3 could be mixed up a little? They are identical to the worked example from a previous video on sample bias so I find I'm just parroting back the answers rather than actually having my knowledge tested.
    (9 votes)
  • dalves24
    What is the difference between response bias and nonresponse bias?
    (4 votes)
    • Tanner P
      Nonresponse bias is a type of response bias. In general, response bias occurs when the results of a survey are biased due to missing or incorrect responses. In the case of nonresponse bias, a particular group is left out of the survey, so their answers aren't represented in the results. In addition to nonresponse bias, another type of response bias is when a respondent gives an untrue response.
      Hope this helps!
      (7 votes)
  • boucher_madina
    what is convenience sampling?
    (3 votes)
    • Caleb Man
      Convenience sampling is given away by its name. It is when the sample you choose is the one most convenient for you to collect. For example, I could conduct a study about overall satisfaction with online learning programs. I could use a sample of only people on Khan Academy learning statistics. That sample is convenient for me because I am on Khan Academy learning statistics. However, it does not reflect overall satisfaction with online learning programs in general; it only shows us the satisfaction of people learning stats on Khan.
      (5 votes)
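One way to see why a convenience sample can mislead is to compare it with the population-wide figure. In the Python sketch below every count is invented for illustration, including the idea that Khan Academy statistics learners are more satisfied than other online learners. It is a rough picture of the mechanism, not real data.

```python
# Hypothetical population of online learners: (learners, satisfied learners).
# Every count here is invented for illustration.
groups = {
    "Khan Academy statistics learners": (1_000, 900),
    "learners on other programs": (9_000, 5_400),
}


def satisfaction_rate(selected):
    """Share of satisfied learners across the selected group names."""
    learners = sum(groups[name][0] for name in selected)
    satisfied = sum(groups[name][1] for name in selected)
    return satisfied / learners


population_rate = satisfaction_rate(groups)  # iterating the dict selects every group
convenience_rate = satisfaction_rate(["Khan Academy statistics learners"])

print(f"whole population: {population_rate:.0%}")     # 63%
print(f"convenience sample: {convenience_rate:.0%}")  # 90%
```

Under these assumptions the convenience sample overestimates satisfaction, because the group that is easy to reach differs from everyone else.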
  • Efrain Quintero Narvaez
    Is there a book that I can read for deep study of statistics?
    (0 votes)
  • loumast17
    Is voluntary response when those being asked have the option not to respond, or only when the survey has no assigned sample of people it is asking?

    A question on the Practice: Bias in Samples and Surveys exercises read like this, "A mobile phone service provider wants to survey its customers to study privacy concerns and the sharing of their personal information. They call 5,000 randomly selected phone numbers from a database containing the phone number of every customer. If someone selected doesn't answer, they'll attempt calling back up to 2 more times before giving up on reaching that person.

    They reach 350 customers with this strategy, and 60% of those reached say they are at least "somewhat concerned" about their personal information being shared without their knowledge or consent.

    Which of these is the most concerning potential source of bias in the provider's survey?"

    The answer is nonresponse bias because of how many did not respond, but one of the options was bias from voluntary response. The reason it gives for this not being correct is, "Voluntary response is when a researcher gives an open invitation and people decide to be in the sample or not. The service provider selected a random sample of 5,000 customers, so they didn't use a voluntary response strategy."

    Again, I know it isn't the correct answer, but I thought voluntary response was a correct way of describing the situation. If not then voluntary response seems like a not so accurate label
    (3 votes)
    • Shmuel
      The bias might be undercoverage, because they tried to reach 5,000 customers and only 350 responded. If they were using a voluntary response method, they would give customers a survey and ask them to fill it out. This means that if they don't want to fill it out, they just don't answer the survey.
      (1 vote)
  • ronaldo
    Why do people ask about drug use anyway?
    (2 votes)