# Identifying bias in samples and surveys

AP.STATS: DAT‑2 (EU), DAT‑2.E (LO), DAT‑2.E.1 (EK), DAT‑2.E.2 (EK), DAT‑2.E.3 (EK), DAT‑2.E.4 (EK), DAT‑2.E.5 (EK), DAT‑2.E.6 (EK), VAR‑1 (EU), VAR‑1.E (LO), VAR‑1.E.1 (EK)

It's important to identify potential sources of bias when planning a sample survey.
When we say there's potential bias, we should also be able to argue whether the results are more likely to be an overestimate or an underestimate.
Try to identify the source of bias in each scenario, and speculate on the direction of the bias (overestimate or underestimate).
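
One way to reason about the direction of bias is to ask which kind of person is more likely to end up in the sample. The sketch below is a minimal illustration, not part of the original lesson: the function names and all of the rates are hypothetical. It computes the share of "yes" answers we would expect among respondents when "yes" people and "no" people enter the sample at different rates.

```python
def expected_sample_proportion(p, rate_yes, rate_no):
    """Expected share of 'yes' answers among the people who end up in the sample.

    p        -- true population proportion of 'yes' people
    rate_yes -- chance that a 'yes' person ends up in the sample
    rate_no  -- chance that a 'no' person ends up in the sample
    """
    yes_in_sample = p * rate_yes
    no_in_sample = (1 - p) * rate_no
    return yes_in_sample / (yes_in_sample + no_in_sample)


def bias_direction(p, rate_yes, rate_no, tol=1e-12):
    """Compare the expected sample result with the true proportion."""
    estimate = expected_sample_proportion(p, rate_yes, rate_no)
    if abs(estimate - p) < tol:
        return "no bias"
    return "overestimate" if estimate > p else "underestimate"


# Hypothetical numbers: half of the population would say "yes".
print(bias_direction(0.5, rate_yes=0.30, rate_no=0.10))  # overestimate
print(bias_direction(0.5, rate_yes=0.10, rate_no=0.30))  # underestimate
print(bias_direction(0.5, rate_yes=0.20, rate_no=0.20))  # no bias
```

When both groups are equally likely to be sampled, the expected result matches the population. Any gap between the two rates pushes the estimate toward the group that is overrepresented.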

## Scenario 1

David hosts a podcast and he is curious how much his listeners like his show. He decides to start with an online poll. He asks his listeners to visit his website and participate in the poll.
The poll shows that 89% of the 200 respondents "love" his show.

Question a: What is the most concerning source of bias in this scenario?

Question b: Which direction of bias is more likely in this scenario?

## Scenario 2

David hosts a podcast and he is curious how much his listeners like his show. He decides to poll the next 100 listeners who send him fan emails.
They don't all respond, but 94 of the 97 listeners who responded said they "loved" his show.

Question a: What is the most concerning source of bias in this scenario?

Question b: Which direction of bias is more likely in this scenario?

## Scenario 3

A senator wanted to know how people in her state felt about internet privacy issues. She conducted a poll by calling 100 people whose names were randomly sampled from the phone book (note that mobile phones and unlisted numbers aren't in phone books). The senator's office called those numbers until they got a response from all 100 people chosen.
The poll showed that 42% of respondents were "very concerned" about internet privacy.

Question a: What is the most concerning source of bias in this scenario?

Question b: Which direction of bias is more likely in this scenario?

## Scenario 4

A senator wanted to know how people in her state felt about internet privacy issues. She conducted a poll by calling people using random digit dialing, where computers randomly generate phone numbers so unlisted and mobile numbers can still be reached. Her office called over 1,000 random phone numbers (most people didn't answer) until they had reached 100 respondents.
The poll showed that 46% of respondents were "very concerned" about internet privacy.

Question a: What is the most concerning source of bias in this scenario?

Question b: Which direction of bias is more likely in this scenario?

## Scenario 5

A high school wanted to know what percent of its students smoke cigarettes. During the week when students visited the counselors to schedule classes, the counselors asked every student in person whether they smoked cigarettes.
The data showed that 5% of students smoked cigarettes.

Question a: What is the most concerning source of bias in this scenario?

Question b: Which direction of bias is more likely in this scenario?

## Scenario 6

A high school wanted to know what percent of its students smoke cigarettes. Counselors selected a random sample of students to take a survey on drug use. One of the questions reads, "If you are under the age of 18 years, do you illegally smoke cigarettes?"
The data showed that 5% of students smoked cigarettes.

Question a: What is the most concerning source of bias in this scenario?

Question b: Which direction of bias is more likely in this scenario?

## Want to join the conversation?

• How is voluntary response bias different from nonresponse bias?
• Voluntary response is when participants choose for themselves whether to be in the survey. Nonresponse is when people who were selected for the survey choose not to participate. Really, the main difference is that they are switched around: in one, people opt in to the sample, and in the other, selected people opt out. The question will normally tell you what exactly it's looking for :) Hope this helps!
• I have a question.
A reporter from the newspaper wanted to know how much time students spend on homework in a typical week, so he passes out questionnaires to students in a grade 9 English class, an art class, and a grade 12 math class. After some time, he collects them. So is this biased or not?
• This would be biased, as it is not a random sample of all students. If he went to all kinds of schools and handed out surveys at random, it would be an unbiased survey.
• Perhaps scenarios 1-3 could be mixed up a little? They are identical to the worked example from a previous video on sample bias so I find I'm just parroting back the answers rather than actually having my knowledge tested.
• What is convenience sampling?
• Convenience sampling is given away by its name. It is when the sample you choose is the one that is most convenient for you to collect. For example, I could conduct a study about overall satisfaction with online learning programs and use a sample of only people on Khan Academy learning statistics. That sample is convenient for me because I am on Khan Academy learning statistics. However, it does not reflect overall satisfaction with online learning programs; it only shows us the satisfaction of people learning stats on Khan.
• Is there a book that I can read for deep study of statistics?
• Aren't these questions supposed to test our knowledge of the subject? These are all either common sense or in the video.
• What is the purpose of all this, as far as whether the questions are biased or not?
• I disagree with Scenario 4's direction of bias. When the senator is polling people who are still using listed landlines, they are likely avoiding using mobile devices intentionally. Those of us who use mobile devices are generally less concerned with internet safety than those who avoid devices with internet access like mobile phones. Can someone explain what I'm missing here? I just think the bias is actually showing an overestimation of the population's view of internet safety.
• From the author: Hi! Scenario 4 says, "She conducted a poll by calling people using random digit dialing, where computers randomly generate phone numbers so unlisted and mobile numbers can still be reached."

That means the sample included folks who use landlines and mobile devices. The big issue here is that they tried calling over 1000 phone numbers and most people didn't answer. It's likely that the folks who didn't answer a call from a strange phone number are more concerned about privacy than the 100 people who did answer the call.

So if this survey finds that 46% of the 100 respondents are concerned about internet privacy, I'd bet that's an underestimate since the group who didn't answer might care about privacy more in general.
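
The reasoning in this reply can be checked with a quick simulation. The sketch below is only an illustration under made-up assumptions: the function name, the true share of concerned people (60%), and both answer rates are hypothetical, and it places far more calls than the senator's office did so that random noise doesn't hide the pattern.

```python
import random


def simulate_poll(n_calls, true_share, answer_rate_concerned, answer_rate_other, seed=0):
    """Simulate random digit dialing where privacy-concerned people answer less often.

    Returns (share of respondents who are concerned, number of respondents).
    """
    rng = random.Random(seed)
    respondents = 0
    concerned_respondents = 0
    for _ in range(n_calls):
        # Each call reaches a random person from the population.
        concerned = rng.random() < true_share
        # Assumption: concerned people are less likely to pick up a strange number.
        answer_rate = answer_rate_concerned if concerned else answer_rate_other
        if rng.random() < answer_rate:
            respondents += 1
            if concerned:
                concerned_respondents += 1
    if respondents == 0:
        raise ValueError("nobody answered; increase n_calls or the answer rates")
    return concerned_respondents / respondents, respondents


# Hypothetical values: 60% of people are truly concerned, but they answer
# only 5% of calls, while everyone else answers 15% of calls.
estimate, respondents = simulate_poll(
    n_calls=100_000,
    true_share=0.60,
    answer_rate_concerned=0.05,
    answer_rate_other=0.15,
)
print(f"true share: 0.60, poll estimate: {estimate:.2f} from {respondents} respondents")
```

Under these assumptions the poll lands well below the true share, which is the underestimate described above. If the answer rates were reversed, the same poll would overestimate instead, so the direction of the bias depends entirely on who is less likely to respond.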