# Identifying bias in samples and surveys

It's important to identify potential sources of bias when planning a sample survey.
When we say there's potential bias, we should also be able to argue whether the results will probably be an overestimate or an underestimate.
Try to identify the source of bias in each scenario, and speculate on the direction of the bias (overestimate or underestimate).
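
To make "direction of bias" concrete, here is a minimal sketch of how an estimate shifts when different groups end up in a sample at different rates. Every number and name in it (`true_and_observed_share`, the two groups, the 8% and 12% rates) is a made-up assumption for illustration and is not taken from any scenario below.

```python
from fractions import Fraction


def true_and_observed_share(group_sizes, share_yes, inclusion_rates):
    """Return (true share, expected sample share) of "yes" answers.

    Each argument is a list with one entry per group:
    group size, share of the group answering "yes", and the
    chance that a member of the group ends up in the sample.
    """
    population = sum(group_sizes)
    true_yes = sum(n * p for n, p in zip(group_sizes, share_yes))
    # Expected number of people from each group who end up in the sample.
    sampled = [n * r for n, r in zip(group_sizes, inclusion_rates)]
    sampled_yes = sum(s * p for s, p in zip(sampled, share_yes))
    return Fraction(true_yes) / population, Fraction(sampled_yes) / sum(sampled)


def direction(true_share, observed_share):
    """Label the expected direction of the bias."""
    if observed_share > true_share:
        return "overestimate"
    if observed_share < true_share:
        return "underestimate"
    return "no bias in expectation"


# Hypothetical population: 500 people who would say "yes", 500 who would say "no".
# Assumption: the "yes" group lands in the sample 8% of the time, the "no" group 12%.
true_share, observed_share = true_and_observed_share(
    group_sizes=[500, 500],
    share_yes=[1, 0],
    inclusion_rates=[Fraction(8, 100), Fraction(12, 100)],
)
print(float(true_share), float(observed_share), direction(true_share, observed_share))
```

Under these assumed rates, the group that would answer "yes" is underrepresented, so the sample share (40%) falls below the true share (50%). Swapping the two rates would push the estimate the other way.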

## Scenario 1

David hosts a podcast and he is curious how much his listeners like his show. He decides to start with an online poll. He asks his listeners to visit his website and participate in the poll.
The poll shows that $89\%$ of the $200$ respondents "love" his show.
What is the most concerning source of bias in this scenario?

Which direction of bias is more likely in this scenario?

## Scenario 2

David hosts a podcast and he is curious how much his listeners like his show. He decides to poll the next $100$ listeners who send him fan emails.
They don't all respond, but $94$ of the $97$ listeners who responded said they "loved" his show.
What is the most concerning source of bias in this scenario?

Which direction of bias is more likely in this scenario?

## Scenario 3

A senator wanted to know how people in her state felt about internet privacy issues. She conducted a poll by calling $100$ people whose names were randomly sampled from the phone book (note that mobile phones and unlisted numbers aren't in phone books). The senator's office called those numbers until they got a response from all $100$ people chosen.
The poll showed that $42\%$ of respondents were "very concerned" about internet privacy.
What is the most concerning source of bias in this scenario?

Which direction of bias is more likely in this scenario?

## Scenario 4

A senator wanted to know how people in her state felt about internet privacy issues. She conducted a poll by calling people using random digit dialing, where computers randomly generate phone numbers so unlisted and mobile numbers can still be reached. Her office called over $1{,}000$ random phone numbers (most people didn't answer) until they had reached $100$ respondents.
The poll showed that $46\%$ of respondents were "very concerned" about internet privacy.
What is the most concerning source of bias in this scenario?

Which direction of bias is more likely in this scenario?

## Scenario 5

A high school wanted to know what percent of its students smoke cigarettes. During the week when students visited the counselors to schedule classes, the counselors asked every student in person whether they smoked cigarettes.
The data showed that $5\%$ of students smoked cigarettes.
What is the most concerning source of bias in this scenario?

Which direction of bias is more likely in this scenario?

## Scenario 6

A high school wanted to know what percent of its students smoke cigarettes. Counselors selected a random sample of students to take a survey on drug use. One of the questions reads, "If you are under the age of $18$ years, do you illegally smoke cigarettes?"
The data showed that $5\%$ of students smoked cigarettes.
What is the most concerning source of bias in this scenario?

Which direction of bias is more likely in this scenario?

• How is voluntary response bias different from nonresponse bias?
• Voluntary response bias occurs when people decide for themselves whether to be in the sample. In the podcast example, letting listeners respond voluntarily means that those who enjoy the show are more willing to spend time answering than those who don't. Nonresponse bias occurs when a large proportion of the people selected for a sample don't respond, which shrinks the sample and leaves out the views of those who didn't answer. If 1,000 people are sampled and only 100 respond, the 90% nonresponse rate makes nonresponse bias a serious concern.
• I have a question.
A reporter from the newspaper wanted to know how much time students spend on homework in a typical week, so he passed out questionnaires to students in a grade 9 English class, an art class, and a grade 12 math class. After some time, he collected them. Is this biased or not?
• This would be biased, as it is not a random sample of all students. If he went to all kinds of schools and handed out surveys at random, it would be an unbiased survey.
• I disagree with Scenario 4's direction of bias. When the senator is polling people who are still using listed landlines, they are likely avoiding using mobile devices intentionally. Those of us who use mobile devices are generally less concerned with internet safety than those who avoid devices with internet access like mobile phones. Can someone explain what I'm missing here? I just think the bias is actually showing an overestimation of the population's view of internet safety.
• From the author: Hi! Scenario 4 says, "She conducted a poll by calling people using random digit dialing, where computers randomly generate phone numbers so unlisted and mobile numbers can still be reached."

That means the sample included folks who use landlines and mobile devices. The big issue here is that they tried calling over 1000 phone numbers and most people didn't answer. It's likely that the folks who didn't answer a call from a strange phone number are more concerned about privacy than the 100 people who did answer the call.

So if this survey finds that 46% of the 100 respondents are concerned about internet privacy, I'd bet that's an underestimate since the group who didn't answer might care about privacy more in general.
• Why would anybody answer a question where they admitted to doing something illegal?
• That's an example of "Biased wording", Miriam! If you read carefully, it actually explains this in the lesson!
• Perhaps scenarios 1-3 could be mixed up a little? They are identical to the worked example from a previous video on sample bias so I find I'm just parroting back the answers rather than actually having my knowledge tested.
• What is the difference between response bias and nonresponse bias?
• Nonresponse bias is a type of response bias. In general, response bias occurs when the results of a survey are biased due to missing or incorrect responses. In the case of nonresponse bias, a particular group is left out of the survey, so their answers aren't represented in the results. In addition to nonresponse bias, another type of response bias is when a respondent gives an untrue response.
Hope this helps!
• What is convenience sampling?
• Convenience sampling is given away by its name: the sample you choose is the one most convenient for you to collect. For example, I could conduct a study about overall satisfaction with online learning programs and use a sample of only people learning statistics on Khan Academy. That sample is convenient for me because I am on Khan Academy learning statistics. However, it does not reflect overall satisfaction with online learning programs in general; it only shows the satisfaction of people learning stats on Khan.
• Is there a book that I can read for deep study of statistics?
• Is it voluntary response whenever those being asked have the option not to respond, or only when the survey has no assigned sample and anyone can choose to answer?

A question on the Practice: Bias in Samples and Surveys exercises read like this, "A mobile phone service provider wants to survey its customers to study privacy concerns and the sharing of their personal information. They call 5,000 randomly selected phone numbers from a database containing the phone number of every customer. If someone selected doesn't answer, they'll attempt calling back up to 2 more times before giving up on reaching that person.

They reach 350 customers with this strategy, and 60% of those reached say they are at least "somewhat concerned" about their personal information being shared without their knowledge or consent.

Which of these is the most concerning potential source of bias in the provider's survey?"

The answer is nonresponse bias because of how many did not respond, but one of the options was bias from voluntary response. The reason it gives for this not being correct is, "Voluntary response is when a researcher gives an open invitation and people decide to be in the sample or not. The service provider selected a random sample of 5,000 customers, so they didn't use a voluntary response strategy."

Again, I know it isn't the correct answer, but I thought voluntary response was a correct way of describing the situation. If not, then "voluntary response" seems like a not-so-accurate label.
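
The response rates mentioned in the comments above (100 of 1,000 sampled, and 350 of 5,000 called) can be checked with a few lines of arithmetic. This is only a sketch: the function names `response_rate` and `nonresponse_rate` are illustrative choices, and a low response rate signals a risk of nonresponse bias without measuring how large the bias is.

```python
def response_rate(responded, contacted):
    """Share of the selected people who actually responded."""
    if contacted <= 0:
        raise ValueError("contacted must be positive")
    return responded / contacted


def nonresponse_rate(responded, contacted):
    """Share of the selected people who did not respond."""
    if contacted <= 0:
        raise ValueError("contacted must be positive")
    return (contacted - responded) / contacted


# Figures quoted in the comment thread: (responded, contacted).
examples = {
    "1,000 sampled, 100 respond": (100, 1000),
    "5,000 called, 350 reached": (350, 5000),
}

for label, (responded, contacted) in examples.items():
    print(
        f"{label}: response {response_rate(responded, contacted):.0%}, "
        f"nonresponse {nonresponse_rate(responded, contacted):.0%}"
    )
```

In both cases the people were selected at random first and then failed to respond, which is why the concern is nonresponse rather than voluntary response.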