Techniques for random sampling and avoiding bias

DAT‑2 (EU)
DAT‑2.C (LO)
DAT‑2.C.3 (EK)
DAT‑2.C.4 (EK)

Video transcript

Let's say that we run a school, and in that school there is a population of students. That is our population, and we want to get a sense of how these students feel about the quality of math instruction at the school. So we construct a survey, and we just need to decide who we are going to get to actually answer it. One option is to go to every member of the population, but let's say it's a really large school. Let's say we're a college and there are 10,000 people in the college. We say, well, we can't just talk to everyone. So instead we say, let's sample this population to get an indication of how the entire school feels.

Now, in order to avoid having bias in our responses, in order for the sample to have the best chance of being indicative of the entire population, we want our sample to be random. So our sample could either be random or not random. It might seem at first pretty straightforward to do a random sample, but when you actually get down to it, it's not always as straightforward as you would think.

One type of random sample is just a simple random sample. This is saying: all right, let me assign a number to every person in the school (maybe they already have a student ID number), and I'm just going to get a computer, a random number generator, to generate the 100 students. So let's say there's a sample of 100 students that I'm going to apply the survey to. That would be a simple random sample. We are just going into this whole population and randomly picking people out, and we know it's random because a random number generator, or a string of numbers or something like that, is allowing us to pick these students. Now, that's pretty good. It's unlikely that you're going to have bias from this sample, but
there is some probability that, just by chance, your random number generator happened to select a disproportionate number of boys over girls, or a disproportionate number of freshmen, or a disproportionate number of engineering majors versus English majors. That's a possibility. So even though you're taking a simple random sample and it is truly random, once again there's some probability that it's not indicative of the entire population. To mitigate that, there are other techniques at our disposal.

One technique is a stratified sample. This is the idea of taking our entire population and essentially stratifying it. Let's take that same population (I'll draw it as a square here just for convenience) and stratify it. Let's say we're concerned that we get proper representation of freshmen, sophomores, juniors, and seniors. So we'll stratify by freshmen, sophomores, juniors, and seniors, and then we sample 25 from each of these groups. These are the strata: freshmen, sophomores, juniors, and seniors. Instead of sampling 100 out of the entire pool, we sample 25 from each of these. That makes sure that you are getting indicative responses from at least all of the different age groups or levels within your university.

Now there might be another issue, where you say: well, I'm actually more concerned that we have accurate representation of males and females in the school. If I pick 100 random people, it's very likely that the split is close to 50/50, but there's some chance, just due to randomness, that it's disproportionately male or disproportionately female. That's even possible in the stratified case. So what you might say is: there's a technique called a clustered sample, and
what we do is sample groups, where we feel confident that each of those groups has a good balance of males and females. For example, instead of sampling individuals from the entire population, we might say, look, on Tuesdays and Thursdays... well, even there, as you can tell, this is not a trivial thing to do. Let's just say that we can split our population into groups. Maybe these are classrooms, and each of these classrooms has an even distribution of males and females, or pretty close to even. What we do is sample the actual classrooms. That's why it's called a cluster technique or a clustered random sample: we're going to randomly sample our classrooms, each of which has a close or maybe exact balance of males and females, so we know that we're going to get good representation. We are still sampling, we're sampling from the clusters, but then we survey every single person in each of the chosen clusters, every single person in each of those classrooms.

So once again, these are all forms of random samples. You have a simple random sample, you can stratify, or you can cluster: randomly pick the clusters and then survey everyone in each chosen cluster.

Now, if these are all random samples, what are the nonrandom ones like? Well, one case of nonrandom sampling is a voluntary survey, or voluntary sample. This might just be you telling every student at the school, hey, here's a web address; if you're interested, come and fill out this survey. That's likely to introduce bias, because maybe the students who really like the math instruction at their school are more likely to fill it out, maybe the students who really don't like it are more likely to fill it out, or maybe it's just the kids who have more time who are more likely to fill it out. So this has a good chance of introducing bias: the students who fill out the
survey might just be more skewed one way or the other, because they volunteered for it.

Another nonrandom sample would be called a convenience sample, where you're introducing bias because of convenience; that is the term that's often used. This might say: well, let's just sample the first hundred students who show up at school. That's convenient for me, because I didn't have to do these random numbers, or the stratification, or any of this clustering. But you can understand how this also would introduce bias, because the first hundred students who show up at school, maybe those are the most diligent students. Maybe they all take the early math class that has a very good instructor and they're all happy about it. Or it might go the other way: the instructor there isn't the best one, and so it might introduce bias the other way. So if you let people volunteer, or you just say, oh, let me go to the first n students, or you say, hey, let me just talk to all the students who happen to be in front of me right now, they might be in front of you out of convenience, but they might not be a true random sample.

Now, there are other reasons why you might introduce bias, and it might not be because of the sampling. You might introduce bias because of the wording of your survey. You could imagine a survey that says, "Do you consider yourself lucky to get a math education that very few other people in the world have access to?" That might bias you to say, well, yeah, I guess I feel lucky. But if the wording was, "Do you like the fact that disproportionately more students at your school tend to fail algebra than at our surrounding schools?", that might bias you negatively. So the wording really, really matters in surveys, and there's a lot that would go into this.

The other one is called response bias, and once again, this isn't about the sampling. This is just people not wanting to tell the truth or
maybe not wanting to respond at all. Maybe they're afraid that somehow their response is going to show up in front of their math teacher or the administrators, or that if they're too negative it might be taken out on them in some way. Because of that, they might not be truthful, and so they might be overly positive or not fill it out at all.

So anyway, this is a very high-level overview of how you could think about sampling. You want to go random because it lowers the probability of introducing some bias. These are some techniques, and you should also think about whether you're falling into some of these pitfalls that have a good chance of introducing bias.
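
To make the three random techniques concrete, here is a minimal sketch in Python using only the standard library. It is an illustration under stated assumptions, not something from the video: the roster of 10,000 students, the field names (`id`, `year`, `classroom`), the classroom size of 25, and the helper function names are all made up for the example.

```python
import random
from collections import defaultdict

YEARS = ["freshman", "sophomore", "junior", "senior"]

# Hypothetical roster: 10,000 students, 2,500 per year, 400 classrooms of 25.
students = [
    {"id": i, "year": YEARS[i % 4], "classroom": i // 25}
    for i in range(10_000)
]


def group_by(population, key):
    """Split the population into groups keyed by key(person)."""
    groups = defaultdict(list)
    for person in population:
        groups[key(person)].append(person)
    return groups


def simple_random_sample(population, n, rng):
    """Pick n individuals, every individual equally likely, no repeats."""
    return rng.sample(population, n)


def stratified_sample(population, stratum_of, per_stratum, rng):
    """Pick per_stratum individuals at random from each stratum."""
    strata = group_by(population, stratum_of)
    sample = []
    for name in sorted(strata):
        sample.extend(rng.sample(strata[name], per_stratum))
    return sample


def cluster_sample(population, cluster_of, num_clusters, rng):
    """Pick whole clusters at random, then keep everyone in them."""
    clusters = group_by(population, cluster_of)
    chosen = rng.sample(sorted(clusters), num_clusters)
    return [person for c in chosen for person in clusters[c]]


rng = random.Random(0)  # fixed seed only so the sketch is repeatable

srs = simple_random_sample(students, 100, rng)
strat = stratified_sample(students, lambda s: s["year"], 25, rng)
clus = cluster_sample(students, lambda s: s["classroom"], 4, rng)

print(len(srs), len(strat), len(clus))
```

Each call returns 100 students, but the guarantees differ. The simple random sample can come out lopsided by year purely by chance, the stratified sample always has exactly 25 students from each year, and the cluster sample always consists of four complete classrooms.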