## High school statistics

### Course: High school statistics > Unit 5

Lesson 3: Sampling methods

# Systematic random sampling

AP.STATS: DAT‑2 (EU), DAT‑2.C (LO), DAT‑2.C.5 (EK)

In a systematic random sample, we arrange members of a population in some order, pick a random starting point, and select every member in a set interval. Created by Sal Khan.

## Want to join the conversation?

- 2:15 Why is it important to randomly pick the first person? Can't we just simply pick every 100th person, for example? Why would that be biased? (3 votes)
- While that isn't super important, we are just doing our best to get rid of all types of bias that could occur. In this case, we might be afraid that some type of bias could arise if the intervals always started from the first "person" or "item".

Hope this helps. :-) (3 votes)

- At 3:14, if k = 37, shouldn't we be surveying every 37th person instead of every 100th? (2 votes)
- The 37 is only the randomly chosen starting point, not the interval. If it was every 37th person, we would expect to survey roughly 270 people. Sal says we are surveying only a hundred, and as we know, 10,000/100 = 100. We would therefore have just the right amount if we survey every 100th person.

Hope this helps! (2 votes)

- Would systematic random sampling have bias like convenience sampling, since you are just at the front gate? Could you have two survey checkpoints where other people get surveyed? (2 votes)
- Good point! I suppose this example only refers to one entrance gate. Hope this helped! (1 vote)

- Do you just randomly choose any? (2 votes)
- Any what? Please specify so that I can help. Thanks! (1 vote)
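The questions above about the random start and the interval can be checked with a short sketch. It assumes a line of exactly 10,000 people and a target of 100 respondents, the numbers used in the video; the helper name `systematic_positions` is made up for illustration and is not from the lesson.

```python
import random

POPULATION = 10_000                   # expected attendance in the video
SAMPLE_SIZE = 100                     # people we want to survey
INTERVAL = POPULATION // SAMPLE_SIZE  # 10,000 / 100 = 100

def systematic_positions(start, interval=INTERVAL, population=POPULATION):
    """1-based positions in line surveyed when the first person is `start`."""
    return list(range(start, population + 1, interval))

# The 37 from the video is only the random starting point, not the interval.
from_37 = systematic_positions(37)
print(from_37[:3], len(from_37))      # [37, 137, 237] 100

# If 37 were the interval instead, the sample would be far larger than planned.
every_37th = systematic_positions(37, interval=37)
print(len(every_37th))                # 270

# Why randomize the start: across the 100 possible starts, every position
# in line appears in exactly one sample, so each person has a 1-in-100 chance.
# With a fixed start, most people could never be chosen.
times_chosen = [0] * (POPULATION + 1)
for start in range(1, INTERVAL + 1):
    for pos in systematic_positions(start):
        times_chosen[pos] += 1
print(all(n == 1 for n in times_chosen[1:]))  # True

# In practice the start would come from a random number generator.
start = random.randint(1, INTERVAL)   # inclusive on both ends
sample = systematic_positions(start)
```

Whatever start the generator returns, the sample has 100 people spread across the whole line.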

## Video transcript

- [Instructor] In this video, we're going to talk about random sampling, which we've already talked about in other videos. And we're going to compare what we already know about simple random sampling to a new type of random sampling that we're going to introduce in this video. And that is systematic random sampling.

So let's look at an example. Let's say that there is a concert that is happening and we expect approximately 10,000 people to attend the concert. And we want to randomly sample people at the concert. Maybe we wanna do a study on how people get to the concert. Do they drive and park? Do they ride with a friend? Do they take an Uber or a cab of some kind? And so we wanna find a random sample, ideally without bias, to survey people.

So there's a couple of
ways you could do it. You could try to do a simple random sample. And that might be a case where you could somehow get the names of all 10,000 people and put them into a big bowl like this. And then let's say you want to sample approximately 100 people. You could just mix up all the names that may be on these little pieces of paper, 10,000 of them, and then pull out a random sample of 100 of them. That would be a simple random sample.

But you could already imagine there might be some logistical difficulties in doing this. How are you going to get the 10,000 names? You're gonna write 'em on pieces of paper. You'd have to really mix 'em well so it's truly random who you're picking out. So are there other ways of doing a random sample? And as you can imagine, yes, there are. And that's where systematic
random sampling is useful. One way to think about systematic random sampling is you're going to randomly sample a subset of the people who are maybe walking into the concert. So let's say people get to the concert and they start forming a line to get into the concert. What you wanna do in systematic random sampling is randomly pick your first person. There's a bunch of ways that you could do that. Let's say you have a random number generator that'll generate a number from one to 100. And that's going to be the first person you survey. If that random number generator generates a 37, then you're going to start with the 37th person in line. So you pick that first person randomly. You survey them.

And remember, our goal is to sample about 100 people out of 10,000. So we wanna roughly sample one out of every 100 people. And so what you do there is once you have that first person that you're sampling, you then sample every 100th person after that. That's sometimes called the sample interval. And the reason why it's 100 people is because if you sample every 100th person after that, you're gonna roughly get 100 people in your sample out of a total of 10,000. So after 100, you're going to sample someone else. And then after another 100, you're going to sample someone else.

Now, the reason why this
is useful is you could say, okay, that first person was random. And then for every person after that, it doesn't seem like there'd be any bias for why they would be the 100th person after that first person. You don't wanna just do the first 100 people because those might be the early birds, the people who maybe disproportionately parked or planned earlier or had some bias in some way. So you do wanna make sure that you're getting, you know, the beginning, the middle, and the end of the line, which this method helps with.

Now, we have to be careful. Even systematic random sampling is not foolproof. There are situations where even this system inadvertently has bias. Let's say that this is the arena. This is a top view of the arena right over here. And this is the line of people coming in. And this is where you are standing, and you are counting every 100th person. But let's say there's a tree right over here. And maybe there's a road. I'm making this quite elaborate. So maybe there is a road right over here. And a lot of people, maybe all of the people who are walking or taking a cab, are coming from this direction. And maybe all of the people from the parking lot are coming from this direction. And maybe you have a police officer right over here who is doing crowd control, who lets 50 of these people in, followed by 50 of these people in. Well, in that situation, if you sample every 100th person, you might end up just sampling one side or the other. So you have to make sure that there isn't some bias that's being introduced into this line somehow that might distort your sample.
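The crowd-control scenario can be simulated in a few lines. This is only a sketch under simplifying assumptions: the officer alternates perfectly between blocks of 50 walk-or-cab arrivals and 50 parking-lot arrivals, exactly 10,000 people enter, and the function name `arrival_mode` and the group labels are invented for the illustration.

```python
import random
from collections import Counter

POPULATION = 10_000
INTERVAL = 100
BLOCK = 50  # assumed: the officer admits 50 from one side, then 50 from the other

def arrival_mode(position, block=BLOCK):
    """Group of the person at a 1-based position when groups alternate in blocks."""
    return "walk_or_cab" if ((position - 1) // block) % 2 == 0 else "parking_lot"

rng = random.Random()

# Systematic random sample: random start from 1 to 100, then every 100th person.
start = rng.randint(1, INTERVAL)
systematic = range(start, POPULATION + 1, INTERVAL)
systematic_counts = Counter(arrival_mode(p) for p in systematic)
print(systematic_counts)  # all 100 respondents fall in a single group

# Simple random sample of the same size, for comparison.
srs = rng.sample(range(1, POPULATION + 1), 100)
srs_counts = Counter(arrival_mode(p) for p in srs)
print(srs_counts)         # typically a mix of both groups
```

Because the interval of 100 is a multiple of the full 50 + 50 admission cycle, every sampled position lands in the same part of the cycle as the starting person, so the random start decides which single group gets surveyed.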