# Z-statistics vs. T-statistics

## Video transcript

I want to use this video to make sure we understand, both intuitively and formally, the difference between a Z-statistic (something I have trouble saying) and a T-statistic. In a lot of what we're doing in inferential statistics, we're trying to figure out the probability of getting a certain sample mean.

What we've been doing, especially when we have a large sample size, goes like this. Let me draw a sampling distribution of the sample mean. It has some assumed mean value and some standard deviation. Now say we get some sample mean out here, above that assumed mean. We want to figure out the probability of getting a result at least as extreme as this. You can either figure out the probability of getting a result below it and subtract that from 1, or directly figure out the area in the tail beyond it.

To do that, we've been figuring out how many standard deviations above the mean we actually are. We take our sample mean and subtract from it the mean of the sampling distribution, that is, what we assume the mean should be (we may not actually know it). Then we divide that difference by the standard deviation of the sampling distribution. The result is how many standard deviations we are above the mean.
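
Written out in symbols, using notation chosen here rather than taken from the video (x̄ for the sample mean, μ_x̄ and σ_x̄ for the mean and standard deviation of the sampling distribution of the sample mean), the standardization and the "subtract from 1" route to the tail probability look like this:

$$
z = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}}, \qquad
P(\bar{X} \ge \bar{x}) = 1 - P(\bar{X} < \bar{x})
$$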

That is the distance from the mean to our sample mean, measured in standard deviations. Now, we usually don't know the standard deviation of the sampling distribution either. But the central limit theorem told us that, assuming we have a sufficient sample size, the sampling distribution of the sample mean is approximately normal, and its standard deviation equals the standard deviation of the population divided by the square root of our sample size. So we can rewrite this as our sample mean minus the mean of the sampling distribution of the sample mean, divided by the population standard deviation over the square root of our sample size. This is essentially our best sense of how many standard deviations away from the actual mean we are.
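
In the same notation, with σ for the population standard deviation and n for the sample size, the substitution reads:

$$
\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}
\quad\Longrightarrow\quad
z = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma / \sqrt{n}}
$$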

As we've learned before, this quantity is a Z-score, or, when it's derived from the sample mean statistic, we call it a Z-statistic. Then we can look it up in a Z-table, or a normal distribution table, to find the probability of getting a value of this Z or greater. That tells us the probability of getting a result that extreme.
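
Here is a minimal Python sketch of that Z-statistic and its upper-tail probability, assuming the population standard deviation is known. The function names and the numbers are hypothetical, chosen only for illustration; the standard library's `statistics.NormalDist` stands in for the Z-table lookup.

```python
import math
from statistics import NormalDist


def z_statistic(x_bar: float, mu: float, sigma: float, n: int) -> float:
    """How many standard errors the sample mean lies above the assumed mean."""
    standard_error = sigma / math.sqrt(n)  # sd of the sampling distribution
    return (x_bar - mu) / standard_error


def upper_tail_p(z: float) -> float:
    """P(Z >= z) for a standard normal Z: what a Z-table lookup gives you."""
    return 1 - NormalDist().cdf(z)


# Hypothetical numbers, chosen only to illustrate the arithmetic.
z = z_statistic(x_bar=103.0, mu=100.0, sigma=15.0, n=100)
print(z, round(upper_tail_p(z), 4))  # 2.0 0.0228
```

With these inputs the standard error is 15 / √100 = 1.5, so the sample mean sits 2 standard errors above the assumed mean.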

Normally, when we've done this in the last few videos, we also haven't known the standard deviation of the population. So to approximate it, we say the Z-statistic is approximately the same numerator divided by our sample standard deviation over the square root of our sample size. In other words, we estimate the population standard deviation using our sample standard deviation. This is OK if our sample size is greater than 30. Another way to think about it: if our sample size is greater than 30, the statistic built with this approximation will still be approximately normally distributed.
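
A sketch of the same calculation when σ is unknown and the sample standard deviation stands in for it. It assumes, per the rule of thumb above, that the sample has more than 30 observations; the data and the function name are hypothetical.

```python
import math
import statistics
from statistics import NormalDist


def approx_z_statistic(sample: list[float], mu: float) -> float:
    """Z-statistic with the sample standard deviation s standing in for sigma.

    Only trusted for large samples (the rule of thumb here: n > 30).
    """
    n = len(sample)
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation, n - 1 denominator
    return (x_bar - mu) / (s / math.sqrt(n))


# Hypothetical sample of 36 observations, for illustration only.
sample = [98.0, 102.0] * 18
z = approx_z_statistic(sample, mu=99.0)
p = 1 - NormalDist().cdf(z)  # upper-tail probability from the normal curve
print(z, p)
```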

Now, if your sample size is less than 30, especially if it's a good bit less than 30, this expression will no longer be normally distributed. Let me rewrite it: the sample mean minus the mean of the sampling distribution of the sample mean, divided by the sample standard deviation over the square root of the sample size. We just said that if the sample size is well over 30, or at least 30, this statistic is going to be approximately normally distributed. If the sample size is small, this statistic is instead going to have a T-distribution.
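
In symbols, this is the same expression as the Z-statistic with s (the sample standard deviation) in place of σ. Two details the video doesn't spell out here, added as standard background: the T-distribution involved has n − 1 degrees of freedom, and the result holds exactly when the underlying population is normal.

$$
t = \frac{\bar{x} - \mu_{\bar{x}}}{s / \sqrt{n}}, \qquad
t \sim t_{n-1} \ \text{(population normal)}
$$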

Then you do the exact same thing you did for the Z-statistic, but now you assume the bell curve is a T-distribution instead of a normal distribution. In the Z case, every Z-statistic is normally distributed. Here we have a T-distribution, and it's a standardized one, because we subtracted out the mean, so it has a mean of 0. You want to figure out the probability of getting a T-value at least this extreme: take the T-value you got and find the area under the curve beyond it.
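
That area can be written out explicitly. As background not taken from the video: with ν = n − 1 degrees of freedom, the T-distribution's density is the following, symmetric about 0, and a T-table tabulates values of the tail integral. As ν grows, this density approaches the standard normal density, which is why large samples can fall back on the Z-table.

$$
f_\nu(x) = \frac{\Gamma\!\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\!\left(\frac{\nu}{2}\right)}
\left(1 + \frac{x^2}{\nu}\right)^{-\frac{\nu+1}{2}},
\qquad
P(T \ge t) = \int_t^{\infty} f_\nu(x)\,dx
$$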

So a very easy rule of thumb: calculate this quantity either way. If your sample size is more than 30, your sample standard deviation is going to be a good approximation of your population standard deviation, so the whole statistic will be approximately normally distributed, and you can use a Z-table to figure out the probability of getting a result at least that extreme. If your sample size is small, this statistic is going to have a T-distribution, and you'll have to use a T-table to figure out the probability of getting a T-value at least this extreme.
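
The rule of thumb can be sketched in Python as follows. This is an illustrative sketch, not a library routine: the function names are hypothetical, the threshold of 30 is the rule of thumb from this video, the degrees of freedom are assumed to be n − 1, and the T-distribution tail is computed by numerically integrating its density with Simpson's rule, standing in for a T-table (or a library call such as SciPy's `scipy.stats.t.sf`).

```python
import math
from statistics import NormalDist


def t_pdf(x: float, df: int) -> float:
    """Density of Student's T-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_upper_tail(t: float, df: int, steps: int = 2000) -> float:
    """P(T >= t): Simpson's rule on the density over [0, |t|], then symmetry."""
    a = abs(t)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    tail = 0.5 - total * h / 3  # area beyond |t|
    return tail if t >= 0 else 1 - tail


def upper_tail_p(x_bar: float, mu: float, s: float, n: int):
    """Rule of thumb: normal (Z) if n > 30, otherwise T with n - 1 df."""
    stat = (x_bar - mu) / (s / math.sqrt(n))
    if n > 30:
        return "z", stat, 1 - NormalDist().cdf(stat)
    return "t", stat, t_upper_tail(stat, n - 1)


# Hypothetical summary statistics, for illustration only.
print(upper_tail_p(x_bar=103.0, mu=100.0, s=6.0, n=16))
```

For the same statistic of 2.0, the T tail with n = 16 comes out larger than the normal tail, reflecting the T-distribution's heavier tails at small sample sizes.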

We're going to see this in an example a couple of videos from now. Anyway, hopefully that helped clarify when to use a Z-statistic and when to use a T-statistic.