# Example: Analyzing the difference in distributions

Finding the probability that a randomly selected woman is taller than a randomly selected man by understanding the distribution of the difference of normally distributed variables.

## Want to join the conversation?

• Why do we use man minus woman, and not woman minus man?
• That is a great question.
In the video we have D = M - W and the Z-score of -0.8 gives P(D < 0) = 0.2119 under the N(8, 100) distribution.

If we define D = W - M, our distribution is now N(-8, 100) and we would want P(D > 0) to answer the question. Our Z-score would then be 0.8 and P(D > 0) = 1 - 0.7881 = 0.2119, which is the same as our original result.

The only difference between the approaches is which side of the curve you take the area on: the left tail below Z = -0.8, or the right tail above Z = 0.8.
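A quick way to check that both definitions agree is to compute the two tail areas directly. This is only a sketch using Python's standard-library `statistics.NormalDist`; the variable names are made up for illustration, and N(8, 100) is read as mean 8 and variance 100, so the standard deviation passed in is 10.

```python
from statistics import NormalDist

# D = M - W ~ N(8, 100): variance 100 means standard deviation 10.
d_m_minus_w = NormalDist(mu=8, sigma=10)
p_left = d_m_minus_w.cdf(0)        # P(M - W < 0), area left of 0

# D = W - M ~ N(-8, 100): same spread, mirrored mean.
d_w_minus_m = NormalDist(mu=-8, sigma=10)
p_right = 1 - d_w_minus_m.cdf(0)   # P(W - M > 0), area right of 0

print(round(p_left, 4), round(p_right, 4))  # 0.2119 0.2119
```

Both tails have the same area because the two distributions are mirror images of each other around 0.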
• I don't understand why we set the probability up as P(D<0)?
• D is the difference between the man's and the woman's height: M-W.
If D is less than 0, i.e. negative, then that would mean the woman was taller than the man. For example: man 170 cm, woman 175 cm, so D = M-W = 170-175 = -5.
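To see that "D is negative" and "the woman is taller" are the same event, one option is a small simulation. The sketch below assumes men's heights are N(178, 8²) and women's are N(170, 6²) and that the two picks are independent; those numbers are inferred from other comments in this thread (means 178 and 170, a combined variance of 100), not stated here, so treat them as assumptions.

```python
import random

# Assumed parameters (inferred from the thread, not given in this reply).
MEN_MEAN, MEN_SD = 178, 8
WOMEN_MEAN, WOMEN_SD = 170, 6

rng = random.Random(0)   # fixed seed so the run is repeatable
trials = 200_000
negative_d = 0
for _ in range(trials):
    m = rng.gauss(MEN_MEAN, MEN_SD)      # one random man
    w = rng.gauss(WOMEN_MEAN, WOMEN_SD)  # one random woman
    if m - w < 0:                        # D < 0 exactly when w > m
        negative_d += 1

share = negative_d / trials
print(share)  # should land close to 0.21
```

The simulated share of negative differences should sit near the 0.2119 from the video, up to sampling noise.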
• Sorry, what if we don't have the TI-84 calculator tool? How do you do it manually, without a calculator? Thanks.
• You can do it manually by consulting a z-score table (e.g., http://users.stat.ufl.edu/~athienit/Tables/Ztable.pdf). You find the row and column that match your z-score, and the entry where they cross gives you what his calculator just did.
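If neither a TI-84 nor a printed table is at hand, the table entry can also be reproduced from the error function, since the standard normal CDF is Φ(z) = ½(1 + erf(z/√2)). A minimal sketch in Python follows; the function name `standard_normal_cdf` is made up for this example.

```python
import math

def standard_normal_cdf(z: float) -> float:
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# z-score of D = 0 under N(8, 100): (value - mean) / standard deviation
z = (0 - 8) / 10
p = standard_normal_cdf(z)

print(z, round(p, 4))  # -0.8 0.2119
```

This is the same number you would read off the z-table in the row for -0.8 and the column for .00.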
• It would make more sense to define D = W-M, because we are looking for the probability of a woman being taller than a man!
• When combining normally distributed variables, does the independence condition need to be there? I ask because variance is the spread of the data.
• 1. For the mean, no need. You can just add or subtract the means.

2. But for the variance and standard deviation, we have to consider whether the variables are independent, because there is a concept of "co"variance when they depend on one another, which means some part of their values varies together. You can picture it as the amplified wave you get when two similar waves meet.

3. For example:
X = how many salty snacks I ate
Y = how much water I drank with them
We can guess that the more X varies, the more Y varies, and vice versa,
so var(X+Y) = var(X) + var(Y) + 2·covar(X,Y)
#covar(X,Y) measures how strongly X and Y move together; it is 0 when they are independent

One more thing: what about var(X-Y)? Do you think you would still add the covar(X,Y) term to var(X)+var(Y)? Or subtract it? Or need a whole new formula? Please sleep on it.
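The sum formula, var(X+Y) = var(X) + var(Y) + 2·covar(X,Y), can be checked numerically on a handful of numbers. The data below are invented purely for illustration (they are not from the video), and the helper names `mean` and `cov` are hypothetical; the sketch uses population variance and covariance throughout.

```python
def mean(values):
    return sum(values) / len(values)

def cov(a, b):
    """Population covariance; cov(a, a) is the population variance of a."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)

# Made-up data: snacks eaten (x) and glasses of water drunk (y) on five days.
x = [1, 2, 3, 4, 5]
y = [2, 2, 4, 5, 7]
total = [xi + yi for xi, yi in zip(x, y)]

lhs = cov(total, total)                       # var(X + Y) computed directly
rhs = cov(x, x) + cov(y, y) + 2 * cov(x, y)   # var(X) + var(Y) + 2*cov(X, Y)

print(lhs, rhs)  # the two values agree (about 10.8)
```

Because x and y rise together here, the covariance is positive and var(X+Y) is larger than var(X) + var(Y) alone. The same helpers can be reused to experiment with the var(X-Y) question.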
• I went ahead and solved this before watching the rest of the video.
My logic was slightly different and I ended up getting a Z-score of 0.8 instead of -0.8, which resulted in a slightly different value for the probability (I got 0.2119 instead of 0.212).

For my Z-score I did 178-170 (X - mean), because my mean is the women's mean, 170, and for a woman to be taller than the men she must at least have a height surpassing the men's mean, 178. That gave me a positive 8, which I then divided by 10 (the square root of the sum of the two variances). Hence, I got a positive 0.8 Z-score.

I got the result mentioned above because when I drew the normal distributions on paper, I drew the men's and then the women's on top of it, since -1 SD for the men (170) is the women's mean, instead of drawing three different normal distribution graphs like Sal did.

Drawing it this way led me to think that I needed to find the area under the women's curve to the right of 178, out towards positive infinity. Hence, I did (178-170)/10 to get a positive 0.8 Z-score. :(

But when Sal showed that D = M-W, and that finding the women taller than men means finding P(D<0), that made sense to me as well. His way makes sense to me when he draws three different graphs, and my way makes sense to me when I superimpose the two distribution graphs. So I guess my mistake here was superimposing the graphs, and when solving this kind of problem in the future I should never superimpose, to avoid being misled?
• Why do we want to find the difference (M - W) instead of the sum of the independent variables (M + W)?
• Can someone help me understand why we subtract the random variables in order to find the answer to the question?
• Is there more than this? My AP Stats class already got past this stuff in the 1st quarter.
• Wouldn't -1E9999 work better, or even -1E9999999999999999999999999?
You know what I mean, right?