# Combining random variables

## Effect on mean, standard deviation, and variance

We can form new distributions by combining random variables. If we know the mean and standard deviation of the original distributions, we can use that information to find the mean and standard deviation of the resulting distribution.
We can combine means directly, but we can't do this with standard deviations. We can combine variances as long as it's reasonable to assume that the variables are independent.

| | Mean | Variance |
|---|---|---|
| Adding: $T = X + Y$ | $\mu_T = \mu_X + \mu_Y$ | $\sigma_T^2 = \sigma_X^2 + \sigma_Y^2$ |
| Subtracting: $D = X - Y$ | $\mu_D = \mu_X - \mu_Y$ | $\sigma_D^2 = \sigma_X^2 + \sigma_Y^2$ |

Here are a few important facts about combining variances:
• Make sure that the variables are independent or that it's reasonable to assume independence, before combining variances.
• Even when we subtract two random variables, we still add their variances; subtracting two variables increases the overall variability in the outcomes.
• We can find the standard deviation of the combined distributions by taking the square root of the combined variances.
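
The rules above can be checked exactly on a small case. The sketch below is not part of the original lesson: it assumes two independent fair six-sided dice (a hypothetical example) and enumerates all 36 equally likely pairs, so no simulation or rounding is involved. The helper name `mean_and_variance` is our own.

```python
from fractions import Fraction
from itertools import product


def mean_and_variance(outcomes):
    """Exact mean and (population) variance of equally likely outcomes."""
    n = len(outcomes)
    mean = Fraction(sum(outcomes), n)
    variance = sum((Fraction(x) - mean) ** 2 for x in outcomes) / n
    return mean, variance


faces = range(1, 7)
# Independence assumption: every (x, y) pair of faces is equally likely.
pairs = list(product(faces, faces))

mean_x, var_x = mean_and_variance(list(faces))
mean_sum, var_sum = mean_and_variance([x + y for x, y in pairs])
mean_diff, var_diff = mean_and_variance([x - y for x, y in pairs])

print(mean_x, var_x)        # 7/2 35/12
print(mean_sum, var_sum)    # 7 35/6  (means add, variances add)
print(mean_diff, var_diff)  # 0 35/6  (means subtract, variances still add)
```

The difference $X - Y$ ranges from $-5$ to $5$, just as wide a spread as the sum's range of $2$ to $12$, which is one way to see why subtracting does not reduce variability.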

## Example 1: Establishing independence

To combine the variances of two random variables, we need to know, or be willing to assume, that the two variables are independent.
Question A (Example 1)
For which pairs of variables would it be reasonable to assume independence?

## Example 2: SAT scores

Approximately 1.7 million students took the SAT in 2015. Each student received a critical reading score and a mathematics score.
Here are summary statistics for each section of the test in 2015:

| Section | Mean | Standard deviation |
|---|---|---|
| Critical reading | $\mu_{CR} = 495$ | $\sigma_{CR} = 116$ |
| Mathematics | $\mu_M = 511$ | $\sigma_M = 120$ |
| Total | $\mu_T = \text{?}$ | $\sigma_T = \text{?}$ |

Suppose we choose a student at random from this population.
Question A (Example 2)
What is the mean of the sum of a student’s critical reading and mathematics scores?

Question B (Example 2)
What is the standard deviation of the sum of a student’s critical reading and mathematics scores?
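
A minimal sketch of the arithmetic, using the summary statistics above. The mean of the total needs no independence assumption. For the standard deviation, the sketch uses the general formula $\sigma_{X+Y}^2 = \sigma_X^2 + \sigma_Y^2 + 2\rho\,\sigma_X\sigma_Y$, which this lesson does not state; the function name `sd_of_sum` and the correlation value `0.5` are our own illustrative assumptions, not SAT data.

```python
import math

mu_cr, sigma_cr = 495, 116  # critical reading
mu_m, sigma_m = 511, 120    # mathematics

# Means add whether or not the two scores are independent.
mu_total = mu_cr + mu_m
print(mu_total)  # 1006


def sd_of_sum(sd_x, sd_y, rho):
    """SD of X + Y when X and Y have correlation rho (rho = 0 if independent)."""
    return math.sqrt(sd_x**2 + sd_y**2 + 2 * rho * sd_x * sd_y)


# rho = 0 reproduces the "add the variances" rule for independent variables.
print(round(sd_of_sum(sigma_cr, sigma_m, 0.0), 1))  # 166.9

# rho = 0.5 is a made-up value, used only to show that the answer changes
# when the scores are not independent.
print(round(sd_of_sum(sigma_cr, sigma_m, 0.5), 1))  # 204.4
```

The 166.9 figure is only valid if a student's two section scores can reasonably be treated as independent, so that assumption should be examined before the variances are added.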

## Example 3: Item inspections

Each item of a certain type at a factory is inspected by 4 employees. The amount of time it takes each employee to inspect the item has a mean of 30 seconds and a standard deviation of 6 seconds. Furthermore, the amount of time it takes a given employee to inspect an item is not affected by how long it takes another employee to inspect that item.
Let T be the total amount of time it takes 4 employees to inspect a randomly selected item.
Question A (Example 3)
What is the mean total amount of time it takes 4 employees to inspect a randomly selected item?

Question B (Example 3)
What is the standard deviation of the total amount of time it takes 4 employees to inspect a randomly selected item?
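
The calculation for this example can be sketched as follows, assuming (as the problem states) that the four inspection times are independent. Variable names are our own.

```python
import math

n_employees = 4
mean_each = 30  # seconds
sd_each = 6     # seconds

# Means add: 30 + 30 + 30 + 30.
mean_total = n_employees * mean_each

# Variances add (independent times): 36 + 36 + 36 + 36 = 144.
var_total = n_employees * sd_each**2
sd_total = math.sqrt(var_total)

print(mean_total)  # 120
print(sd_total)    # 12.0

# A common slip is to multiply the SD itself by 4. That gives 24, which is
# the SD of 4 * X (one employee's time counted four times), not the SD of
# the sum of four independent times.
print(n_employees * sd_each)  # 24
```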

## Example 4: Difference in heights

A sociologist took a large sample of military members and looked at the heights of the men and women in the sample. The summary statistics for the heights of the people in the study are shown below.
Suppose that we choose a random man and a random woman from the study and look at the difference between their heights. Let M represent the man's height, W represent the woman's height, and D represent the difference between their heights ($D = M - W$).

| | Mean | Standard deviation |
|---|---|---|
| Man | $\mu_M = 178\text{ cm}$ | $\sigma_M = 7\text{ cm}$ |
| Woman | $\mu_W = 164\text{ cm}$ | $\sigma_W = 6\text{ cm}$ |
| Difference | $\mu_D = \text{?}$ | $\sigma_D = \text{?}$ |

Question A (Example 4)
What is the mean of the difference between the two heights?

Question B (Example 4)
What is the standard deviation of the difference between the two heights?
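
A short sketch of the arithmetic, assuming the man and the woman are chosen independently of each other (which random selection makes reasonable). Variable names are our own.

```python
import math

mu_m, sigma_m = 178, 7  # men's heights, cm
mu_w, sigma_w = 164, 6  # women's heights, cm

# Means subtract.
mu_d = mu_m - mu_w

# Variances add, even for a difference: 49 + 36 = 85.
var_d = sigma_m**2 + sigma_w**2
sd_d = math.sqrt(var_d)

print(mu_d)            # 14
print(round(sd_d, 2))  # 9.22
```

Note that the result is larger than either original standard deviation (7 cm or 6 cm) but smaller than their sum (13 cm).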

## Want to join the conversation?

• I do not agree with the explanation of Example 2: "... In fact, we should suspect such scores to not be independent." Why would the reading and math scores be correlated with each other? Plenty of people are good at only one.
  • Well, I don't think anyone has the 'right' answer, but I believe people usually get higher scores on both sections, not just one (in most cases). But I still think they should've stated it more clearly.
• In the examples, we only added two means and variances. Can we add more than two means or variances?
• In Example 2, the two random variables are dependent. Thus the mean of the sum of a student’s critical reading and mathematics scores must be different from just the sum of the expected values of the first and second RVs. But the answer says the mean is equal to the sum of the means of the 2 RVs, even though they are not independent.
• I have understood that E(T=X+Y) = E(X)+E(Y) when X and Y are independent.

But I am unable to understand it in my gut because, check the Example 3 Question A.

If each employee requires 30 seconds on average to inspect a randomly selected item, and T is the time it takes 4 employees to inspect a randomly selected item, how can the mean time for 4 employees, E(T), be 120 seconds?

I know substituting into the formula gives 120 sec, but this is exactly the opposite of what my intuition says.
  • I'm not sure if this will help any, but I think when they are talking about adding the total time an item is inspected by the employees, it's being inspected by each employee individually and the times are added up, instead of the employees simultaneously inspecting it.

So the item starts with Employee A, who inspects it for 30 seconds, and then it's passed to Employee B, who inspects it for 30 seconds, and so forth. So 30 + 30 + 30 + 30 = 120, so the item spends a total of 120 seconds being inspected by the employees.

Does that help clarify it for you?
• Still not feeling the intuition that subtracting random variables means adding up the variances. Why should the difference between men's heights and women's heights lead to an SD of ~9 cm?
• "Subtracting two variables increases the overall variability in the outcomes."

I'd like to understand this comment intuitively.
Why does the standard deviation (variance) increase when subtracting two variables?

Is it because the number of samples decreases when subtracting variables? (Thus, variability increases?)
• When would you include something in the squaring? For example, in 3b, we did sqrt(4(6)^2) or sqrt(4x36) for the SD. Is there any situation (whether it be in the given question or not) that we would do sqrt((4x6)^2) instead?
• Example 2: SAT scores

Is the mean of the sum of two random variables different from the mean of two random variables?

Assume a case like the one below:
Critical Reading: {498, 495, 492}, mean = 495
Mathematics: {512, 502, 519}, mean = 511

The mean of the sum of a student’s critical reading and mathematics scores = 495 + 511 = 1006
The mean of a student’s critical reading and mathematics scores = 503, which is not 1006

What is "the mean of the sum of two random variables"? I cannot understand it intuitively.
In what kind of situation do we actually use "the mean of the sum of two random variables" in statistics?
  • 𝑋 = {498, 495, 492} ⇒ 𝜇(𝑋) = (498 + 495 + 492)∕3 = 495
𝑌 = {512, 502, 519} ⇒ 𝜇(𝑌) = (512 + 502 + 519)∕3 = 511

𝑋 + 𝑌 = {498 + 512, 495 + 502, 492 + 519} = {1010, 997, 1011}
⇒ 𝜇(𝑋 + 𝑌) = (1010 + 997 + 1011)∕3 = 1006

𝜇(𝑋) + 𝜇(𝑌) = 495 + 511 = 1006

– – –

Let's say we wanted to know how many hours the average person spends at work per week.

One way to conduct the survey would be to choose five random Mondays, five random Tuesdays, and so on. That would give us a total of 35 days.
Then on each of these days we call 25 random people and ask them how many hours they spent working yesterday.

For each of the 35 days we can calculate the average amount of hours for the 25 people we called that day:
𝜇(Day 1)
𝜇(Day 2)
⋮
𝜇(Day 35)

Let's say Day 1-5 are Mondays, Day 6-10 are Tuesdays, and so on.
𝜇(Monday) = (𝜇(Day 1) + 𝜇(Day 2) + ... + 𝜇(Day 5))∕5
𝜇(Tuesday) = (𝜇(Day 6) + 𝜇(Day 7) + ... + 𝜇(Day 10))∕5
⋮
𝜇(Sunday) = (𝜇(Day 31) + 𝜇(Day 32) + ... + 𝜇(Day 35))∕5

Finally,
𝜇(Week) = 𝜇(Monday + Tuesday + ... + Sunday)
= 𝜇(Monday) + 𝜇(Tuesday) + ... + 𝜇(Sunday)  