# Worked example identifying observational study

## Want to join the conversation?

• How could you have a Sample study and it not be an Observational or Experimental study as well? You sample the population, but how do you get data from it without using Observation or Experimentation?

Shouldn't it be Observation and Experimentation, and Sample or Non-Sample be methods of performing those study methods?

I.e., sample observation, sample experimentation, non-sample (or population?) observation, or something like that?
• A sample study estimates the value of some quantity across all members of a population by randomly sampling a portion of that population. For example, to learn what percentage of US citizens play chess, we could randomly sample (ask) 5,000 citizens and use their answers to estimate the true percentage.

An experimental study is basically a comparison between two groups, the treatment group and the control group, designed to reveal whether there is a causal relationship between the treatment and the observed effect.

The objective of an observational study is to find a correlation between two variables by observing a sample, e.g. surveying randomly chosen people to learn about the relationship between sugar intake and heart disease risk.
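The chess example can be sketched in a few lines of Python. This is only an illustration: the population size and the 15% share of chess players are made-up figures, and `estimate_proportion` is a hypothetical helper, not something from the lesson.

```python
import random

def estimate_proportion(population, sample_size, seed=0):
    """Estimate the share of True values from a simple random sample."""
    rng = random.Random(seed)
    sample = rng.sample(population, sample_size)  # sampling without replacement
    return sum(sample) / sample_size

# Hypothetical population: 1,000,000 people, 15% of whom play chess (assumed).
population = [True] * 150_000 + [False] * 850_000

estimate = estimate_proportion(population, 5000)
print(f"estimated share of chess players: {estimate:.3f}")
```

The estimate should land close to the assumed 15%, even though only 5,000 of the 1,000,000 people were asked.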
• Why would the association be more appropriate in this case?
• Association is the more general term for a relationship between two variables, whereas correlation refers to a specific measure (the Pearson correlation coefficient) of the linear relationship between two variables.
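A small sketch may help show the difference. The data below are invented for illustration, and `pearson_r` is a hand-written helper: in the curved case, y is completely determined by x (a strong association), yet the Pearson coefficient comes out as zero because the relationship is not linear.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

xs = [-3, -2, -1, 0, 1, 2, 3]
linear = [2 * x + 1 for x in xs]   # perfectly linear relationship
curved = [x ** 2 for x in xs]      # perfectly determined by x, but not linear

r_linear = pearson_r(xs, linear)
r_curved = pearson_r(xs, curved)
print(f"linear: r = {r_linear:.2f}, curved: r = {r_curved:.2f}")
```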
• The details of the problem here seem vague. We can't really say whether the 258 liters are significant or not. Was that 258 liters per cow? Per farm? For the entire population of named cows? And was this per day, per month, or per year? If the change in milk production was an increase over a baseline of some 5,000 liters of milk, then yes, of course it suggests a correlation. But if it was an increase over a couple of million liters of milk, that's only about a 0.01% increase. Then the number is not at all statistically significant, and doesn't suggest any correlation at all.

Am I missing something?
• The lack of specific details in the problem description could indeed impact the interpretation of the results. Knowing whether the 258 liters are per cow, per farm, or for the entire population of named cows, as well as the time period over which the increase occurred (per day, per month, per year), would provide crucial context for assessing the significance of the observed correlation.
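To make the arithmetic in the question concrete, here is a short sketch comparing the same 258-liter increase against the two baselines mentioned there. The baselines are the questioner's hypotheticals, not figures from the study, and `percent_increase` is an illustrative helper.

```python
def percent_increase(increase, baseline):
    """Size of an increase relative to a baseline, as a percentage."""
    return 100 * increase / baseline

small_baseline = percent_increase(258, 5_000)      # increase over 5,000 liters
large_baseline = percent_increase(258, 2_000_000)  # increase over 2,000,000 liters

print(f"over 5,000 L:     {small_baseline:.2f}%")
print(f"over 2,000,000 L: {large_baseline:.4f}%")
```

Note that relative size alone does not settle statistical significance, which also depends on the variability of the data and the sample size.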
• Why is association more appropriate?
• Association is more appropriate than causation because observational studies can only establish correlations or associations between variables, not causation. In this case, the study identifies a positive association between naming cows and higher milk yield, but it cannot determine whether naming the cows directly causes the increase in milk yield or if there are other factors involved.
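As a rough illustration of how "other factors" can produce an association without causation, the simulation below invents a hidden variable, `care`, that influences both whether a farm names its cows and how much milk it produces. Naming has no effect on yield in this code, and every number is an assumption chosen for the sketch, not data from the study.

```python
import random
from statistics import mean

rng = random.Random(42)
named_yields, unnamed_yields = [], []

for _ in range(10_000):
    care = rng.random()               # hypothetical confounder in [0, 1)
    names_cows = rng.random() < care  # attentive farms are more likely to name cows
    # Yield depends only on care plus noise; naming plays no role here.
    milk_yield = 7000 + 500 * care + rng.gauss(0, 100)
    (named_yields if names_cows else unnamed_yields).append(milk_yield)

gap = mean(named_yields) - mean(unnamed_yields)
print(f"named farms out-produce unnamed farms by about {gap:.0f} liters on average")
```

Named farms still show a higher average yield, which is why an observational study on its own supports a claim of association but not of causation.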
• The observational study shows a positive correlation between farmer perception of cows' mental capacity and milk yield, and there are two clusters explained by "naming" and "no naming", where the naming cluster has a higher milk yield on average. Am I right?
• Yes, you are correct. The observational study demonstrates a positive correlation between farmer perception of cows' mental capacity and milk yield. The clusters formed by "naming" and "no naming" show that farms where cows are named have a higher milk yield on average compared to farms where cows are not named.
• A question stated:

The mayor of Statville has to decide how to allocate her education budget between the two high schools in town, “Stat Sticks” and “Datum High.” To decide which school deserves a bigger portion of the budget, she went over the grade sheets of the students from both schools in the past 5 years, and analyzed the data. She found that the overall grade average of “Stat Sticks” students is 4 points higher than the overall grade average of the students of “Datum High.”

And the answer to "what valid conclusion can be made from the result?" was

Students from “Stat Sticks,” in the last 5 years, had higher grades, on average, than students from “Datum High.”

But there was another option that said each student's overall grade was 4 points higher. I was wondering why that one was incorrect, while the more general statement that grades were higher on average was correct.
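One way to see the difference between the two options is a tiny made-up data set (the grades below are invented, not taken from the exercise). The averages differ by exactly 4 points, yet it is not true that every "Stat Sticks" student scored 4 points higher than every "Datum High" student.

```python
from statistics import mean

# Hypothetical grade sheets for four students at each school.
stat_sticks = [95, 90, 70, 65]
datum_high = [88, 80, 76, 60]

average_gap = mean(stat_sticks) - mean(datum_high)

# Does any Datum High student outscore some Stat Sticks student?
some_datum_student_higher = max(datum_high) > min(stat_sticks)

print(f"difference in averages: {average_gap}")
print(f"some Datum High student beats some Stat Sticks student: {some_datum_student_higher}")
```

A difference in averages describes the groups as a whole, so it supports the "on average" statement without saying anything about each individual student.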