Finding patterns in data sets

We often collect data so that we can find patterns in the data, like numbers trending upwards or correlations between two sets of numbers.

Depending on the data and the patterns, sometimes we can see that pattern in a simple tabular presentation of the data. Other times, it helps to visualize the data in a chart, like a time series, line graph, or scatter plot.

Let's explore examples of patterns that we can find in the data around us.

Spotting trends

A trending quantity is a number that is generally increasing or decreasing.

Consider this data on babies per woman in India from 1960 to 2010:

Year	Babies per woman
1960	5.91
1970	5.59
1980	4.83
1990	4.05
2000	3.31
2010	2.60

Source: Gapminder, Children per woman (total fertility rate).

In this case, the numbers are steadily decreasing decade by decade, so this is a downward trend.

Now consider this data about US life expectancy from 1920 to 2000:

Year	Life expectancy
1920	55.38
1930	59.57
1940	63.24
1950	68.07
1960	69.86
1970	70.86
1980	73.91
1990	75.4
2000	76.9

Source: Gapminder, Life expectancy at birth.

In this case, the numbers are steadily increasing decade by decade, so this is an upward trend.
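
One way to check a trend like the two above without a chart is to compare each value with the one before it. The following is a minimal sketch, not a standard method: the function name `classify_trend` and the "more rises than falls" rule are assumptions made for illustration, and the numbers are copied from the two tables.

```python
def classify_trend(values):
    """Label a series by comparing each value with the previous one."""
    steps = [later - earlier for earlier, later in zip(values, values[1:])]
    rises = sum(1 for step in steps if step > 0)
    falls = sum(1 for step in steps if step < 0)
    if rises > falls:
        return "upward"
    if falls > rises:
        return "downward"
    return "no clear trend"

# Babies per woman in India, 1960 to 2010 (from the first table).
babies_per_woman = [5.91, 5.59, 4.83, 4.05, 3.31, 2.60]

# US life expectancy, 1920 to 2000 (from the second table).
life_expectancy = [55.38, 59.57, 63.24, 68.07, 69.86, 70.86, 73.91, 75.4, 76.9]

print(classify_trend(babies_per_woman))  # downward
print(classify_trend(life_expectancy))   # upward
```

A rule this simple only counts rises and falls, so it says nothing about how steep the trend is.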

Visualizing with charts

Let's try identifying upward and downward trends in charts, like a time series graph.

This graph from Gapminder visualizes the babies per woman in India, based on data points for each year instead of each decade:

There is a clear downward trend in this graph, and it appears to be nearly a straight line from 1968 onwards.

📉 Chart choices: The x axis goes from 1960 to 2010, and the y axis goes from 2.6 to 5.9. Would the trend be more or less clear with different axis choices? Experiment with the options on Gapminder to see for yourself.

This is a graph of life expectancy from Gapminder, again based on data points for each year instead of each decade:

The trend isn't as clearly upward in the first few decades, when it dips up and down, but becomes obvious in the decades since.

📉 Chart choices: The x axis goes from 1920 to 2000, and the y axis starts at 55. How do those choices affect our interpretation of the graph? Try changing the options on Gapminder to see for yourself.

Check your understanding

Google Analytics is used by many websites (including Khan Academy!) to track user behavior.

This Google Analytics chart shows the page views for our AP Statistics course from October 2017 through June 2018:

What trends are apparent in this chart?

Statistical fluctuations

Google Trends is a site that visualizes the popularity of Google search terms over time.

We can use Google Trends to research the popularity of "data science", a new field that combines statistical data analysis and computational skills.

This is their graph for "data science" from April 2014 to April 2019:

That graph shows a large amount of fluctuation over the time period (including big dips at Christmas each year). Yet, it also shows a fairly clear increase over time.

When we're dealing with fluctuating data like this, we can calculate the "trend line" and overlay it on the chart (or ask a charting application to add it for us). A trend line smooths out the data and makes the overall trend clearer, if there is one to be found.

Here's the same graph with a trend line added:

The trend line shows a very clear upward trend, which is what we expected. It helps that we chose to visualize the data over such a long time period, since this data fluctuates seasonally throughout the year.
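
To see how a trend line can be calculated, here is a rough sketch that fits a straight line by least squares. The monthly "interest" numbers are made up for illustration (a steady rise plus a yearly wave); they are not the Google Trends data, and `fit_line` is a hypothetical helper name.

```python
import math

def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: 60 months that rise by 0.5 per month,
# with a seasonal swing of +/- 10 that repeats every 12 months.
months = list(range(60))
interest = [50 + 0.5 * m + 10 * math.sin(2 * math.pi * m / 12) for m in months]

slope, intercept = fit_line(months, interest)
trend_line = [intercept + slope * m for m in months]

print(f"slope per month: {slope:.2f}")
```

The fitted slope is positive even though individual months go up and down, which is the smoothing effect described above.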

Whenever you're analyzing and visualizing data, consider ways to collect the data that will account for fluctuations. For time-based data, there are often fluctuations across the week (due to the difference between weekdays and weekends) and fluctuations across the seasons.
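
One common way to account for a weekly cycle is a moving average whose window covers a full week. The sketch below uses invented daily page views (100 on weekdays, 60 on weekends); the numbers and the helper name `moving_average` are assumptions for illustration only.

```python
def moving_average(values, window):
    """Average each run of `window` consecutive values."""
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]

# Hypothetical page views: five weekdays at 100, two weekend days at 60,
# repeated for three weeks.
one_week = [100] * 5 + [60] * 2
daily_views = one_week * 3

smoothed = moving_average(daily_views, window=7)

print(daily_views[:7])   # the raw data dips every weekend
print(smoothed[:3])      # every 7-day window has the same average
```

Because every 7-day window contains exactly five weekdays and two weekend days, the weekly dip disappears from the smoothed series.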

Making predictions

One reason we analyze data is to come up with predictions.

Consider this data on average tuition for 4-year private universities:

School year	Tuition
2011-12	$30,210
2012-13	$30,970
2013-14	$31,570
2014-15	$32,140
2015-16	$33,180
2016-17	$34,100

Source: College Board: Trends in College Pricing

We can see clearly that the numbers are increasing each year from 2011 to 2016. To make a prediction, we need to understand the rate at which the numbers are increasing.

One way to do that is to calculate the percentage change year-over-year. Here's the same table with that calculation as a third column:

School year	Tuition	One year % change
2011-12	$30,210
2012-13	$30,970	2.5%
2013-14	$31,570	1.9%
2014-15	$32,140	1.8%
2015-16	$33,180	3.2%
2016-17	$34,100	2.8%

It can also help to visualize the increasing numbers in graph form:

If the rate were exactly constant (and the graph exactly linear), then we could easily predict the next value. However, in this case, the rate varies between 1.8% and 3.2%, so predicting is not as straightforward.
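
The year-over-year percentages in the table can be reproduced with a few lines of code. This is a small sketch using the tuition figures from the table; `percent_changes` is a hypothetical helper name.

```python
# Average tuition by school year, copied from the table above.
tuition = {
    "2011-12": 30210,
    "2012-13": 30970,
    "2013-14": 31570,
    "2014-15": 32140,
    "2015-16": 33180,
    "2016-17": 34100,
}

def percent_changes(values):
    """Percentage change from each value to the next, rounded to 1 decimal."""
    return [
        round((later - earlier) / earlier * 100, 1)
        for earlier, later in zip(values, values[1:])
    ]

changes = percent_changes(list(tuition.values()))
print(changes)                      # [2.5, 1.9, 1.8, 3.2, 2.8]
print(min(changes), max(changes))   # 1.8 3.2
```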

Let's try a few ways of making a prediction for 2017-2018:

Strategy	Predicted change	Predicted tuition
Most recent rate	2.8%	$35,054.80
Average last 3 rates	2.6%	$34,986.60
Average all rates	2.44%	$34,932.04

Which strategy do you think is the best? As it turns out, the actual tuition for 2017-2018 was $34,740. It increased by only 1.9%, less than any of our strategies predicted. The closest was the strategy that averaged all the rates.
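
The three strategies can be compared in code as well. This sketch starts from the rounded percentages in the table, applies each one to the 2016-17 tuition, and checks which prediction lands closest to the actual 2017-2018 figure; the variable names are chosen for illustration.

```python
# Rounded year-over-year rates (in percent) from the table above.
rates = [2.5, 1.9, 1.8, 3.2, 2.8]
last_tuition = 34100      # 2016-17
actual_tuition = 34740    # 2017-18, as reported above

strategies = {
    "most recent rate": rates[-1],
    "average of last 3 rates": sum(rates[-3:]) / 3,
    "average of all rates": sum(rates) / len(rates),
}

# Apply each predicted rate to the most recent tuition.
predictions = {
    name: round(last_tuition * (1 + rate / 100), 2)
    for name, rate in strategies.items()
}

# The best strategy is the one with the smallest error.
closest = min(predictions, key=lambda name: abs(predictions[name] - actual_tuition))

for name, predicted in predictions.items():
    print(f"{name}: ${predicted:,.2f}")
print("closest:", closest)
```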

Statisticians and data analysts typically use a technique called linear regression, which finds the line that best fits the data so we can make predictions based on that line. With this data, a linear regression also predicts 2.44%.
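
As a rough illustration of the idea, the sketch below fits a least-squares line to the tuition figures, using a simple year index (0 for 2011-12, 1 for 2012-13, and so on) as the x value, and then extends the line one more year. This is only one possible setup, chosen here as an assumption; fitting something else (for example, the yearly rates) gives a different figure, so the result need not match the percentage quoted above.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Year index 0..5 stands for school years 2011-12 through 2016-17.
year_index = [0, 1, 2, 3, 4, 5]
tuition = [30210, 30970, 31570, 32140, 33180, 34100]

slope, intercept = fit_line(year_index, tuition)

# Extend the fitted line to year index 6 (2017-18).
predicted_2017_18 = intercept + slope * 6
predicted_change = (predicted_2017_18 - tuition[-1]) / tuition[-1] * 100

print(f"average yearly increase: ${slope:,.2f}")
print(f"predicted 2017-18 tuition: ${predicted_2017_18:,.2f}")
print(f"predicted change: {predicted_change:.2f}%")
```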

How could we make more accurate predictions? We could try to collect more data and incorporate that into our model, like considering the effect of overall economic growth on rising college tuition.

Ultimately, we need to understand that a prediction is just that, a prediction. More data and better techniques help us to predict the future better, but nothing can guarantee a perfectly accurate prediction.

Finding correlations

Another goal of analyzing data is to compute the correlation, the statistical relationship between two sets of numbers.

A correlation can be positive, negative, or not exist at all. A scatter plot is a common way to visualize the correlation between two sets of numbers.

There's a positive correlation between temperature and ice cream sales:

There's a negative correlation between temperature and soup sales:

There's no correlation between temperature and salt sales:

Statisticians and data analysts typically express the correlation as a number between -1 and 1, where -1 is a strong negative correlation, 1 is a strong positive correlation, and 0 is no correlation. You can learn more about correlation coefficients on Khan Academy.
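
To make the number concrete, here is a minimal sketch of the Pearson correlation coefficient, one common way to compute such a value. The temperature and sales figures are invented for illustration and are not the data behind the scatter plots above.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical daily figures.
temperature = [14, 16, 20, 23, 26, 30]            # degrees Celsius
ice_cream_sales = [215, 325, 410, 520, 610, 700]  # rises with temperature
soup_sales = [600, 540, 450, 380, 300, 210]       # falls with temperature

r_ice_cream = pearson(temperature, ice_cream_sales)
r_soup = pearson(temperature, soup_sales)

print(f"temperature vs ice cream: {r_ice_cream:.2f}")  # close to 1
print(f"temperature vs soup: {r_soup:.2f}")            # close to -1
```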

A variation on the scatter plot is a bubble plot, where the dots are sized based on a third dimension of the data.

Here's a bubble plot from Gapminder that compares income to life expectancy, with each dot representing a country and its population:

📉 Chart choices: The dots are colored based on the continent, with green representing the Americas, yellow representing Europe, blue representing Africa, and red representing Asia. The y axis goes from 19 to 86, and the x axis goes from 400 to 96,000, using a logarithmic scale that doubles at each tick. A logarithmic scale is a common choice when a dimension of the data changes so extremely.
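
The effect of a doubling logarithmic scale can be sketched in a few lines. The tick values below are an assumption (starting at 400 and doubling each time) rather than the exact ticks on the chart; the point is that each doubling takes up the same distance on the axis.

```python
import math

def axis_position(value, start):
    """Distance from the start of the axis, measured in doublings."""
    return math.log2(value / start)

# Hypothetical ticks: 400, 800, 1600, ... each double the previous one.
ticks = [400 * 2 ** k for k in range(8)]
positions = [axis_position(tick, start=400) for tick in ticks]

for tick, position in zip(ticks, positions):
    print(f"{tick:>6} -> {position:.0f}")

# On a linear scale the gap from 400 to 800 is 400 and the gap from
# 25,600 to 51,200 is 25,600; on this scale both gaps are 1 unit.
gaps = [b - a for a, b in zip(positions, positions[1:])]
```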

As countries move up on the income axis, they generally move up on the life expectancy axis as well. There's a positive correlation between income and life expectancy.

Here's another bubble plot from Gapminder, this time comparing CO2 emissions to life expectancy:

📉 Chart choices: This time, the x axis goes from 0.0 to 250, using a logarithmic scale that goes up by a factor of 10 at each tick.

We once again see a positive correlation: as CO2 emissions increase, life expectancy increases.

Wait a second, does this mean that we should earn more money and emit more carbon dioxide in order to guarantee a long life? No, not necessarily.

Correlation does not imply causation. A correlation tells us that there is some sort of association between two sets of numbers, but it does not tell us why there's an association.

In this case, the correlation is likely due to a hidden cause that's driving both sets of numbers, like overall standard of living.
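
A small simulation can show how a hidden cause produces a correlation between two quantities that don't affect each other. Everything here is made up for illustration: `hidden` stands in for something like standard of living, and `a` and `b` are two measurements that each depend on it plus their own random noise.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(0)  # fixed seed so the run is repeatable

# The hidden cause drives both measurements; a and b never use each other.
hidden = [rng.uniform(0, 10) for _ in range(200)]
a = [h + rng.gauss(0, 1) for h in hidden]
b = [h + rng.gauss(0, 1) for h in hidden]

r_raw = pearson(a, b)

# Subtract the hidden cause's contribution and correlate what is left.
a_leftover = [x - h for x, h in zip(a, hidden)]
b_leftover = [y - h for y, h in zip(b, hidden)]
r_leftover = pearson(a_leftover, b_leftover)

print(f"a vs b: {r_raw:.2f}")
print(f"a vs b with the hidden cause removed: {r_leftover:.2f}")
```

In this setup the two measurements are strongly correlated, yet once the hidden cause is accounted for, little or no correlation remains.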

In other cases, a correlation might be just a big coincidence. There are plenty of fun examples online of spurious correlations.

Finding a correlation is just a first step in understanding data. It can't tell you the cause, but it can point you in the direction of possible causes and experiments to learn more.

Check your understanding

Our World In Data is a non-profit website that collects and visualizes data about world trends.

Their research on Working Hours includes this chart that compares productivity (GDP per hour worked) to the average number of hours worked per person.

What best describes the relationship between productivity and work hours?

(Choice A) There is a positive correlation between productivity and the average hours worked.
(Choice B) There is a negative correlation between productivity and the average hours worked.
(Choice C) There is no correlation between productivity and the average hours worked.
