If you're seeing this message, it means we're having trouble loading external resources on our website.

If you're behind a web filter, please make sure that the domains *.kastatic.org and *.kasandbox.org are unblocked.

Main content

Computing basic statistics

Once we have data stored in a text file, spreadsheet, or database, we can compute statistics describing the data set.
There are many tools we can use for data analysis, depending on our needs and skills. We'll step through our analysis here in two of the most popular tools, spreadsheets and SQL, so that you can choose the one that works best for you. Our goal is to gain insights about the data, so any tool that can yield insights is equally helpful.

Using statistical functions

First things first: we need data. We always hear that fast food is bad for us, so let's analyze some fast food data and find out for ourselves. 🍔
I've loaded nutritional information about fast food menu items into a Google Spreadsheet and SQL database.
Now let's try to answer some questions about the data...

How much data is there?

The simplest statistic about a data set is the number of rows. That's the first thing we should find out, since it helps us understand how comprehensive our data set is (or is not), and make better sense of the results we draw from it.
Check your understanding
How many rows are there of fast foods data?
  • Your answer should be
  • an integer, like 6
  • a simplified proper fraction, like 3/5
  • a simplified improper fraction, like 7/4
  • a mixed number, like 1 3/4
  • an exact decimal, like 0.75
  • a multiple of pi, like 12 pi or 2/3 pi

What are the averages for the numeric columns?

To understand the range of values for a column, we can compute statistical metrics like the average, as well as more sophisticated metrics like median, mode, and standard deviation.
Check your understanding
What's the average number of calories (rounded)?
  • Your answer should be
  • an integer, like 6
  • a simplified proper fraction, like 3/5
  • a simplified improper fraction, like 7/4
  • a mixed number, like 1 3/4
  • an exact decimal, like 0.75
  • a multiple of pi, like 12 pi or 2/3 pi

What are the largest and smallest values for a column?

Another way of understanding numeric data is to compute the minimum and maximum.
Check your understanding
What's the smallest serving size?
  • Your answer should be
  • an integer, like 6
  • a simplified proper fraction, like 3/5
  • a simplified improper fraction, like 7/4
  • a mixed number, like 1 3/4
  • an exact decimal, like 0.75
  • a multiple of pi, like 12 pi or 2/3 pi

What's the highest amount of sodium?
  • Your answer should be
  • an integer, like 6
  • a simplified proper fraction, like 3/5
  • a simplified improper fraction, like 7/4
  • a mixed number, like 1 3/4
  • an exact decimal, like 0.75
  • a multiple of pi, like 12 pi or 2/3 pi

What is the total value of a column?

It can be useful to sum up the values in a column. For example, many companies track metrics that relate to their ability to succeed financially, like purchases or page views, and totaling those metrics helps them see how well they're doing.
Check your understanding
If you ate every menu item on the list, how many calories would that be?
  • Your answer should be
  • an integer, like 6
  • a simplified proper fraction, like 3/5
  • a simplified improper fraction, like 7/4
  • a mixed number, like 1 3/4
  • an exact decimal, like 0.75
  • a multiple of pi, like 12 pi or 2/3 pi

Filtering data

When we want to compute statistics on a subset of a data set, we filter the data. It might be that we don't care much about burgers, but we really want to know all about the milkshakes.
The simplest filter is to look at only the rows where a column is exactly equal to a particular value. For example, we can filter down the fast food data set to only the rows where the "type" is "Milkshake".
  • In a spreadsheet, use the IF functions, like COUNTIF to count the number of rows that are equal to a value. Related functions are AVERAGEIF, SUMIF, MINIFS, MAXIFS.
  • In SQL, use the same functions as before, but add a WHERE clause with a condition.
Check your understanding
How many milkshakes are in the data set?
  • Your answer should be
  • an integer, like 6
  • a simplified proper fraction, like 3/5
  • a simplified improper fraction, like 7/4
  • a mixed number, like 1 3/4
  • an exact decimal, like 0.75
  • a multiple of pi, like 12 pi or 2/3 pi

Filters can get much more sophisticated. A filter can use conditions based on whether a column is less than or greater to a value, like calories > 500. A filter can also combine conditions on multiple columns, like calories > 500 AND serving_size < 200. It all depends on how you want to slice and dice the data.

Summarizing by group

Our earlier strategies all compute a single statistic, either of the entire data set or a subset. Sometimes we want to view a summary of statistics according to some grouping of the data, like the total number of items for each restaurant or the average calories for each food type.
Screenshot of Google Spreadsheets pivot table, with this data:
typeAVERAGE of calories
Breaded Chicken Sandwich522
Burger620
Chicken Nuggets275
French Fries314
Grilled Chicken Sandwich408
Milkshake607
  • In a spreadsheet, create a pivot table that groups by a particular column and shows the desired statistic for that column.
  • In SQL, use GROUP BY on the column.
Now that we know how to make a table summarizing statistics by group, we can answer multiple questions at once.
Check your understanding
What's the highest calorie count of the menu items from White Castle?
  • Your answer should be
  • an integer, like 6
  • a simplified proper fraction, like 3/5
  • a simplified improper fraction, like 7/4
  • a mixed number, like 1 3/4
  • an exact decimal, like 0.75
  • a multiple of pi, like 12 pi or 2/3 pi

Which restaurant offers the menu item with the highest calorie count?


🙋🏽🙋🏻‍♀️🙋🏿‍♂️Do you have any questions about this topic? We'd love to answer—just ask in the questions area below!

Want to join the conversation?