Displaying and describing data
Contents
One of the first things we do in statistics is try to understand sets of data. This involves plotting the data in different ways and summarizing what we see with measures of center (like mean and median) and measures of spread (like range and standard deviation). This topic focuses on concepts that are often referred to as "descriptive statistics".
Statistics gives us the tools to effectively answer questions using data. So the first step in actually doing statistics is to form a question! This tutorial looks at what type of questions we deal with in statistics.
Categorical data comes from classifying information into categories. To understand categorical data, we often display it in bar graphs and pie graphs.
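As a minimal sketch of the counting behind a bar graph or pie graph, the Python below tallies a small made-up list of responses. The data and the helper names `category_counts` and `category_shares` are illustrative assumptions, not part of the tutorial.

```python
from collections import Counter

# Made-up responses to "What is your favorite color?" (illustrative data only).
responses = ["red", "blue", "red", "green", "red", "blue"]

def category_counts(labels):
    """Count how many times each category appears (the bar heights)."""
    return dict(Counter(labels))

def category_shares(labels):
    """Fraction of all responses in each category (the pie slice sizes)."""
    total = len(labels)
    return {label: n / total for label, n in Counter(labels).items()}

counts = category_counts(responses)
shares = category_shares(responses)

# A rough text bar graph: one '#' per response in the category.
for label, n in sorted(counts.items()):
    print(f"{label:<6}{'#' * n}")
```

The counts give the height of each bar, and the shares give the fraction of the pie that each slice covers.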
Two-way tables show how individuals are classified in terms of two categorical variables. We use two-way tables to see if there is an association between two variables.
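One way to picture a two-way table is as a count for every combination of the two categories. The sketch below assumes each individual is recorded as a (row category, column category) pair. The survey data and the function name `two_way_table` are hypothetical.

```python
from collections import Counter

# Hypothetical survey records: (grade level, preferred pet).
records = [
    ("junior", "dog"), ("junior", "cat"), ("senior", "dog"),
    ("senior", "dog"), ("junior", "dog"), ("senior", "cat"),
]

def two_way_table(pairs):
    """Count how many individuals fall into each (row, column) combination."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    # Counter returns 0 for combinations that never occur.
    return {r: {c: counts[(r, c)] for c in cols} for r in rows}

table = two_way_table(records)
for row, cells in table.items():
    print(row, cells)
```

In this made-up data both grade levels split between cats and dogs in the same way, so the table suggests no association between grade level and preferred pet.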
Dot plots and frequency tables both display how often different values occur in a set of numerical data.
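A frequency table and a dot plot carry the same information, which a short sketch can make concrete. The data and the helper names `frequency_table` and `dot_plot` are assumptions for illustration. The dot plot is drawn sideways as text, with one dot per occurrence.

```python
from collections import Counter

def frequency_table(values):
    """Map each distinct value to how often it occurs, in sorted order."""
    return dict(sorted(Counter(values).items()))

def dot_plot(values):
    """Text dot plot: one line per distinct value, one dot per occurrence."""
    return [f"{v}: {'.' * n}" for v, n in frequency_table(values).items()]

ages = [5, 7, 5, 6, 7, 5]  # made-up data

freq = frequency_table(ages)
plot = dot_plot(ages)
print(freq)
print("\n".join(plot))
```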
Histograms display how many numerical data points there are in each "bucket" of values.
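The bucket counts behind a histogram can be sketched as follows. This assumes equal-width buckets that include their lower edge and exclude their upper edge, which is one common convention. The function name `histogram_counts` and the scores are made up.

```python
def histogram_counts(values, start, width, n_buckets):
    """Count values in n_buckets equal-width buckets [lo, lo + width)."""
    counts = [0] * n_buckets
    for v in values:
        index = int((v - start) // width)
        if 0 <= index < n_buckets:  # ignore values outside the bucket range
            counts[index] += 1
    return counts

scores = [52, 67, 71, 75, 78, 83, 88, 91, 95, 99]  # made-up test scores
counts = histogram_counts(scores, start=50, width=10, n_buckets=5)

# Buckets are 50-59, 60-69, 70-79, 80-89, 90-99.
print(counts)
```

Each count is the height of one bar. Changing the bucket width changes the shape of the histogram, so the width is a choice the person plotting has to make.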
Some features we look at in a distribution are its overall shape, unusual points called outliers, clusters of data, and gaps where there is no data at all.
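Stem-and-leaf plots are another way to plot numerical data. They combine the best features of dot plots and histograms: every individual value stays visible, and the rows show how many values fall in each interval. Turn your head sideways and look at a stem-and-leaf plot and you'll see a histogram-like shape.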
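A small sketch can show how such a plot is assembled. It assumes non-negative whole numbers, uses the tens part as the stem and the ones digit as the leaf, and skips stems that have no data. The function name `stem_and_leaf` and the data are illustrative.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group non-negative integers by stem (value // 10) and list leaves (value % 10)."""
    groups = defaultdict(list)
    for v in sorted(values):
        groups[v // 10].append(v % 10)
    return [
        f"{stem} | {' '.join(str(leaf) for leaf in leaves)}"
        for stem, leaves in sorted(groups.items())
    ]

data = [23, 12, 30, 21, 15, 38, 23]  # made-up data
rows = stem_and_leaf(data)
print("\n".join(rows))
```

Each row lists the actual values, like a dot plot, while the length of each row works like the height of a histogram bar.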
Line graphs show change over time in a numerical variable.
Mean and median are the most commonly used tools to measure the center of a distribution. They give us an idea of the typical value in a set of numerical data.
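For a quick illustration, Python's standard `statistics` module provides `mean` and `median`. The data set below is made up and includes one unusually large value to show how the two measures can differ.

```python
from statistics import mean, median

data = [2, 3, 3, 4, 18]  # made-up data with one large value

data_mean = mean(data)      # sum of the values divided by how many there are
data_median = median(data)  # middle value once the data is sorted

print(data_mean, data_median)
```

Here the single large value pulls the mean well above the median, which is one reason to look at both.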
Once you know the basics of calculating the mean and median, you can start to think more deeply about more advanced concepts. This tutorial covers how to find the mean and median from a data display, how new data points impact the mean and median, and how to find missing values given a mean.
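Two of these ideas can be sketched in a few lines. The sketch assumes exactly one value is missing and the target mean is known. The helper name `missing_value` and all numbers are hypothetical.

```python
from statistics import mean, median

def missing_value(known_values, target_mean):
    """Value to add so that all len(known_values) + 1 values average to target_mean."""
    n = len(known_values) + 1
    # mean = total / n, so the required total is target_mean * n.
    return target_mean * n - sum(known_values)

# Effect of a new data point on the mean and the median.
original = [1, 2, 3, 4, 5]
with_new_point = original + [100]
print(mean(original), median(original))
print(mean(with_new_point), median(with_new_point))

# Score needed on a fourth test to average 88 overall.
needed = missing_value([80, 90, 85], target_mean=88)
print(needed)
```

Adding the value 100 moves the mean a long way but moves the median only slightly, from 3 to 3.5.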
The most basic measures of spread in a distribution of numerical data are range, interquartile range, and mean absolute deviation.
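The three measures can be sketched as below. Quartile conventions vary between textbooks and software. This sketch assumes the quartiles are the medians of the lower and upper halves of the sorted data, leaving out the middle value when the count is odd. The function names are illustrative.

```python
from statistics import mean, median

def value_range(values):
    """Largest value minus smallest value."""
    return max(values) - min(values)

def interquartile_range(values):
    """Q3 - Q1, with quartiles taken as medians of the lower and upper halves."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    ordered = sorted(values)
    half = len(ordered) // 2
    q1 = median(ordered[:half])   # lower half (middle value excluded if count is odd)
    q3 = median(ordered[-half:])  # upper half
    return q3 - q1

def mean_absolute_deviation(values):
    """Average distance of the values from their mean."""
    center = mean(values)
    return mean(abs(v - center) for v in values)

data = [2, 4, 6, 8]  # made-up data
spread = (value_range(data), interquartile_range(data), mean_absolute_deviation(data))
print(spread)
```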
Box and whisker plots show the median, quartiles, and more information about a set of numerical data.
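The numbers a box and whisker plot is usually drawn from are the minimum, first quartile, median, third quartile, and maximum. The sketch below computes them under the assumption that quartiles are medians of the lower and upper halves of the sorted data (other conventions exist). The function name `five_number_summary` is hypothetical.

```python
from statistics import median

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) using median-of-halves quartiles."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    ordered = sorted(values)
    half = len(ordered) // 2
    return (
        ordered[0],
        median(ordered[:half]),   # Q1: median of the lower half
        median(ordered),
        median(ordered[-half:]),  # Q3: median of the upper half
        ordered[-1],
    )

data = [7, 1, 13, 5, 9, 3, 11]  # made-up data
summary = five_number_summary(data)
print(summary)
```

The box spans from the first to the third quartile with a line at the median, and in the simplest version of the plot the whiskers reach out to the minimum and maximum.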
Standard deviation tells us how far away from the mean each data point typically is in a set of numbers. How we calculate standard deviation depends on whether the data represents an entire population or came from a sample. This tutorial focuses on data that represents an entire population.
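As a sketch of the population calculation, the code below divides the sum of squared deviations by the number of data points N before taking the square root. The data is made up, and the result is compared with `pstdev` from Python's standard `statistics` module.

```python
from math import sqrt
from statistics import pstdev

def population_sd(values):
    """Square root of the mean squared deviation from the mean (divide by N)."""
    n = len(values)
    mu = sum(values) / n
    return sqrt(sum((x - mu) ** 2 for x in values) / n)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up data, treated as a whole population
sd = population_sd(data)

print(sd, pstdev(data))
```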
Standard deviation tells us how far away from the mean each data point typically is in a set of numbers. How we calculate standard deviation depends on whether the data represents an entire population or came from a sample. This tutorial focuses on calculating standard deviation from sample data.
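For sample data, the usual adjustment is to divide the sum of squared deviations by n - 1 rather than n. The sketch below uses made-up data and checks the result against `stdev` from Python's standard `statistics` module. The function name `sample_sd` is illustrative.

```python
from math import sqrt
from statistics import pstdev, stdev

def sample_sd(values):
    """Sample standard deviation: divide the squared deviations by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    x_bar = sum(values) / n
    return sqrt(sum((x - x_bar) ** 2 for x in values) / (n - 1))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up data, treated as a sample
s = sample_sd(data)

print(s, stdev(data), pstdev(data))
```

Because it divides by a smaller number, the sample standard deviation comes out slightly larger than the population version computed on the same values.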