Lesson 10: Shape of data distributions

# Clusters, gaps, peaks & outliers

This lesson explores the features of distributions in data sets, like clusters, gaps, and peaks. We learn how to identify outliers, which are data points far from the rest. We also discuss how to spot peaks and clusters in data.

• What is an outlier?
What is the range?
What is the interquartile range?
What is the mean?
What is the median?
What is the mode?
What is the lower quartile?
What is the upper quartile?

I have them all mixed up and am so confused.
• Outlier - a data value that is very different from the rest of the data.
Range - the highest number minus the lowest number.
Interquartile range - Q3 minus Q1.
Mean - the average of the data (add up all the numbers, then divide by how many numbers you added).
Median - the number in the middle of the data when the numbers are put in order. If there is an even number of values, the median is the average of the two middle numbers.
Mode - whichever number appears most often.
Lower quartile - Q1 - the middle of the bottom half of the data. Once you find the median, Q1 is the middle of the data to the left of (below) the median; it's basically the value one quarter of the way through the data.
Upper quartile - Q3 - the middle of the data above the median; it's the value three quarters of the way through the data.
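A minimal Python sketch of the definitions above, assuming one common classroom convention for quartiles: sort the data, split it in half, leave the median out of both halves when the count is odd, and take the median of each half. Other textbooks and software use slightly different quartile rules, so their Q1 and Q3 can differ a little. The helper names `quartiles` and `summarize` and the sample data are made up for illustration (`multimode` needs Python 3.8 or later).

```python
from statistics import mean, median, multimode


def quartiles(data):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    Assumed convention: with an odd number of values, the median itself
    belongs to neither half. Needs at least two values.
    """
    values = sorted(data)
    half = len(values) // 2
    lower = values[:half]                      # values below the median
    upper = values[half + len(values) % 2:]    # values above the median
    return median(lower), median(upper)


def summarize(data):
    q1, q3 = quartiles(data)
    return {
        "range": max(data) - min(data),  # highest minus lowest
        "mean": mean(data),              # sum of values / number of values
        "median": median(data),          # middle value; average of the two middle values if the count is even
        "mode": multimode(data),         # most frequent value(s), as a list
        "Q1": q1,                        # lower quartile
        "Q3": q3,                        # upper quartile
        "IQR": q3 - q1,                  # interquartile range
    }


data = [1, 3, 3, 4, 6, 7, 8, 10, 12]
print(summarize(data))
```

For this sample, the range is 11, the mean and median are both 6, the mode is 3, Q1 is 3, Q3 is 9, and the IQR is 6.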
• What is the exact meaning of an outlier?
• 1) A data point that is distinctly separate from the rest of the data.
2) Any data point more than 1.5 interquartile ranges (IQRs) below the first quartile or above the third quartile.
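The second definition (the 1.5 IQR rule) can be checked with a short Python sketch. It uses the median-of-halves quartile convention, which is an assumption; other quartile rules can move the fences slightly. The names `iqr_outliers` and `k` and the sample data are illustrative only.

```python
from statistics import median


def quartiles(data):
    """Q1 and Q3 as medians of the lower and upper halves (median excluded when the count is odd)."""
    values = sorted(data)
    half = len(values) // 2
    return median(values[:half]), median(values[half + len(values) % 2:])


def iqr_outliers(data, k=1.5):
    """Return values more than k IQRs below Q1 or above Q3."""
    q1, q3 = quartiles(data)
    iqr = q3 - q1
    low_fence = q1 - k * iqr     # anything below this is an outlier
    high_fence = q3 + k * iqr    # anything above this is an outlier
    return [x for x in data if x < low_fence or x > high_fence]


data = [4, 5, 5, 6, 7, 7, 8, 30]
print(iqr_outliers(data))
```

Here Q1 = 5 and Q3 = 7.5, so the IQR is 2.5 and the fences are 1.25 and 11.25; only 30 falls outside them.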

• Can you please explain what a peak is?
• A peak, as he said in the video, is the highest point of the data: the value with the most data points stacked on it.
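In a dot plot, the highest point is the value with the tallest stack of dots, so it can be found by counting how many times each value appears. This Python sketch assumes whole-number data; `peaks` and `dot_plot` are hypothetical helper names. It only finds the single highest point (or the values tied for it), matching the answer above.

```python
from collections import Counter


def peaks(data):
    """Return the value(s) with the tallest stack of dots."""
    counts = Counter(data)
    tallest = max(counts.values())
    return sorted(value for value, count in counts.items() if count == tallest)


def dot_plot(data):
    """Print a sideways text dot plot: one row per whole-number value."""
    counts = Counter(data)
    for value in range(min(data), max(data) + 1):
        # Counter returns 0 for missing values, so empty rows show gaps
        print(f"{value:>3} | " + "." * counts[value])


data = [1, 2, 2, 3, 3, 3, 4, 6]
dot_plot(data)
print("peak at:", peaks(data))
```

The empty row for 5 is a gap, and the longest row (3) is the peak.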
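• What is a cluster? Please explain.
• Clustered data is bunched up into a group (or two or three groups) of values that are close together. For example, if the data ran from 4 to 9 and the values 6, 7, and 8 each had 3 dots, those values would form a cluster.
• What's an outlier?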
• An outlier is a piece of data that is far away from other data.
• In statistics, a measure of spread describes how much the data vary. Examples include the range (the difference between the maximum and minimum values), the mean absolute deviation (the average distance between each point and the mean), and the interquartile range (the distance between the lower and upper quartiles).
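The three measures of spread can be computed side by side in a minimal Python sketch. The quartiles follow the median-of-halves convention, which is an assumption; `mean_absolute_deviation`, `spread`, and the sample data are illustrative names and values.

```python
from statistics import mean, median


def mean_absolute_deviation(data):
    """Average distance between each value and the mean."""
    m = mean(data)
    return mean(abs(x - m) for x in data)


def spread(data):
    values = sorted(data)
    half = len(values) // 2
    # Assumed quartile convention: medians of the lower and upper halves,
    # leaving the median out when the count is odd.
    q1 = median(values[:half])
    q3 = median(values[half + len(values) % 2:])
    return {
        "range": values[-1] - values[0],          # maximum minus minimum
        "mad": mean_absolute_deviation(values),   # mean absolute deviation
        "iqr": q3 - q1,                           # interquartile range
    }


print(spread([2, 4, 4, 4, 5, 5, 7, 9]))
```

For this sample the mean is 5, the distances from it average 1.5, the range is 7, and the IQR is 6 - 4 = 2.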
• Is an outlier a small set of data separated from all the big clusters?
• It's usually only one data point (I think).
• A few questions about outliers:

1. Let's say there are two clusters on the graph with a huge gap in between.
Would data in one cluster be considered an outlier with respect to the other cluster? Or is there no outlier at all?

2. Let's say that this time there is a cluster on one side of the graph. After the cluster, the counts are just low, with no gaps, but after a while one value has an abnormally high number of data points.
Is this considered an outlier?
(shown below)
.________.____
..._______.____
..._______.____
.......................__

. -> data point
_ -> space/blank (the comment box wouldn't keep more than one space in a row, which is why I used underscores)
• I think that you would not consider it an outlier, since it is a significant part of your data.
• I don't understand what you mean in this video about clusters. Can you be more specific?
• A cluster is a large group of data points close to one another.
Notice how the dots in the first group below are very close together, and the dots in the second group aren't? The first group of dots is a cluster.
::.::. .. : .
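One way to make "close to one another" concrete is to sort the data and start a new group wherever there is a gap larger than some cutoff. There is no single official cutoff, so `max_gap` and `min_size` below are assumed, adjustable choices, and `split_into_groups` and `clusters` are hypothetical helper names. The input must contain at least one value.

```python
def split_into_groups(data, max_gap=1):
    """Split sorted data wherever two neighboring values are more than max_gap apart."""
    values = sorted(data)
    groups = [[values[0]]]
    for prev, cur in zip(values, values[1:]):
        if cur - prev > max_gap:
            groups.append([cur])      # gap found: start a new group
        else:
            groups[-1].append(cur)    # close to the previous value: same group
    return groups


def clusters(data, max_gap=1, min_size=3):
    """Keep only groups with at least min_size points."""
    return [g for g in split_into_groups(data, max_gap) if len(g) >= min_size]


data = [1, 1, 2, 2, 2, 3, 3, 7, 9]
print(split_into_groups(data))  # groups separated by gaps
print(clusters(data))           # only the large, tightly packed group
```

With these settings, 1 through 3 form one cluster, while 7 and 9 are isolated points separated by gaps. Raising `max_gap` or lowering `min_size` changes what counts as a cluster.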