The values in a data set can be clustered close to the mean or spread far from it. This property is called the spread, or dispersion, of the data, and several measures quantify it. The simplest is the range (R), the difference between the largest and smallest data points in a set:
R = largest data point – smallest data point
Notice that the range depends on only two data points, the two extremes. For a large data set, relying on only two data points gives an unreliable picture of the spread, because the largest or smallest value may be an outlier.
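To see how strongly a single extreme value can affect the range, here is a minimal Python sketch. The function name `data_range` and the sample values are made up for illustration only.

```python
def data_range(values):
    """Return the largest data point minus the smallest data point."""
    return max(values) - min(values)


# Hypothetical sample data, chosen only to illustrate the effect of an outlier.
typical = [12, 14, 15, 15, 17, 18]
with_outlier = typical + [95]

print(data_range(typical))       # 6
print(data_range(with_outlier))  # 83
```

Adding one unusual value changes the range from 6 to 83, even though the other six data points are unchanged.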
To better understand the data set, calculate quartiles, which divide the ordered data into four groups of equal size. To calculate quartiles:
- Arrange the data in ascending order.
- Find the median of the entire set of data (also called quartile 2 or Q2).
- Split the data set into two halves at Q2.
- Find the median of the lower half of the data, called quartile 1 (Q1).
- Find the median of the upper half of the data, called quartile 3 (Q3).
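The steps above can be sketched in Python as follows. This is one possible reading of the procedure, not a definitive implementation: the function name `quartiles` is hypothetical, and the sketch assumes that when the number of data points is odd, the median itself is left out of both halves. Other conventions exist and can give slightly different values for Q1 and Q3.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves steps."""
    data = sorted(values)          # Step 1: arrange in ascending order.
    n = len(data)
    if n < 2:
        raise ValueError("need at least two data points")

    q2 = median(data)              # Step 2: median of the entire set.

    # Step 3: split at Q2. Assumption: for an odd count, the middle
    # value is excluded from both halves.
    half = n // 2
    lower = data[:half]
    upper = data[half + 1:] if n % 2 else data[half:]

    q1 = median(lower)             # Step 4: median of the lower half.
    q3 = median(upper)             # Step 5: median of the upper half.
    return q1, q2, q3


# Hypothetical sample data.
print(quartiles([7, 3, 9, 1, 5, 11, 13, 15]))  # (4.0, 8.0, 12.0)
```

For the eight sorted values 1, 3, 5, 7, 9, 11, 13, 15, the median is 8, the lower half is 1, 3, 5, 7 (median 4), and the upper half is 9, 11, 13, 15 (median 12).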
The interquartile range (IQR) is a more robust measure of spread than the range, because it describes only the middle half of the data and is therefore much less affected by extremes. IQR is the difference between the third quartile and the first quartile:
IQR = Q3 – Q1
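As a rough illustration of why the IQR is less sensitive to extremes, the Python sketch below computes it for a small sample with and without an outlier. The function name `iqr` and the sample values are hypothetical, and the quartiles are found with the median-of-halves method, assuming the median is excluded from both halves when the number of data points is odd.

```python
from statistics import median


def iqr(values):
    """Return Q3 - Q1, with quartiles taken as medians of the two halves."""
    data = sorted(values)
    n = len(data)
    if n < 2:
        raise ValueError("need at least two data points")
    half = n // 2
    lower = data[:half]
    # Assumption: for an odd count, skip the middle value (the median).
    upper = data[half + 1:] if n % 2 else data[half:]
    return median(upper) - median(lower)


# Hypothetical sample data, with and without one extreme value.
typical = [12, 14, 15, 15, 17, 18]
with_outlier = typical + [95]

print(iqr(typical))       # 3
print(iqr(with_outlier))  # 4
```

The outlier moves the IQR only from 3 to 4, while the range of the same data (largest minus smallest) jumps from 6 to 83.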
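Standard deviation (σ) measures how far the data points typically lie from the mean of the data. More precisely, it is the square root of the average squared distance of the data points from the mean. The formula is: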