A measure of the spread of a data set, indicating how much the data values deviate from the mean.
The standard deviation is a measure of spread that is in the same units as the data.
Calculating the Standard Deviation
The standard deviation of a data set is calculated as follows:
- Find the mean of the data set.
- Subtract the mean from each data value to find the deviation of each value from the mean.
- Square each deviation.
- Find the mean of the squared deviations (this is called the variance).
- Take the square root of the variance.
The variance is the mean of the squared deviations. It is a measure of spread that is in squared units.
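The five steps above can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation: the function name `population_std_steps` and the data values are assumptions made for the example.

```python
import math

def population_std_steps(values):
    """Follow the five steps literally (hypothetical helper for illustration)."""
    # Step 1: find the mean of the data set.
    mean = sum(values) / len(values)
    # Step 2: subtract the mean from each value to get the deviations.
    deviations = [x - mean for x in values]
    # Step 3: square each deviation.
    squared = [d ** 2 for d in deviations]
    # Step 4: the mean of the squared deviations is the variance.
    variance = sum(squared) / len(squared)
    # Step 5: the square root of the variance is the standard deviation.
    return math.sqrt(variance)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # assumed example data, mean 5
result = population_std_steps(data)
print(result)  # 2.0
```

For this data the squared deviations sum to 32, so the variance is 32 / 8 = 4 and the standard deviation is 2.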
Standard Deviation of a Population
The standard deviation of a population is calculated using the following formula:
$$\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}}$$
Where:
- \$\sigma\$ is the standard deviation of the population.
- \$x_i\$ is each data value.
- \$\mu\$ is the mean of the population.
- \$N\$ is the number of data values in the population.
Standard Deviation of a Sample
The standard deviation of a sample is calculated using the following formula:
$$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$$
Where:
- \$s\$ is the standard deviation of the sample.
- \$x_i\$ is each data value.
- \$\bar{x}\$ is the mean of the sample.
- \$n\$ is the number of data values in the sample.
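The denominator in the formula for the standard deviation of a sample is \$n - 1\$ instead of \$n\$ because the deviations are measured from the sample mean \$\bar{x}\$ rather than from the unknown population mean, so dividing by \$n\$ would tend to underestimate the spread of the population. This is called Bessel's correction.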
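The effect of the \$n - 1\$ denominator can be seen in a short Python sketch. The helper `sample_std` and the data are assumptions for illustration; `statistics.pstdev` from the standard library is used here only to show the divide-by-\$n\$ result for comparison.

```python
import math
import statistics

def sample_std(values):
    """Sample standard deviation with the n - 1 denominator (hypothetical helper)."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two values are needed for a sample standard deviation")
    x_bar = sum(values) / n
    return math.sqrt(sum((x - x_bar) ** 2 for x in values) / (n - 1))

sample = [2, 4, 4, 4, 5, 5, 7, 9]  # assumed example data

s = sample_std(sample)                      # divides by n - 1
divide_by_n = statistics.pstdev(sample)     # divides by n

print(f"n - 1 denominator: {s:.4f}")
print(f"n denominator:     {divide_by_n:.4f}")
```

The value computed with \$n - 1\$ is always slightly larger than the value computed with \$n\$, and the gap shrinks as the sample grows.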
The standard deviation is not the same as the mean absolute deviation. The mean absolute deviation is the mean of the absolute values of the deviations, while the standard deviation is the square root of the mean of the squared deviations.
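To make the difference concrete, the sketch below computes both measures for the same data. The function names and the data are assumptions chosen for the example.

```python
import math

def mean_absolute_deviation(values):
    """Mean of the absolute deviations from the mean."""
    mean = sum(values) / len(values)
    return sum(abs(x - mean) for x in values) / len(values)

def standard_deviation(values):
    """Square root of the mean of the squared deviations (population form)."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((x - mean) ** 2 for x in values) / len(values))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # assumed example data, mean 5

mad = mean_absolute_deviation(data)
sd = standard_deviation(data)
print(mad, sd)  # 1.5 2.0
```

Squaring gives more weight to the larger deviations, which is why the standard deviation here is larger than the mean absolute deviation.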
Standard Deviation for Frequency Distributions
When the data is presented in a frequency distribution, the standard deviation can be calculated using the following formula:
$$\sigma = \sqrt{\frac{\sum f_i (x_i - \mu)^2}{N}}$$
Where:
- \$f_i\$ is the frequency of the data value \$x_i\$.
- \$N\$ is the total frequency.
| Number | Frequency |
|---|---|
| 42 | 7 |
| 53 | 9 |
| 61 | 12 |
| 74 | 8 |
| 86 | 4 |
| 127 | 10 |
| 150 | 5 |
When calculating the standard deviation for a frequency distribution, it is helpful to create a table with columns for the data values, frequencies, deviations, squared deviations, and squared deviations multiplied by frequencies. This makes it easier to keep track of the calculations.
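One possible way to build such a working table is sketched below in Python, using the numbers and frequencies from the first table in this section. The function name `frequency_std` and the column layout are assumptions for illustration.

```python
import math

# Data values and frequencies taken from the table above.
values = [42, 53, 61, 74, 86, 127, 150]
freqs = [7, 9, 12, 8, 4, 10, 5]

def frequency_std(values, freqs):
    """Return the mean, the working-table rows, and the population standard deviation."""
    total = sum(freqs)                                   # N, the total frequency
    mu = sum(f * x for x, f in zip(values, freqs)) / total
    rows = []
    for x, f in zip(values, freqs):
        dev = x - mu
        rows.append((x, f, dev, dev ** 2, f * dev ** 2))
    variance = sum(row[4] for row in rows) / total
    return mu, rows, math.sqrt(variance)

mu, rows, sigma = frequency_std(values, freqs)

print(f"{'x':>5} {'f':>3} {'x - mu':>9} {'(x - mu)^2':>11} {'f(x - mu)^2':>12}")
for x, f, dev, sq, fsq in rows:
    print(f"{x:>5} {f:>3} {dev:>9.2f} {sq:>11.2f} {fsq:>12.2f}")
print(f"N = {sum(freqs)}, mean = {mu:.2f}, sigma = {sigma:.2f}")
```

Summing the last column and dividing by the total frequency gives the variance; its square root is the standard deviation.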
Interpreting the Standard Deviation
The standard deviation provides information about the spread of the data:
- A small standard deviation indicates that the data values are close to the mean.
- A large standard deviation indicates that the data values are spread out from the mean.
The standard deviation is sensitive to outliers. A single outlier can significantly increase the standard deviation.
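A small experiment illustrates this sensitivity. The data values below are hypothetical and chosen only to make the effect visible; `statistics.pstdev` is the population standard deviation from the Python standard library.

```python
import statistics

data = [10, 11, 12, 12, 13]       # hypothetical values clustered near 12
with_outlier = data + [100]       # the same values plus one extreme value

sd_before = statistics.pstdev(data)
sd_after = statistics.pstdev(with_outlier)

print(f"without outlier: {sd_before:.2f}")
print(f"with outlier:    {sd_after:.2f}")
```

Because each deviation is squared, the single extreme value dominates the sum of squared deviations and the standard deviation grows many times over.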
| Number | Frequency |
|---|---|
| 10 | 3 |
| 15 | 5 |
| 20 | 2 |
| 25 | 4 |
| 30 | 1 |
How does the choice of measure of spread (e.g., range, interquartile range, standard deviation) affect the interpretation of the data? Are there situations where one measure is more appropriate than the others?
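One way to explore this question is to compute all three measures for the same data, with and without an extreme value. The sketch below is only a starting point: the helper `spread_summary` and the data are assumptions, and the quartiles come from `statistics.quantiles` with its default method, so other quartile conventions may give slightly different interquartile ranges.

```python
import statistics

def spread_summary(values):
    """Range, interquartile range, and population standard deviation (hypothetical helper)."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # three quartile cut points
    return {
        "range": max(values) - min(values),
        "iqr": q3 - q1,
        "std": statistics.pstdev(values),
    }

base = [10, 11, 11, 12, 12, 12, 13, 14]  # hypothetical data
skewed = base + [100]                    # same data plus one extreme value

base_summary = spread_summary(base)
skewed_summary = spread_summary(skewed)

for name in ("range", "iqr", "std"):
    print(f"{name:>5}: {base_summary[name]:8.2f} -> {skewed_summary[name]:8.2f}")
```

In this example the range and the standard deviation change sharply when the extreme value is added, while the interquartile range moves only a little.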