Presentation of Data: Frequency Distributions
Frequency distributions provide a structured way to present both discrete and continuous data in tabular form.
Discrete Data
For discrete data, a frequency distribution table typically includes:
- Possible values or categories of the variable
- Frequency (count) of each value or category
- Relative frequency (proportion of total)
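The three columns listed above can be built in a few lines of Python. This is a minimal sketch, not a prescribed method: the function name `frequency_table` and the `siblings` data are hypothetical, chosen only for illustration.

```python
from collections import Counter

def frequency_table(values):
    """Return rows of (value, frequency, relative frequency), sorted by value."""
    counts = Counter(values)
    total = len(values)
    return [(v, counts[v], counts[v] / total) for v in sorted(counts)]

# Hypothetical data: number of siblings reported by 10 students
siblings = [0, 1, 1, 2, 0, 3, 1, 2, 1, 0]

print("value\tfreq\trel. freq")
for value, freq, rel in frequency_table(siblings):
    print(f"{value}\t{freq}\t{rel:.2f}")
```

The relative frequencies always sum to 1, which is a quick check that no observation has been missed.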
Continuous Data
For continuous data, we typically use class intervals:
- Class intervals (ranges of values)
- Frequency of observations in each interval
- Relative frequency of each interval
In the IB Math AA SL course, class intervals are given as inequalities without gaps, so every value belongs to exactly one interval. For example, a class of heights $h$ from 100 up to (but not including) 110 would be written as $100 \leq h < 110$.
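As a rough illustration of intervals of the form $a \leq x < b$, the sketch below sorts observations into equal-width classes. The function name `grouped_frequency_table`, its parameters, and the `heights` data are assumptions made for this example. Note how a value equal to a boundary, such as 110, falls into the interval that starts at that boundary.

```python
def grouped_frequency_table(data, start, width, n_classes):
    """Count observations in classes start + k*width <= x < start + (k+1)*width.

    Values outside the covered range are ignored in this sketch.
    """
    freqs = [0] * n_classes
    for x in data:
        k = int((x - start) // width)  # index of the class containing x
        if 0 <= k < n_classes:
            freqs[k] += 1
    total = sum(freqs)
    return [
        (start + k * width, start + (k + 1) * width, f, f / total if total else 0.0)
        for k, f in enumerate(freqs)
    ]

# Hypothetical heights of 10 children
heights = [102, 108, 110, 115, 119, 121, 124, 127, 133, 139]
table = grouped_frequency_table(heights, start=100, width=10, n_classes=4)

for lo, hi, freq, rel in table:
    print(f"{lo} <= h < {hi}\t{freq}\t{rel:.2f}")
```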
Histograms
Histograms are graphical representations of frequency distributions for continuous data. They consist of adjacent rectangles with areas proportional to the frequencies of the class intervals they represent.
Frequency Histograms with Equal Class Intervals
In a frequency histogram:
- The x-axis represents the variable's values (class intervals)
- The y-axis represents the frequency
- Each bar's height corresponds to the frequency of its class interval
- Bars are adjacent, with no gaps between them
Cumulative Frequency
Cumulative frequency (CF) is the running total of frequencies up to and including each class interval. It is particularly useful for finding the median, quartiles, and percentiles.
Cumulative Frequency Graphs
A cumulative frequency graph, also known as an ogive, is created by:
- Calculating the cumulative frequencies
- Plotting these against the upper boundaries of each class interval
- Connecting the points with a smooth curve
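The first two steps can be sketched in Python as follows. The function name `cumulative_frequency_points` and the sample class boundaries and frequencies are hypothetical. The sketch also adds a starting point at the lower boundary of the first class, where the cumulative frequency is 0. Drawing the smooth curve through the points is left to a plotting tool or to hand.

```python
from itertools import accumulate

def cumulative_frequency_points(lower_bound, upper_bounds, frequencies):
    """Return the (upper boundary, cumulative frequency) points of a CF graph."""
    cumulative = list(accumulate(frequencies))  # running totals
    # No observations lie below the first lower boundary, so CF starts at 0 there
    return [(lower_bound, 0)] + list(zip(upper_bounds, cumulative))

# Hypothetical classes 100-110, 110-120, 120-130, 130-140 with their frequencies
points = cumulative_frequency_points(100, [110, 120, 130, 140], [2, 3, 3, 2])
print(points)
```

The last cumulative frequency should equal the total number of observations.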
Using CF Graphs for Statistical Measures
CF graphs are powerful tools for finding various statistical measures:
- Median: The value corresponding to half the total frequency
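- Quartiles: Q1 is the value corresponding to one quarter of the total frequency, and Q3 is the value corresponding to three quarters
- Percentiles: The $p$th percentile is the value corresponding to $p\%$ of the total frequency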
- Range: Difference between the maximum and minimum values
- Interquartile Range (IQR): Difference between Q3 and Q1
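Reading a value off a CF graph amounts to going across from a cumulative frequency on the y-axis and down to the x-axis. The sketch below imitates this with straight-line interpolation between plotted points, which only approximates a hand-drawn smooth curve. The function name `read_value` and the `points` data are hypothetical.

```python
def read_value(points, target_cf):
    """Estimate the x-value at a given cumulative frequency.

    Uses linear interpolation between neighbouring points, as an
    approximation to reading from a smooth curve.
    """
    for (x0, c0), (x1, c1) in zip(points, points[1:]):
        if c0 <= target_cf <= c1 and c1 > c0:
            return x0 + (target_cf - c0) / (c1 - c0) * (x1 - x0)
    raise ValueError("target_cf is outside the range of the graph")

# Hypothetical CF graph points: (upper boundary, cumulative frequency)
points = [(100, 0), (110, 2), (120, 5), (130, 8), (140, 10)]
n = points[-1][1]  # total frequency

q1 = read_value(points, 0.25 * n)
median = read_value(points, 0.50 * n)
q3 = read_value(points, 0.75 * n)
p90 = read_value(points, 0.90 * n)  # 90th percentile
iqr = q3 - q1

print(f"Q1 = {q1:.1f}, median = {median:.1f}, Q3 = {q3:.1f}, IQR = {iqr:.1f}")
```

Values obtained this way are estimates, since grouping the data discards the individual observations.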
Box and Whisker Diagrams
Box and whisker plots, also known as box plots, provide a visual summary of the distribution of a dataset. They display the five-number summary: minimum, Q1, median, Q3, and maximum.
Components of a Box Plot
- Box: Represents the interquartile range (IQR)
- Line inside the box: Median
- Whiskers: Extend to the minimum and maximum values (excluding outliers)
- Outliers: Values more than 1.5 × IQR below Q1 or above Q3, indicated with crosses beyond the whiskers
Creating a Box Plot
- Calculate the five-number summary
- Draw a box from Q1 to Q3
- Draw a line inside the box at the median
- Extend the whiskers to the smallest and largest values that lie within 1.5 × IQR of the edges of the box
- Plot outliers individually beyond the whiskers
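The numbers needed for these steps can be computed as in the sketch below. It assumes one common convention for quartiles, namely the medians of the lower and upper halves of the sorted data (excluding the middle value when the count is odd). Calculators and software may use other conventions and give slightly different quartiles. The function names and the `scores` data are hypothetical, and the data must contain at least two values.

```python
from statistics import median

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the median-of-halves convention."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]
    # Skip the middle value when n is odd so both halves have equal size
    upper = xs[half + 1:] if n % 2 else xs[half:]
    return xs[0], median(lower), median(xs), median(upper), xs[-1]

def box_plot_stats(data):
    """Return quartiles, whisker ends, and outliers using the 1.5 * IQR rule."""
    xs = sorted(data)
    _, q1, med, q3, _ = five_number_summary(xs)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in xs if low_fence <= x <= high_fence]
    outliers = [x for x in xs if x < low_fence or x > high_fence]
    return {
        "q1": q1, "median": med, "q3": q3,
        "whisker_low": inside[0], "whisker_high": inside[-1],
        "outliers": outliers,
    }

# Hypothetical test scores
scores = [12, 15, 16, 18, 19, 20, 21, 22, 24, 40]
stats = box_plot_stats(scores)
print(stats)
```

Here the upper whisker stops at the largest value inside the fence rather than at the maximum, and the remaining value is reported as an outlier.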
Comparing Distributions
Box plots are excellent for comparing two or more distributions side by side. Key aspects to consider:
- Symmetry: If the median is centered in the box and whiskers are roughly equal, the distribution is likely symmetric
- Spread: Compare the IQRs and overall ranges
- Central tendency: Compare the medians
- Outliers: Note any outliers and their positions
Normal Distribution Indication
Box plots can hint at, but not confirm, whether data might be normally distributed. Features consistent with a normal distribution include:
- Symmetric box and whiskers
- Median in the center of the box
- Whiskers of approximately equal length
- Few or no outliers