Why Histograms Matter For Quantitative Data
- Histograms are one of the most useful ways to see the shape of a data set at a glance.
- They are designed for quantitative data (numerical values) and are especially important when the data are continuous (measured values like mass, height, time, and temperature), where values can take any number within an interval.
- In MYP Standard Mathematics, histograms support three core skills:
  - Constructing bar charts and histograms correctly
  - Interpreting frequency and relative frequency histograms
  - Visualizing key characteristics of a distribution (where the data cluster, how spread out they are, and whether the data are skewed)
Histogram
A graph for quantitative data where values are grouped into class intervals on the horizontal axis, the bars touch, and the bar area represents frequency (or relative frequency).
- The word histogram comes from Greek roots meaning something like "upright stands" and "record".
- The modern statistical idea is that the graph records how data are distributed across intervals.
Bar Charts And Histograms Look Similar, But Represent Different Data
- Both displays use rectangular bars, but they answer different questions.
- Confusing them is one of the most common causes of incorrect graphs and incorrect interpretation.
Bar Charts Summarize Categories
- A bar chart represents qualitative (categorical) data, such as type of fruit, eye color, or favorite sport.
- Key features:
  - Bars have equal width.
  - There are spaces between bars because categories are separate.
  - The height of the bar represents the frequency.
  - The vertical axis is typically labeled frequency.
Bar chart
A chart that displays frequencies for qualitative categories using separated, equal-width bars where bar height represents frequency.
Histograms Summarize Numerical Values
- A histogram represents quantitative data, including all continuous data and sometimes discrete numerical data (when it is appropriate to group the values).
- Key features:
  - Each bar represents a class interval (a numerical range).
  - Bars touch because the intervals connect with no gaps.
  - The area of each bar represents the frequency.
  - The vertical axis may show frequency, relative frequency, or frequency density.
- A common mistake is to read frequency directly from bar height even when class widths are not equal.
- With unequal class widths, it is the area that represents frequency, not the height.
Class Intervals And Boundary Notation Prevent Overlap
Histograms require you to group data into class intervals, so it is essential to read the interval notation accurately.
Class interval
A range of values (such as $20.0 < x \le 20.5$) used to group quantitative data in a frequency table.
In an interval like $19.5 < x \le 20.0$:
- values are greater than 19.5
- and up to and including 20.0
This style ensures every value belongs to exactly one class, provided the "included end" is used consistently. For example, with consecutive classes $19.5 < x \le 20.0$ and $20.0 < x \le 20.5$, the value $20.0$ belongs to the first class only, because the second class requires $x$ to be strictly greater than $20.0$.
- When designing class intervals, choose boundaries so there are no gaps and no overlaps.
- Then check: any single measurement should fit in exactly one class.
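The "exactly one class" check can be sketched in a few lines of Python. This is a minimal illustration, assuming classes of the form lower < x ≤ upper built from consecutive boundaries; the function name `find_class` is made up for this example.

```python
# Class boundaries for intervals of the form lower < x <= upper.
boundaries = [19.5, 20.0, 20.5, 21.0]


def find_class(x, boundaries):
    """Return the (lower, upper) class containing x, or None if x is outside all classes."""
    matches = [
        (lower, upper)
        for lower, upper in zip(boundaries, boundaries[1:])
        if lower < x <= upper
    ]
    # Consecutive boundaries with a consistent "included end" give at most one match.
    assert len(matches) <= 1
    return matches[0] if matches else None


print(find_class(20.0, boundaries))  # (19.5, 20.0): the upper end is included
print(find_class(20.1, boundaries))  # (20.0, 20.5)
print(find_class(19.5, boundaries))  # None: 19.5 is not greater than 19.5
```

A value equal to the lowest boundary falls outside every class here, which is a reminder to choose the first boundary below the smallest measurement.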
How To Construct A Histogram From A Frequency Table
A good histogram is built from a grouped frequency table. The key decision is what to put on the vertical axis.
Step 1: Put Class Boundaries On The Horizontal Axis
The horizontal axis shows class boundaries (the endpoints of the intervals), not the class midpoints.
$$19.5, 20.0, 20.5, 21.0, \dots$$
Step 2: Choose The Correct Vertical Axis Quantity
There are three common choices.
Case A: Equal Class Widths (Frequency Works)
- If every class has the same width, you can use frequency on the vertical axis.
- In that situation, bar area is proportional to bar height, so reading heights is safe.
Case B: Unequal Class Widths (Use Frequency Density)
- If class widths differ, use frequency density: $$\text{frequency density} = \frac{\text{frequency}}{\text{class width}}.$$
- This ensures: $$\text{area of bar} = (\text{class width})\times(\text{frequency density})=\text{frequency}.$$
Frequency density
Frequency divided by class width, used on the vertical axis of a histogram when class widths are unequal.
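As a rough sketch, the frequency density calculation can be written in Python. The table below is hypothetical (it is not taken from this section's data) and the helper name `frequency_densities` is chosen only for illustration.

```python
# Hypothetical grouped table with unequal class widths: (lower, upper, frequency).
table = [
    (0, 5, 10),
    (5, 10, 15),
    (10, 20, 20),
    (20, 40, 10),
]


def frequency_densities(table):
    """Return frequency / class width for each class."""
    return [freq / (upper - lower) for lower, upper, freq in table]


densities = frequency_densities(table)
for (lower, upper, freq), density in zip(table, densities):
    width = upper - lower
    # Bar area (width x density) recovers the class frequency.
    print(f"{lower} < x <= {upper}: width {width}, density {density}, area {width * density}")
```

In this made-up table the class $10 < x \le 20$ has the largest frequency (20), but the class $5 < x \le 10$ has the largest frequency density, so it would be the tallest bar.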
Case C: Relative Frequency Histograms
- Sometimes you want to compare data sets of different sizes.
- Then you may plot relative frequency (a proportion or percentage).
- If the total number of values is $N$ and a class has frequency $f$: $$\text{relative frequency} = \frac{f}{N}.$$
- A relative frequency histogram has the same overall shape as a frequency histogram, but the vertical scale is in proportions (or percentages).
- This makes comparisons between groups fairer.
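A short Python sketch can show why relative frequency helps when group sizes differ. The two groups below are hypothetical, and `relative_frequencies` is an illustrative helper name, not a standard function.

```python
def relative_frequencies(frequencies):
    """Return f / N for each class, where N is the total frequency."""
    total = sum(frequencies)
    return [f / total for f in frequencies]


# Hypothetical frequencies for the same three classes in two groups of different sizes.
group_a = [4, 10, 6]     # N = 20
group_b = [10, 25, 15]   # N = 50

rel_a = relative_frequencies(group_a)
rel_b = relative_frequencies(group_b)
print(rel_a)
print(rel_b)
```

The raw frequencies look very different, but both groups have the same proportions in each class, so their relative frequency histograms would have the same shape and the same vertical scale.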
Chick Masses (Continuous Data)
The grouped table below shows the masses $x$ (grams) of 50 baby chicks hatched in one week.
| Mass interval $x$ (g) | Frequency |
|---|---|
| $19.5<x\le20.0$ | 3 |
| $20.0<x\le20.5$ | 5 |
| $20.5<x\le21.0$ | 1 |
| $21.0<x\le21.5$ | 6 |
| $21.5<x\le22.0$ | 2 |
| $22.0<x\le22.5$ | 6 |
| $22.5<x\le23.0$ | 4 |
| $23.0<x\le23.5$ | 6 |
| $23.5<x\le24.0$ | 5 |
| $24.0<x\le24.5$ | 5 |
| $24.5<x\le25.0$ | 6 |
| $25.0<x\le25.5$ | 1 |
All class widths are $0.5$ g, so a standard frequency histogram is appropriate.
Constructing The Histogram
- Draw axes: horizontal axis labeled Mass (g), vertical axis labeled Frequency.
- Mark the class boundaries from 19.5 to 25.5 in steps of 0.5.
- For each interval, draw a bar of width 0.5 and height equal to the frequency.
- Ensure there are no gaps between bars.
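The construction steps above can be sketched in Python. This is one possible approach, assuming the matplotlib library is available for the drawing step; the function names `bar_geometry` and `draw_histogram` are made up for this example.

```python
# Chick-mass table from this section: class boundaries and frequencies.
boundaries = [19.5 + 0.5 * i for i in range(13)]  # 19.5, 20.0, ..., 25.5
frequencies = [3, 5, 1, 6, 2, 6, 4, 6, 5, 5, 6, 1]


def bar_geometry(boundaries, frequencies):
    """Return (left edge, width, height) for each bar of a frequency histogram."""
    return [
        (lower, upper - lower, freq)
        for lower, upper, freq in zip(boundaries, boundaries[1:], frequencies)
    ]


def draw_histogram(boundaries, frequencies):
    """Draw the histogram; requires matplotlib to be installed."""
    import matplotlib.pyplot as plt

    bars = bar_geometry(boundaries, frequencies)
    lefts = [left for left, _, _ in bars]
    widths = [width for _, width, _ in bars]
    heights = [height for _, _, height in bars]
    # align="edge" starts each bar at its lower boundary, so adjacent bars touch.
    plt.bar(lefts, heights, width=widths, align="edge", edgecolor="black")
    plt.xticks(boundaries)
    plt.xlabel("Mass (g)")
    plt.ylabel("Frequency")
    plt.show()
```

Calling `draw_histogram(boundaries, frequencies)` should display the chart. Because each bar starts at its lower class boundary and has width 0.5, the bars meet with no gaps.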
What You Can Conclude From The Table (Before Drawing)
- Four intervals share the highest frequency, 6 ($21.0<x\le21.5$, $22.0<x\le22.5$, $23.0<x\le23.5$, $24.5<x\le25.0$), while the heaviest interval $25.0<x\le25.5$ and the interval $20.5<x\le21.0$ each have frequency 1.
- So the histogram would show most chicks (40 of the 50) with masses between 21 g and 25 g, and relatively few at either end.
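A quick tally of the table can back up this kind of summary. The sketch below uses the chick-mass frequencies from this section; the variable names are illustrative only.

```python
# Chick-mass table from this section: (lower, upper, frequency) for lower < x <= upper.
table = [
    (19.5, 20.0, 3), (20.0, 20.5, 5), (20.5, 21.0, 1), (21.0, 21.5, 6),
    (21.5, 22.0, 2), (22.0, 22.5, 6), (22.5, 23.0, 4), (23.0, 23.5, 6),
    (23.5, 24.0, 5), (24.0, 24.5, 5), (24.5, 25.0, 6), (25.0, 25.5, 1),
]

total = sum(freq for _, _, freq in table)
highest = max(freq for _, _, freq in table)
# Every class that reaches the highest frequency.
modal_classes = [(lo, hi) for lo, hi, freq in table if freq == highest]
# Chicks with mass in 21.0 < x <= 25.0, found by adding whole classes.
central = sum(freq for lo, hi, freq in table if lo >= 21.0 and hi <= 25.0)

print(total, highest, modal_classes, central)
```

The tally gives a total of 50, four classes tied at frequency 6, and 40 chicks between 21 g and 25 g.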
How To Analyze A Distribution From A Histogram
A histogram makes it easier to describe a data distribution clearly. In words, you should usually comment on the following.
- Center and typical range
  - Identify where most of the bar area is. This gives a sense of "typical" values.
- Spread (variability)
  - Describe how wide the data range is (approximately from the lower boundary of the first non-empty class to the upper boundary of the last non-empty class).
- Skewness
  - Right-skewed (positive skew): a longer tail to the right, a few unusually large values.
  - Left-skewed (negative skew): a longer tail to the left, a few unusually small values.
- Modes and the modal class
  - The modal class is the class interval with the greatest frequency (or greatest frequency density if class widths are unequal).
- Unusual features
  - Look for gaps (empty intervals), unexpectedly small bars in the middle, or isolated bars far from the rest (possible outliers).
A strong "describe this histogram" response usually follows this order:
- shape (symmetrical or skewed, unimodal or multimodal),
- where the data cluster (quote an interval),
- spread (approximate range),
- any unusual features (gaps, outliers).
When A Graph Is Neither A Bar Chart Nor A Histogram
Some graphs look similar to histograms but are not correct statistical displays.
Typical reasons:
- The horizontal axis shows categories but the graph is called a histogram.
- The bars have gaps even though the data represent continuous intervals.
- The class widths are unequal, but the vertical axis is labeled "frequency" and heights are used as if they were frequencies.
- The axis labels or boundaries are unclear, making class intervals ambiguous.
- If class widths are unequal, a "frequency histogram" with bar heights equal to frequencies is misleading.
- You must use frequency density (or redesign the classes to have equal widths).
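A small worked example, using hypothetical numbers, shows why. Suppose the classes $0 < x \le 10$ and $10 < x \le 30$ each have frequency 20. Drawing both bars with height 20 makes the second bar twice as wide, so its area is twice as large and it appears to hold twice as many values. Using frequency density instead:

$$\begin{aligned}
0 < x \le 10: &\quad \text{frequency density} = \frac{20}{10} = 2, &\quad \text{area} &= 10 \times 2 = 20,\\
10 < x \le 30: &\quad \text{frequency density} = \frac{20}{20} = 1, &\quad \text{area} &= 20 \times 1 = 20.
\end{aligned}$$

The two bars now have equal areas, which matches their equal frequencies.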
Axis Breaks And The Zigzag Symbol
- Sometimes an axis does not start at zero, particularly when the data values are all far from zero.
- A zigzag (axis break) symbol indicates that some values on the scale are skipped.
- Axis breaks can help a graph fit neatly on a page, but they can also exaggerate apparent differences.
- Always check the scale before making conclusions.
- Histograms encourage quick statements like "most values are around…" or "the data are skewed…".
- These are useful, but they depend on choices you make when drawing the histogram, especially class width and class boundaries.
- Two schools compare student travel times using histograms.
  - One uses 2-minute class intervals and sees two clusters (walkers and bus riders).
  - The other uses 10-minute intervals and the distribution looks like one broad peak.
- Both are based on real data, but different grouping choices lead to different conclusions.
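The effect of class width can be reproduced with a short Python sketch. The travel times below are invented for illustration (they are not real survey data), and `grouped_frequencies` is a made-up helper that counts values in classes of the form lower < x ≤ upper.

```python
def grouped_frequencies(values, start, stop, width):
    """Count values in classes start < x <= start + width, and so on up to stop."""
    n_classes = (stop - start) // width
    counts = [0] * n_classes
    for x in values:
        for i in range(n_classes):
            lower = start + i * width
            if lower < x <= lower + width:
                counts[i] += 1
    return counts


# Hypothetical travel times in minutes: one cluster near 11-12, another near 17-18.
times = [5, 9, 11, 11, 11, 12, 12, 12, 15, 17, 17, 17, 18, 18, 18, 23, 27]

narrow = grouped_frequencies(times, 0, 30, 2)   # 2-minute classes
wide = grouped_frequencies(times, 0, 30, 10)    # 10-minute classes

for label, counts in (("2-minute classes", narrow), ("10-minute classes", wide)):
    print(label)
    for count in counts:
        print("#" * count)  # one row of a simple text histogram per class
```

With 2-minute classes the counts show two separate peaks of 6, at $10 < x \le 12$ and $16 < x \le 18$. With 10-minute classes the same data collapse to the counts 2, 13, 2, which looks like a single peak.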