- Not all linear modelling begins with an equation.
- Often you start with paired data (two variables measured together), plotted on a scatter diagram.
Scatter diagram
A graph of paired (bivariate) data points plotted on an $x$-$y$ plane to investigate the relationship between two variables.
When describing a scatter diagram, you should comment on:
- Form: linear or non-linear
- Direction: positive or negative
- Strength: strong, moderate, or weak association
These descriptions must be interpreted in the context of the variables (for example, "as study time increases, test score tends to increase").
- Correlation is not causation.
- A scatter diagram can show association, but it does not prove that one variable causes the other.
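Direction and strength can also be summarised with a number. The sketch below is an illustration, not part of the method described above: it assumes Pearson's correlation coefficient $r$ as the measure, and the cut-offs 0.7 and 0.4 for "strong" and "moderate" are arbitrary conventions chosen here. The study-time data is made up. Form (linear or non-linear) still has to be judged from the plot itself.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired data (assumed measure)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def describe(r, strong=0.7, moderate=0.4):
    """Turn r into words. The cut-offs are assumptions, not fixed rules."""
    if r > 0:
        direction = "positive"
    elif r < 0:
        direction = "negative"
    else:
        direction = "no clear direction"
    size = abs(r)
    if size >= strong:
        strength = "strong"
    elif size >= moderate:
        strength = "moderate"
    else:
        strength = "weak"
    return f"{strength}, {direction}"

# Hypothetical data: hours of study and test score for five students.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 70]

r = pearson_r(hours, scores)
print(round(r, 2), describe(r))
```

Even a value of $r$ close to 1 only describes association; it says nothing about whether study time causes the higher scores.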
A Line Of Best Fit Summarizes A Linear Trend
When a scatter diagram shows a reasonably strong linear correlation, you can draw a line of best fit (also called a trend line) to represent the overall linear relationship.
Line of best fit
A straight line drawn through the middle of a scatter plot so that the points are (roughly) evenly distributed above and below it. It is used to model the relationship between the variables and to predict values.
Key qualitative features when drawing a line of best fit by eye:
- there should be enough data points to support a relationship
- the line should pass through the point $(\bar{x},\bar{y})$, where $\bar{x}$ is the mean of the $x$-values and $\bar{y}$ is the mean of the $y$-values
- points should be roughly equally distributed on either side of the line
- A good "by eye" line usually does not connect the first and last point.
- Instead, it should represent the middle of the cloud of points and reflect the overall trend.
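The two checks above (passing through $(\bar{x},\bar{y})$ and balancing points on either side) can be tested numerically. This is a minimal sketch with made-up data; the function names `mean_point` and `balance` and the candidate line are hypothetical, chosen only for illustration.

```python
def mean_point(points):
    """Return (x_bar, y_bar), the mean of the x-values and of the y-values."""
    n = len(points)
    x_bar = sum(x for x, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    return x_bar, y_bar

def balance(points, m, c):
    """Count how many points lie above and below the line y = m*x + c."""
    above = sum(1 for x, y in points if y > m * x + c)
    below = sum(1 for x, y in points if y < m * x + c)
    return above, below

# Hypothetical data set of five points.
points = [(1, 6), (2, 6), (4, 12), (6, 14), (7, 18)]

x_bar, y_bar = mean_point(points)

# Candidate line drawn "by eye": gradient 2, with the intercept chosen
# so that the line passes through the mean point.
m = 2
c = y_bar - m * x_bar

above, below = balance(points, m, c)
print((x_bar, y_bar), above, below)
```

For this data the mean point is $(4, 11.2)$, and the candidate line leaves three points above and two below, which is as balanced as five points allow.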
Using A Line Of Best Fit For Prediction
Once you have a line of best fit, you can make predictions:
- interpolation: predict within the range of the data (generally more reliable)
- extrapolation: predict beyond the data range (generally less reliable)
Extrapolation can be very misleading if the relationship changes outside the observed range, or if the real situation has limits (for example, maximum speed, maximum capacity, costs that cannot go below zero).
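The distinction between interpolation and extrapolation depends only on whether the input lies inside the observed $x$-range. A minimal sketch, assuming a hypothetical model $y=2x+3$ fitted to data with $x$-values between 2 and 8 (the function name `predict` and all the numbers are illustrative):

```python
def predict(m, c, x, x_min, x_max):
    """Predict y from y = m*x + c and label the kind of prediction."""
    if x_min <= x <= x_max:
        kind = "interpolation"   # inside the observed range
    else:
        kind = "extrapolation"   # outside the observed range: use with caution
    return m * x + c, kind

# Hypothetical model and observed x-range.
m, c = 2, 3
x_min, x_max = 2, 8

inside = predict(m, c, 5, x_min, x_max)
outside = predict(m, c, 20, x_min, x_max)
print(inside, outside)
```

The arithmetic is the same in both cases; what changes is how much the result can be trusted.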
Building A Linear Model From Data
When a linear trend is plausible, the model often takes the form $y=mx+c$. In practice you might:
- plot the scatter diagram
- decide whether a linear model is appropriate (form, direction, strength)
- draw a line of best fit and select two clear points on the line (not necessarily data points)
- calculate $m$ using $m=\dfrac{\Delta y}{\Delta x}$
- substitute one point into $y=mx+c$ to find $c$
- interpret $m$ and $c$ in context
- use the model carefully for prediction
Consider the following example:
- Suppose a line of best fit passes through $(2, 7)$ and $(8, 19)$.
- Gradient: $$m=\frac{19-7}{8-2}=\frac{12}{6}=2$$
- Use $(2,7)$ to find $c$: $$7=2(2)+c \Rightarrow c=3$$
- Model: $y=2x+3$.
- Interpretation: $y$ increases by 2 units for every 1 unit increase in $x$, and when $x=0$, $y$ is about 3.
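The same calculation can be written as a short function. This is a sketch of the two-point method used in the example above; the name `line_through` is hypothetical.

```python
def line_through(p1, p2):
    """Return (m, c) for the line y = m*x + c through two points."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        # A vertical line has no gradient and cannot be written as y = m*x + c.
        raise ValueError("the two points must have different x-values")
    m = (y2 - y1) / (x2 - x1)   # gradient: change in y over change in x
    c = y1 - m * x1             # rearranged from y1 = m*x1 + c
    return m, c

m, c = line_through((2, 7), (8, 19))
print(m, c)
```

Choosing two points that are far apart on the line, as in the example, reduces the effect of small reading errors on the gradient.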
Validating And Interpreting A Model Matters As Much As Solving It
In modelling you should always ask:
- Does a linear model make sense for this situation (constant rate of change)?
- Are there outliers, and do they have an explanation?
- Is prediction being made within the data range?
- Are the units and variable meanings consistent?
- Does the mathematical solution match what is possible in real life?
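One way to look for outliers is to compare each data point with the model's prediction. The sketch below is an illustration under assumptions: the data is made up, the model $y=2x+3$ is hypothetical, and the threshold of 3 units is an arbitrary choice that would depend on the context and units of a real problem.

```python
def residuals(points, m, c):
    """Residual = observed y minus the y predicted by y = m*x + c."""
    return [y - (m * x + c) for x, y in points]

def flag_outliers(points, m, c, threshold):
    """Return points whose residual is larger in size than the threshold."""
    return [
        point
        for point, r in zip(points, residuals(points, m, c))
        if abs(r) > threshold
    ]

# Hypothetical data and model.
points = [(2, 7.5), (4, 10.5), (5, 13), (6, 22), (8, 19)]
m, c = 2, 3

res = residuals(points, m, c)
suspects = flag_outliers(points, m, c, threshold=3)  # threshold is an assumption
print(res, suspects)
```

A flagged point is only a candidate for investigation. Whether it is a measurement error or a genuine feature of the situation has to be decided from the context.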
Check your understanding:
- In $y=-3x+12$, what do $-3$ and $12$ mean on a graph?
- Two lines have the same gradient but different intercepts. How many solutions does the system have?
- A scatter plot shows a weak relationship. Should you rely on a line of best fit for prediction? Explain briefly.
Modelling Connects Representation, Logic, And Decision-Making
- Linear modelling combines representation (graphs, equations, tables), logic (equivalence transformations that preserve solutions), and interpretation (explaining what results mean).
- A well-chosen linear model can support better decisions, but only if it is checked against reality and used within its limitations.
Consider the following case:
- A company uses a linear model to predict monthly electricity cost from production hours.
- The model works well during normal operation, but fails during peak seasons when overtime rates and extra cooling systems increase costs faster than expected.
- This highlights a common modelling lesson: a relationship can be approximately linear only within certain conditions.