Correlation Coefficient Summarizes Linear Relationship
- When you collect bivariate data (paired values $(x,y)$), a scatter plot can suggest whether the variables are related.
- A more precise way to describe this relationship is the correlation coefficient, usually written as $r$.
- It measures how closely the points lie to a straight line.
Bivariate data
Data consisting of pairs of values $(x, y)$ where two quantitative variables are measured for each case.
Correlation coefficient (Pearson’s r)
A number between $-1$ and $1$ that measures the strength and direction of a linear relationship between two variables.
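For reference, one standard way to write Pearson's $r$ for $n$ data pairs is in terms of deviations from the two means. The symbols $x_i$, $y_i$, $\bar{x}$, $\bar{y}$ and $n$ are notation introduced here for illustration, not taken from the text above:

$$r=\frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2\,\sum_{i=1}^{n}(y_i-\bar{y})^2}}$$

The numerator is positive when $x$ and $y$ tend to lie on the same side of their means together and negative when they tend to lie on opposite sides. The denominator rescales the result so that it always falls between $-1$ and $1$.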
Interpreting The Value Of r: Direction And Strength
Two features are encoded in $r$:
- Direction (sign): whether $y$ tends to increase or decrease as $x$ increases.
- Strength (magnitude): how closely the points follow a straight line.
Always: $$-1\le r\le 1$$
Direction: The Sign Of r
- $r>0$ means a positive correlation: larger $x$ values tend to come with larger $y$ values.
- $r<0$ means a negative correlation: larger $x$ values tend to come with smaller $y$ values.
- $r=0$ means no linear correlation (but other relationships may still exist).
Strength: The Size Of |r|
- $|r|\approx 1$: points lie very close to a straight line.
- $|r|\approx 0$: points form a loose cloud with little linear pattern.
- "Strong" correlation does not mean "steep" slope.
- Points around a steep line can give $|r|\approx 1$ or a much smaller $|r|$, depending on how tightly they cluster around the line; steepness is not what $r$ measures.
- A value of $r=0$ does not mean "no relationship."
- It means "no linear relationship."
- For example, data lying on a symmetric U-shaped curve can have $r\approx 0$ even though $y$ is clearly related to $x$.
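The U-shaped case can be checked numerically. The sketch below is a minimal illustration, not a prescribed method: the helper `pearson_r` and the data (points placed exactly on $y=x^2$, symmetric about $x=0$) are made up for this example.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r computed from deviations about the two means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: every point lies exactly on the curve y = x^2.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r_curve = pearson_r(xs, ys)
print(abs(r_curve) < 1e-12)  # True: no linear correlation at all
```

Here $y$ is completely determined by $x$, yet the positive and negative products on either side of the mean cancel, so $r$ comes out as zero.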
Scatter Diagrams Show What r Cannot
- A single number cannot show everything.
- A scatter diagram lets you check whether it is reasonable to summarize the relationship with a correlation coefficient.
- You should look for:
- Linearity: do points follow a roughly straight trend?
- Outliers: are there points far from the pattern?
- Clusters: are there separate groups (which can distort $r$)?
- Before quoting $r$, always sketch or generate a scatter plot.
- Correlation is a summary of the plot, not a replacement for it.
Outliers Can Change $r$ Dramatically
Because $r$ is built from each point's deviations from the mean of $x$ and the mean of $y$, a single unusual point far from both means can pull the correlation strongly up or down.
If most points form a weak cloud, but one extreme point lies far to the right and high up, it can create a moderate positive $r$ even though the main cluster shows little relationship.
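The effect of one extreme point can be sketched with invented numbers. In the example below, the helper `pearson_r`, the nine-point grid and the added point $(5, 5)$ are all assumptions chosen for illustration; the grid is built so that its own correlation is exactly zero.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r computed from deviations about the two means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical main cluster: a 3-by-3 grid with no linear trend.
cloud_x = [1, 2, 3, 1, 2, 3, 1, 2, 3]
cloud_y = [1, 1, 1, 2, 2, 2, 3, 3, 3]

# The same cluster plus one point far to the right and high up.
with_outlier_x = cloud_x + [5]
with_outlier_y = cloud_y + [5]

r_cloud = pearson_r(cloud_x, cloud_y)
r_outlier = pearson_r(with_outlier_x, with_outlier_y)
print(round(r_cloud, 2))    # 0.0
print(round(r_outlier, 2))  # 0.57
```

Nine of the ten points are unchanged, but the single added point moves $r$ from zero to a moderate positive value, which is why the scatter plot should be inspected before $r$ is quoted.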
Attitudes To News Media
- Ten students were scored for their attitude to social media as a news source ($x$) and television as a news source ($y$).
- Higher scores mean higher satisfaction.
| Student | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Social media $x$ | 5 | 0 | 3 | 1 | 2 | 2 | 5 | 3 | 5 | 4 |
| Television $y$ | 1 | 2 | 1 | 3 | 3 | 4 | 3 | 1 | 0 | 2 |
A scatter plot of these data is shown below.
For the values in the table, the correlation coefficient is $r\approx -0.45$.
This suggests a weak-to-moderate negative linear relationship: students who score higher for social media tend to score somewhat lower for television, but the points are still quite scattered.
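One way to check a quoted value of $r$ against the table is to compute it directly from the data. The sketch below assumes the table values are exactly as printed; the helper `pearson_r` and the variable names are introduced here for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r computed from deviations about the two means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Scores copied from the table (students 1 to 10).
social = [5, 0, 3, 1, 2, 2, 5, 3, 5, 4]
tv = [1, 2, 1, 3, 3, 4, 3, 1, 0, 2]

r = pearson_r(social, tv)
print(round(r, 2))  # -0.45
```

With these values the means are $3$ and $2$, the sum of products of deviations is $-9$, and the sums of squared deviations are $28$ and $14$, so $r=-9/\sqrt{28\times 14}\approx -0.45$.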
- When you "comment on your findings," include both direction and strength, and refer back to the scatter plot.
- A good sentence structure is: "There is a [weak/moderate/strong] [positive/negative] linear correlation because the points are [tightly clustered/quite scattered] around a line that slopes [up/down]."
Interpretation Should Match The Context
Even if you can compute $r$ perfectly, the meaning still depends on the situation:
- Is a linear relationship reasonable?
- Are the data reliable and measured consistently?
- Does it make sense to generalize beyond the sample?
- Never write "$x$ causes $y$" based only on a correlation coefficient. You need a study design or additional evidence for causation.
If the scatter plot looks curved, consider transforming variables or using a different model instead of relying on $r$.
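As a rough illustration of transforming a variable, the sketch below uses invented data that doubles at each step and compares $r$ before and after taking logarithms of $y$. The logarithm is only one possible transformation, and the helper `pearson_r` and the data are assumptions made for this example.

```python
from math import log2, sqrt

def pearson_r(xs, ys):
    """Pearson's r computed from deviations about the two means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical curved data: y doubles each time x increases by 1.
xs = [0, 1, 2, 3, 4, 5]
ys = [1, 2, 4, 8, 16, 32]

r_raw = pearson_r(xs, ys)                     # x against y
r_log = pearson_r(xs, [log2(y) for y in ys])  # x against log2(y)
print(round(r_raw, 2))  # 0.91
print(round(r_log, 2))  # 1.0
```

The raw data give a fairly high $r$ even though the plot is clearly curved, while the transformed data lie exactly on a straight line. A high $r$ on its own therefore does not confirm that a linear model is appropriate.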
- If $r=0.82$, what can you say about direction and strength?
- If $r=-0.05$, is it fair to say there is "no relationship"? Why or why not?
- Give one reason why a scatter plot is essential even when you know $r$.