- IB
- Question Type 4: Calculating correlation coefficients with and without outliers
For the dataset , which includes an outlier:
(a) calculate the Pearson product-moment correlation coefficient, $r$;
(b) calculate Spearman's rank correlation coefficient, $r_s$;
(c) compare the two values and comment on the effect of the outlier .
[6]
Calculate the Pearson correlation coefficient for the following dataset, which includes an extreme outlier:
[5]
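As a rough illustration of the calculation asked for above, the sketch below computes the Pearson coefficient directly from its definition, $r = S_{xy} / \sqrt{S_{xx} S_{yy}}$. The function name `pearson_r` and the data are assumptions made for illustration only; the numbers are hypothetical and are not the dataset from the question.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient of paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of deviations from the means
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical data (NOT the question's dataset): five points on the
# line y = 2x, plus one extreme outlier at (6, 50).
x = [1, 2, 3, 4, 5, 6]
y = [2, 4, 6, 8, 10, 50]

r_with = pearson_r(x, y)                 # outlier included, about 0.764
r_without = pearson_r(x[:-1], y[:-1])    # outlier removed, exactly linear

print(round(r_with, 3), round(r_without, 3))
```

With this made-up data, a single extreme point pulls $r$ from 1 down to roughly 0.764, which is the kind of effect the questions in this set ask about.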
Calculate the percentage change in the Pearson correlation coefficient when the outlier is removed, using and .
[3]
For the modified dataset where the outlier’s -value is reduced to 20, i.e.
calculate the Pearson correlation coefficient .
[4]
For the dataset without the outlier , calculate both the Pearson product-moment correlation coefficient $r$ and Spearman's rank correlation coefficient $r_s$. Compare the two values and comment.
[4]
Explain why the Spearman rank correlation coefficient is less sensitive to an extreme outlier than the Pearson correlation coefficient, using mathematical reasoning based on ranks versus raw values.
[4]
Calculate the Pearson correlation coefficient for the dataset after removing the outlier:
Give your answer to three significant figures.
[4]
Calculate the Spearman rank correlation coefficient for the dataset after removing the outlier:
[4]
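For a Spearman calculation like the one above, one possible sketch uses the standard no-ties formula $r_s = 1 - \dfrac{6 \sum d^2}{n(n^2 - 1)}$, where $d$ is the difference between the two ranks of each data pair. The helper names `ranks` and `spearman_rs` and the data are hypothetical, and the code assumes there are no tied values (ties would need averaged ranks, which this sketch does not handle).

```python
def ranks(values):
    """Rank values from 1 (smallest) upward; assumes no tied values."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

def spearman_rs(xs, ys):
    """Spearman's rank correlation via the no-ties d-squared formula."""
    n = len(xs)
    sum_d_squared = sum(
        (rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys))
    )
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))

# Hypothetical data (NOT the question's dataset)
x = [1, 2, 3, 4, 5]
y = [3, 1, 4, 2, 5]

rs = spearman_rs(x, y)
print(rs)  # rank differences are -2, 1, -1, 2, 0, so sum of d^2 is 10
```

Here $\sum d^2 = 10$ and $n = 5$, giving $r_s = 1 - 60/120 = 0.5$.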
A researcher calculates the Pearson correlation coefficient, $r$, for three datasets based on the same nine base observations, but with different treatments of a tenth point .
The calculated values of $r$ for each scenario are: (a) including the point , (b) including the point , (c) excluding the tenth point entirely,
Compare the Pearson correlation coefficients for the three datasets and comment on how each treatment of the outlier affects the value of $r$.
[4]
This question assesses the calculation of Spearman's rank correlation coefficient and the understanding of its robustness against outliers compared to the Pearson product-moment correlation coefficient.
A student collects bivariate data to investigate the relationship between two variables, and . The following dataset is obtained:
Calculate Spearman's rank correlation coefficient, $r_s$, for this dataset.
[4]
Comment on the effect of the outlier on the value of $r_s$ and explain why $r_s$ might be a more appropriate measure of correlation for this dataset than the Pearson product-moment correlation coefficient, $r$.
[2]
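The robustness point in the question above can be illustrated with a small sketch: ranking replaces each raw value by its position in the ordered list, so the size of an extreme value no longer matters, only the fact that it is the largest. The helper `rank_positions` and both lists are hypothetical and are not taken from the question; the code assumes no tied values.

```python
def rank_positions(values):
    """Return the rank (1 = smallest) of each value; assumes no ties."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

# Hypothetical y-values: the same five points, with the last value
# either extreme (50) or reduced (20).
y_extreme = [2, 4, 6, 8, 10, 50]
y_reduced = [2, 4, 6, 8, 10, 20]

ranks_extreme = rank_positions(y_extreme)
ranks_reduced = rank_positions(y_reduced)

# Both lists give ranks 1 to 6 in order, so any statistic computed from
# ranks alone (such as Spearman's coefficient) is identical for the two.
print(ranks_extreme == ranks_reduced)
```

The Pearson coefficient, by contrast, is built from deviations of the raw values from their means, so changing 50 to 20 changes every term that involves that point.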