The Central Limit Theorem
The Central Limit Theorem (CLT) is a fundamental concept in probability theory and statistics that describes the behavior of sample means for large sample sizes.
Note: It's a powerful tool that allows statisticians to make inferences about populations based on sample data, even when the underlying population distribution is unknown or non-normal.
Linear Combinations of Normal Random Variables
Before diving into the CLT, it's important to understand a related property of normal distributions:
Note: A linear combination of independent normal random variables is itself normally distributed.
This means that if we have $n$ independent normal random variables $X_1, X_2, \dots, X_n$, and we form a new variable $Y$ as a linear combination of these:
$$Y = a_1X_1 + a_2X_2 + \cdots + a_nX_n$$
where $a_1, a_2, \dots, a_n$ are constants, then $Y$ will also follow a normal distribution.
Example: Suppose we have two independent normal random variables: $X_1 \sim N(\mu_1, \sigma_1^2)$ and $X_2 \sim N(\mu_2, \sigma_2^2)$.
If we create a new variable $Y = 2X_1 - 3X_2$, then $Y$ will also be normally distributed with:
$$\mu_Y = 2\mu_1 - 3\mu_2$$
$$\sigma_Y^2 = 4\sigma_1^2 + 9\sigma_2^2$$
The variances add even though $X_2$ is subtracted, because each coefficient is squared: $(-3)^2 = 9$.
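Hint: The sample mean is a linear combination of the observations, so this property explains why the sample mean of normally distributed data is exactly normal for any sample size. The CLT extends this result, approximately, to populations that are not normal.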
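The example $Y = 2X_1 - 3X_2$ can be checked numerically. The following is a minimal sketch using only Python's standard library; the parameter values ($\mu_1 = 10$, $\sigma_1 = 2$, $\mu_2 = 4$, $\sigma_2 = 1$) and the function names are assumptions chosen for illustration, not values from the text.

```python
import random
import statistics

# Hypothetical parameters, chosen only for illustration.
MU1, SIGMA1 = 10.0, 2.0   # X1 ~ N(10, 2^2)
MU2, SIGMA2 = 4.0, 1.0    # X2 ~ N(4, 1^2)


def theoretical_moments(a1, mu1, sigma1, a2, mu2, sigma2):
    """Mean and variance of Y = a1*X1 + a2*X2 for independent X1, X2."""
    mean = a1 * mu1 + a2 * mu2
    variance = a1**2 * sigma1**2 + a2**2 * sigma2**2  # coefficients are squared
    return mean, variance


def simulate_y(n_draws, seed=0):
    """Draw n_draws values of Y = 2*X1 - 3*X2 with a fixed seed."""
    rng = random.Random(seed)
    return [
        2 * rng.gauss(MU1, SIGMA1) - 3 * rng.gauss(MU2, SIGMA2)
        for _ in range(n_draws)
    ]


mean_y, var_y = theoretical_moments(2, MU1, SIGMA1, -3, MU2, SIGMA2)
draws = simulate_y(100_000)

print("theory:    ", mean_y, var_y)
print("simulation:", statistics.fmean(draws), statistics.variance(draws))
```

With these assumed values the formulas give $\mu_Y = 8$ and $\sigma_Y^2 = 25$, and the simulated mean and variance should land close to those figures.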
Statement of the Central Limit Theorem
The Central Limit Theorem states that:
Note: For a sufficiently large sample size, the distribution of the sample mean is approximately normal, regardless of the shape of the underlying population distribution, provided the population has a finite variance.
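More formally, if we draw independent random samples of size $n$ from a population with mean $\mu$ and finite standard deviation $\sigma$, then for large $n$ the sample mean is approximately distributed as:
$$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right) \quad \text{(approximately)}$$
where $\bar{X}$ is the sample mean. Its standard deviation, $\sigma/\sqrt{n}$, is called the standard error of the mean.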
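A short simulation can illustrate the statement. The sketch below assumes an exponential population with $\mu = 1$ and $\sigma = 1$ (a clearly non-normal, right-skewed choice made here for illustration) and a sample size of $n = 50$. It checks that the sample means have mean close to $\mu$ and variance close to $\sigma^2/n$, and that roughly 95% of them fall within 1.96 standard errors of $\mu$, as a normal distribution would predict.

```python
import random
import statistics
from math import sqrt

N = 50              # sample size (assumed for illustration)
N_SAMPLES = 20_000  # number of repeated samples
MU, SIGMA = 1.0, 1.0  # mean and sd of an exponential population with rate 1


def sample_means(n, n_samples, seed=0):
    """Means of n_samples independent samples of size n from Exp(1)."""
    rng = random.Random(seed)
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(n_samples)
    ]


means = sample_means(N, N_SAMPLES)
standard_error = SIGMA / sqrt(N)

# Fraction of sample means within 1.96 standard errors of the population mean.
within = sum(abs(m - MU) <= 1.96 * standard_error for m in means) / N_SAMPLES

print("mean of sample means:    ", statistics.fmean(means))     # theory: 1.0
print("variance of sample means:", statistics.variance(means))  # theory: 0.02
print("fraction within 1.96 SE: ", within)                      # normal theory: 0.95
```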
Sample Size Considerations
A critical question is: How large should $n$ be for the CLT to apply? The answer depends on the underlying population distribution:
- For symmetric, unimodal distributions, the normal approximation is often good even for fairly small samples, and $n \geq 30$ is usually more than sufficient.
- For highly skewed or multimodal distributions, larger sample sizes may be needed.
For IB exam purposes, a sample size of $n > 30$ is generally considered sufficient for applying the CLT.
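The effect of sample size on a skewed population can be sketched by measuring how asymmetric the distribution of sample means remains as $n$ grows. The example below again assumes an exponential population with rate 1 (an illustrative choice), and the sample sizes 2, 30, and 200 are likewise assumptions. A skewness near 0 indicates a shape close to the symmetric normal curve.

```python
import random
import statistics


def skewness(values):
    """Population skewness: third central moment divided by variance^(3/2)."""
    mean = statistics.fmean(values)
    m2 = statistics.fmean([(v - mean) ** 2 for v in values])
    m3 = statistics.fmean([(v - mean) ** 3 for v in values])
    return m3 / m2 ** 1.5


def sample_mean_skewness(n, n_samples=10_000, seed=0):
    """Skewness of the means of n_samples samples of size n from Exp(1)."""
    rng = random.Random(seed)
    means = [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(n_samples)
    ]
    return skewness(means)


skew_by_n = {n: sample_mean_skewness(n) for n in (2, 30, 200)}
for n, s in skew_by_n.items():
    print(f"n = {n:>3}: skewness of sample means = {s:.3f}")
```

For this population the skewness of the sample mean is $2/\sqrt{n}$ in theory, so the printed values should shrink towards 0 as $n$ increases, without vanishing at $n = 30$.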
Using the Z-Table with the Central Limit Theorem
- One of the key applications of the CLT is determining probabilities and making statistical inferences using the z-table (also called the standard normal table).
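- When the population standard deviation $\sigma$ is known, the sample mean is first standardized as $Z = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}$, and the z-table then gives the probability of the sample mean falling within a certain range.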
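As a worked illustration, the sketch below standardizes a sample mean using the standard error $\sigma/\sqrt{n}$ and looks up the resulting probability. The numbers are hypothetical: a population with $\mu = 50$ and known $\sigma = 12$, a sample of size $n = 36$, and the question "what is $P(\bar{X} > 53)$?". Python's `statistics.NormalDist` stands in for the printed z-table.

```python
from math import sqrt
from statistics import NormalDist

mu = 50.0     # population mean (hypothetical)
sigma = 12.0  # known population standard deviation (hypothetical)
n = 36        # sample size, above the n > 30 guideline

standard_error = sigma / sqrt(n)   # 12 / 6 = 2.0
z = (53.0 - mu) / standard_error   # standardize the sample mean 53
p_above = 1 - NormalDist().cdf(z)  # P(X_bar > 53) = P(Z > z)

print(f"z = {z:.2f}, P(X_bar > 53) = {p_above:.4f}")
# prints: z = 1.50, P(X_bar > 53) = 0.0668
```

Reading $z = 1.50$ from a z-table gives $\Phi(1.50) \approx 0.9332$, so the upper-tail probability is $1 - 0.9332 = 0.0668$, matching the computed value.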