The Poisson Distribution
The Poisson distribution is a discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time or space, assuming these events occur with a known constant mean rate and independently of the time since the last event.
Note: Named after the French mathematician Siméon Denis Poisson, this distribution plays a central role in probability theory and statistics, particularly in modeling rare events.
Mathematical Definition
The Poisson distribution is defined by its probability mass function:
$$P(X = k) = \frac{e^{-\lambda}\lambda^k}{k!}$$
Where:
- $X$ is the random variable representing the number of events
- $k$ is the number of occurrences $(k = 0, 1, 2, ...)$
- $e$ is Euler's number (approximately 2.71828)
- $\lambda$ is the mean number of occurrences in the given interval, with $\lambda > 0$
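As a minimal sketch, the probability mass function above can be evaluated directly with Python's standard library. The function name `poisson_pmf` and the rate `lam = 2.0` are illustrative choices, not part of these notes.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Return P(X = k) for X ~ Poisson(lam), using the formula e^(-lam) * lam^k / k!."""
    if k < 0:
        return 0.0  # a count can never be negative
    return math.exp(-lam) * lam**k / math.factorial(k)


# Probabilities of 0, 1, 2, 3 and 4 events for an assumed rate of 2 per interval
probs = [poisson_pmf(k, 2.0) for k in range(5)]
for k, p in enumerate(probs):
    print(f"P(X = {k}) = {p:.4f}")
```

Summing `poisson_pmf(k, lam)` over all $k$ gives 1, which is a quick sanity check that the formula defines a valid distribution.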
Mean and Variance
One of the remarkable properties of the Poisson distribution is that its mean and variance are equal:
$$\text{Mean} = \text{Variance} = \lambda$$
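Hint: This property, known as equidispersion, is a hallmark of the Poisson distribution and can help in judging whether a dataset is consistent with a Poisson model. If the sample mean and sample variance are roughly equal, a Poisson model is plausible; if they differ substantially, it is probably not appropriate. Equal mean and variance alone do not prove that the data are Poisson.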
Note: The equality of mean and variance is a key characteristic of the Poisson distribution. By contrast, a binomial distribution has mean $np$ and variance $np(1-p)$, so its variance is smaller than its mean whenever $p > 0$.
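The equality of mean and variance can be checked numerically. The sketch below sums $k \cdot P(X = k)$ and $(k - \text{mean})^2 \cdot P(X = k)$ over a truncated range of $k$; the function name, the rate `3.5`, and the cutoff `k_max = 100` are assumptions chosen for illustration (the omitted tail is negligible for this rate).

```python
import math


def poisson_mean_variance(lam: float, k_max: int = 100) -> tuple[float, float]:
    """Approximate the mean and variance of Poisson(lam) from its PMF, truncated at k_max."""
    pmf = [math.exp(-lam) * lam**k / math.factorial(k) for k in range(k_max + 1)]
    mean = sum(k * p for k, p in enumerate(pmf))
    variance = sum((k - mean) ** 2 * p for k, p in enumerate(pmf))
    return mean, variance


mean, variance = poisson_mean_variance(3.5)
print(f"mean = {mean:.6f}, variance = {variance:.6f}")  # both should be close to 3.5
```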
Conditions for Poisson Distribution
For a situation to be appropriately modeled by a Poisson distribution, two key conditions must be met:
- Independence of Events:
  - Each event must occur independently of all other events.
  - This means that the occurrence of one event does not affect the probability of another event occurring.
- Uniform Average Rate:
  - Events must occur at a constant average rate within the period of interest.
  - This rate should not change over time or space within the interval being considered.
Consider the number of customers arriving at a small coffee shop between 2 PM and 3 PM on weekdays. If, on average, 20 customers arrive during this hour, and their arrivals are independent of each other and occur at a roughly constant rate, this scenario could be modeled using a Poisson distribution with $\lambda = 20$.
Non-Overlapping Intervals in Poisson Distribution
An essential condition for the Poisson distribution is that events occurring in non-overlapping intervals are independent. This means that the number of events in one interval does not affect the number of events in another interval.
Example:
- If we model the arrival of buses at a station using a Poisson distribution, knowing that two buses arrived in the first hour should not change the probability of how many arrive in the next hour.
- Each time period is considered separately, and events occurring in one do not influence another.
Sum of Independent Poisson Distributions
An important property of the Poisson distribution is that the sum of two independent Poisson random variables also follows a Poisson distribution, with a mean equal to the sum of the two individual means.
Mathematically, if $X \sim \text{Poisson}(\lambda_1)$ and $Y \sim \text{Poisson}(\lambda_2)$ are independent, then:
$$X + Y \sim \text{Poisson}(\lambda_1 + \lambda_2)$$
Note: This property can be extremely useful in modeling complex systems that are composed of multiple independent Poisson processes.
Example: If a hospital's emergency room receives patients from two independent sources, local accidents (average 5 per hour) and general illnesses (average 8 per hour), the total number of patients arriving per hour would follow a Poisson distribution with $\lambda = 5 + 8 = 13$.
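The additive property can be verified numerically for the emergency room figures. The sketch below computes $P(X + Y = n)$ by summing over every way of splitting $n$ arrivals between the two sources and compares it with the Poisson(13) probability. The helper names are hypothetical, and this is a numerical check rather than a proof.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Return P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def pmf_of_sum(n: int, lam1: float, lam2: float) -> float:
    """Return P(X + Y = n) for independent X ~ Poisson(lam1) and Y ~ Poisson(lam2).

    Every split n = i + (n - i) is a separate way of reaching a total of n.
    """
    return sum(poisson_pmf(i, lam1) * poisson_pmf(n - i, lam2) for i in range(n + 1))


# Accidents arrive at 5 per hour, illnesses at 8 per hour
for n in (10, 13, 16):
    combined = pmf_of_sum(n, 5.0, 8.0)
    direct = poisson_pmf(n, 13.0)
    print(f"n = {n}: sum of sources = {combined:.6f}, Poisson(13) = {direct:.6f}")
```

The two columns should agree up to floating-point rounding for every $n$.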
Selecting Between Distributions
When faced with a real-world scenario, students should be able to determine whether a Poisson, normal, or binomial distribution is most appropriate. Here are some guidelines:
- Poisson Distribution: Use when dealing with the number of occurrences of rare events in a fixed interval of time or space, where events are independent and occur at a constant average rate.
- Normal Distribution: Appropriate for continuous data that is symmetrically distributed around a mean, often resulting from the sum of many small, independent effects.
- Binomial Distribution: Suitable for modeling the number of successes in a fixed number of independent trials, each with the same probability of success.
Students often confuse the Poisson and binomial distributions. Remember, the Poisson distribution deals with the number of occurrences in a fixed interval, while the binomial distribution concerns the number of successes in a fixed number of trials.
Tip: When working with the Poisson distribution, remember that it is discrete even though it is often used to count events over a continuous interval of time or space. Probabilities attach to whole-number counts ($0, 1, 2, \ldots$), never to fractional values.
Example: In a large publishing house, the average number of typos per page in their books is 0.1. To find the probability of having exactly 2 typos on a randomly selected page, we can use the Poisson distribution with $\lambda = 0.1$:
$$P(X = 2) = \frac{e^{-0.1}(0.1)^2}{2!} \approx 0.00452$$
This means there's about a 0.452% chance of finding exactly 2 typos on a randomly selected page.
Cumulative Distribution Function (CDF) for Poisson Distribution
While the probability mass function (PMF) provides the probability of a specific number of events occurring, the Cumulative Distribution Function (CDF) gives the probability that at most a certain number of events will occur.
It is defined as:
$$
P(X \leq k)=\sum_{i=0}^k \frac{e^{-\lambda} \lambda^i}{i!}
$$
If the number of calls received by a call center follows Poisson(4), what is the probability that they receive at most 3 calls in a given hour? Using the CDF, we sum up the probabilities for X = 0, 1, 2, and 3 to find the answer.
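The call center question can be worked through in a few lines. This is a minimal sketch of the CDF as a running sum of PMF terms; the function name `poisson_cdf` is an illustrative choice.

```python
import math


def poisson_cdf(k: int, lam: float) -> float:
    """Return P(X <= k) for X ~ Poisson(lam) by summing the PMF from 0 to k."""
    return sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k + 1))


# Calls per hour follow Poisson(4); probability of at most 3 calls
p_at_most_3 = poisson_cdf(3, 4.0)
print(f"P(X <= 3) = {p_at_most_3:.4f}")  # about 0.4335
```

Written out, this is $e^{-4}\left(1 + 4 + 8 + \tfrac{32}{3}\right) \approx 0.4335$. The complement, $1 - P(X \leq 3)$, gives the probability of receiving more than 3 calls.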
Link Between Poisson and Exponential Distributions
- The Exponential Distribution is closely related to the Poisson distribution.
- While the Poisson distribution describes the number of events occurring in a fixed time interval, the Exponential Distribution describes the time between events in a Poisson process.
If events occur at an average rate $\lambda$ per unit time, then the time $T$ between two consecutive events follows an exponential distribution with rate $\lambda$, whose cumulative distribution function for $t \geq 0$ is:
$$
P(T \leq t)=1-e^{-\lambda t}
$$
If buses arrive at a station according to a Poisson process with an average rate of 5 per hour, the waiting time until the next bus (measured in hours) follows an Exponential(5) distribution, which gives the probability that the next bus arrives within a certain time.
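To make the bus example concrete, the sketch below evaluates $P(T \leq t) = 1 - e^{-\lambda t}$ for a wait of 12 minutes, a value assumed here for illustration. It also shows the link to the Poisson distribution: waiting longer than $t$ is the same event as seeing zero arrivals in an interval of length $t$, where the count is Poisson with mean $\lambda t$.

```python
import math


def prob_next_event_within(t: float, rate: float) -> float:
    """Return P(T <= t) for an exponential waiting time with the given rate."""
    return 1.0 - math.exp(-rate * t)


rate = 5.0      # buses per hour
t = 12 / 60     # 12 minutes expressed in hours (assumed example value)

p_within = prob_next_event_within(t, rate)
print(f"P(next bus within 12 minutes) = {p_within:.4f}")  # 1 - e^(-1), about 0.6321

# Same quantity via the Poisson count: P(no buses in t hours) = e^(-rate * t)
p_no_buses = math.exp(-rate * t) * (rate * t) ** 0 / math.factorial(0)
print(f"1 - P(no buses in 12 minutes)  = {1.0 - p_no_buses:.4f}")
```

Note that $t$ must use the same time unit as the rate, which is why the 12 minutes are converted to hours.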
Limitations and Considerations
While the Poisson distribution is powerful, it's important to recognize its limitations:
- It assumes events occur independently, which may not always be true in real-world scenarios.
- The assumption of a constant rate may break down over long periods or in changing environments.
- For large values of $\lambda$, the Poisson distribution can be approximated by a normal distribution with mean $\lambda$ and variance $\lambda$, which might be easier to work with in some cases.
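The quality of the normal approximation can be inspected numerically. The sketch below compares the exact Poisson CDF with a normal CDF of mean $\lambda$ and standard deviation $\sqrt{\lambda}$. The values `lam = 100` and `k = 110` are assumptions, and the continuity correction (evaluating the normal CDF at $k + 0.5$) is a standard refinement added here rather than something stated in these notes.

```python
import math


def poisson_cdf(k: int, lam: float) -> float:
    """Return the exact P(X <= k) for X ~ Poisson(lam)."""
    return sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k + 1))


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """Return P(Z <= x) for Z ~ Normal(mu, sigma^2), using the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


lam = 100.0  # assumed large rate
k = 110      # assumed threshold

exact = poisson_cdf(k, lam)
# Continuity correction: the discrete event X <= 110 is matched to the continuous range up to 110.5
approx = normal_cdf(k + 0.5, lam, math.sqrt(lam))

print(f"exact Poisson  P(X <= {k}) = {exact:.4f}")
print(f"normal approx  P(X <= {k}) = {approx:.4f}")
```

Repeating the comparison with a small rate such as `lam = 2.0` shows a noticeably larger gap, which is why the approximation is reserved for large $\lambda$.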
The IB syllabus does not require formal proofs of means and variances for probability distributions, including the Poisson distribution. Focus on understanding the concepts and their applications rather than deriving these properties mathematically.