Thursday, July 3, 2014

How to Remember the Poisson Distribution

The Poisson cumulative distribution function (CDF) \begin{equation} F(α,n) = \sum_{k=0}^n \dfrac{α^k}{k!} \; e^{-α} \label{eqn:pcdf} \end{equation} is the probability of at most $n$ events occurring when the average number of events is α, i.e., $\Pr(X \le n)$. Since \eqref{eqn:pcdf} is a probability function, it cannot have a value greater than 1. In R, the CDF is given by the function ppois(). For example, with α = 4 the first 16 values are

> ppois(0:15,4)
 [1] 0.01831564 0.09157819 0.23810331 0.43347012 0.62883694 0.78513039 0.88932602 0.94886638
 [9] 0.97863657 0.99186776 0.99716023 0.99908477 0.99972628 0.99992367 0.99998007 0.99999511
As $n$ increases from 0 to 15, the CDF approaches 1. See Figure.
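
As a quick check on the second and third values in that output, here is the sum in \eqref{eqn:pcdf} worked by hand for α = 4, taking $e^{-4} \approx 0.0183156$: \begin{equation} F(4,1) = \left(1 + 4\right) e^{-4} \approx 5 \times 0.0183156 \approx 0.09158 \end{equation} and \begin{equation} F(4,2) = \left(1 + 4 + \dfrac{4^2}{2!}\right) e^{-4} \approx 13 \times 0.0183156 \approx 0.23810 \end{equation} in agreement with the values R reports for $n = 1$ and $n = 2$.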

The probability of exactly $k$ events occurring is given by the probability density function (PDF) or, more precisely, the probability mass function (PMF), since the distribution is discrete: \begin{equation} \Pr(X = k) = \dfrac{α^k}{k!} \; e^{-α}; \quad k = 0, 1, 2, \ldots \label{eqn:ppdf} \end{equation} which is just a single term of the sum in \eqref{eqn:pcdf}, because only one value of $k$ is considered rather than every value up to $n$. In R, this probability is calculated using the function dpois(). Using α = 4 again, we get


> dpois(0:15,4)
 [1] 1.831564e-02 7.326256e-02 1.465251e-01 1.953668e-01 1.953668e-01 1.562935e-01 1.041956e-01
 [8] 5.954036e-02 2.977018e-02 1.323119e-02 5.292477e-03 1.924537e-03 6.415123e-04 1.973884e-04
[15] 5.639669e-05 1.503912e-05
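
Since the CDF is the running sum of these individual probabilities, the two R functions can be cross-checked against each other. The following is only a minimal sketch; the variable names (alpha, n, pmf, cdf_from_pmf) are my own choices for illustration.

```r
alpha <- 4                      # average number of events, as in the examples
n     <- 0:15                   # event counts to evaluate

pmf          <- dpois(n, alpha) # Pr(X = k) for k = 0, ..., 15
cdf_from_pmf <- cumsum(pmf)     # running sum of the PMF, i.e. Pr(X <= n)

# Compare with R's built-in CDF, up to floating-point tolerance
all.equal(cdf_from_pmf, ppois(n, alpha))
```

If the running sum and ppois() agree, all.equal() reports TRUE.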

The Poisson distribution is used to model such things as the number of clicks detected by a Geiger counter. It is also the most commonly assumed source of arrivals in queueing theory and computer performance analysis. In fact, it was Agner Erlang who first presented the Poisson distribution as a model of incoming telephone calls, with $\alpha = \lambda t$ (an arrival rate $\lambda$ observed over a time $t$), in 1909 for the purpose of sizing trunk-line capacity at the Copenhagen Telephone Company. Applying probability theory to engineering problems was revolutionary at that time. Einstein did something similar to explain Brownian motion in 1905. However, for those not engaged in applying probability theory on a regular basis, the expression in \eqref{eqn:pcdf} looks formidable and hard to remember.

The trick I devised for my classes is to remember a much simpler, but wrong, version of \eqref{eqn:pcdf} and then correct it. The corrections can be regarded as a little story that is easy to remember: you're more likely to remember a story than a formula like \eqref{eqn:pcdf}. Here's the story.

  1. Start with this simple (but incorrect) expression for the CDF \begin{equation} F(α,n) \sim e^{+α} \; \times \; e^{-α} \label{eqn:1cdf} \end{equation} Equation \eqref{eqn:1cdf} cannot have a value bigger than 1, which is what is required of a probability. The problem, however, is that since α is a constant this equation will always be equal to 1, which is not quite what we want. For example, if α = 0: \begin{equation} e^{0} \; \times \; e^{0} = 1 \times 1 = 1 \end{equation} In general, for any positive α: \begin{equation} e^{+α} \; \times \; e^{-α} = \dfrac{e^{α}}{e^{α}} = 1 \end{equation} Clearly, this stuck version is wrong. The question becomes: How can we correct it?

  2. The factor $e^{-α}$ in \eqref{eqn:1cdf} is a decaying exponential that will approach zero for any large value of α. The problem lies with $e^{+α}$ since it will become enormous for an arbitrarily large value of α. So, we need to tame it.

  3. Recall that the exponential function can be written as an infinite power series: \begin{equation} e^{x} = 1 + x + \dfrac{x^2}{2!} + \dfrac{x^3}{3!} + \ldots \label{eqn:infexp} \end{equation}

  4. But, if we truncate the series \eqref{eqn:infexp} after the $x^n$ term \begin{equation} 1 + x + \dfrac{x^2}{2!} + \ldots + \dfrac{x^n}{n!} \label{eqn:truncexp} \end{equation} it is no longer equal to $e^{x}$ but, for positive $x$, something less. The shorthand notation for \eqref{eqn:truncexp} is \begin{equation} \sum_{k=0}^n \dfrac{x^k}{k!} \end{equation} In our case, $x$ takes a specific value α.

  5. The factor $e^{+α}$ in \eqref{eqn:1cdf} is now replaced with the tamed sum: \begin{equation} e^{+α} ~\rightarrow~ \sum_{k=0}^n \dfrac{α^k}{k!} \label{eqn:sumexp} \end{equation}

Combining these corrections produces \begin{equation} e^{+α} \; \times \; e^{-α} ~\rightarrow~ \sum_{k=0}^n \dfrac{α^k}{k!} \; \times \; e^{-α} \end{equation} which is \eqref{eqn:pcdf}, and done!
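
The story can also be replayed in R as a sanity check. This is just a sketch under my own naming: trunc_exp() and poisson_cdf() are hypothetical helper functions written for this illustration, not functions that ship with R.

```r
alpha <- 4

# Step 1: the simple (but incorrect) version is stuck at 1 for any alpha
naive <- exp(alpha) * exp(-alpha)

# Steps 3 and 4: the exponential series truncated after the x^n / n! term
# (trunc_exp is a hypothetical helper name)
trunc_exp <- function(x, n) sum(x^(0:n) / factorial(0:n))

# Step 5: replace exp(+alpha) with the tamed sum
# (poisson_cdf is a hypothetical helper name)
poisson_cdf <- function(alpha, n) trunc_exp(alpha, n) * exp(-alpha)

n     <- 0:15
story <- sapply(n, function(m) poisson_cdf(alpha, m))

# Compare the remembered formula with R's built-in CDF
all.equal(story, ppois(n, alpha))
```

Here naive comes out as 1 (up to floating-point rounding), while story climbs towards 1 as n grows, which is the behavior the corrected formula is supposed to have.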
