Last Friday, a coworker and I were chatting about our current projects. Since her work inspired me to write this post, I obviously found it more interesting than what I was doing. Plus, I am still stuck on how best to approach mine, so I will save the details for another day.
My coworker was trying to estimate the distribution of a certain metric for a class of orders processed by the warehouse on a given day of the week. The distribution is discrete, taking integer values greater than or equal to $0$. It's also very right-skewed: it has a drawn-out right tail. Learning of these properties, I thought the Poisson distribution might be a good fit. For a review of the basics of the Poisson distribution, check out the Wikipedia page.
The Poisson distribution is specified by a single real parameter $\lambda > 0$. So to fit a distribution to a sample, we need an estimator for $\lambda$. One (popular) option is the maximum likelihood estimator $\hat{\lambda}$, which is informally the value of $\lambda$ that is most "likely" to give the observed sample. Given i.i.d. random variables $X_1, \dots, X_n$ with probability mass function $f(x; \lambda)$, the maximum likelihood estimator, assuming it exists, is the random variable $\hat{\lambda}$ which maximizes the likelihood function

$$L(\lambda; X_1, \dots, X_n) = \prod_{i=1}^n f(X_i; \lambda).$$
Suppose $X_1, \dots, X_n$ are i.i.d. random variables, where $X_i \sim \mathrm{Poisson}(\lambda)$. The likelihood function of the observations $x_1, \dots, x_n$ is

$$L(\lambda; x_1, \dots, x_n) = \prod_{i=1}^n \frac{\lambda^{x_i} e^{-\lambda}}{x_i!} = \frac{\lambda^{\sum_{i=1}^n x_i} \, e^{-n\lambda}}{\prod_{i=1}^n x_i!}.$$
To find the critical points of $L$, we take the natural logarithm and then the partial derivative with respect to $\lambda$. Setting the derivative equal to $0$, we see that

$$\frac{\partial}{\partial \lambda} \log L(\lambda; x_1, \dots, x_n) = \frac{1}{\lambda} \sum_{i=1}^n x_i - n = 0,$$
which implies that

$$\hat{\lambda} = \frac{1}{n} \sum_{i=1}^n x_i = \bar{x}.$$
Since $\frac{\partial}{\partial \lambda} \log L > 0$, for $\lambda < \bar{x}$, and $\frac{\partial}{\partial \lambda} \log L < 0$, for $\lambda > \bar{x}$, we conclude that $\hat{\lambda} = \bar{x}$ maximizes the likelihood function.
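As a quick numerical sanity check of this derivation, we can maximize the Poisson log-likelihood directly and compare the maximizer against the sample mean. This is a sketch using NumPy and SciPy with a made-up sample (not the warehouse data):

```python
# Verify numerically that the Poisson log-likelihood peaks at the sample mean.
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import poisson

# An illustrative sample, chosen arbitrarily.
sample = np.array([1, 0, 3, 2, 1, 4, 0, 2, 2, 1])

def neg_log_likelihood(lam):
    # Negative log-likelihood, so that minimizing it maximizes the likelihood.
    return -poisson.logpmf(sample, lam).sum()

result = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 20), method="bounded")
print(result.x)        # numeric maximizer of the likelihood
print(sample.mean())   # closed-form MLE: the sample mean
```

The numeric maximizer agrees with the closed-form answer $\bar{x}$ to within the optimizer's tolerance.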
The maximum likelihood estimator $\hat{\lambda} = \bar{X}$ is unbiased, meaning its expectation equals $\lambda$, since the linearity of the expectation implies that

$$\mathbb{E}[\hat{\lambda}] = \frac{1}{n} \sum_{i=1}^n \mathbb{E}[X_i] = \frac{1}{n} \cdot n\lambda = \lambda.$$
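Unbiasedness is easy to see empirically: averaging the estimator over many simulated samples should recover the true parameter. A minimal NumPy sketch, with arbitrarily chosen $\lambda$, sample size, and trial count:

```python
# Simulate many samples and average the resulting estimates of lambda.
import numpy as np

rng = np.random.default_rng(0)
lam, n, trials = 4.0, 50, 20000

# Each row is one sample of size n; each row mean is one realization of the MLE.
estimates = rng.poisson(lam, size=(trials, n)).mean(axis=1)
print(estimates.mean())  # close to lam = 4.0
```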
Furthermore, $\hat{\lambda}$ is efficient, which means that the estimator's variance achieves the lower bound for the variance of any unbiased estimator of $\lambda$. To formulate efficiency, we need the notion of the Fisher information of a random variable. We define the Fisher information of $X$ with unknown parameter $\lambda$ by

$$I(\lambda) = \mathbb{E}\left[\left(\frac{\partial}{\partial \lambda} \log f(X; \lambda)\right)^2\right].$$
If the log-likelihood function is twice differentiable with respect to $\lambda$, then

$$I(\lambda) = -\mathbb{E}\left[\frac{\partial^2}{\partial \lambda^2} \log f(X; \lambda)\right].$$
An important result in statistics called the Cramér–Rao bound relates the Fisher information of a random variable to the variance of an unbiased estimator $\hat{\lambda}$ of the unknown parameter $\lambda$ through the inequality

$$\operatorname{Var}(\hat{\lambda}) \geq \frac{1}{n I(\lambda)}.$$
Returning to my claim that $\hat{\lambda}$ is efficient, observe that

$$\operatorname{Var}(\hat{\lambda}) = \operatorname{Var}\left(\frac{1}{n} \sum_{i=1}^n X_i\right) = \frac{1}{n^2} \sum_{i=1}^n \operatorname{Var}(X_i) = \frac{\lambda}{n},$$

which is exactly the Cramér–Rao lower bound.
To see this, observe that the Fisher information of $X \sim \mathrm{Poisson}(\lambda)$ is

$$I(\lambda) = -\mathbb{E}\left[\frac{\partial^2}{\partial \lambda^2} \log f(X; \lambda)\right] = -\mathbb{E}\left[-\frac{X}{\lambda^2}\right] = \frac{\mathbb{E}[X]}{\lambda^2} = \frac{1}{\lambda},$$

so that the lower bound is $\frac{1}{n I(\lambda)} = \frac{\lambda}{n}$.
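We can also check efficiency numerically: across many simulated samples, the empirical variance of $\hat{\lambda}$ should sit right at the Cramér–Rao bound $\lambda/n$. Another NumPy sketch with the same illustrative parameters as before:

```python
# Compare the empirical variance of the MLE against the Cramer-Rao bound.
import numpy as np

rng = np.random.default_rng(1)
lam, n, trials = 4.0, 50, 20000

# One realization of the MLE (the sample mean) per simulated sample.
estimates = rng.poisson(lam, size=(trials, n)).mean(axis=1)

empirical_var = estimates.var()
cramer_rao = lam / n  # 1 / (n * I(lambda)), since I(lambda) = 1/lambda
print(empirical_var, cramer_rao)
```

The two numbers agree up to simulation noise, consistent with $\hat{\lambda}$ attaining the bound.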
A fortiori, $\hat{\lambda}$ is a minimum-variance unbiased estimator.
I have no idea whether the Poisson distribution is a good fit for the data mentioned in the introduction. Laughably, I have not looked at the actual data yet!