Last Friday, a coworker and I were chatting about our current projects. Since her work inspired me to write this post, I obviously found it more interesting than what I was doing. Plus, I am still stuck on how best to approach mine, so I will save the details for another day.

My coworker was trying to estimate the distribution of a certain metric for a class of orders processed by the warehouse on a given day of the week. The distribution is discrete, taking integer values greater than or equal to $0$. It’s also very right-skewed: it has a drawn-out right tail. Learning of these properties, I thought the Poisson distribution might be a good fit. For a review of the basics of the Poisson distribution, check out the Wikipedia page.

The Poisson distribution is specified by a single real parameter $\lambda > 0$. So to fit a distribution to a sample, we need an estimator for $\lambda$. One (popular) option is the maximum likelihood estimator $\hat{\lambda}$, which is informally the value of $\lambda$ that is most “likely” to give the observed sample. Given i.i.d. random variables $X_1, \dots, X_n$ with probability mass function $f(\,\cdot\,; \lambda)$, the maximum likelihood estimator, assuming it exists, is the random variable $\hat{\lambda}$ which maximizes the likelihood function

$$L(\lambda; X_1, \dots, X_n) = \prod_{i=1}^n f(X_i; \lambda).$$

Suppose $X_1, \dots, X_n$ are i.i.d. random variables, where $X_i \sim \operatorname{Poisson}(\lambda)$. The likelihood function of the observations $x_1, \dots, x_n$ is

$$L(\lambda; x_1, \dots, x_n) = \prod_{i=1}^n \frac{\lambda^{x_i} e^{-\lambda}}{x_i!} = \frac{e^{-n\lambda}\, \lambda^{\sum_{i=1}^n x_i}}{\prod_{i=1}^n x_i!}.$$

To find the critical points of $L$, we take the natural logarithm and then the partial derivative with respect to $\lambda$. Setting the derivative equal to $0$, we see that

$$\frac{\partial}{\partial \lambda} \log L(\lambda; x_1, \dots, x_n) = \frac{1}{\lambda} \sum_{i=1}^n x_i - n = 0,$$

which implies that

$$\hat{\lambda} = \frac{1}{n} \sum_{i=1}^n x_i = \bar{x}.$$

Since $\frac{\partial}{\partial \lambda} \log L > 0$, for $\lambda < \bar{x}$, and $\frac{\partial}{\partial \lambda} \log L < 0$, for $\lambda > \bar{x}$, we conclude that $\hat{\lambda} = \bar{x}$ maximizes the likelihood function.
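The conclusion above, that the MLE is just the sample mean, is easy to sanity-check numerically. Here is a minimal sketch in Python with NumPy, using a made-up rate of 4.2 (an assumption for illustration; the true rate of the warehouse data is unknown):

```python
import numpy as np

rng = np.random.default_rng(0)

# Draw a synthetic Poisson sample with a known, made-up rate.
true_lambda = 4.2
sample = rng.poisson(lam=true_lambda, size=10_000)

# The derivation above shows the Poisson MLE is simply the sample mean.
lambda_hat = sample.mean()
print(lambda_hat)  # should land close to 4.2
```

With 10,000 observations, the estimate should sit within a few hundredths of the true rate.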

The maximum likelihood estimator $\hat{\lambda}$ is *unbiased*, meaning its expectation equals $\lambda$, since the linearity of the expectation implies that

$$\mathbb{E}\bigl[\hat{\lambda}\bigr] = \mathbb{E}\left[\frac{1}{n} \sum_{i=1}^n X_i\right] = \frac{1}{n} \sum_{i=1}^n \mathbb{E}[X_i] = \frac{1}{n} \cdot n\lambda = \lambda.$$
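Unbiasedness can also be illustrated by simulation: average the MLE over many repeated samples and it should hover around the true rate. A quick sketch, again assuming an illustrative rate of 4.2:

```python
import numpy as np

rng = np.random.default_rng(1)
true_lambda = 4.2  # illustrative value, not the warehouse data

# 5,000 repeated samples of size 50; each row yields one MLE (its mean).
estimates = rng.poisson(lam=true_lambda, size=(5_000, 50)).mean(axis=1)

# By unbiasedness, the average of the estimates approximates the true rate.
print(estimates.mean())
```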

Furthermore, $\hat{\lambda}$ is *efficient*, which means that the estimator’s variance achieves the lower bound for the variance of any unbiased estimator of $\lambda$. To formulate efficiency, we need the notion of the *Fisher information* of a random variable. We define the Fisher information of $X$ with unknown parameter $\lambda$ by

$$I(\lambda) = \mathbb{E}\left[\left(\frac{\partial}{\partial \lambda} \log f(X; \lambda)\right)^2\right].$$

If the log-likelihood function is twice differentiable with respect to $\lambda$, then

$$I(\lambda) = -\mathbb{E}\left[\frac{\partial^2}{\partial \lambda^2} \log f(X; \lambda)\right].$$

An important result in statistics called the Cramér–Rao bound relates the Fisher information of a random variable to the variance of an unbiased estimator $\hat{\lambda}$ of the unknown parameter, built from $n$ i.i.d. observations, through the inequality

$$\operatorname{Var}\bigl(\hat{\lambda}\bigr) \geq \frac{1}{n I(\lambda)}.$$

Returning to my claim that $\hat{\lambda}$ is efficient, observe that

$$\operatorname{Var}\bigl(\hat{\lambda}\bigr) = \operatorname{Var}\left(\frac{1}{n} \sum_{i=1}^n X_i\right) = \frac{1}{n^2} \cdot n\lambda = \frac{\lambda}{n} = \frac{1}{n I(\lambda)}.$$

To see this, observe that the Fisher information of $X \sim \operatorname{Poisson}(\lambda)$ is

$$I(\lambda) = -\mathbb{E}\left[\frac{\partial^2}{\partial \lambda^2} \bigl(X \log \lambda - \lambda - \log X!\bigr)\right] = \mathbb{E}\left[\frac{X}{\lambda^2}\right] = \frac{1}{\lambda}.$$
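The variance calculation can likewise be checked empirically: the spread of the MLE across repeated samples should match the Cramér–Rao bound $\lambda / n$. A sketch with illustrative values of $\lambda = 4.2$ and $n = 50$:

```python
import numpy as np

rng = np.random.default_rng(2)
true_lambda, n = 4.2, 50  # illustrative values

# 20,000 repeated samples of size n; one MLE per row.
estimates = rng.poisson(lam=true_lambda, size=(20_000, n)).mean(axis=1)
empirical_var = estimates.var()

# Cramér–Rao lower bound: 1 / (n * I(lambda)) = lambda / n.
cramer_rao = true_lambda / n
print(empirical_var, cramer_rao)  # the two should be close
```

Because the MLE attains the bound, the two printed numbers agree up to simulation noise.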

*A fortiori*, $\hat{\lambda}$ is a *minimum-variance unbiased estimator*.

I have no idea whether the Poisson distribution is a good fit for the data mentioned in the introduction. Laughably, I have not looked at the actual data yet!
