Purpose: Stirling’s approximation is used to approximate the factorial of large numbers, which are computationally infeasible to calculate directly.

The Approximation: For a large number , the factorial function is approximated as:

Taking the natural logarithm yields:

The most significant part of the approximation, known as the leading-order behavior, is:

The term is the first-order correction, which provides a more precise approximation.

Derivation using the Poisson Distribution

The approximation can be derived from the Poisson distribution, which for large mean can be approximated by a Gaussian distribution.

The Poisson distribution is given by:

For large , the distribution is centered around and is well-approximated by a Gaussian with mean and variance :

By setting , we find the peak probability:

Rearranging this equation gives Stirling’s approximation for :

Application: Approximating the Binomial Coefficient

Stirling’s approximation is particularly useful for approximating the logarithm of the binomial coefficient, .

Using only the leading-order behavior of Stirling’s approximation, we get:

This expression can be rewritten by factoring out and applying logarithm properties:

Let . The expression becomes:

This result is directly related to the binary entropy function (often used in information theory), which is defined as:

Thus, the approximation for the binomial coefficient can be expressed concisely as:

A more accurate approximation, including the first-order correction, is:

(MacKay, 2003, pp. 1–2)

Reference

MacKay, D. J. (2003). Information Theory, Inference and Learning Algorithms. Cambridge university press.