2.8. Maximum penalized likelihood estimators
The methods discussed so far are all derived in an ad hoc way from the definition of a density. It is interesting to ask whether standard statistical techniques, like maximum likelihood, can be applied to density estimation. The likelihood of a curve $g$ as the density underlying a set of independent identically distributed observations $X_1, \dots, X_n$ is given by
$$L(g \mid X_1, \dots, X_n) = \prod_{i=1}^{n} g(X_i).$$
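This likelihood has no finite maximum over the class of all densities. To see this, let $\hat{f}_h$ be the naive density estimate with window width $\tfrac{1}{2}h$; then, for each $i$, the box centred on $X_i$ itself contributes to the estimate at $X_i$, so that
$$\hat{f}_h(X_i) \ge \frac{1}{nh},$$
and so
$$\prod_{i=1}^{n} \hat{f}_h(X_i) \ge n^{-n} h^{-n} \to \infty \quad \text{as } h \to 0.$$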
Thus the likelihood can be made arbitrarily large by taking densities approaching the sum of delta functions as defined in (2.7) above, and it is not possible to use maximum likelihood directly for density estimation without placing restrictions on the class of densities over which the likelihood is to be maximized.
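The divergence can be checked numerically. The following is a minimal sketch, assuming that the naive estimate with window width h/2 places a box of total width h and height 1/(nh) on each observation; the function names and the simulated sample are hypothetical and serve only as an illustration.

```python
import math
import random


def naive_estimate(x, data, h):
    """Naive estimate with window width h/2 (assumed form): each observation
    contributes a box of total width h and height 1/(n*h) centred on itself."""
    n = len(data)
    count = sum(1 for xi in data if abs(x - xi) < h / 2)
    return count / (n * h)


def log_likelihood(data, h):
    """Log likelihood of the naive estimate evaluated at the observations."""
    return sum(math.log(naive_estimate(xi, data, h)) for xi in data)


random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(50)]  # simulated sample

for h in (1.0, 0.1, 0.01, 0.001):
    # Lower bound implied by f_h(X_i) >= 1/(n*h): log L >= -n * log(n*h)
    bound = -len(data) * math.log(len(data) * h)
    print(h, log_likelihood(data, h), bound)
```

As h decreases, the log likelihood stays above the bound -n log(nh), which itself grows without limit.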
There are, nevertheless, possible approaches related to maximum likelihood. One method is to incorporate into the likelihood a term which describes the roughness, in some sense, of the curve under consideration. Suppose R(g) is a functional which quantifies the roughness of g. One possible choice of such a functional is
$$R(g) = \int_{-\infty}^{\infty} \{g''(x)\}^2 \, dx. \tag{2.11}$$
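As an illustrative check, taking R(g) to be the integrated squared second derivative of g, the roughness of a normal density with standard deviation sigma has the closed form 3 / (8 sqrt(pi) sigma^5). The sketch below compares this with a finite-difference approximation; the helper names, grid size and integration range are assumptions made for the example.

```python
import math


def gaussian_pdf(x, sigma):
    """Normal density with mean 0 and standard deviation sigma."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def roughness(g, lo, hi, m=4000):
    """Approximate the integral of (g'')^2 over [lo, hi] using central
    second differences and a Riemann sum on an equally spaced grid."""
    dx = (hi - lo) / m
    total = 0.0
    for k in range(1, m):
        x = lo + k * dx
        second = (g(x - dx) - 2 * g(x) + g(x + dx)) / dx ** 2
        total += second ** 2 * dx
    return total


for sigma in (0.5, 1.0, 2.0):
    numeric = roughness(lambda x: gaussian_pdf(x, sigma), -10 * sigma, 10 * sigma)
    exact = 3 / (8 * math.sqrt(math.pi) * sigma ** 5)
    print(sigma, numeric, exact)
```

Narrower (more sharply curved) densities give much larger values of R, which is the sense in which the functional measures roughness.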
Define the penalized log likelihood by
$$l_\alpha(g) = \sum_{i=1}^{n} \log g(X_i) - \alpha R(g), \tag{2.12}$$
where $\alpha$ is a positive smoothing parameter.
The penalized log likelihood can be seen as a way of quantifying the conflict between smoothness and goodness-of-fit to the data, since the log likelihood term $\sum_i \log g(X_i)$ measures how well $g$ fits the data. The probability density function $\hat{f}$ is said to be a maximum penalized likelihood density estimate if it maximizes $l_\alpha(g)$ over the class of all curves $g$ which satisfy $\int_{-\infty}^{\infty} g = 1$, $g(x) \ge 0$ for all $x$, and $R(g) < \infty$. The parameter $\alpha$ controls the amount of smoothing since it determines the `rate of exchange' between smoothness and goodness-of-fit; the smaller the value of $\alpha$, the rougher, in terms of $R(\hat{f})$, will be the corresponding maximum penalized likelihood estimator. Estimates obtained by the maximum penalized likelihood method will, by definition, be probability densities. Further details of these estimates will be given in Section 5.4.
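The trade-off governed by the smoothing parameter can be illustrated in a deliberately restricted setting. The sketch below is not the maximum penalized likelihood estimator itself, which maximizes over all admissible curves; it assumes, purely for illustration, that the candidates are normal densities centred at the sample mean, for which the integrated squared second derivative is 3 / (8 sqrt(pi) sigma^5). The function names, the grid of sigma values and the simulated data are hypothetical.

```python
import math
import random


def penalized_loglik(data, sigma, alpha):
    """Log likelihood minus alpha times roughness, for a normal candidate
    density centred at the sample mean (an assumption made for this sketch)."""
    n = len(data)
    mean = sum(data) / n
    loglik = sum(
        -0.5 * ((x - mean) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))
        for x in data
    )
    rough = 3 / (8 * math.sqrt(math.pi) * sigma ** 5)
    return loglik - alpha * rough


def best_sigma(data, alpha, grid):
    """Grid search for the sigma that maximizes the penalized log likelihood."""
    return max(grid, key=lambda s: penalized_loglik(data, s, alpha))


random.seed(1)
data = [random.gauss(0.0, 1.0) for _ in range(100)]  # simulated sample
grid = [0.2 + 0.01 * k for k in range(300)]          # candidate sigma values

for alpha in (0.0, 10.0, 100.0, 1000.0):
    print(alpha, best_sigma(data, alpha, grid))
```

Within this family, larger values of alpha select a larger sigma, that is, a smoother candidate, while alpha = 0 recovers the ordinary maximum likelihood choice on the grid.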