Researcher: A.W. van der Vaart
Research Programme: Statistics
Much of the statistical theory developed before 1980 was concerned with so-called parametric models. These are models that allow only finitely many degrees of freedom (unknowns) in the phenomenon being modelled. Thus, they tend to fit the phenomenon badly, unless it is observed under closely controlled and previously studied conditions. One of the most important directions in current statistical research is the study of infinite-dimensional models, for which there is both practical and theoretical motivation. Many large or badly structured data-sets simply cannot be reliably analyzed with the classical techniques. This applies in particular to data that result from observational, rather than experimentally controlled, studies, and/or are subject to several types of ``censoring'' (missing or partially observed data). An intrinsic mathematical motivation is that the research leads to interesting mathematics. The revolution in computing power in recent years was a precondition for these new techniques, because statistical techniques for infinite-dimensional models are typically computer-intensive.
If we restrict ourselves to independent replications of an experiment, leading to observations $X_1, X_2, \ldots, X_n$, then a model is precisely the set $\mathcal{P}$ of possible probability distributions $P$ of a single observation.
For a classical parametric model this set of distributions is ``nicely'' parametrized by a Euclidean vector. The simplest type of infinite-dimensional model is the nonparametric model, in which we observe a random sample from a completely unknown distribution. Then $\mathcal{P}$ is the collection of all probability measures on the sample space, and, as is intuitively clear, the empirical distribution $\mathbb{P}_n = n^{-1} \sum_{i=1}^n \delta_{X_i}$ (with $\delta_x$ the Dirac measure at $x$) is an optimal estimator for the underlying distribution.
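As a concrete illustration (a minimal sketch, not part of the original text), the empirical distribution places mass $1/n$ at each observation; the following Python fragment evaluates the corresponding empirical distribution function on a simulated sample (the exponential data are a hypothetical example):

```python
import numpy as np

def empirical_cdf(sample):
    """Empirical distribution function of a one-dimensional sample:
    P_n(t) = (1/n) #{i : X_i <= t}, i.e. the average of the Dirac
    measures delta_{X_i} evaluated on the half-line (-inf, t]."""
    x = np.sort(np.asarray(sample))
    return lambda t: np.searchsorted(x, t, side="right") / x.size

# Hypothetical example: estimate P(X <= 2) from exponential data.
rng = np.random.default_rng(0)
F_n = empirical_cdf(rng.exponential(scale=2.0, size=1000))
print(F_n(2.0))   # close to 1 - exp(-1), about 0.632
```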
More interesting are the intermediate models, which are not ``nicely'' parametrized by a Euclidean parameter, as are the standard classical models, but do restrict the distribution in an important way. Such models are often parametrized by infinite-dimensional parameters, such as distribution functions or densities, that express the structure under study. In particular, the model may have a natural parametrization $(\theta, \eta) \mapsto P_{\theta, \eta}$, where $\theta$ is a Euclidean parameter and $\eta$ runs through a nonparametric class of distributions, or some other infinite-dimensional set. This gives a semiparametric model, in which we aim at estimating $\theta$ and consider $\eta$ as a nuisance parameter. More generally, we focus on estimating the value $\psi(P)$ of some function $\psi$ on the model, with values in the real line or in some other Banach space.
The precise study of the properties of statistical experiments for a fixed number $n$ of observations is often intractable, and therefore many theoretical investigations concern asymptotics as $n \to \infty$. This is particularly true for infinite-dimensional models.
In estimation theory we wish to find functions $T_n = T_n(X_1, \ldots, X_n)$ of the observations that approximate the quantity of interest $\psi(P)$ as well as possible. Asymptotically we should at least have that $T_n$ converges in probability to $\psi(P)$ if the probabilities are calculated according to $P$, for every $P \in \mathcal{P}$. This consistency property is comforting, but in practice we wish to know much more, for instance a rate of convergence $r_n$, and preferably limits of probabilities of the type $P\bigl(r_n \|T_n - \psi(P)\| \le x\bigr)$. These will lead to confidence statements of the form: $\psi(P)$ is in the ball of radius $x_\alpha / r_n$ around $T_n$, with probability (or confidence) $1 - \alpha$. The number $x_\alpha$ is determined from the limit distribution of $r_n\bigl(T_n - \psi(P)\bigr)$.
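To make the roles of $r_n$ and $x_\alpha$ concrete, consider the standard case of a normal limit distribution (a well-known illustration, not specific to this text):
$$r_n\bigl(T_n - \psi(P)\bigr) \rightsquigarrow N(0, \sigma^2) \quad\Longrightarrow\quad P\Bigl(\|T_n - \psi(P)\| \le \frac{x_\alpha}{r_n}\Bigr) \to 1 - \alpha \quad\text{for } x_\alpha = \sigma z_{\alpha/2},$$
with $z_{\alpha/2}$ the upper $\alpha/2$ standard normal quantile ($z_{0.025} \approx 1.96$ for $95\%$ confidence).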
For classical parametric models the most important general method of constructing estimators is the method of maximum likelihood. The likelihood function is the joint density of the observations (relative to some dominating measure), viewed as a function of the parameter, and the maximum likelihood estimator is the point of maximum of this function. The asymptotic theory of this estimator for parametric models is well known. In the most common case the scaling rate $r_n$ is equal to $\sqrt{n}$, and the probabilities converge to probabilities under the normal distribution, with a certain variance that can be expressed in the Fisher information $I_\theta$. A good first introduction to statistics should introduce approximate confidence statements of the type $\hat\theta_n \pm 1.96/\sqrt{n I_{\hat\theta_n}}$, based on the maximum likelihood estimator $\hat\theta_n$.
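A minimal sketch of this recipe (an assumed toy example, not taken from the text): for i.i.d. exponential data with rate $\lambda$, the maximum likelihood estimator is $\hat\lambda_n = 1/\bar{X}_n$ and the Fisher information is $I_\lambda = 1/\lambda^2$, so the interval below is exactly of the form $\hat\theta_n \pm 1.96/\sqrt{n I_{\hat\theta_n}}$.

```python
import numpy as np

# Hypothetical parametric example: X_1,...,X_n i.i.d. exponential with
# rate lam. The log-likelihood n*log(lam) - lam*sum(x) is maximized at
# lam_hat = 1/mean(x); the Fisher information is I(lam) = 1/lam**2.
rng = np.random.default_rng(1)
x = rng.exponential(scale=2.0, size=200)   # true rate = 0.5
n = x.size

lam_hat = 1.0 / x.mean()                   # maximum likelihood estimator
se = lam_hat / np.sqrt(n)                  # equals 1/sqrt(n * I(lam_hat))
print(f"MLE {lam_hat:.3f}, "
      f"95% CI ({lam_hat - 1.96*se:.3f}, {lam_hat + 1.96*se:.3f})")
```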
Given the results for the classical parametric models, it is natural to try the same maximum likelihood recipe for infinite-dimensional models. Here the situation turns out to be much more complicated, and a lot is still unknown.
To begin with, it is not always clear how a ``likelihood function'' should be defined. Many infinite-dimensional models are not dominated, in the sense that not every $P \in \mathcal{P}$ possesses a density relative to a single fixed measure. If the model is dominated, then it may happen that the supremum of the likelihood over all $P \in \mathcal{P}$ is infinite, and a maximum likelihood estimator does not exist. One way to overcome such problems is to use the empirical likelihood, defined as the map $P \mapsto \prod_{i=1}^n P\{X_i\}$, with $P\{x\}$ denoting the probability of the point $x$ under $P$. Other possibilities are to introduce a penalty term $J(P)$ in the likelihood, which disqualifies the $P$ that were causing trouble before, or to restrict the maximization to an approximating set $\mathcal{P}_n$, which will need to grow with $n$ to induce consistency and ensure a good rate of convergence.
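A toy numerical check of the empirical likelihood (an assumed example, stated under the full nonparametric model, where the answer is known): maximizing $\prod_i P\{X_i\}$ over probability vectors supported on the observed points recovers the empirical distribution with weights $1/n$.

```python
import numpy as np
from scipy.optimize import minimize

# Maximize prod_i p_i over probability vectors (p_1,...,p_n) on the
# observed points; the maximizer is the empirical weight vector 1/n.
n = 5
res = minimize(
    lambda p: -np.sum(np.log(p)),                          # -log likelihood
    x0=np.full(n, 0.3),
    bounds=[(1e-9, 1.0)] * n,
    constraints=({"type": "eq", "fun": lambda p: p.sum() - 1.0},),
)
print(res.x)   # all weights approximately 1/n = 0.2
```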
Once a likelihood is defined, we wish to compute its point of maximum. Analytic solutions are rare, so we search for an efficient numerical algorithm. This may be nontrivial, because the function to optimize may be of high dimension. Next, we wish to obtain confidence statements using the maximum likelihood estimator. Again the situation is considerably more complicated than for classical models. Some aspects of the maximum likelihood estimator resemble the approximation properties of classical maximum likelihood estimators. In particular, for the finite-dimensional parameter $\theta$ the convergence rate is $\sqrt{n}$, and asymptotically its distribution follows the normal distribution, with a variance involving a generalization of the Fisher information. However, other aspects may show a totally different and novel behaviour, of which very little is known at the present time.
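This generalization can be made precise (a standard formulation from the semiparametric literature, recalled here for orientation): under regularity conditions,
$$\sqrt{n}\,\bigl(\hat\theta_n - \theta\bigr) = \frac{1}{\sqrt{n}} \sum_{i=1}^n \tilde I_{\theta, \eta}^{-1}\, \tilde\ell_{\theta, \eta}(X_i) + o_P(1) \rightsquigarrow N\bigl(0, \tilde I_{\theta, \eta}^{-1}\bigr),$$
where $\tilde\ell_{\theta, \eta}$ is the efficient score function for $\theta$ (the ordinary score minus its projection onto the scores for the nuisance parameter) and $\tilde I_{\theta, \eta}$ is its covariance matrix, the efficient information, which replaces the classical Fisher information.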
In any case, the mathematical derivations of these results involve completely different tools and arguments. Very important ones are entropy calculations for the statistical models. Roughly, the size of $\mathcal{P}$ is measured by the number $N(\varepsilon)$ of balls of a fixed radius $\varepsilon$, in a suitable metric, needed to cover $\mathcal{P}$. It will be required that this number is finite for every $\varepsilon > 0$ (at least locally), so that $\mathcal{P}$ must be totally bounded, and the speed at which the numbers $N(\varepsilon)$ go up as $\varepsilon$ decreases to 0 should be bounded as well (by roughly $e^{K/\varepsilon}$ for some constant $K$). The rate at which the entropy grows is a measure of the size of the model, and is connected to the rate of convergence of the maximum likelihood estimator.
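A typical instance of this connection (a standard result from the literature, stated here only as an illustration): if the entropy grows polynomially,
$$\log N(\varepsilon) \le K \Bigl(\frac{1}{\varepsilon}\Bigr)^{\alpha}, \qquad 0 < \alpha < 2,$$
then the maximum likelihood estimator typically converges at the rate $n^{-1/(2+\alpha)}$ (in, for instance, the Hellinger distance). For classes of monotone functions, such as the parameter $\eta$ in the proportional odds example below, $\alpha = 1$, giving the rate $n^{-1/3}$.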
Likelihood inference in statistics is not limited to the maximum likelihood estimator. In practice the likelihood ratio statistic is perhaps considered even more important. This is the ratio of the value of the likelihood function at a given point to its maximum value. Fortunately, as for classical models, the asymptotic behaviour of this statistic is closely related to that of the maximum likelihood estimator. The statistic is used both for testing certain hypotheses, and as an alternative for obtaining confidence statements.
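For classical models this connection is Wilks' theorem (a standard fact, recalled here for orientation): if $\theta$ is $k$-dimensional, then under the hypothesis $\theta = \theta_0$
$$2\Bigl(\sup_\theta \log \mathrm{lik}(\theta) - \log \mathrm{lik}(\theta_0)\Bigr) \rightsquigarrow \chi^2_k,$$
so that the set of all $\theta_0$ for which this statistic stays below the upper $\alpha$-quantile of the $\chi^2_k$-distribution is an approximate confidence set of confidence $1 - \alpha$. The contour plot in the example below is based on this type of construction, with the nuisance parameter maximized out (the profile likelihood).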
The study of the efficiency of likelihood methods is of considerable interest. Again, for classical parametric models, this question has been solved, and the likelihood methods are efficient in an asymptotic sense. This means that for large n no better estimators or confidence statements are possible, by any method. Much progress has been made for infinite-dimensional models, but much is still unknown. It is not excluded that maximum likelihood is not optimal for certain purposes, even though it has already gained a strong position in practice.
As an example, consider the proportional odds model, which is used in the analysis of life times. The observations are a random sample from the distribution of $X = (T \wedge C,\, 1\{T \le C\},\, Z)$, where, given $Z$, the variables $T$ and $C$ are independent with unspecified probability distributions, apart from the requirement that the conditional distribution function $F(t \,|\, z)$ of $T$ given $Z = z$ satisfies
$$\frac{F(t \,|\, z)}{1 - F(t \,|\, z)} = e^{\theta^\top z}\, \eta(t).$$
The left side is the conditional odds, given $z$, that the survival time is at most $t$. The unknown parameter $\eta$ is a nondecreasing, cadlag function from $[0, \infty)$ into itself with $\eta(0) = 0$. It is this odds when $\theta = 0$, in which case $T$ is independent of $Z$. In a classical parametric model this function would have been modelled, for instance, linearly, or as a power function, but presently we only impose monotonicity.
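Solving the model equation for $F$ gives the explicit form used in the likelihood below (a one-line consequence, included here to connect the two displays):
$$F(t \,|\, z) = \frac{e^{\theta^\top z}\, \eta(t)}{1 + e^{\theta^\top z}\, \eta(t)}, \qquad 1 - F(t \,|\, z) = \frac{1}{1 + e^{\theta^\top z}\, \eta(t)}.$$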
In this example we cannot use the density of the observations as a likelihood, for the supremum will be infinite unless we restrict $\eta$ in an important way. Instead, we use the empirical likelihood. The probability that $X = x = (y, \delta, z)$ is given by
$$\Pr(X = x) = \bigl(F\{y \,|\, z\}\, \Pr(C \ge y \,|\, z)\bigr)^{\delta}\, \bigl((1 - F(y \,|\, z))\, \Pr(C = y \,|\, z)\bigr)^{1 - \delta}\, \Pr(Z = z).$$
For likelihood inference concerning $(\theta, \eta)$ only, we may drop the terms involving the distributions of $C$ and $Z$, and define the likelihood for one observation as
$$\mathrm{lik}_{\theta, \eta}(x) = \left(\frac{e^{\theta^\top z}\, \Delta\eta(y)}{\bigl(1 + e^{\theta^\top z} \eta(y)\bigr)\bigl(1 + e^{\theta^\top z} \eta(y{-})\bigr)}\right)^{\delta} \left(\frac{1}{1 + e^{\theta^\top z} \eta(y)}\right)^{1 - \delta},$$
with $\Delta\eta(y) = \eta(y) - \eta(y{-})$ the jump of $\eta$ at $y$.
The numerical problem is to compute the maximizer of the function $(\theta, \eta) \mapsto \prod_{i=1}^n \mathrm{lik}_{\theta, \eta}(X_i)$, given fixed observations $X_1, \ldots, X_n$. The mathematical problem is to characterize probabilities of the type $\Pr\bigl(\sqrt{n}\,(\hat\theta_n - \theta) \le x\bigr)$.
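A sketch of what such a computation could look like in Python (an assumed implementation for illustration only; the parametrization of $\eta$ by positive jumps at the observed death times is a standard device, but none of this code, nor the toy data, comes from the original text):

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_lik(par, y, delta, z):
    """Negative empirical log-likelihood of the proportional odds model.

    eta is parametrized by positive jumps at the observed death times;
    par = (theta, log of the jump sizes, in order of the sorted times).
    """
    k = z.shape[1]
    theta, log_jumps = par[:k], par[k:]
    death_t = np.sort(y[delta == 1])        # jump locations of eta
    jumps = np.exp(log_jumps)               # enforce positive jumps
    total = 0.0
    for yi, di, zi in zip(y, delta, z):
        ez = np.exp(zi @ theta)
        eta = jumps[death_t <= yi].sum()    # eta(y_i)
        eta_m = jumps[death_t < yi].sum()   # eta(y_i-)
        if di == 1:                         # death: jump term of lik
            total += np.log(ez * (eta - eta_m))
            total -= np.log1p(ez * eta) + np.log1p(ez * eta_m)
        else:                               # censored: survival term only
            total -= np.log1p(ez * eta)
    return -total

# Hypothetical data standing in for the lung-cancer study: observed time
# y, death indicator delta, two covariates z (tumor type, condition).
rng = np.random.default_rng(3)
n = 40
z = rng.normal(size=(n, 2))
y = rng.exponential(size=n)
delta = rng.integers(0, 2, size=n)

x0 = np.zeros(2 + int(delta.sum()))         # theta = 0, all jumps = 1
res = minimize(neg_log_lik, x0, args=(y, delta, z), method="BFGS")
print("theta_hat =", res.x[:2])
```

Profiling this likelihood over a grid of $\theta$-values, maximizing over the jump sizes for each fixed $\theta$, would produce contour plots of the kind discussed next.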
The figure shows levels of the profile likelihood function (for a two-dimensional $\theta$ in this case), for a given data-set from a study of the survival of lung cancer patients. The centre of the ellipses is an estimate for $\theta$, and the ellipses are confidence sets that give an indication of the precision of the estimate. For instance, one should allow for the true value of the parameter to be outside the dark contour ellipse with ``probability'' 5%. The two coordinates of $\theta$ correspond to tumor type (horizontally) and general condition (vertically), and the plot shows that with large confidence the second has a small negative effect while the first has a larger positive effect (with the signs relative to the measurement scales).
Statistics for infinite-dimensional parameters is a unifying theme for a large part of the research carried out by the Programme ``Statistics'' of the Stieltjes Institute. Likelihood-based methods, in all their varieties and in different settings, form an important subtheme in this research.