Researcher: A.W. van der Vaart
Research Programme: Statistics
Much of the statistical theory developed before 1980 was concerned with so-called parametric models. These are models that allow only finitely many degrees of freedom (unknowns) in the phenomenon being modelled. Thus, they tend to fit the phenomenon badly, unless it is observed under closely controlled and previously studied conditions. One of the most important directions in current statistical research is the study of infinite-dimensional models, for which there is both practical and theoretical motivation. Many large or badly structured data-sets simply cannot be reliably analyzed with the classical techniques. This applies in particular to data that result from observational, rather than experimentally controlled, studies, and/or are subject to several types of ``censoring'' (missing or partially observed data). An intrinsic mathematical motivation is that the research leads to interesting mathematics. The revolution in computing power in recent years was a precondition for these new techniques, because statistical techniques for infinite-dimensional models are typically computer-intensive.
If we restrict ourselves to independent replications of an experiment, leading to observations $X_1, X_2, \ldots, X_n$, then a model is precisely the set $\mathcal{P}$ of possible probability distributions $P$ of a single observation.
For a classical parametric model this set of distributions is ``nicely'' parametrized by a Euclidean vector. The simplest type of infinite-dimensional model is the nonparametric model, in which we observe a random sample from a completely unknown distribution. Then $\mathcal{P}$ is the collection of all probability measures on the sample space, and, as is intuitively clear, the empirical distribution $\mathbb{P}_n = n^{-1} \sum_{i=1}^n \delta_{X_i}$ (with $\delta_x$ the Dirac measure at $x$) is an optimal estimator for the underlying distribution.
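As a concrete illustration (a minimal sketch, not part of the original text), the empirical distribution places mass $1/n$ at each observation; the following Python fragment evaluates the corresponding empirical distribution function on a simulated sample (the exponential data are a hypothetical example):

```python
import numpy as np

def empirical_cdf(sample):
    """Empirical distribution function of a one-dimensional sample:
    P_n(t) = (1/n) #{i : X_i <= t}, i.e. the average of the Dirac
    measures delta_{X_i} evaluated on the half-line (-inf, t]."""
    x = np.sort(np.asarray(sample))
    return lambda t: np.searchsorted(x, t, side="right") / x.size

# Hypothetical example: estimate P(X <= 2) from exponential data.
rng = np.random.default_rng(0)
F_n = empirical_cdf(rng.exponential(scale=2.0, size=1000))
print(F_n(2.0))   # close to 1 - exp(-1), about 0.632
```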
More interesting are the intermediate models, which are not ``nicely'' parametrized by a Euclidean parameter, as are the standard classical models, but do restrict the distribution in an important way. Such models are often parametrized by infinite-dimensional parameters, such as distribution functions or densities, that express the structure under study. In particular, the model may have a natural parametrization $(\theta, \eta) \mapsto P_{\theta, \eta}$, where $\theta$ is a Euclidean parameter and $\eta$ runs through a nonparametric class of distributions, or some other infinite-dimensional set. This gives a semiparametric model, in which we aim at estimating $\theta$ and consider $\eta$ as a nuisance parameter. More generally, we focus on estimating the value $\psi(P)$ of some function $\psi$ on the model, with values in the real line or in some other Banach space.
The precise study of the properties of statistical experiments for a fixed number $n$ of observations is often intractable, and therefore many theoretical investigations concern asymptotics as $n \to \infty$. This is particularly true for infinite-dimensional models.
In estimation theory we wish to find functions $T_n = T_n(X_1, \ldots, X_n)$ of the observations that approximate the quantity of interest $\psi(P)$ as well as possible. Asymptotically we should at least have that $T_n$ converges in probability to $\psi(P)$ if the probabilities are calculated according to $P$, for every $P \in \mathcal{P}$. This consistency property is comforting, but in practice we wish to know much more, for instance a rate of convergence $r_n$, and preferably limits of probabilities of the type $P\bigl(r_n \|T_n - \psi(P)\| \le x\bigr)$. These will lead to confidence statements of the form: $\psi(P)$ is in the ball of radius $x_\alpha / r_n$ around $T_n$, with probability (or confidence) $1 - \alpha$. The number $x_\alpha$ is determined from the limit distribution of $r_n\bigl(T_n - \psi(P)\bigr)$.
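To make the roles of $r_n$ and $x_\alpha$ concrete, consider the standard case of a normal limit distribution (a well-known illustration, not specific to this text):
$$r_n\bigl(T_n - \psi(P)\bigr) \rightsquigarrow N(0, \sigma^2) \quad\Longrightarrow\quad P\Bigl(\|T_n - \psi(P)\| \le \frac{x_\alpha}{r_n}\Bigr) \to 1 - \alpha \quad\text{for } x_\alpha = \sigma z_{\alpha/2},$$
with $z_{\alpha/2}$ the upper $\alpha/2$ standard normal quantile ($z_{0.025} \approx 1.96$ for $95\%$ confidence).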
For classical parametric models the most important general method of constructing estimators is the method of maximum likelihood. The likelihood function is the joint density of the observations (relative to some dominating measure), viewed as a function of the parameter, and the maximum likelihood estimator is the point of maximum of this function. The asymptotic theory of this estimator for parametric models is well known. In the most common case the scaling rate $r_n$ is equal to $\sqrt{n}$, and the probabilities converge to probabilities under the normal distribution, with a certain variance that can be expressed in the Fisher information $I_\theta$. A good first introduction to statistics should introduce approximate confidence statements of the type $\hat\theta_n \pm 1.96/\sqrt{n I_{\hat\theta_n}}$, based on the maximum likelihood estimator $\hat\theta_n$.
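A minimal sketch of this recipe (an assumed toy example, not taken from the text): for i.i.d. exponential data with rate $\lambda$, the maximum likelihood estimator is $\hat\lambda_n = 1/\bar{X}_n$ and the Fisher information is $I_\lambda = 1/\lambda^2$, so the interval below is exactly of the form $\hat\theta_n \pm 1.96/\sqrt{n I_{\hat\theta_n}}$.

```python
import numpy as np

# Hypothetical parametric example: X_1,...,X_n i.i.d. exponential with
# rate lam. The log-likelihood n*log(lam) - lam*sum(x) is maximized at
# lam_hat = 1/mean(x); the Fisher information is I(lam) = 1/lam**2.
rng = np.random.default_rng(1)
x = rng.exponential(scale=2.0, size=200)   # true rate = 0.5
n = x.size

lam_hat = 1.0 / x.mean()                   # maximum likelihood estimator
se = lam_hat / np.sqrt(n)                  # equals 1/sqrt(n * I(lam_hat))
print(f"MLE {lam_hat:.3f}, "
      f"95% CI ({lam_hat - 1.96*se:.3f}, {lam_hat + 1.96*se:.3f})")
```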
Given the results for the classical parametric models, it is natural to try the same maximum likelihood recipe for infinite-dimensional models. Here the situation turns out to be much more complicated, and a lot is still unknown.
To begin with, it is not always clear how a ``likelihood function'' should be defined. Many infinite-dimensional models are not dominated, in the sense that not every $P \in \mathcal{P}$ possesses a density relative to a single fixed measure. If the model is dominated, then it may happen that the supremum of the likelihood over all $P \in \mathcal{P}$ is infinite, and a maximum likelihood estimator does not exist. One way to overcome such problems is to use the empirical likelihood, defined as the map $P \mapsto \prod_{i=1}^n P\{X_i\}$, with $P\{x\}$ denoting the probability of the point $x$ under $P$. Other possibilities are to introduce a penalty term $J(P)$ in the likelihood, which disqualifies the $P$ that were causing trouble before, or to restrict the maximization to an approximating set $\mathcal{P}_n$, which will need to grow with $n$ to induce consistency and ensure a good rate of convergence.
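A toy numerical check of the empirical likelihood (an assumed example, stated under the full nonparametric model, where the answer is known): maximizing $\prod_i P\{X_i\}$ over probability vectors supported on the observed points recovers the empirical distribution with weights $1/n$.

```python
import numpy as np
from scipy.optimize import minimize

# Maximize prod_i p_i over probability vectors (p_1,...,p_n) on the
# observed points; the maximizer is the empirical weight vector 1/n.
n = 5
res = minimize(
    lambda p: -np.sum(np.log(p)),                          # -log likelihood
    x0=np.full(n, 0.3),
    bounds=[(1e-9, 1.0)] * n,
    constraints=({"type": "eq", "fun": lambda p: p.sum() - 1.0},),
)
print(res.x)   # all weights approximately 1/n = 0.2
```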
Once a likelihood is defined, we wish to compute its point of maximum. Analytic solutions are rare, so we search for an efficient numerical algorithm. This may be nontrivial, because the function to optimize may be of high dimension. Next, we wish to obtain confidence statements using the maximum likelihood estimator. Again the situation is considerably more complicated than for classical models. Some aspects of the maximum likelihood estimator resemble the approximation properties of classical maximum likelihood estimators. In particular, for the finite-dimensional parameter $\theta$ the convergence rate is $\sqrt{n}$, and asymptotically its distribution follows the normal distribution, with a variance involving a generalization of the Fisher information. However, other aspects may show a totally different and novel behaviour, of which very little is known at the present time.
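This generalization can be made precise (a standard formulation from the semiparametric literature, recalled here for orientation): under regularity conditions,
$$\sqrt{n}\,\bigl(\hat\theta_n - \theta\bigr) = \frac{1}{\sqrt{n}} \sum_{i=1}^n \tilde I_{\theta, \eta}^{-1}\, \tilde\ell_{\theta, \eta}(X_i) + o_P(1) \rightsquigarrow N\bigl(0, \tilde I_{\theta, \eta}^{-1}\bigr),$$
where $\tilde\ell_{\theta, \eta}$ is the efficient score function for $\theta$ (the ordinary score minus its projection onto the scores for the nuisance parameter) and $\tilde I_{\theta, \eta}$ is its covariance matrix, the efficient information, which replaces the classical Fisher information.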
In any case, the mathematical derivations of these results involve completely different tools and arguments. Very important ones are entropy calculations for the statistical models. Roughly, the size of $\mathcal{P}$ is measured by the number $N(\varepsilon)$ of balls of a fixed radius $\varepsilon$, in a suitable metric, needed to cover $\mathcal{P}$. It will be required that this number is finite for every $\varepsilon > 0$ (at least locally), so that $\mathcal{P}$ must be totally bounded, and the speed at which the numbers $N(\varepsilon)$ go up as $\varepsilon$ decreases to 0 should be bounded as well (by roughly $e^{K/\varepsilon}$ for some constant $K$). The rate at which the entropy grows is a measure of the size of the model, and is connected to the rate of convergence of the maximum likelihood estimator.
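A typical instance of this connection (a standard result from the literature, stated here only as an illustration): if the entropy grows polynomially,
$$\log N(\varepsilon) \le K \Bigl(\frac{1}{\varepsilon}\Bigr)^{\alpha}, \qquad 0 < \alpha < 2,$$
then the maximum likelihood estimator typically converges at the rate $n^{-1/(2+\alpha)}$ (in, for instance, the Hellinger distance). For classes of monotone functions, such as the parameter $\eta$ in the proportional odds example below, $\alpha = 1$, giving the rate $n^{-1/3}$.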
Likelihood inference in statistics is not limited to the maximum likelihood estimator. In practice the likelihood ratio statistic is perhaps considered even more important. This is the ratio of the value of the likelihood function at a given point to its maximum value. Fortunately, as for classical models, the asymptotic behaviour of this statistic is closely related to that of the maximum likelihood estimator. The statistic is used both for testing certain hypotheses, and as an alternative for obtaining confidence statements.
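For classical models this connection is Wilks' theorem (a standard fact, recalled here for orientation): if $\theta$ is $k$-dimensional, then under the hypothesis $\theta = \theta_0$
$$2\Bigl(\sup_\theta \log \mathrm{lik}(\theta) - \log \mathrm{lik}(\theta_0)\Bigr) \rightsquigarrow \chi^2_k,$$
so that the set of all $\theta_0$ for which this statistic stays below the upper $\alpha$-quantile of the $\chi^2_k$-distribution is an approximate confidence set of confidence $1 - \alpha$. The contour plot in the example below is based on this type of construction, with the nuisance parameter maximized out (the profile likelihood).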
The study of the efficiency of likelihood methods is of considerable interest. Again, for classical parametric models, this question has been solved, and the likelihood methods are efficient in an asymptotic sense. This means that for large n no better estimators or confidence statements are possible, by any method. Much progress has been made for infinite-dimensional models, but much is still unknown. It is not excluded that maximum likelihood is not optimal for certain purposes, even though it has already gained a strong position in practice.
As an example, consider the proportional odds model, which is used in the analysis of life times. The observations are a random sample from the distribution of $X = (T \wedge C,\, 1\{T \le C\},\, Z)$, where, given $Z$, the variables $T$ and $C$ are independent with unspecified probability distributions, apart from the requirement that the conditional distribution function $F(t \,|\, z)$ of $T$ given $Z = z$ satisfies
$$\frac{F(t \,|\, z)}{1 - F(t \,|\, z)} = e^{\theta^\top z}\, \eta(t).$$
The left side is the conditional odds, given $z$, that the survival time is at most $t$. The unknown parameter $\eta$ is a nondecreasing, cadlag function from $[0, \infty)$ into itself with $\eta(0) = 0$. It is this odds when $\theta = 0$, in which case $T$ is independent of $Z$. In a classical parametric model this function would have been modelled, for instance, linearly, or as a power function, but presently we only impose monotonicity.
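Solving the model equation for $F$ gives the explicit form used in the likelihood below (a one-line consequence, included here to connect the two displays):
$$F(t \,|\, z) = \frac{e^{\theta^\top z}\, \eta(t)}{1 + e^{\theta^\top z}\, \eta(t)}, \qquad 1 - F(t \,|\, z) = \frac{1}{1 + e^{\theta^\top z}\, \eta(t)}.$$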
In this example we cannot use the density of the observations as a likelihood, for the supremum will be infinite unless we restrict $\eta$ in an important way. Instead, we use the empirical likelihood. The probability that $X = x = (y, \delta, z)$ is given by
$$\Pr(X = x) = \bigl(F\{y \,|\, z\}\, \Pr(C \ge y \,|\, z)\bigr)^{\delta}\, \bigl((1 - F(y \,|\, z))\, \Pr(C = y \,|\, z)\bigr)^{1 - \delta}\, \Pr(Z = z).$$
For likelihood inference concerning $(\theta, \eta)$ only, we may drop the terms involving the distributions of $C$ and $Z$, and define the likelihood for one observation as
$$\mathrm{lik}_{\theta, \eta}(x) = \left(\frac{e^{\theta^\top z}\, \Delta\eta(y)}{\bigl(1 + e^{\theta^\top z} \eta(y)\bigr)\bigl(1 + e^{\theta^\top z} \eta(y{-})\bigr)}\right)^{\delta} \left(\frac{1}{1 + e^{\theta^\top z} \eta(y)}\right)^{1 - \delta},$$
with $\Delta\eta(y) = \eta(y) - \eta(y{-})$ the jump of $\eta$ at $y$.
The numerical problem is to compute the maximizer of the function $(\theta, \eta) \mapsto \prod_{i=1}^n \mathrm{lik}_{\theta, \eta}(X_i)$, given fixed observations $X_1, \ldots, X_n$. The mathematical problem is to characterize probabilities of the type $\Pr\bigl(\sqrt{n}\,(\hat\theta_n - \theta) \le x\bigr)$.
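A sketch of what such a computation could look like in Python (an assumed implementation for illustration only; the parametrization of $\eta$ by positive jumps at the observed death times is a standard device, but none of this code, nor the toy data, comes from the original text):

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_lik(par, y, delta, z):
    """Negative empirical log-likelihood of the proportional odds model.

    eta is parametrized by positive jumps at the observed death times;
    par = (theta, log of the jump sizes, in order of the sorted times).
    """
    k = z.shape[1]
    theta, log_jumps = par[:k], par[k:]
    death_t = np.sort(y[delta == 1])        # jump locations of eta
    jumps = np.exp(log_jumps)               # enforce positive jumps
    total = 0.0
    for yi, di, zi in zip(y, delta, z):
        ez = np.exp(zi @ theta)
        eta = jumps[death_t <= yi].sum()    # eta(y_i)
        eta_m = jumps[death_t < yi].sum()   # eta(y_i-)
        if di == 1:                         # death: jump term of lik
            total += np.log(ez * (eta - eta_m))
            total -= np.log1p(ez * eta) + np.log1p(ez * eta_m)
        else:                               # censored: survival term only
            total -= np.log1p(ez * eta)
    return -total

# Hypothetical data standing in for the lung-cancer study: observed time
# y, death indicator delta, two covariates z (tumor type, condition).
rng = np.random.default_rng(3)
n = 40
z = rng.normal(size=(n, 2))
y = rng.exponential(size=n)
delta = rng.integers(0, 2, size=n)

x0 = np.zeros(2 + int(delta.sum()))         # theta = 0, all jumps = 1
res = minimize(neg_log_lik, x0, args=(y, delta, z), method="BFGS")
print("theta_hat =", res.x[:2])
```

Profiling this likelihood over a grid of $\theta$-values, maximizing over the jump sizes for each fixed $\theta$, would produce contour plots of the kind discussed next.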
The figure shows levels of the profile likelihood function (for a two-dimensional $\theta$ in this case), for a given data-set from a study of the survival of lung cancer patients. The centre of the ellipses is an estimate for $\theta$, and the ellipses are confidence sets that give an indication of the precision of the estimate. For instance, one should allow for the true value of the parameter to be outside the dark contour ellipse with ``probability'' 5%. The two coordinates of $\theta$ correspond to tumor type (horizontally) and general condition (vertically), and the plot shows that with large confidence the second has a small negative effect while the first has a larger positive effect (with the signs relative to the measurement scales).
Statistics for infinite-dimensional parameters is a unifying theme for a large part of the research carried out by the Programme ``Statistics'' of the Stieltjes Institute. Likelihood-based methods, in all their varieties and in different settings, form an important subtheme in this research.