Prediction can be regarded as a game between two players, Predictor and Reality. Traditional approaches to prediction in probability theory, statistics, information theory, learning theory, etc., start from the assumption that Reality is following a stochastic strategy. Reality's strategy may be assumed known, known except a small number of parameters (perhaps with a prior distribution on them), consisting of independent and identically distributed trials, stationary, etc. Competitive on-line prediction (also known as universal prediction of individual sequences) replaces this assumption with a benchmark class of prediction strategies we would like to compete with. The implicit assumption of competitive on-line prediction is that some strategy in the benchmark class will perform well (the theory's conclusions hold regardless of this assumption but they are not interesting if none of the strategies in the benchmark class performs well). Unlike the Reality-centred assumptions of the traditional approaches competitive on-line prediction maybe be regarded to involve implicit Predictor-centred assumptions.
Interestingly, statistical learning theory (as developed by Vapnik and Chervonenkis, currently the dominant paradigm in learning theory) involves both Reality-centred and Predictor-centred assumptions: the performance guarantees provided by statistical learning theory depend both on the assumption that Reality produces examples independently from the same probability distribution and the assumption that the VC dimension of the considered class of prediction rules is small. In this sense it is intermediate between the traditional approaches and competitive on-line prediction.
Defensive forecasting is a new technique in competitive on-line prediction. The focal point of this project is investigation of this technique, but we also sometimes use other techniques (especially the Weak and Strong Aggregating Algorithm, the former recently proposed by Kalnishkan and Vyugin) when they lead to stronger results.
The technique of defensive forecasting has two sources: Glenn Shafer's and my work on the foundations of probability and the work in game theory on asymptotic calibration (starting from the 1998 Biometrika paper by Foster and Vohra). Later we realized that early work by Leonid Levin is also very relevant (see "Predictions as statements and decisions"). In our paper "Good sequential probability forecasting is always possible", Glenn and I showed that it is possible, using randomization, to make sequential probability forecasts that will pass any given battery of statistical tests; this simple result simplified and generalized the work in game theory. Akimichi Takemura pointed out that this can be simplified even further, avoiding randomization. Takemura's observation showed that for every continuous game-theoretic law of probability there is a forecasting strategy which is ideal as far as this law is concerned. This "meta-theorem" was applied to laws asserting that the true probabilities must be well-calibrated and have high resolution, giving a new forecasting algorithm, K29. The properties of this algorithm and its modifications are being actively studied.
The main idea of use of defensive forecasting in competitive on-line prediction can be summarized as follows (see, e.g., "Competitive on-line learning with a convex loss function" for details):
You can download the following papers (except the first) from the arXiv e-Print archive:
Building on the game-theoretic framework for probability, we show that it is possible, using randomization, to make sequential probability forecasts that will pass any given battery of statistical tests. This result, an easy consequence of von Neumann's minimax theorem, simplifies and generalizes work by earlier authors. A shorter version of this paper is published in the Journal of the Royal Statistical Society B, 67, 747763 (2005).
This paper establishes the "meta-theorem" mentioned above; it also introduces a simple forecasting strategy, called the K29 algorithm, based on defensive forecasting. Its version is published in the AI & Statistics 2005 proceedings.
The K29 algorithm (which is applicable only to binary forecasting problems) is generalized to many protocols studied in the book Probability and finance: it's only a game. A version of this paper is published in the ALT 2005 proceedings.
The K29 algorithm is shown to be well calibrated and to have good resolution for long enough sequences of observations and for a suitable choice of its kernel parameter. Our results are non-asymptotic: we establish explicit inequalities, shown to be tight, for the performance of the algorithm. A version of this paper is published in the ALT 2005 proceedings.
The results of the previous paper are applied to prediction with expert advice. Very roughly, it is shown that defensive probabilities (the forecasts produced by the K29 algorithm) lead to good decisions. A version of this paper is published in the ALT'2005 proceedings (under the title "Defensive forecasting with expert advice").
The approach of the previous paper is applied to the on-line regression problem. It is shown that, given a benchmark class of prediction strategies comprising a reproducing kernel Hilbert space (RKHS), it is possible to construct a prediction algorithm whose cumulative loss does not exceed the cumulative loss of any prediction strategy in the benchmark class plus a small term depending on the norm of the strategy in the RKHS. (Similar results for finite dimensional benchmark classes were proved earlier in my 2001 paper in International Statistical Review.) A version of this paper is published in the TAMC 2005 proceedings.
This paper concentrates on the problem of on-line prediction competitive with continuous but highly irregular prediction rules. The Hilbert-space methods of the previous paper are not longer applicable, and this paper proposes new Banach-space methods. A version of this paper is published in the COLT 2005 proceedings.
Prediction is a complex notion, and different predictors (such as people, computer programs, and probabilistic theories) can pursue very different goals. In this paper I will review some popular kinds of prediction and argue that the theory of competitive on-line learning can benefit from the kinds of prediction that are now foreign to it. A one-page abstract is published in the COLT 2005 proceedings.
We start from a simple asymptotic result for the problem of on-line regression with the quadratic loss function: the class of continuous limited-memory prediction strategies admits a "leading prediction strategy", which not only asymptotically performs at least as well as any continuous limited-memory strategy but also satisfies the property that the excess loss of any continuous limited-memory strategy is determined by how closely it imitates the leading strategy. More specifically, for any class of prediction strategies constituting a reproducing kernel Hilbert space we construct a leading strategy, in the sense that the loss of any prediction strategy whose norm is not too large is determined by how closely it imitates the leading strategy. This result is extended to the loss functions given by Bregman divergences and by strictly proper scoring rules. A short version is published in the ALT 2006 proceedings.
The method of defensive forecasting is applied to the problem of prediction with expert advice for binary outcomes. It turns out that defensive forecasting is not only competitive with the Aggregating Algorithm but also handles the case of "second-guessing" experts, whose advice depends on the learner's prediction; this paper assumes that the dependence on the learner's prediction is continuous.
Defensive forecasting is a method of transforming laws of probability (stated in game-theoretic terms as strategies for Sceptic) into forecasting algorithms. There are two known varieties of defensive forecasting: "continuous", in which Sceptic's moves are assumed continuous and which produces deterministic forecasts, and "randomized", in which Sceptic's moves are allowed to be discontinuous and Forecaster's moves are allowed to be randomized. This note shows that the randomized variety can be obtained from the continuous variety by smearing Sceptic's moves to make them continuous.
These are the slides of some of my talks on this topic: