Aggregating algorithms are a class of prediction algorithms developed in the strand of machine learning known as prediction with expert advice. The main algorithms in this class are the Strong Aggregating Algorithm (SAA, introduced in my COLT 1990 paper) and the Weak Aggregating Algorithm (WAA, introduced by Yura Kalnishkan and Misha Vyugin in their COLT 2005 paper). The SAA is the generalization of the Bayesian merging rule (introduced in learning theory by DeSantis, Markowsky, and Wegman in their FOCS 1988 paper) to a wider class of loss functions; it includes as special cases the earlier Weighted Majority Algorithm (proposed independently by Littlestone and Warmuth, COLT 1989 and Information and Computation 1994, and by me, Information and Computation 1992) and the universal portfolio algorithm (Cover, Mathematical Finance 1991). The WAA extends the basic scheme of the SAA to a wider class of loss functions.
There are many other techniques in prediction with expert advice, such as gradient descent (and its important modification, exponentiated gradient descent), following the perturbed leader, and defensive forecasting. The first two techniques are typically more computationally efficient (although one often has to pay the price of weaker performance guarantees). There are also new exciting kinds of results (such as prediction with limited feedback and the analysis of internal regret) in prediction with expert advice that are not reflected (as yet) on this page.
Papers (zipped postscript or pdf; where the authors are not given, I am the only author):
In this paper we introduce the class of stationary prediction strategies and construct a prediction algorithm that asymptotically performs as well as the best continuous stationary strategy. We make mild compactness assumptions but no stochastic assumptions about the environment. In particular, no assumption of stationarity is made about the environment, and the stationarity of the considered strategies only means that they do not depend explicitly on time; we argue that it is natural to consider only stationary strategies even for highly non-stationary environments. The conference version of this paper is published in the COLT 2007 Proceedings.
Assuming that the loss function is convex in the prediction, we construct a prediction strategy universal for the class of Markov prediction strategies, not necessarily continuous. Allowing randomization, we remove the requirement of convexity.
Competitive on-line prediction (also known as universal prediction of individual sequences) is a strand of learning theory avoiding making any stochastic assumptions about the way the observations are generated. The predictor's goal is to compete with a benchmark class of prediction rules, which is often a proper Banach function space. Metric entropy provides a unifying framework for competitive on-line prediction: the numerous known upper bounds on the metric entropy of various compact sets in function spaces readily imply bounds on the performance of on-line prediction strategies. This paper discusses strengths and limitations of the direct approach to competitive on-line prediction via metric entropy, including comparisons to other approaches.
In this technical report I show that the Brier game of prediction is perfectly mixable and find the optimal learning rate and substitution function for it. These results are straightforward, but the computations are surprisingly messy.
Talks: