Aggregating algorithms

Aggregating algorithms are a class of prediction algorithms developed in the strand of machine learning known as prediction with expert advice. The main algorithms in this class are the Strong Aggregating Algorithm (SAA, introduced in my COLT 1990 paper) and the Weak Aggregating Algorithm (WAA, introduced by Yura Kalnishkan and Misha Vyugin in their COLT 2005 paper). The SAA is the generalization of the Bayesian merging rule (introduced in learning theory by DeSantis, Markowsky, and Wegman in their FOCS 1988 paper) to a wider class of loss functions; it includes as special cases the earlier Weighted Majority Algorithm (proposed independently by Littlestone and Warmuth, COLT 1989 and Information and Computation 1994, and by me, Information and Computation 1992) and the universal portfolio algorithm (Cover, Mathematical Finance 1991). The WAA extends the basic scheme of the SAA to a wider class of loss functions.

There are many other techniques in prediction with expert advice, such as gradient descent (and its important modification, exponentiated gradient descent), following the perturbed leader, and defensive forecasting. The first two techniques are typically more computationally efficient (although one often has to pay the price of weaker performance guarantees). There are also new exciting kinds of results (such as prediction with limited feedback and the analysis of internal regret) in prediction with expert advice that are not reflected (as yet) on this page.


Papers (zipped postscript or pdf; where the authors are not given, I am the only author):

  1. "Aggregating strategies" (COLT 1990, pp 371–383). This paper introduces the Strong Aggregating Algorithm (SAA).
  2. "A game of prediction with expert advice" (Journal of Computer and System Sciences 56, 153–173, 1998). This paper shows that the SAA is optimal in a natural sense.
  3. "Derandomizing stochastic prediction strategies" (Machine Learning 35, 247–282, 1999). Application of the SAA to the problem of tracking the best expert and related problems; in particular, the paper shows that an earlier algorithm proposed by Herbster and Warmuth is a special case of the SAA.
  4. "Competitive on-line statistics" (International Statistical Review 69, 213–248, 2001). Application of the SAA to uncountably many experts; review of the field.
  5. "The weak aggregating algorithm and weak mixability" by Kalnishkan and Vyugin (COLT 2005). This paper introduces the Weak Aggregating Algorithm.
  6. Competing with stationary prediction strategies.
    In this paper we introduce the class of stationary prediction strategies and construct a prediction algorithm that asymptotically performs as well as the best continuous stationary strategy. We make mild compactness assumptions but no stochastic assumptions about the environment. In particular, no assumption of stationarity is made about the environment, and the stationarity of the considered strategies only means that they do not depend explicitly on time; we argue that it is natural to consider only stationary strategies even for highly non-stationary environments. The conference version of this paper is published in the COLT 2007 Proceedings.
  7. Competing with Markov prediction strategies.
    Assuming that the loss function is convex in the prediction, we construct a prediction strategy universal for the class of Markov prediction strategies, not necessarily continuous. Allowing randomization, we remove the requirement of convexity.
  8. Metric entropy in competitive on-line prediction.
    Competitive on-line prediction (also known as universal prediction of individual sequences) is a strand of learning theory avoiding making any stochastic assumptions about the way the observations are generated. The predictor's goal is to compete with a benchmark class of prediction rules, which is often a proper Banach function space. Metric entropy provides a unifying framework for competitive on-line prediction: the numerous known upper bounds on the metric entropy of various compact sets in function spaces readily imply bounds on the performance of on-line prediction strategies. This paper discusses strengths and limitations of the direct approach to competitive on-line prediction via metric entropy, including comparisons to other approaches.
  9. Prediction with expert advice for the Brier game.
    In this technical report I show that the Brier game of prediction is perfectly mixable and find the optimal learning rate and substitution function for it. These results are straightforward, but the computations are surprisingly messy.
Important note: the copyright for some of these papers belongs to the publishers. All papers may be downloaded for personal or research purposes only. Some of my papers containing results about aggregating algorithms are classified under different rubrics: e.g., "On-line regression competitive with reproducing kernel Hilbert spaces" is classified under defensive forecasting and "Universal portfolio selection" (by Vovk and Watkins) is classified under predictive complexity.


This page is maintained by Vladimir Vovk.   Last modified on 27 April 2009