We study the notion of learning in an oblivious changing environment. Existing online learning algorithms which minimize regret are shown to converge to the average of all locally optimal solutions. We propose a new performance metric, strengthening the standard notion of regret, that captures convergence to locally optimal solutions. We then describe a series of reductions which transform algorithms for minimizing (standard) regret into adaptive algorithms that provably converge at the optimal rate, while incurring only poly-logarithmic computational overhead.
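For concreteness, one natural way to state such an interval-based strengthening of regret, written here in our own notation as an assumption about its general form rather than as a quotation of the paper's definition, is

\[
  \max_{1 \le r \le s \le T} \left( \sum_{t=r}^{s} f_t(x_t) \;-\; \min_{x \in \mathcal{K}} \sum_{t=r}^{s} f_t(x) \right),
\]

where $x_t \in \mathcal{K}$ is the decision played at round $t$ against loss $f_t$: the algorithm must be competitive with the best fixed decision on every contiguous interval, not only on the whole sequence.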
We describe applications of this technique to several well-studied online problems: portfolio management, online routing, and the tree update problem. For portfolio management, we show that all previous algorithms behave suboptimally under dynamic market conditions, whereas the reduction yields an adaptive algorithm. For online routing, our adaptive algorithm exploits local congestion patterns and runs in near-linear time. For the tree update problem, we give an algorithm that is statically optimal for every sufficiently long contiguous subsequence of accesses.
Our reductions combine techniques from data streaming algorithms, composition of learning algorithms, a twist on the standard experts framework, and a novel use of unbiased gradient estimates.
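To convey the flavor of the reduction, the following is a minimal illustrative sketch rather than the construction analyzed in the paper. It assumes a generic black-box low-regret learner (a trivial running-mean predictor stands in for it below), spawns a copy of it at every round, prunes the copies to a logarithmic working set in the spirit of data streaming algorithms, and mixes their predictions with multiplicative weights; the names BaseLearner, in_working_set, and eta, as well as the specific pruning rule and the use of squared loss, are our own choices for illustration.

import math
import random

class BaseLearner:
    """Placeholder black-box regret minimizer: a running-mean predictor over
    real outcomes. Any standard low-regret online learner could be plugged in."""
    def __init__(self):
        self.n = 0
        self.mean = 0.5
    def predict(self):
        return self.mean
    def update(self, outcome):
        self.n += 1
        self.mean += (outcome - self.mean) / self.n

def in_working_set(s, t):
    # Streaming-style pruning: a copy started at round s stays alive while
    # t - s <= 4 * (largest power of two dividing s).  Only O(log t) copies
    # are live at any time, yet every recent start time is approximated by
    # some live copy (a rough covering claim for this sketch, not a proof).
    lowbit = s & (-s)
    return t - s <= 4 * lowbit

def adaptive_learner(outcomes, eta=0.5):
    experts = {}          # start round -> (base-learner copy, weight)
    predictions = []
    for t, y in enumerate(outcomes, start=1):
        # Spawn a fresh copy of the base learner started at round t and give
        # it a 1/t share of the current total weight (the usual experts trick
        # for letting late starters catch up).
        total = sum(w for _, w in experts.values())
        experts[t] = (BaseLearner(), total / t if total > 0 else 1.0)
        # Prune to the logarithmic working set of start times.
        experts = {s: e for s, e in experts.items() if in_working_set(s, t)}
        # Predict with the weighted average of the surviving copies.
        total = sum(w for _, w in experts.values())
        pred = sum(w * lrn.predict() for lrn, w in experts.values()) / total
        predictions.append(pred)
        # Multiplicative-weights update on squared loss, then feed the
        # observed outcome to every surviving copy.
        experts = {s: (lrn, w * math.exp(-eta * (lrn.predict() - y) ** 2))
                   for s, (lrn, w) in experts.items()}
        for lrn, _ in experts.values():
            lrn.update(y)
    return predictions

if __name__ == "__main__":
    random.seed(0)
    # A sequence whose mean jumps halfway through; the adaptive combination
    # tracks the jump, while a single copy started at round 1 would average
    # the two regimes together.
    seq = [random.gauss(0.2, 0.05) for _ in range(200)]
    seq += [random.gauss(0.8, 0.05) for _ in range(200)]
    print([round(p, 3) for p in adaptive_learner(seq)[-3:]])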