Generalized implicit online convex optimization

Date
2024
Abstract
Online learning is a setting in which the learner receives an arbitrary sequence of loss functions, selects a point before observing each loss function, and is evaluated on the value of that loss function at the selected point. The goal is to minimize the regret, the difference between the cumulative loss of the algorithm and that of an arbitrary comparator. In Online Convex Optimization (OCO), the domain is a convex set and the losses are convex functions. A classic way to design OCO algorithms is to approximate the original convex losses with linear models, reducing the OCO problem to an online linear optimization (OLO) problem, which is then solved directly. However, due to the linearization, the algorithm sacrifices useful curvature properties and may suffer from instability. In practice, algorithms that use the original losses, that is, implicit updates, offer significant advantages in empirical performance and in robustness to the choice of learning rate. Unfortunately, their computational cost can be enormous. Our goal is to design efficient implicit OCO algorithms.

First, we focus on parameter-free stochastic gradient descent, a class of algorithms that do not require setting learning rates while achieving optimal theoretical performance. However, an empirical gap remains between tuned stochastic gradient descent and parameter-free stochastic gradient descent. We close this gap with a new implicit parameter-free algorithm based on continuous-time Coin-Betting on truncated linear models, which approximate the true losses better than linear models.

Second, we propose new parameter-free algorithms that leverage truncated linear models through a new update with an implicit flavor. Based on a novel decomposition of the regret, this new update is efficient, never overshoots the minimum of the truncated model, and achieves an optimal theoretical guarantee.

Third, we introduce Generalized Implicit Follow-the-Regularized-Leader (FTRL), a simple, unifying framework for online learning algorithms that directly improves the worst-case upper bound on the regret. It not only recovers known algorithms, such as FTRL with linearized losses, implicit FTRL, and Importance-Weight-Aware updates, but also allows for the design of new update rules, such as extensions of aProx and Mirror-Prox to FTRL.

Lastly, Differentially Private stochastic gradient descent is sensitive to the learning rate setting, and tuning it can lead to privacy leakage. Implicit methods are natural candidates to replace gradient descent due to their inherent robustness to the learning rate. We propose an algorithm that performs implicit updates on "huberized" proxy functions, using objective perturbation for each sample in a minibatch.
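For concreteness, the regret mentioned in the abstract is typically defined as follows (the notation x_t for the learner's prediction, \ell_t for the loss at round t, and u for the comparator is ours, not taken from this record):

\[
\operatorname{Regret}_T(u) = \sum_{t=1}^{T} \ell_t(x_t) - \sum_{t=1}^{T} \ell_t(u).
\]

The linearization discussed above exploits convexity: for any subgradient g_t \in \partial \ell_t(x_t) we have \ell_t(x) \ge \ell_t(x_t) + \langle g_t, x - x_t \rangle, so any regret bound for the linear surrogates \langle g_t, \cdot \rangle is also a regret bound for the original losses.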
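The distinction between linearized and implicit updates can be sketched with their standard forms (the learning rate \eta_t and the projection \Pi_V onto the convex domain V are assumed notation):

\[
x_{t+1} = \Pi_V\left(x_t - \eta_t g_t\right) \quad \text{(linearized update, i.e., online gradient descent)},
\]
\[
x_{t+1} = \operatorname*{argmin}_{x \in V} \; \ell_t(x) + \frac{1}{2\eta_t}\|x - x_t\|^2 \quad \text{(implicit update, i.e., a proximal step)}.
\]

The implicit update requires solving an optimization problem at every round, which is the source of the computational cost noted above.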
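The truncated linear models can be illustrated with the aProx-style surrogate for nonnegative losses (a sketch in our notation; the thesis's exact construction may differ):

\[
\hat{\ell}_t(x) = \max\left(\ell_t(x_t) + \langle g_t, x - x_t \rangle, \; 0\right),
\]

whose proximal step admits the closed form

\[
x_{t+1} = x_t - \min\left(\eta_t, \; \frac{\ell_t(x_t)}{\|g_t\|^2}\right) g_t.
\]

The step is capped exactly where the surrogate reaches its lower bound, which is one way to read the "never overshoots the minimum of the truncated model" property claimed above.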
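As background for the FTRL framework, classic FTRL with linearized losses and its implicit variant take the following forms (the regularizer \psi_t is assumed notation; this is the standard template, not necessarily the thesis's generalized update):

\[
x_{t+1} = \operatorname*{argmin}_{x \in V} \; \psi_t(x) + \sum_{s=1}^{t} \langle g_s, x \rangle \quad \text{(FTRL with linearized losses)},
\]
\[
x_{t+1} = \operatorname*{argmin}_{x \in V} \; \psi_t(x) + \sum_{s=1}^{t-1} \langle g_s, x \rangle + \ell_t(x) \quad \text{(implicit FTRL)}.
\]

Generalized Implicit FTRL, as described above, replaces these particular surrogates with a broader family while preserving the worst-case regret analysis.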