Stochastic functional descent for learning Support Vector Machines
We present a novel method for learning Support Vector Machines (SVMs) in the online setting. The method is general: it handles online learning of binary, multiclass, and structural SVMs in a unified framework. SVM learning amounts to optimizing a convex objective composed of two parts: the hinge loss and a quadratic regularization term. To date, the predominant approaches to online SVM learning have been gradient-based methods such as Stochastic Gradient Descent (SGD). Such approaches have two drawbacks. First, gradient-based methods rely on a local linear approximation to the function being optimized; because the hinge loss is piecewise linear and nonsmooth, this approximation can be ill-behaved. Second, existing online SVM learning approaches share the problem formulation of batch SVM learning methods, and they all require tuning a fixed global regularization parameter by cross validation. On the one hand, global regularization is ineffective at handling the local irregularities encountered in the online setting; on the other hand, even though the learning problem for any particular value of the global regularization parameter may be solved efficiently, repeatedly solving it over a wide range of values can be costly. Our approach addresses both problems. For the first, we propose implicit online update steps to optimize the hinge loss, as opposed to explicit (gradient-based) updates that use subgradients to perform local linearization. For the second, we propose local regularization applied to individual classifier update steps, in place of a fixed global regularization term.
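The contrast between explicit and implicit updates described above can be illustrated with a minimal sketch for the binary case only. It assumes the implicit step takes the proximal form argmin_v hinge(v) + (lam / 2) * ||v - w||^2, which has a well-known closed form for the binary hinge loss; this is an illustrative assumption and not necessarily the exact update used in this work. The names `hinge`, `explicit_step`, `implicit_step`, `eta`, and `lam` are hypothetical.

```python
import numpy as np


def hinge(w, x, y):
    """Binary hinge loss max(0, 1 - y * <w, x>) for a label y in {-1, +1}."""
    return max(0.0, 1.0 - y * float(np.dot(w, x)))


def explicit_step(w, x, y, eta):
    """Explicit (subgradient) step on the hinge loss with stepsize eta.

    The step length is fixed in advance, so it can overshoot the margin.
    """
    if hinge(w, x, y) > 0.0:
        return w + eta * y * x
    return w.copy()


def implicit_step(w, x, y, lam):
    """Implicit step (assumed form): argmin_v hinge(v) + (lam / 2) * ||v - w||^2.

    For the binary hinge loss the minimizer moves along y * x by
    tau = min(1 / lam, hinge(w) / ||x||^2), so it stops once the margin is met.
    """
    sq_norm = float(np.dot(x, x))
    if sq_norm == 0.0:
        return w.copy()
    tau = min(1.0 / lam, hinge(w, x, y) / sq_norm)
    return w + tau * y * x


# One example with a large feature norm: the explicit step overshoots,
# while the implicit step moves just far enough to reach the margin.
w0 = np.zeros(2)
x = np.array([2.0, 0.0])
y = 1
w_explicit = explicit_step(w0, x, y, eta=1.0)
w_implicit = implicit_step(w0, x, y, lam=1.0)
```

In this sketch the local regularization parameter `lam` plays the role that the stepsize `eta` plays in the explicit update: a larger `lam` keeps the new classifier closer to the previous one.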
Our theoretical analysis suggests that our classifier update steps progressively optimize the structured hinge loss, at a rate controlled by a sequence of regularization parameters; setting these parameters is analogous to setting the stepsizes in gradient-based methods. In addition, we give sufficient conditions for the algorithm's convergence. Experimentally, our online algorithm matches the best classification performance achieved by other state-of-the-art online SVM learning methods, as well as by batch learning methods, after only one or two passes over the training data. More importantly, our algorithm attains these results without cross validation, whereas all the other methods require time-consuming cross validation to determine the optimal value of the global regularization parameter.