Local learning by partitioning
MetadataShow full item record
In many machine learning applications data is assumed to be locally simple, where examples near each other have similar characteristics such as class labels or regression responses. Our goal is to exploit this assumption to construct locally simple yet globally complex systems that improve performance or reduce the cost of common machine learning tasks. To this end, we address three main problems: discovering and separating local non-linear structure in high-dimensional data, learning low-complexity local systems to improve performance of risk-based learning tasks, and exploiting local similarity to reduce the test-time cost of learning algorithms. First, we develop a structure-based similarity metric, where low-dimensional non-linear structure is captured by solving a non-linear, low-rank representation problem. We show that this problem can be kernelized, has a closed-form solution, naturally separates independent manifolds, and is robust to noise. Experimental results indicate that incorporating this structural similarity in well-studied problems such as clustering, anomaly detection, and classification improves performance. Next, we address the problem of local learning, where a partitioning function divides the feature space into regions where independent functions are applied. We focus on the problem of local linear classification using linear partitioning and local decision functions. Under an alternating minimization scheme, learning the partitioning functions can be reduced to solving a weighted supervised learning problem. We then present a novel reformulation that yields a globally convex surrogate, allowing for efficient, joint training of the partitioning functions and local classifiers. We then examine the problem of learning under test-time budgets, where acquiring sensors (features) for each example during test-time has a cost. Our goal is to partition the space into regions, with only a small subset of sensors needed in each region, reducing the average number of sensors required per example. Starting with a cascade structure and expanding to binary trees, we formulate this problem as an empirical risk minimization and construct an upper-bounding surrogate that allows for sequential decision functions to be trained jointly by solving a linear program. Finally, we present preliminary work extending the notion of test-time budgets to the problem of adaptive privacy.