Learning deep embeddings by learning to rank
We study the problem of embedding high-dimensional visual data into low-dimensional vector representations. This is an important component of many computer vision applications involving nearest neighbor retrieval, as embedding techniques not only perform dimensionality reduction but can also capture task-specific semantic similarities. In this thesis, we use deep neural networks to learn vector embeddings, and develop a gradient-based optimization framework capable of optimizing ranking-based retrieval performance metrics, such as the widely used Average Precision (AP) and Normalized Discounted Cumulative Gain (NDCG). We apply our framework in three applications.
First, we study Supervised Hashing, which is concerned with learning compact binary vector embeddings for fast retrieval, and propose two novel solutions. The first optimizes Mutual Information as a surrogate ranking objective, while the second directly optimizes AP and NDCG, based on the discovery of their closed-form expressions for discrete Hamming distances. Because these optimization problems are NP-hard, we derive continuous relaxations to enable gradient-based optimization with neural networks. Our solutions establish the state of the art on several image retrieval benchmarks.
Next, we train deep neural networks to extract Local Feature Descriptors from image patches. Local features are used universally in low-level computer vision tasks that involve sparse feature matching, such as image registration and 3D reconstruction, and their matching is a nearest neighbor retrieval problem. We leverage our AP optimization technique to learn both binary and real-valued descriptors for local image patches. Compared to competing approaches, our solution eliminates complex heuristics and achieves higher accuracy in patch verification, patch retrieval, and image matching.
Lastly, we tackle Deep Metric Learning, the general problem of learning real-valued vector embeddings using deep neural networks. We propose a learning-to-rank solution that optimizes a novel quantization-based approximation of AP. For downstream tasks such as retrieval and clustering, we demonstrate promising results on standard benchmarks, especially in the few-shot learning scenario, where the number of labeled examples per class is limited.
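To make the idea of a quantization-based approximation of AP concrete, the following is a minimal sketch. It is not taken from the thesis, whose exact formulation is not given in this abstract. It assumes one common construction: distances from a query are quantized into histogram bins, and AP is estimated from per-bin and cumulative counts of relevant and retrieved items. The function names `exact_ap` and `quantized_ap` and the parameters `num_bins` and `d_max` are hypothetical and chosen only for illustration.

```python
import numpy as np


def exact_ap(dist, rel):
    """Average Precision of a ranking sorted by ascending distance."""
    order = np.argsort(dist, kind="stable")
    rel_sorted = np.asarray(rel, dtype=float)[order]
    n_pos = rel_sorted.sum()
    if n_pos == 0:
        return 0.0
    hits = np.cumsum(rel_sorted)                  # relevant items seen so far
    ranks = np.arange(1, len(rel_sorted) + 1)     # 1-based rank positions
    # Precision at each relevant position, averaged over relevant items.
    return float(np.sum(rel_sorted * hits / ranks) / n_pos)


def quantized_ap(dist, rel, num_bins, d_max):
    """Histogram-based AP estimate (illustrative assumption, hard binning).

    Distances in [0, d_max] are quantized into `num_bins` equal-width bins.
    Items sharing a bin are treated as tied.
    """
    dist = np.asarray(dist, dtype=float)
    rel = np.asarray(rel, dtype=float)
    n_pos = rel.sum()
    if n_pos == 0:
        return 0.0
    idx = np.clip((dist / d_max * num_bins).astype(int), 0, num_bins - 1)
    h_all = np.bincount(idx, minlength=num_bins).astype(float)  # items per bin
    h_pos = np.bincount(idx, weights=rel, minlength=num_bins)   # relevant per bin
    H_all = np.cumsum(h_all)                                    # cumulative items
    H_pos = np.cumsum(h_pos)                                    # cumulative relevant
    valid = H_all > 0
    # Precision up to bin j, weighted by the relevant mass falling in bin j.
    return float(np.sum(h_pos[valid] * H_pos[valid] / H_all[valid]) / n_pos)


# Toy query: four retrieved items, relevant ones at ranks 1 and 3.
d = [0.05, 0.15, 0.25, 0.35]
r = [1, 0, 1, 0]
print(exact_ap(d, r))               # 5/6, about 0.8333
print(quantized_ap(d, r, 4, 0.4))   # one item per bin: matches the exact value
print(quantized_ap(d, r, 1, 0.4))   # a single bin ties everything: 0.5
```

With enough bins that every item falls into its own bin, the estimate coincides with exact AP; with fewer bins it becomes coarser. The hard binning used here is not differentiable, so a gradient-based variant would need a soft bin assignment in its place, which this sketch omits.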