Domain invariance for semantically consistent image manipulation

Abstract
Image manipulation is a fundamental task in computer vision, with applications ranging from domain adaptation and data augmentation to visual content creation. At the root of the task lie two equally important goals -- generating highly realistic and diverse images, and preserving the aspects of the input image that are unrelated to the desired edit. In this thesis, we explore the latter goal, answering questions such as: What can be considered a semantically correct image manipulation, and how can it be evaluated? Given unpaired examples from before and after the edit, can a generative model infer which aspects of the input we aim to preserve and which we want to manipulate? What are the necessary conditions that allow us to guarantee that a manipulation preserves the semantics? This thesis ties semantic consistency to the problem of disentanglement, formulating it as disentangling the domain-invariant factors of variation -- the aspects shared across examples before and after manipulation -- which allows a more rigorous and systematic approach to the task. We illustrate the advantages of disentangling domain-invariant features for semantically consistent mappings on various image editing tasks, including general unpaired image-to-image translation, sketch-to-photo translation, and object relighting.
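
To illustrate the formulation (this is a minimal sketch, not the model developed in the thesis), the Python/PyTorch code below splits each image's latent code into a domain-invariant content part, intended to be shared across the domains before and after the edit, and a domain-specific style part; translating an image then amounts to keeping its content code and swapping in a style code from the target domain. All module names, dimensions, and architectural choices here are hypothetical and chosen only for exposition.

# Minimal sketch (not the thesis implementation): a toy encoder-decoder that
# splits each image's latent code into a domain-invariant "content" part,
# shared across domains, and a domain-specific "style" part.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Encodes a 32x32 RGB image into (content, style) codes."""
    def __init__(self, content_dim=64, style_dim=8):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.to_content = nn.Linear(64, content_dim)  # domain-invariant factors
        self.to_style = nn.Linear(64, style_dim)      # domain-specific factors

    def forward(self, x):
        h = self.backbone(x)
        return self.to_content(h), self.to_style(h)

class Decoder(nn.Module):
    """Reconstructs an image from a (content, style) pair."""
    def __init__(self, content_dim=64, style_dim=8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(content_dim + style_dim, 64 * 8 * 8), nn.ReLU(),
            nn.Unflatten(1, (64, 8, 8)),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, content, style):
        return self.net(torch.cat([content, style], dim=1))

# Translation keeps the domain-invariant content of x_a and swaps in a style
# code drawn from domain B, so the aspects unrelated to the edit are preserved.
enc_a, enc_b = Encoder(), Encoder()
dec_b = Decoder()
x_a = torch.randn(4, 3, 32, 32)   # images from domain A (random stand-in)
x_b = torch.randn(4, 3, 32, 32)   # unpaired images from domain B
content_a, _ = enc_a(x_a)         # factors to preserve
_, style_b = enc_b(x_b)           # factors to manipulate
x_ab = dec_b(content_a, style_b)  # A's content rendered in domain B
print(x_ab.shape)                 # torch.Size([4, 3, 32, 32])

In this toy setup, semantic consistency corresponds to the content code carrying only the factors shared between the two domains; how to enforce and evaluate that property from unpaired data is what the thesis studies.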
Description
2024
License
Attribution-NonCommercial 4.0 International