Deep representation learning for photorealistic content creation

Date
2021
Authors
Xia, Xide
Abstract
We study the problem of deep representation learning for photorealistic content creation, a critical component of many computer vision applications ranging from virtual reality and videography to retail and advertising. In this thesis, we use deep neural techniques to develop end-to-end models that are capable of generating photorealistic results. Our framework is applied in three applications.

First, we study real-time universal photorealistic image style transfer. Photorealistic style transfer is the task of transferring the artistic style of an image onto a content target, producing a result that plausibly appears to have been taken with a camera. We propose a new end-to-end model for photorealistic style transfer that is both fast and inherently generates photorealistic results. The core of our approach is a feed-forward neural network that learns local edge-aware affine transforms which automatically obey the photorealism constraint. Our method produces visually superior results and is three orders of magnitude faster than prior methods, enabling real-time performance at 4K resolution on a mobile phone.

Next, we address real-time localized photorealistic video style transfer. We present a novel algorithm for transferring the artistic style of an image onto local regions of a target video while preserving its photorealism. Local regions may be selected either fully automatically, using video segmentation algorithms, or through casual user guidance such as scribbles. Our method is real-time and, once trained, works on arbitrary inputs without runtime optimization. We demonstrate it on a variety of style images and target videos, including transferring different styles onto multiple objects simultaneously and transitioning smoothly between styles over time.

Lastly, we tackle the problem of attribute-based fashion image retrieval and content creation. We present an effective approach for generating new outfits from input queries through generative adversarial learning, decomposing this complicated process into two stages. In the first stage, a novel attribute-aware global ranking network performs attribute-based fashion retrieval. In the second stage, a generative model refines the retrieved results, conditioned on an individual's preferred style. We demonstrate promising results on standard large-scale benchmarks.
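The first application hinges on per-pixel local affine color transforms. Below is a minimal sketch, assuming PyTorch, of just the transform-application step; the coefficient-predicting network and the bilateral-grid slicing that would produce full-resolution coefficients in practice are omitted, and the function name `apply_local_affine` is a hypothetical placeholder:

```python
import torch

def apply_local_affine(content: torch.Tensor, coeffs: torch.Tensor) -> torch.Tensor:
    """Apply a per-pixel affine color transform.

    content: (B, 3, H, W) full-resolution content image.
    coeffs:  (B, 12, H, W) per-pixel affine parameters, i.e. a 3x4 matrix
             [A | b] at every pixel (in practice predicted at low resolution
             and sliced up to full resolution with a guidance map).
    """
    B, _, H, W = content.shape
    A = coeffs[:, :9].reshape(B, 3, 3, H, W)  # 3x3 linear part at each pixel
    b = coeffs[:, 9:]                         # 3-vector bias at each pixel
    # out_c = sum_k A[c, k] * in_k + b_c, applied independently per pixel
    return torch.einsum('bckhw,bkhw->bchw', A, content) + b

# Because each output pixel is an affine function of the corresponding input
# pixel, new edges can only appear where the content (or the smooth
# coefficient field) already has them -- the photorealism constraint.
content = torch.rand(1, 3, 256, 256)
coeffs = torch.rand(1, 12, 256, 256)  # would come from the predictor network
stylized = apply_local_affine(content, coeffs)
```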
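For the localized video setting, region masks gate where each style appears. A hypothetical sketch of the compositing step follows (PyTorch assumed; the trained stylization model and the mask sources, whether a video segmentation model or propagated user scribbles, are not shown):

```python
import torch

def composite_styles(frame, stylized, masks):
    """frame: (3, H, W) original video frame.
    stylized: list of (3, H, W) stylized renderings, one per style.
    masks: list of (1, H, W) soft region masks in [0, 1]."""
    out = frame.clone()
    for styled, mask in zip(stylized, masks):
        # Alpha-blend each styled region over the running composite;
        # pixels outside every mask keep the original photographic look.
        out = mask * styled + (1.0 - mask) * out
    return out

# A smooth transition into a style over T frames can be expressed by
# ramping the mask weight, e.g. using (t / T) * mask at frame t.
```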
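The first retrieval stage can be illustrated with a generic attribute-conditioned triplet ranking sketch. The gating scheme, module name `AttributeRanker`, and dimensions below are illustrative assumptions, not the thesis architecture:

```python
import torch
import torch.nn as nn

class AttributeRanker(nn.Module):
    def __init__(self, img_dim=512, n_attrs=10, emb_dim=128):
        super().__init__()
        # One learned gate per attribute, used to focus image features
        # on the attribute the query cares about.
        self.attr_emb = nn.Embedding(n_attrs, img_dim)
        self.proj = nn.Linear(img_dim, emb_dim)

    def forward(self, feats, attr_ids):
        gated = feats * torch.sigmoid(self.attr_emb(attr_ids))
        return nn.functional.normalize(self.proj(gated), dim=-1)

ranker = AttributeRanker()
loss_fn = nn.TripletMarginLoss(margin=0.2)
feats = torch.randn(8, 512)        # anchor features from a CNN backbone
pos = torch.randn(8, 512)          # items sharing the queried attribute value
neg = torch.randn(8, 512)          # items with a different attribute value
attr = torch.randint(0, 10, (8,))  # which attribute each query targets
loss = loss_fn(ranker(feats, attr), ranker(pos, attr), ranker(neg, attr))
loss.backward()
```

The second stage would then condition a generative model on the retrieved items and the user's preferred style; that model is beyond the scope of this sketch.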