Efficient techniques for recovering 2D human body poses from images (PhD thesis)
OA Version
Citation
Tian, Tai-Peng. "Efficient Techniques for Recovering 2D Human Body Poses from Images (PhD Thesis)", Technical Report BUCS-TR-2011-006, Computer Science Department, Boston University, February 23, 2011. [Available from: http://hdl.handle.net/2144/11363]
Abstract
Human parsing recovers the 2D spatial layout of a human figure in an image. First, patches in the image that resemble body parts, i.e., head, torso and limbs, are identified, then a coherent human figure is assembled from these candidate positions. The human model is represented as a graph where each vertex represents a body part and each edge represents a relationship between parts. If the graph is a tree, then the optimal solution can be recovered efficiently using the Min-Sum (MS) algorithm. Tree models often return incorrect solutions with the left and right legs stacked on top of one another. To overcome this problem, we add constraints to the tree model, yielding a graph that contains loops. Finding the optimal solution for a loopy graph is computationally intensive. We propose a Branch and Bound search algorithm to recover the optimal solution. Our algorithm converges quickly in practice due to a novel tree structured lower bound and a fast way for evaluating these lower bounds. Naively, evaluating each lower bound requires $O(nh)$ time for a graph with $n$ vertices and $h$ candidate body part locations. We develop an $O(1)$ time method for evaluating the lower bound (in most iterations of the algorithm) by reusing messages from the MS algorithm and using a Range Minimum Query data structure. We also propose a human parsing model that encodes the viewpoint and walking phase of the human figure using the Common Factor Model (CFM). The main computational bottleneck of the CFM human parsing algorithm involves message creation for each iteration of the MS algorithm. The original CFM inference requires $O(kn)$ messages to be created for $k$ iterations of the MS algorithm in a graph with $n$ vertices. Our new algorithm reduces this to $O(n)$ messages created. This speedup is based on the insight that the messages are shifted from one iteration to the next and, therefore, messages can be created once and then shifted in subsequent iterations (shifting is an efficient operation which requires $O(1)$ time). In our experiments, the two proposed algorithms yield an order of magnitude computational speedup over competing algorithms.