Articulated Pose Estimation in a Learned Smooth Space of Feasible Solutions
MetadataShow full item record
A learning based framework is proposed for estimating human body pose from a single image. Given a differentiable function that maps from pose space to image feature space, the goal is to invert the process: estimate the pose given only image features. The inversion is an ill-posed problem as the inverse mapping is a one to many process. Hence multiple solutions exist, and it is desirable to restrict the solution space to a smaller subset of feasible solutions. For example, not all human body poses are feasible due to anthropometric constraints. Since the space of feasible solutions may not admit a closed form description, the proposed framework seeks to exploit machine learning techniques to learn an approximation that is smoothly parameterized over such a space. One such technique is Gaussian Process Latent Variable Modelling. Scaled conjugate gradient is then used find the best matching pose in the space of feasible solutions when given an input image. The formulation allows easy incorporation of various constraints, e.g. temporal consistency and anthropometric constraints. The performance of the proposed approach is evaluated in the task of upper-body pose estimation from silhouettes and compared with the Specialized Mapping Architecture. The estimation accuracy of the Specialized Mapping Architecture is at least one standard deviation worse than the proposed approach in the experiments with synthetic data. In experiments with real video of humans performing gestures, the proposed approach produces qualitatively better estimation results.