Deeply Learned Compositional Models for Human Pose Estimation
 
Wei Tang, Pei Yu, and Ying Wu
 
EECS Department, Northwestern University, USA
 
Abstract
Compositional models represent patterns with hierarchies of meaningful parts and subparts. Their ability to characterize high-order relationships among body parts helps resolve low-level ambiguities in human pose estimation (HPE). However, prior compositional models make unrealistic assumptions about subpart-part relationships, which makes them incapable of characterizing complex compositional patterns. Moreover, the state spaces of their higher-level parts can be exponentially large, complicating both inference and learning. To address these issues, this paper introduces a novel framework for HPE, termed the Deeply Learned Compositional Model (DLCM). It exploits deep neural networks to learn the compositionality of human bodies. This results in a novel network with a hierarchical compositional architecture and bottom-up/top-down inference stages. In addition, we propose a novel bone-based part representation. It not only compactly encodes the orientations, scales and shapes of parts, but also avoids their potentially large state spaces. With significantly lower complexity, our approach outperforms state-of-the-art methods on three benchmark datasets.
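To illustrate the bone-based part representation, here is a minimal sketch of how a part spanned by two joints might be rendered as a single heat map. It is not taken from the released code: the function name bone_heatmap, the Gaussian falloff and the sigma value are all assumptions made for this example. The point is that one map encodes the orientation and length of the part without enumerating its states.

```python
import math


def bone_heatmap(p, q, height, width, sigma=1.5):
    """Render the bone from joint p to joint q as a height x width heat map.

    p and q are (x, y) pixel coordinates. Each pixel receives
    exp(-d^2 / (2 * sigma^2)), where d is its distance to the segment p-q.
    The Gaussian falloff and sigma are illustrative assumptions.
    """
    (px, py), (qx, qy) = p, q
    dx, dy = qx - px, qy - py
    length_sq = dx * dx + dy * dy
    heatmap = []
    for y in range(height):
        row = []
        for x in range(width):
            if length_sq == 0:
                t = 0.0  # degenerate bone: both joints coincide
            else:
                # Project the pixel onto the segment and clamp to its ends.
                t = ((x - px) * dx + (y - py) * dy) / length_sq
                t = max(0.0, min(1.0, t))
            cx, cy = px + t * dx, py + t * dy
            d_sq = (x - cx) ** 2 + (y - cy) ** 2
            row.append(math.exp(-d_sq / (2.0 * sigma ** 2)))
        heatmap.append(row)
    return heatmap


# Hypothetical horizontal bone between joints at (2, 2) and (6, 2).
bone = bone_heatmap((2, 2), (6, 2), height=5, width=9)
```

Pixels on the segment score highest and the score decays with distance from it, so the map's elongated ridge carries the part's direction and extent.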
 
Overview
   
(a) A typical compositional model of a human body. The pose is estimated via two stages: bottom-up inference followed by top-down refinement. (b) Overview of our deeply learned compositional model. The orange and green arrows respectively denote compositional inference functions modeled by CNNs in bottom-up and top-down stages. The colored rectangles on the left side denote predicted score maps of parts at different semantic levels while the heat maps on the right side represent their corresponding ground truth in the training phase.
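The two-stage data flow in the figure can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the DLCM network: the hierarchy, the part names and the score values are hypothetical, score maps are flattened to short lists, and the stand-in functions compose_max and refine_avg take the place of the learned CNNs.

```python
def bottom_up(joint_maps, hierarchy, compose):
    """Predict score maps level by level, from joints up to the top part."""
    levels = [joint_maps]
    for children_of in hierarchy:
        below = levels[-1]
        levels.append({part: compose([below[c] for c in kids])
                       for part, kids in children_of.items()})
    return levels


def top_down(levels, hierarchy, refine):
    """Refine each level using the already refined maps of its parents."""
    refined = [None] * len(levels)
    refined[-1] = levels[-1]  # the top level has no parent to refine it
    for l in range(len(hierarchy) - 1, -1, -1):
        children_of = hierarchy[l]  # maps level l+1 parts to level l children
        refined[l] = {}
        for child, child_map in levels[l].items():
            parents = [refined[l + 1][p]
                       for p, kids in children_of.items() if child in kids]
            refined[l][child] = refine(child_map, parents)
    return refined


def compose_max(child_maps):
    """Stand-in for a bottom-up CNN: element-wise max over children."""
    return [max(values) for values in zip(*child_maps)]


def refine_avg(child_map, parent_maps):
    """Stand-in for a top-down CNN: average the child with its parents' mean."""
    if not parent_maps:
        return child_map
    parent_mean = [sum(v) / len(v) for v in zip(*parent_maps)]
    return [(c + p) / 2 for c, p in zip(child_map, parent_mean)]


# Hypothetical three-level hierarchy for one arm.
hierarchy = [
    {"upper_arm": ["shoulder", "elbow"], "lower_arm": ["elbow", "wrist"]},
    {"arm": ["upper_arm", "lower_arm"]},
]
joint_maps = {"shoulder": [1.0, 0.0], "elbow": [0.5, 0.5], "wrist": [0.0, 1.0]}

levels = bottom_up(joint_maps, hierarchy, compose_max)
refined = top_down(levels, hierarchy, refine_avg)
```

In training, the predicted map at every level of both stages would be compared against its ground-truth heat map, which is what the right side of panel (b) depicts.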
 
Resources
Poster: dlcm_poster.pptx
Source code: DLCM-release.zip
Trained models: dlcm_l3_flic.t7, dlcm_l3_mpii_lsp.t7, dlcm_l3_mpii_exclude_val.t7, dlcm_l3_mpii_include_val.t7
Please contact Wei Tang (weitang2015 AT u.northwestern.edu) for any questions concerning this project.
 
References
[1] W. Tang, P. Yu, and Y. Wu. Deeply Learned Compositional Models for Human Pose Estimation. In ECCV, 2018.
[2] W. Tang, P. Yu, J. Zhou, and Y. Wu. Towards a Unified Compositional Model for Visual Pattern Modeling. In ICCV, 2017.
[3] A. Newell, K. Yang, and J. Deng. Stacked Hourglass Networks for Human Pose Estimation. In ECCV, 2016.