Zhang, Huiyang; Gu, Yanlei; Kamijo, Shunsuke
(Institute of Industrial Science, The University of Tokyo, Tokyo, Japan)
For years, behavior understanding has been a hot topic in the field of computer vision. As an important part of human behavior understanding, pose estimation has attracted considerable interest. Recently, deep learning methods such as Mask R-CNN have achieved much better performance on computer vision tasks than traditional approaches, as deep neural networks can learn representative features efficiently. For pose estimation, most deep learning approaches focus mainly on joint features. However, these features are not sufficient, especially when the pose is occluded or not intact. In fact, many features other than joints can also contribute to pose estimation, such as the body boundary, body orientation, and visibility condition. By adopting a multi-task strategy, these features can be combined efficiently within a deep learning model for pose estimation. In this paper, we present a multi-task pose estimation approach for human behavior understanding. Our deep learning model is based on Mask R-CNN, whose output covers four tasks: human keypoint prediction, body segmentation, orientation prediction, and mutual occlusion detection. Our model is trained on the public COCO dataset, which we further augment with ground truths for orientation masks and occlusion masks. Experiments show the learning accuracy of the proposed method. Comparisons further illustrate the performance improvement gained by combining more features through the multi-task strategy.
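Multi-task models of the kind described in the abstract are commonly trained by summing the losses of the individual task heads into a single objective. A minimal sketch of that combination step (the weight and loss values below are illustrative assumptions, not numbers from the paper):

```python
# Hypothetical sketch: combining per-task losses (keypoint, segmentation,
# orientation, occlusion) into one training objective, as is typical for
# multi-task learning. Weights and loss values here are made-up examples.

def multitask_loss(losses, weights):
    """Weighted sum of per-task losses."""
    return sum(weights[task] * losses[task] for task in losses)

# Example per-task losses for one training batch (illustrative only).
losses = {"keypoint": 1.2, "segmentation": 0.8,
          "orientation": 0.5, "occlusion": 0.3}
# Example task weights; in practice these are tuned hyperparameters.
weights = {"keypoint": 1.0, "segmentation": 1.0,
           "orientation": 0.5, "occlusion": 0.5}

total = multitask_loss(losses, weights)
# 1.2*1.0 + 0.8*1.0 + 0.5*0.5 + 0.3*0.5 = 2.4
```

In a real implementation, each loss would come from its own output head on the shared backbone, and gradients from the weighted sum would update both the heads and the shared features jointly.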