Active Transfer Learning for Efficient Video-Specific Human Pose Estimation

Abstract

Human Pose (HP) estimation is actively researched because of its wide range of applications. However, even estimators pre-trained on large datasets may not perform satisfactorily because of the domain gap between the training and test data. To address this issue, we present an approach that combines Active Learning (AL) and Transfer Learning (TL) to efficiently adapt HP estimators to individual video domains. For efficient learning, our approach quantifies (i) the estimation uncertainty, based on temporal changes in the estimated heatmaps, and (ii) the unnaturalness of the estimated full-body HPs. These criteria are then combined with the state-of-the-art representativeness criterion to select uncertain and diverse samples for efficient HP estimator learning. Furthermore, we revisit the existing Active Transfer Learning (ATL) method and introduce new ideas for retraining and for Stopping Criteria (SC). Experimental results demonstrate that our method improves learning efficiency and outperforms the compared methods.

Overview of our Video-Specific Active Transfer Learning.

Method

Comparison of samples selected by the existing methods (a) MPE and (b) Core-Set, and by (c) our method. Our method effectively selects uncertain and representative samples in a query video.
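The caption contrasts an uncertainty-only criterion (MPE), a representativeness-only criterion (Core-Set), and a combination of both. The page does not say how the two are combined, so the following is only a sketch under stated assumptions: it rescales per-frame uncertainty to [0, 1] and runs Core-Set style greedy k-center selection, weighting each frame's distance to its nearest already-selected frame by its uncertainty. The function name `select_uncertain_diverse`, the per-frame feature vectors, and the multiplicative weighting are all hypothetical, not the authors' method.

```python
import numpy as np


def select_uncertain_diverse(features, uncertainty, budget):
    """Greedily pick `budget` frame indices that are both uncertain and diverse.

    Hypothetical combination rule (not taken from the paper): score each
    frame by normalized uncertainty times its distance to the nearest frame
    already selected, as in Core-Set greedy k-center selection.
    """
    feats = np.asarray(features, dtype=float)  # shape (N, D)
    u = np.asarray(uncertainty, dtype=float)   # shape (N,)
    n = feats.shape[0]
    if not 0 < budget <= n:
        raise ValueError("budget must be in [1, number of frames]")

    # Rescale uncertainty to [0, 1] so it is on a comparable scale.
    span = u.max() - u.min()
    u = (u - u.min()) / span if span > 0 else np.ones(n)

    # Start from the single most uncertain frame.
    selected = [int(np.argmax(u))]
    # Distance from every frame to its nearest selected frame.
    min_dist = np.linalg.norm(feats - feats[selected[0]], axis=1)

    while len(selected) < budget:
        score = u * min_dist
        score[selected] = -np.inf  # never re-select a frame
        nxt = int(np.argmax(score))
        selected.append(nxt)
        # Update nearest-selected distances with the newly chosen frame.
        min_dist = np.minimum(min_dist, np.linalg.norm(feats - feats[nxt], axis=1))
    return selected
```

For example, with 1-D features `[[0], [0.1], [5], [5.1], [10]]`, uncertainties `[0.9, 1.0, 0.5, 0.6, 0.2]`, and a budget of 3, the sketch returns `[1, 3, 0]`: it starts from the most uncertain frame, then jumps to the more uncertain frame of the distant cluster rather than taking the near-duplicate frame 0 second. Multiplying the two terms means a frame needs to be both uncertain and far from the current selection to score highly.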

Uncertainty estimation results obtained with the proposed Temporal Heatmap Continuity (THC) criterion.
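The page does not give the THC formula. As a minimal sketch, assuming that larger frame-to-frame changes in a joint's heatmap indicate a less reliable estimate, one could score each frame by the mean (over joints) L2 distance between its heatmaps and those of its neighboring frames. The function name `temporal_heatmap_change` and this particular distance are hypothetical; the authors' THC criterion may be defined differently.

```python
import numpy as np


def temporal_heatmap_change(heatmaps: np.ndarray) -> np.ndarray:
    """Per-frame uncertainty from temporal heatmap changes (hypothetical THC stand-in).

    heatmaps: array of shape (T, J, H, W) holding the estimator's heatmaps
    for T consecutive frames and J joints. Returns an array of shape (T,);
    higher values mean the heatmaps change more around that frame.
    """
    if heatmaps.ndim != 4:
        raise ValueError("expected heatmaps of shape (T, J, H, W)")
    t = heatmaps.shape[0]
    # Per-joint L2 distance between consecutive frames: shape (T-1, J).
    diffs = np.sqrt(((heatmaps[1:] - heatmaps[:-1]) ** 2).sum(axis=(2, 3)))
    # Average over joints: one change value per transition, shape (T-1,).
    step = diffs.mean(axis=1)

    scores = np.zeros(t)
    counts = np.zeros(t)
    # Each transition contributes to both frames it connects.
    scores[:-1] += step
    counts[:-1] += 1
    scores[1:] += step
    counts[1:] += 1
    # Border frames have only one neighbor; a single frame gets score 0.
    return scores / np.maximum(counts, 1)
```

For instance, five frames of all-zero heatmaps (2 joints, 4×4 maps) in which only frame 2 is set to ones yield scores `[0, 2, 4, 2, 0]`, so frame 2 would be flagged as the most uncertain.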

An overview of the proposed Whole-body Pose Unnaturalness (WPU) criterion.
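The page shows only a figure for WPU, not its formulation. To illustrate the general idea of scoring whole-body pose unnaturalness, the sketch below uses a swapped-in technique: PCA reconstruction error, i.e. a linear autoencoder fitted to poses assumed to be natural. The class name `PoseReconstructionScorer` and the choice of PCA are hypothetical, and the authors' WPU criterion may be computed quite differently.

```python
import numpy as np


class PoseReconstructionScorer:
    """Hypothetical stand-in for a whole-body pose unnaturalness score.

    Fits a linear model (PCA, i.e. a linear autoencoder) to poses assumed
    to be natural, then scores new poses by reconstruction error: poses the
    model cannot reconstruct well are treated as more unnatural. Poses are
    (J, 2) keypoint arrays, assumed already root-centered and scale-normalized.
    """

    def __init__(self, n_components: int):
        self.n_components = n_components

    def fit(self, poses):
        x = np.asarray(poses, dtype=float)
        x = x.reshape(x.shape[0], -1)  # flatten (N, J, 2) -> (N, 2J)
        self.mean_ = x.mean(axis=0)
        # Rows of vt are principal directions, sorted by singular value.
        _, _, vt = np.linalg.svd(x - self.mean_, full_matrices=False)
        self.components_ = vt[: self.n_components]
        return self

    def score(self, poses):
        x = np.asarray(poses, dtype=float)
        x = x.reshape(x.shape[0], -1)
        codes = (x - self.mean_) @ self.components_.T  # encode
        recon = codes @ self.components_ + self.mean_  # decode
        return np.linalg.norm(x - recon, axis=1)       # per-pose error
```

As a toy usage example, fitting one component to the three-joint poses `[[-1, t], [0, 0], [1, -t]]` for `t` in `(-1, -0.5, 0, 0.5, 1)` gives a near-zero score for another pose of that family (`t = 0.3`) and a score of 1.0 for `[[-1, 0], [0, 1], [1, 0]]`, whose middle joint moves in a direction never seen during fitting.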

@InProceedings{VATL4Pose_WACV24,
  author    = {Taketsugu, Hiromu and Ukita, Norimichi},
  title     = {Active Transfer Learning for Efficient Video-Specific Human Pose Estimation},
  booktitle = {IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
  year      = {2024}
}


Our Video-Specific Active Transfer Learning efficiently improves Human Pose Estimation results (left: pre-trained model; right: ours with 10% annotation).



Results

Results of the pre-trained model.

Results of our method (with 10% annotation).
