This repository presents the implementation of several Pose Estimation models, a core task in Computer Vision that enables machines to infer the position and orientation of humans, animals, or objects in images and videos by identifying specific points, commonly referred to as keypoints or landmarks. These keypoints can represent joints, limbs, facial features, or other distinctive parts.
Pose estimation methods generally follow two main approaches: bottom-up and top-down.
- In the bottom-up approach, the model first detects all individual keypoints across the entire image using probabilistic maps (heatmaps) that estimate, for each pixel, the likelihood that it corresponds to a specific keypoint. Non-maximum suppression is then applied to select the most confident candidates, which in multi-instance settings must afterwards be grouped into individual instances. This approach is efficient but often less accurate; a minimal sketch of the peak-picking step follows this list.
- In the top-down approach, the model first detects bounding boxes for each instance and then predicts the keypoints within them. This method provides higher accuracy and scale invariance but is computationally expensive, especially as the number of instances increases.
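The heatmap peak-picking mentioned above can be illustrated with a short, framework-agnostic sketch. The 5×5 window and 0.3 threshold are arbitrary illustration values, not settings used in this repository:

```python
import numpy as np
from scipy.ndimage import maximum_filter

def extract_keypoint_candidates(heatmaps, threshold=0.3):
    """Select heatmap peaks with a simple window-based non-maximum suppression.

    heatmaps: array of shape (K, H, W), one probability map per keypoint type.
    Returns a list of (keypoint_id, y, x, score) candidates.
    """
    candidates = []
    for k, hm in enumerate(heatmaps):
        # A pixel survives only if it is the maximum of its 5x5 neighbourhood
        # and its score exceeds the confidence threshold.
        peaks = (hm == maximum_filter(hm, size=5)) & (hm > threshold)
        for y, x in zip(*np.nonzero(peaks)):
            candidates.append((k, int(y), int(x), float(hm[y, x])))
    return candidates
```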
Recent models such as YOLO11-pose combine the strengths of both approaches: by detecting instances and estimating their poses in a single unified process, they avoid the heatmap generation and manual grouping of bottom-up pipelines while retaining their efficiency, and they approach the per-instance precision of top-down methods without the cost of a separate detection stage.
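As a usage illustration (not the exact code from the notebooks), running a pretrained YOLO11-pose checkpoint with Ultralytics returns boxes and keypoints from a single forward pass; the weight file and image path below are placeholders:

```python
from ultralytics import YOLO

model = YOLO("yolo11l-pose.pt")      # pretrained pose checkpoint (placeholder name)
results = model("example.jpg")       # detection + pose estimation in one unified pass

for result in results:
    print(result.boxes.xyxy)         # one bounding box per detected instance
    print(result.keypoints.xy)       # (instances, keypoints, 2) keypoint coordinates
```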
Pose estimation has become a cornerstone in multiple domains, including:
- Healthcare and rehabilitation: motion analysis for physiotherapy, remote patient monitoring, and detection of postural anomalies.
- Sports and performance: athlete technique evaluation, automated repetition counting, and injury prevention via real-time posture correction.
- Animal research: behavioral studies, species monitoring in the wild, and welfare assessment in farms or labs.
- Human-computer interaction (HCI): gesture-based interfaces, augmented reality, and touchless controls.
- Surveillance and safety: suspicious behavior recognition and fall detection in sensitive environments such as hospitals or care facilities.
- Entertainment and media: motion capture, animation, and video games.
- Industry and robotics: human-robot collaboration, ergonomics in assembly lines, and task assistance in manufacturing.
All projects leverage transfer learning: models pretrained on large-scale datasets are fine-tuned on each task's dataset using frameworks such as TensorFlow, PyTorch, and Ultralytics.
- Basic models were fine-tuned on single-class datasets with one instance per image, following a bottom-up, heatmap-based approach.
- Advanced models (YOLO11-pose), designed for real-time applications, were trained on multi-class, multi-instance datasets (a minimal fine-tuning sketch follows this list).
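A minimal sketch of the transfer-learning setup for the YOLO11-pose projects, assuming a dataset config file named pose_dataset.yaml (placeholder); the actual hyperparameters are defined in each notebook:

```python
from ultralytics import YOLO

# Start from pretrained weights and fine-tune on a custom pose dataset.
model = YOLO("yolo11l-pose.pt")
model.train(
    data="pose_dataset.yaml",  # placeholder: dataset paths, class names, keypoint shape
    epochs=100,                # illustrative values, not the notebooks' settings
    imgsz=640,
    batch=16,
)
```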
Training is carried out in Google Colab using TPUs or GPUs, depending on project requirements.
All notebooks incorporate data augmentation to improve generalization, either manually with Albumentations or automatically (e.g., in YOLO11-pose). Additionally, callbacks and learning rate schedulers are used to prevent overfitting and enhance performance.
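For the manually augmented (non-YOLO) notebooks, a keypoint-aware Albumentations pipeline and standard Keras callbacks look roughly like the sketch below; the specific transforms, probabilities, and patience values are illustrative assumptions:

```python
import albumentations as A
import tensorflow as tf

# Geometric/photometric augmentations that also transform the keypoint coordinates.
augment = A.Compose(
    [
        A.HorizontalFlip(p=0.5),
        A.ShiftScaleRotate(shift_limit=0.05, scale_limit=0.1, rotate_limit=15, p=0.7),
        A.RandomBrightnessContrast(p=0.3),
    ],
    keypoint_params=A.KeypointParams(format="xy", remove_invisible=False),
)
# usage: augmented = augment(image=image, keypoints=keypoints)

# Callbacks and learning-rate scheduling of the kind mentioned above (Keras built-ins).
callbacks = [
    tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=3),
    tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=10, restore_best_weights=True),
    tf.keras.callbacks.ModelCheckpoint("best_model.keras", save_best_only=True),
]
```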
Below are the evaluation results of the models implemented so far. When validation or test sets were not publicly available, evaluations were performed only on the accessible split.
| Dataset | Task | Model | Box mAP@50 | Box mAP@50-95 | Pose mAP@50 | Pose mAP@50-95 | Eval. Set |
|---|---|---|---|---|---|---|---|
| AP-10K | Multi-species animal pose estimation | YOLO11l-pose | 0.951 / 0.938 | 0.799 / 0.788 | 0.901 / 0.874 | 0.589 / 0.575 | Validation / Test |
| OpenThermalPose2 | Human pose estimation | YOLO11l-pose | 0.995 / 0.995 | 0.979 / 0.967 | 0.991 / 0.987 | 0.94 / 0.934 | Validation / Test |
| OneHand10K | Hand pose estimation | YOLO11s-pose | 0.995 | 0.816 | 0.954 | 0.519 | Test |
| Dataset | Task | Model | | | Eval. Set |
|---|---|---|---|---|---|
| CUB-200-2011 | Animal pose estimation | ConvNeXt-Base U-Net | 0.929 | 0.938 | Test |
| COFW | Face landmark estimation | ConvNeXt-Base U-Net | - | 0.957 | Test |
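The YOLO11-pose metrics in the first table can be reproduced with Ultralytics' built-in validator; a minimal sketch, assuming a fine-tuned checkpoint and dataset config at the placeholder paths below:

```python
from ultralytics import YOLO

model = YOLO("runs/pose/train/weights/best.pt")              # placeholder checkpoint path
metrics = model.val(data="pose_dataset.yaml", split="test")  # or split="val"

print(metrics.box.map50, metrics.box.map)    # box mAP@50 and mAP@50-95
print(metrics.pose.map50, metrics.pose.map)  # pose (keypoint) mAP@50 and mAP@50-95
```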