JersonGB22/PoseEstimation-TensorFlow-PyTorch

Pose Estimation

This repository implements several models for Pose Estimation, a core Computer Vision task that enables machines to infer the position and orientation of humans, animals, or objects in images and videos by locating specific points, commonly referred to as keypoints or landmarks. These keypoints can represent joints, limbs, facial features, or other distinctive parts.

Pose estimation methods generally follow two main approaches: bottom-up and top-down.

  • In the bottom-up approach, the model first detects all keypoints across the entire image using probabilistic maps (heatmaps) that estimate, for each pixel, the likelihood that it corresponds to a given keypoint. Non-maximum suppression then selects the most confident candidates, which are finally grouped into individual instances. This approach is efficient but often less accurate.
  • In the top-down approach, the model first detects bounding boxes for each instance and then predicts the keypoints within them. This method provides higher accuracy and scale invariance but is computationally expensive, especially as the number of instances increases.
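The heatmap-decoding step described above can be sketched as follows. This is a toy example with a synthetic Gaussian heatmap; the map size and peak location are illustrative, not taken from any model in this repository:

```python
import numpy as np

def decode_keypoint(heatmap: np.ndarray) -> tuple[int, int, float]:
    """Return (row, col, confidence) of the most likely keypoint location."""
    idx = np.argmax(heatmap)                      # flat index of the peak
    row, col = np.unravel_index(idx, heatmap.shape)
    return int(row), int(col), float(heatmap[row, col])

# Build a toy 64x64 heatmap with a Gaussian peak at row 20, col 40.
ys, xs = np.mgrid[0:64, 0:64]
heatmap = np.exp(-((ys - 20) ** 2 + (xs - 40) ** 2) / (2 * 2.0 ** 2))

row, col, conf = decode_keypoint(heatmap)
print(row, col, round(conf, 2))  # -> 20 40 1.0
```

A real bottom-up pipeline repeats this per keypoint type (one heatmap channel each) and then groups the detected points into instances.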

Recent models such as YOLO11-pose combine the strengths of both approaches. By avoiding manual grouping and heatmap generation, they retain the efficiency of bottom-up methods, while simultaneously leveraging the precision of top-down pipelines by detecting instances and estimating poses in a single unified process.
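As an illustration of this single-pass pipeline, a minimal inference sketch using the Ultralytics Python API. The nano checkpoint `yolo11n-pose.pt` is used here for brevity and is downloaded automatically on first use; the image path is a placeholder:

```python
def detect_poses(image_path: str):
    """Run YOLO11-pose on one image and return keypoints as an array of
    shape (num_instances, num_keypoints, 2) in pixel coordinates."""
    from ultralytics import YOLO  # imported lazily so this module loads without the package

    model = YOLO("yolo11n-pose.pt")       # public Ultralytics pose checkpoint
    results = model(image_path)           # detection + pose in a single forward pass
    return results[0].keypoints.xy.cpu().numpy()

if __name__ == "__main__":
    print(detect_poses("person.jpg").shape)
```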

Current Applications

Pose estimation has become a cornerstone in multiple domains, including:

  • Healthcare and rehabilitation: motion analysis for physiotherapy, remote patient monitoring, and detection of postural anomalies.
  • Sports and performance: athlete technique evaluation, automated repetition counting, and injury prevention via real-time posture correction.
  • Animal research: behavioral studies, species monitoring in the wild, and welfare assessment in farms or labs.
  • Human-computer interaction (HCI): gesture-based interfaces, augmented reality, and touchless controls.
  • Surveillance and safety: suspicious behavior recognition and fall detection in sensitive environments such as hospitals or care facilities.
  • Entertainment and media: motion capture, animation, and video games.
  • Industry and robotics: human-robot collaboration, ergonomics in assembly lines, and task assistance in manufacturing.

Implemented Models

All projects leverage transfer learning, fine-tuning models pretrained on large-scale datasets, using frameworks such as TensorFlow, PyTorch, and Ultralytics.

  • Basic models were fine-tuned on single-class datasets with one instance per image, following a bottom-up, heatmap-based approach.
  • Advanced models (YOLO11-pose), designed for real-time applications, were trained on multi-class, multi-instance datasets.
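A minimal fine-tuning sketch for the advanced models, using the Ultralytics API. The `coco8-pose.yaml` demo dataset config ships with Ultralytics and stands in here for a real dataset YAML; epochs and image size are illustrative:

```python
def finetune(data_yaml: str = "coco8-pose.yaml", epochs: int = 10):
    """Fine-tune a pretrained YOLO11-pose checkpoint on a custom dataset."""
    from ultralytics import YOLO  # lazy import: the package is optional here

    model = YOLO("yolo11l-pose.pt")  # start from pretrained pose weights (transfer learning)
    model.train(data=data_yaml, epochs=epochs, imgsz=640)  # augmentation is applied automatically
    return model

if __name__ == "__main__":
    finetune()
```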

Training is carried out in Google Colab using TPUs or GPUs, depending on project requirements.

All notebooks incorporate data augmentation to improve generalization, either manually with Albumentations or automatically (e.g., in YOLO11-pose). Additionally, callbacks and learning rate schedulers are used to prevent overfitting and enhance performance.

Below are the evaluation results of the models implemented so far. When validation or test sets were not publicly available, evaluations were performed only on the accessible split.

📊 Ultralytics Models

| Dataset | Task | Model | $\text{mAP}^{\text{box}}_{50}$ | $\text{mAP}^{\text{box}}_{50-95}$ | $\text{mAP}^{\text{pose}}_{50}$ | $\text{mAP}^{\text{pose}}_{50-95}$ | Eval. Set |
|---|---|---|---|---|---|---|---|
| AP-10K | Multi-species animal pose estimation | YOLO11l-pose | 0.951 / 0.938 | 0.799 / 0.788 | 0.901 / 0.874 | 0.589 / 0.575 | Validation / Test |
| OpenThermalPose2 | Human pose estimation | YOLO11l-pose | 0.995 / 0.995 | 0.979 / 0.967 | 0.991 / 0.987 | 0.940 / 0.934 | Validation / Test |
| OneHand10K | Hand pose estimation | YOLO11s-pose | 0.995 | 0.816 | 0.954 | 0.519 | Test |

📊 Basic Models

| Dataset | Task | Model | $\text{OKS}$ | $\text{PCK@0.05}$ | Eval. Set |
|---|---|---|---|---|---|
| CUB-200-2011 | Animal pose estimation | ConvNeXt-Base U-Net | 0.929 | 0.938 | Test |
| COFW | Face landmark estimation | ConvNeXt-Base U-Net | - | 0.957 | Test |
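For reference, the PCK@0.05 metric in the table can be computed as below: a keypoint counts as correct when its distance to the ground truth is under 5% of a normalization length. The generic `norm` used here (e.g. a bounding-box or image diagonal) is an assumption; the exact reference length is defined in each notebook:

```python
import numpy as np

def pck(pred: np.ndarray, gt: np.ndarray, norm: float, alpha: float = 0.05) -> float:
    """PCK@alpha: fraction of keypoints whose Euclidean distance to the
    ground truth is below alpha * norm."""
    dists = np.linalg.norm(pred - gt, axis=-1)   # per-keypoint pixel error
    return float(np.mean(dists < alpha * norm))

pred = np.array([[10.0, 10.0], [50.0, 50.0]])
gt   = np.array([[12.0, 10.0], [60.0, 50.0]])    # errors: 2 px and 10 px
print(pck(pred, gt, norm=100.0))                 # threshold 5 px -> 0.5
```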

Visual Results on Multiple Datasets

AP-10K


OpenThermalPose2

HumanPoseEstimation_YOLO11l_OpenThermalPose2.mp4

OneHand10K

HandPoseEstimation_YOLO11s_OneHand10K_1.mp4

CUB-200-2011


COFW

More results can be found in the respective notebooks.

Technological Stack

Python TensorFlow PyTorch Ultralytics

OpenCV Pandas Plotly

Contact

Gmail LinkedIn GitHub