Human Pose Estimation Using LeYOLO-Nano Architecture

Authors

  • Vicky Lahimade, Universitas Sam Ratulangi
  • Imanuel Kutika, Universitas Sam Ratulangi
  • Tomi Todingan, Universitas Sam Ratulangi
  • Vecky Poekoel, Universitas Sam Ratulangi
  • Muhamad Dwisnanto Putro, Universitas Sam Ratulangi

DOI:

https://doi.org/10.35793/jtek.v14i1.61391

Keywords:

Human pose estimation, keypoint detection, LeYOLO, real-time vision

Abstract

Human pose estimation is essential in many practical applications, particularly those that demand fast processing and efficient resource usage, such as surveillance, human-computer interaction, and robotic systems. The task is to detect and analyze human body keypoints in images or videos, a critical step in understanding an individual's movements and behaviors. This article evaluates the performance of the original LeYOLO-Nano architecture, a lightweight variant of the YOLO architecture, on human body keypoint detection for pose estimation. On the MS COCO 2017 dataset, which covers a wide range of real-world conditions, the model achieved a mAP50 of 0.69 and a mAP50:95 of 0.362, demonstrating its ability to detect human poses with adequate accuracy. Moreover, the model processes 20.78 frames per second on a standard CPU, highlighting its suitability for real-time use on edge devices with restricted computing power. LeYOLO-Nano thus enables efficient human pose estimation on low-power devices, with potential for further optimization of speed and accuracy in real-world applications.
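For readers unfamiliar with the two accuracy figures, the following is a minimal sketch of object keypoint similarity (OKS), the score that the COCO keypoint benchmark thresholds to decide whether a predicted pose matches a ground-truth pose. It assumes the reported mAP50 and mAP50:95 follow that COCO convention, which the abstract does not state explicitly. The function names `oks` and `matches_at_thresholds` are hypothetical and are not taken from the article or from any library; the per-keypoint constants are the ones published with the COCO evaluation.

```python
import numpy as np

# Per-keypoint falloff constants published with the COCO keypoint evaluation
# (17 body keypoints, ordered from nose to ankles).
COCO_SIGMAS = np.array([
    0.26, 0.25, 0.25, 0.35, 0.35, 0.79, 0.79, 0.72, 0.72,
    0.62, 0.62, 1.07, 1.07, 0.87, 0.87, 0.89, 0.89,
]) / 10.0


def oks(pred_xy, gt_xy, visibility, area, sigmas=COCO_SIGMAS):
    """Object keypoint similarity between one predicted and one ground-truth pose.

    pred_xy, gt_xy: arrays of shape (17, 2) with pixel coordinates.
    visibility:     array of shape (17,); values > 0 mark labeled keypoints.
    area:           ground-truth object area in pixels, used as the scale term.
    """
    pred_xy = np.asarray(pred_xy, dtype=float)
    gt_xy = np.asarray(gt_xy, dtype=float)
    labeled = np.asarray(visibility) > 0
    if not labeled.any():
        return 0.0  # hypothetical choice for this sketch: no labels, no similarity
    d2 = np.sum((pred_xy - gt_xy) ** 2, axis=1)   # squared pixel distances
    variances = (2.0 * sigmas) ** 2               # k_i^2 with k_i = 2 * sigma_i
    e = d2 / (2.0 * variances * (area + np.spacing(1)))
    return float(np.mean(np.exp(-e[labeled])))    # average over labeled keypoints


def matches_at_thresholds(score, thresholds=None):
    """Boolean mask: does an OKS score count as a match at each threshold?"""
    if thresholds is None:
        # mAP50:95 averages AP over OKS thresholds 0.50, 0.55, ..., 0.95;
        # mAP50 uses only the first of them.
        thresholds = np.linspace(0.50, 0.95, 10)
    return score >= thresholds
```

Under this reading, mAP50 counts a prediction as correct once its OKS reaches 0.50, while mAP50:95 also averages over much stricter thresholds. A gap like the one between 0.69 and 0.362 would therefore suggest that many poses are localized roughly but not precisely.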




Published

2025-06-18