This article was published as a part of the Data Science Blogathon.
In this article, we will build a hand landmarks detection model using MediaPipe as the base library and OpenCV (cv2) for the other computer vision preprocessing. There are many use cases in the market for this problem statement, whether it's business-related virtual reality or real-time experiences in the gaming sector.
Let’s build our hand detection model
Here we will import all the libraries that will be required in the whole pipeline.
import cv2
import numpy as np
import mediapipe as mp
import matplotlib.pyplot as plt
Whenever we talk about detection, whether it is of an object, a person, an animal, or, as in our case, hands, the very first step is to initialize the model with valid parameters. No matter which detection technique we follow, be it MediaPipe or YOLO, initializing the model is essential. Following that principle, we will go through the steps below:
# First step is to initialize the Hands class and store it in a variable
mp_hands = mp.solutions.hands

# Second step is to set up the hands function which will hold the landmarks points
hands = mp_hands.Hands(static_image_mode=True, max_num_hands=2, min_detection_confidence=0.3)

# Last step is to set up the drawing function of hands landmarks on the image
mp_drawing = mp.solutions.drawing_utils
Code-breakdown:

- mp.solutions.hands: here we initialize the Hands class and store it in a variable.
- mp_hands.Hands(): this sets up the hands function that will hold the landmark points. Now that we understand the structure of the hands model initialization, let's dive deeper into the arguments used in the Hands() function.
- static_image_mode: this argument takes a boolean value, i.e. it can be either True or False. The default is False, which is meant for video streaming; in that mode the model gives lower processing latency because it keeps focusing on a particular hand and localizes the same hand until it loses track of it, which is beneficial when we have to detect hands in a live stream or video. According to our requirement, however, we have to detect the landmarks on an image, hence we set the value to True (a sketch of the video case follows this list).
- max_num_hands: this argument indicates the maximum number of hands the model will detect in one instance. By default, the value is 2, which makes sense, as we will usually want a pair of hands to be detected, though we can change that for sure.
- min_detection_confidence: this argument gives us the flexibility to decide how strict we want our detection model to be by setting a threshold on the confidence level. The valid range of minimum detection confidence is [0.0, 1.0] and by default it is 0.5, which means that if the confidence level drops below 50%, the hands will not be detected at all in the output image.
- mp.solutions.drawing_utils: this is responsible for drawing all the hand landmark points detected by our Hands function on the output image.
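Since static_image_mode=False is meant for video streams, here is a minimal sketch of how the same model could be initialized for a webcam feed. This is not part of the original pipeline; the cv2.VideoCapture loop, the min_tracking_confidence value, and the Esc-to-quit key are assumptions added purely for illustration.

# A minimal sketch (not part of the original pipeline): initializing Hands for a
# live webcam stream, where static_image_mode=False lets the model track hands between frames.
import cv2
import mediapipe as mp

mp_hands = mp.solutions.hands
hands_video = mp_hands.Hands(static_image_mode=False,
                             max_num_hands=2,
                             min_detection_confidence=0.5,
                             min_tracking_confidence=0.5)
mp_drawing = mp.solutions.drawing_utils

cap = cv2.VideoCapture(0)  # assumed default webcam
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    # MediaPipe expects RGB input while OpenCV reads frames in BGR
    results = hands_video.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if results.multi_hand_landmarks:
        for hand_landmarks in results.multi_hand_landmarks:
            mp_drawing.draw_landmarks(frame, hand_landmarks, mp_hands.HAND_CONNECTIONS)
    cv2.imshow('Hands', frame)
    if cv2.waitKey(1) & 0xFF == 27:  # press Esc to quit
        break

hands_video.close()
cap.release()
cv2.destroyAllWindows()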
Here we will first use the cv2.imread() function to read the image on which hand detection is to be performed, and the matplotlib library to display that input image.
# Reading the sample image on which we will perform the detection
sample_img = cv2.imread('media/sample.jpg')

# Here we are specifying the size of the figure i.e. 10 - height; 10 - width.
plt.figure(figsize = [10, 10])

# Here we will display the sample image as the output.
plt.title("Sample Image");plt.axis('off');plt.imshow(sample_img[:,:,::-1]);plt.show()
Output:
So, as we have initialized our hand detection model, our next step is to process the hand landmarks detection on the input image and draw all the 21 landmark points on that image with the above-initialized model. For doing that we will go through the following steps.
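Before running the detection, it can help to see exactly which 21 points MediaPipe refers to. Below is a quick sketch (assuming the mp_hands alias defined earlier) that prints the index and name of every landmark in the HandLandmark enum:

# Print the index and name of each of the 21 hand landmarks MediaPipe detects,
# e.g. 0 WRIST, 1 THUMB_CMC, ..., 20 PINKY_TIP
for landmark in mp_hands.HandLandmark:
    print(landmark.value, landmark.name)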
results = hands.process(cv2.cvtColor(sample_img, cv2.COLOR_BGR2RGB))

if results.multi_hand_landmarks:
    for hand_no, hand_landmarks in enumerate(results.multi_hand_landmarks):
        print(f'HAND NUMBER: {hand_no+1}')
        print('-----------------------')
        for i in range(2):
            print(f'{mp_hands.HandLandmark(i).name}:')
            print(f'{hand_landmarks.landmark[mp_hands.HandLandmark(i).value]}')
Output:
Code-breakdown:
- process: we use the process function from the Mediapipe library to store the hand landmarks detection results in the results variable; before passing the image we convert it from the BGR format to the RGB format, which is what the model expects.

From the above processing, we noticed that all the detected landmarks are normalized to a common scale. For the end user those scaled points are not directly meaningful, so we will bring the landmarks back to the original image coordinates.
image_height, image_width, _ = sample_img.shape

if results.multi_hand_landmarks:
    for hand_no, hand_landmarks in enumerate(results.multi_hand_landmarks):
        print(f'HAND NUMBER: {hand_no+1}')
        print('-----------------------')
        for i in range(2):
            print(f'{mp_hands.HandLandmark(i).name}:')
            print(f'x: {hand_landmarks.landmark[mp_hands.HandLandmark(i).value].x * image_width}')
            print(f'y: {hand_landmarks.landmark[mp_hands.HandLandmark(i).value].y * image_height}')
            print(f'z: {hand_landmarks.landmark[mp_hands.HandLandmark(i).value].z * image_width}\n')
Output:
Code-breakdown:
There is only one extra step that we need to perform here: we obtain the original width and height of the sample image we defined. All other steps are the same as before; the only difference in the output is that the landmark points are no longer normalized but scaled to the image's actual pixel dimensions.
As we have got the hand landmarks from the above processing, it's now time to execute our final step, which is to draw the points on the image so that we can visually see how our hand landmarks detection model is performing.
img_copy = sample_img.copy()

if results.multi_hand_landmarks:
    for hand_no, hand_landmarks in enumerate(results.multi_hand_landmarks):
        mp_drawing.draw_landmarks(image = img_copy, landmark_list = hand_landmarks,
                                  connections = mp_hands.HAND_CONNECTIONS)

    fig = plt.figure(figsize = [10, 10])
    plt.title("Resultant Image");plt.axis('off');plt.imshow(img_copy[:,:,::-1]);plt.show()
Output:
Code-breakdown:
- mp_drawing.draw_landmarks: using this function, we draw the detected landmarks and their connections on the copy of the image.
- imshow: we display the result with this function after converting it from the BGR format into the RGB format, because for the end user the RGB format makes more sense.
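If you want to change how the landmarks are rendered, drawing_utils also provides DrawingSpec objects that can be passed to draw_landmarks. A minimal sketch follows; the colour and thickness values are arbitrary choices for illustration, and hand_landmarks is reused from the loop above:

# A minimal sketch: customizing landmark and connection styles with DrawingSpec
# (the colour/thickness values here are arbitrary illustration choices)
landmark_style = mp_drawing.DrawingSpec(color=(0, 255, 0), thickness=2, circle_radius=2)
connection_style = mp_drawing.DrawingSpec(color=(255, 0, 0), thickness=2)

mp_drawing.draw_landmarks(image=img_copy,
                          landmark_list=hand_landmarks,
                          connections=mp_hands.HAND_CONNECTIONS,
                          landmark_drawing_spec=landmark_style,
                          connection_drawing_spec=connection_style)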
In the complete pipeline, we first initialize the model, then we read the image to have a look at our input. During preprocessing we found that the landmark points are normalized to a common scale, which is not meaningful for the end user, so we reverted them to their original pixel coordinates, and at the last step we drew the landmark points on the image.
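To tie everything together, here is a minimal sketch that wraps the steps above into a single helper; detect_hand_landmarks is a hypothetical name introduced only for this illustration and is not part of the original code.

# A minimal sketch wrapping the full pipeline: read -> detect -> draw -> display.
# detect_hand_landmarks is a hypothetical helper name used only for illustration.
import cv2
import mediapipe as mp
import matplotlib.pyplot as plt

mp_hands = mp.solutions.hands
mp_drawing = mp.solutions.drawing_utils

def detect_hand_landmarks(image_path):
    # Initialize the model for static images, as in the article
    hands = mp_hands.Hands(static_image_mode=True, max_num_hands=2,
                           min_detection_confidence=0.3)
    image = cv2.imread(image_path)
    if image is None:
        raise FileNotFoundError(image_path)
    # Run detection on the RGB version of the image
    results = hands.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
    output = image.copy()
    if results.multi_hand_landmarks:
        for hand_landmarks in results.multi_hand_landmarks:
            mp_drawing.draw_landmarks(output, hand_landmarks, mp_hands.HAND_CONNECTIONS)
    hands.close()
    return output

# Usage, assuming the same sample image path used earlier in the article
annotated = detect_hand_landmarks('media/sample.jpg')
plt.figure(figsize=[10, 10])
plt.title("Resultant Image"); plt.axis('off'); plt.imshow(annotated[:, :, ::-1]); plt.show()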
Here’s the repo link to this article. I hope you liked my article on hand landmarks detection using MediaPipe. If you have any opinions or questions, comment below.
Read the AV Blog for more articles on various predictions using Machine Learning.
Greetings to everyone! I'm currently working at TCS, and previously I worked as a Data Science Analyst at Zorba Consulting India. Along with full-time work, I have an immense interest in the same field, i.e. Data Science, along with its other subsets of Artificial Intelligence such as Computer Vision, Machine Learning, and Deep Learning; feel free to collaborate with me on any project in the domains mentioned above (LinkedIn).
The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.