Facial Emotion Detection Using CNN

Gunjan Agarwal Last Updated : 16 Oct, 2024
9 min read

Introduction

Technology is constantly advancing and becoming more integrated into daily human life, so understanding human feelings is as important now as ever. Facial emotion detection and facial expression recognition technologies are at the forefront of this endeavor, interpreting feelings through facial gestures. Facial expression detection does not have to work alone, either: by integrating emotion recognition from speech with emotion sensing facial recognition, we can develop a combined system that reflects human emotions as we actually experience them.

This approach builds on the complementary relationship between vocal intonation and facial expression to uncover a richer picture of emotional experience. From applications in mental health to enriching human interfaces with technology, the combination of these technologies could transform emotion recognition and intervention.

In our previous article, we explained emotion detection in text, which is useful for a number of use cases; you can read that article here. In this article, I show you how to build a model using TensorFlow that can identify your emotions from a picture or a live webcam feed.


Learning Outcomes

  • Understand the concepts and uses of facial emotion recognition in artificial intelligence systems.
  • Learn the methods involved in detecting and analyzing facial expressions, together with their role in human-computer interaction.
  • Discuss the processes used in facial landmark recognition for reliable emotion recognition.
  • Find out how emotion sensing facial recognition improves user experience across different interfaces.
  • Learn how speech-based emotion recognition can extend the facial emotion recognition approach.
  • Critically discuss the applicability of emotion recognition from facial expressions in varied areas, including medicine and defense.
  • Understand how to apply preprocessing techniques using OpenCV for live feed testing.

This article was published as a part of the Data Science Blogathon.

Getting Started with Facial Emotion Detection

Let’s dive straight into the implementation part of Facial Emotion Detection.

Getting Data

We will be using the FER-2013 dataset, which is publicly available on Kaggle. It has 48*48 pixel grayscale images of faces along with their emotion labels.

This dataset contains 7 emotions: (0=Angry, 1=Disgust, 2=Fear, 3=Happy, 4=Sad, 5=Surprise, 6=Neutral)
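For later reference, the same mapping can be written as a small Python dictionary. This is a hypothetical convenience helper, not part of the original code:

# hypothetical label map matching the FER-2013 encoding above
emotion_labels = {0: 'Angry', 1: 'Disgust', 2: 'Fear', 3: 'Happy',
                  4: 'Sad', 5: 'Surprise', 6: 'Neutral'}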

Start by importing pandas and some essential libraries and then loading the dataset.

Python Code:

import matplotlib.pyplot as plt
import numpy as np
import scipy
import pandas as pd
df = pd.read_csv('fer2013.csv')
print(df.head())

This dataset contains 3 columns: emotion, pixels, and Usage. The emotion column contains integer-encoded emotions, the pixels column contains the pixel values as a space-separated string, and the Usage column tells whether a row is meant for training or testing.
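If you want to verify this yourself, a quick sketch like the one below (assuming the DataFrame df loaded above) prints the usage split and the class distribution:

# sanity check: how the rows are split and how many samples each emotion has
print(df['Usage'].value_counts())    # Training / PublicTest / PrivateTest
print(df['emotion'].value_counts())  # samples per emotion class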

Preparing Data

As you can see, the data is not yet in the right format, so we need to pre-process it. Here X_train and X_test will contain the pixels, while y_train and y_test will contain the emotions.

X_train = []
y_train = []
X_test = []
y_test = []
for index, row in df.iterrows():
    k = row['pixels'].split(" ")
    if row['Usage'] == 'Training':
        X_train.append(np.array(k))
        y_train.append(row['emotion'])
    elif row['Usage'] == 'PublicTest':
        X_test.append(np.array(k))
        y_test.append(row['emotion'])

At this stage X_train and X_test contain the pixel values as strings; converting them into numbers is easy, we just need to typecast.

X_train = np.array(X_train, dtype = 'uint8')
y_train = np.array(y_train, dtype = 'uint8')
X_test = np.array(X_test, dtype = 'uint8')
y_test = np.array(y_test, dtype = 'uint8')

y_train and y_test contain 1D integer-encoded labels; we need to convert them into categorical (one-hot) data for efficient training.

import keras
from keras.utils import to_categorical
y_train= to_categorical(y_train, num_classes=7)
y_test = to_categorical(y_test, num_classes=7)

num_classes = 7 shows that we have 7 classes to classify.
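As a quick illustration of what to_categorical does, an integer label such as 3 (Happy) becomes a one-hot vector of length 7 (a small sketch using the arrays prepared above):

# e.g. label 3 (Happy) -> [0. 0. 0. 1. 0. 0. 0.]
print(to_categorical(3, num_classes=7))
print(y_train.shape)  # (number_of_training_samples, 7)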

Reshaping Data

You need to convert the data into a 4D tensor of shape (row_num, width, height, channel) for training purposes.

X_train = X_train.reshape(X_train.shape[0], 48, 48, 1)
X_test = X_test.reshape(X_test.shape[0], 48, 48, 1)

Here the 1 tells us that the training data is in grayscale form. At this stage, we have successfully preprocessed our data into X_train, X_test, y_train, and y_test.
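To sanity-check the preprocessing, you can display one of the reshaped images with matplotlib (a minimal sketch, assuming the arrays prepared above):

# display the first training image along with its one-hot label
plt.imshow(X_train[0].reshape(48, 48), cmap='gray')
plt.title('Label (one-hot): {}'.format(y_train[0]))
plt.show()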

Image Augmentation for Facial Emotion Detection

Image augmentation is applied to improve the model's ability to generalize. It is usually better to apply some data augmentation before feeding the data to the model, which can be done with ImageDataGenerator from Keras.

from keras.preprocessing.image import ImageDataGenerator 
datagen = ImageDataGenerator( 
    rescale=1./255,
    rotation_range = 10,
    horizontal_flip = True,
    width_shift_range=0.1,
    height_shift_range=0.1,
    fill_mode = 'nearest')
testgen = ImageDataGenerator(rescale=1./255)
datagen.fit(X_train)
batch_size = 64
  • rescale: It normalizes the pixel values by dividing them by 255.
  • horizontal_flip: It randomly flips images horizontally.
  • fill_mode: It fills in pixels created at the borders after a shift or rotation (here using the nearest pixel values).
  • rotation_range: It randomly rotates the image within the given range (here up to 10 degrees).

On testing data, we will only apply rescaling (normalization).
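If you want to see what the augmentation actually produces, a short sketch like this (assuming X_train, y_train, and datagen from above) displays a few augmented samples:

# preview a few augmented training images
preview = datagen.flow(X_train, y_train, batch_size=9)
images, labels = next(preview)
for i in range(9):
    plt.subplot(3, 3, i + 1)
    plt.imshow(images[i].reshape(48, 48), cmap='gray')
    plt.axis('off')
plt.show()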

Fitting the Generator to Our Data

We will use a batch_size of 64, and after fitting our data to the image generator, data will be generated in batches of 64. Using a data generator is a memory-efficient way to train on a large amount of data.

train_flow = datagen.flow(X_train, y_train, batch_size=batch_size) 
test_flow = testgen.flow(X_test, y_test, batch_size=batch_size)

train_flow contains our X_train and y_train while test_flow contains our X_test and y_test.
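To confirm the generators are wired up correctly, you can pull a single batch and check its shapes (a minimal sketch, assuming the generators defined above):

# fetch one batch: expected shapes are (64, 48, 48, 1) for images and (64, 7) for labels
sample_images, sample_labels = next(train_flow)
print(sample_images.shape, sample_labels.shape)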

Building Facial Emotion Detection Model using CNN

We design the CNN model for emotion detection using the functional API. We create blocks using Conv2D, Batch Normalization, MaxPooling2D, Dropout, and Flatten layers, stack them together, and use a Dense layer at the end for the output. You can read more on how to design CNN models.

from keras.utils import plot_model
from keras.models import Model
from keras.layers import Input, Dense, Flatten, Dropout, BatchNormalization
from keras.layers.convolutional import Conv2D
from keras.layers.pooling import MaxPooling2D
from keras.layers.merge import concatenate
from keras.optimizers import Adam, SGD
from keras.regularizers import l1, l2
from matplotlib import pyplot as plt
from sklearn.metrics import confusion_matrix

FER_Model takes the input shape and returns a model ready for training. Now let's define the architecture of the model.

def FER_Model(input_shape=(48,48,1)):
    # first input model
    visible = Input(shape=input_shape, name='input')
    num_classes = 7
    #the 1-st block
    conv1_1 = Conv2D(64, kernel_size=3, activation='relu', padding='same', name = 'conv1_1')(visible)
    conv1_1 = BatchNormalization()(conv1_1)
    conv1_2 = Conv2D(64, kernel_size=3, activation='relu', padding='same', name = 'conv1_2')(conv1_1)
    conv1_2 = BatchNormalization()(conv1_2)
    pool1_1 = MaxPooling2D(pool_size=(2,2), name = 'pool1_1')(conv1_2)
    drop1_1 = Dropout(0.3, name = 'drop1_1')(pool1_1)

    #the 2-nd block
    conv2_1 = Conv2D(128, kernel_size=3, activation='relu', padding='same', name = 'conv2_1')(drop1_1)
    conv2_1 = BatchNormalization()(conv2_1)
    conv2_2 = Conv2D(128, kernel_size=3, activation='relu', padding='same', name = 'conv2_2')(conv2_1)
    conv2_2 = BatchNormalization()(conv2_2)
    conv2_3 = Conv2D(128, kernel_size=3, activation='relu', padding='same', name = 'conv2_3')(conv2_2)
    conv2_3 = BatchNormalization()(conv2_3)
    pool2_1 = MaxPooling2D(pool_size=(2,2), name = 'pool2_1')(conv2_3)
    drop2_1 = Dropout(0.3, name = 'drop2_1')(pool2_1)

    #the 3-rd block
    conv3_1 = Conv2D(256, kernel_size=3, activation='relu', padding='same', name = 'conv3_1')(drop2_1)
    conv3_1 = BatchNormalization()(conv3_1)
    conv3_2 = Conv2D(256, kernel_size=3, activation='relu', padding='same', name = 'conv3_2')(conv3_1)
    conv3_2 = BatchNormalization()(conv3_2)
    conv3_3 = Conv2D(256, kernel_size=3, activation='relu', padding='same', name = 'conv3_3')(conv3_2)
    conv3_3 = BatchNormalization()(conv3_3)
    conv3_4 = Conv2D(256, kernel_size=3, activation='relu', padding='same', name = 'conv3_4')(conv3_3)
    conv3_4 = BatchNormalization()(conv3_4)
    pool3_1 = MaxPooling2D(pool_size=(2,2), name = 'pool3_1')(conv3_4)
    drop3_1 = Dropout(0.3, name = 'drop3_1')(pool3_1)

    #the 4-th block
    conv4_1 = Conv2D(256, kernel_size=3, activation='relu', padding='same', name = 'conv4_1')(drop3_1)
    conv4_1 = BatchNormalization()(conv4_1)
    conv4_2 = Conv2D(256, kernel_size=3, activation='relu', padding='same', name = 'conv4_2')(conv4_1)
    conv4_2 = BatchNormalization()(conv4_2)
    conv4_3 = Conv2D(256, kernel_size=3, activation='relu', padding='same', name = 'conv4_3')(conv4_2)
    conv4_3 = BatchNormalization()(conv4_3)
    conv4_4 = Conv2D(256, kernel_size=3, activation='relu', padding='same', name = 'conv4_4')(conv4_3)
    conv4_4 = BatchNormalization()(conv4_4)
    pool4_1 = MaxPooling2D(pool_size=(2,2), name = 'pool4_1')(conv4_4)
    drop4_1 = Dropout(0.3, name = 'drop4_1')(pool4_1)
    
    #the 5-th block
    conv5_1 = Conv2D(512, kernel_size=3, activation='relu', padding='same', name = 'conv5_1')(drop4_1)
    conv5_1 = BatchNormalization()(conv5_1)
    conv5_2 = Conv2D(512, kernel_size=3, activation='relu', padding='same', name = 'conv5_2')(conv5_1)
    conv5_2 = BatchNormalization()(conv5_2)
    conv5_3 = Conv2D(512, kernel_size=3, activation='relu', padding='same', name = 'conv5_3')(conv5_2)
    conv5_3 = BatchNormalization()(conv5_3)
    conv5_4 = Conv2D(512, kernel_size=3, activation='relu', padding='same', name = 'conv5_4')(conv5_3)
    conv5_4 = BatchNormalization()(conv5_4)
    pool5_1 = MaxPooling2D(pool_size=(2,2), name = 'pool5_1')(conv5_4)
    drop5_1 = Dropout(0.3, name = 'drop5_1')(pool5_1)

    #Flatten and output
    flatten = Flatten(name = 'flatten')(drop5_1)
    output = Dense(num_classes, activation='softmax', name = 'output')(flatten)

    # create model
    model = Model(inputs=visible, outputs=output)
    # summary layers
    print(model.summary())
    
    return model

Compiling the Facial Emotion Detection Model

We compile the model using the Adam optimizer with lr=0.0001; the decay argument gradually lowers the learning rate as training progresses.

model = FER_Model()
opt = Adam(lr=0.0001, decay=1e-6)
model.compile(loss='categorical_crossentropy', optimizer=opt, metrics=['accuracy'])
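If you instead want the learning rate to drop only when the model stops improving, one option (a hedged sketch, not part of the original pipeline) is Keras's ReduceLROnPlateau callback, passed later to the training call via callbacks=[reduce_lr]:

from keras.callbacks import ReduceLROnPlateau

# halve the learning rate if validation loss has not improved for 5 epochs
reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.5,
                              patience=5, min_lr=1e-6, verbose=1)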

Training the Facial Emotion Detection Model

To train the model you need to write the following line of code.

num_epochs = 100  
history = model.fit_generator(train_flow, 
                    steps_per_epoch=len(X_train) / batch_size, 
                    epochs=num_epochs,  
                    verbose=1,  
                    validation_data=test_flow,validation_steps=len(X_test) / batch_size)
  • steps_per_epoch = TotalTrainingSamples / TrainingBatchSize
  • validation_steps = TotalvalidationSamples / ValidationBatchSize

Training 100 epochs takes at least 20 minutes, depending on your hardware.
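Once training finishes, it is worth plotting the accuracy curves stored in the history object to spot overfitting. A small sketch, assuming the training run above; depending on your Keras version the keys may be 'acc' and 'val_acc' instead of 'accuracy' and 'val_accuracy':

# plot training vs. validation accuracy over the epochs
plt.plot(history.history['accuracy'], label='train accuracy')
plt.plot(history.history['val_accuracy'], label='validation accuracy')
plt.xlabel('epoch')
plt.ylabel('accuracy')
plt.legend()
plt.show()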


Saving the Model

We save the model's architecture into a JSON file and the model's weights into an .h5 file.

model_json = model.to_json()
with open("model.json", "w") as json_file:
    json_file.write(model_json)
model.save_weights("model.h5")
print("Saved model to disk")

Download the saved model and weights into a directory.

Testing the Model using Webcam Feed

In this part, we will test our model in real-time using face detection.

Loading the Saved Model

Let’s start by loading the trained model architecture and weights so that it can be used further to make predictions.

from tensorflow.keras.models import model_from_json
model = model_from_json(open("model.json", "r").read())
model.load_weights('model.h5')

Loading Haar Cascade for Face Detection

We are using a Haar cascade to detect the position of faces, and after getting the position we will crop the faces.

haarcascade_frontalface_default can be downloaded using the link.

import cv2
face_haar_cascade = cv2.CascadeClassifier('haarcascade_frontalface_default.xml')
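If you do not want to download the XML file manually, recent opencv-python builds ship the cascades with the package, so the same file can be loaded from cv2.data.haarcascades (a sketch, assuming opencv-python is installed):

# load the bundled frontal-face cascade instead of a local copy
face_haar_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')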

Read Frames and Apply Preprocessing using OpenCV

Use OpenCV to read frames from the webcam and apply the image preprocessing steps.

from tensorflow.keras.preprocessing.image import img_to_array

cap = cv2.VideoCapture(0)
while cap.isOpened():
    res, frame = cap.read()
    height, width, channel = frame.shape
    gray_image = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_haar_cascade.detectMultiScale(gray_image)
    try:
        for (x, y, w, h) in faces:
            cv2.rectangle(frame, pt1=(x, y), pt2=(x + w, y + h), color=(255, 0, 0), thickness=2)
            roi_gray = gray_image[y - 5:y + h + 5, x - 5:x + w + 5]
            roi_gray = cv2.resize(roi_gray, (48, 48))
            image_pixels = img_to_array(roi_gray)
            image_pixels = np.expand_dims(image_pixels, axis=0)
            image_pixels /= 255
            predictions = model.predict(image_pixels)
            max_index = np.argmax(predictions[0])
            emotion_detection = ('angry', 'disgust', 'fear', 'happy', 'sad', 'surprise', 'neutral')
            emotion_prediction = emotion_detection[max_index]
    except:
        pass
    cv2.imshow('frame', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()
  • Here emotion_prediction holds the label of the detected emotion.
  • Test images are normalized by dividing them by 255.
  • np.expand_dims converts a 3D matrix into a 4D tensor.
  • (x, y, w, h) are the coordinates of the faces in the input frame.
  • The Haar cascade only works on grayscale images.
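The same pipeline also works on a single photo instead of a webcam feed, as mentioned in the introduction. A minimal sketch, assuming a hypothetical image file 'test.jpg' and the model and cascade loaded above:

from tensorflow.keras.preprocessing.image import img_to_array

emotions = ('angry', 'disgust', 'fear', 'happy', 'sad', 'surprise', 'neutral')
img = cv2.imread('test.jpg')  # hypothetical input image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
for (x, y, w, h) in face_haar_cascade.detectMultiScale(gray):
    roi = cv2.resize(gray[y:y + h, x:x + w], (48, 48))
    pixels = np.expand_dims(img_to_array(roi), axis=0) / 255.0
    print(emotions[np.argmax(model.predict(pixels)[0])])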

Adding Overlay

Adding an overlay on the output frame and displaying the prediction with confidence gives a better look.

cap = cv2.VideoCapture(0)
while cap.isOpened():
    res, frame = cap.read()
    height, width, channel = frame.shape

    # Creating an overlay window to write the prediction and confidence
    sub_img = frame[0:int(height/6), 0:int(width)]
    black_rect = np.ones(sub_img.shape, dtype=np.uint8) * 0
    res = cv2.addWeighted(sub_img, 0.77, black_rect, 0.23, 0)
    FONT = cv2.FONT_HERSHEY_SIMPLEX
    FONT_SCALE = 0.8
    FONT_THICKNESS = 2
    lable_color = (10, 10, 255)
    lable = "Emotion Detection made by Abhishek"
    lable_dimension = cv2.getTextSize(lable, FONT, FONT_SCALE, FONT_THICKNESS)[0]
    textX = int((res.shape[1] - lable_dimension[0]) / 2)
    textY = int((res.shape[0] + lable_dimension[1]) / 2)
    cv2.putText(res, lable, (textX, textY), FONT, FONT_SCALE, (0, 0, 0), FONT_THICKNESS)

    # prediction part
    gray_image = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_haar_cascade.detectMultiScale(gray_image)
    try:
        for (x, y, w, h) in faces:
            cv2.rectangle(frame, pt1=(x, y), pt2=(x + w, y + h), color=(255, 0, 0), thickness=2)
            roi_gray = gray_image[y - 5:y + h + 5, x - 5:x + w + 5]
            roi_gray = cv2.resize(roi_gray, (48, 48))
            image_pixels = img_to_array(roi_gray)
            image_pixels = np.expand_dims(image_pixels, axis=0)
            image_pixels /= 255
            predictions = model.predict(image_pixels)
            max_index = np.argmax(predictions[0])
            emotion_detection = ('angry', 'disgust', 'fear', 'happy', 'sad', 'surprise', 'neutral')
            emotion_prediction = emotion_detection[max_index]
            cv2.putText(res, "Sentiment: {}".format(emotion_prediction), (0, textY + 22 + 5), FONT, 0.7, lable_color, 2)
            lable_violation = 'Confidence: {}'.format(str(np.round(np.max(predictions[0]) * 100, 1)) + "%")
            violation_text_dimension = cv2.getTextSize(lable_violation, FONT, FONT_SCALE, FONT_THICKNESS)[0]
            violation_x_axis = int(res.shape[1] - violation_text_dimension[0])
            cv2.putText(res, lable_violation, (violation_x_axis, textY + 22 + 5), FONT, 0.7, lable_color, 2)
    except:
        pass
    frame[0:int(height/6), 0:int(width)] = res
    cv2.imshow('frame', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()

Now run it!

Conclusion

Facial emotion recognition and detection is a powerful tool that leverages advanced deep learning techniques to identify emotions from facial expressions. With applications in various fields, such as real-time emotion sensing facial recognition and emotion recognition using speech, it plays a pivotal role in improving human-computer interactions. GitHub repositories make it easy for developers to access and experiment with face emotion recognition models, while ongoing advancements, such as EEG emotion detection, continue to push the boundaries of emotion sensing technologies. Combining these approaches opens the door to more accurate, multi-modal emotion recognition systems for a range of applications.

Download source codes from here.

Key Takeaways

  • Facial emotion detection uses computer vision to identify human emotions from images.
  • The FER-2013 dataset is a publicly available dataset of grayscale face images for emotion classification.
  • Data preprocessing, reshaping, and normalizing the data is crucial before training the model.
  • Image augmentation, with techniques like rotation and horizontal flips, enhances model performance.
  • The CNN model architecture employs Conv2D, MaxPooling, and Dropout layers for emotion classification.
  • TensorFlow is used to train and test the CNN model on facial emotion data.
  • Real-time detection can be implemented with webcam integration for live emotion detection.

Frequently Asked Questions

Q1. What is facial emotion detection?

A. Facial emotion detection is the process of using computer vision to analyze facial features and recognize emotions from images or videos.

Q2. How does facial expression recognition differ from facial emotion detection?

A. Facial expression recognition identifies specific facial movements (like a smile or frown), while facial emotion detection interprets these expressions to determine emotions.

Q3. What is emotion sensing facial recognition?

A. Emotion sensing facial recognition combines facial recognition technology with emotion detection, allowing systems to identify a person and their emotional state simultaneously.

Q4. Can emotion recognition from speech be combined with facial emotion recognition?

A. Yes, integrating emotion recognition from speech with facial emotion recognition enhances the accuracy of determining someone’s emotional state by analyzing both voice and facial cues.

Q5. How does facial emotion recognition and detection work using deep learning?

A. Facial emotion recognition and detection use Convolutional Neural Networks (CNNs) to analyze facial features and classify emotions based on trained data, such as the FER-2013 dataset.

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.

Applied Machine Learning Engineer skilled in Computer Vision/Deep Learning Pipeline Development, creating machine learning models, retraining systems and transforming data science prototypes to production-grade solutions. Consistently optimizes and improves real-time systems by evaluating strategies and testing on real world scenarios.

Responses From Readers


Neelesh Ajmani

I am interested in using this model for a non-profit for detecting Dementia patients emotions during Reminiscence Therapy. What are the next steps to proceed on this? Please advise.
