Technology is becoming ever more integrated into daily human life, so it is as important as ever for machines to comprehend human feelings. Facial emotion detection and facial expression recognition technologies are at the forefront of this effort, interpreting feelings from facial gestures. Emotion detection is not limited to faces, however: by integrating emotion recognition from speech with emotion-sensing facial recognition, we can build a system that reflects human emotions more completely.
This approach builds on the complementary relationship between vocal intonation and facial expression to uncover richer emotional cues. From applications in mental health to enriching human-computer interaction, the combination of these technologies could transform emotion recognition and intervention.
In our previous article, we explained emotion detection in text, which is useful for several use cases; you can read that article here. In this article, I show you how to build a TensorFlow model that can identify your emotions from a picture or a live webcam feed.
This article was published as a part of the Data Science Blogathon.
Let’s dive straight into the implementation part of Facial Emotion Detection.
We will be using the FER-2013 dataset, which is publicly available on Kaggle. It contains 48x48 pixel grayscale images of faces along with their emotion labels.
This dataset covers 7 emotions: (0=Angry, 1=Disgust, 2=Fear, 3=Happy, 4=Sad, 5=Surprise, 6=Neutral).
Start by importing pandas and some essential libraries and then loading the dataset.
Python Code:
import matplotlib.pyplot as plt
import numpy as np
import scipy
import pandas as pd
df = pd.read_csv('fer2013.csv')
print(df.head())
This dataset contains 3 columns: emotion, pixels, and Usage. The emotion column contains integer-encoded emotions, the pixels column contains pixel values as a space-separated string, and the Usage column tells whether a row is meant for training or testing.
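Before preprocessing, it can help to glance at how the rows are split and how balanced the classes are. A minimal sketch (not part of the original code); note that the public fer2013.csv also contains a PrivateTest split in addition to Training and PublicTest:
# Optional sanity check: inspect the data splits and the class distribution.
print(df['Usage'].value_counts())      # counts per split (Training, PublicTest, PrivateTest)
print(df['emotion'].value_counts())    # counts per emotion class; 'Disgust' is typically the rarest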
As you can see, the data is not yet in the right format, so we need to preprocess it. Here X_train and X_test will hold the pixel arrays, while y_train and y_test will hold the emotion labels.
X_train = []
y_train = []
X_test = []
y_test = []
for index, row in df.iterrows():
    k = row['pixels'].split(" ")
    if row['Usage'] == 'Training':
        X_train.append(np.array(k))
        y_train.append(row['emotion'])
    elif row['Usage'] == 'PublicTest':
        X_test.append(np.array(k))
        y_test.append(row['emotion'])
At this stage, X_train and X_test contain the pixel values as strings; converting them into numbers only requires a typecast.
X_train = np.array(X_train, dtype = 'uint8')
y_train = np.array(y_train, dtype = 'uint8')
X_test = np.array(X_test, dtype = 'uint8')
y_test = np.array(y_test, dtype = 'uint8')
y_train and y_test contain 1D integer-encoded labels; we need to convert them into one-hot (categorical) vectors for efficient training.
import keras
from keras.utils import to_categorical
y_train= to_categorical(y_train, num_classes=7)
y_test = to_categorical(y_test, num_classes=7)
num_classes = 7 indicates that we have 7 classes to classify.
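For example, an integer label of 3 (Happy) becomes a one-hot vector with a 1 at index 3 (a quick illustrative sketch, not in the original code):
print(to_categorical([3], num_classes=7))
# [[0. 0. 0. 1. 0. 0. 0.]]  -> only index 3 (Happy) is set to 1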
For training, the pixel data needs to be reshaped into a 4D tensor of shape (num_samples, width, height, channels).
X_train = X_train.reshape(X_train.shape[0], 48, 48, 1)
X_test = X_test.reshape(X_test.shape[0], 48, 48, 1)
Here, the 1 means the images are grayscale (a single channel). At this stage, we have successfully preprocessed our data into X_train, X_test, y_train, and y_test.
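A quick check of the resulting shapes can catch reshaping mistakes early (an optional sketch, not in the original code):
print(X_train.shape, y_train.shape)  # expected: (num_train_samples, 48, 48, 1) and (num_train_samples, 7)
print(X_test.shape, y_test.shape)    # expected: (num_test_samples, 48, 48, 1) and (num_test_samples, 7)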
Image augmentation is applied to improve the model's ability to generalize. It is usually better to apply some data augmentation before feeding the images to the model, which can be done with ImageDataGenerator from Keras.
from keras.preprocessing.image import ImageDataGenerator
datagen = ImageDataGenerator(
    rescale=1./255,
    rotation_range=10,
    horizontal_flip=True,
    width_shift_range=0.1,
    height_shift_range=0.1,
    fill_mode='nearest')
testgen = ImageDataGenerator(rescale=1./255)
datagen.fit(X_train)
batch_size = 64
On the testing data, we only apply rescaling (normalization).
We will use a batch_size of 64; after fitting our data to the image generator, batches of 64 images are generated on the fly. Using a data generator is a convenient way to train on a large amount of data.
train_flow = datagen.flow(X_train, y_train, batch_size=batch_size)
test_flow = testgen.flow(X_test, y_test, batch_size=batch_size)
train_flow contains our X_train and y_train while test_flow contains our X_test and y_test.
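To see what the generator actually produces, you can pull one batch and display a few augmented faces (an optional sketch, not part of the original code):
x_batch, y_batch = next(train_flow)          # one batch of augmented images and one-hot labels
plt.figure(figsize=(8, 2))
for i in range(4):
    plt.subplot(1, 4, i + 1)
    plt.imshow(x_batch[i].reshape(48, 48), cmap='gray')
    plt.title(int(np.argmax(y_batch[i])))    # integer-encoded emotion label
    plt.axis('off')
plt.show()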
Now we design the CNN model for emotion detection using the Keras functional API. We build blocks from Conv2D, BatchNormalization, MaxPooling2D, and Dropout layers, stack them together, flatten the result, and finish with a Dense layer for the output. You can read more elsewhere on how to design CNN models.
from keras.utils import plot_model
from keras.models import Model
from keras.layers import Input, Dense, Flatten, Dropout, BatchNormalization, Conv2D, MaxPooling2D, concatenate
from keras.optimizers import Adam, SGD
from keras.regularizers import l1, l2
from matplotlib import pyplot as plt
from sklearn.metrics import confusion_matrix
FER_Model takes the input shape and returns a model ready for training. Now let's define the architecture of the model.
def FER_Model(input_shape=(48,48,1)):
    # input layer
    visible = Input(shape=input_shape, name='input')
    num_classes = 7

    # the 1-st block
    conv1_1 = Conv2D(64, kernel_size=3, activation='relu', padding='same', name='conv1_1')(visible)
    conv1_1 = BatchNormalization()(conv1_1)
    conv1_2 = Conv2D(64, kernel_size=3, activation='relu', padding='same', name='conv1_2')(conv1_1)
    conv1_2 = BatchNormalization()(conv1_2)
    pool1_1 = MaxPooling2D(pool_size=(2,2), name='pool1_1')(conv1_2)
    drop1_1 = Dropout(0.3, name='drop1_1')(pool1_1)

    # the 2-nd block
    conv2_1 = Conv2D(128, kernel_size=3, activation='relu', padding='same', name='conv2_1')(drop1_1)
    conv2_1 = BatchNormalization()(conv2_1)
    conv2_2 = Conv2D(128, kernel_size=3, activation='relu', padding='same', name='conv2_2')(conv2_1)
    conv2_2 = BatchNormalization()(conv2_2)
    conv2_3 = Conv2D(128, kernel_size=3, activation='relu', padding='same', name='conv2_3')(conv2_2)
    conv2_3 = BatchNormalization()(conv2_3)
    pool2_1 = MaxPooling2D(pool_size=(2,2), name='pool2_1')(conv2_3)
    drop2_1 = Dropout(0.3, name='drop2_1')(pool2_1)

    # the 3-rd block
    conv3_1 = Conv2D(256, kernel_size=3, activation='relu', padding='same', name='conv3_1')(drop2_1)
    conv3_1 = BatchNormalization()(conv3_1)
    conv3_2 = Conv2D(256, kernel_size=3, activation='relu', padding='same', name='conv3_2')(conv3_1)
    conv3_2 = BatchNormalization()(conv3_2)
    conv3_3 = Conv2D(256, kernel_size=3, activation='relu', padding='same', name='conv3_3')(conv3_2)
    conv3_3 = BatchNormalization()(conv3_3)
    conv3_4 = Conv2D(256, kernel_size=3, activation='relu', padding='same', name='conv3_4')(conv3_3)
    conv3_4 = BatchNormalization()(conv3_4)
    pool3_1 = MaxPooling2D(pool_size=(2,2), name='pool3_1')(conv3_4)
    drop3_1 = Dropout(0.3, name='drop3_1')(pool3_1)

    # the 4-th block
    conv4_1 = Conv2D(256, kernel_size=3, activation='relu', padding='same', name='conv4_1')(drop3_1)
    conv4_1 = BatchNormalization()(conv4_1)
    conv4_2 = Conv2D(256, kernel_size=3, activation='relu', padding='same', name='conv4_2')(conv4_1)
    conv4_2 = BatchNormalization()(conv4_2)
    conv4_3 = Conv2D(256, kernel_size=3, activation='relu', padding='same', name='conv4_3')(conv4_2)
    conv4_3 = BatchNormalization()(conv4_3)
    conv4_4 = Conv2D(256, kernel_size=3, activation='relu', padding='same', name='conv4_4')(conv4_3)
    conv4_4 = BatchNormalization()(conv4_4)
    pool4_1 = MaxPooling2D(pool_size=(2,2), name='pool4_1')(conv4_4)
    drop4_1 = Dropout(0.3, name='drop4_1')(pool4_1)

    # the 5-th block
    conv5_1 = Conv2D(512, kernel_size=3, activation='relu', padding='same', name='conv5_1')(drop4_1)
    conv5_1 = BatchNormalization()(conv5_1)
    conv5_2 = Conv2D(512, kernel_size=3, activation='relu', padding='same', name='conv5_2')(conv5_1)
    conv5_2 = BatchNormalization()(conv5_2)
    conv5_3 = Conv2D(512, kernel_size=3, activation='relu', padding='same', name='conv5_3')(conv5_2)
    conv5_3 = BatchNormalization()(conv5_3)
    conv5_4 = Conv2D(512, kernel_size=3, activation='relu', padding='same', name='conv5_4')(conv5_3)
    conv5_4 = BatchNormalization()(conv5_4)
    pool5_1 = MaxPooling2D(pool_size=(2,2), name='pool5_1')(conv5_4)
    drop5_1 = Dropout(0.3, name='drop5_1')(pool5_1)

    # flatten and output
    flatten = Flatten(name='flatten')(drop5_1)
    output = Dense(num_classes, activation='softmax', name='output')(flatten)

    # create model
    model = Model(inputs=visible, outputs=output)

    # summary of layers
    print(model.summary())
    return model
We compile the model using the Adam optimizer with lr=0.0001 and a small decay factor, which gradually reduces the learning rate over the course of training.
model = FER_Model()
opt = Adam(lr=0.0001, decay=1e-6)
model.compile(loss='categorical_crossentropy', optimizer=opt, metrics=['accuracy'])
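As an alternative to a fixed decay, you can reduce the learning rate only when validation loss stops improving, using Keras's ReduceLROnPlateau callback (an optional sketch, not part of the original code):
from keras.callbacks import ReduceLROnPlateau

# Halve the learning rate if validation loss has not improved for 5 epochs.
reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=5, min_lr=1e-6, verbose=1)
# Pass it to training later, e.g. callbacks=[reduce_lr] in the fit call.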
To train the model you need to write the following line of code.
num_epochs = 100
history = model.fit_generator(train_flow,
                              steps_per_epoch=len(X_train) / batch_size,
                              epochs=num_epochs,
                              verbose=1,
                              validation_data=test_flow,
                              validation_steps=len(X_test) / batch_size)
Here steps_per_epoch = TotalTrainingSamples / TrainingBatchSize and validation_steps = TotalValidationSamples / ValidationBatchSize. Training takes at least 20 minutes for 100 epochs.
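Once training finishes, plotting the history object is a simple way to check for overfitting (an optional sketch, not in the original code):
acc_key = 'accuracy' if 'accuracy' in history.history else 'acc'  # key name differs across Keras versions
plt.plot(history.history[acc_key], label='train accuracy')
plt.plot(history.history['val_' + acc_key], label='validation accuracy')
plt.xlabel('epoch')
plt.ylabel('accuracy')
plt.legend()
plt.show()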
We save the model's architecture as JSON and its weights as an .h5 file.
model_json = model.to_json()
with open("model.json", "w") as json_file:
    json_file.write(model_json)
model.save_weights("model.h5")
print("Saved model to disk")
Download the saved model architecture and weights into your working directory.
In this part, we will test our model in real-time using face detection.
Let’s start by loading the trained model architecture and weights so that it can be used further to make predictions.
from tensorflow.keras.models import model_from_json
model = model_from_json(open("model.json", "r").read())
model.load_weights('model.h5')
We use a Haar cascade to detect the positions of faces, and after getting the positions we crop the faces.
The haarcascade_frontalface_default.xml file can be downloaded using the link.
import cv2
face_haar_cascade = cv2.CascadeClassifier('haarcascade_frontalface_default.xml')
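Before moving to the live webcam, you can test the cascade and the trained model on a single picture. A minimal sketch (the file name 'test_image.jpg' is a placeholder; not part of the original code):
import numpy as np
from tensorflow.keras.preprocessing.image import img_to_array

img = cv2.imread('test_image.jpg')                   # placeholder path to any photo of a face
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
faces = face_haar_cascade.detectMultiScale(gray)
emotions = ('angry', 'disgust', 'fear', 'happy', 'sad', 'surprise', 'neutral')
for (x, y, w, h) in faces:
    roi = cv2.resize(gray[y:y+h, x:x+w], (48, 48))   # crop the face and resize to 48x48
    pixels = np.expand_dims(img_to_array(roi), axis=0) / 255.0
    pred = model.predict(pixels)
    print(emotions[int(np.argmax(pred[0]))])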
Use OpenCV to read frames and for image processing.
from tensorflow.keras.preprocessing.image import img_to_array

cap = cv2.VideoCapture(0)
while cap.isOpened():
    res, frame = cap.read()
    height, width, channel = frame.shape
    gray_image = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_haar_cascade.detectMultiScale(gray_image)
    try:
        for (x, y, w, h) in faces:
            cv2.rectangle(frame, pt1=(x, y), pt2=(x+w, y+h), color=(255, 0, 0), thickness=2)
            roi_gray = gray_image[y-5:y+h+5, x-5:x+w+5]
            roi_gray = cv2.resize(roi_gray, (48, 48))
            image_pixels = img_to_array(roi_gray)
            image_pixels = np.expand_dims(image_pixels, axis=0)
            image_pixels /= 255
            predictions = model.predict(image_pixels)
            max_index = np.argmax(predictions[0])
            emotion_detection = ('angry', 'disgust', 'fear', 'happy', 'sad', 'surprise', 'neutral')
            emotion_prediction = emotion_detection[max_index]
            cv2.putText(frame, emotion_prediction, (x, y-10), cv2.FONT_HERSHEY_SIMPLEX, 0.9, (255, 0, 0), 2)
    except:
        pass
    cv2.imshow('frame', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()
The model takes only grayscale images, which is why each frame is converted before prediction. Adding an overlay to the output frame and displaying the prediction with its confidence gives a better look.
cap = cv2.VideoCapture(0)   # 0 selects the default webcam
while cap.isOpened():
    res, frame = cap.read()
    height, width, channel = frame.shape

    # Creating an overlay window to write the prediction and confidence
    sub_img = frame[0:int(height/6), 0:int(width)]
    black_rect = np.ones(sub_img.shape, dtype=np.uint8) * 0
    res = cv2.addWeighted(sub_img, 0.77, black_rect, 0.23, 0)
    FONT = cv2.FONT_HERSHEY_SIMPLEX
    FONT_SCALE = 0.8
    FONT_THICKNESS = 2
    lable_color = (10, 10, 255)
    lable = "Emotion Detection made by Abhishek"
    lable_dimension = cv2.getTextSize(lable, FONT, FONT_SCALE, FONT_THICKNESS)[0]
    textX = int((res.shape[1] - lable_dimension[0]) / 2)
    textY = int((res.shape[0] + lable_dimension[1]) / 2)
    cv2.putText(res, lable, (textX, textY), FONT, FONT_SCALE, (0, 0, 0), FONT_THICKNESS)

    # prediction part
    gray_image = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_haar_cascade.detectMultiScale(gray_image)
    try:
        for (x, y, w, h) in faces:
            cv2.rectangle(frame, pt1=(x, y), pt2=(x+w, y+h), color=(255, 0, 0), thickness=2)
            roi_gray = gray_image[y-5:y+h+5, x-5:x+w+5]
            roi_gray = cv2.resize(roi_gray, (48, 48))
            image_pixels = img_to_array(roi_gray)
            image_pixels = np.expand_dims(image_pixels, axis=0)
            image_pixels /= 255
            predictions = model.predict(image_pixels)
            max_index = np.argmax(predictions[0])
            emotion_detection = ('angry', 'disgust', 'fear', 'happy', 'sad', 'surprise', 'neutral')
            emotion_prediction = emotion_detection[max_index]
            cv2.putText(res, "Sentiment: {}".format(emotion_prediction), (0, textY+22+5), FONT, 0.7, lable_color, 2)
            lable_violation = 'Confidence: {}'.format(str(np.round(np.max(predictions[0])*100, 1)) + "%")
            violation_text_dimension = cv2.getTextSize(lable_violation, FONT, FONT_SCALE, FONT_THICKNESS)[0]
            violation_x_axis = int(res.shape[1] - violation_text_dimension[0])
            cv2.putText(res, lable_violation, (violation_x_axis, textY+22+5), FONT, 0.7, lable_color, 2)
    except:
        pass
    frame[0:int(height/6), 0:int(width)] = res
    cv2.imshow('frame', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()
Now run it !!!
Facial emotion recognition and detection is a powerful tool that leverages advanced deep learning techniques to identify emotions from facial expressions. With applications in various fields, such as real-time emotion sensing facial recognition and emotion recognition using speech, it plays a pivotal role in improving human-computer interactions. GitHub repositories make it easy for developers to access and experiment with face emotion recognition models, while ongoing advancements, such as EEG emotion detection, continue to push the boundaries of emotion sensing technologies. Combining these approaches opens the door to more accurate, multi-modal emotion recognition systems for a range of applications.
Download the source code from here.
Q1. What is facial emotion detection?
A. Facial emotion detection is the process of using computer vision to analyze facial features and recognize emotions from images or videos.
Q2. How does facial expression recognition differ from facial emotion detection?
A. Facial expression recognition identifies specific facial movements (like a smile or frown), while facial emotion detection interprets these expressions to determine emotions.
Q3. What is emotion sensing facial recognition?
A. Emotion sensing facial recognition combines facial recognition technology with emotion detection, allowing systems to identify a person and their emotional state simultaneously.
Q4. Can emotion recognition from speech be combined with facial emotion recognition?
A. Yes, integrating emotion recognition from speech with facial emotion recognition enhances the accuracy of determining someone's emotional state by analyzing both voice and facial cues.
Q5. How do facial emotion recognition and detection work?
A. Facial emotion recognition and detection use Convolutional Neural Networks (CNNs) to analyze facial features and classify emotions based on trained data, such as the FER-2013 dataset.