Face Detection & Emotion Recognition
PROJECT REPORT ON
‘FACE DETECTION AND EMOTION RECOGNITION’
SUBMITTED BY
NISHU TIWARI 170420117053
SUBJECT
Project -2
PROJECT GUIDE
PROF. NANDKISHOR JOSHI
DEPARTMENT
INSTRUMENTATION AND CONTROL
Introduction
• A face is detected and identified within an acquired digital image.
• One or more features of the face are extracted from the digital image: the two eyes (or subsets of features of each eye), the lips or partial lips, one or more other mouth features together with one or both eyes, or a combination of these.
• A model with multiple shape parameters is applied to the two eyes (or to subsets of features of each eye), and/or to the lips, partial lips, or other mouth features together with one or both eyes.
• One or more similarities between the extracted facial features and a library of reference feature sets are determined.
• A probable facial expression is identified based on these similarity measurements (see the sketch after this list).
• “Perhaps the most compelling argument for [facial recognition software] is that it can make
law enforcement more efficient,” Shannon Togawa Mercer and Ashley Deeks write on Lawfare.
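The matching step in the bullets above can be pictured as a nearest-neighbour lookup. The sketch below is only illustrative: it assumes each face has already been reduced to a fixed-length feature vector, and the names reference_library and classify_expression are invented for this example; the actual project classifies expressions with a CNN, as shown in the code later in this report.

import numpy as np

# Hypothetical reference library: one mean feature vector per expression.
reference_library = {
    "happy":   np.array([0.8, 0.1, 0.3]),
    "sad":     np.array([0.2, 0.7, 0.4]),
    "neutral": np.array([0.5, 0.5, 0.5]),
}

def classify_expression(feature_vector):
    """Return the expression whose reference vector is most similar (cosine similarity)."""
    best_label, best_score = None, -1.0
    for label, ref in reference_library.items():
        score = np.dot(feature_vector, ref) / (np.linalg.norm(feature_vector) * np.linalg.norm(ref))
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score

print(classify_expression(np.array([0.75, 0.15, 0.35])))  # most similar to "happy"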
Face Detection
We have used Keras to detect emotions through a webcam, using Python.
LIBRARIES AND APPLICATIONS USED
• Keras - Keras is a high-level neural networks library written in Python, which makes it simple and intuitive to use. It works as a wrapper to low-level libraries such as TensorFlow or Theano.
• Tensorflow - TensorFlow is an open-source library for a wide range of machine learning tasks. It provides both high-level and low-level APIs.
• Face_emotion_recognition – the fer2013 facial expression recognition dataset, supplied as a CSV file (fer2013.csv) of 48x48 grayscale face images, each labelled with one of seven emotions.
• MTCNN - MTCNN (Multi-task Cascaded Convolutional Neural Networks) is an algorithm consisting of 3 stages, which detects the bounding boxes of faces in an image along with their 5-point face landmarks.
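A minimal sketch of using MTCNN via the mtcnn Python package (assumed installed with pip; 'face.jpg' is a placeholder image name). Note that the project scripts below use a Haar cascade rather than MTCNN for detection.

import cv2
from mtcnn import MTCNN

# Detect faces and their 5-point landmarks in a single image.
img = cv2.cvtColor(cv2.imread("face.jpg"), cv2.COLOR_BGR2RGB)  # MTCNN expects RGB input
detector = MTCNN()
for face in detector.detect_faces(img):
    x, y, w, h = face['box']          # bounding box of the detected face
    landmarks = face['keypoints']     # left_eye, right_eye, nose, mouth_left, mouth_right
    print(face['confidence'], (x, y, w, h), landmarks)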
EPOCH
• One epoch is when the entire dataset passes forward and backward through the neural network only ONCE. Since one epoch is too big to feed to the computer at once, we divide it into several smaller batches, as illustrated below.
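As a concrete illustration, using the same batch size and epoch count as the training script later in this report and the size of the fer2013 'Training' split, the number of batches per epoch works out as follows:

# Rough illustration of the epoch/batch relationship used in the training script below.
num_training_images = 28709   # size of the fer2013 'Training' split
batch_size = 64               # same value as in Emotion recognition.py
epochs = 30

iterations_per_epoch = -(-num_training_images // batch_size)  # ceiling division -> 449 batches
total_iterations = iterations_per_epoch * epochs              # 13,470 weight updates in total
print(iterations_per_epoch, total_iterations)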
Code for emotion recognition:

Emotion recognition.py
This script reads the fer2013.csv dataset, trains a convolutional neural network on the 48x48 grayscale face images, and saves the trained architecture to fer.json and the weights to fer.h5.

import sys, os
import pandas as pd
import numpy as np

from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Conv2D, MaxPooling2D, BatchNormalization, AveragePooling2D
from keras.losses import categorical_crossentropy
from keras.optimizers import Adam
from keras.regularizers import l2
from keras.utils import np_utils

# pd.set_option('display.max_rows', 500)
# pd.set_option('display.max_columns', 500)
# pd.set_option('display.width', 1000)

df = pd.read_csv('fer2013.csv')
# print(df.info())
# print(df["Usage"].value_counts())
# print(df.head())

num_features = 64
num_labels = 7
batch_size = 64
epochs = 30
width, height = 48, 48

# split the pixel strings into training and test sets according to the Usage column
X_train, train_y, X_test, test_y = [], [], [], []
for index, row in df.iterrows():
    val = row['pixels'].split(" ")
    try:
        if 'Training' in row['Usage']:
            X_train.append(np.array(val, 'float32'))
            train_y.append(row['emotion'])
        elif 'PublicTest' in row['Usage']:
            X_test.append(np.array(val, 'float32'))
            test_y.append(row['emotion'])
    except:
        print(f"error occurred at index: {index} and row: {row}")

X_train = np.array(X_train, 'float32')
train_y = np.array(train_y, 'float32')
X_test = np.array(X_test, 'float32')
test_y = np.array(test_y, 'float32')

# one-hot encode the seven emotion labels
train_y = np_utils.to_categorical(train_y, num_classes=num_labels)
test_y = np_utils.to_categorical(test_y, num_classes=num_labels)

# normalizing the data (zero mean, unit variance)
X_train -= np.mean(X_train, axis=0)
X_train /= np.std(X_train, axis=0)
X_test -= np.mean(X_test, axis=0)
X_test /= np.std(X_test, axis=0)

# reshape the flat pixel vectors into 48x48 single-channel images
X_train = X_train.reshape(X_train.shape[0], width, height, 1)
X_test = X_test.reshape(X_test.shape[0], width, height, 1)
# print(f"shape:{X_train.shape}")

## designing the cnn
# 1st convolution layer
model = Sequential()
model.add(Conv2D(64, kernel_size=(3, 3), activation='relu', input_shape=X_train.shape[1:]))
model.add(Conv2D(64, kernel_size=(3, 3), activation='relu'))
# model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
model.add(Dropout(0.5))

# 2nd convolution layer
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(Conv2D(64, (3, 3), activation='relu'))
# model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
model.add(Dropout(0.5))

# 3rd convolution layer
model.add(Conv2D(128, (3, 3), activation='relu'))
model.add(Conv2D(128, (3, 3), activation='relu'))
# model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))

model.add(Flatten())

# fully connected layers
model.add(Dense(1024, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(1024, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(num_labels, activation='softmax'))
# model.summary()

# compiling the model
model.compile(loss=categorical_crossentropy,
              optimizer=Adam(),
              metrics=['accuracy'])

# training the model
model.fit(X_train, train_y,
          batch_size=batch_size,
          epochs=epochs,
          verbose=1,
          validation_data=(X_test, test_y),
          shuffle=True)

# saving the model to use it later on
fer_json = model.to_json()
with open("fer.json", "w") as json_file:
    json_file.write(fer_json)
model.save_weights("fer.h5")
Videotester.py
This script loads the saved model (fer.json and fer.h5), captures frames from the webcam, detects faces with a Haar cascade, and overlays the predicted emotion on each detected face.

import os
import cv2
import numpy as np
from keras.models import model_from_json
from keras.preprocessing import image

# load model
model = model_from_json(open("fer.json", "r").read())
# load weights
model.load_weights('fer.h5')

face_haar_cascade = cv2.CascadeClassifier('haarcascade_frontalface_default.xml')

cap = cv2.VideoCapture(0)

while True:
    ret, test_img = cap.read()  # captures frame and returns boolean value and captured image
    if not ret:
        continue
    gray_img = cv2.cvtColor(test_img, cv2.COLOR_BGR2GRAY)

    faces_detected = face_haar_cascade.detectMultiScale(gray_img, 1.32, 5)

    for (x, y, w, h) in faces_detected:
        cv2.rectangle(test_img, (x, y), (x + w, y + h), (255, 0, 0), thickness=7)
        roi_gray = gray_img[y:y + h, x:x + w]  # cropping region of interest i.e. face area from image
        roi_gray = cv2.resize(roi_gray, (48, 48))
        img_pixels = image.img_to_array(roi_gray)
        img_pixels = np.expand_dims(img_pixels, axis=0)
        img_pixels /= 255  # scale pixel values to the 0-1 range

        predictions = model.predict(img_pixels)

        # find max indexed array
        max_index = np.argmax(predictions[0])

        emotions = ('angry', 'disgust', 'fear', 'happy', 'sad', 'surprise', 'neutral')
        predicted_emotion = emotions[max_index]

        cv2.putText(test_img, predicted_emotion, (int(x), int(y)), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 255), 2)

    resized_img = cv2.resize(test_img, (1000, 700))
    cv2.imshow('Facial emotion analysis ', resized_img)

    if cv2.waitKey(10) == ord('q'):  # exit when the 'q' key is pressed
        break

cap.release()
cv2.destroyAllWindows()
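A practical note, sketched under the assumption that opencv-python is installed: if haarcascade_frontalface_default.xml is not placed next to the script, the copy bundled with OpenCV can be loaded via cv2.data.haarcascades instead.

import cv2

# Use the Haar cascade file shipped with the opencv-python package.
cascade_path = cv2.data.haarcascades + 'haarcascade_frontalface_default.xml'
face_haar_cascade = cv2.CascadeClassifier(cascade_path)
assert not face_haar_cascade.empty(), "cascade failed to load"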
OUTPUT:
LIMITATIONS OF PROJECT:-
1. Poor Image Quality Limits Facial Recognition's Effectiveness
Image quality affects how well facial-recognition algorithms work. The image quality of scanning video is
quite low compared with that of a digital camera. Even high-definition video is, at best, 1080p (progressive
scan); usually, it is 720p. These values are equivalent to about 2MP and 0.9MP, respectively, while an
inexpensive digital camera attains 15MP. The difference is quite noticeable.
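These megapixel figures follow directly from the frame dimensions; a quick sanity check (the 15 MP still-camera resolution shown is just one representative example):

# Megapixels = width * height / 1,000,000
print(1920 * 1080 / 1e6)   # 1080p full HD frame -> ~2.07 MP
print(1280 * 720 / 1e6)    # 720p HD frame       -> ~0.92 MP
print(4752 * 3168 / 1e6)   # e.g. a ~15 MP still camera -> ~15.05 MP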
2. Small Image Sizes Make Facial Recognition More Difficult
When a face-detection algorithm finds a face in an image or in a still from a video capture, the relative size of
that face compared with the enrolled image size affects how well the face will be recognized. An already small
image size, coupled with a target distant from the camera, means that the detected face is only 100 to 200 pixels
on a side. Further, having to scan an image for varying face sizes is a processor-intensive activity. Most
algorithms allow specification of a face-size range to help eliminate false positives on detection and speed up
image processing.
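As an example of such a face-size restriction, OpenCV's detectMultiScale (the detector used in Videotester.py) accepts optional minSize and maxSize arguments; the values below are illustrative, and 'frame.jpg' is a placeholder file name.

import cv2

face_haar_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')
gray_img = cv2.cvtColor(cv2.imread('frame.jpg'), cv2.COLOR_BGR2GRAY)  # placeholder input frame

# Only faces between 100x100 and 400x400 pixels are considered, which cuts false
# positives and reduces the work the detector does per frame.
faces_detected = face_haar_cascade.detectMultiScale(
    gray_img,
    scaleFactor=1.32,
    minNeighbors=5,
    minSize=(100, 100),
    maxSize=(400, 400),
)
print(len(faces_detected), "face(s) found")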
3. Different Face Angles Can Throw Off Facial Recognition's Reliability
The relative angle of the target’s face influences the recognition score profoundly. When a face is enrolled
in the recognition software, usually multiple angles are used (profile, frontal and 45-degree are common).
Anything less than a frontal view affects the algorithm’s capability to generate a template for the face. The more
direct the image (both enrolled and probe image) and the higher its resolution, the higher the score of any
resulting matches.
4. Data Processing and Storage Can Limit Facial Recognition Tech
Even though high-definition video is quite low in resolution when compared with digital camera images, it still
occupies significant amounts of disk space. Processing every frame of video is an enormous undertaking, so
usually only a fraction (10 percent to 25 percent) is actually run through a recognition system. To minimize total
processing time, agencies can use clusters of computers. However, adding computers involves considerable data
transfer over a network, which can be bound by input-output restrictions, further limiting processing speed.
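A minimal sketch of this kind of frame sub-sampling, assuming OpenCV is used to read the video ('surveillance.mp4' is a placeholder name): processing every 10th frame sends roughly 10 percent of frames to the recognition stage.

import cv2

cap = cv2.VideoCapture('surveillance.mp4')   # placeholder video file
process_every_n = 10                         # ~10 percent of frames reach the recognizer

frame_index = 0
while True:
    ret, frame = cap.read()
    if not ret:
        break
    if frame_index % process_every_n == 0:
        pass  # run face detection / recognition on this frame only
    frame_index += 1

cap.release()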
OVERCOMING PROBLEMS:-
As technology improves, higher-definition cameras will become available. Computer networks will be able to move more
data, and processors will work faster. Facial-recognition algorithms will be better able to pick out faces from an image and
recognize them in a database of enrolled individuals. The simple mechanisms that defeat today’s algorithms, such as obscuring
parts of the face with sunglasses and masks or changing one’s hairstyle, will be easily overcome.
An immediate way to overcome many of these limitations is to change how images are captured. Using checkpoints, for
example, requires subjects to line up and funnel through a single point. Cameras can then focus on each person closely, yielding
far more useful frontal, higher-resolution probe images. However, wide-scale implementation increases the number of cameras
required.
Evolving biometrics applications are promising. They include not only facial recognition but also gestures, expressions, gait
and vascular patterns, as well as iris, retina, palm print, ear print, voice recognition and scent signatures. A combination of
modalities is superior because it improves a system’s capacity to produce results with a higher degree of confidence. Associated
efforts focus on improving capabilities to collect information from a distance where the target is passive and often unknowing.
Clearly, privacy concerns surround this technology and its use. Finding a balance between national security and individuals’
privacy rights will be the subject of increasing discussion, especially as technology progresses.
CONCLUSION:-
Thus we conclude that, if this project is successfully implemented and working, it will be a great achievement for us as a team. There is scope for variations and advancements of the project in the future.
In a companion disclosure we describe an enhanced face model derived from active appearance model (AAM) techniques, which employs a differential spatial subspace to provide an enhanced real-time depth map. Employing techniques from advanced AAM face model generation and the information available from an enhanced depth map, we can generate a real-time 3D face model.
The next step, based on the 3D face model, is to generate a 3D avatar that can mimic the face of a user in real time. We are currently exploring various approaches to implement such a system using our real-time stereoscopic imaging system.
We would like to thank all the faculty of our department, our mentors, our parents, and everyone else who has been a part of this project.
THANK YOU