Realtime Face Detection and Song
Recommendation System Using CNN and KNN
AI-Summer Semester Class∗
∗ AI-2002
Artificial Intelligence
Fast NUCES, Peshawar
Abstract—This project aims to develop an Emotion Detection
and Song Recommendation System using OpenCV for emotion
analysis and the k-Nearest Neighbors (KNN) algorithm for song recommendation. Emotion detection is a crucial aspect of human-computer interaction, and it has various applications, including
personalized user experiences and mental health monitoring.
The system will utilize OpenCV’s computer vision capabilities to
recognize facial expressions and classify emotions into predefined
categories.
Additionally, the project incorporates a Song Recommendation
System based on the KNN algorithm. By considering user
preferences and emotions detected, the system will suggest songs
that align with the user’s emotional state. The frontend of the
system will be implemented using Streamlit, a Python library
for creating interactive web applications with ease, providing a
user-friendly interface to interact with the emotion detection and
song recommendation functionalities.
The combination of emotion detection, song recommendation,
and a user-friendly frontend will create a holistic system that aims
to enhance user engagement and satisfaction, allowing users to
discover music that resonates with their emotional state.
I. INTRODUCTION
Emotion detection and personalized music recommendation
are vital components in the field of human-computer interaction and artificial intelligence. Emotions play a significant role
in shaping our experiences and responses to various stimuli,
including music. Music has the power to evoke emotions, and
recommending songs based on a user’s emotional state can
significantly enhance their listening experience.
In this project, we propose an Emotion Detection and Song
Recommendation System that leverages OpenCV’s powerful
computer vision capabilities to detect facial expressions and
identify emotions. By capturing live video input or processing
images, the system will classify emotions into predefined
categories such as happiness, sadness, anger, and more. This
emotion detection process will serve as the foundation for the
song recommendation system.
The song recommendation system will be developed using
the k-Nearest Neighbors (KNN) algorithm, a popular machine
learning method used for classification tasks. The KNN algorithm will consider the user’s detected emotional state and
preferences to suggest songs that are most suitable for their
current mood.
To provide an interactive and user-friendly experience, we
will implement the frontend of the system using Streamlit, a
Python library known for its simplicity in creating web applications. The Streamlit interface will allow users to interact with the emotion detection system, view their emotions in real time, and receive personalized song recommendations based on their emotional state.
By combining emotion detection, song recommendation, and a user-friendly frontend, our project aims to deliver a comprehensive system that enhances the user's music listening experience. Users can explore new music that resonates with their emotions, leading to a more immersive and enjoyable music journey. Additionally, the system's potential applications extend to fields such as personalized user experiences, mental health monitoring, and enhancing human-computer interaction in various domains.
Fig. 1: Pictorial Representation
II. DATA AND METHODS
Dataset Construction and Preprocessing: The data comes from https://www.kaggle.com/jonathanoheix/face-expression-recognition-dataset, but we did not use the complete dataset. Because the data was imbalanced, we picked out only four classes, manually went through all the images to clean them, and finally split them into a ratio of 80:10:10 for train, test, and validation respectively. The images are 48x48 grayscale images, cropped to the face using Haar cascades. We took 28,275 training, 3,530 test, and 3,532 validation images from Kaggle, but the number of images actually used for training varies, since we applied an image data generator and also cleaned the data manually. The parameters used for the image data generator can be found in model.ipynb.
The data preprocessing steps are as follows (a minimal sketch of the pipeline follows the list):
• Resizing the images to 48x48 (single grayscale channel)
• Manually cleaning the datasets to remove incorrectly labeled expressions
• Splitting the data into train, validation, and test sets (80:10:10)
• Applying image augmentation using ImageDataGenerator
• Using Haar cascades to crop out only the faces from live-feed frames when making real-time predictions
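This sketch assumes standard OpenCV and Keras APIs; the directory layout and augmentation values below are illustrative placeholders, and the parameters actually used are the ones recorded in model.ipynb.

```python
import cv2
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Haar cascade face detector shipped with OpenCV.
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def crop_face(frame):
    """Crop the first detected face and resize it to 48x48 grayscale."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.3, minNeighbors=5)
    if len(faces) == 0:
        return None
    x, y, w, h = faces[0]
    return cv2.resize(gray[y:y + h, x:x + w], (48, 48))

# Augmentation for the training split only; these values are illustrative,
# the parameters actually used are in model.ipynb.
train_datagen = ImageDataGenerator(
    rescale=1.0 / 255,
    rotation_range=10,
    width_shift_range=0.1,
    height_shift_range=0.1,
    horizontal_flip=True)

train_generator = train_datagen.flow_from_directory(
    "data/train",               # hypothetical directory layout
    target_size=(48, 48),
    color_mode="grayscale",
    batch_size=64,
    class_mode="categorical")   # four classes: Happy, Sad, Neutral, Angry
```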
Model Construction (Deep Learning Model): After manually pre-processing the dataset by deleting duplicates and wrongly classified images, we use the concepts of Convolutional Neural Networks and transfer learning to build and train a model that predicts the facial emotion of any person. The four facial emotions are: Happy, Sad, Neutral, and Angry.
The data is split into training and validation sets: 80 percent training, 20 percent validation. The data is then augmented accordingly using ImageDataGenerator. This helps the sequential model adjust its weights while training at a lower learning rate.
The trained models are saved as H5 files.
Setting the hyperparameters and constants (only the best parameters are displayed below):
• Batch size: 64
• Image size: 48 x 48 x 3
• Optimizers: RMSprop (pre-train), Adam
• Learning rate: Lr1 = 1e-5 (pre-train), Lr2 = 1e-4
• Epochs: Epochs 1 = 30 (pre-train), Epochs 2 = 25
• Loss: Categorical Crossentropy
Defining the model: using Sequential, the layers in the model are as follows:
• GlobalAveragePooling2D
• Flatten
• Dense (256, activation: 'relu')
• Dropout (0.4)
• Dense (128, activation: 'relu')
• Dropout (0.2)
• Dense (4, activation: 'softmax')
Pre-training is done using RMSprop at a learning rate of 1e-5 for 30 epochs. After pre-training, we set layers.trainable to True for the whole model, and the actual training starts: it is done with the Adam optimizer at a learning rate of 1e-4 for 25 epochs. We were able to achieve a decent validation accuracy of 75 percent and an accuracy of 85 percent. All the metrics observed during model training are displayed on one plot.
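As a rough sketch of the construction described above: the report does not name the pretrained base network used for transfer learning, so MobileNetV2 below is purely an assumption, and train_generator / val_generator stand for the augmented data splits described earlier (the validation generator would be built like the training one, without augmentation).

```python
from tensorflow.keras import Sequential
from tensorflow.keras.applications import MobileNetV2  # assumed base; the report does not name one
from tensorflow.keras.layers import GlobalAveragePooling2D, Flatten, Dense, Dropout
from tensorflow.keras.optimizers import RMSprop, Adam

# Pretrained convolutional base, frozen during the pre-training phase.
# The report lists an image size of 48 x 48 x 3, so grayscale faces would
# be stacked to three channels before being fed to the base.
base = MobileNetV2(input_shape=(48, 48, 3), include_top=False, weights="imagenet")
base.trainable = False

model = Sequential([
    base,
    GlobalAveragePooling2D(),
    Flatten(),
    Dense(256, activation="relu"),
    Dropout(0.4),
    Dense(128, activation="relu"),
    Dropout(0.2),
    Dense(4, activation="softmax"),  # Happy, Sad, Neutral, Angry
])

# Phase 1: pre-train the new head with RMSprop at a low learning rate.
model.compile(optimizer=RMSprop(learning_rate=1e-5),
              loss="categorical_crossentropy", metrics=["accuracy"])
model.fit(train_generator, validation_data=val_generator, epochs=30)

# Phase 2: unfreeze the whole model and fine-tune with Adam.
base.trainable = True
model.compile(optimizer=Adam(learning_rate=1e-4),
              loss="categorical_crossentropy", metrics=["accuracy"])
model.fit(train_generator, validation_data=val_generator, epochs=25)

model.save("emotion_model.h5")  # hypothetical filename for the saved H5 file
```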
Feature Extraction: The next step is to extract features from
the text data. This involves converting the text into numerical
representations, which can be used as input for the machine
learning algorithm. One common technique for feature extraction is the bag-of-words approach, which involves creating a
dictionary of words that appear in the text data and counting
their frequency in each example.
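A minimal illustration of the bag-of-words step, using scikit-learn's CountVectorizer; the two-sentence corpus is invented for the example.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Toy corpus; the real input would be the labeled text examples.
texts = ["I love this community", "this community is awful awful"]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)         # sparse document-term count matrix

print(vectorizer.get_feature_names_out())   # the learned dictionary of words
print(X.toarray())                          # word frequencies per example
```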
Model Training: Once the features have been extracted,
the next step is to train a machine learning model on the
training set. One popular algorithm for sentiment analysis is
the Support Vector Machine (SVM) algorithm. The model is
trained to learn patterns in the text data that are associated
with hate speech.
Fig. 2: Support Vector Machine
Model Evaluation: After the model has been trained, it is
evaluated on the testing set to determine its performance.
Several metrics can be used to evaluate the performance of
the model, including accuracy, precision, recall, and F1 score.
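The training and evaluation steps described above might look like the following scikit-learn sketch; the corpus, labels, and the choice of LinearSVC are illustrative assumptions rather than the project's actual data or configuration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Tiny invented corpus; 1 = hate speech, 0 = non-hate speech.
train_texts = ["you people are awful", "have a great day",
               "those people disgust me", "what a lovely community"]
train_labels = [1, 0, 1, 0]
test_texts = ["you are awful", "what a lovely day"]
test_labels = [1, 0]

vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_texts)  # vocabulary learned on training data only
X_test = vectorizer.transform(test_texts)

clf = LinearSVC()                                # linear SVM, common for sparse text features
clf.fit(X_train, train_labels)
pred = clf.predict(X_test)

print("accuracy: ", accuracy_score(test_labels, pred))
print("precision:", precision_score(test_labels, pred))
print("recall:   ", recall_score(test_labels, pred))
print("F1:       ", f1_score(test_labels, pred))
```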
Fig. 3: Support Vector Machine
A. Description of Datasets
The dataset used in this study was collected from various
online sources, including social media platforms, discussion
forums, and blogs. The dataset included examples of hate
speech and non-hate speech, with a total of 10,000 examples.
The dataset was annotated by a team of experts to ensure that the examples were accurately labeled as hate speech or non-hate speech. For this dataset we used widely collected data from Twitter tweets; Twitter is one of the platforms on which the most online hate is reported. [?]
III. RESULTS AND DISCUSSION
The Emotion Detection and Song Recommendation System
using OpenCV and the KNN algorithm, with a frontend developed using Streamlit, has been successfully implemented. The
system aims to enhance user engagement and satisfaction by
detecting emotions through facial expressions and recommending songs based on the user’s emotional state.
Emotion Detection using OpenCV: The emotion detection
component utilizes OpenCV’s computer vision capabilities to
analyze facial expressions and identify emotions. Through live
video input or image processing, the system detects key facial
features and classifies emotions into predefined categories
such as happiness, sadness, anger, etc. However, the current implementation has achieved an accuracy of only 40 percent.
Song Recommendation using KNN: The Song Recommendation System is based on the k-Nearest Neighbors (KNN)
algorithm, which takes into account the user’s detected emotional state and preferences to suggest songs. The KNN
algorithm, in its current state, has also achieved an accuracy
of 40 percent in recommending appropriate songs based on
emotions. The accuracy is relatively low, indicating that the
system can be further optimized to provide more accurate and
personalized song recommendations.
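As an illustration of this recommendation step, here is a minimal KNN sketch; the song features, titles, and the emotion-to-feature mapping are invented for the example and are not the project's actual data.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Illustrative song-feature matrix: each row describes a song by numeric
# features (here: valence, energy, tempo); all values are made up.
song_features = np.array([
    [0.9, 0.8, 120.0],
    [0.2, 0.3, 70.0],
    [0.5, 0.4, 95.0],
    [0.8, 0.9, 128.0],
])
song_titles = ["Song A", "Song B", "Song C", "Song D"]

# Assumed mapping from each detected emotion to a target point in feature space.
emotion_targets = {
    "Happy":   [0.9, 0.8, 120.0],
    "Sad":     [0.2, 0.3, 70.0],
    "Neutral": [0.5, 0.4, 95.0],
    "Angry":   [0.8, 0.9, 128.0],
}

knn = NearestNeighbors().fit(song_features)

def recommend_songs(emotion, k=2):
    """Return the k songs nearest to the target point for this emotion."""
    target = np.array([emotion_targets[emotion]])
    _, indices = knn.kneighbors(target, n_neighbors=k)
    return [song_titles[i] for i in indices[0]]

print(recommend_songs("Happy"))  # e.g. ['Song A', 'Song D']
```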
Frontend with Streamlit: The frontend of the system is
developed using Streamlit, a Python library for creating interactive web applications. The Streamlit interface allows users
to interact with the emotion detection system, view their
emotions in real-time, and receive song recommendations. The
current frontend design provides a user-friendly experience,
but additional features and improvements could further enhance its usability.
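A minimal sketch of such a Streamlit frontend; crop_face, predict_emotion, recommend_songs, and the app_helpers module are hypothetical names standing in for the preprocessing, CNN, and KNN components described earlier.

```python
import cv2
import streamlit as st

# Hypothetical module exposing the components sketched in Section II:
# crop_face(frame) -> 48x48 grayscale face or None; predict_emotion(face) ->
# one of "Happy"/"Sad"/"Neutral"/"Angry"; recommend_songs(emotion, k) -> titles.
from app_helpers import crop_face, predict_emotion, recommend_songs

st.title("Realtime Face Detection and Song Recommendation")

if st.button("Detect my emotion"):
    cap = cv2.VideoCapture(0)   # default webcam
    ok, frame = cap.read()
    cap.release()
    if not ok:
        st.error("Could not read from the webcam.")
    else:
        face = crop_face(frame)
        if face is None:
            st.warning("No face detected, please try again.")
        else:
            emotion = predict_emotion(face)
            st.image(frame, channels="BGR", caption=f"Detected emotion: {emotion}")
            st.subheader("Recommended songs")
            for song in recommend_songs(emotion, k=5):
                st.write(song)
```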
Future Work:
Improve Emotion Detection Accuracy: Enhancing the accuracy of emotion detection is a crucial future goal. This can be achieved by exploring advanced computer vision techniques, using deep learning models, or leveraging other face recognition libraries to identify emotions more accurately.
Enhance Song Recommendation Algorithm: The KNN algorithm's accuracy can be improved by considering additional features such as user listening history, music genre preferences, and contextual data. Implementing collaborative filtering techniques or incorporating other recommendation algorithms like Matrix Factorization can also lead to more accurate and personalized song suggestions.
Integration of User Feedback: Collecting user feedback
on the recommended songs can be valuable for fine-tuning
the recommendation system. Implementing mechanisms to
gather user ratings and preferences will enable continuous
improvement in the recommendation algorithm.
Expansion of Emotion Categories: Expanding the emotion
categories beyond the basic emotions could make the system
more nuanced and reflective of the user’s emotional state.
Incorporating a wider range of emotions can lead to more
accurate and context-aware song recommendations.
Real-world Testing and Dataset Expansion: The system
should be tested in real-world scenarios to evaluate its performance and gather more diverse data for training the emotion
detection and recommendation models. A larger and more
diverse dataset can contribute to better accuracy.
Integration with Music Streaming Platforms: Collaborating
with music streaming platforms can enable the system to
access a vast library of songs and provide real-time recommendations to users based on their emotional state and preferences.
Personalization and User Profiles: Creating user profiles
to store historical data, preferences, and emotional patterns
can lead to more personalized recommendations over time,
improving the user experience.
IV. SUMMARY
In conclusion, the implemented Emotion Detection and Song Recommendation System using OpenCV, KNN, and Streamlit provides a foundational framework that can be further refined and optimized for higher accuracy and personalization.
By addressing the future work points mentioned above, the
system has the potential to become a powerful tool for
enhancing the music listening experience and human-computer
interaction in various applications.
Fig. 4: Score