Realtime Face Detection and Song Recommendation System Using CNN and KNN

AI Summer Semester Class∗
∗AI-2002 Artificial Intelligence, FAST NUCES, Peshawar

Abstract—This project develops an Emotion Detection and Song Recommendation System, using OpenCV for emotion analysis and the k-Nearest Neighbors (KNN) algorithm for song recommendation. Emotion detection is a crucial aspect of human-computer interaction, with applications that include personalized user experiences and mental health monitoring. The system uses OpenCV's computer vision capabilities to recognize facial expressions and classify emotions into predefined categories. In addition, the project incorporates a Song Recommendation System based on the KNN algorithm: by considering the user's preferences and detected emotions, the system suggests songs that align with the user's emotional state. The frontend is implemented in Streamlit, a Python library for creating interactive web applications with ease, and provides a user-friendly interface to the emotion detection and song recommendation functionality. The combination of emotion detection, song recommendation, and a user-friendly frontend creates a holistic system that aims to enhance user engagement and satisfaction, allowing users to discover music that resonates with their emotional state.

I. INTRODUCTION

Emotion detection and personalized music recommendation are vital components of human-computer interaction and artificial intelligence. Emotions play a significant role in shaping our experiences and responses to various stimuli, including music. Music has the power to evoke emotions, and recommending songs based on a user's emotional state can significantly enhance their listening experience.

In this project, we propose an Emotion Detection and Song Recommendation System that leverages OpenCV's computer vision capabilities to detect facial expressions and identify emotions. By capturing live video input or processing images, the system classifies emotions into predefined categories such as happiness, sadness, and anger. This emotion detection process serves as the foundation for the song recommendation system.

The song recommendation system is built on the k-Nearest Neighbors (KNN) algorithm, a popular machine learning method for classification tasks.
The KNN algorithm considers the user's detected emotional state and preferences to suggest the songs best suited to their current mood. To provide an interactive and user-friendly experience, the frontend is implemented with Streamlit, a Python library known for its simplicity in creating web applications. The Streamlit interface allows users to interact with the emotion detection system, view their emotions in real time, and receive personalized song recommendations based on their emotional state. By combining emotion detection, song recommendation, and a user-friendly frontend, our project aims to deliver a comprehensive system that enhances the user's music listening experience. Users can explore new music that resonates with their emotions, leading to a more immersive and enjoyable music journey. Additionally, the system's potential applications extend to fields such as personalized user experiences, mental health monitoring, and enhanced human-computer interaction in various domains.

Fig. 1: Pictorial Representation

II. DATA AND METHODS

Dataset construction and preprocessing. The data preprocessing steps were:
• Resizing the images to 48x48, with a single (black-and-white) color channel
• Manually cleaning the datasets to remove incorrectly labeled expressions
• Splitting the data into train, validation, and test sets (80:10:10)
• Applying image augmentation using ImageDataGenerator
• Using Haar cascades to crop faces out of the live feed when making real-time predictions

The data comes from https://www.kaggle.com/jonathanoheix/face-expression-recognition-dataset. We did not use the complete dataset: because the data was imbalanced, we picked out only 4 classes, manually went through all the images to clean them, and finally split them in a ratio of 80:10:10 into train, test, and validation sets respectively. The images are 48x48 grayscale images cropped to the face using Haar cascades. 28,275 training, 3,530 test, and 3,532 validation images were taken from Kaggle, although the number of images actually used for training varies because of the image generator and the manual cleaning. For the parameters used in the ImageDataGenerator, see model.ipynb.

Model construction.

Deep Learning Model: After manually preprocessing the dataset by deleting duplicates and wrongly classified images, we use Convolutional Neural Network concepts and transfer learning to build and train a model that predicts the facial emotion of any person. The four face emotions are: Happy, Sad, Neutral, and Angry. The data is split into training and validation sets (80 percent training, 20 percent validation) and then augmented accordingly using ImageDataGenerator. This helps the sequential model adjust its weights by training at a lower learning rate. H5 files of the model are saved.

Hyperparameters and constants (only the best parameters are displayed below):
• Batch size: 64
• Image size: 48 x 48 x 3
• Optimizers: RMSProp (pre-train), Adam
• Learning rates: lr1 = 1e-5 (pre-train), lr2 = 1e-4
• Epochs: 30 (pre-train), 25
• Loss: categorical crossentropy

Defining the model. Using Sequential, the layers in the model are as follows:
• GlobalAveragePooling2D
• Flatten
• Dense (256, activation: 'relu')
• Dropout (0.4)
• Dense (128, activation: 'relu')
• Dropout (0.2)
• Dense (4, activation: 'softmax')

Pre-training is done with RMSProp at a learning rate of 1e-5 for 30 epochs. After pre-training, we set layers.trainable to True for the whole model, and the actual training starts, using the Adam optimizer at a learning rate of 1e-4 for 25 epochs. We were able to achieve a decent validation accuracy of 75 percent and an accuracy of 85 percent. All the metrics observed during model training are displayed on one plot (Fig. 4: Score).

Model Evaluation: After the model has been trained, it is evaluated on the test set to determine its performance. Several metrics can be used to evaluate the model, including accuracy, precision, recall, and F1 score.

The sketches below illustrate, in order, the face-cropping preprocessing, the model definition with its two-phase training, and the metric computation.
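First, a minimal sketch of the Haar-cascade face cropping used both in preprocessing and on the live feed. The cascade file is the pretrained frontal-face model bundled with opencv-python; the helper name crop_face_48x48 is ours, not from the original code.

```python
import cv2

# Pretrained frontal-face Haar cascade bundled with opencv-python.
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
)

def crop_face_48x48(frame):
    """Detect the largest face in a BGR frame and return it as a
    48x48 grayscale crop, or None if no face is found."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.3, minNeighbors=5)
    if len(faces) == 0:
        return None
    # Keep the largest detection (w * h) to skip small background faces.
    x, y, w, h = max(faces, key=lambda f: f[2] * f[3])
    return cv2.resize(gray[y:y + h, x:x + w], (48, 48))
```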
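Next, a sketch of the model definition and two-phase training. The report does not name the pretrained backbone used for transfer learning, so MobileNetV2 below is purely a placeholder; the directory names and augmentation values are likewise assumptions (the real ones are in model.ipynb), while the head layers, optimizers, learning rates, epoch counts, and loss follow the report.

```python
import tensorflow as tf
from tensorflow.keras import layers, models, optimizers
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Augmentation values are illustrative; the actual ones are in model.ipynb.
# "data/train" and "data/val" are assumed directory names.
train_aug = ImageDataGenerator(rescale=1 / 255.0, rotation_range=10,
                               zoom_range=0.1, horizontal_flip=True)
val_aug = ImageDataGenerator(rescale=1 / 255.0)  # no augmentation on validation
train_gen = train_aug.flow_from_directory("data/train", target_size=(48, 48),
                                          batch_size=64, class_mode="categorical")
val_gen = val_aug.flow_from_directory("data/val", target_size=(48, 48),
                                      batch_size=64, class_mode="categorical")

# The backbone is NOT specified in the report; MobileNetV2 is a placeholder.
base = tf.keras.applications.MobileNetV2(input_shape=(48, 48, 3),
                                         include_top=False, weights="imagenet")
base.trainable = False  # frozen during the pre-training phase

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Flatten(),                       # no-op after GAP; kept to mirror the report
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.4),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.2),
    layers.Dense(4, activation="softmax"),  # Happy, Sad, Neutral, Angry
])

# Phase 1 (pre-train): RMSProp, lr = 1e-5, 30 epochs, frozen base.
model.compile(optimizer=optimizers.RMSprop(learning_rate=1e-5),
              loss="categorical_crossentropy", metrics=["accuracy"])
model.fit(train_gen, validation_data=val_gen, epochs=30)

# Phase 2 (fine-tune): unfreeze everything, Adam, lr = 1e-4, 25 epochs.
base.trainable = True
model.compile(optimizer=optimizers.Adam(learning_rate=1e-4),
              loss="categorical_crossentropy", metrics=["accuracy"])
model.fit(train_gen, validation_data=val_gen, epochs=25)

model.save("emotion_model.h5")  # the report mentions saving H5 files
```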
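Finally, a small scikit-learn sketch of how those four metrics are computed for a four-class problem, with macro averaging across classes. The label arrays are placeholders, not our actual predictions.

```python
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score)

# Placeholder label arrays (class indices 0-3 for the four emotions);
# in practice y_true/y_pred come from the test generator and the model.
y_true = [0, 1, 2, 3, 1, 0]
y_pred = [0, 1, 2, 3, 1, 1]

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred, average="macro"))
print("recall   :", recall_score(y_true, y_pred, average="macro"))
print("f1       :", f1_score(y_true, y_pred, average="macro"))
```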
III. RESULTS AND DISCUSSION

The Emotion Detection and Song Recommendation System using OpenCV and the KNN algorithm, with a frontend developed in Streamlit, has been successfully implemented. The system aims to enhance user engagement and satisfaction by detecting emotions through facial expressions and recommending songs based on the user's emotional state.

Emotion Detection using OpenCV: The emotion detection component utilizes OpenCV's computer vision capabilities to analyze facial expressions and identify emotions. Through live video input or image processing, the system detects key facial features and classifies emotions into predefined categories such as happiness, sadness, and anger. However, the current implementation has achieved an accuracy of only 40 percent.

Song Recommendation using KNN: The Song Recommendation System is based on the k-Nearest Neighbors (KNN) algorithm, which takes into account the user's detected emotional state and preferences to suggest songs. The KNN algorithm, in its current state, has also achieved an accuracy of 40 percent in recommending appropriate songs based on emotions. This accuracy is relatively low, indicating that the system can be further optimized to provide more accurate and personalized song recommendations. (A sketch of one possible KNN recommender follows at the end of this section.)

Frontend with Streamlit: The frontend of the system is developed using Streamlit, a Python library for creating interactive web applications. The Streamlit interface allows users to interact with the emotion detection system, view their emotions in real time, and receive song recommendations. The current design provides a user-friendly experience, but additional features and improvements could enhance its usability further; a minimal interface sketch also follows below.
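The report does not spell out the feature representation behind the KNN recommender, so the sketch below illustrates one plausible setup: songs embedded as (valence, energy) vectors and a nearest-neighbor query around a per-emotion target point. All track names, feature values, and mood anchors are invented for illustration.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Hypothetical song catalogue: the track names, (valence, energy)
# features, and per-emotion targets are illustrative only.
songs = ["Track A", "Track B", "Track C", "Track D", "Track E"]
features = np.array([
    [0.9, 0.8],  # upbeat
    [0.2, 0.3],  # melancholic
    [0.5, 0.4],  # calm
    [0.3, 0.9],  # intense
    [0.8, 0.5],  # warm
])

# Map each detected emotion to a target point in feature space.
mood_targets = {
    "Happy":   [0.9, 0.7],
    "Sad":     [0.2, 0.3],
    "Neutral": [0.5, 0.5],
    "Angry":   [0.3, 0.9],
}

knn = NearestNeighbors(n_neighbors=3).fit(features)

def recommend(emotion):
    """Return the 3 songs whose features are nearest to the emotion's target."""
    _, idx = knn.kneighbors([mood_targets[emotion]])
    return [songs[i] for i in idx[0]]

print(recommend("Happy"))  # ['Track A', 'Track E', 'Track C']
```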
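For the frontend, a minimal Streamlit sketch tying the pieces together. The emotion_helpers module is hypothetical, standing in for the helpers defined in the earlier sketches (crop_face_48x48, model, recommend). It uses st.camera_input, which captures single snapshots; truly continuous real-time video would require an extension such as streamlit-webrtc. The label order is assumed to follow flow_from_directory's alphabetical class indexing.

```python
import cv2
import numpy as np
import streamlit as st

# Hypothetical module collecting the earlier sketches' helpers.
from emotion_helpers import crop_face_48x48, model, recommend

st.title("Emotion-Based Song Recommender")

snapshot = st.camera_input("Take a picture")  # file-like object or None
if snapshot is not None:
    # Decode the snapshot bytes into a BGR image for OpenCV.
    frame = cv2.imdecode(np.frombuffer(snapshot.getvalue(), np.uint8),
                         cv2.IMREAD_COLOR)
    face = crop_face_48x48(frame)
    if face is None:
        st.warning("No face detected, please try again.")
    else:
        # The model expects 48x48x3 input, so replicate the gray channel.
        rgb = cv2.cvtColor(face, cv2.COLOR_GRAY2RGB) / 255.0
        probs = model.predict(rgb.reshape(1, 48, 48, 3))
        # Alphabetical class order from flow_from_directory (assumed).
        emotion = ["Angry", "Happy", "Neutral", "Sad"][int(np.argmax(probs))]
        st.subheader(f"Detected emotion: {emotion}")
        st.write("Recommended songs:", recommend(emotion))
```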
Future Work:

Improve Emotion Detection Accuracy: Enhancing the accuracy of emotion detection is a crucial future goal. This can be achieved by exploring advanced computer vision techniques, using deep learning models, or leveraging other face recognition libraries to identify emotions more accurately.

Enhance Song Recommendation Algorithm: The KNN algorithm's accuracy can be improved by considering additional features such as user listening history, music genre preferences, and contextual data. Implementing collaborative filtering techniques or incorporating other recommendation algorithms like Matrix Factorization can also lead to more accurate and personalized song suggestions.

Integration of User Feedback: Collecting user feedback on the recommended songs can be valuable for fine-tuning the recommendation system. Implementing mechanisms to gather user ratings and preferences will enable continuous improvement of the recommendation algorithm.

Expansion of Emotion Categories: Expanding the emotion categories beyond the basic emotions could make the system more nuanced and reflective of the user's emotional state. Incorporating a wider range of emotions can lead to more accurate and context-aware song recommendations.

Real-world Testing and Dataset Expansion: The system should be tested in real-world scenarios to evaluate its performance and to gather more diverse data for training the emotion detection and recommendation models. A larger and more diverse dataset can contribute to better accuracy.

Integration with Music Streaming Platforms: Collaborating with music streaming platforms can give the system access to a vast library of songs and enable real-time recommendations based on the user's emotional state and preferences.

Personalization and User Profiles: Creating user profiles to store historical data, preferences, and emotional patterns can lead to more personalized recommendations over time, improving the user experience.

IV. SUMMARY

In conclusion, the implemented Emotion Detection and Song Recommendation System using OpenCV, KNN, and Streamlit provides a foundational framework that can be further refined and optimized for higher accuracy and personalization. By addressing the future work points above, the system has the potential to become a powerful tool for enhancing the music listening experience and human-computer interaction in a variety of applications.