Music Recommendation System Using Real-Time Parameters

2023 International Conference on Recent Advances in Electrical, Electronics, Ubiquitous Communication, and Computational Intelligence (RAEEUCCI) | 979-8-3503-3742-6/23/$31.00 ©2023 IEEE | DOI: 10.1109/RAEEUCCI57140.2023.10134257
Shivam Dawar
Dept. of Electronics and Communication
SRM Institute of Science and Technology
Chennai, India
Soumitra Chatterjee
Dept. of Electronics and Communication
SRM Institute of Science and Technology
Chennai, India
Mohammed Fardin Hossain
Dept. of Electronics and Communication
SRM Institute of Science and Technology
Chennai, India
Dr. Malarvizhi S
Dept. of Electronics and Communication
SRM Institute of Science and Technology
Chennai, India
Abstract—Music is widely used as a means of calming, energizing,
and motivating oneself. With the help of modern
technology, the goal is to recommend suitable music for various
situations. This project aims to achieve this by using real-time
data such as time, location, weather, facial expressions,
artists, and audio attributes to determine the user’s
emotional state. Machine learning models such as CNNs and DNNs
are used to classify song samples into different genres, and
the Spotify API and LastFM API are
connected to the user’s database to recommend music based on
the individual’s parameters. The ultimate objective is to provide
personalized music recommendations to users.
Index Terms—music, music recommender, mood detection,
music genre classification, CNN, DNN, Spotify API, LastFM API
The role of music in people’s lives cannot be overemphasized, as it serves as a medium for individuals to express
themselves, understand their emotions, and connect with others. However, there are times when individuals may not be
fully aware of their current mood or may find it challenging
to comprehend their feelings, and music can play a crucial
role in helping them to achieve this. Therefore, this project
seeks to create a music recommendation system that leverages
the latest technologies to assist millions of music lovers in
discovering the perfect music for their needs. The proposed
music recommendation system utilizes real-time parameters
such as the user’s location, time, artist, weather, and facial
expression to provide personalized recommendations. By taking these parameters into account, the system can accurately
determine the user’s current mood and behaviour, making
it easier to recommend a song or playlist that aligns with
their preferences. To achieve this, the music recommendation
system comprises three primary components. The first component is centred on understanding human emotions and moods
through facial expressions. By analyzing facial expressions,
the system can accurately detect and interpret the user’s current
emotional state, providing valuable insights that can be used
to make personalized music recommendations. The second
component is centred on creating user music profiles based on
their favourite artists and music genres. The system analyzes
the user’s listening habits, taking into account the frequency
with which they listen to certain artists and music genres. This
information is then used to create a comprehensive music profile that provides insights into the user’s musical preferences,
making it easier to suggest suitable songs and playlists. The
third and final component of the system involves integrating
the data obtained from the first and second components to
make personalized music recommendations. By combining the
user’s current mood and music profile, the system can provide
a suitable mix of music that aligns with the user’s preferences.
The system also takes into account the user’s feedback on the
recommended songs and playlists, making it possible to refine
and improve the recommendations over time. In summary, the
objective of the music recommendation system is to offer
users a customized music experience by utilizing machine
learning algorithms and real-time data to make precise music
recommendations. This personalized approach helps users to
comprehend their emotions better, improve their mood, and
ultimately, enhance their overall well-being. Because the system
links a user’s preferences with their current emotions, it is
designed to be more accurate than models that rely on only
one of these signals.
Emotion can be detected from facial gestures, since humans
express much of their emotion through the face.
Facial emotions are detected through the computer’s
camera using a Convolutional Neural Network
and OpenCV [1]. In another approach, music is recommended by
building a personal profile from the user’s activity
in social media apps and then suggesting
a list of songs; models such as CNN and VGG
networks are used, with Ubuntu and PySpark as the framework for
developing the model [2]. A further project detects
emotion from a user’s social
media data, such as ads or posts that the user has made, seen,
or liked, using techniques such as
graphs, trees, and LSTM; this work
helped us understand the various ways in which the
emotion of a user can be analysed [3]. The beats of the audio signal
are classified into genres with the help of
a CNN. The user can get recommendations for similar
music based on their listening history with the help of
a collaborative filtering algorithm. Data visualization has been
used to show music features such as danceability,
loudness, energy, liveness, acousticness, tempo, and
speechiness [4]. After studying the requirements of music
categorization, the authors found a need to categorize
songs from their audio files; after trying many other models on the
GTZAN data set, which is a set of audio files, they improved the
categorization using a CNN model [5].
To ensure accurate music recommendations, the model
relies on frequently collected data parameters. This means
the system continuously gathers and analyzes real-time information such as the user’s location, time, weather, facial
expressions, and preferred artists and music genres. We
compared 3-, 5-, and 7-layer CNN models over many
attributes and built this project on the model that gave the
highest accuracy. Using this information, the model can build a
comprehensive music profile for the user and integrate it with
mood detection analysis data to provide the best combination
of song suggestions. The advantage of using frequently collected data is that it enables the system to adapt and update
its recommendations based on the user’s changing preferences
and moods. Compared to previous models, this approach
utilizing multiple attribute data helps improve the model’s
detection accuracy and enhances its overall effectiveness. The
music recommendation system creates a comprehensive music
profile of the user, which includes their musical preferences.
This profile is then combined with the mood detection analysis
data to generate a highly personalized and accurate playlist.
The system selects songs that match both the user’s preferences and their current mood, resulting in a playlist that is
tailored to their exact needs. The use of the Spotify API
and LastFM API is a feature that distinguishes this
project from others. The integration of the Spotify API enables
us to access the user’s song profile from the day they started
using Spotify, providing us with long-term data to analyze.
This approach differs from other projects that only
gather data from the point of connection. By incorporating
the Spotify API and LastFM API, we enhance the accuracy and
effectiveness of the music recommendation system.
Our proposal is to develop a music recommendation system
that generates playlists based on real-time parameters such
as the user’s video feed, time, and location. This project is
divided into three parts. The first part focuses on preparing
the recommender system by detecting the user’s mood. The
second part involves creating a deep neural network (DNN) to
establish a user song profile. Finally, we integrate the results
from the first two parts to build a recommender system that
generates personalized playlists as the final output.
A. Mood Detection
In the first phase of our work, we aim to detect the mood
of the user, which is a critical aspect of our music recommendation system. As people tend to listen to music that aligns
with their current mood, accurately detecting their mood can
significantly improve the song suggestion accuracy. Although
emotions are intangible, we can detect them using facial
recognition and questionnaires. A questionnaire is a short set of
questions asked to understand the mood of the user. For
our project, we have employed the Haar Cascade algorithm,
OpenCV, and CNN models with 3, 5, and 7 layers
to detect mood through facial recognition from the camera
feed and a survey. We found that the five-layer CNN model
provides the highest accuracy among these models. A CNN, or
convolutional neural network, is a deep learning model that
classifies images into different categories. It
is particularly useful in detecting patterns in images, making it
more effective for mood detection from images or video
feeds than other methods such as the Histogram of
Oriented Gradients (HOG), which can give misleading results
after detection. Therefore, we have chosen the CNN as the
method to detect mood in our system.
Fig. 1. Flowchart for Mood Detection using Facial Expression
B. User Song Profile
In the next phase of our project, we aim to create a user
song profile that provides insights into the user’s musical
preferences based on factors such as artists, genres, song
speed, and duration. This will be useful in enhancing the
accuracy of our music playlist predictions by understanding
the user’s musical tastes. We have leveraged APIs to integrate
with popular music apps like Spotify and LastFM to gather
user data by connecting the API to our project. Authentication
through 3rd party apps is necessary, and various key links
are used to obtain the required data, which is then saved
to form a user-specific dataset. Once we have a complete
user music profile, we analyze the profile and mood using
a machine-learning model to classify the best-suited songs for
the occasion.
The URI of each song is fetched using the Spotify API. Finally, the URI is
fed into the song profile DNN model.
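As a minimal sketch of the data-gathering step, the request for a user's top artists through the LastFM API might be built and parsed as follows. The function names `top_artists_url` and `artist_names` are hypothetical, the user name and API key are placeholders, and the response layout assumed here (`topartists`, then `artist`, then `name`) should be checked against the LastFM documentation before use. No network call is made; the sample payload is made up for illustration.

```python
from urllib.parse import urlencode

# Root URL of the LastFM web service.
LASTFM_ROOT = "http://ws.audioscrobbler.com/2.0/"


def top_artists_url(user, api_key, limit=10):
    """Build the request URL for a user's top artists (JSON response)."""
    params = {
        "method": "user.gettopartists",
        "user": user,
        "api_key": api_key,
        "limit": limit,
        "format": "json",
    }
    return LASTFM_ROOT + "?" + urlencode(params)


def artist_names(payload):
    """Extract artist names from a decoded top-artists response.

    The nesting used here is an assumption about the response layout.
    """
    artists = payload.get("topartists", {}).get("artist", [])
    return [artist["name"] for artist in artists]


# Made-up payload with the assumed layout, for illustration only.
sample_payload = {"topartists": {"artist": [{"name": "Artist A"}, {"name": "Artist B"}]}}
print(top_artists_url("some_user", "YOUR_API_KEY"))
print(artist_names(sample_payload))
```

Each returned artist name could then be used to look up that artist's top songs and their URIs through the Spotify API, which requires its own authentication and is not shown here.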
Fig. 3. 3-Layer DNN Model for Song Profile Prediction
Fig. 2. Flowchart for Mood Detection using Facial Expression
C. Recommender System
In the final phase of our project, we utilize facial expression
detection to analyze human emotions and create a music
preference profile that determines the user’s preferred music
genre and artists. This information is then utilized to develop
a machine-learning model that suggests suitable music or
playlists based on the user’s current context. Our model
incorporates data collected in previous phases, including song
and artist tags, to select a few songs for the suggested playlist.
The user’s feedback on the recommended playlist is used to
refine the model and improve future suggestions.
This section contains the actual implementation of our
proposed system. The first part of the work is
detecting the person’s mood. We trained three CNNs, with 3, 5, and 7 layers respectively, on the same training
data. The training data consist of 35,886 grayscale images of
48x48 pixel resolution categorized into 5 moods, i.e. Happy,
Sad, Neutral, Angry, and Fear. While training we found that the
5-layer model, with more than 16,000 trainable parameters (as shown
in Fig. 4), was the most efficient. The CNN model
outputs one of the 5 moods: Neutral, Angry, Sad, Happy, or Fear.
After detecting the mood of the user, a user song profile
is built, which helps the recommender system to
suggest songs similar to what the user already listens to, so that
the suggestions do not feel too alien. The LastFM API and Spotify API, both
RESTful APIs, are used to collect and scrobble (that is, record
the user’s music preferences) data. A LastFM account is created
and linked to the Spotify account of the user, and LastFM is used
to scrobble the user’s Spotify listening. Using the LastFM
API, the user’s top 10 artists are collected in JSON format. Using the
Spotify API, the top 10 songs of each of these artists are
fetched. Following that, the URI (Uniform Resource Identifier)
of each song is fetched using the Spotify API and fed into the song profile DNN model.
The song profile DNN model is a 3-layer DNN with 10 input nodes and 4 output nodes, as shown in Fig. 3.
The input parameters are acousticness, danceability, energy,
instrumentalness, liveness, valence, loudness, speechiness, tempo,
key, and time signature, which are obtained by requesting them from the
Spotify API using the song URI. The output is the song
profile, which is ‘Neutral’, ‘Angry’, ‘Fear’, ‘Sad’, or ‘Happy’.
A sample song profile is shown in Fig. 5.
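The step from a song's audio attributes to a song profile label can be sketched as below. This is an illustration under stated assumptions, not the trained model: the attribute names are written as the field names of Spotify's audio-features object, the order of the labels on the output layer is assumed, and `to_feature_vector`, `softmax`, and `label_from_scores` are hypothetical helpers. The raw scores passed to `label_from_scores` stand in for the output of a trained network.

```python
import math

# Audio attributes listed in the text, as Spotify audio-features field names.
FEATURE_ORDER = [
    "acousticness", "danceability", "energy", "instrumentalness",
    "liveness", "valence", "loudness", "speechiness", "tempo",
    "key", "time_signature",
]

# Song profile labels; this output ordering is an assumption.
PROFILE_LABELS = ["Neutral", "Angry", "Fear", "Sad", "Happy"]


def to_feature_vector(audio_features):
    """Order one track's audio-feature dict into a flat list of floats."""
    return [float(audio_features[name]) for name in FEATURE_ORDER]


def softmax(scores):
    """Turn raw output scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def label_from_scores(scores):
    """Return the most probable profile label and its probability."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return PROFILE_LABELS[best], probs[best]


# Made-up attribute values for one track, for illustration only.
track = {
    "acousticness": 0.12, "danceability": 0.71, "energy": 0.83,
    "instrumentalness": 0.0, "liveness": 0.09, "valence": 0.88,
    "loudness": -5.2, "speechiness": 0.04, "tempo": 121.0,
    "key": 7, "time_signature": 4,
}
vector = to_feature_vector(track)
# Placeholder scores standing in for the network output on `vector`.
label, confidence = label_from_scores([0.1, 0.2, 0.0, 0.3, 2.0])
print(len(vector), label)
```

In practice, attributes on very different scales, such as tempo and loudness, would likely be normalized before training; that step is omitted here.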
To produce the final playlist of recommended songs, we
use content-based recommendation. We sort
the user’s top 25 songs by emotion. We
look up all the tags related to these songs using the LastFM API,
take the top 10 tags, find the top 5 songs for each tag, and
label those songs with the corresponding tags. We then create a similarity
matrix using the cosine similarity formula (1).
$$\mathrm{sim}(A, B) = \frac{A \cdot B}{\|A\| \, \|B\|} \tag{1}$$
where A and B are the feature vectors for two items being
compared, · denotes the dot product of the two vectors, and
||A|| and ||B|| are the Euclidean norms of the two vectors.
We pick the top 5 songs with the highest similarity and
suggest them to the user.
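The cosine similarity formula and the top-5 selection can be sketched in a few lines. The feature vectors used here are assumed to be binary tag indicators (1 if a song carries a tag, 0 otherwise); the text does not specify the exact features, so this encoding, the song names, and the helper `top_k_similar` are illustrative assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_a * norm_b)


def similarity_matrix(vectors):
    """Pairwise cosine similarity for a list of feature vectors."""
    return [[cosine_similarity(u, v) for v in vectors] for u in vectors]


def top_k_similar(seed, candidates, k=5):
    """Return the names of the k candidates most similar to the seed vector."""
    scored = [(cosine_similarity(seed, vec), name) for name, vec in candidates.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:k]]


# Hypothetical tag-indicator vectors over the tags ["rock", "pop", "chill"].
seed_song = [1, 1, 0]
candidates = {"song_a": [1, 1, 0], "song_b": [0, 0, 1], "song_c": [1, 0, 0]}
print(top_k_similar(seed_song, candidates, k=2))
```

With these made-up vectors, `song_a` shares both tags with the seed and `song_c` shares one, so they rank ahead of `song_b`, which shares none.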
A 5-layer CNN model detects facial emotions through a camera, classifying them as sad, happy, calm,
or energetic. The user’s time and location and the answers to a question set are
also collected, and the Spotify API is used to gather the user’s
profile information, including their favourite songs and artists,
and convert it into URIs. A deep learning model is then used to
classify whether the songs listened to by the user are energetic,
sad, calm, or happy. The user profile is used to recommend
songs based on emotion, location, and time, and the
LastFM API is used to recommend more songs based on the
user’s preferred genre tags. A pie chart of the user’s preferred
songs shows that they mostly listen to happy songs, followed
by fear, angry, sad, and neutral songs, at 28.7%,
25.1%, 21.5%, 17.8% and 6.9% respectively [Fig 5]. The
confusion matrix is used to evaluate the performance of the
classification model, which has an overall accuracy of 90%, higher
than most other similar systems [Fig 6]. The mood detection
model, tested on the validation dataset, also achieves an accuracy of around
94%, demonstrating better categorization than existing mood
detection systems [Fig 7]. An example of mood detection from
a video feed is given [Fig 8].
Fig. 5. User’s Song Profile
Fig. 6. Confusion Matrix for Song Profile Detection
$$\mathrm{Accuracy} = \frac{TN + TP}{TN + FP + FN + TP}$$
$$\mathrm{Precision} = \frac{TP}{FP + TP}$$
$$\mathrm{Recall} = \frac{TP}{TP + FN}$$
Fig. 4. 5-Layer CNN Model for Mood Detection
Fig. 7. Accuracy chart comparison
Fig. 8. Mood detection using a video feed
In the formulas above, TN stands for
true negative, TP for true positive, FP for false positive and FN for false negative.
A true positive (TP) is an observation that is predicted
positive and is actually positive. A false positive (FP) is an
observation that is predicted positive and is actually negative. A false
negative (FN) is an observation that is predicted negative
and is actually positive. A true negative (TN) is an observation that is
predicted negative and is actually negative. The formulas
above are used to calculate accuracy,
precision, and recall. Accuracy is the fraction of all predictions that the model
gets right, precision is the fraction of predictions
for a category that are correct, and recall is the fraction of actual
members of a category that are predicted correctly. The values in the tables are calculated using these formulas.
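As a worked illustration of these formulas, the per-category values can be derived from a multi-class confusion matrix by treating each category in turn as the positive class (a one-vs-rest reading, which is an assumption about how the tables were produced). The counts below are made up for illustration and are not the results reported in this paper; `per_class_metrics` is a hypothetical helper.

```python
def per_class_metrics(confusion, labels):
    """Accuracy, precision and recall per class from a confusion matrix.

    confusion[i][j] is the number of samples whose true class is i
    and whose predicted class is j.
    """
    total = sum(sum(row) for row in confusion)
    metrics = {}
    for k, label in enumerate(labels):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                  # true class k, predicted otherwise
        fp = sum(row[k] for row in confusion) - tp   # predicted k, true class otherwise
        tn = total - tp - fn - fp
        metrics[label] = {
            "accuracy": (tp + tn) / total,
            "precision": tp / (tp + fp) if tp + fp else 0.0,
            "recall": tp / (tp + fn) if tp + fn else 0.0,
        }
    return metrics


# Made-up counts for three categories, for illustration only.
labels = ["happy", "sad", "neutral"]
confusion = [
    [8, 1, 1],
    [2, 7, 1],
    [0, 2, 8],
]
for label, values in per_class_metrics(confusion, labels).items():
    print(label, {name: round(v, 3) for name, v in values.items()})
```

Under this reading the per-category accuracy counts true negatives from all other categories, so its average can be higher than the overall share of correctly classified samples.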
Fig. 9. 5-Layer CNN Model for Mood Detection
[Table columns: Song profile | Expected Result | Calculated Result | Accuracy (%)]
We have made two tables, one for the mood detection
performance matrix and the other for the song profile
matrix. For mood detection, we take the moods to be happy, sad, neutral, angry
and fearful, and compare the expected result from the
training data with the values calculated from the test data,
which gives the accuracy in each category and
an average accuracy of 95.69% [Table I]. In the song profile,
the categories are happy, sad, neutral, angry and fearful, and
from the confusion matrix in Fig. 6 we have calculated an
average accuracy, recall, and precision of 96.23%, 90.61%, and
90.66% respectively [Table II]. In the graph, the
expected results and calculated values are shown as a bar chart
to give a visual representation of the calculated accuracy [Fig
Music plays a significant role in people’s lives, influencing their mood and serving as a source of motivation or
stress relief. However, everyone has unique music preferences,
making it essential to recommend songs accurately based on the user’s
mood, time, and location. In this study, we used
different parameters to match music recommendations more precisely to the
user’s taste. We employed a 5-layer convolutional neural network and OpenCV to detect the user’s emotions through their
face and other features, combining this information with
time and location to determine their mood and recommend
music accordingly. We used the Spotify Web API to extract the
user’s music preferences and classify them into five categories
(Happy, Sad, Angry, Neutral and Fear) using deep learning
algorithms. We then used the LastFM API to recommend more
songs based on the user’s preferred genre tags. Our results
demonstrate that predicting music using real-time parameters
and emotion leads to more accurate song recommendations.
Future research in this area could help create personalized
playlists for users and recommend background music for social
media platforms such as Instagram reels or stories.
The data used for mood detection is
taken from the open-source platform Kaggle:
https://www.kaggle.com/datasets/jonathanoheix/face-expression-recognition-dataset
The data for the song profile is
built using Spotify, LastFM and a form, which was circulated
among users of different gender, age group, and geographic
