SOP-OCEAN traits extraction from text, questionnaire and social media profile for personality prediction

TITLE
OCEAN traits extraction from text, questionnaire and social
media profile for personality prediction
Class of work: Software category
Authors: Mrs. Smita Bharne, Ms. Shital Nehete, Mr. Jeet Choudhari,
Ms. Mansi Pable
Organization: Ramrao Adik Institute of Technology, Navi Mumbai
Short abstract:
In recent years, rising stress levels at home, in organizations and elsewhere have led to an
increase in mental health problems. In addition, employing less-skilled people reduces an
organization's output. A person's personality plays a major role in addressing such problems.
The system aims to predict a person's personality using the Big Five traits, namely Openness,
Conscientiousness, Extroversion, Agreeableness, and Neuroticism, popularly known as the
OCEAN model. The system predicts personality using a hybrid model that consists of three
processes, i.e., text, questionnaire, and social media profile connections. The machine learning
model used to predict personality is trained on the myPersonality dataset. The social media
profile also makes it possible to compare a person's personality with those in his or her friend
network. This web application provides an interface for these processes and displays the
various parameters used to predict personality, for easy access.
Detailed Abstract:
Over the last decade, the use of social media has been increasing rapidly among people of all
ages. The younger generation spends more and more time on various social networking sites.
With the evolution of social networks, a wide variety of approaches has been developed to
interpret users' personalities.
Also, in recent years, psychological problems have been increasing day by day due to rising
stress levels. A person's personality plays an important role in addressing these problems. The
goal of this system is to predict a person's personality using the OCEAN model (Openness,
Conscientiousness, Extraversion, Agreeableness and Neuroticism).
High Openness reflects curiosity and creativity, but it can also come with a lack of focus and
unpredictable behavior. For example, a person may have multiple talents such as dance, art and
music. Low Openness, by contrast, describes people who are practical and data-driven and who
are not open to new ideas. For example, a person who has analyzed a report insists that it
cannot be wrong.
High Conscientiousness is a tendency to be organized and dependable. It can also be
characterized by discipline and stubbornness. For example, a person writes an assignment with
great care and keeps it organized. Low Conscientiousness, by contrast, means being sloppy,
spontaneous and flexible. For example, a person keeps his or her cupboard messy.
High Extraversion describes someone who enjoys being around people and is cheerful,
talkative and attention-seeking. For example, a person has many friends and is not afraid of
taking risks. Low Extraversion means being shy, reserved and introverted.
People who are high in Agreeableness tend to be cooperative, helpful and trusting. For
example, a person shows kindness and affection towards dogs. Those who are low in this trait
tend to be competitive and suspicious.
High Neuroticism is generally considered an unfavorable trait. It includes obsessing over what
others think and having a more anxious temperament than others. For example, Rohit gets
angry all the time. Low Neuroticism describes a confident, excited and hyperactive person. For
example, a person says, "I'm excited for the London trip."
Traits, sets of behaviors, and emotional patterns define one's personality, which plays a
crucial role in making major life decisions. Because personality is presumed to influence life
outcomes, researchers have examined how well personality features can forecast those
outcomes. Predicting personality with the OCEAN model is therefore often useful for screening
a particular person.
Social media is the social interconnection among people that occurs during the creation,
sharing, or exchange of information and ideas in virtual communities and networks. People
are spending more time on online social networks and sharing a lot of personal information
through this medium. This information, which includes the user's social behavior,
user-generated texts, and language habits, reflects the user's personality to some extent. This
system aims to predict a user's personality using his or her social media interconnections as
one of the major components, without violating the user's privacy, through the proposed
hybrid model.
We propose a Personality Prediction web application built on a hybrid model with three
sections, namely Text Predictor, Questionnaire, and Facebook. In most existing
state-of-the-art work, personality prediction is based either on machine learning algorithms or
on questionnaires. Our hybrid approach predicts personality from input text, a questionnaire,
and the user's Facebook friend network.
The user interface of the web application lets the user enter text, which is provided as input
to the OCEAN models for prediction. These models are trained on the myPersonality sample
dataset (250 users, 9,917 status updates). In the questionnaire, the user answers 50 questions
based on the O.C.E.A.N. model, and the results are plotted on a radar graph. This can also be
termed a psychometric assessment, and its results can further be used for comparing
personality within social networks. Comparing personality among Facebook connections, or
within any social media site's network tab, is useful: this section compares the user's
personality scores from the Questionnaire section with the model-based predictions.
We have used the Random Forest algorithm to train a model for each personality trait, as
Random Forest is a flexible, easy-to-use algorithm that generally gives good results.
Additionally, it takes relatively little training time, predicts with high accuracy, and can
maintain accuracy when a large proportion of data is missing. Further, we have used Selenium
to extract statuses from Facebook and store them in MongoDB, which is a NoSQL database.
Finally, we implemented the web application using React and Flask.
This system combines questionnaire-based and text-based prediction of personality. The
automated system gives users a better analysis of their personality because it compares their
scores with those of their Facebook friends. It can also support self-analysis by showing the
user the areas in which he or she lags behind others. Combining these methods lets the user
get results on a single platform instead of searching elsewhere.
Statement of Particulars:
1. Objectives
The objective of this project is to predict personality based on the five features
(Big Five, O.C.E.A.N.), which stand for Openness, Conscientiousness, Extroversion,
Agreeableness, and Neuroticism, in order to help diagnose psychological problems and
to assess candidates. Specifically, the following goals were defined to achieve the
objective of the study:
1. To diagnose psychological problems.
2. To develop an application to screen candidates for college admission and employment.
3. To focus on strengths, weaknesses, temperament, and leadership style.
4. To predict personality at the workplace as well as for personal inventories.
2. Structure of the proposed system
Figure 1: Architecture of the Proposed System
Figure 1 shows the architecture of the proposed system. It consists of three sections:
Text Predictor, Questionnaire, and Facebook. The models are trained on the
myPersonality dataset. For each personality attribute, the models generate a
predicted personality score using a regression model and the probability of the
binary class using a classification model. In the first section, a person's personality
can be predicted from any input text with the help of the Text Predictor.
This section is based on the previously trained models. The user interface of the web
application gives the option to predict based on text, and the trained models
give a score for each of the five personality traits.
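The per-trait scheme described above, with one Random Forest regressor for the score and one Random Forest classifier for the high/low probability on TF-IDF vectors (as in the Functioning section), could be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions, not the authors' code: the trait codes, the six toy statuses and their scores, the 3.5 threshold used to derive binary labels, and the names `train_models` and `predict_text` are all hypothetical, and the toy data merely stands in for the myPersonality dataset.

```python
# Minimal sketch: one Random Forest regressor + one classifier per OCEAN trait,
# trained on TF-IDF vectors. The toy data below is hypothetical, NOT myPersonality.
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.feature_extraction.text import TfidfVectorizer

TRAITS = ["OPN", "CON", "EXT", "AGR", "NEU"]  # hypothetical trait codes

STATUSES = [
    "I love painting, music and trying new things",
    "Finished my assignment early and organized my notes",
    "Party tonight with all my friends!",
    "Happy to help my neighbour with her dog",
    "So anxious about tomorrow, cannot sleep",
    "Quiet evening at home reading alone",
]
# Hypothetical trait scores (1-5 scale), one row per status, columns in TRAITS order.
TRAIN_SCORES = [
    [4.6, 3.1, 3.4, 3.5, 2.2],
    [3.2, 4.7, 3.0, 3.6, 2.5],
    [3.5, 2.6, 4.8, 3.9, 2.1],
    [3.4, 3.5, 3.3, 4.6, 2.0],
    [2.9, 3.0, 2.4, 3.1, 4.5],
    [3.8, 3.6, 1.9, 3.4, 3.0],
]


def train_models(texts, score_rows, threshold=3.5, seed=0):
    """Fit a TF-IDF vectorizer and, per trait, a regressor and a classifier."""
    vectorizer = TfidfVectorizer()
    X = vectorizer.fit_transform(texts)
    models = {}
    for j, trait in enumerate(TRAITS):
        y_score = [row[j] for row in score_rows]
        # Assumed rule: a trait is "high" (class 1) when its score >= threshold.
        y_class = [int(s >= threshold) for s in y_score]
        reg = RandomForestRegressor(n_estimators=50, random_state=seed).fit(X, y_score)
        clf = RandomForestClassifier(n_estimators=50, random_state=seed).fit(X, y_class)
        models[trait] = (reg, clf)
    return vectorizer, models


def predict_text(text, vectorizer, models):
    """Return the predicted score and P(high) for each trait of one input text."""
    X = vectorizer.transform([text])
    result = {}
    for trait, (reg, clf) in models.items():
        score = float(reg.predict(X)[0])
        # Column 1 of predict_proba is class 1 ("high") because classes_ is [0, 1].
        prob_high = float(clf.predict_proba(X)[0, 1])
        result[trait] = {"score": round(score, 2), "prob_high": round(prob_high, 2)}
    return result


if __name__ == "__main__":
    vec, models = train_models(STATUSES, TRAIN_SCORES)
    print(predict_text("Spent the weekend painting and playing music", vec, models))
```

Because each trait has its own pair of models, a text can score high on one trait and low on another independently. In a web application, the vectorizer and models would typically be fitted once and reused for every request rather than retrained per input.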
In the Questionnaire section, the user takes a 50-question Big Five personality test
based on the O.C.E.A.N. model. For each question, the user chooses an option such as
agree, neutral or disagree. A score is generated from these answers, the corresponding
percentile score for each personality trait is displayed, and the results are plotted
on a radar graph. It is like a psychometric assessment. The scores generated in this
section are compared with the personalities obtained in the Facebook
section.
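The scoring and percentile steps of the questionnaire could look like the following standard-library Python sketch. It assumes answers are recorded on a 1 to 5 scale (disagree = 1, neutral = 3, agree = 5), that a hypothetical `QUESTION_TRAIT` mapping assigns 10 of the 50 question IDs to each trait, and that reverse-keyed questions (worded against the trait) are flipped before averaging. The percentile is taken here as the share of reference scores at or below the user's score. None of these names or conventions come from the original system.

```python
# Minimal sketch of questionnaire scoring: average per trait, then percentile.
# QUESTION_TRAIT and the 1-5 answer scale are hypothetical assumptions.
from statistics import mean

TRAITS = ("O", "C", "E", "A", "N")

# Hypothetical mapping: questions 1..50 cycle through the traits, 10 per trait.
QUESTION_TRAIT = {q: TRAITS[(q - 1) % 5] for q in range(1, 51)}


def trait_scores(answers, question_trait=QUESTION_TRAIT,
                 reverse_keyed=frozenset(), scale_max=5):
    """Average the 1..scale_max answers per trait, flipping reverse-keyed items."""
    per_trait = {t: [] for t in TRAITS}
    for qid, value in answers.items():
        if qid in reverse_keyed:
            value = scale_max + 1 - value  # e.g. 5 -> 1, 4 -> 2 on a 1-5 scale
        per_trait[question_trait[qid]].append(value)
    return {t: mean(vals) for t, vals in per_trait.items() if vals}


def percentile(score, reference_scores):
    """Percentage of reference scores that are at or below the user's score."""
    at_or_below = sum(1 for r in reference_scores if r <= score)
    return 100.0 * at_or_below / len(reference_scores)


if __name__ == "__main__":
    answers = {q: 4 for q in range(1, 51)}  # the user answered 4 everywhere
    scores = trait_scores(answers, reverse_keyed={1})
    print(scores)
    print(percentile(scores["E"], [2.5, 3.0, 3.5, 4.0, 4.5]))
```

Percentile definitions vary (strictly below, at or below, or splitting ties); the "at or below" choice above is only one option, and the reference scores would in practice come from whatever dataset of trait values the system uses.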
The Facebook section is used for comparing personality within Facebook or any other social
media site's network. When the user logs in with his or her Facebook credentials, the
system predicts personality from the various posts he or she has created, liked, and disliked.
The system automatically extracts the statuses of the user's friends and stores them in a
database. Here the user compares his or her own personality within his or her friend
network; hence the privacy of the user's data is maintained and the system does not violate
the user's privacy. The previously trained model is used to make predictions on the data
stored in the database. This data is also used for comparison between the user's personality
score evaluated in the Questionnaire section and the model-based prediction.
3. Functioning
1. Text Predictor: First, the text entered by the user on the frontend is sent to Flask
for prediction and converted into vectors using TF-IDF vectorization. Second, the text
vectors are provided as inputs to the five models trained on the myPersonality
dataset with the Random Forest algorithm. Third, scores are obtained from the
regression models and probabilities from the classification models, and both are
displayed on the screen for the respective personality traits.
2. Questionnaire: First, the questions are prepared according to the Big Five
personality traits, with 10 questions for each trait, so the user has to answer
50 questions for the evaluation of personality. Second, the answers are stored
in a dictionary, and the score for each trait is averaged over the answers. Third,
the percentile for each trait is evaluated based on the user's score and a dataset
of values for that trait. Fourth, the score and the percentile are displayed on the
user interface, along with a radar graph of the scores for visual representation.
3. Facebook: First, the user provides the credentials of his or her Facebook account,
which are stored in a YAML file. Second, Selenium is used to automatically extract
the Facebook statuses of five friends from the friend list in the Friends section of
the Facebook account. Third, the extracted statuses are stored in MongoDB. Fourth,
the statuses are used to evaluate each friend's personality trait scores,
percentiles, and probabilities with the Random Forest models. Fifth, the predicted
personality is compared with the user's personality from the Questionnaire section.
Sixth, the comparison can be viewed by clicking the Compare button on the user
interface, which shows a radar graph and the differences between the scores and
percentiles. This helps the user understand his or her strengths and weaknesses
more clearly.
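The comparison step above (differences between the user's questionnaire results and a friend's model-based results) reduces to per-trait subtraction. The sketch below assumes each profile is a hypothetical dict mapping a trait code to a (score, percentile) pair; `compare_profiles` and `higher_and_lower` are illustrative names, not the system's API. It reports traits where the user is higher or lower rather than labeling them "strengths" and "weaknesses", since for Neuroticism a higher score is not usually read as a strength.

```python
# Minimal sketch of the user-vs-friend comparison behind the Compare button.
# Profiles are hypothetical dicts: trait code -> (score, percentile).


def compare_profiles(user, friend):
    """Per-trait (score difference, percentile difference), user minus friend."""
    diffs = {}
    for trait in sorted(user.keys() & friend.keys()):
        u_score, u_pct = user[trait]
        f_score, f_pct = friend[trait]
        diffs[trait] = (round(u_score - f_score, 2), round(u_pct - f_pct, 1))
    return diffs


def higher_and_lower(diffs):
    """Split traits by whether the user's score is above or below the friend's."""
    higher = [t for t, (d, _) in diffs.items() if d > 0]
    lower = [t for t, (d, _) in diffs.items() if d < 0]
    return higher, lower


if __name__ == "__main__":
    me = {"O": (3.8, 70.0), "C": (3.0, 40.0), "N": (2.0, 20.0)}
    friend = {"O": (3.1, 45.0), "C": (3.0, 42.0), "N": (2.6, 40.0)}
    d = compare_profiles(me, friend)
    print(d)  # {'C': (0.0, -2.0), 'N': (-0.6, -20.0), 'O': (0.7, 25.0)}
    print(higher_and_lower(d))  # (['O'], ['N'])
```

Only traits present in both profiles are compared, so a friend with too few statuses to score a trait simply drops out of that row of the comparison.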
4. The novelty of the approach
The salient features of the proposed system are:
• The system uses a hybrid model that predicts personality based on input text, a
questionnaire, and the Facebook friend network. In the existing state of the art, only
one of these techniques is used to predict a person's personality.
• The study's originality lies in using OCEAN feature extraction from text,
questionnaires, and social media profiles for personality prediction in
order to reveal people's personalities.
• The Questionnaire section helps determine a person's personality
from his or her answers to the questions.
• The Text Predictor predicts personality from the text the user
enters, and scores and percentiles are displayed for the user's
understanding.
• The extraction of Facebook friends' statuses is entirely automated, which helps
predict a person's personality without violating any user
privacy.
• The comparison tab compares the personalities of two different
people, so the analysis can help the user know his or her strengths and weaknesses
relative to others.
• The hybrid model gives the user one platform for overall
prediction, with no need to search elsewhere.
5. Statement of Further Particulars (SoFP):
1. Research methodologies and technologies
• The Text Predictor is built with the Random Forest algorithm using TF-IDF
vectorization.
• React is used to design the user interface, as it is an efficient and versatile
JavaScript library. The web app built with React.js uses Flask and MongoDB as
its backend.
• Python is used as the programming language. Python's simple, easy-to-learn
syntax emphasizes readability and thus reduces the cost of program
maintenance.
• MongoDB is used to store the Facebook statuses obtained by web scraping, which
are then used for prediction. The predictions made by the models are stored back
into MongoDB and used for comparison.
• A Selenium-automated browser scrapes statuses from Facebook using the user's
login credentials and adds them to a MongoDB database.
• Node.js is used to run the React web app; React requires Node.js to be
installed to run the project on port 3000.
• Flask connects the React frontend to the MongoDB backend.
It takes values from MongoDB and serves them on port 5000, and it also
takes data from the frontend and passes it to the backend, thus supporting the
interactive frontend.
• Random Forest models provide, for each personality characteristic, a predicted
personality score using a regression model and the probability of a binary class
using a classification model.
2. Benefits to society and advantages
a. Diagnose mental health problems
If someone is facing a mental health issue, the system can help the person providing
treatment to identify the exact issue. This can speed up the treatment process.
b. Screening candidates based on personality predicted with the help of the
questionnaire:
During placement activities, this project can be used to predict a candidate's
personality and check whether the candidate is suitable for a particular job role.
Personality is predicted based on five personality traits, which are considered
sufficient to predict one's personality: Openness, Conscientiousness,
Extroversion, Agreeableness, and Neuroticism. The personality test consists of 50
personality-based questions with a score associated with each option; based on the
options the user selects, scores or probabilities are generated. Hence this can be
very helpful in hiring for any job role, especially for the HR department. From the
candidate's point of view, if he or she wants to make his or her profile suitable for
a particular job role, he or she can take this test, and the resulting changes in
personality can lead to good results.
c. To help the user compare his or her personality with others:
In day-to-day life, users can compare their personality with others, such as friends,
colleagues, or people they look up to. This gives the user a chance to improve his or
her personality based on his or her strengths and weaknesses. The My Network section
of this application is very useful here, as the user can compare his or her personality
(taken from the Questionnaire section) with his or her Facebook network. An added
benefit is that the results are displayed in the form of a radar graph.
3. Strengths of the proposed system
• Averaging the results of three different methods can give more precise results.
• The questionnaire contains 50 questions for a better analysis of
personality.
• The comparison tab enables self-analysis by the user.
• The system can be used by educational organizations and the employment sector,
and to understand patients before the diagnosis of psychological problems.
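The first strength listed, averaging the outcomes of the three methods, could be sketched as a per-trait mean over whichever methods produced a score. This assumes all three methods report trait scores on the same scale (for example 1 to 5); if they do not, each would first have to be rescaled to a common range. The function name `combine_methods` and the example values are hypothetical.

```python
# Minimal sketch: combine per-trait scores from several methods by simple averaging.
# Assumes every method reports scores on the same scale; names are hypothetical.
from statistics import fmean


def combine_methods(*method_results):
    """Average each trait over the methods that produced a score for it."""
    traits = sorted(set().union(*method_results))
    return {t: fmean(r[t] for r in method_results if t in r) for t in traits}


if __name__ == "__main__":
    text_predictor = {"O": 3.0, "C": 4.0}
    questionnaire = {"O": 4.0}
    facebook_model = {"O": 5.0, "C": 2.0}
    print(combine_methods(text_predictor, questionnaire, facebook_model))
    # {'C': 3.0, 'O': 4.0}
```

An unweighted mean treats the three methods as equally reliable; a weighted mean would be the natural alternative if one method were known to be more accurate.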