Uploaded by Ravi Rajbhure

Personality Development Using Machine Learning
(2022-2023)
CHAPTER 1
INTRODUCTION
1.1 Introduction
Personality is defined as the characteristic set of perceptions, feelings and behavioural patterns that
develop from biological and environmental factors. There is no generally agreed definition
of personality; most definitions focus on motivation and psychological interactions with one's
environment. Personality can also be defined as the set of traits that predict a person’s behaviour.
Personality identification is a long-established task, but data mining techniques have made its
predictions considerably more accurate than older approaches.[1] Data mining is the
technique of finding patterns in huge data sets using methods at the intersection of statistics,
database systems and machine learning. Its overall goal is to extract information from data sets
and transform it into a usable form. Automated personality prediction consists of comparing a user’s
responses against standard personality tests. Personality prediction depends mainly on the person’s nature.
The user is asked a set of questions, and the personality is predicted from the answers
chosen. The classification algorithm used is k-nearest neighbours (kNN). Large
volumes of data have to be processed, and a classification algorithm makes this
possible.[2] The major goal of this paper is to outline an approach to personality
prediction based on the answers given to these questions, and to
predict both the personality of the user and suitable career options.
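The nearest-neighbour classification of questionnaire answers mentioned above can be illustrated with a minimal sketch. The answer vectors, the labels and the function name `knn_predict` are hypothetical and serve only to show the idea; this is not the report's actual implementation.

```python
from collections import Counter
import math


def knn_predict(train_x, train_y, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest
    training points, using Euclidean distance."""
    # Pair the distance from each training point to the query with its label.
    distances = [
        (math.dist(x, query), label)
        for x, label in zip(train_x, train_y)
    ]
    # Keep the k closest points and count how often each label occurs.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical answers to five questions on a 1-5 scale, with made-up labels.
train_x = [
    [5, 4, 5, 4, 5],
    [4, 5, 4, 5, 4],
    [1, 2, 1, 2, 1],
    [2, 1, 2, 1, 2],
]
train_y = ["extravert", "extravert", "introvert", "introvert"]

print(knn_predict(train_x, train_y, [5, 5, 4, 4, 5], k=3))
```

An odd value of k avoids tied votes when there are only two classes; with more classes a tie-breaking rule would have to be chosen.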
1.2 Personality Traits
In SenseCare, we conducted a study on personality traits and basic emotions, assessed by
both subjective emotional self-ratings and the automated classification of emotions from initial
emotion analysis [1]. The study’s procedure was as follows. First, participants filled out the Big
Five Aspect Scale. Second, participants watched 12 video clips taken from movies and TV
series, which were designed to elicit an emotional response. There were 2 videos per Basic
Emotion (i.e. Disgust, Anxiety/Fear, Anger, Surprise, Sadness, and Joy). The video clips lasted
between 1 and 5 minutes. Third, after each video clip, participants were asked to rate on a Likert scale from 1 to 5 how much of each basic emotion they felt whilst watching the video (1 being “not
Dept. of CSE, MGI-COET, Shegaon
at all” and 5 being “a great deal”). Participants were then debriefed about the nature of the study.
Overall, the study lasted an hour, and 30 participants took part.
A. Relationship Between Personality Traits and Subjective Emotional Feedback
A Pearson’s correlation was conducted between each personality trait variable and each distinct
emotion self-report rating. A Pearson correlation measures the linear relationship between two variables
and reports its strength as the effect size, r, which can range from -1 to +1. A score of -1 represents
a perfect negative correlation between two variables: if variables
A and B have a correlation of -1, every increase in A is matched by a proportional
decrease in B, and vice versa. The opposite is true for a correlation of +1. A correlation of 0 means
no linear relationship exists between the two variables. In the social sciences, an effect size of 0.10 is
considered small, 0.30 medium, and 0.50 large. The vast majority of these
relationships were negative (88/105), meaning that higher scores on each trait
tended to go with lower reported amounts of each emotion. The strongest relationship between two
variables was between the personality trait Agreeableness and the emotion Joy. This seemed to
stem particularly from the sub-trait Compassion, which also had a moderate negative correlation
with Joy. In relation to the hypotheses: Extraversion and its sub-traits Assertiveness and
Enthusiasm showed no significant or strong relationship with Joy; Neuroticism was negatively
related to the experience of each negative emotion (Anger, Fear, Anxiety, Sadness, Disgust);
Agreeableness, including both Compassion and Politeness, was negatively associated with Anger;
Conscientiousness, including both Industriousness and Orderliness, was negatively associated with Disgust;
and Openness to Experience had a weak-to-moderate negative correlation with Anger, Fear, Sadness,
Surprise, Anxiety, and Disgust, suggesting it is associated with a wide range of emotional
experiences. However, it had no relationship with Joy.
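The Pearson coefficient and the effect-size thresholds described above can be sketched as follows. The trait scores and ratings are made-up numbers, the function names are hypothetical, and the "negligible" label for values below 0.10 is an assumption added for completeness; none of this reproduces the study's data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences.

    Assumes neither sequence is constant (otherwise the denominator is zero).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def effect_size_label(r):
    """Label |r| with the 0.10 / 0.30 / 0.50 thresholds quoted in the text."""
    size = abs(r)
    if size >= 0.50:
        return "large"
    if size >= 0.30:
        return "medium"
    if size >= 0.10:
        return "small"
    return "negligible"  # assumed label, not taken from the text


# Made-up trait scores and 1-5 emotion self-ratings for five participants.
trait = [2.0, 3.0, 4.0, 5.0, 6.0]
rating = [5, 4, 4, 2, 1]
r = pearson_r(trait, rating)
print(round(r, 2), effect_size_label(r))
```

With the thresholds above, the reported r = -0.34 between Agreeableness and ML Sadness would be labelled a medium effect.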
B. Relationship between Personality Traits and Automated Emotional Expressions
The strongest correlation was between Agreeableness and ML Sadness (Sadness as classified by the
machine learning software), with a moderate negative
relationship, r = -0.34. This means that the more agreeable a person scored, the less likely they
were to be classified as experiencing sadness during the experiment. Some of the most interesting
findings were as follows: ML Joy was positively correlated with Extraversion, including both Assertiveness and
Enthusiasm; ML Anger was positively correlated with Agreeableness, including both Compassion and
Politeness, in contrast to self-reported (SR) Anger, which was negatively correlated; ML Disgust was negatively
associated with Conscientiousness, including both Industriousness and Orderliness, which was similar
to SR Disgust; ML Fear/Anxiety was weakly positively correlated with Neuroticism, including both
Volatility and Withdrawal; and every ML emotion except Fear/Anxiety had a correlation stronger than 0.10 (positive or
negative) with Openness to Experience, suggesting a breadth of emotional experience.
However, the strength of each relationship was not matched in the two sub-traits of Openness to
Experience, Intellect and Openness.
C. Relationship between Subjective Emotions and Facial Expressions
Self-reported (SR) Joy is positively correlated with ML Anger. ML Joy is positively
correlated with Fear, Anger, Anxiety, and Disgust. SR Disgust is negatively associated with ML
Disgust, although the correlation is weak. SR Fear is weakly positively correlated with ML
Fear/Anxiety. SR Anger is negatively and weakly correlated with ML Anger. There is a weak
positive relationship between SR and ML Joy. There is a weak-to-moderate positive relationship
between SR Anxiety and ML Anxiety/Fear. There is a weak positive relationship between SR and
ML Sadness. There is a moderate-to-strong positive relationship between SR and ML Surprise.
Overall, the two emotion measures give both consistent and inconsistent results.
D. Taxonomy Management System
During the study and from its outcome, we derived various requirements that should be supported
by a Taxonomy Management System. First, we investigated the data produced during the
study. Second, we derived requirements for the Taxonomy Management System and
modelled its constituents. Finally, we implemented and experimentally tested the
SenseCare Taxonomy Management System.
1.3 Motivation
The term “personality” is derived from the Latin word “persona”, which refers to the mask worn by
actors in the theatre. The attributes that characterize a person, such as emotions, behavior, mind
and temperament, define a personality. Because of this variety of attributes, gauging personality is
challenging, as there is no predefined structure for classifying and comparing people. The set of human
emotions is also vast, so a similar problem arises when we try to identify the sentiments
embedded in a message (known as “sentiment analysis”): it is difficult to choose the
significant emotions for classification. To automate sentiment analysis, a
number of researchers have therefore adopted a simpler representation of sentiments by means of their “polarity”
(negative or positive). Similarly, for personality prediction, several researchers have identified the
important characteristics essential for building a personality model. Personality is the coherent
patterning of affect, cognition, and desires (goals) as they lead to behavior. To study personality is
to study how people feel, how they think, what they want, and finally, what they do. Personality is
an important aspect of human life and is important for understanding yourself and other people.
The preeminent personality model in personality psychology is the Big 5 model, also called the O.C.E.A.N
model. The Big 5 model was derived through factor analysis of questions based on common
descriptive adjectives. This analysis produced five distinct personality traits: openness to
experience, conscientiousness, extraversion, agreeableness and neuroticism. This paper aims to
classify personalities from a given text according to this set of attributes, commonly known as the Big Five model,
using machine learning concepts and learning algorithms.
1.4 Objectives

• To classify and analyze personalities based on the Big Five model with
a given data set, using classification algorithms and advanced data mining concepts.
• To use and demonstrate data mining concepts and automate personality classification using
Python data science libraries.
• To extract the personality of an individual from social networking websites.
1.5 Data Mining
The process of digging through data to discover hidden connections and predict future trends has a
long history. Sometimes referred to as "knowledge discovery in databases," the term "data
mining" wasn’t coined until the 1990s. But its foundation comprises three intertwined scientific
disciplines: statistics (the numeric study of data relationships), artificial intelligence (human-like
intelligence displayed by software and/or machines) and machine learning (algorithms that can
learn from data to make predictions). What was old is new again, as data mining technology keeps
evolving to keep pace with the limitless potential of big data and affordable computing power.
Over the last decade, advances in processing power and speed have enabled us to move beyond
manual, tedious and time-consuming practices to quick, easy and automated data analysis. The
more complex the data sets collected, the more potential there is to uncover relevant insights.
Retailers, banks, manufacturers, telecommunications providers and insurers, among others, are
using data mining to discover relationships among everything from price optimization, promotions
and demographics to how the economy, risk, competition and social media are affecting their
business models, revenues, operations and customer relationships. The term "data mining" is
a misnomer because the goal is the extraction of patterns and knowledge from large amounts of
data, not the extraction (mining) of data itself. It is also a buzzword and is frequently applied to
any form of large-scale data or information processing (collection, extraction, warehousing,
analysis, and statistics) as well as any application of computer decision support systems,
including artificial intelligence (e.g., machine learning) and business intelligence.
The book Data mining: Practical machine learning tools and techniques with Java[8] (which
covers mostly machine learning material) was originally to be named just Practical machine
learning, and the term data mining was only added for marketing reasons.[9] Often the more
general terms (large scale) data analysis and analytics (or, when referring to actual
methods, artificial intelligence and machine learning) are more appropriate. The actual data
mining task is the semi-automatic or automatic analysis of large quantities of data to extract
previously unknown, interesting patterns such as groups of data records (cluster analysis), unusual
records (anomaly detection), and dependencies (association rule mining, sequential pattern
mining). This usually involves using database techniques such as spatial indices. These patterns
can then be seen as a kind of summary of the input data, and may be used in further analysis or,
for example, in machine learning and predictive analytics. For example, the data mining step
might identify multiple groups in the data, which can then be used to obtain more accurate
prediction results by a decision support system. Data collection, data preparation, and result
interpretation and reporting are not part of the data mining step, but they do belong to the overall KDD
process as additional steps. The difference between data analysis and data mining is that data
analysis is used to test models and hypotheses on the dataset, e.g., analyzing the effectiveness of a
marketing campaign, regardless of the amount of data; in contrast, data mining uses machine
learning and statistical models to uncover clandestine or hidden patterns in a large volume of data.
The related terms data dredging, data fishing, and data snooping refer to the use of data mining
methods to sample parts of a larger population data set that are (or may be) too small for reliable
statistical inferences to be made about the validity of any patterns discovered. These methods can,
however, be used in creating new hypotheses to test against the larger data populations.
1.5.1 Algorithm in Data Mining
An algorithm in data mining (or machine learning) is a set of heuristics and calculations that
creates a model from data. To create a model, the algorithm first analyzes the data you provide,
looking for specific types of patterns or trends. The algorithm uses the results of this analysis over
many iterations to find the optimal parameters for creating the mining model. These parameters
are then applied across the entire data set to extract actionable patterns and detailed statistics.
The mining model that an algorithm creates from your data can take various forms, including:
• A set of clusters that describe how the cases in a dataset are related.
• A decision tree that predicts an outcome, and describes how different criteria affect that
outcome.
• A mathematical model that forecasts sales.
• A set of rules that describe how products are grouped together in a transaction, and the
probabilities that products are purchased together.
The algorithms provided in SQL Server Data Mining are the most popular, well-researched
methods of deriving patterns from data. To take one example, K-means clustering is one of the
oldest clustering algorithms and is available widely in many different tools and with many
different implementations and options. However, the particular implementation of K-means
clustering used in SQL Server Data Mining was developed by Microsoft Research and then
optimized for performance with SQL Server Analysis Services. All of the Microsoft data mining
algorithms can be extensively customized and are fully programmable, using the provided APIs.
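The idea of an algorithm that iterates over the data to find the parameters of a model can be illustrated with a plain k-means sketch. This is a generic textbook version written for illustration, not the Microsoft implementation mentioned above; the sample points and the choice of the first k points as starting centroids are assumptions made to keep the example deterministic.

```python
import math


def k_means(points, k, iterations=10):
    """Plain k-means; returns (centroids, cluster index for each point)."""
    # Deterministic start for this sketch: the first k points are the centroids.
    centroids = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, assignment


# Made-up two-dimensional data with two visible groups.
points = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (8.0, 8.5), (1.0, 0.5), (9.5, 9.0)]
centroids, assignment = k_means(points, k=2)
print(assignment)
print(centroids)
```

Here the model parameters are the centroids: each iteration refines them, and the final centroids summarise how the cases in the data set are grouped.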
1.5.2 Choosing an Algorithm by Type
SQL Server Data Mining includes the following algorithm types:
• Classification algorithms predict one or more discrete variables, based on the other attributes
in the dataset.
• Regression algorithms predict one or more continuous numeric variables, such as profit or
loss, based on other attributes in the dataset.
• Segmentation algorithms divide data into groups, or clusters, of items that have similar
properties.
• Association algorithms find correlations between different attributes in a dataset. The most
common application of this kind of algorithm is for creating association rules, which can be
used in a market basket analysis.
• Sequence analysis algorithms summarize frequent sequences or episodes in data, such as a
series of clicks in a web site, or a series of log events preceding machine maintenance.
However, there is no reason that you should be limited to one algorithm in your solutions.
Experienced analysts will sometimes use one algorithm to determine the most effective inputs
(that is, variables), and then apply a different algorithm to predict a specific outcome based on that
data. SQL Server Data Mining lets you build multiple models on a single mining structure, so
within a single data mining solution you could use a clustering algorithm, a decision trees model,
and a Naïve Bayes model to get different views on your data. You might also use multiple
algorithms within a single solution to perform separate tasks: for example, you could use
regression to obtain financial forecasts, and use a neural network algorithm to perform an analysis
of factors that influence forecasts.
CHAPTER 2
LITERATURE SURVEY
2. Literature Survey
“Using an Affective Computing Taxonomy Management System to Support Data Management in
Personality Traits” [1] states that Affective Computing is a relatively new, multidisciplinary
research field that seeks sophisticated automation in emotion detection for later analysis.
However, automated emotion detection and analysis also require comprehensive data
management support, e.g. to keep control of the data produced and to enable its efficient reuse
through classification with established terminology. The paper contributes to data management
aspects in Affective Computing and to automation support in emotion classification on the basis of
a personality traits analysis. It describes the implementation of a taxonomy management
system, derived from the requirements of a case study that investigates the relationship between
personality and emotions in Affective Computing. The study makes use of machine learning
software developed by SenseCare, an EU-funded R&D project that applies Affective Computing
to enhance and advance future healthcare processes and systems.
“Agile Person Identification through Personality Test and kNN Classification Technique” states
that the Agile methodology is a well-known software development methodology that stresses
adaptation and collaboration between people. Software project managers should therefore embrace
the idea of putting the right people in the right jobs. The research proposes applying the Big Five
Personality Traits to predict how suitable people are for the Agile methodology. The prediction method
uses the k-Nearest Neighbour (kNN) classification technique. Results of a pilot study are
presented and show that the selected classification technique can be used for the prediction.
“Happiness Recognition from Mobile Phone Data” presents first evidence that the daily happiness
of individuals can be automatically recognized using an extensive set of indicators obtained from
mobile phone usage data (call log, SMS and Bluetooth proximity data) and “background noise”
indicators coming from the weather factor and personality traits. The final machine learning model,
based on the Random Forest classifier, obtains an accuracy score of 80.81% for a 3-class daily
happiness recognition problem. Moreover, the authors identify and discuss the indicators which have strong
predictive power in the source and the feature spaces, discuss different approaches, machine
learning models and provide an insight for future research.
“Machine Prediction of Personality from Facebook Profiles” states that an increasing
number of Americans use social networking sites such as Facebook, but few fully appreciate the
amount of information they share with the world as a result. Although studies exist on the sharing
of specific types of information (photos, posts, etc.), one area that has been less explored is how
Facebook profiles can share personality information in a broad, machine-readable fashion. In this
study, the authors apply data mining and machine learning techniques to predict users’ personality traits
(specifically, the traits of the Big Five personality model) using only demographic and text-based
attributes extracted from their profiles. They then use these predictions to rank individuals in terms of
the five traits, predicting which users will appear in the top or bottom 5% or 10% of these traits.
Results show that, with certain models, the top 10% most Open individuals can be found with
nearly 75% accuracy, and across all traits and directions, the top 10% can be predicted with at least
34.5% accuracy (exceeding 21.8%, which is the best accuracy when using just the best-performing
profile attribute). These results have privacy implications in terms of allowing
advertisers and other groups to focus on a specific subset of individuals based on their personality
traits. [2]
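The ranking step described above (predicting which users fall in the top fraction of a trait) can be sketched as follows. The user ids, scores and function names are hypothetical, and the accuracy definition used here, the share of predicted top users who are truly in the top group, is only one plausible reading of the measure reported in the study.

```python
def top_fraction(scores, fraction):
    """Return the ids of the users whose score lies in the top `fraction`."""
    count = max(1, round(len(scores) * fraction))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:count])


def top_fraction_accuracy(predicted, actual, fraction=0.10):
    """Share of users predicted to be in the top group who truly are."""
    predicted_top = top_fraction(predicted, fraction)
    actual_top = top_fraction(actual, fraction)
    return len(predicted_top & actual_top) / len(predicted_top)


# Made-up predicted and questionnaire-based Openness scores for ten users.
predicted = {"u1": 0.91, "u2": 0.85, "u3": 0.40, "u4": 0.35, "u5": 0.30,
             "u6": 0.28, "u7": 0.22, "u8": 0.15, "u9": 0.10, "u10": 0.05}
actual = {"u1": 4.8, "u2": 3.1, "u3": 4.6, "u4": 3.0, "u5": 2.9,
          "u6": 2.5, "u7": 3.3, "u8": 2.0, "u9": 1.8, "u10": 2.2}

print(top_fraction_accuracy(predicted, actual, fraction=0.2))
```

The bottom 5% or 10% could be handled the same way by sorting in ascending order.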
Personality is a unique trait that distinguishes an individual. It includes an ensemble of
peculiarities on how people think, feel, and behave that affects the interactions and relationships
of people. Personality is useful in diverse areas such as marketing, training, education, and human
resource management. There are various approaches for personality recognition and different
psychological models. Preceding work indicates that linguistic analysis is a promising way to
recognize personality. In this work, a proposal for personality recognition relying on the
dominance, influence, steadiness, and compliance (DISC) model and statistical methods for
language analysis is presented. To build the model, a survey was conducted with 120 participants.
The survey consisted in the completion of a personality test and handwritten paragraphs. The
study resulted in a dataset that was used to train several machine learning algorithms. It was found
that the AdaBoost classifier achieved the best results followed by Random Forest. In both cases a
feature selection preprocess with Pearson’s Correlation was conducted. AdaBoost classifier
obtained the average scores: accuracy = 0.782, precision = 0.795, recall = 0.782, F-measure =
0.786, receiver operating characteristic (ROC) area = 0.939. Personality has been recognized as a
driver of decisions and behavior; it consists of singular characteristics on how individuals think,
feel, and behave. Understanding personality provides a way to comprehend how the different traits
of an individual merge as a unit, since personality is a mixture of traits and behavior that people
have to cope with situations. Personality influences selections and decisions (e.g., movies, music,
and books). Personality guides the interactions among people, relationships, and the conditions
around them. Personality has been shown to be related to any form of interaction. In addition, it
has been shown to be useful in predicting job satisfaction, success in professional relationships,
and even preference for different user interfaces. Previous research on user interfaces and
personality has found more receptiveness and confidence in users when the interfaces take
personality into account. When personality is predicted from the social media profile of users,
applications can use it to personalize presentations and messages [3].
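The evaluation scores quoted for the classifiers in this survey (accuracy, precision, recall and F-measure) can be computed from true and predicted labels as sketched below. The label lists and the function name are hypothetical, and the sketch scores a single class at a time; the averaged multi-class figures reported in [3] may have been computed differently.

```python
def classification_scores(y_true, y_pred, positive):
    """Accuracy plus precision, recall and F-measure for one `positive` class."""
    pairs = list(zip(y_true, y_pred))
    # Count true positives, false positives and false negatives.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (
        2 * precision * recall / (precision + recall)
        if precision + recall else 0.0
    )
    return {
        "accuracy": correct / len(pairs),
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Made-up DISC labels (D, I, S, C) for six test samples.
y_true = ["D", "D", "I", "S", "D", "C"]
y_pred = ["D", "I", "I", "S", "D", "D"]
scores = classification_scores(y_true, y_pred, positive="D")
print(scores)
```

Accuracy is computed over all classes, whereas precision, recall and F-measure here refer only to the class passed as `positive`.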
Researchers have recognized that every person has a personality that usually remains
consistent over time. Consequently, personality assessment can be used as an important measure.
Various psychological models of personality have been proposed, such as the Five-factor model,
the psychoticism, extraversion, and neuroticism (PEN) model, the Myers–Briggs type inventory,
and the dominance, influence, steadiness, and compliance (DISC) model. Typically, these models
propose direct methods such as questionnaires to recognize personality. Conversely, linguistic
analysis can be used to detect personality. Linguistic analysis can produce useful patterns for
establishing relationships between writing characteristics and personality. Researchers in natural
language processing have proposed several methods of linguistic analysis to recognize
personality, and machine learning has been one of the most investigated approaches. Machine
learning techniques are useful in the recognition of personality since they provide mechanisms to
automatize processes that are based on a set of examples. Several proposals for personality
recognition based on machine learning can be found in the literature. Machine learning algorithms
use computational methods to learn directly from data without relying on a predetermined
equation as a model. The algorithms adaptively improve their performance as the number of
instances available for learning increases. Several efforts in personality prediction from the
linguistic analysis approach have been carried out. However, they have focused mostly on the
English language and are based on the five-factor model. This model (also called big five model)
has been used as a standard for applications that need personality modeling. To contribute to the
advancement and understanding of the relationship between personality and language, we have
developed a predictive model for personality recognition based on the DISC personality model
and a machine learning approach. We performed a personality survey with 120 participants.
Recently, Cognitive-based Sentiment Analysis with emphasis on automatic detection of
user behaviour, such as personality traits, based on online social media text has gained a lot of
attention. However, most of the existing works are based on conventional techniques, which are
not sufficient to get promising results. In this research work, we propose a hybrid Deep Learning-based
model, namely a Convolutional Neural Network concatenated with Long Short-Term
Memory, to show the effectiveness of the proposed model for 8 important personality traits
(Introversion-Extroversion, Intuition-Sensing, Thinking-Feeling, Judging-Perceiving). We
ran our experimental evaluations on a benchmark dataset for the personality
trait classification task. The evaluations show better results, which demonstrates
that the proposed model classifies the user's personality traits more effectively than the
state-of-the-art techniques. Finally, we evaluate the effectiveness of our approach through
statistical analysis. With the knowledge obtained from this research, organizations can
make their decisions regarding the recruitment of personnel in an efficient way. Moreover, they
can use the information obtained from this research as best practices for the selection,
management, and optimization of their policies, services, and products.
Cognitive Science is a multidisciplinary area of research that aims at addressing different
cognitive processes and mental states, including learning, thinking, perception, remembering, and
emotions. Among the aforementioned types of cognition, personality plays a pivotal role in
identifying the social behaviour of humans. Computer-based personality detection and
classification have remained an active area of research for a long time. Personality detection can
be performed using multiple media, such as text, images, video, and audio. Cognitive-based
sentiment classification from social media text is an area that gives rise to several challenges
for researchers in cognitive computation. In this area, a lot of work has been done and much more
can be investigated. Textual cognitive-based sentiment analysis (SA) is not merely a theoretical
area; rather, it has several applied fields, such as health, education, finance, and
others. Being a merger of cognitive science and human neurology, it can address the gap between
the abstractions of cognitive science and the more emerging area of personality detection from a
person's textual feedback expressed on social media. Social media platforms, such as Twitter,
Facebook, and Instagram, have experienced an unexpected worldwide spread in recent years. For
example, by the 3rd quarter of 2019, Twitter had over 330 million active users per month. Progress
in natural language processing and text analytics gives researchers an opportunity to use big-data
sources for extracting and analysing the textual personality traits expressed by users while using
social media, as long as the data scientists working on social media content are able to address the
challenging issues specific to such content. In the last few years, cognitive-based SA applications
have become popular among online communities for knowing about the opinions and personality
traits of individuals pertaining to different issues, policies, and others. However, due to the diverse
nature of social media content, it is tedious to analyse text using existing techniques to detect
personality traits from such content. Therefore, extraction and analysis of social media content has
become essential through automatic classification of personality traits. A lot of work has been
carried out in the fields of text-based SA lexicon construction, cognition, aspect-based SA, and
visual SA. However, more work needs to be done in the context of cognitive-based social media,
with an emphasis on extracting and classifying personality traits from social media content. The
aforementioned issues often result in incorrect cognitive-based sentiment
classification of social media content. Therefore, it is necessary to develop a method that
automatically classifies personality traits in social media content.
Deep learning is comprised of a group of algorithms that mimic the functionality and
structure of the human brain. In simple terms, it contains a set of neurons to receive input and also
a set of neurons to transmit output signals. The deep learning-based models can assist with
different tasks like speech recognition, computer vision, natural language processing, and
handwriting generation. A model has been proposed for predicting the Big Five personality
traits that needs 8 times less data. It uses GloVe word embeddings as the embedding layer for
the words in user tweets. The model is trained and tested
on the provided Twitter data. Testing is then performed on three combinations:
(i) LIWC with GP, (ii) 3-Gram with GP, and (iii) GloVe with RR. The
proposed model outperformed the state-of-the-art work with a mean correlation of 0.33 across the
Big Five traits. That work used only English Twitter content and needs to be extended
to further languages. Furthermore, the proposed model's efficiency can be assessed on a
larger number of tweets. A personality identification approach is applied to text data by
exploiting a deep neural network. The present work used a hierarchical scheme called AttRCNN
that is capable of retaining semantic features at a deeper level. The results reveal that the proposed
features effectively perform better than the compared features. [4]
The increasing availability of high-dimensional, fine-grained data about human behavior,
gathered from mobile sensing studies and in the form of digital footprints, is poised to drastically
alter the way personality psychologists perform research and undertake personality assessment.
These new kinds and quantities of data raise important questions about how to analyze the data
and interpret the results appropriately. Machine learning models are well-suited to these kinds of
data, allowing researchers to model highly complex relationships and to evaluate the
generalizability and robustness of their results using resampling methods. The correct usage of
machine learning models requires specialized methodological training that considers issues
specific to this type of modeling. Here, we first provide a brief overview of past studies using
machine learning in personality psychology. Second, we illustrate the main challenges that
researchers face when building, interpreting, and validating machine learning models. Third, we
discuss the evaluation of personality scales, derived using machine learning methods. Fourth, we
highlight some key issues that arise from the use of latent variables in the modeling process. We
conclude with an outlook on the future role of machine learning models in personality research
and assessment. One debatable way for researchers to boost the reported performance of ML
models is by using classification instead of regression methods. In the analysis of our data
reported above, we fitted a regression random forest, thus predicting continuous values for the
outcome variable extraversion. However, a lot of work in personality computing has focused on
predicting classes (i.e., “low” vs. “high”) of personality traits, rather than continuous trait scores.
This decision to focus on classes can pose a problem when the rationale for creating discrete
classes is not fully transparent. In binary classification, the two classes are often generated around
some fixed central tendency estimate (e.g., median), obtained from the sample under investigation.
In some cases, an arbitrary dividing point is used (e.g., determine that the midpoint of a five-point
rating scale is assigned to the “low” vs. the “high” class), leaving open the possibility that the
decision was made to maximize reported performance. [3]
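The median split described above can be illustrated with a minimal sketch. The scores and the function name `median_split` are hypothetical and are not taken from the cited study; the sketch only shows how continuous trait scores are turned into "low" and "high" classes and why the dividing point matters.

```python
from statistics import median

def median_split(scores):
    """Label each continuous trait score as 'high' or 'low' relative to the sample median."""
    cut = median(scores)
    # Scores equal to the median are assigned to "low" here; that choice is itself arbitrary.
    return ["high" if s > cut else "low" for s in scores]

# Hypothetical extraversion scores on a 1-5 scale (illustrative values only).
extraversion = [2.1, 3.4, 4.8, 3.0, 1.9, 4.2]
labels = median_split(extraversion)
print(list(zip(extraversion, labels)))
```

In this example the scores 3.0 and 3.4 fall into opposite classes although they differ by only 0.4, while 3.4 and 4.8 share a class. This is the information loss that a regression model on the continuous score avoids.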
Personality can be defined as the combination of behavior, emotion, motivation, and thoughts that
aim at describing various aspects of human behavior based on a few stable and measurable
characteristics. Considering the fact that our personality has a remarkable influence in our daily
life, automatic recognition of a person's personality attributes can provide many essential practical
applications in various aspects of cognitive science. Although various methods have recently been
proposed for the task of personality recognition, most of them have focused on human-designed
statistical features and have not made use of the rich semantic information in user-generated
texts. These texts not only reveal the writer's internal thoughts and emotions but are also the
most direct way for people to state their feelings and opinions in an understandable form. To make
use of this valuable semantic information and to overcome the complexity and handcrafted-feature
requirements of previous methods, a deep learning based method for personality recognition from
text is proposed in this paper.
Among various deep neural networks, Convolutional Neural Networks (CNN) have demonstrated
profound efficiency in natural language processing and especially personality detection. Owing to
the fact that the choice of filter sizes in a CNN may influence its performance, we decided to
combine CNN with AdaBoost, a classical ensemble algorithm, to use the contribution of various
filter lengths and grasp their potential in the final classification by combining classifiers with
different filter sizes. Our proposed method was validated on the Essay dataset through a series of
experiments, and the empirical results demonstrated the superiority of our proposed method over
both machine learning and deep learning methods for the task of personality recognition.
Recognizing the significance of textual data, a small number of studies have focused on using text
generated by people to predict their personality. Machine learning based methods have also been
utilized in this regard, but their results were not satisfactory because the majority of them
relied on statistical or handcrafted linguistic features. They could not take the rich
user-generated text into account or extract features from it automatically, even though these
words and texts are the most valuable features for determining emotion and personality.
With the development of deep neural networks, these models have demonstrated remarkable
performance in various Natural Language Processing (NLP) tasks, including opinion mining and
sentiment analysis. Personality recognition is very similar to these NLP applications, since both
focus on mining users' attributes from texts. Accordingly, employing powerful text modeling
techniques that have been used effectively in the NLP domain is the most intuitive and
straightforward idea for improving the performance of personality recognition. With these
limitations and the potential of deep learning in mind, we proposed a deep learning based method
for personality recognition that makes use of both a Convolutional Neural Network (CNN) and the
AdaBoost algorithm. Although CNNs have been successfully utilized for various NLP tasks, and
extracting local features is one of their strengths, the choice of filter lengths may have a
negative influence on the efficiency of the CNN classifier. To this end, we decided to combine CNN
with the AdaBoost algorithm to investigate the possibility of leveraging the contribution of
different filter lengths and grasp their potential for personality recognition by combining
classifiers with different filter sizes. AdaBoost was chosen because it is a meta-algorithm that
can be used in conjunction with other learning algorithms to improve classification accuracy. In
this algorithm, the classifier of each new stage is adjusted in favor of the samples that were
incorrectly classified in the previous stages. With AdaBoost, this process is repeated until the
classification error is minimized or a set number of stages is reached. [4]
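The re-weighting idea behind AdaBoost can be sketched in a few lines. This is a minimal illustration of one boosting round for labels in {-1, +1}, not the CNN-based implementation of the cited work; the function name `adaboost_round` and the sample data are assumptions made for the example.

```python
import math

def adaboost_round(weights, y_true, y_pred):
    """One AdaBoost re-weighting step for labels in {-1, +1}.

    Returns the stage weight alpha and the normalised sample weights.
    """
    # Weighted error of the current weak classifier.
    err = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p)
    err = min(max(err, 1e-10), 1 - 1e-10)  # keep the logarithm finite
    alpha = 0.5 * math.log((1 - err) / err)
    # Misclassified samples (t * p == -1) are scaled up, correct ones are scaled down.
    new_w = [w * math.exp(-alpha * t * p) for w, t, p in zip(weights, y_true, y_pred)]
    total = sum(new_w)
    return alpha, [w / total for w in new_w]

# Hypothetical labels and predictions of one weak classifier.
y_true = [1, 1, -1, -1, 1]
y_pred = [1, -1, -1, -1, 1]   # the second sample is misclassified
weights = [0.2] * 5
alpha, weights = adaboost_round(weights, y_true, y_pred)
print(alpha, weights)
```

After the round, the misclassified sample carries a larger weight than the correctly classified ones, so the next stage concentrates on it.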
Social media is among the most popular platforms, and people from diverse fields, such as students
and professionals, use it daily. It brings together people from different cultures and religions.
With the advancement of technology in every field of life, demand for social media has increased.
Whenever people go online, they generate rich data through their smartphones or tablets. Their
texting style, taste in music and books, likes, dislikes, and shared posts reveal their
personality; social media is therefore an ideal platform for studying human personality.
Personality is considered an essential factor: it is the combination of attributes that makes a
person unique. In our proposed work, we used Twitter data and the myPersonality dataset to perform
an objective assessment using a deep sequential neural network and a multi-target regression model
for predicting personality traits. The proposed algorithm is based on the Five-Factor Model
(Openness, Conscientiousness, Extraversion, Agreeableness, Neuroticism). The efficacy of the
proposed technique has been measured by MSE, MAE, Precision, Recall, and F1-Score. Experimental
results show that our model is robust and that it outperforms existing techniques for predicting
human personality traits.
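The evaluation measures named above can be written down compactly. The following is a minimal sketch with hypothetical values, not the evaluation code of the cited work: MSE and MAE apply to continuous trait scores, while Precision, Recall, and F1-Score apply to class labels.

```python
def mse(y_true, y_pred):
    """Mean squared error between true and predicted scores."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    """Mean absolute error between true and predicted scores."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F1 for one class treated as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical trait scores (regression) and class labels (classification).
scores_true = [3.0, 4.0, 2.0, 5.0]
scores_pred = [2.5, 4.0, 3.0, 4.5]
labels_true = [1, 0, 1, 1, 0]
labels_pred = [1, 0, 0, 1, 1]

print(mse(scores_true, scores_pred), mae(scores_true, scores_pred))
print(precision_recall_f1(labels_true, labels_pred))
```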
The Five-Factor model is one of the well-established models for recognizing personality. It uses
words to identify personality and to analyze which trait a person fits. It characterizes a person
along five traits, i.e., agreeableness, conscientiousness, extraversion, neuroticism, and openness.
Fig. 1.1 Five-Factor Model attributes
The Five-Factor model is also called the “Big Five” or “OCEAN” model. According to Hirschfeld, the
Five-Factor Model provides the well-regarded dimensions of personality, and the five traits of the
“FFM” or “Big Five” can be stated as below:
O- Openness: Openness is a dimension of the Five-Factor Model. Openness describes individuals who
are creative, imaginative, artistic, understanding, curious, politically liberal, competitive, and
fond of travelling to new places. They tend to succeed if they pursue careers such as accountant,
auditor, judge, or financial manager.
C- Conscientiousness: Conscientiousness is another dimension or trait of the Five-Factor Model,
with two basic features: dependability and accomplishment. Highly conscientious people are well
planned, organized, dutiful, reliable, purposeful, good at impulse control, hard-working,
self-disciplined, determined, and confident. People low in conscientiousness tend to be less bound
by plans and rules but more tolerant.
E- Extraversion: Extraversion is a classic dimension of the Five-Factor Model. Highly extraverted
people are affectionate, friendly, excitement-seeking, energetic, assertive, optimistic, outgoing,
charismatic, and talkative. Most extraverts gain energy from their surroundings. They tend to be
successful if they pursue politics or sales as a career.
A- Agreeableness: Agreeableness is a dimension concerned with the nature of one's relationships
with others. Highly agreeable people are cooperative, courteous, friendly, trustworthy, kind,
tolerant, forgiving, polite, peacekeeping, calm, imaginative, and caring, and they raise new
ideas. They put others' needs before their own and are good team workers, but they do not tend to
work as leaders. They are easily influenced, adopt group opinions, work in the background, and
keep positive relationships with others.
N- Neuroticism: Neuroticism is another classic dimension of the Five-Factor Model; its reverse is
referred to as “Emotional Stability”. Individuals with a higher degree of neuroticism are
emotionally unstable, pessimistic, sensitive, insecure, nervous, easily depressed, vulnerable,
irritable, easily shocked, and moody; they experience negative emotions (such as anxiety, anger,
and depression) and are rarely satisfied with their lives. They pursue careers such as pilot,
engineer, and manager.
The Five-Factor model is one of the most reliable, predictive, and efficient personality
assessment models. With the advancement of technology, people now mostly communicate on social
media sites. The digital generation spends more time on social media, and almost every individual
has an account on social networking sites such as Twitter, Facebook, Instagram, and WhatsApp,
through which they communicate with each other. Whenever people go online, they generate data
through social media or their cell phones. The language that people use on social media is full of
psychological content, and using it yields a valid and fast personality assessment. In our
research, we used user-generated content to gain insight into the personality traits of users
without having them fill out any questionnaire. We are interested in the words users write on
their profiles as predictors of their personality traits. We used two different datasets for our
work: the myPersonality dataset collected from Facebook (labeled data) and a Twitter dataset
collected from users' Twitter profiles (3200 tweets extracted per user to predict their
personality traits). [5]
CHAPTER 3
SYSTEM DEVELOPMENT
3.1 Existing System
In the existing system, people identify others' personalities from social media profiles or text
messages, relying on certain characteristics of the messages. However, the social media features
that contain actual personality cues do not necessarily overlap with the features people use to
judge personality. The probability of missing or misinterpreting the real traits of a person is
high: people tend to get carried away by irrelevant traits instead of classifying a person by
their actual traits. Humans are also prone to biases and prejudices, which may affect the accuracy
of their judgments. In addition, certain features of social media text data are difficult for
humans to grasp.
3.2 Existing Technology or Algorithms
In terms of supporting the complex and interdisciplinary knowledge domain of AC in SenseCare,
the taxonomy management system achieved three goals.
(1) The first goal is the development and management of initial emotion taxonomies. Hence,
several taxonomies were imported to the Taxonomy Manager. Examples are the Sentient 26
Emotional Taxonomy, which is an emotional motivation framework for understanding consumer
behaviour, and parts of the WHO’s ICD-10 classification, which classify mental disorders.
Common taxonomies like these two exemplars make sharing and comparing
information easier by offering standard vocabularies and formats.
(2) Furthermore, these two taxonomies, along with others from different knowledge domains, are
used to classify scientific content of the SenseCare AC domain stored in the KM-EP’s digital
library, e.g. publications, multimedia, and person-with-dementia records. As a result, the
content can easily be managed and found, which is the second goal of the Taxonomy Manager in
the SenseCare KM-EP.
(3) Finally, the analysis results produced by the emotion detection platform [1] will also have to
be indexed using similar taxonomies from the Taxonomy Manager. This demonstrates that the
Taxonomy Manager not only can be used to collect, classify, and provide access to materials of
the initial emotion analysis and its results but also supports the work of psychology experts in a
follow-up study aiming at training machine learning components to classify personality traits from
vectors of initial emotion classification features. This work could be much more costly without the
classification, annotation, and access support of the Taxonomy Manager in the SenseCare KM-EP
supporting scientific research in the domain of AC.
3.3 Hardware and Software Requirements
Software Requirements

Python

Software libraries
pickle, argparse, sys, NumPy, TensorFlow, tqdm

Interface standards
Django Framework
Hardware Requirements

Core i5 processor

8 GB RAM / 500 GB hard disk
3.4 Proposed System Design
1. To develop an algorithm for unstructured Text Analysis Mechanism
2. To study Image Processing and select the optimal detection method for extracting from the
input modality
3. To compare and contrast the correlation between different modalities
4. To develop an algorithm for fusing different modalities
5. To design a system to improve performance of Multi Class classification in Personality
Prediction Analysis
Fig 3.1 System Design for Personality Prediction System
3.5 Proposed Algorithms (Implementation Details (Modules))
The following sections describe each of these steps in more detail.

Store data related to personality traits in a database
The personality characteristics are stored in a database. Later, when a user enters his or her
personality characteristics, they are compared against the large pre-existing database and the
system detects the personality of the user.

Collect associated personality characteristics for each participant
Each user enters his or her personality characteristics; the system then detects the personality
of the user based on the data previously stored in the database.

Extract relevant features from the texts
The system extracts relevant features from the text entered by the user and compares them with
the data stored in the database. After the comparison, the system reports the personality of the
user.

Display features relevant to the user's personality traits
The system examines the personality of the user based on the personality traits the user entered
and provides the user with features that are relevant to those traits.

Personality Traits Comparison
The relation between personality and user behavior is tested. The hypothesis is that
conscientiousness, agreeableness and neuroticism each predict unique variance in attitudes.
Fig 3.2 Algorithm explanation with classification algorithm
To overcome the problems of the existing system, a personality classification system is proposed.
It is built on the Big Five personality model and uses data mining techniques and machine learning
algorithms such as logistic regression, decision tree, and support vector machine to classify the
personalities of different users. Because the system learns patterns from past data, it can
identify the personality of a new user, and so it overcomes the limitations of the existing
system.
3.6 Datasets
(https://www.kaggle.com/code/yonatanilan/big-five-traits-with-personality-labels/data)
The data set mainly consists of a single statistical data matrix in which every column represents
a specific variable and every row represents one combination of answers to the questions. In this
project, the dataset consists of the responses given by the user to the set of questions, the
user's personality, and the best career option. The responses are compared with the existing
training set. The user personality is described by the following pairs:
Extraversion (E) or Introversion (I),
Sensing (S) or Intuition (N),
Feeling (F) or Thinking (T),
Judging (J) or Perceiving (P).
With the help of these personalities, the best career for each personality type can be predicted;
e.g., a person with the combination ESTJ can be a chef. Kaggle supports a variety of dataset
publication formats, but dataset publishers are strongly encouraged to share their data in an
accessible, non-proprietary format if possible. Not only are open, accessible data formats better
supported on the platform, they are also easier to work with for more people regardless of their
tools.
3.6.1 Supported File Types (CSVs)
The simplest and best-supported file type available on Kaggle is the “Comma-Separated Values” file,
or CSV, for tabular data. CSVs uploaded to Kaggle should have a header row consisting of
human-readable field names. A CSV representation of a shopping list with a header row, for example,
looks like this:
id,type,quantity
0,bananas,12
1,apples,7
CSVs are the most common of the file formats available on Kaggle and are the best choice for
tabular data.
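As a minimal sketch of how such a file is read in Python, the shopping-list example above can be parsed with the standard `csv` module. The data is held in a string here so that the example is self-contained; for a real dataset the same reader would be applied to an opened file.

```python
import csv
import io

# The shopping-list CSV from the text, held in memory instead of a file.
raw = "id,type,quantity\n0,bananas,12\n1,apples,7\n"

# DictReader uses the header row as field names for every following row.
rows = list(csv.DictReader(io.StringIO(raw)))

# All values are read as strings, so numeric columns are converted explicitly.
total_quantity = sum(int(row["quantity"]) for row in rows)

print(rows[0]["type"], total_quantity)
```

The header row is what makes the columns addressable by name, which is why a header of human-readable field names is recommended.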
On the Data tab of a dataset, a preview of the file’s contents is visible in the data explorer. This
makes it significantly easier to understand the contents of a dataset, as it eliminates the need to
open the data in a Notebook or download it locally. CSV files will also have associated column
descriptions and column metadata. Column descriptions allow you to assign descriptions to
individual columns of the dataset, making it easier for users to understand what each column
means. Column metrics, meanwhile, present high-level metrics about individual columns in a
graphic format.
3.6.2 Data pre-processing
Data preprocessing is the process of preparing raw data and making it suitable for a machine
learning model. It is the first and a crucial step in creating a machine learning model. In a
machine learning project, we do not always come across clean and formatted data, and before doing
any operation with data it is mandatory to clean it and put it in a formatted way. For this we use
the data preprocessing task.
Real-world data generally contains noise and missing values, and may be in an unusable format
that cannot be used directly for machine learning models. Data preprocessing is the task required
to clean the data and make it suitable for a machine learning model; it also increases the
accuracy and efficiency of the model.
It involves the following steps:

Getting the dataset

Importing libraries

Importing datasets

Finding Missing Data

Encoding Categorical Data

Splitting dataset into training and test set

Feature scaling

To perform these operations, Pandas is one of the libraries used in machine learning; it works
as follows:

Pandas is an open source library in Python. It provides ready-to-use, high-performance data
structures and data analysis tools.

The Pandas module runs on top of NumPy and is popularly used for data science and data
analytics.

NumPy is a lower-level library that supports multi-dimensional arrays and a wide range
of mathematical array operations. Pandas has a higher-level interface. It also provides
streamlined alignment of tabular data and powerful time series functionality.

The DataFrame is the key data structure in Pandas. It allows us to store and manipulate tabular
data as a 2-D data structure.

Pandas provides a rich feature set on the DataFrame, for example data alignment, data
statistics, slicing, grouping, merging, and concatenating data.

All English text passes through a pre-processing stage before it is processed. Pre-processing
normalizes letter case and removes symbols, names, extra spaces, etc.; each word goes through
this stage and is then passed on in cleaned form for further processing.
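The pre-processing steps listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's actual pipeline: the records, field names, and the helper `clean_text` are hypothetical, and only the Python standard library is used so that the sketch runs anywhere. In the project itself the same steps would normally be carried out on a Pandas DataFrame. Removing names from text is not shown.

```python
import random
import re
from statistics import mean, pstdev

def clean_text(text):
    """Lower-case the text, replace symbols and digits by spaces, collapse extra spaces."""
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    return " ".join(text.split())

# Hypothetical records: an answer score, a categorical field, and a class label.
records = [
    {"score": 4.0, "gender": "F", "label": 1},
    {"score": None, "gender": "M", "label": 0},
    {"score": 2.0, "gender": "F", "label": 0},
    {"score": 5.0, "gender": "M", "label": 1},
    {"score": 3.0, "gender": "F", "label": 1},
]

# Finding missing data: replace a missing score with the mean of the known scores.
known = [r["score"] for r in records if r["score"] is not None]
for r in records:
    if r["score"] is None:
        r["score"] = mean(known)

# Encoding categorical data: map each category to an integer code.
codes = {cat: i for i, cat in enumerate(sorted({r["gender"] for r in records}))}
for r in records:
    r["gender"] = codes[r["gender"]]

# Splitting into training and test sets (80/20) with a fixed seed for repeatability.
rng = random.Random(0)
shuffled = records[:]
rng.shuffle(shuffled)
cut = int(0.8 * len(shuffled))
train, test = shuffled[:cut], shuffled[cut:]

# Feature scaling: standardise using statistics computed on the training set only.
mu = mean([r["score"] for r in train])
sigma = pstdev([r["score"] for r in train]) or 1.0
for r in train + test:
    r["score"] = (r["score"] - mu) / sigma

print(clean_text("Hello,   World!! #ML"), len(train), len(test))
```

Computing the scaling statistics on the training set only keeps information from the test set out of the model, which is why the split comes before the scaling step.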
3.7 Feasibility study
Our proposed system provides information about the personality of the user. The system matches
the personality traits provided by the user against the data stored in the database, automatically
classifies the user’s personality from the matching pattern, and then reports the detected
personality. Based on the user's personality traits, the system also provides other features that
are relevant to the user’s personality.

Economic Feasibility
This system will help advertisers market their products based on the personality of the
user, which in turn provides income to the firm using the system. The system can be embedded
in social sites, since many users buy and sell products through these social networks.

Operational Feasibility
This system is reliable, maintainable, affordable and producible. These are the parameters
that were considered during the design and development of this project. During the design and
development phase, engineering and management effort was applied appropriately and in a timely
manner to meet these parameters.

Technical Feasibility
The back end of this project is written in Python, which processes the data related to
personality traits and the other details related to this project. The application has only basic
hardware requirements. The system is developed in the Django Framework using Python libraries and
can be accessed from any device, such as personal computers, laptops, and some hand-held devices.
3.8 Django Framework
Django is a high-level Python web framework that encourages rapid development and clean,
pragmatic design. Built by experienced developers, it takes care of much of the hassle of web
development, so you can focus on writing your app without needing to reinvent the wheel. It’s free
and open source. Because Django was developed in a fast-paced newsroom environment, it was
designed to make common web development tasks fast and easy. Once your models are defined, Django
can automatically create a professional, production-ready administrative interface: a website that
lets authenticated users add, change and delete objects.
3.8.1 Configuration and Conventions
Naming of variables is one of the most complex parts of development. Django has many
configuration values with sensible defaults, and a few conventions to follow when getting started.
By convention, templates and static files are stored in subdirectories within the application’s
Python source tree, with the names templates and static respectively. While this can be changed,
you usually don’t have to, especially when getting started. Once you have Django up and running,
you’ll find a variety of extensions available in the community to integrate into your project for
production. As your codebase grows, you are free to make the design decisions appropriate for your
project. Django will continue to provide a very simple glue layer to the best that Python has to
offer. Django currently supports two interfaces: WSGI and ASGI.

WSGI is the main Python standard for communicating between Web servers and
applications, but it only supports synchronous code.

ASGI is the new, asynchronous-friendly standard that will allow your Django site to use
asynchronous Python features, and asynchronous Django features as they are developed.
The WSGI implementation provides the interface link between the web server and the framework. A
project can implement advanced patterns with libraries such as Django REST framework, introduce
non-relational data persistence as appropriate, and take advantage of framework-agnostic tools
built for WSGI, the Python web server interface.
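To make the WSGI interface concrete, the following minimal sketch shows what a WSGI application looks like at the lowest level: a single callable that receives the request environment and a `start_response` function. The response text and the stub `fake_start_response` are assumptions made for illustration; this is not the project's own code.

```python
def application(environ, start_response):
    """A minimal WSGI application: one callable taking environ and start_response."""
    body = b"Personality prediction service is running"
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]

# Call the app directly, the way a WSGI server would, with a stub start_response.
captured = {}

def fake_start_response(status, response_headers):
    captured["status"] = status
    captured["headers"] = dict(response_headers)

response = b"".join(
    application({"REQUEST_METHOD": "GET", "PATH_INFO": "/"}, fake_start_response)
)
print(captured["status"], response)
```

In a Django project this callable is not written by hand; it is obtained from `django.core.wsgi.get_wsgi_application()` in the project's `wsgi.py`, and any WSGI server can then serve it.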
3.8.2 Support Vector Machine
Support Vector Machine (SVM) is one of the most popular supervised learning algorithms and is
used for classification as well as regression problems, although it is primarily used for
classification. The goal of the SVM algorithm is to create the best line or decision boundary that
segregates n-dimensional space into classes, so that new data points can easily be put in the
correct category in the future. This best decision boundary is called a hyperplane. SVM chooses
the extreme points/vectors that help in creating the hyperplane. These extreme cases are called
support vectors, and hence the algorithm is termed Support Vector Machine. The SVM algorithm can
be used for face detection, image classification, text categorization, etc.

Hyperplane:
There can be multiple lines/decision boundaries that segregate the classes in n-dimensional
space, but we need to find the best decision boundary for classifying the data points. This
best boundary is known as the hyperplane of SVM. The dimension of the hyperplane depends on the
number of features in the dataset: with 2 features (as shown in the image) the hyperplane is a
straight line, and with 3 features it is a 2-dimensional plane. We always create the hyperplane
that has the maximum margin, i.e. the maximum distance between the hyperplane and the nearest
data points of each class.

Support Vectors:
The data points or vectors that are closest to the hyperplane and that affect its position
are termed support vectors. Since these vectors support the hyperplane, they are called
support vectors.
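The notions of hyperplane, margin, and support vectors can be illustrated with a small numerical sketch. The weight vector `w`, the bias `b`, and the four data points are hand-picked assumptions for a two-feature problem; the hyperplane is not learned here, the sketch only measures distances to it.

```python
import math

# Assumed hyperplane parameters for a 2-feature problem: w . x + b = 0
w = [1.0, 1.0]
b = -3.0

def decision_value(x):
    """Signed value of w . x + b; its sign gives the predicted side of the hyperplane."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def distance_to_hyperplane(x):
    """Perpendicular distance of point x from the hyperplane."""
    return abs(decision_value(x)) / math.sqrt(sum(wi * wi for wi in w))

def predict(x):
    return 1 if decision_value(x) >= 0 else -1

# Hypothetical labelled points (features, class in {-1, +1}).
points = [([1.0, 1.0], -1), ([0.0, 1.0], -1), ([2.0, 2.0], 1), ([3.0, 3.0], 1)]

# The margin is the distance from the hyperplane to the closest points;
# those closest points are the support vectors.
margin = min(distance_to_hyperplane(x) for x, _ in points)
support_vectors = [x for x, _ in points if math.isclose(distance_to_hyperplane(x), margin)]

print(margin, support_vectors)
```

Only the two points nearest to the line determine the margin; moving the other two points further away would not change the hyperplane, which is why the support vectors are said to support it.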
3.8.3 Logistic Regression

Logistic regression is one of the most popular machine learning algorithms and comes
under the supervised learning technique. It is used for predicting a categorical dependent
variable from a given set of independent variables.

Logistic regression predicts the output of a categorical dependent variable, so the
outcome must be a categorical or discrete value. It can be Yes or No, 0 or 1, True or
False, etc., but instead of giving the exact values 0 and 1, it gives probabilistic values
that lie between 0 and 1.

Logistic Regression is very similar to Linear Regression except in how it is used.
Linear Regression is used for solving regression problems, whereas Logistic Regression is
used for solving classification problems.

In Logistic Regression, instead of fitting a regression line, we fit an "S"-shaped logistic
function whose output is bounded by two limiting values (0 and 1).

The curve of the logistic function indicates the likelihood of something, such as whether
cells are cancerous or not, or whether a mouse is obese or not based on its weight.

Logistic Regression is a significant machine learning algorithm because it can provide
probabilities and classify new data using both continuous and discrete datasets.

Logistic Regression can be used to classify observations using different types of data and
can easily determine the most effective variables for the classification. The image below
shows the logistic function:
Fig. 3.3 Logistic regression graphical representation
Logistic Function (Sigmoid Function):

The sigmoid function is a mathematical function used to map predicted values to
probabilities.

It maps any real value to a value in the range 0 to 1.

The output of logistic regression must lie between 0 and 1 and cannot go beyond these
limits, so it forms an "S"-shaped curve. This S-shaped curve is called the sigmoid function or
the logistic function.

In logistic regression, we use the concept of a threshold value, which decides whether a
probability is mapped to 0 or 1: values above the threshold are assigned to 1, and values below
the threshold are assigned to 0.
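The sigmoid function and the threshold rule can be sketched in a few lines. The sigmoid is sigma(z) = 1 / (1 + e^(-z)); the threshold of 0.5 and the function name `classify` are assumptions chosen for the example, not values fixed by the project.

```python
import math

def sigmoid(z):
    """Map any real value z to a value between 0 and 1."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z use the equivalent form to avoid overflow in exp().
    ez = math.exp(z)
    return ez / (1.0 + ez)

def classify(z, threshold=0.5):
    """Assign class 1 if the predicted probability reaches the threshold, else class 0."""
    return 1 if sigmoid(z) >= threshold else 0

probabilities = [sigmoid(z) for z in (-4.0, 0.0, 4.0)]
print(probabilities, classify(2.0), classify(-2.0))
```

Here z stands for the linear combination of the independent variables; the sigmoid turns it into a probability, and the threshold turns the probability into a class.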
Assumptions for Logistic Regression:
o
The dependent variable must be categorical in nature.
o
The independent variables should not exhibit multicollinearity.
3.8.4 Random Forest
Random Forest is a popular supervised machine learning algorithm that is widely used for both
classification and regression problems. It is based on the concept of ensemble learning, which is
the process of combining multiple classifiers to solve a complex problem and to improve the
performance of the model. As the name suggests, "Random Forest is a classifier that contains a
number of decision trees on various subsets of the given dataset and takes the average to improve
the predictive accuracy of that dataset." It builds decision trees on different samples and,
instead of relying on one decision tree, takes the majority vote of the trees' predictions for
classification and their average for regression. One of the most important features of the Random
Forest algorithm is that it can handle data sets containing continuous variables, as in
regression, and categorical variables, as in classification. It tends to give better results for
classification problems. A greater number of trees in the forest generally leads to higher
accuracy and reduces the problem of overfitting.
Below are some points that explain why we should use the Random Forest algorithm:

It takes less training time than many other algorithms.

It predicts output with high accuracy and runs efficiently even on large datasets.

It can also maintain accuracy when a large proportion of data is missing.
Steps involved in the Random Forest algorithm:

Step 1: In Random Forest, n random records are taken from the data set having k
records.

Step 2: Individual decision trees are constructed for each sample.

Step 3: Each decision tree will generate an output.

Step 4: Final output is considered based on Majority Voting or Averaging for Classification
and regression respectively.
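Steps 1 and 4 above can be sketched without any library. The sketch below draws the random records with replacement (a bootstrap sample) and combines hypothetical tree outputs by majority voting or averaging; building the individual trees (Steps 2 and 3) is omitted, and all names and values are illustrative assumptions.

```python
import random
from collections import Counter
from statistics import mean


def bootstrap_sample(records, n, rng):
    """Step 1: draw n records at random, with replacement, from the k records."""
    return [rng.choice(records) for _ in range(n)]


def majority_vote(tree_outputs):
    """Step 4 (classification): return the most frequent class label."""
    return Counter(tree_outputs).most_common(1)[0][0]


def average(tree_outputs):
    """Step 4 (regression): return the mean of the numeric tree outputs."""
    return mean(tree_outputs)


rng = random.Random(0)
records = list(range(10))  # k = 10 hypothetical record ids
samples = [bootstrap_sample(records, 6, rng) for _ in range(3)]  # one sample per tree

# Hypothetical outputs of three trees for one new data point.
print(majority_vote(["serious", "responsible", "serious"]))
print(average([2.0, 3.0, 4.0]))
```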
Implementation Steps are given below:

Data Pre-processing step

Fitting the Random Forest algorithm to the Training set

Predicting the test result

Test accuracy of the result (Creation of Confusion matrix)

Visualizing the test set result.
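The implementation steps listed above can be sketched for a random forest with scikit-learn as follows. This is a minimal sketch, not the project code: the data is synthetic (generated with make_classification as a stand-in for the project's dataset), the hyperparameters are arbitrary, and the visualization step is left out.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real dataset (an assumption of this sketch).
X, y = make_classification(n_samples=300, n_features=7, n_informative=4,
                           random_state=0)

# 1. Data pre-processing: split first, then scale with training statistics only.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# 2. Fit the random forest to the training set.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# 3. Predict the test result.
y_pred = model.predict(X_test)

# 4. Test the accuracy of the result with a confusion matrix.
cm = confusion_matrix(y_test, y_pred)
accuracy = accuracy_score(y_test, y_pred)
print(cm)
print(f"accuracy = {accuracy:.3f}")
```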
3.8.5 Decision Tree

Decision Tree is a supervised learning technique that can be used for both classification and
regression problems, but mostly it is preferred for solving classification problems. It is a
tree-structured classifier, where internal nodes represent the features of a dataset, branches
represent the decision rules and each leaf node represents the outcome.

In a decision tree, there are two types of nodes: the Decision Node and the Leaf Node. Decision
nodes are used to make a decision and have multiple branches, whereas Leaf nodes are the
output of those decisions and do not contain any further branches.

The decisions or tests are performed on the basis of the features of the given dataset.

It is a graphical representation for getting all the possible solutions to a problem/decision
based on given conditions.

It is called a decision tree because, similar to a tree, it starts with the root node, which expands
on further branches and constructs a tree-like structure.

In order to build a tree, we use the CART algorithm, which stands for Classification and
Regression Tree algorithm.

A decision tree simply asks a question and, based on the answer (Yes/No), further splits the
tree into subtrees.
Fig. 3.4 Decision tree structure
Decision Tree Terminologies

Root Node: The root node is where the decision tree starts. It represents the entire dataset,
which is further divided into two or more homogeneous sets.

Leaf Node: Leaf nodes are the final output nodes; the tree cannot be split further once a leaf
node is reached.

Splitting: Splitting is the process of dividing the decision node/root node into sub-nodes
according to the given conditions.

Branch/Sub Tree: A tree formed by splitting the tree.

Pruning: Pruning is the process of removing the unwanted branches from the tree.

Parent/Child node: A node that is split into sub-nodes is called the parent node of those
sub-nodes, and the sub-nodes are called its child nodes. The root node is the parent of the
whole tree.
Working of decision tree
In a decision tree, for predicting the class of a given record, the algorithm starts from the root
node of the tree. It compares the value of the root attribute with the corresponding attribute of
the record (real dataset) and, based on the comparison, follows the branch and jumps to the next node.
For the next node, the algorithm again compares the attribute value with the other sub-nodes and
moves further. It continues the process until it reaches a leaf node of the tree. The complete
process can be better understood using the algorithm below:

Step-1: Begin the tree with the root node, say S, which contains the complete dataset.

Step-2: Find the best attribute in the dataset using an Attribute Selection Measure (ASM).

Step-3: Divide S into subsets that contain the possible values of the best attribute.

Step-4: Generate the decision tree node, which contains the best attribute.

Step-5: Recursively make new decision trees using the subsets of the dataset created in Step-3.
Continue this process until a stage is reached where the nodes cannot be classified further;
such a final node is called a leaf node.
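Steps 1 to 5 above can be sketched in plain Python as follows. The sketch assumes Gini impurity as the Attribute Selection Measure and splits on every value of a categorical attribute (a multiway split, which is simpler than the binary splits of CART). The attribute names, labels and records are toy values, not project data.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0 means the node is pure)."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_attribute(rows, labels, attributes):
    """Step-2: pick the attribute whose split has the lowest weighted Gini."""
    def weighted_gini(attr):
        groups = {}
        for row, label in zip(rows, labels):
            groups.setdefault(row[attr], []).append(label)
        return sum(len(g) / len(labels) * gini(g) for g in groups.values())
    return min(attributes, key=weighted_gini)


def build_tree(rows, labels, attributes):
    """Steps 1-5: recursively build the tree; a leaf is a plain class label."""
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]    # leaf node
    attr = best_attribute(rows, labels, attributes)     # Step-2
    node = {"attribute": attr, "branches": {}}          # Step-4
    remaining = [a for a in attributes if a != attr]
    for value in sorted({row[attr] for row in rows}):   # Step-3
        pairs = [(r, l) for r, l in zip(rows, labels) if r[attr] == value]
        node["branches"][value] = build_tree(           # Step-5
            [r for r, _ in pairs], [l for _, l in pairs], remaining)
    return node


def predict(tree, row):
    """Follow the branches from the root until a leaf (class label) is reached."""
    while isinstance(tree, dict):
        tree = tree["branches"][row[tree["attribute"]]]
    return tree


# Toy records (hypothetical values).
rows = [
    {"extraversion": "low", "openness": "high"},
    {"extraversion": "low", "openness": "low"},
    {"extraversion": "low", "openness": "high"},
    {"extraversion": "high", "openness": "high"},
    {"extraversion": "high", "openness": "high"},
    {"extraversion": "high", "openness": "low"},
]
labels = ["serious", "serious", "serious", "responsible", "responsible", "serious"]

tree = build_tree(rows, labels, ["extraversion", "openness"])
print(tree)
print(predict(tree, {"extraversion": "high", "openness": "high"}))
```

In this sketch, predict raises a KeyError for an attribute value that never appeared in training; a fuller implementation would fall back to the majority class of the node.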
CHAPTER 4
IMPLEMENTATION AND RESULT
4.1 Implementation Steps for Execution of the Project
1. The editor used here is VS Code, a general-purpose code editor that can be used with any
   programming language.
2. In VS Code, an interpreter for the language must be installed, or its path provided, in order
   to run the Python code.
3. A Python interpreter is therefore required for VS Code to execute the Python code.
4. Once the installation is done, we need to train a model on the dataset; to do so, we use
   Support Vector Machine for feature extraction and Logistic Regression for classification.
5. For Logistic Regression we use
   from sklearn.linear_model import LogisticRegression
6. For Support Vector Machine we use
   from sklearn.model_selection import cross_val_score
   from sklearn import svm
7. Processing of the data was done using the following code:
8. def filter(label, gender, age, openness, neuroticism, conscientiousness,
              agreeableness, extraversion):
9. Data splitting with the various inputs was done, and its filtering was done as
   # Serious label
   serious_list_main = random_selection(len(serious_list), label, serious_list, len(responsible_list))
10. The same has to be done for the other labels. Here in the system, the dataset, consisting of
    various inputs in a .csv file, is used for training to generate the model, which is saved as
    model.pkl.
11. As the system needs to be interactive, we make use of a web-based framework that runs in
    Python.
12. For the framework, the Django libraries and packages need to be installed in VS Code for the
    Python interpreter.
13. Once the libraries for the framework are installed, we need to activate the environment
    associated with it. Installation of libraries was done using the 'pip' command, written as
    pip install [module name]
14. During execution of the main file, a back-end development server is started on localhost by
    the Django framework, and a window opens in the browser where we enter all the required fields.
15. After the required fields are filled in, the system compares the input with the model we
    created earlier and predicts the personality of the individual using
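import numpy as np

def predict_datapoint(model, datapoint):
    # Encode gender as the first feature: 0 for "Male", 1 otherwise.
    temp = []
    if datapoint[0] == "Male":
        temp.append(0)
    else:
        temp.append(1)
    # Scale the remaining features with the module-level `scaler`
    # (defined elsewhere in the project, not shown here).
    data = scaler.transform([datapoint[1:]])
    temp = temp + data.reshape(data.shape[1]).tolist()
    label = model.predict(np.array([temp]))
    return label[0]

    # Excerpt from the body of the Django view function: map the predicted
    # label to a personality name and render the result page.
    finaloutput = personality[predict_datapoint(model, data_point)]
    return render(request, 'output.html', {'output': finaloutput})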
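The training and saving of the model mentioned in steps 4 to 10 is not shown in the report, so the following is only a sketch of how such a workflow could look with the imports listed above. The data is synthetic, the labelling rule is arbitrary, and names such as predict_one and bundle are hypothetical. The sketch cross-validates an SVM and fits a Logistic Regression classifier; it does not reproduce the project's feature-extraction step.

```python
import pickle

import numpy as np
from sklearn import svm
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for the .csv data: a gender code (0 = Male, 1 = otherwise)
# followed by six numeric columns (age and the five trait scores).
gender = rng.integers(0, 2, size=(200, 1))
numeric = rng.uniform(1.0, 8.0, size=(200, 6))
# Arbitrary labelling rule, used only so that there is a pattern to learn.
labels = (numeric[:, 1] > 4.5).astype(int)

# Scale the numeric columns and prepend the gender code, as predict_datapoint does.
scaler = StandardScaler().fit(numeric)
X = np.hstack([gender, scaler.transform(numeric)])

# Cross-validate an SVM and fit the logistic regression classifier.
svm_scores = cross_val_score(svm.SVC(kernel="linear"), X, labels, cv=5)
model = LogisticRegression(max_iter=1000).fit(X, labels)

# Serialise model and scaler together. Writing these bytes to a file opened
# in binary mode would give a file such as model.pkl.
blob = pickle.dumps({"model": model, "scaler": scaler})
bundle = pickle.loads(blob)


def predict_one(bundle, datapoint):
    """Predict the label for [gender, six numeric values]."""
    code = 0 if datapoint[0] == "Male" else 1
    scaled = bundle["scaler"].transform([datapoint[1:]])
    features = np.hstack([[code], scaled[0]])
    return int(bundle["model"].predict([features])[0])


print(f"mean SVM cross-validation accuracy: {svm_scores.mean():.3f}")
print(predict_one(bundle, ["Male", 5.0, 7.5, 3.0, 4.0, 2.0, 6.0]))
```

Saving the scaler together with the model keeps the scaling used at prediction time identical to the scaling used during training.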
4.2 Screenshots
Fig. 4.1 VS code Editor Window for coding
Fig. 4.2 Virtual environment activation
Fig. 4.3 Activating the dedicated environment
Fig. 4.4 Web page opened
Fig. 4.5 Web page opened with the questionnaire test based on personality traits
Fig. 4.6 Prediction based on the questionnaire about personalities
Fig. 4.7 Prediction output based on input data
CHAPTER 5
CONCLUSION AND FUTURE WORK
5.1 Conclusion
Interest in personality analysis and prediction has increased considerably in recent times.
Extracting the personality of the user with the current system is helpful in various fields, for
instance the recruitment process, medical counselling, and the like. Personality detection from a
survey means extracting the behavioural characteristics of the users taking the survey. This paper
focuses on providing a state-of-the-art review of an emerging field, i.e. personality detection
from surveys, and also discusses the state-of-the-art methods for personality detection and
prediction. With regard to the objectives established previously, it can be concluded that all the
requirements have been successfully met, except for one: the requirement for the presentation of
results in the mobile application. To meet this requirement, it will be necessary to fully develop
the classifier and the prototype; this work is still at an early stage, although it is a great
starting point for future analysis. A lot of work needs to be done before more conclusive results
are available. However, so far, we are proud to say that the first objective has been achieved. A
lot of time was needed for collaborating with psychologists and for understanding the possible
ways of identifying personalities and their link with happiness.
5.2 Future Work
Recent multimodal deep learning techniques have performed well and are starting to make reliable
personality predictions. Deep learning offers a way to harness the large amount of data and
computation power at our disposal with little engineering by hand. Various deep models have
become the new state-of-the-art methods not only for personality detection, but in other fields as
well. We expect this trend to continue with deeper models and new architectures which are able to
map very complex functions. We expect to see more personality detection architectures that rely
on efficient multimodal fusion. Finally, neurotic people tend to use terms like regret, losing,
confuse, upset, mess, etc. It is obvious from the literature review that there has been tremendous
work in the field of personality assessment, but it still needs more research as personality
prediction is a broad domain. In our research, we are predicting human personality through text
analysis, and the work can be extended by using images, videos and audio content that social media
users share on
their accounts. Apart from Twitter, there are several social media sites like Instagram, YouTube
and LinkedIn that can be used to explore personalities with the proposed technique. Social media
users belong to different cultures and religions and speak different languages; therefore the
language barrier is one of the main problems while predicting a user's personality.
REFERENCES
[1] Yasmín Hernández, Alicia Martínez, Hugo Estrada, Javier Ortiz and Carlos Acevedo, “Machine
Learning Approach for Personality Recognition in Spanish Texts”, Appl. Sci. 2022, 12, 2985.
https://doi.org/10.3390/app12062985
[2] Hussain Ahmad, Muhammad Usama Asghar, Muhammad Zubair Asghar, Aurangzeb Khan and Amir H.
Mosavi, “A Hybrid Deep Learning Technique for Personality Trait Classification From Text”.
Received August 10, 2021, accepted September 27, 2021, date of publication October 21, 2021,
date of current version November 3, 2021. Digital Object Identifier 10.1109/ACCESS.2021.3121791
[3] Clemens Stachl, Florian Pargent, Sven Hilbert, Gabriella M. Harari, Ramona Schoedel, Sumer
Vaid, Samuel D. Gosling and Markus Bühner, “Personality Research and Assessment in the Era of
Machine Learning” (2020). https://doi.org/10.1002/per.2257
[4] Fatemeh Mohades Deilami, Hossein Sadr and Mozhdeh Nazari, “Using Machine Learning-Based
Models for Personality Recognition”.
[5] Noureen Aslam, Khalid Masood Khan, Afrozah Nadeem, Sundus Munir and Javairya Nadeem,
“Analysis Of Personality Assessment Based On The Five-Factor Model Through Machine Learning”,
International Journal of Scientific & Technology Research, Volume 10, Issue 05, May 2021,
ISSN 2277-8616, p. 139.
[6] Manasi Ombhase, Student, PCE, Prajakta Gogate, Student, PCE, Tejas Patil, Student, PCE,
Karan Nair, Student, PCE and Prof. Gayatri Hegde, Faculty, PCE, “Automated Personality
Classification Using Data Mining Techniques”.
[7] Sayali D. Jadhav and H. P. Channe, “Comparative Study of K-NN, Naive Bayes and Decision
Tree Classification Techniques”, Department of Computer Engineering, Pune Institute of
Computer Technology, Savitribai Phule Pune University, Pune, India.
[8] Anisha Yata, Prasanna Kante, T Sravani and B Malathi, “Personality Recognition using
Multi-Label Classification”, 2018.
[9] Veronica Ong, Anneke D. S. Rahmanto, Williem and Derwin Suhartono, “Exploring
Personality Prediction from Text on Social Media: A Literature Review”, 2017.
[10] Tommy Tandera, Hendro, Derwin Suhartono, Rini Wongso and Yen Lina Prasetio,
“Personality Prediction System from Facebook Users”, Computer Science Department,
School of Computer Science, Bina Nusantara University, Jl. K. H. Syahdan No. 9
Kemanggisan, Jakarta 11480, Indonesia
[11] Avnish Kumar, Akshat Gawankar, Kunal Borge and Nilesh M. Patil, “Student Profile &
Personality Prediction using Data Mining Algorithms”, Information Technology, Rajiv Gandhi
Institute of Technology, Mumbai, India.
[12] Fazel Keshtkar, Candice Burkett, Haiying Li and Arthur C. Graesser, “Using Data Mining
Techniques to Detect the Personality of Players in an Educational Game”.
[13] Janhavi Pednekar and Shraddha Dubey, “Identifying Personality Trait using Social Media:
A Data Mining Approach”, Symbiosis Institute of Computer Studies and Research, Symbiosis
International University, {janhavi.pednekar, shraddha.dubey}@sicsr.ac.in
[14] T. M. Cover and P. E. Hart, “Nearest Neighbor Pattern Classification”, IEEE Transactions
on Information Theory, vol. 13, No. 1, pp. 21-27, 1967.
[15] J. Han and M. Kamber, “Data Mining Concepts and Techniques”, Elsevier, 2011.
[16] K. P. Soman, “Insight into Data Mining Theory and Practice”, New Delhi: PHI, 2006.
[17] S. B. Kotsiantis, “Supervised Machine Learning: A Review of Classification Techniques”,
Informatica, vol. 31, pp. 249-268, 2007.
[18] Bhavesh Patankar and Dr. Vijay Chavda, “A Comparative Study of Decision Tree, Naive
Bayesian and k-nn Classifiers in Data Mining”, International Journal of Advanced
Research in Computer Science and Software Engineering, Vol. 4, Issue 12, December 2014.