Personality Development Using Machine Learning (2022-2023)

CHAPTER 1 INTRODUCTION

1.1 Introduction

Personality is defined as the characteristic set of perceptions, feelings, and behavioural patterns that develop from biological and environmental factors. There is no generally agreed definition of personality; most theories focus on motivation and psychological interactions with one's environment. Personality can also be defined in terms of traits that predict a person’s behaviour. Personality identification through traditional assessment was the older approach, but with the help of data mining techniques the accuracy of this prediction has improved considerably over the older techniques.[1]

Data mining is the process of finding patterns in large data sets using methods at the intersection of statistics, database systems, and machine learning. Its overall goal is to extract information from a data set and transform it into a form that can be used further. Automated personality prediction compares a user’s responses against standard personality tests. The prediction depends mainly on the person’s nature: the user answers a set of questions, and the personality is predicted from the answers chosen. The classification algorithm used is k-nearest neighbours (kNN). Large volumes of data must be processed, and a classification algorithm makes this possible.[2]

The major goal of this paper is to outline an approach to personality prediction based on the answers given to the respective questions, predicting both the personality of the user and suitable career options.

1.2 Personality Traits

In SenseCare, we have conducted a study on personality traits and basic emotions, assessed by both subjective emotional self-ratings and the automated classification of emotions from initial emotion analysis [1]. The study’s procedure was as follows: First, participants filled out the Big Five Aspect Scale.
Second, participants watched 12 video clips taken from movies and TV series, which were chosen to elicit an emotional response. There were 2 videos per basic emotion (i.e. Disgust, Anxiety/Fear, Anger, Surprise, Sadness, and Joy). The video clips lasted between 1 and 5 minutes. Third, after each video clip, participants were asked to rate on a Likert scale from 1 to 5 how much of each basic emotion they felt whilst watching the video (1 being “not at all” and 5 being “a great deal”). Finally, participants were debriefed about the nature of the study. Overall, it lasted for an hour, and 30 participants took part.

Dept. of CSE, MGI-COET, Shegaon

A. Relationship Between Personality Traits and Subjective Emotional Feedback

A Pearson’s correlation was conducted on each personality trait variable with each distinct emotion self-report rating. Pearson correlations measure the relationship between two variables and report the strength of the relationship as the effect size, r. This value can range from -1 to +1. A score of -1 represents a perfect negative correlation between two variables: if variables A and B have a correlation of -1, every increase of A is matched with a proportional decrease in B, and vice versa. The opposite is true for a correlation of +1. A correlation of 0 means no linear relationship exists between the two variables. For social sciences, an effect size of 0.10 is considered small, 0.30 medium, and 0.50 large.

The vast majority of these relationships are negative (88/105), meaning that increases in the score of each trait tended to go with decreases in the reported amount of each emotion. The strongest relationship between two variables was between the personality trait Agreeableness and the emotion Joy. This seemed to stem particularly from the sub-trait Compassion, which also had a moderate negative correlation with Joy.
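The Pearson correlation and the effect-size thresholds described in this subsection can be illustrated with a minimal sketch. It is not the study's analysis code: the function names, the "negligible" label for values below 0.10, and the six data points are hypothetical and chosen only for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def effect_size_label(r):
    """Label |r| using the thresholds quoted in the text (0.10, 0.30, 0.50)."""
    size = abs(r)
    if size >= 0.50:
        return "large"
    if size >= 0.30:
        return "medium"
    if size >= 0.10:
        return "small"
    return "negligible"  # assumed label; the text does not name this range


# Hypothetical data: trait scores and 1-5 emotion self-ratings of six participants.
trait_scores = [2.0, 2.5, 3.0, 3.5, 4.0, 4.5]
joy_ratings = [5, 4, 4, 3, 2, 2]
r = pearson_r(trait_scores, joy_ratings)
print(round(r, 2), effect_size_label(r))
```

The sign of r gives the direction of the relationship and its absolute value gives the strength, which is why the label is computed from abs(r).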
In relation to the hypotheses, Extraversion and its sub-traits Assertiveness and Enthusiasm showed no significant or strong relationship with Joy; Neuroticism was negatively related to the experience of each negative emotion (Anger, Fear, Anxiety, Sadness, and Disgust); Agreeableness, including both Compassion and Politeness, was negatively associated with Anger; Conscientiousness, including both Industriousness and Orderliness, was negatively associated with Disgust; Openness to Experience had a weak-to-moderate negative correlation with Anger, Fear, Sadness, Surprise, Anxiety, and Disgust, suggesting it is associated with a wide range of emotional experiences. However, it had no relationship with Joy.

B. Relationship between Personality Traits and Automated Emotional Expressions

In the following, ML denotes an emotion classified automatically by the machine learning software, and SR denotes a self-reported rating. The highest correlation was between Agreeableness and ML Sadness, with a moderate negative relationship, r = -0.34. This means that the higher a person scored on Agreeableness, the less likely they were to be classified as experiencing sadness during the experiment. Some of the most interesting findings were: ML Joy was positively correlated with Extraversion, including both Assertiveness and Enthusiasm; ML Anger was positively correlated with Agreeableness, including both Compassion and Politeness, in contrast to SR Anger, which was negatively correlated; ML Disgust was negatively associated with Conscientiousness, including both Industriousness and Orderliness, which was similar to SR Disgust; ML Fear/Anxiety was weakly positively correlated with Neuroticism, including both Volatility and Withdrawal; every ML emotion except Fear/Anxiety had a correlation with Openness to Experience of magnitude greater than .10 (positive or negative), suggesting a breadth of emotional experience. However, the strength of each relationship was not matched in the sub-trait aspects of Openness to Experience, Intellect and Openness.

C. Relationship between Subjective Emotions and Facial Expressions

For example, SR Joy is positively correlated with ML Anger. ML Joy is positively correlated with SR Fear, Anger, Anxiety, and Disgust. SR Disgust is negatively associated with ML Disgust, although the correlation is weak. SR Fear is weakly positively correlated with ML Fear/Anxiety. SR Anger is negatively and weakly correlated with ML Anger. There is a weak positive relationship between SR and ML Joy. There is a weak-to-moderate positive relationship between SR Anxiety and ML Anxiety/Fear. There is a weak positive relationship between SR and ML Sadness. There is a moderate-to-strong positive relationship between SR and ML Surprise. Overall, there are both consistent and inconsistent results between the two emotion measures.

D. Taxonomy Management System

From the study and its outcome, we have derived various requirements that should be supported by a Taxonomy Management System. First, we investigated the data produced during the study. Second, we derived requirements for the Taxonomy Management System and modelled its constituents. Finally, we implemented and experimentally tested the SenseCare Taxonomy Management System.

1.3 Motivation

The term “personality” is derived from the Latin word “persona”, which refers to the mask worn by actors in the theatre. The attributes that characterize a person, such as emotions, behavior, mind, and temperament, define a personality. Because of this variety of attributes, gauging personality is difficult, as there is no predefined structure for classifying and comparing people. The set of human emotions is vast, so a similar problem arises when we try to identify the sentiments embedded in a message (also known as “sentiment analysis”): it is difficult to choose the significant emotions for classification.
Thus, in order to automate the task of sentiment analysis, a number of researchers adopted a simpler representation of sentiment by means of its “polarity” (negative or positive). Similarly, for personality prediction, several researchers have identified the important characteristics essential for building a personality model. Personality is the coherent patterning of affect, cognition, and desires (goals) as they lead to behavior. To study personality is to study how people feel, how they think, what they want, and finally, what they do. Personality is an important aspect of human life and is important for understanding yourself and other people. The preeminent personality model in personality psychology is the Big 5 model, also called the O.C.E.A.N. model. The Big 5 model was derived through factor analysis of questions based on common descriptive adjectives. This analysis produced five distinct traits of personality: openness to experience, conscientiousness, extraversion, agreeableness, and neuroticism. This paper aims to classify personalities from a given text, based on the set of attributes commonly known as the Big Five model, using machine learning concepts and learning algorithms.

1.4 Objectives

The objective is to classify and analyze personalities based on the Big Five model, for a given data set, using classification algorithms and advanced data mining concepts, and to demonstrate these data mining concepts by automating personality classification with Python data science libraries.
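As a rough illustration of the kind of classification this objective refers to, the following sketch applies a k-nearest-neighbour vote to Big Five score vectors. It assumes the kNN classifier named in Section 1.1; the function name, the 1-5 score scale, the class labels "outgoing" and "reserved", and all data values are hypothetical and are not taken from the project's data set.

```python
from collections import Counter
from math import dist


def knn_predict(train_x, train_y, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    if not 1 <= k <= len(train_x):
        raise ValueError("k must be between 1 and the number of training points")
    # Order training indices by Euclidean distance to the query point.
    order = sorted(range(len(train_x)), key=lambda i: dist(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Hypothetical Big Five scores (O, C, E, A, N) on a 1-5 scale with made-up labels.
train_x = [
    (4.5, 3.0, 4.8, 4.0, 2.0),
    (4.0, 3.5, 4.5, 3.8, 2.5),
    (3.8, 3.2, 4.2, 4.1, 2.2),
    (2.5, 4.5, 1.8, 3.0, 3.5),
    (2.8, 4.2, 2.0, 3.2, 3.8),
    (3.0, 4.0, 2.2, 2.9, 3.4),
]
train_y = ["outgoing", "outgoing", "outgoing", "reserved", "reserved", "reserved"]

print(knn_predict(train_x, train_y, (4.2, 3.1, 4.6, 3.9, 2.1)))
```

An odd k with two classes avoids tied votes; with more classes or an even k, a tie-breaking rule would have to be chosen explicitly.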
A further objective is to extract the personality of an individual from social networking websites, again classifying and analyzing it against the Big Five model using classification algorithms, advanced data mining concepts, and Python data science libraries.

1.5 Data Mining

The process of digging through data to discover hidden connections and predict future trends has a long history. Sometimes referred to as "knowledge discovery in databases," the term "data mining" wasn’t coined until the 1990s. But its foundation comprises three intertwined scientific disciplines: statistics (the numeric study of data relationships), artificial intelligence (human-like intelligence displayed by software and/or machines) and machine learning (algorithms that can learn from data to make predictions). What was old is new again, as data mining technology keeps evolving to keep pace with the limitless potential of big data and affordable computing power. Over the last decade, advances in processing power and speed have enabled us to move beyond manual, tedious and time-consuming practices to quick, easy and automated data analysis. The more complex the data sets collected, the more potential there is to uncover relevant insights. Retailers, banks, manufacturers, telecommunications providers and insurers, among others, are using data mining to discover relationships among everything from price optimization, promotions and demographics to how the economy, risk, competition and social media are affecting their business models, revenues, operations and customer relationships.
The term "data mining" is a misnomer because the goal is the extraction of patterns and knowledge from large amounts of data, not the extraction (mining) of data itself. It also is a buzzword and is frequently applied to any form of large-scale data or information processing (collection, extraction, warehousing, analysis, and statistics) as well as any application of computer decision support system, including artificial intelligence (e.g., machine learning) and business intelligence. The book Data mining: Practical machine learning tools and techniques with Java[8] (which covers mostly machine learning material) was originally to be named just Practical machine learning, and the term data mining was only added for marketing reasons.[9] Often the more general terms (large scale) data analysis and analytics—or, when referring to actual methods, artificial intelligence and machine learning—are more appropriate. The actual data mining task is the semi-automatic or automatic analysis of large quantities of data to extract previously unknown, interesting patterns such as groups of data records (cluster analysis), unusual records (anomaly detection), and dependencies (association rule mining, sequential pattern mining). This usually involves using database techniques such as spatial indices. These patterns can then be seen as a kind of summary of the input data, and may be used in further analysis or, for example, in machine learning and predictive analytics. For example, the data mining step might identify multiple groups in the data, which can then be used to obtain more accurate prediction results by a decision support system. Neither the data collection, data preparation, nor result interpretation and reporting is part of the data mining step, but do belong to the overall KDD process as additional steps. 
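Of the pattern types named above, anomaly detection is the simplest to show in a few lines. The sketch below flags unusual records with a z-score rule; this is one common textbook approach chosen here as an assumption, not a method the text prescribes, and the function name, threshold, and data values are hypothetical.

```python
from statistics import mean, pstdev


def zscore_outliers(values, threshold=2.0):
    """Return the indices of values lying more than threshold standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all values identical, so nothing can be unusual
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


# Hypothetical measurements with one clearly unusual record at the end.
readings = [10, 11, 9, 10, 12, 10, 11, 50]
print(zscore_outliers(readings))
```

On very small samples a single extreme value inflates the standard deviation, so the threshold has to be set lower than the value of 3 often used for large data sets.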
The difference between data analysis and data mining is that data analysis is used to test models and hypotheses on the dataset, e.g., analyzing the effectiveness of a marketing campaign, regardless of the amount of data; in contrast, data mining uses machine learning and statistical models to uncover hidden patterns in a large volume of data. The related terms data dredging, data fishing, and data snooping refer to the use of data mining methods to sample parts of a larger population data set that are (or may be) too small for reliable statistical inferences to be made about the validity of any patterns discovered. These methods can, however, be used in creating new hypotheses to test against the larger data populations.

1.5.1 Algorithm in Data Mining

An algorithm in data mining (or machine learning) is a set of heuristics and calculations that creates a model from data. To create a model, the algorithm first analyzes the data you provide, looking for specific types of patterns or trends. The algorithm uses the results of this analysis over many iterations to find the optimal parameters for creating the mining model. These parameters are then applied across the entire data set to extract actionable patterns and detailed statistics. The mining model that an algorithm creates from your data can take various forms, including:
- A set of clusters that describe how the cases in a dataset are related.
- A decision tree that predicts an outcome, and describes how different criteria affect that outcome.
- A mathematical model that forecasts sales.
- A set of rules that describe how products are grouped together in a transaction, and the probabilities that products are purchased together.

The algorithms provided in SQL Server Data Mining are the most popular, well-researched methods of deriving patterns from data.
To take one example, K-means clustering is one of the oldest clustering algorithms and is available widely in many different tools and with many different implementations and options. However, the particular implementation of K-means clustering used in SQL Server Data Mining was developed by Microsoft Research and then optimized for performance with SQL Server Analysis Services. All of the Microsoft data mining algorithms can be extensively customized and are fully programmable, using the provided APIs.

1.5.2 Choosing an Algorithm by Type

SQL Server Data Mining includes the following algorithm types:
- Classification algorithms predict one or more discrete variables, based on the other attributes in the dataset.
- Regression algorithms predict one or more continuous numeric variables, such as profit or loss, based on other attributes in the dataset.
- Segmentation algorithms divide data into groups, or clusters, of items that have similar properties.
- Association algorithms find correlations between different attributes in a dataset. The most common application of this kind of algorithm is for creating association rules, which can be used in a market basket analysis.
- Sequence analysis algorithms summarize frequent sequences or episodes in data, such as a series of clicks in a web site, or a series of log events preceding machine maintenance.

However, there is no reason that you should be limited to one algorithm in your solutions. Experienced analysts will sometimes use one algorithm to determine the most effective inputs (that is, variables), and then apply a different algorithm to predict a specific outcome based on that data. SQL Server Data Mining lets you build multiple models on a single mining structure, so within a single data mining solution you could use a clustering algorithm, a decision trees model, and a Naïve Bayes model to get different views on your data.
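As one illustration of the segmentation (clustering) algorithms listed above, here is a minimal sketch of K-means in its textbook form (Lloyd's algorithm). It is not the Microsoft Research implementation mentioned in the text; the function name, the choice of the first k points as starting centroids, and the sample points are assumptions made for the example.

```python
from math import dist


def kmeans(points, k, iterations=100):
    """Plain K-means (Lloyd's algorithm) using the first k points as initial centroids."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = [tuple(p) for p in points[:k]]
    assignment = []
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, assignment


# Hypothetical two-dimensional records forming two visibly separate groups.
points = [(1.0, 1.0), (5.0, 5.0), (1.2, 0.8), (0.8, 1.1), (5.2, 4.9), (4.8, 5.1)]
centroids, assignment = kmeans(points, k=2)
print(assignment)
```

Production implementations usually add randomized or k-means++ initialization and several restarts, because the result of Lloyd's algorithm depends on the starting centroids.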
You might also use multiple algorithms within a single solution to perform separate tasks: for example, you could use regression to obtain financial forecasts, and use a neural network algorithm to perform an analysis of factors that influence forecasts.

CHAPTER 2 LITERATURE SURVEY

2. Literature Survey

“Using an Affective Computing Taxonomy Management System to Support Data Management in Personality Traits” [1] states that Affective Computing is a rather new and multidisciplinary research field that seeks sophisticated automation in emotion detection for later analysis. However, automated emotion detection and analysis also require comprehensive data management support, e.g. to keep control of the data produced and to enable its efficient reuse through classification with established terminology. This paper contributes to data management aspects in Affective Computing and to automation support in emotion classification on the basis of a personality traits analysis. Hence, it describes the implementation of a taxonomy management system, derived from the requirements of a case study that investigates the relationship between personality and emotions in Affective Computing. The study makes use of machine learning software developed by SenseCare, an EU-funded R&D project that applies Affective Computing to enhance and advance future healthcare processes and systems.

“Agile Person Identification through Personality Test and kNN Classification Technique” states that Agile methodology is a well-known software development methodology. The methodology stresses adaptation and collaboration between people. Here, software project managers should agree with the idea of putting the right people in the right jobs. This research puts forward the idea of applying the Big Five Personality Traits to predict how suitable people are for the Agile methodology.
The prediction method uses the k-Nearest Neighbour (kNN) classification technique. Results of a pilot study are presented and show that the selected classification technique can be used for the prediction.

“Happiness Recognition from Mobile Phone Data” presents first evidence that the daily happiness of individuals can be automatically recognized using an extensive set of indicators obtained from mobile phone usage data (call log, SMS, and Bluetooth proximity data) and “background noise” indicators coming from the weather factor and personality traits. The final machine learning model, based on the Random Forest classifier, obtains an accuracy score of 80.81% for a 3-class daily happiness recognition problem. Moreover, the authors identify and discuss the indicators that have strong predictive power in the source and feature spaces, discuss different approaches and machine learning models, and provide insight for future research.

“Machine Prediction of Personality from Facebook Profiles” states that an increasing number of Americans use social networking sites such as Facebook, but few fully appreciate the amount of information they share with the world as a result. Although studies exist on the sharing of specific types of information (photos, posts, etc.), one area that has been less explored is how Facebook profiles can share personality information in a broad, machine-readable fashion. In this study, the authors apply data mining and machine learning techniques to predict users’ personality traits (specifically, the traits of the Big Five personality model) using only demographic and text-based attributes extracted from their profiles. They then use these predictions to rank individuals in terms of the five traits, predicting which users will appear in the top or bottom 5% or 10% of these traits.
Results show that, when using certain models, the authors can find the top 10% most Open individuals with nearly 75% accuracy, and across all traits and directions they can predict the top 10% with at least 34.5% accuracy (exceeding 21.8%, which is the best accuracy when using just the best-performing profile attribute). These results have privacy implications in terms of allowing advertisers and other groups to focus on a specific subset of individuals based on their personality traits.

[2] Personality is a unique trait that distinguishes an individual. It includes an ensemble of peculiarities in how people think, feel, and behave that affects the interactions and relationships of people. Personality is useful in diverse areas such as marketing, training, education, and human resource management. There are various approaches for personality recognition and different psychological models. Preceding work indicates that linguistic analysis is a promising way to recognize personality. In this work, a proposal for personality recognition relying on the dominance, influence, steadiness, and compliance (DISC) model and statistical methods for language analysis is presented. To build the model, a survey was conducted with 120 participants. The survey consisted of the completion of a personality test and handwritten paragraphs. The study resulted in a dataset that was used to train several machine learning algorithms. It was found that the AdaBoost classifier achieved the best results, followed by Random Forest. In both cases, feature selection with Pearson’s correlation was applied as a preprocessing step. The AdaBoost classifier obtained the following average scores: accuracy = 0.782, precision = 0.795, recall = 0.782, F-measure = 0.786, receiver operating characteristic (ROC) area = 0.939.

Personality has been recognized as a driver of decisions and behavior; it consists of singular characteristics in how individuals think, feel, and behave.
Understanding personality provides a way to comprehend how the different traits of an individual merge as a unit, since personality is a mixture of traits and behavior that people have to cope with situations. Personality influences selections and decisions (e.g., movies, music, and books). Personality guides the interactions among people, relationships, and the conditions around them. Personality has been shown to be related to any form of interaction. In addition, it has been shown to be useful in predicting job satisfaction, success in professional relationships, and even preference for different user interfaces. Previous research on user interfaces and personality has found more receptiveness and confidence in users when the interfaces take personality into account. When personality is predicted from the social media profile of users, applications can use it to personalize presentations and messages [3].

Researchers have recognized that every person has a personality that usually remains consistent over time. Consequently, personality assessment can be used as an important measure. Various psychological models of personality have been proposed, such as the Five-factor model, the psychoticism, extraversion, and neuroticism (PEN) model, the Myers–Briggs type inventory, and the dominance, influence, steadiness, and compliance (DISC) model. Typically, these models propose direct methods such as questionnaires to recognize personality. Alternatively, linguistic analysis can be used to detect personality. Linguistic analysis can produce useful patterns for establishing relationships between writing characteristics and personality. Researchers in natural language processing have proposed several methods of linguistic analysis to recognize personality, and machine learning has been one of the most investigated approaches.
Machine learning techniques are useful in the recognition of personality since they provide mechanisms to automate processes that are based on a set of examples. Several proposals for personality recognition based on machine learning can be found in the literature. Machine learning algorithms use computational methods to learn directly from data without relying on a predetermined equation as a model. The algorithms adaptively improve their performance as the number of instances available for learning increases. Several efforts in personality prediction from the linguistic analysis approach have been carried out. However, they have focused mostly on the English language and are based on the five-factor model. This model (also called the Big Five model) has been used as a standard for applications that need personality modeling. To contribute to the advancement and understanding of the relationship between personality and language, we have developed a predictive model for personality recognition based on the DISC personality model and a machine learning approach. We performed a personality survey with 120 participants.

Recently, Cognitive-based Sentiment Analysis with emphasis on automatic detection of user behaviour, such as personality traits, based on online social media text has gained a lot of attention. However, most of the existing works are based on conventional techniques, which are not sufficient to get promising results. In this research work, we propose a hybrid Deep Learning-based model, namely a Convolutional Neural Network concatenated with Long Short-Term Memory, to show the effectiveness of the proposed model for 8 important personality traits (Introversion-Extroversion, Intuition-Sensing, Thinking-Feeling, Judging-Perceiving). We implemented our experimental evaluations on the benchmark dataset to accomplish the personality trait classification task.
Evaluations on the datasets have shown better results, which demonstrates that the proposed model can classify the user's personality traits more effectively than the state-of-the-art techniques. Finally, we evaluate the effectiveness of our approach through statistical analysis. With the knowledge obtained from this research, organizations are capable of making their decisions regarding the recruitment of personnel in an efficient way. Moreover, they can implement the information obtained from this research as best practices for the selection, management, and optimization of their policies, services, and products.

Cognitive Science is a multidisciplinary area of research that aims at addressing different cognitive processes and mental states, including learning, thinking, perception, remembering, and emotions. Among the aforementioned types of cognition, personality plays a pivotal role in identifying the social behaviour of humans. Computer-based personality detection and classification have remained an active area of research for a long time. Personality detection can be performed using multiple media, such as text, images, video, and audio. Cognitive-based sentiment classification from social media text is an area that poses several challenges for researchers in cognitive computation. In this area, a lot of work has been done and much more can be investigated. Textual cognitive-based sentiment analysis (SA) is not merely a theoretical area; rather, it has several applied fields, such as health, education, finance, and others. Being a merger of cognitive science and human neurology, it can address the gap between the abstractions of cognitive science and the more emerging area of personality detection from a person's textual feedback expressed on social media.
Social media platforms, such as Twitter, Facebook, and Instagram, have experienced an unexpected worldwide spread in recent years. For example, by the 3rd quarter of 2019, Twitter had over 330 million active users per month. Progress in natural language processing and text analytics gives researchers an opportunity to use big-data sources for extracting and analysing the textual personality traits expressed by users while using social media, as long as the data scientists working on social media content are able to address the challenging issues specific to such content. In the last few years, cognitive-based SA applications have become popular among online communities for learning about the opinions and personality traits of individuals pertaining to different issues, policies, and others. However, due to the diverse nature of social media content, it is tedious to analyse text using existing techniques to detect personality traits from such content. Therefore, extraction and analysis of social media content through automatic classification of personality traits has become essential. A lot of work has been carried out in the fields of text-based SA lexicon construction, cognition, aspect-based SA, and visual SA. However, more work needs to be done in the context of cognitive-based social media, with an emphasis on extracting and classifying personality traits from social media content. The aforementioned issues often result in incorrect cognitive-based sentiment classification of social media content. Therefore, it is necessary to develop a method to classify personality traits in social media by automatic classification of such content.

Deep learning comprises a group of algorithms that mimic the functionality and structure of the human brain.
In simple terms, it contains a set of neurons to receive input and also a set of neurons to transmit output signals. The deep learning-based models can assist with different tasks like speech recognition, computer vision, natural language processing, and handwriting generation. A model has been proposed for the human Big Five personality traits prediction, which needs 8 times less data. An embedding layer used for word extraction underlying user tweets is named the GloVe model. The model's training and testing are performed using the provided Twitter data. Moving forward, the testing of data is performed on three fusions: (i) LIWC along with GP, (ii) 3-Gram along with GP, and (iii) GloVe along with RR. The proposed model outperformed the state-of-the-art work with a mean correlation of 0.33 across the Big Five traits. The present work exploited only English Twitter content, which needs to be extended to further languages. Furthermore, the proposed model's efficiency can be estimated by an extended number of tweets. A personality identification approach is applied to text data by exploiting a deep neural network. The present work used a hierarchical scheme called AttRCNN that is capable of retaining semantic features at a deeper level. The results reveal that the proposed features perform better than the compared features.

[4] The increasing availability of high-dimensional, fine-grained data about human behavior, gathered from mobile sensing studies and in the form of digital footprints, is poised to drastically alter the way personality psychologists perform research and undertake personality assessment. These new kinds and quantities of data raise important questions about how to analyze the data and interpret the results appropriately. Machine learning models are well-suited to these kinds of data, allowing researchers to model highly complex relationships and to evaluate the generalizability and robustness of their results using resampling methods. The correct usage of machine learning models requires specialized methodological training that considers issues specific to this type of modeling. Here, we first provide a brief overview of past studies using machine learning in personality psychology. Second, we illustrate the main challenges that researchers face when building, interpreting, and validating machine learning models. Third, we discuss the evaluation of personality scales derived using machine learning methods. Fourth, we highlight some key issues that arise from the use of latent variables in the modeling process. We conclude with an outlook on the future role of machine learning models in personality research and assessment.

One debatable way for researchers to boost the reported performance of ML models is by using classification instead of regression methods. In the analysis of our data reported above, we fitted a regression random forest, thus predicting continuous values for the outcome variable extraversion. However, a lot of work in personality computing has focused on predicting classes (i.e., “low” vs. “high”) of personality traits, rather than continuous trait scores. This decision to focus on classes can pose a problem when the rationale for creating discrete classes is not fully transparent. In binary classification, the two classes are often generated around some fixed central tendency estimate (e.g., median), obtained from the sample under investigation. In some cases, an arbitrary dividing point is used (e.g., deciding whether the midpoint of a five-point rating scale is assigned to the “low” vs.
the “high” class), leaving open the possibility that the decision was made to maximize reported performance. [3]

Personality can be defined as the combination of behavior, emotion, motivation, and thoughts, and personality models aim to describe various aspects of human behavior through a few stable and measurable characteristics. Because personality has a remarkable influence on daily life, automatic recognition of a person's personality attributes enables many practical applications in various areas of cognitive science. Although various methods have recently been proposed for personality recognition, most of them rely on human-designed statistical features and do not use the rich semantic information in user-generated texts. Yet these texts reveal the writer's internal thoughts and emotions and are the most direct way for people to state their feelings and opinions in an understandable form. To use this valuable semantic information and to overcome the complexity and handcrafted-feature requirements of previous methods, a deep learning based method for personality recognition from text is proposed in this paper. Among deep neural networks, Convolutional Neural Networks (CNN) have demonstrated profound efficiency in natural language processing and especially in personality detection. Because the filter sizes used in a CNN may influence its performance, we decided to combine CNN with AdaBoost, a classical ensemble algorithm, to exploit the contribution of various filter lengths and grasp their potential in the final classification by combining classifiers with different filter sizes using AdaBoost.
Our proposed method was validated on the Essay dataset through a series of experiments, and the empirical results demonstrated its superiority over both machine learning and deep learning methods for personality recognition. Given the significance of textual data, a small number of studies have focused on using text generated by people to predict their personality. Machine learning based methods have also been used for this purpose, but their results were not satisfactory: most of them relied on statistical or handcrafted linguistic features and could not extract features automatically from rich user-generated text, even though these words and texts are the most valuable features for determining emotion and personality. Deep neural networks, in contrast, have demonstrated remarkable performance in various Natural Language Processing (NLP) tasks, including opinion mining and sentiment analysis. Personality recognition is very similar to these NLP applications, since both focus on mining users' attributes from texts. Accordingly, employing powerful text modeling techniques that have been used efficiently in the NLP domain is the most intuitive and straightforward way to improve the performance of personality recognition. With these limitations and the potential of deep learning in mind, we proposed a deep learning based method for personality recognition that uses both a Convolutional Neural Network (CNN) and the AdaBoost algorithm. Although CNN has been used successfully for various NLP tasks and extracting local features is one of its strengths, the choice of filter lengths may have a negative influence on the efficiency of the CNN classifier.
To this end, we decided to combine CNN with the AdaBoost algorithm to investigate the possibility of leveraging the contribution of different filter lengths and grasp their potential for personality recognition by combining classifiers with different filter sizes. AdaBoost was chosen because it is a meta-algorithm that can be used in conjunction with other learning algorithms to improve classification accuracy. In this algorithm, the classifier at each new stage is adjusted in favor of the samples that were classified incorrectly in the previous stages. With the help of AdaBoost, the classification process is repeated until the classification error is minimized. [4]

Social media is one of the most popular platforms, and people from diverse fields, such as students and professionals, use it daily. It is a platform that brings together people from different cultures and religions. With the advancement of technology in every field of life, the demand for social media has increased. Whenever people go online, they generate rich data through their smartphones or tablets. Their texting style, their taste in music and books, their likes and dislikes, and the posts they share reveal their personality, so social media is an ideal platform for studying human personality. Personality is considered an essential factor; it is a combination of different attributes that makes each person unique. In our proposed work, we used Twitter data and the myPersonality dataset to perform an objective assessment using a deep sequential neural network and a multi-target regression model for predicting personality traits. The proposed algorithm is based on the Five-Factor Model (Openness, Conscientiousness, Extraversion, Agreeableness, Neuroticism).
The efficacy of the proposed technique has been measured by MSE, MAE, Precision, Recall, and F1-Score. Experimental results show that our model is robust and outperforms the existing techniques for predicting human personality traits. The Five-Factor Model is one of the well-established models for recognizing personality. It uses words to identify personality and to analyze which trait a person fits. It characterizes a person along five traits, i.e. agreeableness, conscientiousness, extraversion, neuroticism, and openness.

Fig. 1.1 Five-Factor Model attributes

The Five-Factor Model is also called the “BIG5” or “Ocean” model. According to Hirschfeld, the Five-Factor Model provides the prestigious dimensions of personality, and the five traits of the “FFM” or “Big Five” can be stated as follows:

O- Openness: Openness is a dimension of the Five-Factor Model. Openness describes individuals who are creative, imaginative, artistic, understanding, curious, politically liberal, traditional, competitive, and fond of travelling to new places. They are successful if they pursue a career as an accountant, auditor, judge, or financial manager.

C- Conscientiousness: Conscientiousness is another dimension of the Five-Factor Model, with two basic features: dependability and accomplishment. Highly conscientious people are well planned, organized, dutiful, reliable, purposeful, in control of their impulses, workaholic, self-disciplined, determined, and confident. They tend to be less bound by plans and rules but more tolerant.

E- Extraversion: Extraversion is a classic dimension of the Five-Factor Model. Highly extraverted people are affectionate, friendly, excitement-seeking, energetic, assertive, optimistic, outgoing, charismatic, and talkative. Most extraverts gain energy from their surroundings. They become successful if they pursue politics or sales as a career.
A- Agreeableness: Agreeableness is a dimension concerned with the nature of one's relationships with others. Highly agreeable people are cooperative, courteous, friendly, trustworthy, kind, tolerant, forgiving, polite, peacekeeping, calm, imaginative, and caring, and they raise new ideas. They put others' needs before their own; they are good team workers but do not work as leaders. They are easily influenced, adopt group opinions, work in the background, and keep positive relationships with others.

N- Neuroticism: Neuroticism is another classic dimension of the Five-Factor Model, and its reverse is referred to as “Emotional Stability”. Individuals with a higher degree of neuroticism are emotionally uncertain, pessimistic, sensitive, insecure, unstable, nervous, easily depressed, vulnerable, irritable, easily shocked, and moody; they experience negative emotions (such as anxiety, anger, and depression) and are never satisfied with their lives. They pursue careers as pilots, engineers, and managers.

The Five-Factor Model is one of the reliable, predictive, and efficient personality assessment models. Nowadays, with the advancement of technology, people mostly communicate on social media sites. The digital generation spends more and more time on social media, and nearly every individual has an account on social networking sites such as Twitter, Facebook, Instagram, and WhatsApp, through which they communicate with each other. Whenever people go online, they generate data through social media or their cell phones. The language people use on social media is full of psychological content, and using it yields a valid and fast personality assessment. In our research, we used user-generated content to gain insight into the personality traits of users without having them fill out any questionnaire.
We are interested in the words users write on their profiles as a basis for predicting their personality traits. We used two different datasets for our work: the myPersonality dataset collected from Facebook (labeled data) and a Twitter dataset collected from users' Twitter profiles (3200 tweets of a single user are extracted to predict that user's personality traits). [5]

CHAPTER 3
SYSTEM DEVELOPMENT

3.1 Existing System
In the existing system, people judge others' personalities from social media profiles or text messages, relying on certain characteristics of social media messages to do so. However, the features that carry actual personality cues and the features people use to form their judgments do not necessarily overlap. The probability of missing or misinterpreting a person's real traits is high: people tend to be carried away by irrelevant aspects of someone's personality rather than classifying them by their actual traits. Humans are commonly prone to biases and prejudices, which may affect the accuracy of their judgments. Also, certain features of social media text data are difficult for humans to grasp.

3.2 Existing Technology or Algorithms
In terms of supporting the complex and interdisciplinary knowledge domain of AC in SenseCare, the taxonomy management system achieved three goals. (1) The first goal is the development and management of initial emotion taxonomies. To this end, several taxonomies were imported into the Taxonomy Manager. Examples are the Sentient 26 Emotional Taxonomy, which is an emotional motivation framework for understanding consumer behaviour, and parts of the WHO's ICD-10 classification that classify mental disorders. Common taxonomies like these two examples make it easier to share and compare information by offering standard vocabularies and formats.
(2) Furthermore, these two taxonomies, along with others from different knowledge domains, are used to classify scientific content of the SenseCare AC domain stored in the KM-EP's digital library, such as publications, multimedia, and records of persons with dementia. As a result, the content can easily be managed and found, which is the second goal of the Taxonomy Manager in the SenseCare KM-EP. (3) Finally, the analysis results produced by the emotion detection platform [1] will also have to be indexed using similar taxonomies from the Taxonomy Manager. This demonstrates that the Taxonomy Manager can be used not only to collect, classify, and provide access to the materials of the initial emotion analysis and its results, but also to support the work of psychology experts in a follow-up study that aims to train machine learning components to classify personality traits from vectors of initial emotion classification features. This work could be much more costly without the classification, annotation, and access support of the Taxonomy Manager in the SenseCare KM-EP, which supports scientific research in the domain of AC.

3.3 Hardware and Software Requirements
Software Requirements
Python
Software libraries: Pickle, Argparse, Sys, Numpy, Tensorflow, Tqdm
Interface standards: Django Framework
Hardware Requirements
Core i5 processor
8 GB RAM / 500 GB hard disk

3.4 Proposed System Design
1. To develop an algorithm for an unstructured text analysis mechanism
2. To study image processing and select the optimal detection method for extracting from the input modality
3. To compare and contrast the correlation between different modalities
4. To develop an algorithm for fusing different modalities
5. To design a system to improve the performance of multi-class classification in personality prediction analysis

Fig 3.1 System Design for Personality Prediction System

3.5 Proposed Algorithms (Implementation Details (Modules))
The following sections describe each of these steps in more detail.

Store data related to personality traits in the database
The personality characteristics are stored in a database. Later, when a user enters his or her personality characteristics, they are compared against this large pre-existing database and the system detects the personality of the user.

Collect associated personality characteristics for each participant
Each user enters his or her personality characteristics, and the system then detects the user's personality based on the data previously stored in the database.

Extract relevant features from the texts
The system extracts relevant features from the text entered by the user and compares this text with the data stored in the database. After the comparison, the system specifies the personality of the user.

Display features relevant to the user's personality traits
The system examines the personality of the user based on the personality traits the user has entered and presents the user with various features that are relevant to those traits.

Personality Traits Comparison
The relation between personality and user behavior is tested. The hypothesis is that conscientiousness, agreeableness and neuroticism predict unique variance in attitudes.

Fig 3.2 Algorithm explanation with classification algorithm

To overcome the problems of the existing system, a personality classification system is proposed. It classifies the personalities of different users using the Big Five Personality Model together with data mining techniques and machine learning algorithms such as Logistic Regression, Decision Tree and Support Vector Machine.
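The module flow of Section 3.5 (store labeled records, accept a new user's responses, compare them with the stored data, report a personality) can be illustrated with a minimal sketch. It is not the project's implementation: the stored records, the labels, and the function names `distance` and `detect_personality` are hypothetical, and a simple nearest-neighbour comparison stands in for the classifiers named in the text.

```python
from collections import Counter

# Hypothetical "database": Likert answers (1-5) to four questions plus a label.
STORED = [
    ([5, 4, 5, 4], "extraverted"),
    ([4, 5, 4, 5], "extraverted"),
    ([1, 2, 1, 2], "introverted"),
    ([2, 1, 2, 1], "introverted"),
    ([2, 2, 1, 1], "introverted"),
]

def distance(a, b):
    """Euclidean distance between two answer vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def detect_personality(answers, stored=STORED, k=3):
    """Return the majority label among the k stored records closest to `answers`."""
    nearest = sorted(stored, key=lambda rec: distance(answers, rec[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# A new user's answers are compared with the stored records.
new_user = [5, 5, 4, 4]
result = detect_personality(new_user)
```

In a real system the records would come from the labeled dataset rather than a hard-coded list, and the comparison step would be replaced by whichever trained classifier performs best.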
By identifying past data and the patterns in it, the personality of a new user can be identified more easily with these techniques, which overcomes the limitations of the existing system.

3.6 Datasets
(https://www.kaggle.com/code/yonatanilan/big-five-traits-with-personality-labels/data)
The data set consists mainly of a single statistical data matrix in which every column represents a specific variable and every row represents a possible combination of answers to the questions. In this project, the dataset consists of the responses given by the user to the set of questions, the user's personality, and the best career option. The responses are compared with the existing training set. The user's personality is recorded as Extraversion (E) or Introversion (I), Sensing (S) or Intuition (N), Feeling (F) or Thinking (T), and Judging (J) or Perceiving (P). With the help of these personality types, the best career for each personality can be predicted; for example, a person with the combination ESTJ could be a chef.

Kaggle supports a variety of dataset publication formats, but it strongly encourages dataset publishers to share their data in an accessible, non-proprietary format if possible. Open, accessible data formats are not only better supported on the platform, they are also easier to work with for more people regardless of their tools.

3.6.1 Supported File Types (CSVs)
The simplest and best-supported file type available on Kaggle is the comma-separated values (CSV) file for tabular data. CSVs uploaded to Kaggle should have a header row consisting of human-readable field names. A CSV representation of a shopping list with a header row, for example, looks like this:

    id,type,quantity
    0,bananas,12
    1,apples,7

CSVs are the most common of the file formats available on Kaggle and are the best choice for tabular data. On the Data tab of a dataset, a preview of the file's contents is visible in the data explorer.
This makes it significantly easier to understand the contents of a dataset, as it eliminates the need to open the data in a Notebook or download it locally. CSV files also have associated column descriptions and column metadata. Column descriptions allow you to assign a description to each column of the dataset, making it easier for users to understand what each column means. Column metrics, meanwhile, present high-level metrics about individual columns in a graphic format.

3.6.2 Data pre-processing
Data preprocessing is the process of preparing raw data and making it suitable for a machine learning model. It is the first and a crucial step in creating a machine learning model. In a machine learning project, the data we come across is not always clean and formatted, and before performing any operation on the data it must be cleaned and put into a consistent format. The data preprocessing task serves this purpose. Real-world data generally contains noise and missing values, and may be in an unusable format that cannot be fed directly to machine learning models. Data preprocessing cleans the data and makes it suitable for a machine learning model, which also increases the accuracy and efficiency of the model.
Data preprocessing involves the following steps:
Getting the dataset
Importing libraries
Importing datasets
Finding missing data
Encoding categorical data
Splitting the dataset into training and test sets
Feature scaling

Pandas is the library used to perform these operations, and it works as follows. Pandas is an open source library in Python. It provides ready-to-use, high-performance data structures and data analysis tools. The Pandas module runs on top of NumPy and is widely used for data science and data analytics. NumPy provides a low-level data structure that supports multi-dimensional arrays and a wide range of mathematical array operations, while Pandas offers a higher-level interface. It also provides streamlined alignment of tabular data and powerful time series functionality.

The DataFrame is the key data structure in Pandas. It allows us to store and manipulate tabular data as a 2-D data structure. Pandas provides a rich feature set on the DataFrame, for example data alignment, data statistics, slicing, grouping, merging, and concatenating data.

All English text passes through a pre-processing stage before it is processed further. Pre-processing converts the text to lower case and removes symbols, names, extra spaces, etc. Every word goes through this stage, after which it is processed and converted into English.

3.7 Feasibility study
Our proposed system provides information about the personality of the user. The system matches the personality traits provided by the user against the data stored in the database, automatically classifies the user's personality by matching its pattern with the stored data, and then reports the detected personality of the user.
Based on the personality traits of the user, the system provides other features that are relevant to the user's personality.

Economic Feasibility
This system helps advertisers market their products based on the personality of the user, which in turn generates income for the firm using the system. The system can be embedded in social sites, as many users buy and sell products through these social networks.

Operational Feasibility
This system is reliable, maintainable, affordable and producible. These parameters were considered during the design and development of the project, and engineering and management effort was applied appropriately and in good time during those phases to meet them.

Technical Feasibility
The back end of this project is written in Python, which processes the data related to personality traits and the other details related to this project. Only basic hardware is required to run the application. The system is developed in the Django Framework using Python libraries. It can be accessed from any device, such as personal computers, laptops, and some handheld devices.

3.8 Django Framework
Django is a high-level Python web framework that encourages rapid development and clean, pragmatic design. Built by experienced developers, it takes care of much of the hassle of web development, so you can focus on writing your app without needing to reinvent the wheel. It's free and open source. Because Django was developed in a fast-paced newsroom environment, it was designed to make common web development tasks fast and easy. Here's an informal overview of how to write a database-driven web app with Django.
Once your models are defined, Django can automatically create a professional, production-ready administrative interface: a website that lets authenticated users add, change and delete objects.

3.8.1 Configuration and Conventions
Naming of variables is one of the most complex parts of development. Django has many configuration values, with sensible defaults, and a few conventions to follow when getting started. By convention, templates and static files are stored in subdirectories within the application's Python source tree, named templates and static respectively. While this can be changed, you usually don't have to, especially when getting started. Once you have Django up and running, you'll find a variety of extensions available in the community to prepare your project for production. As your codebase grows, you are free to make the design decisions appropriate for your project. Django will continue to provide a very simple glue layer to the best that Python has to offer.

Django currently supports two interfaces: WSGI and ASGI. WSGI is the main Python standard for communication between web servers and applications, but it only supports synchronous code. ASGI is the new, asynchronous-friendly standard that allows your Django site to use asynchronous Python features, and asynchronous Django features as they are developed. In this project, the WSGI implementation provides the interface link between the web server and the framework. The project can also implement advanced patterns using Django REST framework libraries, introduce non-relational data persistence as appropriate, and take advantage of framework-agnostic tools built for WSGI, the Python web interface.

3.8.2 Support Vector Machine
Support Vector Machine, or SVM, is one of the most popular supervised learning algorithms and is used for classification as well as regression problems.
However, primarily, it is used for classification problems in machine learning. The goal of the SVM algorithm is to create the best line or decision boundary that segregates n-dimensional space into classes, so that new data points can easily be placed in the correct category in the future. This best decision boundary is called a hyperplane. SVM chooses the extreme points/vectors that help in creating the hyperplane. These extreme cases are called support vectors, and hence the algorithm is termed Support Vector Machine. The SVM algorithm can be used for face detection, image classification, text categorization, etc.

Hyperplane: There can be multiple lines/decision boundaries that segregate the classes in n-dimensional space, but we need to find the best decision boundary for classifying the data points. This best boundary is known as the hyperplane of SVM. The dimension of the hyperplane depends on the number of features in the dataset: if there are 2 features, the hyperplane is a straight line, and if there are 3 features, the hyperplane is a 2-dimensional plane. We always create the hyperplane that has the maximum margin, that is, the maximum distance between the hyperplane and the nearest data points of each class.

Support Vectors: The data points or vectors that are closest to the hyperplane and that affect its position are termed support vectors. They are called support vectors because they support the hyperplane.

3.8.3 Logistic Regression
Logistic regression is one of the most popular machine learning algorithms and belongs to the supervised learning techniques. It is used to predict a categorical dependent variable from a given set of independent variables. Because logistic regression predicts the output of a categorical dependent variable, the outcome must be a categorical or discrete value. It can be either Yes or No, 0 or 1, True or False, etc.,
but instead of giving an exact value of 0 or 1, it gives probabilistic values that lie between 0 and 1.

Logistic regression is very similar to linear regression; the two differ in how they are used. Linear regression is used for solving regression problems, whereas logistic regression is used for solving classification problems. In logistic regression, instead of fitting a regression line, we fit an "S"-shaped logistic function whose output is bounded by the two extreme values 0 and 1. The curve of the logistic function indicates the likelihood of something, such as whether cells are cancerous or not, or whether a mouse is obese or not based on its weight. Logistic regression is a significant machine learning algorithm because it can provide probabilities and classify new data using both continuous and discrete datasets. It can be used to classify observations using different types of data and can easily determine the most effective variables for the classification. The image below shows the logistic function:

Fig. 3.3 Logistic regression graphical representation

Logistic Function (Sigmoid Function): The sigmoid function is a mathematical function used to map predicted values to probabilities. It maps any real value to a value between 0 and 1. Because the output of logistic regression must lie between 0 and 1 and cannot go beyond these limits, it forms an "S"-shaped curve. This S-shaped curve is called the sigmoid function or the logistic function.

In logistic regression, we use the concept of a threshold value, which decides whether a probability is mapped to 0 or 1: values above the threshold are assigned to 1, and values below the threshold are assigned to 0.
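The sigmoid mapping and the threshold rule described above can be written out in a few lines. This is a minimal sketch, not the project's model: the weights, the bias, the threshold of 0.5, and the function name `predict_class` are illustrative assumptions, and in practice the weights would be learned from the training set.

```python
import math

def sigmoid(z: float) -> float:
    """Map any real value z to a value between 0 and 1 (the logistic function)."""
    # Two algebraically equal forms are used so that exp() never overflows.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_class(features, weights, bias, threshold=0.5):
    """Return (class, probability): class is 1 if probability >= threshold, else 0."""
    z = bias + sum(w * x for w, x in zip(weights, features))  # linear score
    p = sigmoid(z)                                            # probability in [0, 1]
    return (1 if p >= threshold else 0), p

# Hypothetical weights for two input features (assumed, not learned here).
weights = [0.8, -0.5]
bias = -0.2

label_a, prob_a = predict_class([3.0, 1.0], weights, bias)  # linear score 1.7
label_b, prob_b = predict_class([0.0, 4.0], weights, bias)  # linear score -2.2
```

The first input has a positive linear score and is assigned to class 1; the second has a negative score and is assigned to class 0. Moving the threshold away from 0.5 trades one kind of classification error for the other.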
Assumptions for Logistic Regression:
o The dependent variable must be categorical in nature.
o The independent variables should not exhibit multi-collinearity.

3.8.4 Random Forest
Random forest is a supervised machine learning algorithm that is widely used in classification and regression problems. It builds decision trees on different samples and takes their majority vote for classification and their average for regression. One of the most important features of the random forest algorithm is that it can handle data sets containing continuous variables, as in regression, and categorical variables, as in classification. It gives better results for classification problems. It is based on the concept of ensemble learning, which is the process of combining multiple classifiers to solve a complex problem and to improve the performance of the model.

As the name suggests, "Random Forest is a classifier that contains a number of decision trees on various subsets of the given dataset and takes the average to improve the predictive accuracy of that dataset." Instead of relying on one decision tree, the random forest takes the prediction from each tree and predicts the final output based on the majority vote of these predictions. A greater number of trees in the forest generally leads to higher accuracy and helps to reduce the problem of overfitting.

Below are some points that explain why we should use the Random Forest algorithm:
It takes less training time than some other algorithms.
It predicts output with high accuracy, and it runs efficiently even on large datasets.
It can also maintain accuracy when a large proportion of the data is missing.

Steps involved in the Random Forest algorithm:
Step 1: In Random Forest, a fixed number of records is sampled from the data set, which has k records.
Step 2: An individual decision tree is constructed for each sample.
Step 3: Each decision tree generates an output.
Step 4: The final output is based on majority voting for classification or on averaging for regression.

The implementation steps are given below:
o Data pre-processing
o Fitting the Random Forest algorithm to the training set
o Predicting the test result
o Testing the accuracy of the result (creating the confusion matrix)
o Visualizing the test set result

3.8.5 Decision Tree

Decision Tree is a supervised learning technique that can be used for both classification and regression problems, but it is mostly preferred for solving classification problems. It is a tree-structured classifier in which internal nodes represent the features of a dataset, branches represent the decision rules, and each leaf node represents an outcome.

In a decision tree there are two types of nodes: the decision node and the leaf node. Decision nodes are used to make a decision and have multiple branches, whereas leaf nodes are the outputs of those decisions and do not contain any further branches. The decisions or tests are performed on the basis of the features of the given dataset. The tree is a graphical representation of all the possible solutions to a problem or decision under the given conditions.

It is called a decision tree because, similar to a tree, it starts with the root node, which expands into further branches and forms a tree-like structure. To build a tree, we use the CART algorithm, which stands for Classification and Regression Tree algorithm. A decision tree simply asks a question and, based on the answer (Yes/No), further splits the tree into subtrees.

Fig. 3.4 Decision tree structure

Decision Tree Terminologies
Root Node: The root node is where the decision tree starts. It represents the entire dataset, which is further divided into two or more homogeneous sets.
Leaf Node: Leaf nodes are the final output nodes; the tree cannot be split further once a leaf node is reached.
Splitting: Splitting is the process of dividing a decision node or the root node into sub-nodes according to the given conditions.
Branch/Sub Tree: A tree formed by splitting the tree.
Pruning: Pruning is the process of removing unwanted branches from the tree.
Parent/Child node: A node that is divided into sub-nodes is called the parent node of those sub-nodes, and the sub-nodes are its child nodes; the root node is the parent of the nodes directly below it.

Working of decision tree
To predict the class of a given record, the algorithm starts from the root node of the tree. It compares the value of the root attribute with the corresponding attribute of the record (real dataset) and, based on the comparison, follows the branch and jumps to the next node.

At the next node, the algorithm again compares the attribute value with those of the sub-nodes and moves further. It continues this process until it reaches a leaf node of the tree. The complete process can be better understood using the algorithm below:
Step-1: Begin the tree with the root node, say S, which contains the complete dataset.
Step-2: Find the best attribute in the dataset using an Attribute Selection Measure (ASM).
Step-3: Divide S into subsets that contain the possible values of the best attribute.
Step-4: Generate the decision tree node that contains the best attribute.
Step-5: Recursively make new decision trees using the subsets of the dataset created in Step-3.
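Steps 1 to 5 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the project's implementation: Gini impurity is assumed as the Attribute Selection Measure, each attribute is split on all of its values as in Step-3 (CART itself uses binary splits), and the toy records, attribute names and labels are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the set is pure)."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def best_attribute(rows, labels, attributes):
    """Step-2: choose the attribute whose split has the lowest weighted impurity."""
    def weighted_impurity(attr):
        groups = {}
        for row, label in zip(rows, labels):
            groups.setdefault(row[attr], []).append(label)
        return sum(len(g) / len(labels) * gini(g) for g in groups.values())
    return min(attributes, key=weighted_impurity)


def build_tree(rows, labels, attributes):
    """Steps 1-5: build the tree recursively; a leaf is a plain class label."""
    # Stop when the subset is pure or no attributes remain: emit a leaf node.
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]
    attr = best_attribute(rows, labels, attributes)            # Step-2
    remaining = [a for a in attributes if a != attr]
    node = {"attribute": attr, "branches": {}}                 # Step-4
    for value in {row[attr] for row in rows}:                  # Step-3
        pairs = [(r, lab) for r, lab in zip(rows, labels) if r[attr] == value]
        sub_rows = [r for r, _ in pairs]
        sub_labels = [lab for _, lab in pairs]
        node["branches"][value] = build_tree(sub_rows, sub_labels, remaining)  # Step-5
    return node


def predict(tree, row):
    """Follow branches from the root until a leaf (a non-dict value) is reached.

    An attribute value that was never seen during training raises KeyError.
    """
    while isinstance(tree, dict):
        tree = tree["branches"][row[tree["attribute"]]]
    return tree


# Hypothetical toy records: two categorical traits and an invented label.
rows = [
    {"extraversion": "high", "openness": "high"},
    {"extraversion": "high", "openness": "low"},
    {"extraversion": "low", "openness": "high"},
    {"extraversion": "low", "openness": "low"},
]
labels = ["lively", "lively", "serious", "serious"]

tree = build_tree(rows, labels, ["openness", "extraversion"])
print(tree["attribute"])
print(predict(tree, {"extraversion": "low", "openness": "high"}))
```

In this toy data the labels depend only on extraversion, so that attribute yields pure subsets and is selected at the root; openness is never needed.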
This recursive process continues until a stage is reached where the nodes cannot be classified further; the final node is then called a leaf node.

CHAPTER 4 IMPLEMENTATION AND RESULT

4.1 Implementation

Steps for execution of the project:
1. The editor used here is VS Code, a general-purpose editor that supports many languages and platforms.
2. In VS Code, an interpreter for the language must be installed, or its path must be provided, in order to run the Python code.
3. A Python interpreter is therefore required for VS Code to execute the Python code.
4. Once the installation is done, we need to train on the dataset to create the model. To do so, we use Support Vector Machine for feature extraction and Logistic Regression for classification.
5. For Logistic Regression we use:
    from sklearn.linear_model import LogisticRegression
6. For Support Vector Machine we use:
    from sklearn.model_selection import cross_val_score
    from sklearn import svm
7. Processing of the data was done using the following code:
8.  def filter(label, gender, age, openness, neuroticism, conscientiousness, agreeableness, extraversion):
9. Data splitting with the various inputs, and its filtering, was done as follows:
    # Serious label
    serious_list_main = random_selection(len(serious_list), label, serious_list, len(responsible_list))
10. The same has to be done for the other labels. In this system, the dataset, which consists of various inputs in a .csv file, is used for training to generate the model, which is stored with the model.pkl extension.
11. As the system needs to be interactive, we use a web-based framework that runs in Python.
12. For the framework, the Django libraries and packages need to be installed for the Python interpreter used in VS Code.
13. Once the libraries for the framework are installed, we need to activate the environment associated with it.
Installation of the libraries was done using the 'pip' command, written as:
    pip install [module name]
14. When the main file is executed, it starts a back-end local server through the Django development server, and a window opens in the browser on localhost, where we enter all the required fields.
15. After the required fields are filled in, the system compares the input with the model created earlier and predicts the personality of the individual using:
    # np is NumPy (import numpy as np); scaler and personality are defined elsewhere in the project.
    def predict_datapoint(model, datapoint):
        temp = []
        # Encode gender as 0 (Male) or 1 (otherwise).
        if datapoint[0] == "Male":
            temp.append(0)
        else:
            temp.append(1)
        # Scale the remaining features with the scaler fitted during training.
        data = scaler.transform([datapoint[1:]])
        temp = temp + data.reshape(data.shape[1]).tolist()
        label = model.predict(np.array([temp]))
        return label[0]

    # Inside the Django view function:
    finaloutput = personality[predict_datapoint(model, data_point)]
    return render(request, 'output.html', {'output': finaloutput})

4.2 Screenshots

Fig. 4.1 VS Code editor window for coding
Fig. 4.2 Virtual environment activation
Fig. 4.3 Activating the dedicated environment
Fig. 4.4 Web page opened in the browser
Fig. 4.5 Web page opened with the questionnaire test based on personality traits
Fig. 4.6 Prediction based on the questionnaire about personalities
Fig. 4.7 Prediction output based on input data

CHAPTER 5 CONCLUSION AND FUTURE WORK

5.1 Conclusion

Interest in personality analysis and prediction has increased considerably in recent times. Extracting the personality of the user with the current system is helpful in various fields, for instance the recruitment process and medical counselling.
Personality detection from a survey means extracting the behavioural characteristics of the users taking the survey. This paper focuses on providing a state-of-the-art review of an emerging field, namely personality detection from surveys, and also discusses the state-of-the-art methods for personality detection and prediction.

With regard to the objectives established previously, it can be concluded that all the requirements have been met except for one: the presentation of results in the mobile application. To meet this requirement, it will be necessary to fully develop the classifier and the prototype. This work is still at an early stage, but it is a good starting point for future analysis. A lot of work needs to be done before more conclusive results are available. However, we can say that the first objective has been achieved. Considerable time was needed for collaborating with psychologists and for understanding the possible ways of identifying personalities and their link with happiness.

5.2 Future Work

Recent multimodal deep learning techniques have performed well and are starting to make reliable personality predictions. Deep learning offers a way to harness the large amount of data and computation power at our disposal with little engineering by hand. Various deep models have become the new state-of-the-art methods, not only for personality detection but in other fields as well. We expect this trend to continue with deeper models and new architectures that are able to map very complex functions, and we expect to see more personality detection architectures that rely on efficient multimodal fusion.

Finally, neurotic people tend to use terms like regret, losing, confuse, upset, mess, etc. It is evident from the literature review that there has been a great deal of work in the field of personality assessment, but the field still needs more research, as personality prediction is a broad domain.
In our research, we are predicting human personality through text analysis, and the work can be extended by using the images, videos and audio content that social media users share on their accounts. Apart from Twitter, there are several social media sites, such as Instagram, YouTube and LinkedIn, that can be used to explore personalities with the proposed technique. Social media users belong to different cultures and religions and therefore speak different languages, so the language barrier is one of the main problems when predicting a user's personality.

REFERENCES

[1] Yasmín Hernández, Alicia Martínez, Hugo Estrada, Javier Ortiz and Carlos Acevedo, "Machine Learning Approach for Personality Recognition in Spanish Texts", Appl. Sci. 2022, 12, 2985. https://doi.org/10.3390/app12062985
[2] Hussain Ahmad, Muhammad Usama Asghar, Muhammad Zubair Asghar, Aurangzeb Khan and Amir H. Mosavi, "A Hybrid Deep Learning Technique for Personality Trait Classification From Text", received August 10, 2021, accepted September 27, 2021, date of publication October 21, 2021. Digital Object Identifier 10.1109/ACCESS.2021.3121791
[3] Clemens Stachl, Florian Pargent, Sven Hilbert, Gabriella M. Harari, Ramona Schoedel, Sumer Vaid, Samuel D. Gosling and Markus Bühner, "Personality Research and Assessment in the Era of Machine Learning", 2020. https://doi.org/10.1002/per.2257
[4] Fatemeh Mohades Deilami, Hossein Sadr and Mozhdeh Nazari, "Using Machine Learning-Based Models for Personality Recognition".
[5] Noureen Aslam, Khalid Masood Khan, Afrozah Nadeem, Sundus Munir and Javairya Nadeem, "Analysis of Personality Assessment Based on the Five-Factor Model Through Machine Learning", International Journal of Scientific & Technology Research, Volume 10, Issue 05, May 2021, ISSN 2277-8616.
[6] Manasi Ombhase, Prajakta Gogate, Tejas Patil, Karan Nair and Prof. Gayatri Hegde, "Automated Personality Classification Using Data Mining Techniques", PCE.
[7] Sayali D. Jadhav and H. P. Channe, "Comparative Study of K-NN, Naive Bayes and Decision Tree Classification Techniques", Department of Computer Engineering, Pune Institute of Computer Technology, Savitribai Phule Pune University, Pune, India.
[8] Anisha Yata, Prasanna Kante, T. Sravani and B. Malathi, "Personality Recognition using Multi-Label Classification", 2018.
[9] Veronica Ong, Anneke D. S. Rahmanto, Williem and Derwin Suhartono, "Exploring Personality Prediction from Text on Social Media: A Literature Review", 2017.
[10] Tommy Tandera, Hendro, Derwin Suhartono, Rini Wongso and Yen Lina Prasetio, "Personality Prediction System from Facebook Users", Computer Science Department, School of Computer Science, Bina Nusantara University, Jl. K. H. Syahdan No. 9 Kemanggisan, Jakarta 11480, Indonesia.
[11] Avnish Kumar, Akshat Gawankar, Kunal Borge and Nilesh M. Patil, "Student Profile & Personality Prediction using Data Mining Algorithms", Information Technology, Rajiv Gandhi Institute of Technology, Mumbai, India.
[12] Fazel Keshtkar, Candice Burkett, Haiying Li and Arthur C. Graesser, "Using Data Mining Techniques to Detect the Personality of Players in an Educational Game".
[13] Janhavi Pednekar and Shraddha Dubey, "Identifying Personality Trait using Social Media: A Data Mining Approach", Symbiosis Institute of Computer Studies and Research, Symbiosis International University, {janhavi.pednekar, shraddha.dubey}@sicsr.ac.in
[14] T. M. Cover and P. E. Hart, "Nearest Neighbor Pattern Classification", IEEE Transactions on Information Theory, vol. 13, no. 1, pp. 21-27, 1967.
[15] J. Han and M. Kamber, "Data Mining Concepts and Techniques", Elsevier, 2011.
[16] K. P. Soman, "Insight into Data Mining Theory and Practice", New Delhi: PHI, 2006.
[17] S. B. Kotsiantis, "Supervised Machine Learning: A Review of Classification Techniques", Informatica, vol. 31, pp. 249-268, 2007.
[18] Bhavesh Patankar and Dr. Vijay Chavda, "A Comparative Study of Decision Tree, Naive Bayesian and k-nn Classifiers in Data Mining", International Journal of Advanced Research in Computer Science and Software Engineering, Vol. 4, Issue 12, December 2014.