Title: OCEAN traits extraction from text, questionnaire and social media profile for personality prediction
Class of work: Software category
Authors: Mrs. Smita Bharne, Ms. Shital Nehete, Mr. Jeet Choudhari, Ms. Mansi Pable
Organization: Ramrao Adik Institute of Technology, Navi Mumbai
Short abstract: In recent years, rising stress levels at home, in organizations and elsewhere have increased people's mental health problems. In addition, employing less-skilled people reduces an organization's output. A person's personality plays a major role in addressing such problems. The system aims to predict a person's personality using the Big Five traits, Openness, Conscientiousness, Extraversion, Agreeableness and Neuroticism, popularly known as the OCEAN model. It does so with a hybrid model that combines three processes: text, a questionnaire, and social media profile connections. The machine learning model used for prediction is trained on the myPersonality dataset. The social media profile also lets the user compare his or her personality within a friend network. The web application provides an interface for these processes and displays the predicted personality parameters for easy access.
Detailed Abstract: Over the last decade, the use of social media has increased rapidly among people of every age group, and the younger generation spends more and more time on social networking sites. As social networks have evolved, a wide variety of approaches has been developed to interpret users' personalities. At the same time, problems with people's psychological state are increasing day by day because of rising stress levels. A person's personality plays a major role in addressing these problems.
The goal of this system is to predict a person's personality using the OCEAN model (Openness, Conscientiousness, Extraversion, Agreeableness and Neuroticism).
High Openness reflects curiosity and creativity, but it can also come with a lack of focus and unpredictable behavior; for example, a person has several talents such as dance, art and music. Low Openness describes people who are practical and data-driven and not open to new ideas; for example, a person who has analyzed a report insists that it cannot be wrong.
High Conscientiousness is a tendency to be organized and dependable, and it can also show up as discipline and stubbornness; for example, a person writes an assignment with great care and keeps it organized. Low Conscientiousness means being sloppy, spontaneous and flexible; for example, some people keep their cupboards messy.
High Extraversion describes someone who enjoys being around people and is cheerful, talkative and attention-seeking; for example, a person is friends with many people and is not afraid of risk. Low Extraversion means being shy, reserved and introverted.
People high in Agreeableness tend to be cooperative, helpful and trusting; for example, a person shows kindness and affection towards dogs. Those low in this trait tend to be competitive and suspicious.
High Neuroticism reflects a tendency towards negative emotions, such as obsessing over what others think or having a more anxious temperament than others; for example, Rohit gets angry every time. Low Neuroticism describes a confident, emotionally stable and positive person; for example, "I'm excited for the London trip."
A person's traits, manners and emotional patterns define his or her personality, which plays a crucial role in making major life decisions. Because personality is presumed to influence life outcomes, its ability to forecast those outcomes has been widely investigated.
Predicting personality with the OCEAN model can help in screening a particular person. Social media is the interconnection among people that occurs when they create, share or exchange information and ideas in virtual communities and networks. People spend more and more time on online social networks and share a lot of personal information through them. This information includes the user's social behavior, user-generated texts and language habits, which to some extent reflect the user's personality. Using the proposed hybrid model, this system aims to predict a user's personality with his social media interconnections as one of its major components, without violating his privacy. We propose a Personality Prediction web application whose hybrid model consists of three sections: Text Predictor, Questionnaire, and Facebook. In most existing state-of-the-art work, personality prediction is based either on machine learning algorithms or on questionnaires. Our approach uses a hybrid model that predicts personality from input text, a questionnaire, and the Facebook friend network. The user interface of the web application lets the user enter a text, which is given as input to the OCEAN models for prediction. These models are trained on the myPersonality sample dataset (250 users, 9917 status updates). In the Questionnaire section, the user answers 50 questions based on the OCEAN model, and the results are plotted as a radar graph; this can also be termed a psychometric assessment. The results can further be used to compare personalities within social networks, such as the user's Facebook connections or the network tab of any social media site. For this comparison, the user's personality score from the Questionnaire section is compared with the model-based predictions.
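The questionnaire scoring described here (ten items per trait, answers averaged per trait, then a percentile against reference scores) can be sketched as follows. This is a minimal sketch, not the authors' implementation: the numeric values for agree/neutral/disagree, the item identifiers q1 to q50, the item-to-trait layout and the reference scores are all hypothetical, and the percentile uses one common convention (share of reference scores less than or equal to the user's score).

```python
from bisect import bisect_right
from statistics import mean

# Assumed numeric values for the three answer options named in the text.
ANSWER_VALUES = {"disagree": 1, "neutral": 2, "agree": 3}
TRAITS = ["O", "C", "E", "A", "N"]


def trait_scores(answers: dict[str, str], item_trait: dict[str, str]) -> dict[str, float]:
    """Average the numeric answers of the items that belong to each trait."""
    per_trait: dict[str, list[int]] = {t: [] for t in TRAITS}
    for item_id, answer in answers.items():
        per_trait[item_trait[item_id]].append(ANSWER_VALUES[answer])
    return {t: mean(values) for t, values in per_trait.items() if values}


def percentile(score: float, reference: list[float]) -> float:
    """Share (in %) of reference scores that are less than or equal to score."""
    ordered = sorted(reference)
    return 100.0 * bisect_right(ordered, score) / len(ordered)


if __name__ == "__main__":
    # Hypothetical layout: q1-q10 -> O, q11-q20 -> C, ..., q41-q50 -> N.
    item_trait = {f"q{i}": TRAITS[(i - 1) // 10] for i in range(1, 51)}
    # Made-up answers: odd items "agree", even items "neutral".
    answers = {f"q{i}": "agree" if i % 2 else "neutral" for i in range(1, 51)}
    scores = trait_scores(answers, item_trait)
    print(scores)  # every trait averages to 2.5 with these answers
    reference = [1.5, 2.0, 2.2, 2.5, 2.8, 3.0]  # hypothetical population scores
    print({t: round(percentile(s, reference), 1) for t, s in scores.items()})
```

A real Big Five inventory usually also reverse-scores some items before averaging; that step is left out here to keep the sketch short.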
We used the Random Forest algorithm to train a model for each personality trait because it is a flexible, easy-to-use algorithm that gives good results. It also trains quickly, predicts with high accuracy, and is often reported to maintain accuracy when a large proportion of the data is missing. Further, we used Selenium to extract statuses from Facebook and store them in MongoDB, a NoSQL database. Finally, we implemented the web application using React and Flask. The system thus offers both a questionnaire and text-based prediction of personality. The automated system gives users a better analysis of their personality because it compares their scores with those of their Facebook friends. It can also provide self-analysis, showing the user the areas in which he/she lags behind others. Combining these methods lets the user get results on a single platform instead of searching for another one.
Statement of Particulars:
1. Objectives
The objective of this project is to predict personality based on the five Big Five (OCEAN) traits, Openness, Conscientiousness, Extraversion, Agreeableness and Neuroticism, in order to diagnose psychological problems and to assess candidates. Specifically, the following objectives were defined:
1. To diagnose psychological problems.
2. To develop an application to screen candidates for college and employment.
3. To focus on strengths, weaknesses, temperament, and leadership style.
4. To predict personality at the workplace as well as for personal inventories.
2. Structure of the proposed system
Figure 1: Architecture of the Proposed System
Figure 1 shows the architecture of the proposed system. It consists of three sections: Text Predictor, Questionnaire, and Facebook. The models are trained on the myPersonality dataset.
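The per-trait Random Forest training can be sketched with scikit-learn as below. This is a minimal sketch under stated assumptions, not the authors' code: one TfidfVectorizer is fitted on all statuses, and each trait gets a RandomForestRegressor for the trait score and a RandomForestClassifier for the binary trait label. The trait keys, the 0/1 label encoding and the function names train_trait_models and predict_traits are hypothetical, and the toy statuses in the usage block are made up purely to show the call sequence.

```python
from dataclasses import dataclass

from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical trait keys; map them to whatever columns the dataset uses.
TRAITS = ["OPN", "CON", "EXT", "AGR", "NEU"]


@dataclass
class TraitModels:
    vectorizer: TfidfVectorizer
    regressors: dict[str, RandomForestRegressor]
    classifiers: dict[str, RandomForestClassifier]


def train_trait_models(statuses: list[str],
                       scores: dict[str, list[float]],
                       labels: dict[str, list[int]],
                       seed: int = 0) -> TraitModels:
    """Fit one TF-IDF vocabulary, then a regressor and a classifier per trait.

    scores[t] and labels[t] must be aligned with statuses; labels are 0/1.
    """
    vectorizer = TfidfVectorizer()
    X = vectorizer.fit_transform(statuses)  # sparse document-term matrix
    regressors, classifiers = {}, {}
    for t in TRAITS:
        regressors[t] = RandomForestRegressor(n_estimators=100, random_state=seed).fit(X, scores[t])
        classifiers[t] = RandomForestClassifier(n_estimators=100, random_state=seed).fit(X, labels[t])
    return TraitModels(vectorizer, regressors, classifiers)


def predict_traits(models: TraitModels, text: str) -> dict[str, dict[str, float]]:
    """Return a predicted score and P(label == 1) for each trait."""
    X = models.vectorizer.transform([text])  # reuse the training vocabulary
    out = {}
    for t in TRAITS:
        clf = models.classifiers[t]
        classes = list(clf.classes_)
        proba = clf.predict_proba(X)[0]
        out[t] = {
            "score": float(models.regressors[t].predict(X)[0]),
            # Probability of the positive class; 0.0 if it never occurred in training.
            "probability": float(proba[classes.index(1)]) if 1 in classes else 0.0,
        }
    return out


if __name__ == "__main__":
    # Toy, made-up data purely to show the call sequence.
    statuses = ["great party with friends tonight", "stayed home alone reading",
                "so worried about the exam", "tidied my desk and finished early",
                "trying a new painting style", "feeling relaxed and calm today"]
    base = [1.5, 4.2, 2.8, 3.6, 4.8, 2.1]
    scores = {t: base[i:] + base[:i] for i, t in enumerate(TRAITS)}
    labels = {t: [int(s >= 3.0) for s in scores[t]] for t in TRAITS}
    models = train_trait_models(statuses, scores, labels)
    print(predict_traits(models, "out with friends again"))
```

Sharing one vectorizer keeps the five trait models on the same feature space, so a single transform of the user's text serves all of them.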
For each personality trait, the models generate a predicted personality score using a regression model and the probability of the binary class using a classification model. In the first section, a person's personality can be predicted from any text he/she enters in the Text Predictor. This section is based on the previously trained models: the user interface offers the option to predict from text, and the trained models give a score for each of the five personality traits. In the Questionnaire section, the user takes a 50-question Big Five personality test based on the OCEAN model. For each question the user chooses among options such as agree, neutral and disagree; a score is generated from these answers, and the corresponding percentile for each personality trait is displayed along with a radar graph. It works like a psychometric assessment. The score generated in this section is compared with the personality obtained in the Facebook section. The Facebook section is used to compare personalities within Facebook or any other social media network. When the user logs in with his Facebook credentials, the system predicts his personality from the posts he has created and from his likes and dislikes. The system automatically extracts the statuses of the user's friends and stores them in a database, and the user then compares his personality within his friends' network. In this way the system is designed not to violate the user's privacy. The previously trained models make predictions on the data stored in the database, and these predictions are compared with the user's personality score from the Questionnaire section.
3. Functioning
1. Text Predictor: First, the text entered by the user on the frontend is sent through Flask for prediction and converted into vectors using TF-IDF vectorization.
Second, the text vectors are given as input to the five models trained on the myPersonality dataset with the Random Forest algorithm. Third, the scores from the regression models and the probabilities from the classification models are displayed on the screen for the respective personality traits.
2. Questionnaire: First, the questions are prepared according to the Big Five personality traits, with 10 questions for each trait, so the user answers 50 questions for the personality evaluation. Second, the answers are stored in a dictionary and the score for each trait is averaged over its answers. Third, the percentile for each trait is computed from the user's score and a dataset of scores for that trait. Fourth, the scores and percentiles are displayed on the user interface, together with a radar graph of the scores for visual representation.
3. Facebook: First, the user provides the credentials for his Facebook account, which are stored in a YAML file. Second, Selenium is used to automate the extraction of the Facebook statuses of five friends from the Friends section of the account. Third, the extracted statuses are stored in MongoDB. Fourth, the statuses are used to evaluate the personality trait scores, percentiles and probabilities of each friend with the Random Forest models. Fifth, each friend's computed personality is compared with the user's personality from the Questionnaire section. Sixth, the comparison is visualized by clicking the Compare button on the user interface, which shows the radar graph and the differences between the scores and percentiles. This helps the user understand his strengths and weaknesses more clearly.
4. The novelty of the approach
The salient features of the proposed system are:
• The system uses a hybrid model that predicts personality from input text, a questionnaire, and the Facebook friend network, whereas state-of-the-art systems use only one of these techniques to predict a person's personality.
• The study's originality lies in extracting OCEAN features from text, questionnaires and social media profiles to predict and reveal people's personalities.
• The Questionnaire section determines a person's personality from his answers to the questions.
• The Text Predictor predicts personality from the text the user enters and displays scores and percentiles for the user's understanding.
• The extraction of Facebook friends' statuses is fully automated and helps predict a person's personality without violating user privacy.
• The comparison tab compares the personalities of two different people, so the analysis helps the user understand his strengths and weaknesses relative to others.
• The hybrid model gives the user a single platform for overall prediction, so he does not need to search for another.
5. Statement of Further Particulars (SoFP):
1. Research methodologies and technologies
• The Text Predictor is built with the Random Forest algorithm on TF-IDF vectors.
• React is used to design the user interface because it is an efficient and versatile JavaScript library. The web app built with React.js uses Flask and MongoDB as its backend.
• Python is used as the programming language. Its simple, easy-to-learn syntax emphasizes readability and thus reduces the cost of program maintenance.
• MongoDB stores the Facebook statuses obtained by web scraping, which are then used for prediction. The models' predictions are stored back into MongoDB and used for comparison.
• A Selenium-automated browser scrapes statuses from Facebook using the login credentials provided by the user and adds them to a MongoDB database.
• Node.js is used to run the React web app; React requires Node.js to be installed to run the project on port 3000.
• Flask connects the React frontend to the MongoDB backend. It reads values from MongoDB and serves them on port 5000, and it passes data from the frontend to the backend, which makes the frontend interactive.
• The Random Forest models provide, for each personality trait, a predicted personality score using a regression model and a binary-class probability using a classification model.
2. Benefits to society and advantages
a. Diagnosing mental health problems
If someone is facing a mental health issue, the system can help the person providing treatment to identify the exact issue, which can speed up treatment.
b. Screening candidates based on personality predicted with the questionnaire
During placement activities, the project can be used to predict a candidate's personality and check whether he/she is suitable for a particular job role. Personality is predicted from five traits, Openness, Conscientiousness, Extraversion, Agreeableness and Neuroticism, which are considered sufficient to characterize one's personality. The personality test consists of 50 personality-based questions with a score associated with each option, and the scores or probabilities are generated from the options the user selects. Hence this can be very helpful when hiring for any job role, especially for the HR department.
From the candidate's point of view, he/she can take this test to make his/her profile suitable for a particular job role, and the resulting changes in personality can lead to good results.
c. Helping the user compare his personality with others
In day-to-day life, users can compare their personality with others, such as friends, colleagues, or someone they regard as an idol. This gives the user a chance to improve his personality based on his strengths and weaknesses. The My Network section of the application is useful here: the user can compare his personality (taken from the Questionnaire section) with his Facebook network, and the results are displayed as a radar graph.
3. Strengths of the proposed system
• Averaging the results of three different methods can give more precise results.
• The questionnaire contains 50 questions for a better analysis of personality.
• The comparison tab gives the user self-analysis.
• The system can be used by educational organizations, by the employment sector, and to understand patients before diagnosing psychological problems.
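The comparison offered by the My Network section (the user's questionnaire result against each friend's model-based result, shown as differences) can be sketched as follows. This is a minimal illustration under assumptions: the function names compare_with_friends and largest_gap, the trait keys and all numbers are hypothetical. Comparing percentiles rather than raw scores is a design choice made here because the questionnaire and the models may not score on the same scale. Since a higher value is not always better (for example, Neuroticism), the sketch reports signed gaps instead of labelling them strengths or weaknesses.

```python
TRAITS = ["O", "C", "E", "A", "N"]


def compare_with_friends(user: dict[str, float],
                         friends: dict[str, dict[str, float]]) -> dict[str, dict[str, float]]:
    """For each friend and trait: friend's value minus the user's value.

    Positive numbers mean the friend scores higher on that trait.
    """
    return {name: {t: values[t] - user[t] for t in TRAITS}
            for name, values in friends.items()}


def largest_gap(diff: dict[str, float]) -> str:
    """Trait on which the user and a friend differ the most (in absolute terms)."""
    return max(diff, key=lambda t: abs(diff[t]))


if __name__ == "__main__":
    # Hypothetical percentiles: the user's from the questionnaire, the friend's from the models.
    user = {"O": 60, "C": 40, "E": 70, "A": 50, "N": 30}
    friends = {"friend_1": {"O": 80, "C": 40, "E": 40, "A": 55, "N": 30}}
    diffs = compare_with_friends(user, friends)
    print(diffs)  # {'friend_1': {'O': 20, 'C': 0, 'E': -30, 'A': 5, 'N': 0}}
    print(largest_gap(diffs["friend_1"]))  # E
```

The same per-trait dictionaries can feed a radar graph directly, one polygon for the user and one per friend.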