Measuring Happiness using Random Forest *

Utsav Kumar
utsavkumar24x7@gmail.com
Nishank Deep
nishankdeep@gmail.com

14 April 2023

Abstract

This research report aims to measure happiness through a dataset consisting of various features such as age, gender, freedom of life choices, family environment, friends circle, love life, working environment, achieved goals, finance, health, optimism, self-growth, self-perception, level of corruption, and scale of happiness. The dataset was collected through a survey conducted among a diverse group of individuals from different backgrounds. The study utilized statistical analysis techniques such as correlation analysis, regression analysis, and machine learning algorithms to identify the factors that contribute to happiness. The results showed that several factors such as family environment, love life, financial stability, and level of optimism had a significant impact on happiness. Moreover, the study found that the level of corruption in society negatively affects the happiness of individuals.

1 Introduction

Happiness is a subjective measure that can be influenced by various factors in an individual’s life. To measure happiness, it is essential to consider different features of an individual’s life, including personal, social, and professional aspects. In this research report, we have created a dataset of personal features that can be used to measure an individual’s happiness level. The dataset includes various features such as age, gender, family environment, friends circle, love life, working environment, finance, health, optimism, self-growth, self-perception, level of corruption, and freedom of life choices.

2 Literature review

The concept of happiness has been a subject of interest and inquiry across various fields, including psychology, economics, philosophy, and sociology. Numerous studies have been conducted to examine the factors that influence happiness and well-being.
In recent years, there has been an increasing focus on the role of individual characteristics and life circumstances in shaping happiness levels. This report aims to contribute to this body of knowledge by examining a dataset that includes various features related to personal and social factors that may impact happiness.

Previous research on happiness has identified several key determinants, including freedom of choice, social support, work satisfaction, income, health, and optimism. A study by Diener and Seligman (2002) found that subjective well-being is positively correlated with income, social relationships, and physical health. Another study by Lyubomirsky, Sheldon, and Schkade (2005) found that happiness levels can be improved through intentional and sustainable efforts, such as practicing gratitude, developing positive relationships, and engaging in meaningful activities.

Overall, the literature suggests that happiness is a complex and multifaceted construct that is influenced by a range of personal and social factors. This report builds on that work by analyzing individual characteristics, life circumstances, and social factors together in a single dataset.

* Machine Learning Project

3 Methodology

The first step is to collect data on the features that are believed to impact happiness: name, age, gender, freedom of life choices, family environment, friends circle, love life, working environment, goal achievement, finance, health, optimism, self-growth, self-perception, and level of corruption, along with the reported scale of happiness.
Each feature is defined as follows:

• Name: name of the individual
• Age: age of the individual
• Gender: gender of the individual
• Freedom of life choices: a score from 1 to 3 indicating how much freedom the individual has to make choices in life
• Family environment: a score from 1 to 3 indicating how supportive and positive the family environment is
• Friends circle: a score from 1 to 3 indicating the quality of the individual’s social circle
• Love life: a score from 1 to 3 indicating the individual’s satisfaction with their love life
• Working environment: a score from 1 to 3 indicating how positive and fulfilling the individual’s work environment is
• Have you achieved your goal: a score from 1 to 3 indicating the extent to which the individual feels they have achieved their goals in life
• Finance: a score from 1 to 3 indicating the individual’s financial stability and security
• Health: a score from 1 to 3 indicating the individual’s physical and mental health
• Optimism: a score from 1 to 3 indicating the individual’s level of optimism about their future
• Self growth: a score from 1 to 3 indicating the individual’s efforts and progress in personal growth and development
• Self perception: a score from 1 to 3 indicating the individual’s self-perception and self-esteem
• Level of corruption: a score from 1 to 3 indicating the extent to which corruption is prevalent in the individual’s environment
• Scale of happiness: a score from 1 to 3 indicating the individual’s level of overall happiness

On every 1 to 3 scale, the values mean:

• 1: Good
• 2: Average
• 3: Bad

A lower score therefore indicates a better outcome.

Once the data has been collected, it is cleaned to remove missing values and outliers, so that incomplete or invalid responses do not distort the analysis.

Feature engineering involves selecting the most relevant features and transforming them into a format that can be used for analysis. This may involve scaling the data or transforming categorical variables into numerical variables.
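The cleaning and encoding steps can be illustrated with a minimal sketch. It is not the code used in this study: the field names, the gender codes, and the four sample rows are hypothetical, and only a subset of the survey's score columns is shown.

```python
# Hypothetical field names; the survey's real headers may differ.
SCORE_FIELDS = ["freedom", "family", "finance", "optimism", "happiness"]
# Assumed numeric encoding for the categorical gender column.
GENDER_CODES = {"male": 0, "female": 1}
OTHER_GENDER = 2  # used for any answer not listed above


def clean_rows(rows):
    """Return numeric records, skipping rows with missing or invalid answers."""
    cleaned = []
    for row in rows:
        scores = [row.get(field) for field in SCORE_FIELDS]
        # Drop rows with a missing answer or a score outside the 1-3 scale.
        if any(score not in (1, 2, 3) for score in scores):
            continue
        if row.get("age") is None:
            continue
        gender_key = str(row.get("gender", "")).strip().lower()
        record = {
            "age": int(row["age"]),
            "gender": GENDER_CODES.get(gender_key, OTHER_GENDER),
        }
        record.update(zip(SCORE_FIELDS, scores))
        # "name" is left out on purpose: it identifies a person, it does not
        # describe them.
        cleaned.append(record)
    return cleaned


survey = [
    {"name": "A", "age": 21, "gender": "Male", "freedom": 1, "family": 1,
     "finance": 2, "optimism": 1, "happiness": 1},
    {"name": "B", "age": 34, "gender": "Female", "freedom": 2, "family": 1,
     "finance": 3, "optimism": 2, "happiness": 2},
    {"name": "C", "age": 45, "gender": "Female", "freedom": None, "family": 2,
     "finance": 1, "optimism": 2, "happiness": 2},  # missing answer
    {"name": "D", "age": 29, "gender": "Male", "freedom": 3, "family": 3,
     "finance": 7, "optimism": 3, "happiness": 3},  # 7 is off the scale
]
clean = clean_rows(survey)
print(len(clean), "of", len(survey), "rows kept")
```

Dropping invalid rows is only one possible policy; imputing a missing score with the column's most common value would keep more respondents at the cost of some added noise.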
The next step is to analyze the data to identify patterns or correlations between the features and the scale of happiness. This may involve statistical analysis or machine learning algorithms to identify the most important features.

Once the relevant features have been identified, a model can be built to predict the scale of happiness from them. This may involve regression models or machine learning algorithms.

The final step is to evaluate the performance of the model to ensure that it is accurate and reliable, using metrics such as mean squared error for regression models or accuracy for classifiers.

4 Results

Our results showed that random forest was effective in predicting the scale of happiness. The model achieved an accuracy of 0.85, precision of 0.84, recall of 0.83, and an F1-score of 0.83; the classification report is given in Figure 3. The confusion matrix (Figure 2) showed that the model correctly classified most individuals into their levels of happiness, with few misclassifications.

We also compared the performance of random forest with other classification algorithms, namely support vector machines and decision trees. Random forest performed better than both in terms of accuracy and F1-score.

5 Discussion and Conclusions

In conclusion, the dataset of personal features we have created can be used to measure an individual’s happiness level. The analysis shows that personal, social, and professional aspects of an individual’s life can have a significant impact on their happiness level. The findings can be used to develop policies and interventions that improve an individual’s happiness level by addressing the factors that have a negative correlation with happiness, such as corruption and stressful working environments. Overall, the dataset and analysis provide a useful tool for understanding happiness and developing strategies to enhance it.
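The model-building, evaluation, and comparison steps described in the Methodology and Results could be reproduced along the following lines. This is a sketch under stated assumptions, not the code or data behind the reported numbers: it assumes scikit-learn and NumPy are available, replaces the survey with synthetic scores, and uses an invented rule to generate the happiness label, so its output will not match the figures reported above.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the survey: 300 respondents, 12 features scored 1-3.
rng = np.random.default_rng(0)
X = rng.integers(1, 4, size=(300, 12))
# Invented toy rule: the label depends on the sum of the first four scores
# (sum < 8 -> 1, sum == 8 -> 2, sum > 8 -> 3); the other columns are noise.
y = np.digitize(X[:, :4].sum(axis=1), [8, 9]) + 1

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

models = {
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "decision tree": DecisionTreeClassifier(random_state=0),
    "support vector machine": SVC(),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predicted = model.predict(X_test)
    scores[name] = {
        "accuracy": accuracy_score(y_test, predicted),
        "macro_f1": f1_score(y_test, predicted, average="macro"),
    }
    print(f"{name}: {scores[name]}")

forest = models["random forest"]
# Rows are true classes 1-3, columns are predicted classes 1-3.
matrix = confusion_matrix(y_test, forest.predict(X_test), labels=[1, 2, 3])
print(matrix)
# Impurity-based importances hint at which features drive the predictions.
importances = forest.feature_importances_
print(importances.round(3))
```

The macro-averaged F1-score weights the three happiness levels equally, which matters when one level is much rarer than the others; whether the study used macro or weighted averaging is not stated.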
Figure 1: Correlation relationship between each vertex

Figure 2: Confusion matrix

Figure 3: Classification Report