ML for Global Health - ICML Workshop 2020

REVIEW OF FEATURE IMPORTANCE FOR MENTAL HEALTH INTERVENTIONS

Rhythm Bhatia
University of Eastern Finland
rbhatia@student.uef.fi

EXTENDED ABSTRACT

Mental health affects our emotional, social and psychological well-being. It is an integral part of a balanced and healthy life. For working professionals, especially doctors, it often seems that long working hours and heavy workloads affect mental health. The emotional well-being of doctors is a matter of concern not only for them, but also for us [11]. Because research and data on the factors affecting mental health are scarce, this study uses a publicly available dataset collected through a survey of technology workers [1].

Data source

The data for this feature analysis was hosted by Kaggle.com and was collected by Open Sourcing Mental Illness, Ltd under an open-source Creative Commons license (CC BY-SA 4.0). The dataset consists of responses to a subset of the questions prepared for the OSMI Mental Health in Tech survey. The respondents work in technology firms across multiple countries. Data for this survey was collected from August 2014 to February 2016 [1].

Variables

Feature analysis was performed on the participant characteristic variables from the survey, defined below:

| Variable Name | Definition |
|---|---|
| Age | Age of participant. |
| Gender | Gender of participant. |
| Family history | History of mental illness in participant's family. |
| Benefits | Mental health benefits plan provided by participant's employer. |
| Care option | Awareness of mental health benefits plan provided by participant's employer. |
| Anonymity | Option to protect identity if participant chooses to take advantage of mental health resources. |
| Leave | How easy is it for participant to take medical leave from work for a mental health condition? |
| Work interfere | If participant has a mental health condition, does he/she feel that it interferes with their work? |
| Mental health consequence | Are there negative consequences of discussing a mental health issue with your employer? |
| Physical health consequence | Are there negative consequences of discussing a physical health issue with your employer? |

Table 1: Variable definitions

All binary variables with the values "yes" or "no" were coded numerically as 0 or 1. 'Age', in years, was left as is. 'Work interfere' and 'Leave' were coded into ordinal bins consisting of never, rarely, sometimes and often. These ordinal measures were coded in increments of 1, with the lowest level having the value 0. The gender variable was coded with females equal to 1 and males equal to 0. Only 172 of the 1,251 responses were used for analysis, due to the uncertainty of the survey responses from the participants [9].

Feature analysis

| Feature | Linear Regression | Logistic Regression | Permutation importance | Xgboost | χ² p-value |
|---|---|---|---|---|---|
| Age | 0.10650 | 0.65304 | -0.00270 | 0.05079 | NaN |
| Gender | -0.08078 | -0.51353 | 0.00016 | 0.05566 | 0.23919 |
| Family history | 0.18267 | 1.13361 | 0.00955 | 0.08413 | 0.153238 |
| Benefits | 0.04342 | 0.32557 | 0.02259 | 0.06588 | 8.01e-105 |
| Care options | 0.04210 | 0.27556 | 0.01671 | 0.05849 | 4.6075e-53 |
| Anonymity | 0.02377 | 0.15819 | 0.02355 | 0.06305 | 3.7375e-36 |
| Leave | 0.00170 | 0.02007 | 0.02848 | 0.05782 | 5.8297e-16 |
| Work interfere | 0.16501 | 0.92513 | 0.26476 | 0.46566 | 6.764e-06 |
| Mental health consequences | -0.00500 | -0.03414 | 0.00414 | 0.05291 | 4.4497e-05 |
| Physical health consequences | -0.00020 | 0.00750 | -0.00095 | 0.04560 | 0.0145355 |

Table 2: Feature values based on the various methods

The above table reports the feature importance score of each variable with respect to the "seek help" variable. The analysis is implemented using the scikit-learn library [10] for linear regression [5], logistic regression [8], permutation importance [2], Xgboost [4] and the χ² test [7].

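The encoding rules and scoring methods described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in the study: the data is synthetic rather than the OSMI responses, the mapping of "no" to 0 and "yes" to 1 is assumed, the target `y` is a hypothetical stand-in for the "seek help" variable, and `sklearn.feature_selection.chi2` is assumed as the χ² routine. Xgboost is left out so that the sketch depends only on NumPy and scikit-learn.

```python
import numpy as np
from sklearn.feature_selection import chi2
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LinearRegression, LogisticRegression

# Synthetic stand-in data; these are NOT the OSMI survey responses.
rng = np.random.default_rng(0)
n = 200

# Ordinal bins named in the text, coded in increments of 1 from 0.
levels = ["never", "rarely", "sometimes", "often"]
ORDINAL = {level: code for code, level in enumerate(levels)}
# Assumed direction of the binary coding (the text only says "0 or 1").
YES_NO = {"no": 0, "yes": 1}

anonymity_raw = rng.choice(["no", "yes"], size=n)
work_interfere_raw = rng.choice(levels, size=n)
family_history_raw = rng.choice(["no", "yes"], size=n)

feature_names = ["anonymity", "work_interfere", "family_history"]
X = np.column_stack([
    [YES_NO[v] for v in anonymity_raw],
    [ORDINAL[v] for v in work_interfere_raw],
    [YES_NO[v] for v in family_history_raw],
]).astype(float)

# Hypothetical "seek help" target driven by the first two features plus noise.
logit = -1.5 + 1.2 * X[:, 0] + 0.6 * X[:, 1]
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(int)

# Coefficients of the two linear models serve as importance scores.
linear = LinearRegression().fit(X, y)
logistic = LogisticRegression().fit(X, y)

# Permutation importance: mean drop in accuracy when one column is shuffled.
perm = permutation_importance(logistic, X, y, n_repeats=20, random_state=0)

# Chi-squared statistic and p-value per feature (needs non-negative inputs).
chi2_stat, chi2_p = chi2(X, y)

scores = {
    "linear_coef": linear.coef_,
    "logistic_coef": logistic.coef_[0],
    "permutation": perm.importances_mean,
    "chi2_p": chi2_p,
}
for i, name in enumerate(feature_names):
    print(name, {k: round(float(v[i]), 5) for k, v in scores.items()})
```

Each row of the printed output corresponds to one row of a table like Table 2, with one score per method. Because `chi2` in scikit-learn requires non-negative feature values, it fits the 0/1 and ordinal codes used here; a continuous variable such as age would need binning first.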
Results

Based on the feature analysis and χ² test reported above, we observe that benefits, anonymity, leave, physical health consequences and mental health consequences are the relevant features. A small p-value (< 0.05) indicates that we can reject the null hypothesis that there is no relationship between the selected feature and the help-seeking behaviour of the participant, and conclude that the two variables are significantly related. The χ² test is not valid for age because some of the expected frequencies are less than 1. These results also suggest that there are negative consequences of discussing mental and physical health issues with the employer. Moreover, mental health benefits and care options alone do not motivate employees to seek help. However, if employees are given the option to remain anonymous and to take leave, this positively affects their willingness to use mental health services.

Future Work

These results can guide the selection of features for models that help identify and support medical professionals dealing with mental health issues due to the COVID-19 pandemic. The pandemic has put medical care professionals all over the world in an unusual situation in which they have to work under extreme pressure with scarce resources, take tough decisions and deal with life and death. They suffer from moral injury, a military term defined as the damage done to one's conscience by one's actions or the lack of them. This can contribute to mental health difficulties, including depression, post-traumatic stress disorder and suicide [6]. A mental health study conducted on medical staff in China also indicated significant behavioural changes in the staff attending to COVID-19 patients [3].

REFERENCES

[1] Kaggle dataset: https://www.kaggle.com/osmi/mental-health-in-tech-survey.
[2] André Altmann, Laura Toloşi, Oliver Sander, and Thomas Lengauer. Permutation importance: a corrected feature importance measure. Bioinformatics, 26(10):1340–1347, 2010.
[3] Qiongni Chen, Mining Liang, Yamin Li, Jincai Guo, Dongxue Fei, Ling Wang, Li He, Caihua Sheng, Yiwen Cai, Xiaojuan Li, et al. Mental health care for medical staff in China during the COVID-19 outbreak. The Lancet Psychiatry, 7(4):e15–e16, 2020.
[4] Tianqi Chen and Carlos Guestrin. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 785–794, 2016.
[5] David A Freedman. Statistical models: theory and practice. Cambridge University Press, 2009.
[6] Neil Greenberg, Mary Docherty, Sam Gnanapragasam, and Simon Wessely. Managing mental health challenges faced by healthcare workers during COVID-19 pandemic. BMJ, 368, 2020.
[7] Priscilla E Greenwood and Michael S Nikulin. A guide to chi-squared testing, volume 280. John Wiley & Sons, 1996.
[8] David G Kleinbaum, K Dietz, M Gail, Mitchel Klein, and Mitchell Klein. Logistic regression. Springer, 2002.
[9] Pratik Patel. Perceived workplace factors and their influence on self-reported mental health service seeking among technology workers, 2018.
[10] Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, et al. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.
[11] Reidar Tyssen and Per Vaglum. Mental health problems among young doctors: an updated review of prospective studies. Harvard Review of Psychiatry, 10(3):154–165, 2002.