Introduction

Dementia is a nonreversible condition caused by degenerative changes in the brain that lead to loss of brain function with noticeable impact on a person’s life, affecting memory, thinking, language, judgment, and behavior. Mild cognitive impairment (MCI) is characterized by cognitive impairment beyond that of normal aging, but without other cognitive problems or functional deficits [2]. With MCI, instrumental activities of daily living (IADLs) should be mostly intact, and the person does not fully meet the criteria for dementia [2]. Identifying one of these cognitive impairments before it reaches moderate or advanced stages can be particularly useful, especially for someone who has the condition and intends to plan for their future.

Memory impairment alone can cost a person the ability to complete tasks autonomously, because it strongly affects the ability to complete IADLs [3]. IADLs can therefore serve as a basis for identifying a cognitive impairment such as dementia or MCI from patterns of behavior while a person engages in them.

At this time, detection of cognitive impairments is difficult because the conditions are often subtle in their early stages [2]. Research has shown that as many as 75% of cases of dementia and cognitive disorders go unnoticed or are diagnosed only in late stages [4, 5]. There remains much opportunity to improve the ability to detect and correctly diagnose cognitive impairments such as dementia and MCI.

The Testing Process

The testing process is broken into seven stages to explain a single run of the test data:
1. Raw Data – At this stage, the data has many missing values and unnecessary attributes, is spread across separate files, and is generally unusable.
2. Format Data – The data is combined from the given databases and converted to a format that Weka can use. Scripting is used for batch testing, so a separate pair of training and testing files must be created for each configuration.
3. Weka – After preprocessing, the data is loaded into Weka for the remaining steps.
4. Attribute Selection – The top 5, 10, 15, 20, or 25 attributes are selected based on the one-attribute evaluator (OneR) and the stage of testing, and the data is then split into training and testing files.
5. Training – The artificial neural network is trained using the training file created in the previous step.
6. Testing – The artificial neural network is tested on new data from the testing file created in Step 4, and the results are printed to a file.
• Repeat Steps 2 – 6 until all tests under the configuration are complete.
7. Analyze Results – The output file from the test script is analyzed for properties such as the best accuracy and the model structure that achieved it.

Results

[Figure: Accuracy. Classification accuracy (%) by number of top attributes for OneR - 6 Class, OneR - 4 Class, and OneR - 3 Class.]

# of Top Attributes   OneR - 6 Class (%)   OneR - 4 Class (%)   OneR - 3 Class (%)
5                     69.04761905          72.09302326          87.09677419
10                    71.42857143          76.74418605          80.64516129
15                    73.80952381          79.06976744          87.09677419
20                    73.80952381          76.74418605          87.09677419
25                    73.80952381          81.39534884          87.09677419

To better understand the results, the steps taken at each phase of testing are explained below:

OneR – 6 Class: The IADL responses were combined with the classification, age, education, and sex attributes in the second file based on ID numbers. The original database classification into seven categories was reduced to 6 by combining the categories “Other Medical” and “Watch Further” (6 and 7) into “Other”.

OneR – 4 Class: The 6-class dataset was reduced to 4 classes by combining all age categories (3, 4, and 5) into one “Healthy Aging” category.

OneR – 3 Class: The 4-class dataset was reduced to 3 classes by removing all subjects classified as “Other”, to focus the machine learning algorithm on the data it was intended to classify.
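The three class reductions described above can be sketched as a small relabeling step. This is only an illustration under stated assumptions: the function names are hypothetical, the merged “Other” class is assumed to keep code 6 and the merged “Healthy Aging” class code 3, and codes 1 and 2 are left unnamed because the text does not say which condition each denotes.

```python
# Sketch of the 7 -> 6 -> 4 -> 3 class reduction. Codes follow the text:
# 3, 4, 5 are the age categories; 6 and 7 are "Other Medical" and
# "Watch Further". Reusing 6 for "Other" and 3 for "Healthy Aging" is an
# assumption made for illustration, not the original preprocessing script.
OTHER = 6
HEALTHY_AGING = 3


def to_six_class(code):
    """Fold category 7 into category 6 so both become "Other"."""
    if not 1 <= code <= 7:
        raise ValueError(f"unexpected class code: {code}")
    return OTHER if code in (6, 7) else code


def to_four_class(code):
    """Apply the 6-class step, then merge the age categories 3, 4, 5."""
    six = to_six_class(code)
    return HEALTHY_AGING if six in (3, 4, 5) else six


def to_three_class(codes):
    """Apply the 4-class step, then drop every subject labeled "Other"."""
    reduced = [to_four_class(code) for code in codes]
    return [code for code in reduced if code != OTHER]


if __name__ == "__main__":
    original = [1, 2, 3, 4, 5, 6, 7]
    print([to_six_class(c) for c in original])
    print([to_four_class(c) for c in original])
    print(to_three_class(original))
```

In a real pipeline the 3-class step would drop whole subject records rather than bare labels; the list of codes here only stands in for those records.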
All test subjects that were missing data were removed from consideration.

Machine Learning

The goal of this research is to combine knowledge from the fields of Psychology and Computer Science and apply machine learning to detecting and diagnosing dementia and MCI. To do so, IADL data gathered from a questionnaire is used to develop a machine learning model. The model is then used to predict whether someone has a condition based on their responses. A machine learning based assessment has several advantages:
1. Machine learning allows computers to classify participants more accurately as more accurate training data becomes available.
2. The assessment can be administered quickly once the survey responses are in, as the data only needs to be run through the model.
3. A trained physician does not need to be present for the system to administer an assessment.
4. With enough data, the learning algorithms will become more efficient at finding the patterns in behavior that best indicate the proper classification.

Collecting Data

Instrumental activities of daily living are defined as the daily tasks that enable a patient to live independently in a community [6]. The questionnaire used to gather the data contained in the database was developed around these IADLs:
• Ability to use telephone
• Laundry
• Shopping
• Transportation
• Food Preparation
• Responsibility for own medications
• Housekeeping
• Ability to handle finances

The complete questionnaire contains 50 questions about these activities. The patient answers each on a scale of 1-10 by circling the number that corresponds to their level of independence in the activity, as seen in the scale below.
Response   Answer Interpretation
1          Independent; perform as well as ever; no reminder or aid used
2          Independent; perform as well as ever; use a reminder or aid to assist (e.g., to-do list; written notes; personal digital assistant; global positioning system in car)
3          Independent but not as well as ever
4          Somewhat independent; may require help/supervision/instruction
5          Rarely independent; typically requires help/supervision/instruction
6          No longer independent; needs help to complete activity
7          Cannot engage in activity anymore
8          Has never completed this activity
9          Environment does not require this activity to be completed
10         No basis for judgment

Surveys were taken by both the person being considered and a close caregiver, but only the responses from the subject with a possible condition are considered in this study. Responses were stored in their numeric format to be processed through Weka.

Machine Learning Model

As the graph and table show, accuracy improved as the classifications were narrowed down to the desired 3. Having more than 3 classes was misleading during training, and 3 classes is the most suitable for our goal. In the final, most accurate run, after being trained on only 124 subjects (80% of the usable dataset), the machine learning model correctly classified 87% of the 31 subjects (20% of the usable dataset) in the test file.

An artificial neural network was used for classification. In short, an artificial neural network is composed of interconnected artificial neurons. It was created to abstract the complexity of biological systems, focusing on information processing. Although modeled after the biological nervous system, the simpler artificial neural networks are adept at learning hidden patterns in data by inferring a function from observations, which makes them useful for classification and prediction based on a set of training data.
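To make the idea of interconnected artificial neurons concrete, the following is a minimal sketch of a forward pass through a small feed-forward network. The layer sizes, weights, sigmoid activation, and function names are all illustrative assumptions; this is not the trained model from this study or Weka's implementation.

```python
import math


def sigmoid(x):
    """Squash a weighted sum into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def layer_forward(inputs, weights, biases):
    """Each neuron takes a weighted sum of all inputs plus a bias,
    then applies the sigmoid activation."""
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + bias)
        for neuron_weights, bias in zip(weights, biases)
    ]


def network_forward(inputs, layers):
    """Feed the activations of each layer into the next one."""
    activations = inputs
    for weights, biases in layers:
        activations = layer_forward(activations, weights, biases)
    return activations


def predict(inputs, layers):
    """Return the index of the output node with the largest activation."""
    outputs = network_forward(inputs, layers)
    return max(range(len(outputs)), key=outputs.__getitem__)


# Hypothetical network: 2 inputs -> 2 hidden nodes -> 3 output nodes.
# The weights are made up; a real network learns them during training.
EXAMPLE_LAYERS = [
    ([[0.5, -0.5], [0.25, 0.75]], [0.0, 0.0]),
    ([[1.0, -1.0], [-1.0, 1.0], [0.5, 0.5]], [0.0, 0.0, -1.0]),
]

if __name__ == "__main__":
    sample = [1.0, 1.0]
    print(network_forward(sample, EXAMPLE_LAYERS))
    print(predict(sample, EXAMPLE_LAYERS))
```

Training adjusts the weights and biases so that, for each training example, the output node for the correct class ends up with the largest activation.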
For a more in-depth explanation of neural networks and their history, see David Kriesel’s A Brief Introduction to Neural Networks [7], available online.

The basic structure of a neural network contains an input layer, any number of hidden layers, and an output layer for classifications. An example can be seen below. In our case, there are 3 nodes on the output layer: one for dementia, one for MCI, and one for normal.

The testing script for the model exposes many parameters for the network structure and training time. At this time there is no good way to tell in advance which structure will produce the best results, so the script was written to test every network configuration with 1-2 hidden layers, each with 1-30 nodes. Each of these is tested with training times from 100 to 1000 epochs in intervals of 100. This amounts to around 9000 configurations for each number of top attributes in each stage of testing (counting 30 one-layer and 30 × 30 two-layer structures, each at 10 training times, gives 9,300). To get the percentage accuracy, the results for every configuration were written to a file and the best accuracy achieved was selected.

Conclusion

After extensive testing, machine learning shows the potential to be very successful as a means of detecting and classifying cognitive impairments through prediction. With an accurate base of data to train on, the model is able to achieve high accuracy, even with a small number of subjects. Given the high rates of cognitive impairments that go undetected, the result is promising, and the approach has other benefits such as the speed of the prediction. The predictive accuracy should grow as more accurate data becomes available to train on. If further testing continues to produce high accuracy, something as simple as a survey, given to all patients and run through a machine learning model, could help lower the rate of patients going undiagnosed.

References & Acknowledgements

[1] http://www.ncbi.nlm.nih.gov/pubmedhealth/PMH0001748/
[2] Holsinger, T., Deveau, J., Boustani, M., Williams, J. W. 2007. Does this patient have dementia? JAMA 297, 2391–2404.
[3] Schmitter-Edgecombe, M., Woo, E., Greeley, D. R. 2009. Characterizing Multiple Memory Deficits and Their Relation to Everyday Functioning in Individuals With Mild Cognitive Impairment. Neuropsychology, vol. 23, issue 2, 168–177.
[4] Hodges, M. R., Kirsch, K., Newman, M. W., Pollack, M. E. 2010. Automatic Assessment of Cognitive Impairment through Electronic Observation of Object Usage. Pervasive 2010, 192–209.
[5] Dawadi, P., Cook, D., Parsey, C., Schmitter-Edgecombe, M., Schneider, M. 2011. An Approach to Cognitive Assessment in Smart Home. Washington State University, School of Electrical Engineering and Computer Science.
[6] Bookman, A., Harrington, M., Pass, L., Reisner, E. 2007. Family caregiver handbook: Finding elder care resources in Massachusetts. Cambridge, MA: Massachusetts Institute of Technology.
[7] Kriesel, D. 2007. A Brief Introduction to Neural Networks. Available at http://www.dkriesel.com

This work was supported by the National Science Foundation’s REU program under grant number IIS-0647705. Special thanks to Maureen Schmitter-Edgecombe for gathering and providing the data for this study, Teddy Yap for helping guide the project to completion, and WSU EECS for use of facilities and resources.