Document 12005206

advertisement
The Testing Process
Dementia is a nonreversible condition caused by degenerative changes in the
brain that lead to loss of brain function with noticeable impact on a person’s life, affecting
memory, thinking, language, judgment, and behavior. A mild cognitive impairment (MCI)
is characterized by a cognitive impairment beyond that of normal aging, but without other
cognitive problems or functional deficits [2]. With MCI, instrumental activities of daily
living (IADLs) should mostly be intact and they cannot fully meet the criteria for dementia
[2]. Identifying one of these cognitive impairments before it reaches moderate or
advanced stages can be particularly useful, especially for someone who has the
condition and intends to plan for their future. Memory impairment alone can lead to one’s
loss of ability to autonomously complete tasks as it has a strong impact on the ability of a
person to complete IADLs [3]. Knowing this, IADLs can be used as a basis for identifying
a cognitive impairment such as dementia or a MCI based on patterns of behavior while
actively engaging in IADLs.
At this time, detection of cognitive impairments is difficult since the conditions are
often subtle in early stages [2]. Research has shown that as many as 75% of cases of
dementia and cognitive disorders go unnoticed or become diagnosed in late stages [4,
5]. There remains much opportunity to improve the ability to detect and correctly
diagnose cognitive impairments such as dementia and MCIs.
The testing process is broken into seven stages to explain a single run of the test data:
1. Raw Data – at this stage, the data has many missing values, unnecessary
attributes, is contained in separate files, and is generally unusable.
2. Format Data – the data is combined from given databases and formatted to a usable
type for Weka. Scripting is being utilized to do batch testing, so separate pairs of
test and train files must be created for each configuration.
3. Weka – After preprocessing, data is loaded into Weka for the remaining steps.
4. Attribute Selection – The top 5, 10, 15, 20, or 25 attributes are selected based on
the one-attribute evaluator (OneR) and the stage of testing, and then broken into
training and testing files.
5. Training – The artificial neural network is trained using the training file created in the
previous step.
6. Testing – The artificial neural network is tested on new data from the testing file
created in Step 4 and the results are printed to a file.
• Repeat Steps 2 – 6 until all tests under configuration are complete
7. Analyze Results – The output file from running the test script is analyzed for
attributes such as best accuracy and model structure to achieve it.
Results
Accuracy
Classification Accuracy (%)
Introduction
100
90
80
70
60
50
40
30
20
10
0
OneR - 6 Class
OneR - 4 Class
OneR - 3 Class
5
Top Attributes
10
15
20
# of Top Attributes
OneR - 6 Class (%)
OneR - 4 Class (%)
25
OneR - 3 Class (%)
5
69.04761905
72.09302326
87.09677419
10
71.42857143
76.74418605
80.64516129
15
73.80952381
79.06976744
87.09677419
20
73.80952381
76.74418605
87.09677419
25
73.80952381
81.39534884
87.09677419
To better understand the results, the steps taken at each phase of testing will be explained:
IADLs
Psychology Questionnaire
OneR – 6 Class: The IADL responses were combined with the classification, age, education, and sex
attributes in the second file based on ID numbers. The original database classification into seven
categories was reduced to 6, combining the categories for “Other Medical” and “Watch Further” into
“Other”(6 and 7).
OneR – 4 Class: The 6 class dataset was reduced to 4 classes, combining all age categories into one
“Healthy Aging” category (3,4, and 5).
OneR – 3 Class: The 4 class dataset was reduced to 3 classes, removing all subjects classified as
“Other” to focus the machine learning algorithm on the data it was intended to classify. All test subjects
that were missing data were removed from consideration.
Machine Learning
The goal of this research is to utilize knowledge from the fields of Psychology and
Computer Science to utilize machine learning for diagnosing and detecting dementia and
MCIs. This is achieved with data about IADLs gathered from a questionnaire to develop
a machine learning model based on the dataset. The model is then used to predict
whether someone has a condition or not based on their responses. There are several
advantages of a machine learning based assessment:
1. Machine learning technology allows computers to more accurately classify
participants as more accurate data becomes available to train on.
2. The assessment could be administered quickly once the survey responses are in, as
the data only needs to be run through the model.
3. A trained physician does not need to be present for the system to administer an
assessment.
4. With enough data, the learning algorithms will be more efficient at determining
patterns in behavior that better indicate proper classification.
Collecting Data
Interactive activities of daily living are defined as the daily tasks that enable a
patient to live independently in a community [6]. The questionnaire used to gather the
data contained in the database was developed around these IADLs:
• Ability to use telephone • Laundry
• Shopping
• Transportation
• Food Preparation
• Responsibility for own medications
• Housekeeping
• Ability to handle finances
The complete questionnaire contains 50 questions about these activities where the
patient can answer on a scale of 1-10 by circling the number that corresponds to their
level of independence regarding the activity in the question, as seen in the scale below.
Response
1
2
3
4
5
6
7
8
9
10
Answer Interpretation
Independent; perform as well as ever; no reminder or aid used
Independent; perform as well as ever; use a reminder or aid to assist (e.g., to-do list; written
notes; personal digital assistant; global positioning system in car)
Independent but not as well as ever
Somewhat independent; may require help/supervision/instruction
Rarely independent; typically requires help/supervision/instruction
No longer independent; needs help to complete activity
Cannot engage in activity anymore
Has never completed this activity
Environment does not require this activity to be completed
No basis for judgment
Surveys were taken by both the person being considered and a close caregiver, but only
the responses from the subject with a possible condition are considered for this study.
Responses were stored in their numeric format to be processed through Weka.
Machine Learning Model
As the graph and table show, accuracy improved as the classifications were narrowed down to
the desired 3. More than 3 classes was misleading during training, and having 3 is the most suitable for
our goal. In the final, most accurate run, after being trained on only 124 subjects (80% of the usable
dataset), the machine learning model was able to correctly classify 87% of the 31 subjects (20% of the
usable dataset) in the test file
To achieve machine learning, an artificial neural network was used for
classification. In short, an artificial neural network is composed of interconnecting
artificial neurons. It was created to abstract the complexity of biological systems,
focusing on information processing. While being modeled after the biological nervous
system, simpler artificial neural networks are adept at learning hidden patterns in data
by inferring a function from observations, making them useful for classification and
prediction based from a set of learning data. For a more in depth explanation of neural
networks and their history, see David Kriesel’s A Brief Introduction to Neural Networks
[7], available online.
The basic structure of a neural network contains an input layer, any number of
hidden layers, and an output layer for classifications. An example can be seen below. In
our case, there will be 3 nodes on the output layer: one for dementia, one for MCI, and
one for normal.
In the testing script for the model,
there are many parameters for the structure
and training time that can be changed. At
this time there is no good way to tell which
structure will produce the best results, so
the script was written to test every
configuration of networks from 1-2 hidden
layers, each with 1-30 nodes. Each of these
is tested with training times from 100 to
1000 epochs in intervals of 100. This
amounts to around 9000 configurations
tested for each number of top attributes in
each stage of testing. To get the percentage
accuracy, all the results per configuration
were written to a file and the best accuracy
achieved was selected.
[1] http://www.ncbi.nlm.nih.gov/pubmedhealth/PMH0001748/
[2] Holsinger, T., Deveau, J., Boustani, M. , Williams, J. W. 2007. Does this patient have dementia?
JAMA 297 2391–2404.
[3] Maureen Schmitter-Edgecombe, Ellen Woo, David R. Greeley. 2009. Characterizing Multiple
Memory Deficits and Their Relation to Everyday Functioning in Individuals With Mild Cognitive
Impairment. Neuropsychology, vol. 23, Issue 2, 168-177.
[4] Hodges M. R.,Kirsch K., Newman M. W., Pollack M. E. 2010. Automatic Assessment of
Cognitive Impairment through Electronic Observation of Object Usage. Pervasive 2010, 192-209.
[5] Prafulla Dawadi, Diane Cook, Carolyn Parsey, Maureen Scmitter-Edgecombe, Miya Schneider.
2011. An Approach to Cognitive Assessment in Smart Home. Washington State University, School
of Electrical Engineering and Computer Science.
[6] Bookman, A., Harrington, M., Pass, L., & Reisner, E. 2007. Family caregiver handbook: Finding
elder care resources in Massachusetts. Cambridge, MA: Massachusetts Institute of Technology.
[7] David Kriesel, 2007, A Brief Introduction to Neural Networks, available at
http://www.dkriesel.com
This work was supported by the National Science Foundation’s REU program
under grant number IIS-0647705.
Special thanks to Maureen Scmitter-Edgecombe for gathering and providing the data for this study,
Teddy Yap for helping guide the project to completion, and WSU EECS for use of facilities and
resources.
Conclusion
After extensive testing, it can be seen that using machine learning as a means of
detecting and classifying cognitive impairments through prediction has the potential to be
very successful. With an accurate base of data to train off of, the model is able to achieve
high accuracy, even with small amounts of people. With high rates of cognitive
impairments going undetected, the result is promising, also having other benefits such as
the speed of the prediction. The prediction power will only grow in accuracy as more
accurate data is available to train on. If further testing continues to obtain high accuracy,
this could help lower the rates of patients going undiagnosed if something as simple as a
survey was implemented and run through a machine learning model for all patients.
References & Acknowlegements
Download