Big data analytics and automated decision making in medicine and

advertisement
Introduction
Materials and methods
Big data analytics and automated decision
making in medicine and healthcare
Computer Aided Diagnosis
Dr. Oliver Faust1
1 School
of Science & Engineerring
Habib University
2014
Dr. Faust
Computer Aided Diagnosis
Introduction
Materials and methods
Outline
1
Introduction
Problem: Information overload
Solution: Big data analytics
2
Materials and methods
A methodology for data analytics
Conclusions
Dr. Faust
Computer Aided Diagnosis
Introduction
Materials and methods
Problem: Information overload
Solution: Big data analytics
The case for big data
Science and observation
Analytical science observes nature and analyzes the data from
these observations.
Medicine and biomedical engineering
Far from being an exception, biomedical data acquisition is on
the forefront in the quest for more data.
It strives towards extracting relevant information that could
accurately diagnose a disease or predict its prognosis.
Dr. Faust
Computer Aided Diagnosis
Introduction
Materials and methods
Problem: Information overload
Solution: Big data analytics
Big, as in more and better, data
More data sources
Medical facilities are equipped with new scanners while
older once are still used in parallel.
Mobile physiological monitoring equipment. For example a
mobile phone with a health application.
Better data quality
Higher resolution.
Specialized targeted applications.
Combination of different signal acquisition techniques.
Dr. Faust
Computer Aided Diagnosis
Introduction
Materials and methods
Problem: Information overload
Solution: Big data analytics
Example: X-ray evolution
Data size:
A simple Digital
Radiography (DR) such
as a chest X-ray takes up
about 30 MB (Megabyte).
In Computed Tomography
(CT), a full study takes
more than one GB
(Gigabyte).
PET-CT combines CT
with about 5 MB of
Positron Emission
Tomography (PET) data.
Dr. Faust
Computer Aided Diagnosis
Introduction
Materials and methods
Problem: Information overload
Solution: Big data analytics
Result: Big data
Medical data is
Big
In 2008 it has been estimated that all kinds of medical images
occupied 1250 Petabytes.
Truly very big
In comparison, the giant Internet search company Google used
an estimated disk space of 20 to 200 Petabytes in August 2009.
Dr. Faust
Computer Aided Diagnosis
Introduction
Materials and methods
Problem: Information overload
Solution: Big data analytics
Problem: Information overload
Medical practitioners face:
Fatigue;
Lag of attention;
Optical illusions;
Stress:
Time pressure;
Decision pressure;
Responsibility.
Dr. Faust
Computer Aided Diagnosis
Introduction
Materials and methods
Problem: Information overload
Solution: Big data analytics
Solution: Big data analytics
Analytics for practitioner support
Highlight important parts of the data to support the practitioner.
+ Low system complexity;
+ Large margin of error – the brain is fault tolerant;
Human decision making;
– Lag of traceability.
Computer aided diagnosis
+ Traceability;
+ Automation – riding Moors law;
– Small margin of error;
– High system complexity.
Dr. Faust
Computer Aided Diagnosis
Introduction
Materials and methods
A methodology for data analytics
Conclusions
Diseases
Need definition
Detect symptoms of a disease in data.
These diseases can be:
Cardiovascular Disease – The killer:
Heart attack;
Stroke.
Diabetes – Paving the way for the killer diseases:
Diabetic retinopathy – A leading cause of blindness;
Diabetic foot – Walking becomes painful.
Epilepsy – Thunderstorm in the brain.
Dr. Faust
Computer Aided Diagnosis
Introduction
Materials and methods
A methodology for data analytics
Conclusions
What is the nature of the problem?
Unsupervised
There is no previous knowledge on the data. A possible
approach is:
Find structure in the data;
Characterize the structure.
Supervised
Medical data, such as body temperature, is used for disease
diagnosis since Milena.
Feature extraction;
Automated decision making.
Dr. Faust
Computer Aided Diagnosis
Introduction
Materials and methods
A methodology for data analytics
Conclusions
Blockdiagram
Data
Offline
Online
Pre-processing
Pre-processing
Feature
extraction
Feature
assessment
Best
Features
Calculation of
selected features
Training
Online
classification
Offline
classification
Classification
assessment
Information
Dr. Faust
Computer Aided Diagnosis
Introduction
Materials and methods
A methodology for data analytics
Conclusions
Data
One dimensional signals:
Electrocardiogram [9];
Heart Rate [3, 17, 15, 18, 9, 8, 14];
Electroencephalogram [7, 10, 12, 10, 13, 6].
Two dimensional signals:
Ultrasound [2, 4, 1, 5];
X-ray and mammography;
Thermography;
Fundus images of the retina [11, 16].
Dr. Faust
Computer Aided Diagnosis
Introduction
Materials and methods
A methodology for data analytics
Conclusions
Fundus image with signs of Diabetes retinopathy
Dr. Faust
Computer Aided Diagnosis
Introduction
Materials and methods
A methodology for data analytics
Conclusions
Ultrasound images of Carotid plaque
Dr. Faust
Computer Aided Diagnosis
Introduction
Materials and methods
A methodology for data analytics
Conclusions
Electroencephalogram
Dr. Faust
Computer Aided Diagnosis
Introduction
Materials and methods
A methodology for data analytics
Conclusions
Blockdiagram
Data
Offline
Online
Pre-processing
Pre-processing
Feature
extraction
Feature
assessment
Best
Features
Calculation of
selected features
Training
Online
classification
Offline
classification
Classification
assessment
Information
Dr. Faust
Computer Aided Diagnosis
Introduction
Materials and methods
A methodology for data analytics
Conclusions
Preprocessing
Dr. Faust
Computer Aided Diagnosis
Introduction
Materials and methods
A methodology for data analytics
Conclusions
Blockdiagram
Data
Offline
Online
Pre-processing
Pre-processing
Feature
extraction
Feature
assessment
Best
Features
Calculation of
selected features
Training
Online
classification
Offline
classification
Classification
assessment
Information
Dr. Faust
Computer Aided Diagnosis
Introduction
Materials and methods
A methodology for data analytics
Conclusions
Feature extraction
The aim of feature extraction is to extract significant measures
from the preprocessed physiological measurements. In
general, there are two different types of feature extraction:
Linear Features;
Nonlinear Features.
Dr. Faust
Computer Aided Diagnosis
Introduction
Materials and methods
A methodology for data analytics
Conclusions
Linear Features
Algorithms used for linear feature extraction:
Statistical – Mean Variance;
Fourier transform coefficients;
Discrete wavelet coefficients;
Entropy measures;
Principal Component Analysis;
Recurrence plots.
Dr. Faust
Computer Aided Diagnosis
Introduction
Materials and methods
A methodology for data analytics
Conclusions
Linear Feature example: Fourier transform
Dr. Faust
Computer Aided Diagnosis
Introduction
Materials and methods
A methodology for data analytics
Conclusions
Nonlinear Features
Algorithms used for nonlinear feature extraction:
Largest Lupanov Exponent (LLE);
Correlation Dimension (CD);
Hurst exponent (H);
Approximate entropy (ApEn);
Fractal dimension (FD);
Phase space plot;
Recurrence plots (RP).
Dr. Faust
Computer Aided Diagnosis
Introduction
Materials and methods
A methodology for data analytics
Conclusions
Electroencephalogram of different sleep stages
Dr. Faust
Computer Aided Diagnosis
Introduction
Materials and methods
A methodology for data analytics
Conclusions
Electroencephalogram of different sleep stages
Dr. Faust
Computer Aided Diagnosis
Introduction
Materials and methods
A methodology for data analytics
Conclusions
Electroencephalogram of different sleep stages
Dr. Faust
Computer Aided Diagnosis
Introduction
Materials and methods
A methodology for data analytics
Conclusions
Electroencephalogram of different sleep stages
Dr. Faust
Computer Aided Diagnosis
Introduction
Materials and methods
A methodology for data analytics
Conclusions
Electroencephalogram of different sleep stages
Dr. Faust
Computer Aided Diagnosis
Introduction
Materials and methods
A methodology for data analytics
Conclusions
Electroencephalogram of different sleep stages
Dr. Faust
Computer Aided Diagnosis
Introduction
Materials and methods
A methodology for data analytics
Conclusions
Blockdiagram
Data
Offline
Online
Pre-processing
Pre-processing
Feature
extraction
Feature
assessment
Best
Features
Calculation of
selected features
Training
Online
classification
Offline
classification
Classification
assessment
Information
Dr. Faust
Computer Aided Diagnosis
Introduction
Materials and methods
A methodology for data analytics
Conclusions
Feature assessment and feature selection
Analysis Of Variance (ANOVA)
ANOVA test yields the so called p-value, which provide a
statistical measure of the feature quality.
Dr. Faust
Computer Aided Diagnosis
Introduction
Materials and methods
A methodology for data analytics
Conclusions
Blockdiagram
Data
Offline
Online
Pre-processing
Pre-processing
Feature
extraction
Feature
assessment
Best
Features
Calculation of
selected features
Training
Online
classification
Offline
classification
Classification
assessment
Information
Dr. Faust
Computer Aided Diagnosis
Introduction
Materials and methods
A methodology for data analytics
Conclusions
Classification
The aim of classification is to obtain an objective decision on
the underlying data. For computer aided diagnosis systems,
this objective decision is concerned with whether or not the
underlying data comes from a diseased person, and if a
disease was detected a corollary decision is needed on the
nature of the disease.
Suppoert Vector Machine (SVM);
ADAboost (Adaptive Boosting);
Decision Tree (DT);
K-Nearest Neighbout (KNN);
Artificial Neural Network (ANN);
Probabilistic Neural Network (PNN).
Dr. Faust
Computer Aided Diagnosis
Introduction
Materials and methods
A methodology for data analytics
Conclusions
Blockdiagram
Data
Offline
Online
Pre-processing
Pre-processing
Feature
extraction
Feature assessment
Best
Features
Calculation of
selected features
Training
Online classification
Offline classification
Classification
assessment
Information
Dr. Faust
Computer Aided Diagnosis
Introduction
Materials and methods
A methodology for data analytics
Conclusions
Classification assessment
This is the most important classification criteria, because it
determines the quality of the analysis system.
Accuracy;
Sensitivity;
Specificity;
Positive Predictive Value;
Confusion Matrix;
Receiver operating characteristic.
Dr. Faust
Computer Aided Diagnosis
Introduction
Materials and methods
A methodology for data analytics
Conclusions
Cross validation electrocardiogram example
Dr. Faust
Computer Aided Diagnosis
Introduction
Materials and methods
A methodology for data analytics
Conclusions
Conclusion, off-line system
Find a suitable algorithm structure through off-line processing
and modeling.
Know what you are looking for – Define the need;
Make the data accessible through preprocessing;
Extract features, many more than needed;
Test the features and select the best features;
Test automated decision algorithms, select the best
algorithm.
Dr. Faust
Computer Aided Diagnosis
Introduction
Materials and methods
A methodology for data analytics
Conclusions
Conclusion, on-line system
Implement the on-line system to benefit the target audience.
Clean the data through pre-processing;
Extract the best features;
Employ the most accurate classification algorithm to get
the best decision support;
Visualize the decision support result;
Distribute the decision support result.
Dr. Faust
Computer Aided Diagnosis
Introduction
Materials and methods
A methodology for data analytics
Conclusions
References I
[1]
Rajendra U Acharya, Oliver Faust, APC Alvin,
S Vinitha Sree, Filippo Molinari, Luca Saba,
Andrew Nicolaides, and Jasjit S Suri. “Symptomatic vs.
asymptomatic plaque classification in carotid ultrasound”.
In: Journal of medical systems 36.3 (2012),
pp. 1861–1871.
[2]
U Rajendra Acharya, Oliver Faust,
Subbhuraam Vinitha Sree, Filippo Molinari, Luca Saba,
Andrew Nicolaides, and Jasjit S Suri. “An accurate and
generalized approach to plaque characterization in 346
carotid ultrasound scans”. In: Instrumentation and
Measurement, IEEE Transactions on 61.4 (2012),
pp. 1045–1053.
Dr. Faust
Computer Aided Diagnosis
Introduction
Materials and methods
A methodology for data analytics
Conclusions
References II
[3]
U Rajendra Acharya, Oliver Faust, S Vinitha Sree,
Dhanjoo N Ghista, Sumeet Dua, Paul Joseph,
VI Thajudin Ahamed, Nittiagandhi Janarthanan, and
Toshiyo Tamura. “An integrated diabetic index using heart
rate variability signal features for diagnosis of diabetes”.
In: Computer methods in biomechanics and biomedical
engineering 16.2 (2013), pp. 222–234.
Dr. Faust
Computer Aided Diagnosis
Introduction
Materials and methods
A methodology for data analytics
Conclusions
References III
[4]
U Rajendra Acharya, Oliver Faust, S V Sree, Ang
Peng Chuan Alvin, Ganapathy Krishnamurthi,
Jose CR Seabra, João Sanches, and JS Suri.
“AtheromaticTM : Symptomatic vs. asymptomatic
classification of carotid ultrasound plaque using a
combination of HOS, DWT & texture”. In: Engineering in
Medicine and Biology Society, EMBC, 2011 Annual
International Conference of the IEEE. IEEE. 2011,
pp. 4489–4492.
Dr. Faust
Computer Aided Diagnosis
Introduction
Materials and methods
A methodology for data analytics
Conclusions
References IV
[5]
U Rajendra Acharya, Oliver Faust, APC Alvin,
Ganapathy Krishnamurthi, José CR Seabra,
João Sanches, Jasjit S Suri, et al. “Understanding
symptomatology of atherosclerotic plaque by
image-based tissue characterization”. In: Computer
methods and programs in biomedicine 110.1 (2013),
pp. 66–75.
[6]
Rajendra Acharya U, Oliver Faust, N Kannathal,
TjiLeng Chua, and Swamy Laxminarayan. “Non-linear
analysis of EEG signals at various sleep stages”. In:
Computer methods and programs in biomedicine 80.1
(2005), pp. 37–45.
Dr. Faust
Computer Aided Diagnosis
Introduction
Materials and methods
A methodology for data analytics
Conclusions
References V
[7]
O Faust, RU Acharya, AR Allen, and CM Lin. “Analysis of
EEG signals during epileptic and alcoholic states using
AR modeling techniques”. In: IRBM 29.1 (2008),
pp. 44–52.
[8]
Oliver Faust, Leong Mei Yi, and Lee Mei Hua. “Heart
Rate Variability Analysis for Different Age and Gender”.
In: Journal of Medical Imaging and Health Informatics 3.3
(2013), pp. 395–400.
[9]
Oliver Faust and Wenwei Yu. “Cardiac Health
Visualization and Diagnosis Using Entropies”. In: Journal
of Medical Imaging and Health Informatics 3.3 (2013),
pp. 409–416.
Dr. Faust
Computer Aided Diagnosis
Introduction
Materials and methods
A methodology for data analytics
Conclusions
References VI
[10]
Oliver Faust, Wenwei Yu, and Nahrizul Adib Kadri.
“Computer-based identification of normal and alcoholic
eeg signals using wavelet packets and energy
measures”. In: Journal of Mechanics in Medicine and
Biology 13.03 (2013).
[11]
Oliver Faust, Rajendra Acharya, E Y K Ng,
Kwan-Hoong Ng, and Jasjit S Suri. “Algorithms for the
automated detection of diabetic retinopathy using digital
fundus images: a review”. In: Journal of medical systems
36.1 (2012), pp. 145–157.
Dr. Faust
Computer Aided Diagnosis
Introduction
Materials and methods
A methodology for data analytics
Conclusions
References VII
[12]
Oliver Faust, U Rajendra Acharya, Lim Choo Min, and
Bernhard HC Sputh. “Automatic identification of epileptic
and background EEG signals using frequency domain
parameters”. In: International journal of neural systems
20.02 (2010), pp. 159–176.
[13]
OLIVER FAUST, ANG PENG CHUAN ALVIN,
SUBHA D PUTHANKATTIL, and PAUL K JOSPEH.
“DEPRESSION DIAGNOSIS SUPPORT SYSTEM
BASED ON EEG SIGNAL ENTROPIES”. In: Journal of
Mechanics in Medicine and Biology (2013).
Dr. Faust
Computer Aided Diagnosis
Introduction
Materials and methods
A methodology for data analytics
Conclusions
References VIII
[14]
Oliver Faust, U Rajendra Acharya, Filippo Molinari,
Subhagata Chattopadhyay, and Toshiyo Tamura. “Linear
and non-linear analysis of cardiac health in diabetic
subjects”. In: Biomedical Signal Processing and Control
7.3 (2012), pp. 295–302.
[15]
Tay Swee Hock, Oliver Faust, Teik-Cheng Lim, and
Wenwei Yu. “Automated Detection of Premature
Ventricular Contraction Using Recurrence Quantification
Analysis on Heart Rate Signals”. In: Journal of Medical
Imaging and Health Informatics 3.3 (2013), pp. 462–469.
[16]
M MUTHU RAMA KRISHNAN and Oliver Faust.
“Automated glaucoma detection using hybrid feature
extraction in retinal fundus images”. In: Journal of
Mechanics in Medicine and Biology 13.01 (2013).
Dr. Faust
Computer Aided Diagnosis
Introduction
Materials and methods
A methodology for data analytics
Conclusions
References IX
[17]
Faust Oliver, Acharya U Rajendra, SM Krishnan, and
Min Lim. “Analysis of cardiac signals using spatial filling
index and time-frequency domain”. In: BioMedical
Engineering OnLine 3.30 (2004), pp. 1–11.
[18]
U Rajendra Acharya, Oliver Faust, Nahrizul Adib Kadri,
Jasjit S Suri, and Wenwei Yu. “Automated identification of
normal and diabetes heart rate signals using nonlinear
measures”. In: Computers in biology and medicine 43.10
(2013), pp. 1523–1529.
Dr. Faust
Computer Aided Diagnosis
Introduction
Materials and methods
A methodology for data analytics
Conclusions
Questions and Answers
Thesis
Data volume is nothing without analytics.
Questions? Comments?
Dr. Faust
Computer Aided Diagnosis
Download