Machine Learning in BioMedical Informatics
SCE 5095: Special Topics Course
Instructor: Jinbo Bi
Computer Science and Engineering Dept.

Course Information
Instructor: Dr. Jinbo Bi
– Office: ITEB 233
– Phone: 860-486-1458
– Email: jinbo@engr.uconn.edu
– Web: http://www.engr.uconn.edu/~jinbo/
– Time: Mon. / Wed. 2:00pm – 3:15pm
– Location: CAST 204
– Office hours: Mon. 3:30–4:30pm
HuskyCT
– http://learn.uconn.edu
– Log in with your NetID and password
– [Illustration: HuskyCT website]

Introduction of the Instructor
Ph.D. in Mathematics
Previous work experience:
– Siemens Medical Solutions Inc.
– Department of Defense, Bioanalysis
– Massachusetts General Hospital
Research interests
– Cancer, psychiatric disorders, …
– Subtyping, GWAS
– http://labhealthinfo.uconn.edu/EasyBreathing
[Figure label: color of flowers]

Course Information
Prerequisite: basics of linear algebra, calculus, and programming
Course textbooks (not required):
– Introduction to Data Mining (2005) by Pang-Ning Tan, Michael Steinbach, and Vipin Kumar
– Pattern Recognition and Machine Learning (2006) by Christopher M. Bishop
– Pattern Classification (2nd edition, 2000) by Richard O. Duda, Peter E. Hart, and David G. Stork
Additional class notes and copied materials will be handed out.
Links to reading material will be provided.

Course Information
Objectives:
– Introduce students to the basic concepts of machine learning and to the state-of-the-art literature in data mining/machine learning
– Become familiar with some general topics in medical informatics
– Focus on some high-demand medical informatics problems, with hands-on experience applying data mining techniques
Format:
– Lectures, labs, paper reviews, and a term project

Survey
Why are you taking this course?
What would you like to gain from this course?
What topics are you most interested in learning about from this course?
Any other suggestions?
(Please respond before NEXT THUR. You can also log in to HuskyCT, download the MS Word file, fill it in, and email it to me.)
Grading
In-class lab assignments (3): 30%
Paper review (1): 10%
Term project (1): 50%
Participation: 10%

Policy
Computers
Assignments must be submitted electronically via HuskyCT.
Make-up policy
– If one lab assignment or the paper review assignment is missed, a final take-home exam will be used as the make-up.
– If two of these assignments are missed, an additional lab assignment and a final take-home exam will be used as the make-up.

Three In-class Lab Assignments
On the day an in-class lab assignment is given, the class meets in a computer lab and there is no lecture.
The computer lab is ITEB 138 (reserved by the TA).
The assignment is due at the beginning of class one week after it is given.
If the assignment is handed in one or two days late, 10 credits will be deducted for each day it is late.
Assignments will be graded by our teaching assistant.

Paper Review
Topics of papers for review will be discussed.
In each assignment, each student selects 1 paper, prepares slides, and presents the paper in class in 8–15 minutes.
The goal is to look at state-of-the-art research in the related field.
The paper review assignment covers state-of-the-art data mining techniques.

Term Project
Links to possible project topics will be provided; students are encouraged to propose their own.
Teams of 1–2 students may be formed.
Each team gives a presentation (10–15 min) in the last 1–2 weeks of class.
Each team submits a project report covering:
– Definition of the problem
– Data mining approaches used to solve the problem
– Computational results
– Conclusion (success or failure)

Final Exam
If you need the make-up final exam, it will be handed out on May 1 (Wed.).
It is a take-home exam, due on May 9 (Thu.).
Three In-class Lab Assignments
BioMedical Informatics Topics
– Many topics exist
– Cardiac ultrasound image categorization
– Computerized decision support for trauma patient care
– Computer-assisted diagnostic coding

Cardiac Ultrasound View Separation
[Figure: example cardiac ultrasound images]
Classification (or clustering) into views:
– Apical 4-chamber view
– Parasternal long-axis view
– Parasternal short-axis view

Trauma Patient Care
25 min of transport time per patient
High-frequency vital-sign waveforms (3 waveforms)
– ECG, SpO2, respiratory
Low-frequency vital-sign time series (9 variables)
– Derived variables: ECG heart rate; SpO2 heart rate; SaO2 (arterial O2 saturation); respiratory rate
– Measured variables: NIBP (systolic, diastolic, MAP); NIBP heart rate; end-tidal CO2
Discrete patient attribute data (100 variables)
– Demographics, injury description, prehospital interventions, etc.
Vital signs used in decision-support algorithms: Propaq HR, RR, SaO2, SBP, DBP

Trauma Patient Care
[Figure: heart rate, respiratory rate, saturation of oxygen, and blood pressure are used to make a prediction of major bleeding]

Diagnostic Coding
[Figure: a hospital document database holds patient notes (e.g., a de-identified chest X-ray report with admission diagnosis "bradycardia, anemia, CHF" and impression "congestive heart failure with cardiomegaly and small bilateral pleural effusions", plus a physical examination note). Each note is checked against the diagnosis criteria in a diagnostic code database to look up ICD-9 codes per patient (e.g., 428 heart failure, 250 diabetes; codes 414 and 429 and the diagnosis AMI also appear). The codes feed insurance reimbursement, statistics, and SCIP. Source: Siemens.]

Machine Learning / Data Mining
Data mining (sometimes called data or knowledge discovery) is the process of analyzing data from different perspectives and summarizing it into useful information.
The ultimate goal of machine learning is the creation and understanding of machine intelligence.
The main goal of statistical learning theory is to provide a framework for studying the problem of inference, that is, of gaining knowledge, making predictions, and making decisions from a set of data.
Traditional Topics in Data Mining / AI
Fuzzy sets and fuzzy logic
– Fuzzy if-then rules
Evolutionary computation
– Genetic algorithms
– Evolutionary strategies
Artificial neural networks
– Back-propagation network (supervised learning)
– Self-organizing network (unsupervised learning; will not be covered)

Next Class
Continue with data mining topics
Review some basics of linear algebra and probability

Last Class
Described the syllabus of this course
Talked about the HuskyCT website (illustration)
Briefly introduced 3 medical informatics topics
– Medical images: cardiac echo view recognition
– Numerical data: trauma patient care
– Free text: ICD-9 diagnostic coding
Briefly introduced the definitions of data mining, machine learning, and statistical learning theory

Challenges in Traditional Techniques
Lack of theoretical analysis of the behavior of the algorithms
Traditional techniques may be unsuitable due to
– Enormity of data
– High dimensionality of data
– Heterogeneous, distributed nature of data
[Figure: overlapping fields: Statistics/AI, Machine Learning/Pattern Recognition, Soft Computing]

Recent Topics in Data Mining
Supervised learning, such as classification and regression
– Support vector machines
– Regularized least squares
– Fisher discriminant analysis (LDA)
– Graphical models (Bayesian nets)
– Others
Drawn from machine learning domains

Recent Topics in Data Mining
Unsupervised learning, such as clustering
– K-means
– Gaussian mixture models
– Hierarchical clustering
– Graph-based clustering (spectral clustering)
Dimension reduction
– Feature selection
– Compressing the feature space into a low-dimensional space (principal component analysis)

Statistical Behavior
There are many perspectives for analyzing how an algorithm handles uncertainty.
Simple examples:
– Consistency analysis
– Learning bounds (an upper bound on the test error of the constructed model or solution)
"Statistical," not "deterministic"
– With probability at least p, the upper bound B holds: $P(\text{test error} \le B) \ge p$

Data Mining Tasks
Prediction tasks (supervised problems)
– Use some variables to predict unknown or future values of other variables.
Description tasks (unsupervised problems)
– Find human-interpretable patterns that describe the data.
From [Fayyad et al.] Advances in Knowledge Discovery and Data Mining, 1996

Problems in Data Mining
Inference
Classification [Predictive]
Regression [Predictive]
Clustering [Descriptive]
Deviation Detection [Predictive]

Classification: Definition
Given a collection of examples (the training set)
– Each example contains a set of attributes; one of the attributes is the class.
Find a model for the class attribute as a function of the values of the other attributes.
Goal: previously unseen examples should be assigned a class as accurately as possible.
– A test set is used to determine the accuracy of the model. Usually, the given data set is divided into training and test sets, with the training set used to build the model and the test set used to validate it.

Classification Example
Training set:
Tid  Refund  Marital Status  Taxable Income  Cheat
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes
Test set:
Refund  Marital Status  Taxable Income  Cheat
No      Single          75K             ?
Yes     Married         50K             ?
No      Married         150K            ?
Yes     Divorced        90K             ?
No      Single          40K             ?
No      Married         80K             ?
[Diagram: the training set is used to learn a classifier; the resulting model is applied to the test set]

Classification: Application 1
High-Risk Patient Detection
– Goal: predict whether a patient will suffer a major complication after a surgical procedure
– Approach:
  Use patients' vital signs before and after the surgical operation.
  – Heart rate, respiratory rate, etc.
  Have expert medical professionals monitor patients to label which patients had a complication and which did not.
  Learn a model for the class of after-surgery risk.
  Use this model to detect potential high-risk patients for a particular surgical procedure.

Classification: Application 2
Face Recognition
– Goal: predict the identity of a face image
– Approach:
  Align all images to derive the features.
  Model the class (identity) based on these features.

Classification: Application 3
Cancer Detection
– Goal: predict the class (cancer or normal) of a sample (person) based on microarray gene expression data
– Approach:
  Use the expression levels of all genes as the features.
  Label each example as cancer or normal.
  Learn a model for the class of all samples.

Classification: Application 4
Alzheimer's Disease Detection
– Goal: predict the class (AD or normal) of a sample (person) based on neuroimaging data such as MRI and PET
– Approach:
  Extract features from the neuroimages.
  Label each example as AD or normal.
  Learn a model for the class of all samples.
[Figure: reduced gray matter volume (colored areas) detected by MRI voxel-based morphometry in AD patients compared to normal healthy controls]

Regression
Predict the value of a given continuous-valued variable based on the values of other variables, assuming a linear or nonlinear model of dependency.
Extensively studied in the statistics and neural network fields.
Examples:
– Predicting sales amounts of a new product based on advertising expenditure
– Predicting wind velocities as a function of temperature, humidity, air pressure, etc.
– Time series prediction of stock market indices

Classification Algorithms
K-nearest-neighbor classifiers
Naïve Bayes classifier
Neural networks
Linear discriminant analysis (LDA)
Support vector machines (SVM)
Decision trees
Logistic regression
Graphical models

Clustering: Definition
Given a set of data points, each having a set of attributes, and a similarity measure among them, find clusters such that
– Data points in one cluster are more similar to one another.
– Data points in separate clusters are less similar to one another.
Similarity measures:
– Euclidean distance if attributes are continuous
– Other problem-specific measures

Illustrating Clustering
[Figure: Euclidean-distance-based clustering in 3-D space; intracluster distances are minimized and intercluster distances are maximized]

Clustering: Application 1
High-Risk Patient Detection
– Goal: predict whether a patient will suffer a major complication after a surgical procedure
– Approach:
  Use patients' vital signs before and after the surgical operation.
  – Heart rate, respiratory rate, etc.
  Find patients whose symptoms are dissimilar from those of most other patients.

Clustering: Application 2
Document Clustering
– Goal: find groups of documents that are similar to each other based on the important terms appearing in them.
– Approach: identify frequently occurring terms in each document, form a similarity measure based on the frequencies of different terms, and use it to cluster.
– Gain: information retrieval can use the clusters to relate a new document or search term to clustered documents.

Illustrating Document Clustering
Clustering points: 3204 articles of the Los Angeles Times
Similarity measure: how many words are common to these documents (after some word filtering)
Category       Total Articles  Correctly Placed
Financial      555             364
Foreign        341             260
National       273             36
Metro          943             746
Sports         738             573
Entertainment  354             278

Clustering Algorithms
K-means
Hierarchical clustering
Graph-based clustering (spectral clustering)
Semi-supervised clustering
Others

Basics of Probability
An experiment (random variable) is a well-defined process with observable outcomes.
The set or collection of all outcomes of an experiment is called the sample space, S.
An event E is any subset of outcomes from S.
If all outcomes in S are equally likely, the probability of an event E is P(E) = (number of outcomes in E) / (number of outcomes in S).
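To make the counting definition concrete, here is a minimal Python sketch that assumes equally likely outcomes. The two-dice experiment and the helper name `probability` are hypothetical illustrations, not part of the course materials.

```python
from fractions import Fraction
from itertools import product


def probability(event, sample_space):
    """P(E) = |E| / |S|; valid only when all outcomes in S are equally likely."""
    outcomes = list(sample_space)
    favorable = [o for o in outcomes if event(o)]
    return Fraction(len(favorable), len(outcomes))


# Hypothetical experiment: roll two fair six-sided dice and record (first, second).
S = list(product(range(1, 7), repeat=2))  # sample space: 36 ordered outcomes


def sum_is_7(outcome):
    """Event E: the two faces add up to 7."""
    return outcome[0] + outcome[1] == 7


print(len(S))                    # 36
print(probability(sum_is_7, S))  # 1/6  (6 favorable outcomes out of 36)
```

Using `Fraction` keeps the ratio exact, so the result matches the hand count (6/36 = 1/6) rather than a rounded float.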
Probability Theory
Apples and oranges
X: identity of the fruit (a or o)
Y: identity of the box (r or b)
Prior: P(Y=r) = 40%, P(Y=b) = 60%
Conditional:
– P(X=a|Y=r) = 2/8 = 25%, P(X=o|Y=r) = 6/8 = 75%
– P(X=a|Y=b) = 3/4 = 75%, P(X=o|Y=b) = 1/4 = 25%
Marginal: P(X=a) = 11/20, P(X=o) = 9/20
Posterior: P(Y=r|X=o) = 2/3, P(Y=b|X=o) = 1/3

Probability Theory
[Figure: marginal probability, conditional probability, joint probability]

Probability Theory
Sum rule: the marginal probability of X equals the sum of the joint probabilities of X and Y over Y: $p(X) = \sum_Y p(X, Y)$
Product rule: the joint probability of X and Y equals the conditional probability of Y given X times the probability of X: $p(X, Y) = p(Y \mid X)\, p(X)$

Illustration
[Figure: histograms of p(X,Y) (rows Y=1 and Y=2), p(Y), p(X), and p(X|Y=1)]

The Rules of Probability
Sum rule: $p(X) = \sum_Y p(X, Y)$
Product rule: $p(X, Y) = p(Y \mid X)\, p(X) = p(X \mid Y)\, p(Y)$
Bayes' rule: $p(Y \mid X) = \dfrac{p(X \mid Y)\, p(Y)}{p(X)}$, with $p(X) = \sum_Y p(X \mid Y)\, p(Y)$
posterior ∝ likelihood × prior

Mean and Variance
The mean of a random variable X is the average value X takes; for a discrete X, $E[X] = \sum_x x\, P(X = x)$.
The variance of X measures how dispersed the values that X takes are: $\mathrm{Var}[X] = E[(X - E[X])^2]$.
The standard deviation is simply the square root of the variance.

Simple Example
X takes values in {1, 2}, with P(X=1) = 0.8 and P(X=2) = 0.2
Mean
– 0.8 × 1 + 0.2 × 2 = 1.2
Variance
– 0.8 × (1 − 1.2) × (1 − 1.2) + 0.2 × (2 − 1.2) × (2 − 1.2) = 0.032 + 0.128 = 0.16
Standard deviation
– √0.16 = 0.4

References
SC_prob_basics1.pdf (required)
SC_prob_basic2.pdf
Loaded to HuskyCT

Basics of Linear Algebra

Matrix Multiplication
The product of two matrices: C = AB, with $c_{ij} = \sum_k a_{ik} b_{kj}$
Special cases: vector-vector product, matrix-vector product

Rules of Matrix Multiplication
[Figure: equations involving matrices A, B, and C]

Orthogonal Matrix
A square matrix Q is orthogonal if $Q^T Q = Q Q^T = I$, where I is the identity matrix (1s on the diagonal, 0s elsewhere).

Square Matrix – Eigenvalue, Eigenvector
$(\lambda, x)$ is an eigenpair of A if and only if $Ax = \lambda x$ with $x \neq 0$, where $\lambda$ is the eigenvalue and $x$ is the eigenvector.

Symmetric Matrix – Eigenvalue, Eigenvector
A is symmetric if $A = A^T$.
Eigen-decomposition of a symmetric A: $A = Q \Lambda Q^T$, where Q is orthogonal (its columns are eigenvectors) and $\Lambda = \mathrm{diag}(\lambda_1, \dots, \lambda_n)$.
$A \in \mathbb{R}^{n \times n}$ is symmetric and positive semi-definite if $x^T A x \ge 0$ for any $x \in \mathbb{R}^n$.
– Then $\lambda_i \ge 0$, $i = 1, \dots, n$.
$A \in \mathbb{R}^{n \times n}$ is symmetric and positive definite if $x^T A x > 0$ for any nonzero $x \in \mathbb{R}^n$.
– Then $\lambda_i > 0$, $i = 1, \dots, n$.

Matrix Norms and Trace
Trace: $\operatorname{tr}(A) = \sum_i a_{ii}$
Frobenius norm: $\|A\|_F = \sqrt{\sum_i \sum_j a_{ij}^2} = \sqrt{\operatorname{tr}(A^T A)}$

Singular Value Decomposition
$A = U \Sigma V^T$, where U is orthogonal, Σ is diagonal (holding the singular values), and V is orthogonal.

References
SC_linearAlg_basics.pdf (required)
SVD_basics.pdf
Loaded to HuskyCT

Summary
This is the end of the FIRST chapter of this course.
Next class: cluster analysis
– General topics
– K-means
The slides after this one are backup slides; you can also check them to learn more.

Neural Networks
Motivated by the biological brain neuron model introduced by McCulloch and Pitts in 1943
A neural network consists of
– Nodes (mimic neurons)
– Links between nodes (pass messages around, represent causal relationships)
All parts of an NN are adaptive (modifiable parameters).
Learning rules specify these parameters to finalize the NN.
[Figure: neuron anatomy: dendrite, soma, nucleus, axon, myelin sheath, Schwann cell, node of Ranvier, axon terminal]

Illustration of NN
[Figure: inputs x1 and x2 connected by weights w11 and w12 through an activation function to output y]

Many Types of NN
Adaptive NN
Single-layer NN (perceptrons)
Multi-layer NN
Self-organizing NN
Different activation functions
Types of problems:
– Supervised learning
– Unsupervised learning

Classification: Additional Application
Sky Survey Cataloging
– Goal: predict the class (star or galaxy) of sky objects, especially visually faint ones, based on the telescopic survey images (from Palomar Observatory)
– 3000 images with 23,040 × 23,040 pixels per image
– Approach:
  Segment the image.
  Measure image attributes (features), 40 of them per object.
  Model the class based on these features.
  Success story: found 16 new high red-shift quasars, some of the farthest objects, which are difficult to find!
From [Fayyad et al.] Advances in Knowledge Discovery and Data Mining, 1996

Classifying Galaxies
Courtesy: http://aps.umn.edu
[Figure: galaxy image labeled Early]
Class:
– Stages of formation
Attributes:
– Image features
– Characteristics of light waves received, etc.
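[Figure: galaxy images labeled Intermediate and Late]
Data size:
– 72 million stars, 20 million galaxies
– Object catalog: 9 GB
– Image database: 150 GB

Challenges of Data Mining
Scalability
Dimensionality
Complex and heterogeneous data
Data quality
Data ownership and distribution
Privacy preservation

Application of Probability Rules
Assume P(Y=r) = 40%, P(Y=b) = 60%
P(X=a|Y=r) = 2/8 = 25%, P(X=o|Y=r) = 6/8 = 75%
P(X=a|Y=b) = 3/4 = 75%, P(X=o|Y=b) = 1/4 = 25%
p(X=a) = p(X=a,Y=r) + p(X=a,Y=b) = p(X=a|Y=r)p(Y=r) + p(X=a|Y=b)p(Y=b) = 0.25 × 0.4 + 0.75 × 0.6 = 11/20
P(X=o) = 1 − P(X=a) = 9/20
p(Y=r|X=o) = p(Y=r,X=o) / p(X=o) = p(X=o|Y=r)p(Y=r) / p(X=o) = 0.75 × 0.4 / (9/20) = 2/3

The Gaussian Distribution
$\mathcal{N}(x \mid \mu, \sigma^2) = \dfrac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\dfrac{(x - \mu)^2}{2\sigma^2}\right)$

Gaussian Mean and Variance
$E[x] = \mu$, $\mathrm{var}[x] = \sigma^2$

The Multivariate Gaussian
$\mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma}) = \dfrac{1}{(2\pi)^{D/2} |\boldsymbol{\Sigma}|^{1/2}} \exp\left(-\dfrac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu})\right)$
[Figure: plot over axes x and y]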
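The Gaussian formulas can be checked numerically. Below is a minimal NumPy sketch, assuming NumPy is installed; `gaussian_pdf` and `mvn_pdf` are our own hypothetical helper names. It evaluates both densities, uses Riemann sums on a fine grid to confirm that the 1-D density integrates to about 1 with mean μ and variance σ², and checks that a 2-D Gaussian with diagonal covariance factorizes into a product of 1-D densities.

```python
import numpy as np


def gaussian_pdf(x, mu, sigma2):
    """Univariate Gaussian density N(x | mu, sigma^2); works elementwise on arrays."""
    return np.exp(-((x - mu) ** 2) / (2.0 * sigma2)) / np.sqrt(2.0 * np.pi * sigma2)


def mvn_pdf(x, mu, cov):
    """Multivariate Gaussian density N(x | mu, Sigma) for a single D-dimensional point x."""
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    cov = np.asarray(cov, dtype=float)
    d = x.shape[0]
    diff = x - mu
    # Quadratic form (x - mu)^T Sigma^{-1} (x - mu), via a linear solve instead of an explicit inverse.
    quad = diff @ np.linalg.solve(cov, diff)
    norm = (2.0 * np.pi) ** (d / 2.0) * np.sqrt(np.linalg.det(cov))
    return np.exp(-0.5 * quad) / norm


# Riemann sums on a fine grid for mu = 1, sigma^2 = 4 (grid spans about +/-15 standard deviations).
xs = np.linspace(-30.0, 32.0, 200_001)
dx = xs[1] - xs[0]
p = gaussian_pdf(xs, 1.0, 4.0)
total = p.sum() * dx                     # should be close to 1
mean = (xs * p).sum() * dx               # should be close to mu = 1
var = ((xs - mean) ** 2 * p).sum() * dx  # should be close to sigma^2 = 4
print(f"total={total:.6f} mean={mean:.6f} var={var:.6f}")

# A 2-D Gaussian with diagonal covariance factorizes into a product of 1-D Gaussians.
lhs = mvn_pdf([0.5, -1.0], [0.0, 0.0], np.diag([1.0, 2.0]))
rhs = gaussian_pdf(0.5, 0.0, 1.0) * gaussian_pdf(-1.0, 0.0, 2.0)
print(np.isclose(lhs, rhs))  # True
```

Solving $\boldsymbol{\Sigma} z = \mathbf{x} - \boldsymbol{\mu}$ rather than forming $\boldsymbol{\Sigma}^{-1}$ is the usual numerically safer choice; for real work, `scipy.stats.multivariate_normal` provides the same density.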