Uploaded by DR. Ahmad Farouk

drug-repurposing-2

advertisement
Computational Drug Repositioning by
Machine Learning from EHR Data
David Page
Center for Predictive Computational Phenotyping
University of Wisconsin-Madison
EHR Data in a
Relational Data Warehouse
Patient ID
P1
Gender
M
Birthdate
3/22/1963
Patient ID
P1
P1
Date
1/1/2001
2/1/2001
Physician
Smith
Jones
Symptoms
palpitations
fever, aches
Patient ID
P1
P1
Date
1/1/2001
1/9/2001
Lab Test
blood glucose
blood glucose
Result
42
45
Patient ID
P1
P2
Date
1/1/2001
1/9/2001
Observation
Height
BMI
Result
5'11
34.5
Patient ID
P1
Date
Prescribed
5/17/1998
Date Filled
5/18/1998
Physician
Jones
Diagnosis
hypoglycemic
influenza
Medication
Prilosec
Dose
10mg
Duration
3 months
Alternative View of Patient Data:
Irregularly-Sampled Time Series
But Most ML Algorithms Expect:
• Single Table (Spreadsheet), or
• Regularly-Sampled Time Series
• Another Challenge: Most ML algorithms aim
at accurate prediction, not causal discovery,
and many potential confounders are
unmeasured, and possibly not even considered
High-Throughput ML (Kleiman, Bennett, et al., 2016)
Predicting Every ICD Diagnosis Code at the Press of a Button
Extending SCCS to Numerical Response (Kuang et al.)
Electronic Health Records (EHRs)
Properties
Person 1
Longitudinal
200
160
150
Diabetes
Drug 1
Observational
120
Drug 2
30
60
Age
Adverse Drug
Person 2
Reaction (ADR)
discovery
Bleeding
90
104
Applications
100
Drug 2
Age
Computational
Drug
Repositioning
(CDR)
A Critical Intuition: Underlying Baseline
Person 1
Diabetes
Drug 2
30
60
Age
Person 2
Bleeding
30
60
Age
Baseline: Blood sugar level under no influence of any drugs.
Fixed Effect Model
Person 1
Diabetes
Insulin
30
Insulin
45
60
Age
Fixed Effect Model (Frees, 2004):
yij | xij
dimβ =# drugs
⊤
= αi + β xij + ϵij,
ϵij ∼ N(0,σ2).
Time-Varying Baseline
Person 1
Diabetes
Insulin
30
Insulin
45
60
Age
Time-Varying Baseline, add regularization to minimize change
in proximal, consecutive tij values:
⊤
yij | xij = tij + β xij + ϵij,
ϵij ∼ N(0,σ2).
Time-Varying Baseline
Person 1
Diabetes
Insulin
30
Insulin
45
60
Age
Time-Varying Baseline: add regularization to minimize change
in proximal, consecutive tij values:
⊤
yij | xij = tij + β xij + ϵij,
ϵij ∼ N(0,σ2).
Last term penalizes consecutive baseline values that are close
together in time but far apart in numerical value.
Ground Truth for Drugs that are Glucose Lowering
1.00
1.00
Methods
ABR
CSCCSA
0.75
BR
CSCCS
0.75
0.50
0.25
0.00
0.50
Methods
ABR
CSCCSA
BR
CSCCS
8
0.25
16
24
K
32
40
0.00
0.25
0.5
0.75
1
FPR Threshold
Figure: Left: Precision at K among the top-forty drugs generated by the four
models; Right: Partial AUCs on the top-forty drugs generated by the four models.
Sample size: 219306.
Number of drug candidates: 2980.
Recovery of Known Glucose Lowering Agents
INDX CODE
1
4485
2
7470
3
8437
4
4837
5
6382
6
4171
7
4106
8
160
9
824
10
9152
11
4132
12
4184
13
4170
14
4208
15
416
16
4107
17
844
18
2830
19
4806
20
5787
21
2824
22
5786
23
7731
24
1760
25
4497
26
9889
27
4813
28
4133
29
6445
30
6656
31
9379
32
1636
33
1218
34
8025
35
8316
36
9136
37
4802
38
7674
39
4804
40
1200
DRUG NAME
HUMALOG
PIOGLITAZONE HCL
ROSIGLITAZONE MALEATE
INSULN ASP PRT/INSULIN ASPART
NEEDLES INSULIN DISPOSABLE
GLUCOTROL XL
GLIMEPIRIDE
ACTOS
AVANDIA
SYRING W-NDL DISP INSUL 0.5ML
GLUCOPHAGE
GLYBURIDE
GLUCOTROL
GLYNASE
AMARYL
GLIPIZIDE
AXID
DILTIAZEM
INSULIN GLARGINE HUM.REC.ANLOG
METFORMIN HCL
DILAUDID
METFORMIN
PRAVACHOL
CELEXA
HUM INSULIN NPH/REG INSULIN HM
URSODIOL
INSULIN NPL/INSULIN LISPRO
GLUCOPHAGE XR
NEURONTIN
NPH HUMAN INSULIN ISOPHANE
THIAMINE HCL
CARDURA
BLOOD SUGAR DIAGNOSTIC DRUM
PROZAC
REZULIN
SYRINGE & NEEDLE INSULIN 1 ML
INSULIN
POTASSIUM CHLORIDE
INSULIN ASPART
BLOOD-GLUCOSE METER
SCORE COUNT
-11.786
124
-10.220 3075
-9.731
1019
-9.658
258
-9.464
2827
-8.117
2853
-7.940
3384
-7.721
1125
-6.802
1239
-6.623
4186
-6.322
6736
-6.021
8879
-5.721
1259
-5.670
591
-5.599
2240
-5.563
9993
-4.682
189
-4.297
1021
-4.175
4213
-4.147 19584
-4.076
39
-3.890
3838
-3.532
1700
-3.517
1473
-3.501
1829
-3.132
376
-2.972
623
-2.845
765
-2.615
1418
-2.500
2874
-2.383
341
-2.198
1079
-2.073
2593
-2.037
1525
-1.895
444
-1.885
3542
-1.812
1526
-1.779
9842
-1.752
2476
-1.719
5289
INDX CODE
1
4802
2
8316
3
824
4
416
5
5226
6
5789
7
4485
8
4132
9
4811
10
144
11
1389
12
4171
13
9155
14
4116
15
6652
16
160
17
6646
18
4813
19
8437
20
4170
21
9889
22
5052
23
4118
24
2521
25
7470
26
5786
27
10366
28
4500
29
4172
30
7471
31
6382
32
4184
33
4208
34
4210
35
4163
36
5977
37
7946
38
1602
39
1305
40
6657
DRUG NAME
INSULIN
REZULIN
AVANDIA
AMARYL
LANTUS
METFORMIN HYDROCHLORIDE
HUMALOG
GLUCOPHAGE
INSULIN NPH
ACTIGALL
CAL
GLUCOTROL XL
SYRNG W-NDL DISP INSUL 0.333ML
GLUCAGON
NOVOLOG
ACTOS
NOVOFINE 31
INSULIN NPL/INSULIN LISPRO
ROSIGLITAZONE MALEATE
GLUCOTROL
URSODIOL
KAY CIEL
GLUCAGON HUMAN RECOMBINANT
DARVOCET-N
PIOGLITAZONE HCL
METFORMIN
ZINC SULFATE
HUMULIN
GLUCOVANCE
PIOGLITAZONE HCL/METFORMIN HCL
NEEDLES INSULIN DISPOSABLE
GLYBURIDE
GLYNASE
GLYSET
GLUCOSE
MINIPRESS
PROPANTHELINE
CARBOCAINE
BUDEPRION SR
NPH INSULIN
SCORE COUNT
47
635
50
120
59
449
65
503
66
33
75
10
81
63
86
1813
88
19
90
34
90
45
90
701
95
29
97
121
98
51
106
480
106
31
108
118
109
332
113
641
113
123
114
23
115
227
116
11
121
705
125
2149
130
34
135
33
136
115
136
16
137
649
144
1354
145
115
148
7
159
1778
163
19
163
6
182
63
185
9
185
43
Conclusion
• Impossible to perfectly infer causation from
purely observational data such as EHR
• But real applications are often about causation
• Empirical results suggest can do this well for
some drugs and some frequent lab
measurements
• Many other potential applications
– Treatment response and adverse drug events
– Other biological tasks
– Other causal discovery tasks
Thanks!
•
•
•
•
•
•
•
NIH BD2K Program
NLM, NIGMS, NIAID
Charles Kuang
Peggy Peissig
Michael Caldwell
James Thomson
Ron Stewart
Download