© Cassa, Christopher A. Page 1 Cassa, Christopher A. Page 2 Abstract Cassa, Christopher A. Page 3 Cassa, Christopher A. Page 4 Table of Contents Cassa, Christopher A. Page 5 Cassa, Christopher A. Page 6 Cassa, Christopher A. Page 7 Cassa, Christopher A. Page 8 Biographical Note Cassa, Christopher A. Page 9 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Cassa, Christopher A. Page 11 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Cassa, Christopher A. Page 12 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Acknowledgments Cassa, Christopher A. Page 13 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Cassa, Christopher A. Page 14 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance List of Figures ≥ Cassa, Christopher A. Page 15 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance ≥ ≤ Σ ≤ ≤ Cassa, Christopher A. Page 16 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance ο¬ Cassa, Christopher A. Page 17 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Cassa, Christopher A. Page 18 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance List of Tables Λ Cassa, Christopher A. Page 19 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Cassa, Christopher A. Page 20 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Chapter I: Introduction & Background Introduction Cassa, Christopher A. Page 21 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Cassa, Christopher A. Page 22 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Ethical, Legal, and Social Implications (ELSI) of Personalized Medicine Cassa, Christopher A. Page 23 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Cassa, Christopher A. Page 24 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Cassa, Christopher A. Page 25 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance . Personalized Medicine and Personally Controlled Health Records Cassa, Christopher A. Page 26 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Cassa, Christopher A. Page 27 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Human Variation Data Sources and Information Content Cassa, Christopher A. Page 28 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance ≤ Cassa, Christopher A. Page 29 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Cassa, Christopher A. Page 30 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Measuring Risk of Identity Linkage using Genomic Data Cassa, Christopher A. Page 31 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Cassa, Christopher A. Page 32 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Cassa, Christopher A. Page 33 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance μ ≤ π′ π =1 ππ ≤ 0.689π Cassa, Christopher A. ′ Page 34 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Attempted Interventions to Protect Genomic Privacy Using Binning to Maintain Confidentiality of Medical Data Cassa, Christopher A. Page 35 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Disclose Frequencies and Aggregated Data Only Anonymity by Pool Selection Cassa, Christopher A. Page 36 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Use of Generalization Lattices Cassa, Christopher A. Page 37 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Cassa, Christopher A. Page 38 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Add Noise to a Genotypic Sequence Cassa, Christopher A. Page 39 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Synthesizing anonymized ‘individuals’ using statistical data associations Cassa, Christopher A. Page 40 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Quantitative genomic disclosure risk models for patients and relatives Cassa, Christopher A. Page 41 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Cassa, Christopher A. Page 42 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Geographical Data Privacy in Public Health and Clinical Practice Cassa, Christopher A. Page 43 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Cassa, Christopher A. Page 44 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Anonymization of spatial data for disease surveillance Cassa, Christopher A. Page 45 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Cassa, Christopher A. Page 46 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Conclusion Cassa, Christopher A. Page 47 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Cassa, Christopher A. Page 48 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Chapter II: Genomic privacy: identifiability and familial risks Cassa, Christopher A. Page 49 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Ability to infer SNP genotypes from sibling genomic data Abstract ≤ Background Cassa, Christopher A. Page 50 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Cassa, Christopher A. Page 51 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Methods Cassa, Christopher A. Page 52 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Enhanced ability to infer sibling genotypes 9 π πππ2 π΄π΄ πππ1 π΄π΄ = π πππ2 π΄π΄ ππππππ‘ππ ππππ. π π(ππππππ‘ππ ππππ. π| πππ1 π΄π΄) π=1 9 = π =1 π πππ2 π΄π΄ ∩ ππππππ‘ππ ππππ. π π ππππππ‘ππ ππππ. π Cassa, Christopher A. π ππππππ‘ππ ππππ. π πππ1 π΄π΄) Page 53 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance 4 = π =1 = π(πππ2 π΄π΄ ∩ ππππππ‘ππ ππππ. π) π(ππππππ‘ππ ππππ. π| πππ1 π΄π΄) π(ππππππ‘ππ ππππ. π) π πππ2 π΄π΄ ∩ π΄π΄π π΄π΄πΉ π π΄π΄π π΄π΄πΉ + + π π΄π΄π π΄π΄πΉ πππ1 π΄π΄ π πππ2 π΄π΄ ∩ π΄π΄π π΄ππΉ π π΄π΄π π΄ππΉ π πππ2 π΄π΄ ∩ π΄ππ π΄π΄πΉ π π΄ππ π΄π΄πΉ = 1 π2 + π π΄π΄π π΄ππΉ πππ1 π΄π΄ π(π΄ππ π΄π΄πΉ |πππ1 π΄π΄) + π πππ2 π΄π΄ ∩ π΄ππ π΄ππΉ π π΄ππ π΄ππΉ 1 2 ππ + 1 2 ππ + 1 4 π(π΄ππ π΄ππΉ |πππ1 π΄π΄) π2 π2 = π + ππ + 4 2 = π 2 +ππ + π2 4 Cassa, Christopher A. Page 54 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Cassa, Christopher A. Page 55 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Measuring the information content of Sibling genotype data Cassa, Christopher A. Page 56 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance ΛπΌππ 1 ,πΌππ 2 πππππ‘π¦πππ πΌππ1 π΄π΄ ΛπΌππ 1 ,πΌππ 2 πππππ‘π¦πππ = = = = ≅ π πΌππ2 πππππ‘π¦ππ πΌππ1 πππππ‘π¦ππ ∩ π πππππππ π πΌππ2 πππππ‘π¦ππ πΌππ1 πππππ‘π¦ππ ∩ π’ππππππ‘ππ π πππ2 πππππ‘π¦ππ πππ1 πππππ‘π¦ππ π(πΌππ2 πππππ‘π¦ππ) ∩ π πΌππ1 πππππ‘π¦ππ ∩ π’ππππππ‘ππ ) π(πΌππ1 πππππ‘π¦ππ ∩ π’ππππππ‘ππ) 9 π =1 9 π =1 9 π =1 π πππ2 πππππ‘π¦ππ ππππππ‘ππ ππππ. π π(ππππππ‘ππ ππππ. π| πππ1 πππππ‘π¦ππ) π(πΌππ2 πππππ‘π¦ππ) ∩ π πΌππ1 πππππ‘π¦ππ ∩ π’ππππππ‘ππ ) π(πΌππ1 πππππ‘π¦ππ ∩ π’ππππππ‘ππ) π(πππ2 πππππ‘π¦ππ ∩ ππππππ‘ππ ππππ. π) π(ππππππ‘ππ ππππ. π| πππ1 πππππ‘π¦ππ) π(ππππππ‘ππ ππππ. π) 1 π(πΌππ2 πππππ‘π¦ππ) β π πΌππ1 πππππ‘π¦ππ β 1 − π 1 π πΌππ1 πππππ‘π¦ππ β 1 − π π(πππ2 πππππ‘π¦ππ ∩ ππππππ‘ππ ππππ. π) π(ππππππ‘ππ ππππ. π| πππ1 πππππ‘π¦ππ) π(ππππππ‘ππ ππππ. π) π(πΌππ2 πππππ‘π¦ππ) Cassa, Christopher A. Page 57 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Cassa, Christopher A. Page 58 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance log(ΛInd1, Ind2Aa) vs. MAF 2 1 Ind1AA, Ind2Aa 0.5 Ind1Aa, Ind2Aa 0 -0.5 Ind1aa, Ind2Aa 0.01 0.1 0.19 0.28 0.37 0.46 0.55 0.64 0.73 0.82 0.91 log(Λ Sib1, Sib2Aa) 1.5 -1 -1.5 log(ΛInd1, Ind2AA) vs. MAF 4 2 Ind1AA, Ind2AA 0 Ind1Aa, Ind2AA -2 0.01 0.09 0.17 0.25 0.33 0.41 0.49 0.57 0.65 0.73 0.81 0.89 0.97 log(Λ Sib1, Sib2AA) 6 Ind1aa, Ind2AA -4 -6 Cassa, Christopher A. Page 59 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance log(ΛInd1, Ind2aa) vs. MAF 6 2 Ind1AA, Ind2aa 0 Ind1Aa, Ind2aa -2 0.01 0.09 0.17 0.25 0.33 0.41 0.49 0.57 0.65 0.73 0.81 0.89 0.97 log(Λ Sib1, Sib2Aa) 4 Ind1aa, Ind2aa -4 -6 Λ Ind 1 ,Ind 2 genotypes Λ Sib2 AA Aa aa AA Aa aa AA Aa aa Cassa, Christopher A. Sib1 AA AA AA Aa Aa Aa aa aa aa Maximizing MAF 0.01 0.01 0.01 0.99 0.01, 0.99 0.01 0.99 0.99 0.99 Log(Λ Ind1,Ind2 genotypes) 3.407 3.699 3.389 1.396 1.407 1.396 3.389 3.699 3.407 Page 60 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Confirming sib-ship with two non-matching sets of SNP genotypes π π πππ πππ‘ππ ππ‘ π ππππ = = π πππ‘ππ ππ‘ π ππππ π πππ π π πππ π πππ‘ππ ππ‘ π ππππ π πππ π π πππ + π πππ‘ππ ππ‘ π ππππ ! π πππ π !π πππ π πππ‘π π΄π΄ π πππ + π πππ‘π π΄π π πππ + π πππ‘π ππ π πππ π πππ‘π π΄π΄ π πππ + π πππ‘π π΄π π πππ + π πππ‘π ππ π πππ Cassa, Christopher A. π π 1 π 1 π + π πππ‘ππ|! π πππ π Page 61 1 1−π Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance [a] p(sib | match at M independent SNPs) vs. Minor Allele Frequency (N=100,000) p(sib|match at M independent SNPs) 1 0.9 0.8 0.7 0.9-1 0.6 0.8-0.9 0.5 0.7-0.8 0.4 0.6-0.7 0.3 0.5-0.6 0.2 0.4-0.5 0.1 0.3-0.4 Cassa, Christopher A. 0.2-0.3 0 0.04 0.08 0.12 0.1-0.2 0.16 0.48 0.44 0.4 0.36 0.32 0.28 0.24 0.2 0 1 120 80 100 60 20 40 140 0-0.1 Page 62 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance 1 0.9 0.8 0.7 0.9-1 0.6 0.8-0.9 0.5 0.7-0.8 0.4 0.6-0.7 0.3 0.5-0.6 0.2 0.4-0.5 0 0.3-0.4 140 120 100 80 60 40 20 1 0 0.05 0.1 0.15 0.2 0.25 0.2-0.3 0.3 0.35 0.1 0.5 0.45 0.4 p(sib|match at M independent SNPs) [b] p(sib | match at M independent SNPs) vs. Minor Allele Frequency (N=10,000,000) 0.1-0.2 0-0.1 M Matched Independent SNPs Cassa, Christopher A. Page 63 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance 1 0.9 0.8 0.7 0.9-1 0.6 0.8-0.9 0.5 0.7-0.8 0.4 0.6-0.7 0.3 0.5-0.6 0.2 0.4-0.5 0 0.3-0.4 140 120 100 80 60 40 20 1 0 0.05 0.1 0.15 0.2 0.25 0.2-0.3 0.3 0.35 0.1 0.5 0.45 0.4 p(sib|match at M independent SNPs) [c] p(sib | match at M independent SNPs) vs. Minor Allele Frequency (N=6,000,000,000) 0.1-0.2 0-0.1 M Matched Independent SNPs Cassa, Christopher A. Page 64 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance N=100,000 N=10,000,000 N=6,000,000,000 Cassa, Christopher A. Page 65 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Modeling a series of SNP inferences using a binomial distribution π π, π, π = π π π (1 − π)π−π π πΉ π; π, π = π π ≤ π = Cassa, Christopher A. π π =0 π π π−π π π (1 − π) ( Page 66 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance HapMap CEPH and global population SNP genotypes and allele frequency data Cassa, Christopher A. Page 67 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Validating the sibling genotype probability vector using parental genotypic data Error reduction calculation Cassa, Christopher A. Page 68 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Cassa, Christopher A. Page 69 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance [a] Inference Error Reduction vs. Minor Allele Frequency Percentage Inference Error Reduction 0.6 0.5 0.4 0.3 0.2 0.1 0 0.02 0.04 0.06 0.08 0.1 0.12 0.14 0.16 0.18 0.2 0.22 0.24 0.26 0.28 0.3 0.32 0.34 0.36 0.38 0.4 0.42 0.44 0.46 0.48 0.5 0 Minor Allele Frequency Avg. Sib1AA Avg. Sib1Aa Avg. Sib1aa Scoring metric for calculating correct fraction of inferences Cassa, Christopher A. Page 70 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Results Validation of SNP genotype inference using HapMap trio data Cassa, Christopher A. Page 71 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Cassa, Christopher A. Page 72 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance ≥ ≤ ≥ Cassa, Christopher A. ≤ Page 73 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Deriving propensity to disease from sibling SNP data Γπππ 2 πππππ‘π¦ππ = = |πππ 1πππππ‘π¦ππ probability with sibling knowledge probability without sibling knowledge π πππ2 πππππ‘π¦ππ πππ1 πππππ‘π¦ππ π πππ2 πππππ‘π¦ππ 9 π =1 π(πππ2 πππππ‘π¦ππ ∩ ππππππ‘ππ ππππ. π) π(ππππππ‘ππ ππππ. π| πππ1 πππππ‘π¦ππ) π(ππππππ‘ππ ππππ. π) π πππ2 πππππ‘π¦ππ Γπ΄π |πππ 1ππ = π πππ2 π΄π πππ1 ππ π(πππ2 π΄π) = p2 + pq 2pq = p + (1 − p) 2(1 − p) = = 1− p 2 − 2p Cassa, Christopher A. Page 74 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Γππ |πππ 1 ππ = π πππ2 ππ πππ1 ππ π(πππ2 ππ) = p2 + pq + q2 q2 = p2 + p 1 − p + (1 − p)2 (1 − p)2 Γππ |πππ 1 ππ Cassa, Christopher A. Page 75 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance β Discussion Cassa, Christopher A. Page 76 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance ≈ Cassa, Christopher A. Page 77 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Conclusions Cassa, Christopher A. Page 78 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Cassa, Christopher A. Page 79 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Cassa, Christopher A. Page 80 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Ability to infer SNP genotypes from parental or child data π πΆππππ π΄π΄ πππ‘πππ π΄π΄) = 3 = π =1 = π(πΆππππ π΄π΄ ∩ πππ‘πππ π΄π΄) π(πππ‘πππ π΄π΄) π((πΆππππ π΄π΄ ∩ πππ‘πππ π΄π΄)|πππ‘πππππ πππππ‘π¦ππ π)π(πππ‘πππππ πππππ‘π¦ππ π) π πππ‘πππ π΄π΄ π (π΄π΄πΆ ∩ π΄π΄π |π΄π΄πΉ )π(π΄π΄πΉ ) π (π΄π΄πΆ ∩ π΄π΄π |π΄ππΉ )π(π΄ππΉ ) + π π΄π΄π π π΄π΄π + 1 1 π4 + 2 = π (π΄π΄πΆ ∩ π΄π΄π |πππΉ )π(πππΉ ) π π΄π΄π 2π3 π + 0 (π2 π 2 ) π2 = π 2 + ππ = π 2 +ππ . Cassa, Christopher A. Page 81 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance π πππ‘πππ π΄π΄ πΆππππ π΄π΄ = π πππ‘πππ π΄π΄ ∩ πΆππππ π΄π΄ π πΆππππ π΄π΄ = π πππ‘πππ π΄π΄ ∩ πΆππππ π΄π΄ π πππ‘πππ π΄π΄ = π πππ‘πππ π΄π΄ ∩ πΆππππ π΄π΄ π πΆππππ π΄π΄ = π(πΆππππ π΄π΄|πππ‘πππ π΄π΄) Cassa, Christopher A. Page 82 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Child AA Aa Aa AA Aa Aa AA Aa Aa Known Parent AA AA AA Aa Aa Aa aa aa aa Prior Prob. p2 2pq q2 p2 2pq q2 p2 2pq q2 Posterior Prob. p2 + pq pq + q2 0 2 ½p +½pq ½p2+pq+½q2 ½pq+½q2 0 2 p + pq pq + q2 Error Reduction |p2 – [p2 + pq]| |2pq – [pq + q2]| |q2| |p2 – [½p2+½pq]| |2pq – [½p2+pq+½q2]| |q2-[½pq+½q2]| |p2| |2pq – [p2 + pq]| |q2-[pq + q2]| Likelihood ratio test statistic for paternity and information content Cassa, Christopher A. Page 83 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance ΛπΌππ 1 ,πΌππ 2 πππππ‘π¦πππ ΛπΌππ 1 ,πΌππ 2 πππππ‘π¦πππ = = = = ≅ π πΌππ2 πππππ‘π¦ππ πΌππ1 πππππ‘π¦ππ ∩ πππ‘πππππ πππππ‘ππππ πππ π πΌππ2 πππππ‘π¦ππ πΌππ1 πππππ‘π¦ππ ∩ π’ππππππ‘ππ π πΆππππ π΄π΄ πππ‘πππ π΄π΄) π(πΌππ2 πππππ‘π¦ππ) ∩ π πΌππ1 πππππ‘π¦ππ ∩ π’ππππππ‘ππ ) π(πΌππ1 πππππ‘π¦ππ ∩ π’ππππππ‘ππ) π πππ‘πππ π΄π΄ ∩ πΆππππ π΄π΄ π πΆππππ π΄π΄ 1 π(πΌππ2 πππππ‘π¦ππ) β π πΌππ1 πππππ‘π¦ππ β 1 − π 1 π πΌππ1 πππππ‘π¦ππ β 1 − π 3 π =1 3 π =1 π((πΆππππ π΄π΄ ∩ πππ‘πππ π΄π΄)|πππ‘πππππ πππππ‘π¦ππ π)π(πππ‘πππππ πππππ‘π¦ππ π) π πππ‘πππ π΄π΄ 1 π(πΌππ2 πππππ‘π¦ππ) β π πΌππ1 πππππ‘π¦ππ β 1 − π 1 π πΌππ1 πππππ‘π¦ππ β 1 − π π((πΆππππ π΄π΄ ∩ πππ‘πππ π΄π΄)|πππ‘πππππ πππππ‘π¦ππ π)π(πππ‘πππππ πππππ‘π¦ππ π) π πππ‘πππ π΄π΄ π(πΌππ2 πππππ‘π¦ππ) Cassa, Christopher A. Page 84 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Deriving paternal and child propensity to disease from SNP data Γπππππ = = πππππ‘π¦ππ |πππ‘πππππ πππππ‘π¦ππ = probability with paternal knowledge probability without paternal knowledge (control) π πππππ πππππ‘π¦ππ πππ‘πππππ πππππ‘π¦ππ π πππππ πππππ‘π¦ππ 3 π =1 π(πππ‘πππππ πππππ‘π¦ππ ∩ ππππππ‘ππ ππππ. π) π(ππππππ‘ππ ππππ. π| πππππ πππππ‘π¦ππ) π(ππππππ‘ππ ππππ. π) π πππππ πππππ‘π¦ππ Cassa, Christopher A. Page 85 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Cassa, Christopher A. Page 86 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Risk of re-identification analysis of mutation data Introduction De novo germline mutations Cassa, Christopher A. Page 87 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance ππ = (πππππππ ,π‘π¦ππ ) β ππ π’π −π‘π¦ππ ο Mutation type and region-specific data sources Cassa, Christopher A. Page 88 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Cassa, Christopher A. Page 89 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Wild type Guanine Thymine Adenine Cytosine G -1481 2947 1619 T 2228 -734 4785 A 7140 1045 -1376 C 2290 3609 1048 -- Total 11658 6135 4839 7780 Wild type Guanine Thymine Adenine Cytosine G -224 0 499 T 1009 -273 3178 A 1028 325 -727 C 0 0 0 -- Total 2037 549 339 4817 Wild type Guanine Thymine Adenine Cytosine G -*** 2947 *** T *** -*** 4785 A 7140 *** -*** C *** 3609 *** -- Total 7140 3609 2947 4785 Cassa, Christopher A. Page 90 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Probability of finding a match in rare mutation alleles ππ = π πππ‘ππ πππππ§π¦πππ‘π πππππ + π(πππ‘ππ πππ‘πππ§π¦πππ‘π) = ππ 2 2 + 2(ππ ) ππ ≅ 2(ππ ) 1 − ππ ≅ 2ππ −2ππ 2 2 2 2 +π +π ≅ (4ππ 2 ) + π Probability that two people are the same given a match at M mutant base pairs Cassa, Christopher A. Page 91 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance π π πππ πππ‘ππ = = 1 1 π + 1 1 π π π=1 ππ π(πππ‘ππ|π πππ)π π πππ π(πππ‘ππ|π πππ)π π πππ + π πππ‘ππ ! π πππ π !π πππ 1 1 −π Likelihood of identifying an individual out of 10000 genotyped at that locus ο Cassa, Christopher A. Page 92 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance +π +π π(πππ‘ππ|π πππ)π π πππ π(πππ‘ππ|π πππ)π π πππ + π πππ‘ππ ! π πππ π !π πππ π π πππ πππ‘ππ = = = 1 π 1 1 1 π + π π=1 ππ 1 1 1 1 −π 1 104 1 + 1.54 β 10−14 104 1− 1 104 = 0.99999 ≅ 1 Cassa, Christopher A. Page 93 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance π π = 1 − ! π π , ππππππ₯ππππ‘ππ ππ¦ 1 − π −(π π −1 ) 2(1 .54π₯ 10 14 ) π 10000 = 0.000000325; 14,599,883 ππππππ ππππ’ππππ πππ 50% ππππππ ππ π πππ‘ππ Cassa, Christopher A. Page 94 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Chapter III: Anonymization of data for transmission and disease surveillance A Context-Sensitive Approach to Anonymizing Spatial Surveillance Data: Impact on Outbreak Detection Abstract Cassa, Christopher A. Page 95 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Introduction Background Cassa, Christopher A. Page 96 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Cassa, Christopher A. Page 97 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Methods A. Overview Cassa, Christopher A. Page 98 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance B. Population-Density Based Gaussian Spatial Skew C. Census Block Groups Cassa, Christopher A. Page 99 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance D. Gaussian Randomization ο³ ο³ E. Anonymization Algorithm Cassa, Christopher A. Page 100 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Anonymization Multipliers and Factors: π΄ππ-_πππ ππ πππ. ππππ ππ‘π¦ ππ’ππ‘ππππππ = πππ‘ππ πππ. ππππ ππ‘π¦ ππ’ππ‘ππππππ = ππ£πππππ πππ_ππππ’π ππππ’πππ‘πππ ππππ ππ‘π¦ πππ‘ππππ‘′π πππππ ππππ’π πππ ππππ ππ‘π¦ ππ£πππππ π‘ππ‘ππ ππππ’πππ‘πππ ππππ ππ‘π¦ πππ‘ππππ‘′π πππππ ππππ’π ππππ’πππ‘πππ ππππ ππ‘π¦ Age Population Density vs. Total Population Density πΆπππππππ ππ’ππ‘ππππππ = π¨ππ πππππππππ β π΄ππ_πππ ππ ππππ’πππ‘πππ ππππ ππ‘π¦ ππ’ππ‘ππππππ + π − π¨ππ πππππππππ β (πππ‘ππ ππππ’πππ‘πππ ππππ ππ‘π¦ ππ’ππ‘ππππππ) Cassa, Christopher A. Page 101 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Overall Anonymization Parameter: ππ£πππππ ππ’ππ‘ππππππ = [π] β [ π΄ππ πππππππ‘ππ β π΄ππ_πππ ππ ππππ’πππ‘πππ ππππ ππ‘π¦ ππ’ππ‘ππππππ + 1 − π΄ππ πππππππ‘ππ β πππ‘ππ ππππ’πππ‘πππ ππππ ππ‘π¦ ππ’ππ‘ππππππ ] Cassa, Christopher A. Page 102 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance F. Test data sets Cassa, Christopher A. Page 103 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance G. Measuring Clustering Detection Performance ≤ Cassa, Christopher A. Page 104 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance H. Estimate of k-Anonymity ο³ ππ 2 Cassa, Christopher A. Page 105 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Cassa, Christopher A. Page 106 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance I. Outlier Assessment and Percentage of Points Meeting Anonymity Thresholds J. Client Tool and GUI for Remote De-Identification of Data Cassa, Christopher A. Page 107 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Results A. Distribution of Location Skew Cassa, Christopher A. Page 108 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Distribution of Distances from Original Points [Average Distances Moved: 0.0587, 0.1168, 0.1762, 0.2354 km] % of Pts within a Spec. Dist. Interval 8.00% 7.00% 6.00% 5.00% 4.00% Avg. Distance 0.0587 Avg. Distance 0.1168 3.00% Avg. Distance 0.1762 2.00% Avg. Distance 0.2354 1.00% 0.00% B. Average Distance Moved vs. Estimate of k-anonymity Cassa, Christopher A. Page 109 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Average K-Anonymity Achieved vs. Average Distance Moved [km] 300 Average K-Anonymity 250 200 150 K-Anon vs. Avg. Distance Moved 100 50 0 Cassa, Christopher A. Page 110 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance C. Sensitivity of Spatial Clustering Detection Cassa, Christopher A. Page 111 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Avg. Cluster Sensitivity/Specificity vs. Avg. Distance to Original Point 0.96 Sensitivity & Specificity Values 0.94 0.92 0.9 Average Cluster Sensitivity 0.88 0.86 0.84 Average Cluster Specificity 0.82 0.8 0.78 0.76 D. Outlier Assessment and Percentage of Points Not Meeting Anonymity Thresholds Cassa, Christopher A. Page 112 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Cassa, Christopher A. Page 113 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Percentage of Visits that Miss Specific kAnonymity Thresholds 120.00% % Below k-Anon Threshold 100.00% 80.00% 60.00% % K-Anon < 5 % K-Anon < 10 40.00% % K-Anon < 20 20.00% 0.00% Discussion Cassa, Christopher A. Page 114 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Cassa, Christopher A. Page 115 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Cassa, Christopher A. Page 116 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Conclusion Cassa, Christopher A. Page 117 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Cassa, Christopher A. Page 118 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Optimal discrete anonymization using linear programming techniques Abstract Background Cassa, Christopher A. Page 119 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Cassa, Christopher A. Page 120 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Cassa, Christopher A. Page 121 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Cassa, Christopher A. Page 122 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance LP De-identification Cassa, Christopher A. Page 123 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance ∈ ∈ ∈ 0 ≤ πππ πππ πππ π π π¨ πππ π π π© π πππ = 1 πππ πππ π π π¨ Cassa, Christopher A. Page 124 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance ξ π πππ β ππ ≤ π π βπ π β ππ πππ΄ π β πππ πππ πππ π π π¨ πππ π π π© ≥ ξ π= πππ΄ ππ ∈ ξ ξ Cassa, Christopher A. Page 125 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance ∈ ∈ πππ΄ πππ΅ π ππ βπ π βπππ π ξ ≤ Cassa, Christopher A. Page 126 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance π ππ πΌπ ππ ≥ π ξ ∈ ξ Cassa, Christopher A. Page 127 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance ∈ ∈ ξ ξ π πππ‘ππππ‘ π π = π ≤ π π ∈ π πππ‘ππππ‘ π ∩ π = πΏ(π) π = π π πππ‘ππππ‘ π|π = πΏ π ) β π(π = πΏ(π) π = π π πππ‘ππππ‘ π π = πΏ(π) = Cassa, Christopher A. 1 π πΏ (π ) Page 128 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance π π π = πΏ(π) π = π ≤ ππΏ (π ) β π ∈ π π π = π π = π ≤ ππ β π ∈ ∈ ∩ π πππ β π(π = π) ≤ ππ β β π ∈ π∈π΄ πππ β π(π = π) ∈ π(π = π) = ππ π ∈ πππ΄ π πππ β ππ ≤ π π βπ π β ππ π ∈π΄ π ππ πππ πππ πππ π ∈ π΄ πππ π ∈ π΅ ∈ Cassa, Christopher A. ∈ Page 129 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance ξ ξ Application Stringent De-identiο¬cation of Locations ≤ ≤ Cassa, Christopher A. Page 130 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance ≤ Cassa, Christopher A. Page 131 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Comparison to Aggregation Cassa, Christopher A. Page 132 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance ≤ Cassa, Christopher A. Page 133 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Cluster Detection Cassa, Christopher A. Page 134 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Effect of Underlying Population Density Cassa, Christopher A. Page 135 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance County name ρ∗ ξ/s¶ = 10E-1 LP method d † ξ/s = ξ/s = 10E-2 10E-3 Zip 5 ‡ ξ/s = 10E-4 Franklin County, MA 39.3 0.00 0.00 89.6 4736 Plymouth County, MA 276.2 0.00 0.00 62.0 1908 Middlesex County, MA 687.1 0.00 0.02 31.5 1105 New York County, NY 25846 0.00 0.08 4.3 172 ∗ρ = population density expressed in people per square kilometer †d = expected distance for strategy in meters ‡Zip 5 = aggregation by ο¬ve-digit zip code •Zip 3 = aggregation by ο¬rst three digits of zip code β£ξ = re-identiο¬cation probability, s = number of records in the data set ξ/s 2.80E03 5.50E04 6.50E04 1.10E03 Zip 3 § d (m) 2640 3123 1770 519 ξ/s 1.20E05 8.90E06 4.90E06 1.20E04 Discussion Cassa, Christopher A. Page 136 d (m) 13226 19115 10793 3866 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Cassa, Christopher A. Page 137 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Cassa, Christopher A. Page 138 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Cassa, Christopher A. Page 139 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance ξ Cassa, Christopher A. Page 140 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance ξ ξ Cassa, Christopher A. − Page 141 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Cassa, Christopher A. Page 142 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Chapter IV: Reverse Identification Potential of Authentic and Anonymized Geographical Data Cassa, Christopher A. Page 143 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Exploiting Repeatedly Non-deterministically Anonymized Spatial Data to Re-identify Individuals: A Vulnerability and Proposed Solutions Abstract Cassa, Christopher A. Page 144 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Background Cassa, Christopher A. Page 145 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Cassa, Christopher A. Page 146 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Cassa, Christopher A. Page 147 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Methods Geographical test data sets Cassa, Christopher A. Page 148 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance ο¬ο¬ Cassa, Christopher A. Page 149 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Population-adjusted 2-dimensional Gaussian skew π₯0 π¦0 f X ( x) ο½ 1 2ο° ο³ x ο( x ο x0 ) 2 e 2ο³ x 2 , fY ( y) ο½ 1 2ο° ο³ y ο ( y ο y0 ) 2 e 2ο³ y 2 ο³ ο³ ο³ ο³ π₯0 π¦0 Re-Identification through averaging Cassa, Christopher A. Page 150 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Results Re-Identification of points anonymized using Gaussian and randomized skew Cassa, Christopher A. Page 151 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Cassa, Christopher A. Page 152 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Discussion Re-Identification of data anonymized with Gaussian and randomized skew π₯0 π¦0 Cassa, Christopher A. Page 153 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance n ο₯L i ο½1 i n π₯0 π¦0 π2 π 0 0 π2 π ο³ ο³ n n π₯0 π¦0 ο¬ ο¬ ο ο¬ ο¬ π₯0 π¦0 ο¬2 3 0 0 ο¬2 3 Cassa, Christopher A. Page 154 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance n ο₯L i ο½1 i n π₯0 π¦0 ο¬2 3π 0 0 ο¬2 3π π₯0 π¦0 Cassa, Christopher A. Page 155 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Anonymizing within a distributed network or health information exchange Cassa, Christopher A. Page 156 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Removing other identifying information from data sets to avoid re-linking Cassa, Christopher A. Page 157 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Cassa, Christopher A. Page 158 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Increasing anonymity using an algorithm based on a Markov process Cassa, Christopher A. Page 159 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Conclusions Cassa, Christopher A. Page 160 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance An unsupervised classification method for inferring original case locations from low-resolution disease maps Preface Background Cassa, Christopher A. Page 161 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Methods Cassa, Christopher A. Page 162 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Cassa, Christopher A. Page 163 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Cassa, Christopher A. Page 164 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Results Cassa, Christopher A. Page 165 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Cassa, Christopher A. Page 166 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Cassa, Christopher A. Page 167 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Discussion Cassa, Christopher A. Page 168 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Cassa, Christopher A. Page 169 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Cassa, Christopher A. Page 170 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Cassa, Christopher A. Page 171 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Conclusions Cassa, Christopher A. Page 172 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Cassa, Christopher A. Page 173 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Cassa, Christopher A. Page 174 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Chapter V: Future Directions and Conclusions Disclosure Control Mechanisms that Incorporate Quantitative Estimates Cassa, Christopher A. Page 175 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Information Theoretic Approaches and Multi-Locus Measures Cassa, Christopher A. Page 176 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance π» π =− π π₯ log π(π₯) π₯ ∈π Cassa, Christopher A. Page 177 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance π» π, π = − π π₯, π¦ log π(π₯, π¦) π₯ ∈π π¦ ∈π Cassa, Christopher A. Page 178 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance πΏ ππ πΏ π₯ = π Cassa, Christopher A. Page 179 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance π» π =− π π₯ log π(π₯) π₯ ∈π Cassa, Christopher A. Page 180 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Geographical Anonymization and Privacy Cassa, Christopher A. Page 181 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Anonymization Type Standards and Meta-Data ο· ο· ο· Cassa, Christopher A. Page 182 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Availability of Anonymization Modules Development of a cryptographically secured anonymization web service Cassa, Christopher A. Page 183 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Improve anonymization toolkits to include secured upload, integrated geocoding, and interchangeable algorithm type ο· ο· ο· ο· Cassa, Christopher A. Page 184 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance ο· ο· Describing quantitative anonymity estimates to users and explaining how to set exclusion criteria from transmissions Constrained anonymization techniques Cassa, Christopher A. Page 185 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Mutli-Factor Authentication using Contents from Disparate EHRs Cassa, Christopher A. Page 186 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Cassa, Christopher A. Page 187 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Cassa, Christopher A. Page 188 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Cassa, Christopher A. Page 189 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance Conclusion Cassa, Christopher A. Page 190 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance References 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13. 14. 15. 16. 17. 18. 19. 20. Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, Devon K, Dewar K, Doyle M, FitzHugh W, et al: Initial sequencing and analysis of the human genome. Nature 2001, 409:860-921. HapMap Sample Populations [http://hapmap.org/hapmappopulations.html.en ] Altshuler D, Brooks LD, Chakravarti A, Collins FS, Daly MJ, Donnelly P: A haplotype map of the human genome. Nature 2005, 437:1299-1320. Genome-Wide Association Studies [http://www.genome.gov/20019523] Benjamin EJ, Dupuis J, Larson MG, Lunetta KL, Booth SL, Govindaraju DR, Kathiresan S, Keaney JF, Jr., Keyes MJ, Lin JP, et al: Genome-wide association with select biomarker traits in the Framingham Heart Study. BMC Med Genet 2007, 8 Suppl 1:S11. Hirschhorn JN, Daly MJ: Genome-wide association studies for common diseases and complex traits. Nat Rev Genet 2005, 6:95-108. Morton NE: Into the post-HapMap era. Adv Genet 2008, 60:727-742. Klein TE, Chang JT, Cho MK, Easton KL, Fergerson R, Hewett M, Lin Z, Liu Y, Liu S, Oliver DE, et al: Integrating genotype and phenotype information: an overview of the PharmGKB project. Pharmacogenetics Research Network and Knowledge Base. Pharmacogenomics J 2001, 1:167-170. Watson JD, Cook-Deegan RM: The human genome project and international health. JAMA 1990, 263:3322-3324. Collins F, Galas D: A new five-year plan for the U.S. Human Genome Project. Science 1993, 262:43-46. Baird PA: Genetics and health care: a paradigm shift. Perspect Biol Med 1990, 33:203-213. Shendure J, Porreca GJ, Reppas NB, Lin X, McCutcheon JP, Rosenbaum AM, Wang MD, Zhang K, Mitra RD, Church GM: Accurate multiplex polony sequencing of an evolved bacterial genome. Science 2005, 309:1728-1732. Shendure J, Mitra RD, Varma C, Church GM: Advanced sequencing technologies: methods and goals. Nat Rev Genet 2004, 5:335-344. Beyond genetic testing: Personal DNA analysis and research for health, family, ancestry, and genealogy [https://www.23andme.com/] deCODEme - Empowering prevention. Calculate genetic risk for diseases, DNA research for personal and family health and ancestry [http://www.decodeme.com/] Navigencis - Your genes offer a road map to optimal health. [http://www.navigenics.com/] Church GM: The personal genome project. Mol Syst Biol 2005, 1:2005 0030. This time it's personal. Nature 2008, 453:697. ten Kate LP: Editorial. Community Genet 2001, 4:1. Wideroff L, Freedman AN, Olson L, Klabunde CN, Davis W, Srinath KP, Croyle RT, Ballard-Barbash R: Physician use of genetic testing for cancer susceptibility: results of a national survey. Cancer Epidemiol Biomarkers Prev 2003, 12:295303. Cassa, Christopher A. Page 191 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance 21. 22. 23. 24. 25. 26. 27. 28. 29. 30. 31. 32. 33. 34. 35. 36. 37. 38. 39. 40. ten Kate LP: Carrier screening for cystic fibrosis and other autosomal recessive diseases. Am J Hum Genet 1990, 47:359-361. ten Kate LP, Tijmstra T: Carrier screening for Tay-Sachs disease and cystic fibrosis. Lancet 1990, 335:1527-1528. Juengst ET: Can enhancement be distinguished from prevention in genetic medicine? J Med Philos 1997, 22:125-142. McGee G, Juengst ET: Genetic enhancement. Med Ethics 1999:6-7. Juengst ET, Binstock RH, Mehlman MJ, Post SG: Aging. Antiaging research and the need for public dialogue. Science 2003, 299:1323. Holden C: Genetic discrimination. Long-awaited genetic nondiscrimination bill headed for easy passage. Science 2007, 316:676. Hudson KL, Holohan MK, Collins FS: Keeping pace with the times--the Genetic Information Nondiscrimination Act of 2008. N Engl J Med 2008, 358:2661-2663. Bieber FR, Brenner CH, Lazer D: Human genetics. Finding criminals through DNA of their relatives. Science 2006, 312:1315-1316. Bieber FR, Lazer D: Guilt by association: should the law be able to use one person's DNA to carry out surveillance on their family? Not without a public debate. New Sci 2004, 184:20. Freedom of Information Act 5 USC 552 1996. Kohut K, Manno M, Gallinger S, Esplen MJ: Should healthcare providers have a duty to warn family members of individuals with an HNPCC-causing mutation? A survey of patients from the Ontario Familial Colon Cancer Registry. J Med Genet 2007, 44:404-407. Offit K, Groeger E, Turner S, Wadsworth EA, Weiser MA: The "duty to warn" a patient's family members about hereditary disease risks. Jama 2004, 292:14691473. Yoon PW, Chen B, Faucett A, Clyne M, Gwinn M, Lubin IM, Burke W, Khoury MJ: Public health impact of genetic tests at the end of the 20th century. Genet Med 2001, 3:405-410. Hall WD, Morley KI, Lucke JC: The prediction of disease risk in genomic medicine. EMBO Rep 2004, 5 Spec No:S22-26. Lin K, Lipsitz R, Miller T, Janakiraman S: Benefits and harms of prostate-specific antigen screening for prostate cancer: an evidence update for the U.S. Preventive Services Task Force. Ann Intern Med 2008, 149:192-199. Screening for prostate cancer: U.S. Preventive Services Task Force recommendation statement. Ann Intern Med 2008, 149:185-191. Summaries for patients. Screening for prostate cancer with prostate-specific antigen testing: U.S. Preventive Services Task Force recommendations. Ann Intern Med 2008, 149:I37. Kruer M: The incidentalome. JAMA 2006, 296:2801; author reply 2801-2802. Wolf SM, Kahn JP, Lawrenz FP, Nelson CA: The incidentalome. JAMA 2006, 296:2800-2801; author reply 2801-2802. Kohane IS, Masys DR, Altman RB: The incidentalome: a threat to genomic medicine. JAMA 2006, 296:212-215. Cassa, Christopher A. Page 192 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance 41. 42. 43. 44. 45. 46. 47. 48. 49. 50. 51. 52. 53. 54. 55. 56. 57. Zondervan KT, Cardon LR: The complex interplay among factors that influence allelic association. Nat Rev Genet 2004, 5:89-100. Hirschhorn JN, Lohmueller K, Byrne E, Hirschhorn K: A comprehensive review of genetic association studies. Genet Med 2002, 4:45-61. Colhoun HM, McKeigue PM, Davey Smith G: Problems of reporting genetic associations with complex outcomes. Lancet 2003, 361:865-872. Cappuccio FP, Oakeshott P, Strazzullo P, Kerry SM: Application of Framingham risk estimates to ethnic minorities in United Kingdom and implications for primary prevention of heart disease in general practice: cross sectional population based study. Bmj 2002, 325:1271. Colditz GA, Coakley E: Weight, weight gain, activity, and major illnesses: the Nurses' Health Study. Int J Sports Med 1997, 18 Suppl 3:S162-170. Empana JP, Ducimetiere P, Arveiler D, Ferrieres J, Evans A, Ruidavets JB, Haas B, Yarnell J, Bingham A, Amouyel P, Dallongeville J: Are the Framingham and PROCAM coronary heart disease risk functions applicable to different European populations? The PRIME Study. Eur Heart J 2003, 24:1903-1911. Halamka JD, Mandl KD, Tang PC: Early experiences with personal health records. J Am Med Inform Assoc 2008, 15:1-7. Mandl KD, Simons WW, Crawford WC, Abbett JM: Indivo: a personally controlled health record for health information exchange and communication. BMC Med Inform Decis Mak 2007, 7:25. Simons WW, Halamka JD, Kohane IS, Nigrin D, Finstein N, Mandl KD: Integration of the personally controlled electronic medical record into regional interregional data exchanges: a national demonstration. AMIA Annu Symp Proc 2006:1099. Simons WW, Mandl KD, Kohane IS: The PING personally controlled electronic medical record system: technical architecture. J Am Med Inform Assoc 2005, 12:47-54. Riva A, Mandl KD, Oh DH, Nigrin DJ, Butte A, Szolovits P, Kohane IS: The personal internetworked notary and guardian. Int J Med Inform 2001, 62:27-40. Weingart SN, Rind D, Tofias Z, Sands DZ: Who uses the patient internet portal? The PatientSite experience. J Am Med Inform Assoc 2006, 13:91-95. Kohane IS, Mandl KD, Taylor PL, Holm IA, Nigrin DJ, Kunkel LM: Medicine. Reestablishing the researcher-patient compact. Science 2007, 316:836-837. McMurry AJ, Gilbert CA, Reis BY, Chueh HC, Kohane IS, Mandl KD: A self-scaling, distributed information architecture for public health, research, and clinical care. J Am Med Inform Assoc 2007, 14:527-533. Mandl KD, Kohane IS: Tectonic shifts in the health information economy. N Engl J Med 2008, 358:1732-1737. Frazer KA, Ballinger DG, Cox DR, Hinds DA, Stuve LL, Gibbs RA, Belmont JW, Boudreau A, Hardenbol P, Leal SM, et al: A second generation human haplotype map of over 3.1 million SNPs. Nature 2007, 449:851-861. Patil N, Berno AJ, Hinds DA, Barrett WA, Doshi JM, Hacker CR, Kautzer CR, Lee DH, Marjoribanks C, McDonough DP, et al: Blocks of limited haplotype diversity Cassa, Christopher A. Page 193 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance 58. 59. 60. 61. 62. 63. 64. 65. 66. 67. 68. 69. 70. 71. 72. revealed by high-resolution scanning of human chromosome 21. Science 2001, 294:1719-1723. Malin BA, Sweeney LA: Inferring genotype from clinical phenotype through a knowledge based algorithm. Pac Symp Biocomput 2002:41-52. Lin Z, Owen AB, Altman RB: Genetics. Genomic research and human subject privacy. Science 2004, 305:183. Henneman L, Timmermans DR, van der Wal G: Public experiences, knowledge and expectations about medical genetics and the use of genetic information. Community Genet 2004, 7:33-43. Levitt DM: Let the consumer decide? The regulation of commercial genetic testing. J Med Ethics 2001, 27:398-403. Miller SM, Fleisher L, Roussi P, Buzaglo JS, Schnoll R, Slater E, Raysor S, PopaMabe M: Facilitating informed decision making about breast cancer risk and genetic counseling among women calling the NCI's Cancer Information Service. J Health Commun 2005, 10 Suppl 1:119-136. Mouchawar J, Hensley-Alford S, Laurion S, Ellis J, Kulchak-Rahm A, Finucane ML, Meenan R, Axell L, Pollack R, Ritzwoller D: Impact of direct-to-consumer advertising for hereditary breast cancer testing on genetic services at a managed care organization: a naturally-occurring experiment. Genet Med 2005, 7:191-197. Mouchawar J, Laurion S, Ritzwoller DP, Ellis J, Kulchak-Rahm A, Hensley-Alford S: Assessing controversial direct-to-consumer advertising for hereditary breast cancer testing: reactions from women and their physicians in a managed care organization. Am J Manag Care 2005, 11:601-608. Malin BA: An evaluation of the current state of genomic data privacy protection technology and a roadmap for the future. J Am Med Inform Assoc 2005, 12:2834. Malin B, Sweeney L: How (not) to protect genomic data privacy in a distributed network: using trail re-identification to evaluate and design anonymity protection systems. J Biomed Inform 2004, 37:179-192. Lin Z, Hewett M, Altman RB: Using binning to maintain confidentiality of medical data. Proc AMIA Symp 2002:454-458. Facts About Genome Sequencing [http://www.ornl.gov/sci/techresources/Human_Genome/faq/seqfacts.shtml#w hose] Malin BA: Protecting genomic sequence anonymity with generalization lattices. Methods Inf Med 2005, 44:687-692. Nothnagel M, Furst R, Rohde K: Entropy as a measure for linkage disequilibrium over multilocus haplotype blocks. Hum Hered 2002, 54:186-198. Croner CM, Sperling J, Broome FR: Geographic information systems (GIS): new perspectives in understanding human health and environmental relationships. Stat Med 1996, 15:1961-1977. Pickle LW, Waller LA, Lawson AB: Current practices in cancer spatial data analysis: a call for guidance. Int J Health Geogr 2005, 4:3. Cassa, Christopher A. Page 194 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance 73. 74. 75. 76. 77. 78. 79. 80. 81. 82. 83. 84. 85. 86. 87. 88. 89. Reis BY, Kirby C, Hadden LE, Olson K, McMurry AJ, Daniel JB, Mandl KD: AEGIS: a robust and scalable real-time public health surveillance system. J Am Med Inform Assoc 2007, 14:581-588. Bradley CA, Rolka H, Walker D, Loonsk J: BioSense: implementation of a National Early Event Detection and Situational Awareness System. MMWR Morb Mortal Wkly Rep 2005, 54 Suppl:11-19. Brownstein JS, Freifeld CC, Reis BY, Mandl KD: Surveillance Sans Frontieres: Internet-based emerging infectious disease intelligence and the HealthMap project. PLoS Med 2008, 5:e151. Freifeld CC, Mandl KD, Reis BY, Brownstein JS: HealthMap: global infectious disease monitoring through automated classification and visualization of Internet media reports. J Am Med Inform Assoc 2008, 15:150-157. Brownstein JS, Freifeld CC: HealthMap: the development of automated realtime internet surveillance for epidemic intelligence. Euro Surveill 2007, 12:E071129 071125. EpiSPIDER [http://www.epispider.org/] Rushton G, Armstrong MP, Gittler J, Greene BR, Pavlik CE, West MM, Zimmerman DL: Geocoding in cancer research: a review. Am J Prev Med 2006, 30:S16-24. Cassa CA, Grannis SJ, Overhage JM, Mandl KD: A Novel, Context-Sensitive Approach to Anonymizing Spatial Surveillance Data: Impact on Outbreak Detection. J Am Med Inform Assoc 2005. Sweeney L: Uniqueness of Simple 'Demographics in the U.S. Population, LIDAPWP4. In Forthcoming book entitled, The Identifiability of Data. 2000 Sweeney L: k-anonymity: A model for protecting privacy. International Journal of Uncertainty Fuzziness and Knowledge-Based Systems 2002, 10:557-570. Sweeney L: Replacing Personally-Identifying Information in Medical Records, the Scrub System. Proc AMIA Annu Fall Symp 1996:333-337. Uzuner O, Luo Y, Szolovits P: Evaluating the state-of-the-art in automatic deidentification. J Am Med Inform Assoc 2007, 14:550-563. Uzuner O, Sibanda TC, Luo Y, Szolovits P: A de-identifier for medical discharge summaries. Artif Intell Med 2008, 42:13-35. Neamatullah I, Douglass MM, Lehman LW, Reisner A, Villarroel M, Long WJ, Szolovits P, Moody GB, Mark RG, Clifford GD: Automated De-Identification of Free-Text Medical Records. BMC Med Inform Decis Mak 2008, 8:32. Ohno-Machado L, Silviera SP, Vinterbo S: Protecting patient privacy by quantifiable control of disclosures in disseminated databases. Int J Med Inform 2004, 73:599-606. Vinterbo SA, Ohno-Machado L, Dreiseitl S: Hiding information by cell suppression. Proc AMIA Symp 2001:726-730. Ohno-Machado L SP, Vinterbo S.: Protecting patient privacy by quantifiable control of disclosures in disseminated databases. Int J Med Inform 2004, 73:599-606. Cassa, Christopher A. Page 195 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance 90. 91. 92. 93. 94. 95. 96. 97. 98. 99. 100. 101. 102. 103. 104. 105. 106. 107. 108. 109. Armstrong MP RG, Zimmerman DL.: Geographically masking health data to preserve confidentiality. Statistics in Medicine 1999, 18:497-525. Cassa CA, Schmidt BW, Kohane IS, Mandl KD: My sister's keeper?: genomic research and the identifiability of siblings. BMC Med Genomics 2008, 1:32. Adida B, Kohane IS: GenePING: secure, scalable management of personal genomic data. BMC Genomics 2006, 7:93. Hoffman MA: The genome-enabled electronic medical record. J Biomed Inform 2007, 40:44-46. Kaiser J: Genomic databases. NIH goes after whole genome in search of disease genes. Science 2006, 311:933. Thomas DC: Are we ready for genome-wide association studies? Cancer Epidemiol Biomarkers Prev 2006, 15:595-598. Lowrance WW, Collins FS: Ethics. Identifiability in genomic research. Science 2007, 317:600-602. Brenner CH, Weir BS: Issues and strategies in the DNA identification of World Trade Center victims. Theor Popul Biol 2003, 63:173-178. Violence Against Women and Department of Justice Reauthorization Act of 2005. HR 3402; Public Law 109-162 2005. A haplotype map of the human genome. Nature 2005, 437:1299-1320. Olivier M: A haplotype map of the human genome. Physiol Genomics 2003, 13:3-9. Sebastiani P, Ramoni MF, Nolan V, Baldwin CT, Steinberg MH: Genetic dissection and prognostic modeling of overt stroke in sickle cell anemia. Nat Genet 2005, 37:435-440. Slaughter LM: The Genetic Information Nondiscrimination Act: Why Your Personal Genetics are Still Vulnerable to Discrimination. Surg Clin North Am 2008, 88:723-738. Korobkin R, Rajkumar R: The Genetic Information Nondiscrimination Act--a halfstep toward risk sharing. N Engl J Med 2008, 359:335-337. Dugan RB, Wiesner GL, Juengst ET, O'Riordan M, Matthews AL, Robin NH: Duty to warn at-risk relatives for genetic disease: genetic counselors' clinical experience. Am J Med Genet C Semin Med Genet 2003, 119C:27-34. Kohane IS, Altman RB: Health-information altruists--a potentially critical resource. N Engl J Med 2005, 353:2074-2077. Population Mutation Databases [http://snpnet.jst.go.jp/link/Population_e.html] Ethnic & National Variation Databases [http://www.hgvs.org/dblist/deth.html] Krawczak M, Ball EV, Cooper DN: Neighboring-nucleotide effects on the rates of germ-line single-base-pair substitution in human genes. Am J Hum Genet 1998, 63:474-488. Tables that summarize the results of a meta-analysis of single base-pair substitutions logged in the Human Gene Mutation Database. [http://www.hgmd.cf.ac.uk/docs/msajhg1.txt] Cassa, Christopher A. Page 196 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance 110. 111. 112. 113. 114. 115. 116. 117. 118. 119. 120. 121. 122. 123. 124. 125. 126. Olson KL BM, Pagano M, Mandl KD: Real time spatial cluster detection using interpoint distances among precise patient locations. BMC Med Inform Decis Mak 2005, 5:19. Buckeridge DL, Burkom H, Campbell M, Hogan WR, Moore AW, Project B: Algorithms for rapid outbreak detection: a research synthesis. Journal of Biomedical Informatics 2005, 38:99-113. Kulldorff M, Heffernan R, Hartman J, Assuncao R, Mostashari F: A space-time permutation scan statistic for disease outbreak detection. Plos Medicine 2005, 2:216-224. Wieland SC, Brownstein JS, Berger B, Mandl KD: Density-equalizing Euclidean minimum spanning trees for the detection of all disease cluster shapes. Proc Natl Acad Sci U S A 2007, 104:9404-9409. Cassa CA OK, Mandl KM: System To Generate Semisynthetic Data Sets of Outbreak Clusters for Evaluation of Outbreak-Detection Performance. MMWR Morb Mortal Wkly Rep 2004, 53:231. Kulldorff M, Nagarwalla N: Spatial Disease Clusters - Detection and Inference. Statistics in Medicine 1995, 14:799-810. Bureau USC: Census Block Groups Cartographic Boundary Files Descriptions and Metadata - U.S. Census Bureau. 2005. Documentation SJ: Random Class: nextGaussian() Method Documentation. 2005. Beitel AJ, Olson KL, Reis BY, Mandl KD: Use of Emergency Department Chief Complaint and Diagnostic Codes for Identifying Respiratory Illness in a Pediatric Population. Pediatr Emerg Care 2004, 20:355-360. Mandl KD RB, Cassa C.: Measuring outbreak-detection performance by using controlled feature set simulations. MMWR Morb Mortal Wkly Rep 2004, 53:130-136. Mandl KD, Overhage JM, Wagner MM, Lober WB, Sebastiani P, Mostashari F, Pavlin JA, Gesteland PH, Treadwell T, Koski E, et al: Implementing syndromic surveillance: A practical guide informed by the early experience. Journal of the American Medical Informatics Association 2004, 11:141-150. Seaman V: An inquiry into the cause of the prevalence of the yellow fever in New York. The Medical Repository 1798, 1:315-372. Brownstein JS, Cassa CA, Mandl KD: No place to hide--reverse identification of patients from published maps. N Engl J Med 2006, 355:1741-1742. Moskop JC, Marco CA, Larkin GL, Geiderman JM, Derse AR: From Hippocrates to HIPAA: privacy and confidentiality in emergency medicine--Part I: conceptual, moral, and legal foundations. Ann Emerg Med 2005, 45:53-59. Federal Register. (Office C ed., vol. 67. pp. 53182-53273: Library of Congress; 2002:53182-53273. Fefferman NH, O'Neil EA, Naumova EN: Confidentiality and confidence: is data aggregation a means to achieve both? J Public Health Policy 2005, 26:430-449. Olson KL, Grannis SJ, Mandl KD: Privacy protection versus cluster detection in spatial epidemiology. Am J Public Health 2006, 96:2002-2008. Cassa, Christopher A. Page 197 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance 127. 128. 129. 130. 131. 132. 133. 134. 135. 136. 137. 138. 139. 140. 141. 142. 143. Cox LH: Protecting confidentiality in small population health and environmental statistics. Stat Med 1996, 15:1895-1905. Armstrong MP, Rushton G, Zimmerman DL: Geographically masking health data to preserve confidentiality. Stat Med 1999, 18:497-525. Cassa CA, Grannis SJ, Overhage JM, Mandl KD: A context-sensitive approach to anonymizing spatial surveillance data: impact on outbreak detection. J Am Med Inform Assoc 2006, 13:160-165. Machanavajjhala A KD, Abowd J, Gehrke J, Vilhuber L: Privacy: Theory meets practice on the map. In International Conference on Data Engineering (ICDE) 2008 Strayer J: Linear Programming and Its Applications. New York: Springer-Verlag; 1989. ILOG I: ILOG CPLEX 10.010 Gentilly, France; 1999. Kulldorff M, Nagarwalla N: Spatial disease clusters: detection and inference. Stat Med 1995, 14:799-810. VanWey LK, Rindfuss RR, Gutmann MP, Entwisle B, Balk DL: Confidentiality and spatially explicit data: concerns and challenges. Proc Natl Acad Sci U S A 2005, 102:15337-15342. Salazar-Gonzalez J: Protecting tables with cell perturbation. In Proceedings of the Joint UN-ECE/Eurostat Work Session on Statistical Data Conο¬dentiality. 2005 Duncan G, Fienberg, S Obtaining information while preserving privacy: a Markov perturbation method for tabular data. In Proceedings of the Statistical Data Protection Conference. 1998: 351-362. Kargupta H, Datta, S, Wang, Q, Sivakumar, K: Random-data perturbation techniques and privacy-preserving data mining. Knowledge and Information Systems 2005, 7:387-414. Gutmann M, Stern, PC Putting people on the map: Protecting conο¬dentiality with linked social-spatial data. Panel on conο¬dentiality issues arising from the integration of remotely sensed and self-identifying data. In National Research Council of the National Academies Press Washington D.C.; 2007. Lakshmanan L, Ng, R, Ramesh, G: To do or not to do: the dilemma of disclosing anonymized data. In Proc ACM SIGMOD Conference. 2005: 61–72. Aggarwal C, Pei, J, Zhang, B: On privacy preservation against adversarial data mining. In 12th ACM SIGKDD Conference. 2006: 510-516. Ganta S, Acharya, R: On breaching enterprise data privacy through adversarial information fusion. In Proc Workshop on Information Integration Methods, Architecture, and Systems, at the 24th IEEE International Conference on Data Engineering. 2008: 246-249. Spruill NL: The confidentiality and analytic usefulness of masked business microdata. Proceedings of the American Statistical Association Section on Survey Research Methods 1983:602-607. Mandl KD, Overhage JM, Wagner MM, Lober WB, Sebastiani P, Mostashari F, Pavlin JA, Gesteland PH, Treadwell T, Koski E, et al: Implementing syndromic Cassa, Christopher A. Page 198 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance 144. 145. 146. 147. 148. 149. 150. 151. 152. 153. 154. 155. 156. 157. 158. 159. 160. 161. 162. 163. 164. surveillance: a practical guide informed by the early experience. J Am Med Inform Assoc 2004, 11:141-150. Cassa CA GS, Overhage JM, Mandl KD: A context-sensitive approach to anonymizing spatial surveillance data: impact on outbreak detection. J Am Med Inform Assoc 2006, 13:160-165. Kamel Boulos MN, Cai Q, Padget JA, Rushton G: Using software agents to preserve individual health data confidentiality in micro-scale geographical analyses. J Biomed Inform 2006, 39:160-170. Statistics NCfH: Health, United States. pp. p.370, 429, 434; 2006:p.370, 429, 434. Bleichenbacher D: Chosen ciphertext attacks against protocols based on the RSA encryption standard PKCS #1. In Advances in Cryptology. 1998: 1-12. Cassa CA IK, Olson KL, Mandl KD: A software tool for creating simulated outbreaks to benchmark surveillance systems. BMC Med Inform Decis Mak 2005, 5:22-28. AEGIS-CCT [http://www.sourceforge.net/projects/chipcluster/] Stein C: Inadmissibility of the usual estimator for the mean of a multivariate distribution. In Proc Third Berkeley Symp Math Statist Prob 1956: 197-206. Baranchik AJ: A family of minimax estimators of the mean of a multivariate normal distribution. Ann Math Statist 1970, 41:642-645. HHS Awards Contracts to Develop Nationwide Health Information Network [http://www.hhs.gov/news/press/2005pres/20051110.html] American Health Information Community (the Community) Pickle LW: Spatial analysis of disease. Cancer Treat Res 2002, 113:113-150. Elliot P, Wakefiled JC, Best NG, Briggs DJ: Spatial epidemiology: methods and applications. Oxford: Oxford University Press; 2000. Brownstein JS, Cassa CA, Kohane IS, Mandl KD: Reverse geocoding: Concerns about patient confidentiality in the display of geospatial health data. AMIA Annu Symp Proc 2005:905. Curtis A, Mills JW, Leitner M: Keeping an eye on privacy issues with geospatial data. Nature 2006, 441:150. Curtis AJ, Mills JW, Leitner M: Spatial confidentiality and GIS: re-engineering mortality locations from published maps about Hurricane Katrina. Int J Health Geogr 2006, 5:44. Boston Water and Sewer Commission: Boston planimetric and topographic data. 1996. US Census Bureau: Redistricting Census 2000 TIGER/Line Files [machinereadable data files]. Washington, DC; 2000. Cayo MR, Talbot TO: Positional error in automated geocoding of residential addresses. Int J Health Geogr 2003, 2:10. ESRI: ArcGIS. 8.1 edition. Redlands, CA: Environmental Systems Institute Inc.; 2001. ERDAS: IMAGINE. 8.5 edition. Atlanta, GA: ERDAS; 2001. Guha R: Object Coidentification on the Semantic Web. IBM Research, Almaden. New York: http://tap.stanford.edu/CoIdent.pdf; 2004. Cassa, Christopher A. Page 199 Privacy and Identifiability in Clinical Research, Personalized Medicine, and Public Health Surveillance 165. 166. 167. 168. 169. 170. 171. 172. 173. 174. 175. 176. Sweeney L: Uniqueness of Simple Demographics in the U.S. Population, LIDAPWP4. Pittsburgh, PA: Carnegie Mellon University, Laboratory for International Data Privacy; 2000. Gregorio DI, Dechello LM, Samociuk H, Kulldorff M: Lumping or splitting: seeking the preferred areal unit for health geography studies. Int J Health Geogr 2005, 4:6. Leitner M, Curtis A: Cartographic guidelines for geographically masking the locations of confidential point data Cartographic Perspectives 2004, 49:22-39. Leitner M, Curtis A: A First Step Towards a Framework for Presenting the Location of Confidential Point Data on Maps - Results of an Empirical Perceptual Study. International Journal of Geographical Information Science 2006, 20:797-811. Krieger N, Waterman P, Lemieux K, Zierler S, Hogan JW: On the wrong side of the tracts? Evaluating the accuracy of geocoding in public health research. Am J Public Health 2001, 91:1114-1116. Monmonier M: Spying with Maps: Surveillance Technologies and the Future of Privacy. Chicago, IL: University of Chicago Press.; 2002. Wartell J, McEwen JT: Privacy in the Information Age: A Guide for Sharing Crime Maps and Spatial Data. U.S. Department of Justice, Office of Justice Programs: http://www.ncjrs.org/pdffiles1/nij/188739.pdf; 2001. Pickle LW, Szczur M, Lewis DR, Stinchcomb DG: The crossroads of GIS and health information: a workshop on developing a research agenda to improve cancer control. Int J Health Geogr 2006, 5:51. Strang G: Introduction to Applied Mathematics. 1 edn: Wellesley Cambridge Press; 1986. American Health Information Community: Breakthroughs [http://www.hhs.gov/healthit/breakthroughs.pdf] Patient Record Anonymization GUI [http://www.sourceforge.net/projects/patientanon/] Authentication in an Internet Banking Environment. [http://www.ffiec.gov/pdf/authentication_guidance.pdf ] Cassa, Christopher A. Page 200