Slides - Indiana University

High dimensional genomic data,
identifiability, and query-response
Haixu Tang
School of Informatics and Computing
Indiana University, Bloomington
“Big Data” in Personal Genomics
• Genomics is a key component of personalized
medicine
– Massive
• Large research-oriented projects: 1000 genomes to 10^6
• Genome sequencing for all new-borns?
• Open data projects, e.g., the Personal Genome Project
(PGP)
– Heterogeneous
• Genomic sequence (variations)
• Constant, dynamic monitoring
– Transcriptomics, proteomics, metabolomics, microbial
communities, etc. (as demonstrated by iPOP)
Challenges in Personal Genomics
• Personalized Healthcare
– Analysis: detection of markers for diagnosis and treatment (pharmacogenomics)
– Sharing: sharing patient data among health practitioners; searching for successful treatment on similar patients (“patient like me”)
• Research (secondary)
– Analysis: discovery of markers
– Sharing: methodology development; validation of markers
Challenges: Speed, Storage, Scalability, Security
Solution: cloud, hybrid cloud, bring computing to the data!
Privacy Enhancing Technologies
• Personalized Healthcare
– Analysis: detection of markers for diagnosis and treatment (pharmacogenomics)
– Sharing: sharing patient data among health practitioners; searching for successful treatment on similar patients (“patient like me”)
• Research (secondary)
– Analysis: discovery of markers
– Sharing: methodology development; validation of markers
Cryptographic protocols: SMC,
homomorphic computation,
functional encryption
Database security approaches:
access control, query auditing,
differential privacy
Ethics studies, informed consent, policy
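As a rough illustration of the differential privacy item above, the following sketch adds Laplace noise to a minor-allele count at a single SNP before it is released. It is not taken from the slides: the function names, the 0/1/2 genotype encoding, and the sensitivity of 2 (one diploid individual contributes at most two copies of the allele) are assumptions made for the example.

```python
import random


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def noisy_allele_count(genotypes, epsilon, rng):
    """Release a minor-allele count with epsilon-differential privacy.

    genotypes: list of 0/1/2 minor-allele counts, one per individual
               (hypothetical encoding chosen for this sketch).
    epsilon:   privacy budget; smaller values mean more noise.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(genotypes)
    # Adding or removing one diploid individual changes the count by at most 2.
    sensitivity = 2
    return true_count + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(42)
    cohort = [0, 1, 2, 1, 0, 2, 1, 1]
    print("true count :", sum(cohort))
    print("noisy count:", noisy_allele_count(cohort, epsilon=0.5, rng=rng))
```

With millions of SNPs, the privacy budget has to be split across all released statistics, which is one way to see the security versus utility trade-off for high dimensional data.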
What is specific to genomic data?
• Challenges
– Genome technologies evolve very fast!
– Genomic data are extremely high dimensional
• Millions of SNPs, easily identifiable
• Balance between data security and utility
– Not only the data, but also analysis results need to be
protected
• Allele frequencies or test statistics (e.g., Homer’s attack)
• Special properties
– Different dimensions are NOT independent
• Genetic structures (e.g., linkage disequilibrium)
– Specific genomic research focuses on a small number of
dimensions (e.g., disease-associated SNPs)
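To make the point about allele frequencies concrete, here is a minimal sketch of the kind of membership statistic used in Homer's attack. For each SNP j it compares a person's allele frequency y_j (0, 0.5, or 1) with a reference population frequency and with the frequency published for a study pool, D_j = |y_j - pop_j| - |y_j - mix_j|, and then forms a t-like statistic over all SNPs. The simulation setup (independent SNPs, the true population frequencies used as the reference, the pool size and SNP count) is an assumption of this example, not something stated in the slides.

```python
import math
import random


def homer_statistic(y, pop, mix):
    """t-like statistic over D_j = |y_j - pop_j| - |y_j - mix_j|.

    Large positive values suggest that the individual y contributed
    to the pooled frequencies in `mix`.
    """
    d = [abs(yj - pj) - abs(yj - mj) for yj, pj, mj in zip(y, pop, mix)]
    s = len(d)
    mean = sum(d) / s
    var = sum((x - mean) ** 2 for x in d) / (s - 1)
    if var == 0:
        raise ValueError("statistic undefined: zero variance across SNPs")
    return mean / math.sqrt(var / s)


def simulate(n_pool=20, n_snps=5000, seed=0):
    """Toy simulation with independent SNPs (an assumption; real SNPs are in LD)."""
    rng = random.Random(seed)

    def genotype(p):
        # Allele frequency of one diploid individual: 0, 0.5, or 1.
        return (int(rng.random() < p) + int(rng.random() < p)) / 2

    pop = [rng.uniform(0.1, 0.9) for _ in range(n_snps)]
    pool = [[genotype(p) for p in pop] for _ in range(n_pool)]
    outsider = [genotype(p) for p in pop]
    # Pooled frequencies are the only thing the study "publishes".
    mix = [sum(person[j] for person in pool) / n_pool for j in range(n_snps)]
    return homer_statistic(pool[0], pop, mix), homer_statistic(outsider, pop, mix)


if __name__ == "__main__":
    t_member, t_outsider = simulate()
    print("pool member :", round(t_member, 2))
    print("non-member  :", round(t_outsider, 2))
```

Because the statistic sums small per-SNP signals, its power grows with the number of SNPs, which is why releasing aggregate results over many dimensions can identify individuals. Restricting release to a small number of disease-associated SNPs reduces that power.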