You are cordially invited to a talk in the Edmond J. Safra Bioinformatics Program
Distinguished Speaker Series.
The speaker is Dr. Yaniv Erlich, Whitehead Institute for Biomedical Research, MIT
Title: "Surname leakage from whole genome sequencing dataset"
Time: Tuesday, 10 January 2012, at 12:15 (refreshments from 12:00)
Place: Holtzblat hall 007, Physics building, at Exact Sciences Faculty
Host: Prof. Ron Shamir, School of Computer Science
Abstract:
Posting anonymous sequencing results without identifiers has become a common practice in
large-scale sequencing projects. Here, we report a novel risk of surname leakage from
whole-genome sequencing datasets. Unlike previously described risks, this approach does
not require physical access to the DNA of the target or previous leakage of DNA data from the
target. Surname leakage relies on bioinformatic profiling of short tandem repeats (STRs) on
the Y-chromosome and querying massive Web 2.0 genealogical databases. We demonstrate
the applicability of the technique by recovering the surname ‘Venter’ from Craig Venter’s
genome sequence. We also show that short read datasets are amenable to this technique.
According to a conservative estimate, surname recovery would jeopardize 10% of whole
genome sequencing datasets of US individuals. Moreover, the combination of a leaked
surname with age and state narrows the identity of a sequencing dataset to ≤10 US
individuals in most cases. As a remedy, we developed STRANGERS, a tool to mitigate the
risk of surname leakage from sequencing datasets. STRANGERS uses a statistical
framework that ensures maximal data sharing of Y-chromosome variations while eliminating the
risk of surname leakage.