8-17-12 GenotypingMinutesAug162012_Affy edits

Follow-up Genotyping Conference Call
“Transdisciplinary Research in Cancer of the Lung (TRICL)”
August 16, 2012
Meeting minutes
Participants: Chris Amos (MDACC), Mala Pande (MDACC), Maria Tere Landi (NCI), Shenying
Fang (MDACC), Jen Doherty (Dartmouth), Andrea Finn (Affymetrix), Yiping Zhan (Affymetrix),
Teresa Webster (Affymetrix), Alex Forrest-Hay (Affymetrix)
The purpose of this call was to discuss the ongoing development of an Affymetrix Axiom array
genotyping platform to be applied to samples from TRICL replication and fine mapping
analyses. The TRICL grant proposal calls for a replication and fine mapping stage of about
15,000 individuals across different ethnic groups with an emphasis on studying cohort samples.
The primary goals are to validate findings from the GWAS already conducted, to fine-map regions
identified by the GWAS, to characterize findings in different populations, and to provide a
platform for epidemiological characterization. Shenying Fang and Mala Pande are working on
configuring the request for variants to be genotyped.
Dr. Amos began the call by reviewing the status of genotyping contracts. Affymetrix indicated
that the initial contract from Dr. Hung has been received and that they expect to receive a
second contract next week. Dr. Doherty has not yet initiated her contract, which must be set up
as a fee-for-service relationship. It is preferable that this work be done at the same time as
the rest of the genotyping, to ensure that genotype calling is comparable among studies.
Dr. Amos asked how information about SNPs, genes, variants described by DNA sequences, and
regions of interest should be provided to Affymetrix. In particular, he asked whether we could
keep track of which data source provided each request, so that the requests could be separated
at the end of the experiment. Affymetrix responded that this would be feasible, but they prefer
to receive spreadsheets that concatenate all the different sources of information.
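A minimal sketch of the combined-spreadsheet approach, assuming each source's request is simply a list of marker IDs: the hypothetical helpers `concatenate_requests` and `split_by_source` keep a `sources` column in the single concatenated sheet that Affymetrix prefers, so the requests can still be separated at the end. The column names, the `;` separator, and the placeholder marker IDs are assumptions for illustration, not an Affymetrix format.

```python
import csv
import io


def concatenate_requests(requests_by_source):
    """Merge per-source marker lists into one table (hypothetical format).

    requests_by_source maps a source label to a list of marker IDs.
    Each output row records every source that requested the marker,
    so the combined sheet can be split back apart later.
    """
    merged = {}  # dicts preserve insertion order (Python 3.7+)
    for source, markers in requests_by_source.items():
        for marker in markers:
            sources = merged.setdefault(marker, [])
            if source not in sources:
                sources.append(source)
    return [{"marker": m, "sources": ";".join(s)} for m, s in merged.items()]


def split_by_source(rows):
    """Recover each source's request list from the combined rows."""
    by_source = {}
    for row in rows:
        for source in row["sources"].split(";"):
            by_source.setdefault(source, []).append(row["marker"])
    return by_source


def to_csv(rows):
    """Render the combined rows as a single CSV sheet."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["marker", "sources"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Placeholder source labels and marker IDs, for illustration only.
requests = {
    "GWAS": ["rs111", "rs222"],
    "TCGA": ["rs222", "rs333"],
}
rows = concatenate_requests(requests)
print(to_csv(rows))
```

Recording every requesting source in one row, rather than repeating a marker once per source, keeps duplicate markers out of the sheet while still allowing the split.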
Affymetrix indicated that there are several categories of markers they can query with their
platform. Markers that are already validated can be included immediately (their genotyping will
work on the array with very high probability). For markers that are not already validated, the
P.I. can elect to attempt genotyping with probesets that query the marker on both the forward and
reverse strands, so these trial assays require twice as much space on the array. Finally, the
P.I. can select validated tagging SNPs, which in many cases require querying only a single
variant to tag a number of other markers. The genes requested so far contain about 160,000
Axiom-validated markers (and there are about 30,000 rsIDs in the request), but this count
includes all Axiom-validated markers in the genes of interest, which may be more than we need
for the many genes where tagging SNPs should suffice. Affymetrix suggested that we submit a list
of genes and regions that we want densely genotyped, and that on that list we also indicate the
genes that can be genotyped by tagging. (Affymetrix uses a greedy method to pick Axiom-validated
markers that, in a given population, tag all markers that have LD data and can be tagged at a
given r² cutoff.) They expected this to reduce the number of markers enough to fit within the
approximately 100K-marker capacity, assuming the normal space requirement for markers already
available on the platform (versus twice as much space for markers tiled “de novo”).
When SNP information is transmitted to Affymetrix, they prefer that the rs number, the
chromosomal position in HG19 coordinates, and the forward-strand alleles be provided. Flanking
sequences are very valuable, especially for indels. We will provide an individual marker list
indicating, for each marker, how far we are willing to go to obtain its genotype. There are at
least the following three options:
#1: Only include a marker in an array if it is already Axiom-validated (for markers that are not
very important individually).
#2: Include a marker in an array if it is Axiom-validated, and greedily pick tags for all markers
that are not Axiom-validated but have LD data (this can be done in the same process used to pick
tags for genes).
#3: Include a marker in an array if it is Axiom-validated; for each marker that is not
Axiom-validated, tile it “de novo” on the array and pick a “best tag” for it if possible (we could
even consider picking redundant tags for these especially important markers where possible).
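Option #2 relies on greedy tag picking. The minutes do not describe Affymetrix's implementation, so the following is only a generic greedy sketch of the idea: repeatedly choose the Axiom-validated marker that tags the most still-uncovered markers at the r² cutoff, and stop when nothing more can be tagged. The function name, the `frozenset` pair keys for r², the default cutoff of 0.8, and the toy LD values are all hypothetical.

```python
def greedy_tags(targets, validated, r2, cutoff=0.8):
    """Generic greedy tag selection at an r^2 cutoff (hypothetical helper).

    targets:   markers with LD data that we want covered
    validated: Axiom-validated markers that may be placed on the array
    r2:        dict mapping frozenset({a, b}) -> pairwise r^2
    Returns (chosen tags in pick order, targets that could not be tagged).
    """
    def covers(tag, target):
        # A marker always tags itself; otherwise require r^2 >= cutoff.
        return tag == target or r2.get(frozenset((tag, target)), 0.0) >= cutoff

    uncovered = set(targets)
    candidates = sorted(validated)  # sorted so ties break deterministically
    chosen = []
    while uncovered and candidates:
        # Pick the validated marker that tags the most still-uncovered targets.
        best = max(candidates, key=lambda t: sum(covers(t, x) for x in uncovered))
        newly_covered = {x for x in uncovered if covers(best, x)}
        if not newly_covered:
            break  # remaining targets cannot be tagged at this cutoff
        chosen.append(best)
        uncovered -= newly_covered
    return chosen, uncovered


# Toy LD values, invented for illustration.
r2 = {
    frozenset(("x", "b")): 0.90,
    frozenset(("x", "c")): 0.85,
    frozenset(("a", "b")): 0.50,
}
tags, untagged = greedy_tags({"a", "b", "c", "d"}, {"a", "x"}, r2)
print(tags, untagged)  # ['x', 'a'] {'d'}
```

Marker `d` comes back as untaggable at this cutoff; under option #3 such a marker would instead be tiled de novo.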
They indicated that the TCGA-derived mRNA information would be hard to transform into a format
for array design, because this spreadsheet requires them to derive the forward-strand alleles from
mRNA-based information. They requested that rs numbers and forward-strand alleles be provided if
possible.
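The difficulty with the TCGA mRNA-based alleles is orientation: a transcript reports alleles on the strand of its gene, so for genes on the minus strand the forward-strand alleles are the reverse complement. The sketch below assumes the gene's genomic strand is already known from an annotation source, since the minutes do not say how the spreadsheet encodes it; the helper names are hypothetical. It also flags A/T and C/G SNPs, whose orientation cannot be checked from the alleles alone.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Reverse complement of a DNA allele; '-' (deletion) is left unchanged."""
    return seq.translate(COMPLEMENT)[::-1]


def to_forward_strand(alleles, gene_strand):
    """Convert transcript-oriented alleles to forward-strand alleles.

    gene_strand is the genomic strand ('+' or '-') of the transcript the
    alleles were reported on; this is assumed to come from a gene annotation.
    """
    if gene_strand == "+":
        return list(alleles)
    if gene_strand == "-":
        return [reverse_complement(a) for a in alleles]
    raise ValueError(f"unknown strand: {gene_strand!r}")


def is_strand_ambiguous(alleles):
    """A/T and C/G SNPs read the same on both strands, so their
    orientation cannot be verified from the alleles alone."""
    return sorted(a.upper() for a in alleles) in (["A", "T"], ["C", "G"])


print(to_forward_strand(["A", "G"], "-"))  # ['T', 'C']
print(is_strand_ambiguous(["C", "G"]))     # True
```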
Dr. Landi asked about the timeline. Dr. Finn indicated that, once a final SNP list is provided to
Affymetrix, it will take 4-6 weeks to configure and deliver the array. The time between submitting
a design request to Affymetrix and receiving the associated design results depends on the
complexity of the request, but can be as short as two to three days. Dr. Finn indicated that the
lab in Toronto is working toward having its equipment in place and operational.
Dr. Amos indicated that, for now, any communication from Affymetrix to TRICL that includes
specific information about SNPs should be sent only to Drs. Amos and Pande, because some
requestors regard their SNP requests as proprietary.
Additional notes:
YZ: For indel markers, flanking sequences in a format like “AGACCATTCTTGCCCCAGCCCTTTCACCTGGCCCA[/CCT]CCTCTCCCTCCTCAGGGCCTGAGCACATCACAACT” are highly valuable if
provided, since indel position alignment in dbSNP can be confusing in some cases. If such
flanking sequences are not provided, we can extract this information based on our own
understanding, and together we can work out a method to ensure that Affy uses the correct
sequences when designing the probesets later.
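A sketch of reading the bracketed flanking-sequence format in YZ's example, assuming the layout is always `LEFT[ALLELE1/ALLELE2]RIGHT` with an empty allele marking the deletion. The regular expression, the `FlankedVariant` class, and `parse_flanking` are hypothetical names for illustration, not an Affymetrix or dbSNP API.

```python
import re
from dataclasses import dataclass

# Assumed layout: LEFT[ALLELE1/ALLELE2]RIGHT, where an empty allele
# (as in "[/CCT]") stands for the deletion allele of an indel.
FLANK_RE = re.compile(r"^([ACGTN]*)\[([ACGTN-]*)/([ACGTN-]*)\]([ACGTN]*)$")


@dataclass
class FlankedVariant:
    left: str
    alleles: tuple
    right: str

    def haplotypes(self):
        """Local sequence carrying each allele ('-' contributes nothing)."""
        return [self.left + a.replace("-", "") + self.right for a in self.alleles]


def parse_flanking(text):
    match = FLANK_RE.match(text.strip().upper())
    if match is None:
        raise ValueError(f"not in LEFT[A/B]RIGHT format: {text!r}")
    left, a1, a2, right = match.groups()
    # Write an empty allele as "-" so the deletion is explicit.
    return FlankedVariant(left, (a1 or "-", a2 or "-"), right)


example = ("AGACCATTCTTGCCCCAGCCCTTTCACCTGGCCCA[/CCT]"
           "CCTCTCCCTCCTCAGGGCCTGAGCACATCACAACT")
variant = parse_flanking(example)
print(variant.alleles)  # ('-', 'CCT')
```

Because the right flank itself begins with `CCT`, inserting `CCT` just before or just after that copy yields the same sequence, so the indel's position is not unique. This is one way reported indel positions can disagree, and why the flanking sequence is the safer description.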
YZ: It seems that the TCGAlung3007 tab does contain the chromosomal positions in the first
data column (in hg19?). Understanding the allele specification may take some work, though,
especially figuring out on which strand the alleles are located.
CA: We do not have any lists from fine mapping (Yufei and Rayjean) or from Alvaro Monteiro.
CA: For consistency, chromosomal positions should be retrieved from HG19.
CA: Fine Level Prioritization Scheme:
1. SNPs/genes identified from U19 GWAS – very dense coverage.
2. SNPs/genes identified from pathway-based or other novel analytical approaches from U19 (I will
also have some genes from gene-environment (G x E) analyses to suggest) – tagging SNPs.
3. SNPs/genes suggested from other GWAS of lung cancer – very dense coverage.
4. SNPs/genes suggested from TCGA affecting risk for disease.
5. SNPs/genes suggested from survival analyses of lung cancer – tagging SNPs.
6. SNPs suggested by other U19 groups.
7. Vitamin B/Folate (integration with another consortium).
8. Strong Candidate Genes for lung cancer (e.g. CYP2A6, CYP2B6).
9. Inflammation pathway and other suggested pathways (these are placed below candidate genes
because I think they have less evidence than some candidate genes).
10. Weak Candidate Genes for lung cancer (e.g., GSTM1).
11. SNPs related to smoking behavior.
12. SNPs suggested from COPD studies.
13. SNPs/genes suggested from TCGA or other bioinformatic studies that identify genes modified in
lung cancer.
14. SNPs related to other cancers.
15. SNPs for European admixture.
16. SNPs related to other diseases, such as asthma.
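The minutes do not say how this prioritization scheme is applied against the array's capacity. One hypothetical way is a greedy fill in priority order, charging normal space for Axiom-validated markers and twice the space for markers tiled de novo, against the approximately 100K-marker budget mentioned earlier in these minutes. The `Request` class, `fill_array`, and all marker counts in the example are invented for illustration.

```python
from dataclasses import dataclass

DE_NOVO_COST = 2           # de novo tiled markers need twice the array space
ARRAY_CAPACITY = 100_000   # approximate marker capacity discussed on the call


@dataclass
class Request:
    priority: int     # rank in the prioritization scheme (1 = highest)
    label: str
    validated: int    # Axiom-validated markers (normal space each)
    de_novo: int = 0  # markers tiled de novo (double space each)

    @property
    def cost(self):
        return self.validated + DE_NOVO_COST * self.de_novo


def fill_array(requests, capacity=ARRAY_CAPACITY):
    """Accept requests in priority order; a request that does not fit in
    the remaining space is deferred and lower-priority ones are still tried."""
    accepted, deferred, used = [], [], 0
    for req in sorted(requests, key=lambda r: r.priority):
        if used + req.cost <= capacity:
            accepted.append(req.label)
            used += req.cost
        else:
            deferred.append(req.label)
    return accepted, deferred, used


# Marker counts below are invented for illustration.
plan = [
    Request(1, "U19 GWAS regions", 40_000, de_novo=5_000),
    Request(3, "other lung cancer GWAS", 30_000),
    Request(8, "strong candidate genes", 25_000),
    Request(11, "smoking behavior", 10_000),
]
print(fill_array(plan))
```

This version skips a request that does not fit and keeps trying lower-priority ones; stopping at the first request that does not fit would be the stricter alternative if the priority order must never be violated.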