ESR1_HUMAN: D538G

EVOLUTIONARY ANALYSIS OF CODING SNPS (PANTHER)
http://www.pantherdb.org/tools/csnpScoreForm.jsp?
subPSEC (substitution position-specific evolutionary conservation) estimates the likelihood that a coding substitution has a functional effect. Values range from 0 (neutral) to about -10 (most likely to be deleterious). -3 is the previously identified cutoff point for functional significance.
• subPSEC for D538G: -3.968431
• Pdeleterious for D538G: 0.72481 (a substitution with Pdeleterious above 0.5 is considered deleterious)

Dr. Metsada Pasmanik-Chor, Bioinformatics Unit, Tel Aviv University

ESR1_HUMAN: D538G
http://mutationassessor.org/

SNPs in miRNA Binding Sites
• 11 possible candidate SNPs were selected for their potential relevance to breast cancer.
• rs2747648, which resides in a predicted binding site for 3 miRNAs in the estrogen receptor α (ESR1) gene, was associated with a 27% reduction in breast cancer risk in premenopausal women.
• When the C allele is present, miR-453 binds with greater affinity to ESR1, leading to decreased levels of ERα protein. Postmenopausal women already have reduced levels of endogenous estrogen, which may explain why this SNP is relevant only in premenopausal women.
• Would carriers of the ancestral T allele respond better to endocrine therapy, given that they naturally express increased levels of the receptor?
References:
Tchatchou, S. et al. A variant affecting a putative miRNA target site in estrogen receptor (ESR) 1 is associated with breast cancer risk in premenopausal women. Carcinogenesis 30, 59–64 (2009).
Adams, B. D., Furneaux, H. & White, B. A. The micro-ribonucleic acid (miRNA) miR-206 targets the human estrogen receptor-α (ERα) and represses ERα messenger RNA and protein expression in breast cancer cell lines. Mol. Endocrinol. 21, 1132–1147 (2007).

http://www.genemania.org/
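Returning to the PANTHER scores reported for D538G at the top of this section: the sketch below shows how a subPSEC score and Pdeleterious can be related. The logistic formula is not stated on the slide; it is assumed here to be the relation PANTHER documents for its coding SNP scores, with the -3 cutoff as the midpoint. The function name and the cutoff parameter are illustrative.

```python
import math


def p_deleterious(sub_psec: float, cutoff: float = -3.0) -> float:
    """Assumed relation: Pdeleterious = 1 - 1 / (1 + exp(-subPSEC - 3)).

    With the default cutoff of -3, a subPSEC of -3 maps to exactly 0.5,
    matching the "above 0.5 is considered deleterious" rule.
    """
    return 1.0 - 1.0 / (1.0 + math.exp(cutoff - sub_psec))


if __name__ == "__main__":
    # subPSEC for ESR1 D538G, taken as negative (scores run from 0 down to about -10)
    score = -3.968431
    print(f"subPSEC = {score}, Pdeleterious = {p_deleterious(score):.5f}")
    print("deleterious" if p_deleterious(score) > 0.5 else "likely neutral")
```

Under this assumption a score more negative than -3 always gives Pdeleterious above 0.5, so the two cutoffs quoted on the slide are the same decision rule in two forms.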
Before you design your own primers: don't reinvent the wheel!
Essential Bioinformatics Resources for Designing PCR Primers for Various Applications:
http://www.humgen.nl/primer_design.html

Basic considerations before designing primers
1. Use NCBI Gene or the UCSC Genome Browser to find gene variants:
• Transcript variants
• Alternative isoforms
• Exon-intron boundaries
• Pseudogenes
2. Gene conservation considerations
3. SNPs: there are approximately 56 million SNPs in the human genome; 16 million are in gene introns and exons, and most are silent mutations. Are we aiming at these locations?

Primer design and primer characteristics
jPCR: http://primerdigital.com/tools/soft.html
Primer length determines the specificity and affects annealing to the template:
Short primer => low specificity, non-specific amplification
Long primer => decreased binding efficiency at normal annealing temperature (due to the high probability of forming secondary structures such as hairpins)
• Primer length: 18-24 bases, with complete sequence identity to the template
• G/C content: 40-60%
• Avoid mismatches at the 3' end
• The presence of G or C bases within the last five bases from the 3' end of primers (GC clamp) helps promote specific binding at the 3' end. Avoid 3 or more G or C at the 3' end because of the high primer-dimer probability
• Avoid a 3' end T
• Always run a reference gene (GAPDH, actin, RPLP0 (large ribosomal protein)) together with your query genes
• Optimal amplicon size: 100-1000 bp
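The primer characteristics listed above can be checked mechanically before ordering oligos. The following is a minimal sketch, not a replacement for the design tools listed in this deck. The function name and report keys are made up for illustration, and "3 or more G or C at the 3' end" is interpreted here as a run of three or more consecutive G/C bases ending at the 3' terminus, which is one possible reading of the rule.

```python
def check_primer(primer: str) -> dict:
    """Check one primer (written 5'->3') against the rules of thumb above."""
    seq = primer.strip().upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError("primer must be a non-empty string of A, C, G, T")

    gc_percent = 100.0 * sum(base in "GC" for base in seq) / len(seq)
    gc_in_last5 = sum(base in "GC" for base in seq[-5:])

    # Length of the uninterrupted G/C run that ends at the 3' terminus
    terminal_gc_run = 0
    for base in reversed(seq):
        if base not in "GC":
            break
        terminal_gc_run += 1

    report = {
        "length_ok": 18 <= len(seq) <= 24,           # 18-24 bases
        "gc_percent": round(gc_percent, 1),
        "gc_ok": 40.0 <= gc_percent <= 60.0,         # 40-60% G/C
        "gc_clamp": gc_in_last5 >= 1,                # G or C in the last five bases
        "no_terminal_gc_run": terminal_gc_run < 3,   # assumed reading of the rule
        "no_3prime_t": not seq.endswith("T"),        # avoid a 3' end T
    }
    report["passes"] = all(v for k, v in report.items() if k != "gc_percent")
    return report


if __name__ == "__main__":
    # First 20 bases of the back-translated ESR1 sequence shown later in this deck
    print(check_primer("ATGACCATGACCCTGCACAC"))
    # A deliberately poor primer: too short, AT-rich, GGGC at the 3' end
    print(check_primer("ATATATATATATGGGC"))
```

A primer that passes these checks still needs a specificity check and a secondary-structure check with the dedicated tools.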
http://www.sciencedirect.com/science/article/pii/S0888754311001066#

Primer design: Melting temperature (Tm)
• Tm is the temperature at which 50% of the DNA duplex dissociates to become single stranded
• Determined by primer length, base composition and concentration
• Affected by the salt concentration of the PCR reaction mix
• Optimal melting temperature: 52°C - 60°C
• Tm above 65°C may cause secondary annealing; a higher Tm (75°C - 80°C) is recommended for amplifying high GC content targets
Primer pair Tm mismatch
• A significant Tm mismatch within a primer pair can lead to poor amplification (desirable Tm difference < 5°C between the two primers)

Primer design: Annealing temperature
Ta (annealing temperature) vs. Tm
• Ta is determined by the Tm of both the primers and the amplicon: optimal Ta = 0.3 x Tm(primer) + 0.7 x Tm(product) - 25
• General rule: Ta is 5°C lower than Tm
• Higher Ta enhances specific amplification but may lower yields
• Crucial in detecting polymorphisms

Primer design: Specificity and cross homology
• Specificity: determined primarily by primer length and sequence
• Cross homology: may become a problem when the PCR template is DNA with highly repetitive sequences
• Avoid non-specific amplification: BLAST PCR primers against the NCBI non-redundant sequence database

Primer design: Avoid secondary structures
• Hairpins: formed via intra-molecular interactions; they negatively affect primer-template binding, leading to poor or no amplification
• Self-dimer (homodimer): formed by inter-molecular interactions between two copies of the same primer
• Cross-dimer (heterodimer): formed by inter-molecular interactions between the sense and antisense primers
• Avoid template secondary structure
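The Tm and Ta rules from the melting and annealing temperature slides can be written out as a short calculation. This is a sketch under stated assumptions: the slides give no Tm formula, so the Wallace rule (2°C per A/T, 4°C per G/C) is used here only as a rough stand-in that ignores salt and primer concentration. The offset of 25 in the optimal Ta formula is the value from the slide; the commonly cited form of this formula (Rychlik et al.) uses 14.9, so the offset is left as a parameter. Taking the lower of the two primer Tm values for the "5°C below Tm" rule is also an assumption. All function names are illustrative.

```python
def wallace_tm(primer: str) -> float:
    """Rough Tm estimate (Wallace rule): 2 x (A+T) + 4 x (G+C), in deg C."""
    seq = primer.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2.0 * at + 4.0 * gc


def optimal_ta(tm_primer: float, tm_product: float, offset: float = 25.0) -> float:
    """Optimal Ta = 0.3 x Tm(primer) + 0.7 x Tm(product) - offset (slide uses 25)."""
    return 0.3 * tm_primer + 0.7 * tm_product - offset


def rule_of_thumb_ta(tm_forward: float, tm_reverse: float, delta: float = 5.0) -> float:
    """General rule: Ta is 5 deg C below Tm (here: below the lower primer Tm)."""
    return min(tm_forward, tm_reverse) - delta


def tm_pair_ok(tm_forward: float, tm_reverse: float, max_diff: float = 5.0) -> bool:
    """Desirable Tm difference between the two primers is below 5 deg C."""
    return abs(tm_forward - tm_reverse) < max_diff


if __name__ == "__main__":
    tm_fwd = wallace_tm("ATGACCATGACCCTGCACAC")  # illustrative forward primer
    tm_rev = 60.0                                # hypothetical reverse primer Tm
    print("Tm forward:", tm_fwd)
    print("Pair Tm mismatch acceptable:", tm_pair_ok(tm_fwd, tm_rev))
    print("Rule-of-thumb Ta:", rule_of_thumb_ta(tm_fwd, tm_rev))
    print("Optimal Ta (slide formula):", optimal_ta(60.0, 85.0))
```

For real designs, a nearest-neighbor Tm from one of the listed tools is preferable; the sketch only makes the arithmetic on the slides explicit.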
Web Site: http://bioinfo.ut.ee/primer3-0.4.0/primer3/input.htm

Web Site: http://primer3plus.com/cgi-bin/dev/primer3plus.cgi

Design specific primers for each transcript; SNP primers
Web Site: http://genepipe.ngc.sinica.edu.tw/primerz/beginDesign.do

SNPs, copy number variation and InDels
http://www4a.biotec.or.th/rexprimer2/Genotyping
http://www4a.biotec.or.th/rexprimer2/OligoChecking

Primer Design Tools for Degenerate PCR: CODEHOP
Name: CODEHOP (COnsensus-DEgenerate Hybrid Oligonucleotide Primer) PCR primer design
Type: Web-based software
Key Functions: Design degenerate PCR primers based on multiple protein sequence alignments
Publication Info: Nucleic Acids Research 2003
Times Cited: 37
Pros: Widely cited with many successful applications; settings for genetic code and codon usage
Cons: Requires a local multiple alignment as input, which must be in Blocks Database format
Note: In OBRC
YiBu's Rating: 4 out of 5
Web Site: http://blocks.fhcrc.org/codehop.html
More Info: http://www.hsls.pitt.edu/guides/genetics/obrc/dna/pcr_oligos/URL1118954832/info

Cross hybridization and specificity of primers
http://www.ncbi.nlm.nih.gov/tools/primer-blast/

Resources for PCR Primer Specificity Analysis: NCBI BLAST

Primer specificity and Mapping: The UCSC In-Silico PCR
http://genome.csdb.cn/cgi-bin/hgPcr

PCR reaction setup calculators
http://primerdigital.com/tools/ReactionMixture.html

Public PCR Primers/Oligo Probes Repository: The NCBI Probe Database
ESR1 human
http://www.ncbi.nlm.nih.gov/probe

Resources for real time PCR: RTPrimerDB
Shows pre-calculated primers on all gene transcripts!
Web Site: http://www.rtprimerdb.org/

Web Site: http://pga.mgh.harvard.edu/primerbank/index.html
More Info: http://www.ncbi.nlm.nih.gov/sites/entrez?Db=pubmed&Cmd=ShowDetailView&TermToSearch=14654707

http://primerdepot.nci.nih.gov/

http://eu.idtdna.com/pages/scitools

Dilution Calculator
http://eu.idtdna.com/calc/dilution/
Takes an oligo stock solution of higher concentration and determines the volume needed to dilute it down to the final (desired) lower concentration. Entering the volumes of the stock solution (Start Volume) and the diluted solution (End Volume) is not required, but is recommended.

Exome Analysis
Identify genetic disease causes: sequence the human coding regions (1-2% of the human genome, ~30 Mb) of patients and healthy controls to find the genomic cause of diseases.
http://www.frontiersin.org/Journal/10.3389/fendo.2011.00008/full
http://gtbinf.wordpress.com/2012/11/29/exome-sequence-analysis-group-1/
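The Dilution Calculator described above amounts to the standard relation C1 x V1 = C2 x V2. A minimal sketch of that arithmetic follows; the function name, the units and the example values are illustrative and are not taken from the IDT tool itself.

```python
def stock_volume_needed(stock_conc: float, final_conc: float, final_volume: float) -> float:
    """Volume of stock to use so that C1 * V1 = C2 * V2.

    Concentrations must share one unit (e.g. uM) and the result is in the
    same unit as final_volume (e.g. uL).
    """
    if stock_conc <= 0 or final_conc <= 0 or final_volume <= 0:
        raise ValueError("concentrations and volume must be positive")
    if final_conc > stock_conc:
        raise ValueError("cannot dilute to a concentration above the stock")
    return final_conc * final_volume / stock_conc


if __name__ == "__main__":
    # Hypothetical example: 100 uM primer stock diluted to 10 uM in 50 uL
    v_stock = stock_volume_needed(stock_conc=100.0, final_conc=10.0, final_volume=50.0)
    print(f"Use {v_stock:.1f} uL stock + {50.0 - v_stock:.1f} uL diluent")
```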
Back-translation (protein => nucleotide): http://www.ebi.ac.uk/Tools/st/emboss_backtranseq/
>A8KAF4_HUMAN A8KAF4 Estrogen receptor OS=Homo sapiens PE=2 SV=1
ATGACCATGACCCTGCACACCAAGGCCAGCGGCATGGCCCTGCTGCACCAGATCCAGGGC
AACGAGCTGGAGCCCCTGAACAGGCCCCAGCTGAAGATCCCCCTGGAGAGGCCCCTGGGC
GAGGTGTACCTGGACAGCAGCAAGCCCGCCGTGTACAACTACCCCGAGGGCGCCGCCTAC
GAGTTCAACGCCGCCGCCGCCGCCAACGCCCAGGTGTACGGCCAGACCGGCCTGCCCTAC
GGCCCCGGCAGCGAGGCCGCCGCCTTCGGCAGCAACGGCCTGGGCGGCTTCCCCCCCCTG
AACAGCGTGAGCCCCAGCCCCCTGATGCTGCTGCACCCCCCCCCCCAGCTGAGCCCCTTC
CTGCAGCCCCACGGCCAGCAGGTGCCCTACTACCTGGAGAACGAGCCCAGCGGCTACACC
GTGAGGGAGGCCGGCCCCCCCGCCTTCTACAGGCCCAACAGCGACAACAGGAGGCAGGGC
GGCAGGGAGAGGCTGGCCAGCACCAACGACAAGGGCAGCATGGCCATGGAGAGCGCCAAG
GAGACCAGGTACTGCGCCGTGTGCAACGACTACGCCAGCGGCTACCACTACGGCGTGTGG
AGCTGCGAGGGCTGCAAGGCCTTCTTCAAGAGGAGCATCCAGGGCCACAACGACTACATG
TGCCCCGCCACCAACCAGTGCACCATCGACAAGAACAGGAGGAAGAGCTGCCAGGCCTGC
AGGCTGAGGAAGTGCTACGAGGTGGGCATGATGAAGGGCATCAGGAAGGACAGGAGGGGC
GGCAGGATGCTGAAGCACAAGAGGCAGAGGGACGACGGCGAGGGCAGGGGCGAGGTGGGC
AGCGCCGGCGACATGAGGGCCGCCAACCTGTGGCCCAGCCCCCTGATGATCAAGAGGAGC
AAGAAGAACAGCCTGGCCCTGAGCCTGACCGCCGACCAGATGGTGAGCGCCCTGCTGGAC
GCCGAGCCCCCCATCCTGTACCCCGAGTACGACCCCACCAGGCCCTTCAGCGAGGCCAGC
ATGATGGGCCTGCTGACCAACCTGGCCGACAGGGAGCTGGTGCACATGATCAACTGGGCC
AAGAGGGTGCCCGGCTTCGTGGACCTGACCCTGCACGACCAGGTGCACCTGCTGGAGTGC
GCCTGGCTGGAGATCCTGATGATCGGCCTGGTGTGGAGGAGCATGGAGCACCCCGGCAAG
CTGCTGTTCGCCCCCAACCTGCTGCTGGACAGGAACCAGGGCAAGTGCGTGGAGGGCATG
GTGGAGATCTTCGACATGCTGCTGGCCACCAGCAGCAGGTTCAGGATGATGAACCTGCAG
GGCGAGGAGTTCGTGTGCCTGAAGAGCATCATCCTGCTGAACAGCGGCGTGTACACCTTC
CTGAGCAGCACCCTGAAGAGCCTGGAGGAGAAGGACCACATCCACAGGGTGCTGGACAAG
ATCACCGACACCCTGATCCACCTGATGGCCAAGGCCGGCCTGACCCTGCAGCAGCAGCAC
CAGAGGCTGGCCCAGCTGCTGCTGATCCTGAGCCACATCAGGCACATGAGCAACAAGGGC
ATGGAGCACCTGTACAGCATGAAGTGCAAGAACGTGGTGCCCCTGTACGACCTGCTGCTG
GAGATGCTGGACGCCCACAGGCTGCACGCCCCCACCAGCAGGGGCGGCGCCAGCGTGGAG
GAGACCGACCAGAGCCACCTGGCCACCGCCGGCAGCACCAGCAGCCACAGCCTGCAGAAG
TACTACATCACCGGCGAGGCCGAGGGCTTCCCCGCCACCGTG

6 frames translation
http://www.ebi.ac.uk/Tools/st/emboss_transeq/

Resources for PCR Primer Mapping/Amplicon Size
Format Conversion tools:
• Reverse and/or Complement of DNA sequences (http://www.bioinformatics.org/sms2/rev_comp.html)
• Split FASTA: divides FASTA sequence records into smaller FASTA sequences of the size you specify (http://www.bioinformatics.org/sms2/split_fasta.html)
Sequence Analysis:
• DNA Pattern Find: accepts one or more sequences along with a search pattern and returns the number and positions of sites that match the pattern (http://www.bioinformatics.org/sms2/dna_pattern.html)
• PCR Primer Stats: accepts a list of PCR primer sequences and returns a report describing the properties of each primer, including melting temperature, percent GC content, and PCR suitability (http://www.bioinformatics.org/sms2/pcr_primer_stats.html)
• PCR Products: accepts one or more DNA sequence templates and two primer sequences. The program searches for perfectly matching primer annealing sites that can generate a PCR product. Any resulting products are sorted by size, and they are given a title specifying their length, their position in the original sequence, and the primers that produced them (http://www.bioinformatics.org/sms2/pcr_products.html)
• Reverse Translate (http://www.bioinformatics.org/sms2/rev_trans.html)
• Translate (http://www.bioinformatics.org/sms2/translate.html)
• Primer Map: accepts a DNA sequence and returns a textual map showing the annealing positions of PCR primers (http://www.bioinformatics.org/sms2/primer_map.html)
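The reverse-complement and translation tools listed above perform operations that are easy to reproduce locally for a quick sanity check. The sketch below is an illustration using the standard genetic code, not the implementation used by EMBOSS or the Sequence Manipulation Suite. The function names and the frame labels (+1 to +3, -1 to -3) are assumptions, and other tools may number the reverse frames differently.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate from the first base; unknown codons become 'X', stops '*'."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frames(dna: str) -> dict:
    """Translate three forward frames and three reverse-complement frames."""
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


if __name__ == "__main__":
    # First 24 bases of the back-translated A8KAF4_HUMAN sequence above
    fragment = "ATGACCATGACCCTGCACACCAAG"
    print(reverse_complement(fragment))
    for name, protein in sorted(six_frames(fragment).items()):
        print(name, protein)
```

Frame +1 of the back-translated sequence should give back the protein it was derived from, which is a convenient check that back-translation and translation are consistent.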
http://www.bioinformatics.org/sms2/index.html

Comparing gene-lists
BioVenn: http://www.cmbi.ru.nl/cdd/biovenn/
Venny: http://bioinfogp.cnb.csic.es/tools/venny/
[Figure: Venn diagram output for gene lists x, y and z. x total 127, x only 62; y total 628, y only 566; z total 0, z only 0. The x-y, x-z and y-z overlap counts are missing from the extracted text.]

Microarray and Next Generation Sequencing Technologies

Microarray Experiments
Probes for genes are located on the chip. Hybridization of mRNA to the probes on the chip is performed and the results are recorded.

Next Generation Sequencing
• Anchor a single DNA molecule to a solid surface
• Amplify the template by in situ PCR
• Add 4 color-labeled reversible terminators, polymerase and universal primer
• Remove unincorporated nucleotides
• Detect with laser
• Reverse termination and repeat 1…100 times; the number of cycles determines the length of the sequence
Next generation sequencing bypasses the rate-limiting step of conventional DNA sequencing (separating randomly terminated DNA polymers by gel electrophoresis) by physically arraying DNA molecules on solid surfaces and determining the DNA sequence in situ, without the need for gel separation. Various platforms!
http://molonc.bccrc.ca/?page_id=191

In both technologies, the great advantage is achieved by novel biotechnologies for producing high throughput data! However, both have pros and cons…
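Returning to the gene-list comparison above (BioVenn, Venny): the counts those tools report (total, unique to each list, overlap) are plain set operations. The sketch below uses short hypothetical gene lists, not the 127- and 628-gene lists from the slide. The function name and the case-insensitive matching are assumptions about how one might want symbols compared.

```python
def compare_gene_lists(x, y) -> dict:
    """Compare two gene lists, ignoring case, surrounding spaces and duplicates."""
    x_set = {gene.strip().upper() for gene in x if gene.strip()}
    y_set = {gene.strip().upper() for gene in y if gene.strip()}
    return {
        "x_total": len(x_set),
        "y_total": len(y_set),
        "x_only": sorted(x_set - y_set),
        "y_only": sorted(y_set - x_set),
        "overlap": sorted(x_set & y_set),
    }


if __name__ == "__main__":
    # Hypothetical lists for illustration only
    list_x = ["ESR1", "GAPDH", "esr1 ", "ACTB"]
    list_y = ["GAPDH", "RPLP0"]
    result = compare_gene_lists(list_x, list_y)
    for key, value in result.items():
        print(key, value)
```

With unique identifiers, x total equals x only plus the x-y overlap, which is a quick consistency check on any Venn output.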
Arrays
Pros:
• Relatively cheap
• Mature biotechnology and analysis tools (since the late 90's)
• Fixed probes, no heterogeneity of coverage
• Highly reproducible
Cons:
• Detection of only known transcripts
• Limited to sequenced organisms, no de-novo
• Higher background
• Low expressed genes are less accurately detected

NGS
Pros:
• Very sensitive if sufficient sequence depth
• Direct read-out of all transcripts
• Paired-end reads, better accuracy
• De-novo sequencing, new genomes
• Highly reproducible
• New and exciting
Cons:
• Still expensive
• Technical bias in mRNA library preparation and in transcripts of different length
• Immature bioinformatics tools
• De-novo analysis is tricky, ambiguity in mapping reads to the genome
• Very high coverage is needed for low expressed genes
• Variable sequence coverage for different genomic regions

In both, consistent biological interpretation!

Consistent Biological Interpretation?
Marioni J C et al. Genome Res. 2008;18:1509-1517. Copyright © 2008, Cold Spring Harbor Laboratory Press
http://cage.unl.edu/RNASEQ_Transcriptomics.pdf

NGS is becoming the technology of choice for a wide range of applications, but the transition away from microarrays is still a long one. Different applications have different requirements, so researchers need to weigh their options carefully when choosing a platform.
http://www.genengnews.com/gen-articles/next-generation-sequencing-vs-microarrays/4689/

TAU Bioinformatics unit: who are we and what do we do?
Dr. Metsada Pasmanik-Chor, Bioinformatics Unit, Tel Aviv University
http://www.tau.ac.il/lifesci/bioinformatics.html
metsada@post.tau.ac.il
Tel: 03-6406992