Real-Time Primer Design for DNA Chips
Annie Hui
CMSC 838 Presentation

Use of primers in PCR and Microarrays
PCR (polymerase chain reaction): to amplify a particular DNA fragment
Use: to test for the presence of nucleotide sequences
Test of PCR products:
• Ladder: a mixture of fragments of known length
• Lane 1: the PCR fragment is ~1850 bases long.
• Lanes 2 and 4: the fragments are ~800 bases long.
• Lane 3: no product is formed, so the PCR failed.
• Lane 5: multiple bands are formed because one of the primers fits at different places.

CMSC 838T – Presentation

Use of primers in PCR and Microarrays
DNA chips (microarrays): to analyse a large number of genes in parallel.
[Figure: microarray diagram; labels: fluorescence, bound to primer, fixed on chip]
Primers:
• 20 to 100 bases long
• Synthetically manufactured
Automated design of primers:
• A computational approach
• Objective: to find primers that bind well without self-hybridizing
• Critique: how accurate?

Motivation
This group uses the automated NucliSens extraction system (bioMerieux) to develop their primers here.

Technique: The computational model
1. Select primers from the target sequence
• Two primers, P (forward) and Q (reverse), for PCR; one primer for a DNA chip (microarray)
• Using window size W, the number of possible primers with length l between m and n within one window is:
  $S = \sum_{l=m}^{n} (W - l + 1)$

Technique: The computational model
2. For each primer pair, or single primer, quantify 4 hybridization conditions:
a. Primer length
b. Melting temperature
c. GC content
d. Secondary structure
   i. Self annealing
   ii. Self end annealing
   iii. Pair annealing
   iv. Pair end annealing
(Slide callout: We are starting here.)

Technique: quantifying hybridization conditions
a. Primer length len(P)
• Affects melting temperature and hybridization
b. Melting temperature Tm(P)
• Temperature at which the bonds between primer and gene sequence break
  $\Delta H(p) = \sum_{i=1}^{n-1} \Delta H(p_i, p_{i+1})$
  $\Delta S(p) = \sum_{i=1}^{n-1} \Delta S(p_i, p_{i+1})$
  $T_m(p) = \dfrac{\Delta H(p)}{\Delta S(p) + R \ln(\gamma / 4)} + T_0 + t$
  where p is the primer, $\Delta H(p)$ its enthalpy, $\Delta S(p)$ its entropy,
  $R = 1.987\ \mathrm{cal}/(^{\circ}\mathrm{C} \cdot \mathrm{mol})$, $\gamma = 50 \times 10^{-9}$, $T_0 = -273.15\,^{\circ}\mathrm{C}$, $t = -21.6\,^{\circ}\mathrm{C}$
c. GC content GC(P)
• G-C pairs are more stable than A-T pairs (because of more H-bonds)
  $GC(p) = 100 \cdot \dfrac{\#G \text{ in } p + \#C \text{ in } p}{|p|}$
• What is this measure good for?

Technique: quantifying hybridization conditions
d. Secondary structure
• Study how likely a primer is to entangle with itself or with another primer
• P = {p1, p2, …, pn}, Q = {q1, q2, …, qm}
• Scoring function S(pi, qj):
  = 2 if {pi, qj} = {A, T}
  = 4 if {pi, qj} = {C, G}
  = 0 otherwise
• Example, with position i of primer P aligned to q1:
  P: ...AGCTTTAGCCATAG
  Q:        TCTTAGGATCGC...
  score for this alignment = 2+4+2+2+4 = 14

Technique: quantifying hybridization conditions
Four measures of secondary structure:
i. Self annealing, SA(P, P')
• P' = reverse of P
  $SA(p, p') = \max_{k = 1-m, \ldots, m-1} \sum_{i=1}^{m} s(p_i, p'_{i+k})$
[Figure: P' shifted along P, one position per value of k]
ii. Self end annealing, SEA(P, P')
• Like self annealing
• k >= 0
• Only count the longest continuous overlaps
iii. Pair annealing, PA(P, Q)
• P and Q are the forward and reverse primers
iv. Pair end annealing, PEA(P, Q)
• Similar to self end annealing

Technique: How to apply the model
For PCR:
  $SC_{PCR}(p, q) = [\,len(p),\ GC(p),\ T_m(p),\ SA(p),\ SEA(p),\ len(q),\ GC(q),\ T_m(q),\ SA(q),\ SEA(q),\ PA(p, q),\ PEA(p, q)\,]$
P is the forward primer, Q is the reverse primer.
Ideally there is no annealing, and the length, GC content and melting temperature of P equal those of Q:
  $SC_{PCRideal}(p) = [\,len_p,\ GC_p,\ T_{m,p},\ 0,\ 0,\ len_p,\ GC_p,\ T_{m,p},\ 0,\ 0,\ 0,\ 0\,]$
  $w = [\,0.5,\ 1,\ 1,\ 0.1,\ 0.2,\ 0.5,\ 1,\ 1,\ 0.1,\ 0.2,\ 0.1,\ 0.2\,]$
The optimization is:
  $\min_{p} l_{PCR}(p)$, where $l_{PCR}(p) = |SC_{PCR}(p, q) - SC_{PCRideal}(p)| \cdot w^T$
For DNA chips (microarrays): Q doesn't exist, so there is no pair annealing to study. Only 5 terms are left.
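The scoring function S, the GC content measure, self annealing and the weighted distance described above can be sketched in Python as follows. This is a minimal illustration, not the hardware design from the paper: the function names (`pair_score`, `overlap_score`, `annealing`, `self_annealing`, `gc_content`, `weighted_distance`) are hypothetical, the shift range is assumed to cover every overlap of the two sequences, and the distance is assumed to use the element-wise absolute difference.

```python
from typing import Sequence


def pair_score(a: str, b: str) -> int:
    """Scoring function S from the slides: 2 for A/T, 4 for C/G, 0 otherwise."""
    pair = {a, b}
    if pair == {"A", "T"}:
        return 2
    if pair == {"C", "G"}:
        return 4
    return 0


def overlap_score(p: str, q: str, k: int) -> int:
    """Sum of S(p[i], q[i + k]) over all i where both indices are valid."""
    total = 0
    for i in range(len(p)):
        j = i + k
        if 0 <= j < len(q):
            total += pair_score(p[i], q[j])
    return total


def annealing(p: str, q: str) -> int:
    """Best overlap score over every shift k (assumed range: all overlaps)."""
    shifts = range(1 - len(p), len(q))
    return max((overlap_score(p, q, k) for k in shifts), default=0)


def self_annealing(p: str) -> int:
    """SA(P, P'): anneal P against its own reverse P'."""
    return annealing(p, p[::-1])


def gc_content(p: str) -> float:
    """GC(p) = 100 * (#G + #C) / |p|."""
    return 100.0 * sum(base in "GC" for base in p) / len(p)


def weighted_distance(sc: Sequence[float], ideal: Sequence[float],
                      w: Sequence[float]) -> float:
    """Weighted linear sum |sc - ideal| . w^T (absolute value is an assumption)."""
    return sum(abs(s - t) * wi for s, t, wi in zip(sc, ideal, w))
```

With the example from the slides, `overlap_score("AGCTTTAGCCATAG", "TCTTAGGATCGC", -4)` aligns the fifth base of P with q1 and reproduces the score 2+4+2+2+4 = 14.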

Technique: parallelize the SCPCR(p, q) calculation
• Compute PA and PEA in parallel
• Calculate Len, GC, Temp, SA and SEA in parallel

Technique: details
Melting temperature and GC content:
• Simple adder + divider
• Use pipelining
• 1st one: O(m)
• Subsequent cost: O(1)
Example:
  Whole window: AGCGATATA
  i-th P primer: GCGATA
  (i+1)-th P primer: CGATAT
• GC(P_{i+1}) = GC(P_i) - 1
• H(P_{i+1}) = H(P_i) - H(GC) + H(AT)
• Similar for S
Annealing matrix:
[Figure: annealing matrix for bases (a, b, c) against (d, e, f), with cells ad, ae, af, bd, be, bf, cd, ce, cf]

Complexity
Complexity of the sequential algorithm, for PCR:
• Number of choices of P (window size W_p): $S = \sum_{l=m_p}^{n_p} (W_p - l + 1)$
• Number of choices of Q (window size W_q): $T = \sum_{l=m_q}^{n_q} (W_q - l + 1)$
• Each distance SCPCR(P, Q): $O(l_p^2 + l_q^2 + l_p l_q)$
• Total: $O(S \cdot T \cdot (W_p^2 + W_q^2 + W_p W_q))$
Complexity of the parallel algorithm, for PCR:
• Distance measure SCPCR(P, Q): O(1)
• Total: O(S*T) (the O(S*S*T*T) given in the paper is a typo)
Similar but simpler for microarrays.

Evaluation
Experimental environment: 512 primer pairs, |Wp| = |Wq| = 16
1. 500 MHz Celeron system with integrated hardware accelerator
2. Software implementation
Evaluation results:
• 1920 secs for the software implementation
• 3.41 secs using the hardware accelerator (about 563 times faster)

Related Work
Previous approach: DOPRIMER
• Same computational model
• Differs in the way dynamic programming is done
• Sequential in nature
Other primer selection software, e.g. Primer Premier 5, Primer3, PrimerGen, PrimerDesign
• Similarities: the criteria used are length, temperature range, GC range, GC clamp, 3' end stability, uniqueness of the 3' end base, dimers/hairpins, degeneracy, salt concentration, annealing oligo concentration, etc.
• Differences: they do not use a weighted linear sum of all criteria, and they need much expert supervision; the numerical criteria are used as a guide only

More Related Work
Case study: Burpo did a critical review of PCR primer design algorithms
• Subject: Saccharomyces cerevisiae deletion strains
• Conclusion: no program is suitable for the task of post-design PCR analysis, especially for accurately predicting non-specific hybridization events that impair PCR amplification

Observations
My observations:
• Minus side: Is the computational model too simplistic? Specifically, is a weighted linear sum justified?
• Plus side: The design of the parallel architecture is neat. Since primers are about 18-22 bases long, current technology can certainly handle it.
When would you need fast primer selection?
• Primer walking, to connect contigs together quickly
• Scanning a large number of sequences for possible primers
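
The candidate counts S and T from the Complexity slide, and the O(1) incremental GC update from the "Technique: details" slide, can be sketched in Python as follows. This is an illustrative software analogue under stated assumptions, not the pipelined hardware from the paper: the function names `count_primers` and `sliding_gc_counts` are hypothetical, and the update tracks the raw count of G and C bases rather than the percentage.

```python
from typing import List


def count_primers(window: int, min_len: int, max_len: int) -> int:
    """Number of candidate primers in one window: sum over l of (W - l + 1)."""
    return sum(window - l + 1
               for l in range(min_len, max_len + 1)
               if l <= window)


def sliding_gc_counts(window_seq: str, length: int) -> List[int]:
    """G+C count of every primer of the given length inside the window.

    The first primer costs O(length); each later one is an O(1) update:
    add the base that enters on the right, subtract the one that leaves
    on the left.
    """
    if length <= 0 or length > len(window_seq):
        return []
    is_gc = [base in "GC" for base in window_seq]
    count = sum(is_gc[:length])
    counts = [count]
    for start in range(1, len(window_seq) - length + 1):
        count += is_gc[start + length - 1] - is_gc[start - 1]
        counts.append(count)
    return counts
```

For the window AGCGATATA and primers of length 6, the second and third entries of `sliding_gc_counts("AGCGATATA", 6)` correspond to GCGATA and CGATAT, whose counts differ by one as on the slide.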