in Silico Primer Design Presentation

In Silico Primer Design and Simulation for Targeted High-Throughput Sequencing
I519 – FALL 2010
Adam Thomas,
Kanishka Jain,
Tulip Nandu
BACKGROUND
• Major milestones
  • Molecular structure of DNA
  • Human Genome Project
  • High-Throughput Sequencing (HTS)
• HTS extended common experiments from single genes to entire genomes
  • Low cost
  • Multiple samples in every run (e.g. a 454 sequencer can sequence 400-600 Mb)
BACKGROUND
• A primer is a short strand of nucleotides that serves as the starting point of DNA synthesis.
• Approximately 20-25 nucleotides long.
• Determines which region of the DNA strand is amplified.
• Complementary to the target DNA strand.
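To illustrate the complementarity noted above, here is a minimal Python sketch that computes the reverse complement of a template stretch, which is the sequence a primer annealing to that stretch would have. The helper name and the example sequence are hypothetical and are not part of the presented pipeline.

```python
# Hypothetical helper for illustration only; not from the presented project.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence (A<->T, C<->G)."""
    return sequence.translate(COMPLEMENT)[::-1]


# Sequence (written 5'->3') of a primer that anneals to the template ATGCGT.
print(reverse_complement("ATGCGT"))
```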
PCR
• Polymerase Chain Reaction
• Technique to amplify a small region of DNA
• 3-step process:
  • Denaturation
  • Annealing
  • Extension
• Process repeated for approximately 30 to 40 cycles.
PCR
• Denaturation
  Heat (approx. 90°C) separates the double strand into two single strands.
PCR
• Annealing
  Primers bind to the individual strands (occurs at 45 to 60°C).
PCR
• Extension
  The temperature is raised to 72°C and the Taq DNA polymerase enzyme replicates the DNA strands.
PCR
• End of first cycle
  Process repeated for approximately 30 to 40 cycles.
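The effect of repeating the cycle can be sketched with simple arithmetic. The Python snippet below assumes an idealized reaction in which every cycle multiplies the template count by (1 + efficiency); the function name and the efficiency parameter are illustrative assumptions, and real reactions plateau well before this ideal.

```python
def ideal_copy_number(initial_copies, cycles, efficiency=1.0):
    """Copies after `cycles` rounds of an idealized PCR.

    Assumes each cycle multiplies the template count by (1 + efficiency);
    efficiency=1.0 means perfect doubling in every cycle.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    return initial_copies * (1.0 + efficiency) ** cycles


# One template molecule after 30 ideal cycles (2**30, on the order of 10**9).
print(ideal_copy_number(1, 30))
```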
CURRENT PROCESS
• Primer3 is used to design primers for PCR.
• The primers then need to be validated. Validation is performed by simulation, alignment and re-assembly.
• MetaSim is used to simulate sequencing of the expected PCR amplicons.
• CAP3 is used for re-assembly of the simulated sequences.
• BLASTing the simulated sequences against the original sequence gives a fairly accurate measure of how well the primers will perform.
ISSUES FACED WITH CURRENT PROCESS
• Each tool uses different input and output file formats.
• Users have to convert file formats manually to use each tool.
• No existing tool integrates all of these functions and provides high-throughput analysis.
GOAL
Integrate the whole process involved in a high-throughput sequencing experiment and keep track of the parameters that are entered or changed.
OBJECTIVES
• A way to visualize the primers and amplicons in relation to the genome, edit the primers manually, and see how that affects the simulation.
• Optimization of the high-throughput process by minimizing the number of reads needed by the ‘454 process’ while still being able to assemble the sequence.
• Validation of the simulated amplicon reads to check that the predicted simulation is in order, and to rectify any problems.
PROPOSED SOLUTION
VISUALIZATION TOOL
• GBrowse
  • Popular and open source.
  • Well-defined plugin architecture.
  • Plugin to design primers using Primer3 already available.
PRIMER DESIGN
• The PrimerDesign.pm plugin already exists for GBrowse. It designs primers using Primer3.
• Designed to amplify only one specific region of DNA, with as few primers as possible and no overlapping amplicons.
• Tweaked to take two additional input parameters: Amplicon Overlap and Max Amplicon Length.
• Once primers are created using GBrowse, they are output in Feature File Format (FFF).
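To show how the two added parameters interact, the following Python sketch tiles a target region into amplicon windows of at most Max Amplicon Length, with consecutive windows sharing Amplicon Overlap bases. This is only the coordinate arithmetic under assumed 1-based inclusive coordinates; it is not the plugin's code, and actual primer placement is left to Primer3. The function and argument names are hypothetical.

```python
def tile_amplicons(region_start, region_end, max_amplicon_length, overlap):
    """Split [region_start, region_end] (1-based, inclusive) into windows.

    Each window is at most `max_amplicon_length` long and shares `overlap`
    bases with the next one. Hypothetical helper, not the plugin's logic.
    """
    if region_start > region_end:
        raise ValueError("region_start must not exceed region_end")
    if overlap < 0 or overlap >= max_amplicon_length:
        raise ValueError("overlap must be >= 0 and < max_amplicon_length")

    step = max_amplicon_length - overlap
    windows = []
    start = region_start
    while True:
        end = min(start + max_amplicon_length - 1, region_end)
        windows.append((start, end))
        if end == region_end:
            break
        start += step
    return windows


for window in tile_amplicons(1, 1000, max_amplicon_length=400, overlap=50):
    print(window)
```

Setting the overlap to 0 reproduces the original behaviour of adjacent, non-overlapping amplicons.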
PRIMER VALIDATION SIMULATION
• Simulation performed using MetaSim.
• MetaSim:
  • Generates sets of synthetic reads or mate-pairs based on adaptable sequencing error models (e.g. for Sanger chemistry, Roche's 454 and Illumina (formerly Solexa)).
  • Can be controlled via a graphical user interface or in command-line mode.
SIMULATION
• Function written in Perl to invoke MetaSim using its command-line option.
• Algorithm:
  • Read the FFF file and extract the primer coordinates.
  • Extract the corresponding sequence from the original sequence.
  • Run the MetaSim simulation using command-line options.
  • Each sequence generates its own FASTA file.
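The first two steps of the algorithm above can be sketched as follows. This is a Python illustration, not the Perl function from the project. It assumes a simplified feature layout of tab-separated `type`, `name` and `start..end` columns with 1-based inclusive coordinates; the real plugin output may differ. All function names are hypothetical, and the MetaSim call is left out because its command-line options are not given in the slides.

```python
import re

# Assumed coordinate token, e.g. "120..480" (1-based, inclusive).
COORDINATES = re.compile(r"(\d+)\.\.(\d+)")


def read_amplicon_features(fff_text):
    """Return (name, start, end) for each feature line in the assumed layout."""
    features = []
    for line in fff_text.splitlines():
        fields = line.strip().split("\t")
        if len(fields) < 3:
            continue  # skips blank lines and header lines such as "reference = ..."
        match = COORDINATES.search(fields[2])
        if match:
            features.append((fields[1], int(match.group(1)), int(match.group(2))))
    return features


def extract_amplicon(sequence, start, end):
    """Slice the original sequence using 1-based inclusive coordinates."""
    return sequence[start - 1:end]


def amplicon_fasta_records(features, sequence):
    """Build one FASTA record per amplicon, keyed by feature name."""
    return {
        name: f">{name} {start}..{end}\n{extract_amplicon(sequence, start, end)}\n"
        for name, start, end in features
    }
```

Each record would then be written to its own file and passed to MetaSim, matching the one-FASTA-per-sequence step.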
ASSEMBLY
• Perl function written to invoke CAP3 using its command-line interface.
• Each file generated by the MetaSim simulation is input into CAP3, which then assembles the contigs.
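A minimal Python sketch of such a wrapper is shown below; it is not the Perl function from the project. It assumes the `cap3` executable is on the PATH and relies on CAP3's convention of writing contigs next to the input as `<input>.cap.contigs`. The function names are hypothetical.

```python
import subprocess
from pathlib import Path


def cap3_command(fasta_path, executable="cap3"):
    """Build the argument list for a plain CAP3 run on one FASTA file."""
    return [executable, str(fasta_path)]


def cap3_contigs_path(fasta_path):
    """Path where CAP3 writes the assembled contigs for this input."""
    return Path(str(fasta_path) + ".cap.contigs")


def run_cap3(fasta_path, executable="cap3"):
    """Run CAP3 and return (contigs file path, CAP3's report on stdout).

    Raises subprocess.CalledProcessError if CAP3 exits with an error.
    """
    result = subprocess.run(
        cap3_command(fasta_path, executable),
        capture_output=True,
        text=True,
        check=True,
    )
    return cap3_contigs_path(fasta_path), result.stdout
```

Calling `run_cap3` once per simulated FASTA file mirrors the per-file assembly described above.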
ASSEMBLY
• CAP3
  • Input: simulated sequences as a FASTA file.
  • CAP3 is a sequence assembly program that assembles a set of short reads into contigs.
  • Takes as input a file of sequence reads in FASTA format.
  • If a header contains a dot (‘.’), CAP3 requires that the names of reads sequenced from the same subclone contain the same substring up to the first dot.
  • Can be invoked using a command-line interface.
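The naming rule above can be checked before assembly with a short sketch like the one below, which groups read names by the substring before the first dot. This is a hypothetical Python helper for illustration; the read names in the usage line are made up.

```python
from collections import defaultdict


def group_reads_by_subclone(read_names):
    """Group read names by the substring that precedes the first dot.

    Names without a dot form their own single-name group.
    """
    groups = defaultdict(list)
    for name in read_names:
        prefix = name.split(".", 1)[0]
        groups[prefix].append(name)
    return dict(groups)


# Made-up read names: the two amp1 reads share the prefix "amp1".
print(group_reads_by_subclone(["amp1.f", "amp1.r", "amp2.f"]))
```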
BLAST
• Assembled contigs are then BLASTed against the original sequence for validation.
• GBrowse accepts the assembled sequence and BLASTs it against the original sequence.
• This plugin requires 4 steps:
  • Exporting the assembled contigs and original sequence from GBrowse.
  • Creating a BLAST database.
  • BLASTing the contigs against the sequence.
  • Importing the result back into GBrowse.
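The two middle steps can be sketched as command lines plus a small result parser. The slides do not say which BLAST distribution was used; this Python sketch assumes NCBI BLAST+ (`makeblastdb`, `blastn`) and its tabular output (`-outfmt 6`). The function names and file paths are hypothetical, and the export from and import into GBrowse are not covered.

```python
# Column order of BLAST+ tabular output (-outfmt 6) with default fields.
FIELDS = (
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
)
INT_FIELDS = ("length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send")
FLOAT_FIELDS = ("pident", "evalue", "bitscore")


def makeblastdb_command(reference_fasta, db_name):
    """Argument list that builds a nucleotide database from the original sequence."""
    return ["makeblastdb", "-in", str(reference_fasta), "-dbtype", "nucl",
            "-out", str(db_name)]


def blastn_command(contigs_fasta, db_name, out_path):
    """Argument list that aligns the assembled contigs against that database."""
    return ["blastn", "-query", str(contigs_fasta), "-db", str(db_name),
            "-outfmt", "6", "-out", str(out_path)]


def parse_tabular_hits(text):
    """Parse tabular BLAST output into a list of dicts with typed values."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        hit = dict(zip(FIELDS, line.split("\t")))
        for key in INT_FIELDS:
            hit[key] = int(hit[key])
        for key in FLOAT_FIELDS:
            hit[key] = float(hit[key])
        hits.append(hit)
    return hits
```

The command lists can be passed to `subprocess.run`, and the parsed `sstart` and `send` values give the position of each contig on the original sequence for import back into the browser.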
DEMO
QUESTIONS