High-Throughput Mapping Pipeline
of EST-Based Sequences on the Lettuce Genome:
Genetic Map Validation, Visualization and Web Presentation
Alexander Kozik, Steve Edberg, Barnaly Pande, David Caldwell,
David Lee, Travis Kleeburg, Fallon Chen, Richard Michelmore
Genome Center, University of California, Davis, CA 95616
http://cgpdb.ucdavis.edu/

Marker (Contig/EST) selection
(gene of interest)
Identification of putative
polymorphism and oligo design
Efficient genetic mapping of large numbers of markers requires coordinating the efforts of
several people, each of whom is responsible for a different part of the operation. We have developed
a pipeline to map new markers on high-density genetic maps. The major goals and features of this
pipeline are: A) minimizing the number of steps between generating experimental data and entering
those data into the database; B) automatic error checking and error handling of spreadsheets that
contain marker descriptions and raw scores, prior to uploading into the database; C) simplified and
controlled data flow to and from the database for mapping procedures. The pipeline includes:
i) Python Contig Viewer, for semiautomatic search of EST candidates with putative polymorphisms and
automatic design of oligonucleotide primers for selected sequences relative to potential intron
positions; ii) scripts that provide controlled data flow from spreadsheets into our relational MySQL
database; iii) a web interface (Dendrogram Viewer) to manipulate data in the database and pre-select
sets of markers for further mapping while maintaining linkage group designations from earlier maps;
iv) visualization tools (CheckMatrix) to perform quality control and validate maps; v) a web interface
to display mapping results graphically. Our database (http://cgpdb.ucdavis.edu/) provides the
functionality to compare several genetic maps simultaneously, together with the raw data used to
construct them. These scripts are publicly available.
MAPPING USING JOINMAP AND CUSTOM PROGRAMS
Python ContigViewer
http://cgpdb.ucdavis.edu/SNP_Discovery/Py_ContigViewer/
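The polymorphism search that Python ContigViewer supports can be illustrated with a minimal sketch. It assumes the ESTs of a contig are already aligned to a common coordinate system and that each read is tagged with the genotype (for example, a parental line) it came from; the function name `candidate_snps` and the `min_depth` parameter are hypothetical and not part of ContigViewer. A column is reported when every genotype is represented by at least `min_depth` reads that agree with each other, while the genotypes disagree among themselves.

```python
BASES = {"A", "C", "G", "T"}


def candidate_snps(reads, min_depth=2):
    """Report alignment columns that look like inter-genotype SNPs.

    reads: list of (genotype_label, aligned_sequence); all sequences have the
    same length, and '-' (or any non-ACGT character) means no call.
    Returns a list of (column_index, {genotype_label: base}).
    """
    if not reads:
        return []
    length = len(reads[0][1])
    hits = []
    for col in range(length):
        # Collect the called bases in this column, grouped by genotype.
        calls = {}
        for genotype, seq in reads:
            base = seq[col].upper()
            if base in BASES:
                calls.setdefault(genotype, []).append(base)
        if len(calls) < 2:
            continue  # need at least two genotypes to compare
        consensus = {}
        for genotype, bases in calls.items():
            # Require enough reads and no disagreement within a genotype.
            if len(bases) < min_depth or len(set(bases)) != 1:
                break
            consensus[genotype] = bases[0]
        else:
            # for-else: runs only when no genotype failed the checks above
            if len(set(consensus.values())) > 1:
                hits.append((col, consensus))
    return hits


reads = [
    ("P1", "ACGTA"),
    ("P1", "ACGTA"),
    ("P2", "ACCTA"),
    ("P2", "ACCTT"),
]
print(candidate_snps(reads))  # [(2, {'P1': 'G', 'P2': 'C'})]
```

Column 4 is skipped because the two P2 reads disagree, which is the kind of within-genotype noise (sequencing error or paralog) a curator would want to inspect rather than accept. Primer design around intron positions is not covered by this sketch.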
Entering raw score data and
marker information into the database
MS Excel templates with defined fields.
Error checking using custom Python scripts.
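The template-based error checking can be sketched as follows, assuming each sheet is exported to CSV with a header row, the marker name in the first column and one score per RIL in the remaining columns. The score codes `A`, `B` and `-` (missing) and the function name `check_score_sheet` are assumptions for illustration, not the project's actual template fields.

```python
import csv
import io

# Assumed score codes: allele from parent A, allele from parent B, missing.
VALID_CODES = {"A", "B", "-"}


def check_score_sheet(text):
    """Return a list of error messages for a CSV score sheet (empty list = OK)."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ["empty sheet"]
    header, body = rows[0], rows[1:]
    n_lines = len(header) - 1  # first column holds the marker name
    errors = []
    seen = set()
    for row_no, row in enumerate(body, start=2):  # spreadsheet row numbers
        if not any(cell.strip() for cell in row):
            continue  # ignore blank rows
        name = row[0].strip()
        if not name:
            errors.append(f"row {row_no}: missing marker name")
        elif name in seen:
            errors.append(f"row {row_no}: duplicate marker {name}")
        seen.add(name)
        if len(row) - 1 != n_lines:
            errors.append(
                f"row {row_no}: expected {n_lines} scores, found {len(row) - 1}"
            )
        for col_no, code in enumerate(row[1:], start=2):
            if code.strip().upper() not in VALID_CODES:
                errors.append(f"row {row_no}, column {col_no}: invalid score {code!r}")
    return errors


sheet = "marker,RIL1,RIL2,RIL3\nM1,A,B,-\nM2,A,X,B\nM1,A,B\n"
for message in check_score_sheet(sheet):
    print(message)
```

Collecting every problem instead of stopping at the first one lets the person who filled in the template fix the whole sheet in a single pass before it is uploaded.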
Validation of polymorphism and
scoring (genotyping) using RILs
Wet lab experiment
http://cgpdb.ucdavis.edu/database/genome_viewer/viewer/
1. Pairwise comparison
(finding recombination/LOD score values for all pairs of markers)
(Tools: custom Python scripts or JoinMap)
Web interface to access
Lettuce genetic maps
http://cgpdb.ucdavis.edu/XLinkage/Genetic_Map_PyMad_Matrix.html
2. Group analysis
(assigning particular markers to specific chromosomes)
(Tools: PHP Dendrogram Viewer)
Map and linkage
group selection
(Figure: grouping at decreasing recombination value thresholds)
3. Pairwise comparison within each group
(Tools: JoinMap)
4. Mapping
(Tools: JoinMap)
5. Validation of the constructed map using
CheckMatrix (custom Python scripts)
http://cgpdb.ucdavis.edu/XLinkage/
Query submission
and data visualization
6. Entering map data into the database
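Steps 1 and 2 of the mapping procedure can be illustrated with a minimal Python sketch; it is not the JoinMap algorithm or the Dendrogram Viewer's PHP code. It assumes selfed RILs scored `A`/`B` (anything else treated as missing), converts the observed fraction of discordant lines R into a recombination fraction with the Haldane–Waddington relation for selfed RILs, r = R / (2(1 − R)), and scores linkage as a LOD against free recombination (R = 0.5). Grouping is shown as single-linkage clustering: markers are joined whenever their pairwise r is at or below a threshold, so lowering the threshold splits groups apart. All function names are hypothetical.

```python
import math

CALLED = {"A", "B"}  # assumed genotype codes; anything else counts as missing


def pairwise_ril(scores_a, scores_b):
    """Return (n, r, lod) for two markers scored on the same selfed RILs."""
    pairs = [(a, b) for a, b in zip(scores_a, scores_b)
             if a in CALLED and b in CALLED]
    n = len(pairs)
    if n == 0:
        raise ValueError("no lines scored for both markers")
    k = sum(1 for a, b in pairs if a != b)  # discordant (recombinant) lines
    big_r = k / n
    if big_r >= 0.5:
        return n, 0.5, 0.0  # no evidence of linkage; estimate capped at 0.5
    # Haldane-Waddington relation for selfed RILs, R = 2r / (1 + 2r), solved for r.
    r = big_r / (2 * (1 - big_r))

    def log_lik(count, p):
        return count * math.log10(p) if count else 0.0

    # LOD: binomial log10-likelihood at the estimate minus that at R = 0.5.
    lod = log_lik(k, big_r) + log_lik(n - k, 1 - big_r) - n * math.log10(0.5)
    return n, r, lod


def linkage_groups(markers, pair_r, threshold):
    """Single-linkage grouping: join markers whose r <= threshold (union-find)."""
    parent = {m: m for m in markers}

    def find(m):
        while parent[m] != m:
            parent[m] = parent[parent[m]]  # path halving
            m = parent[m]
        return m

    for (a, b), r in pair_r.items():
        if r <= threshold:
            parent[find(a)] = find(b)
    groups = {}
    for m in markers:
        groups.setdefault(find(m), []).append(m)
    # Largest groups first; ties broken alphabetically.
    return sorted((sorted(g) for g in groups.values()), key=lambda g: (-len(g), g))


print(pairwise_ril("AAAAABBBBB", "AAAABBBBBB"))  # n = 10, r ≈ 0.056, LOD ≈ 1.60

markers = ["M1", "M2", "M3", "M4", "M5"]
pair_r = {("M1", "M2"): 0.05, ("M2", "M3"): 0.25, ("M3", "M4"): 0.45, ("M4", "M5"): 0.10}
for threshold in (0.3, 0.2, 0.05):  # decreasing recombination threshold
    print(threshold, linkage_groups(markers, pair_r, threshold))
```

At 0.3 the sketch yields two groups, {M1, M2, M3} and {M4, M5}; at 0.2 marker M3 drops out on its own, and at 0.05 only M1 and M2 stay together. Single linkage is chosen here because it needs only one sufficiently small r to connect two markers, which matches the idea of chaining markers along a chromosome.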
Dendrogram Viewer manipulates data in the
database and selects sets of markers for further
mapping while maintaining linkage group
designations from earlier maps. In essence, it
performs grouping (clustering) analysis.
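One simple way to keep linkage group designations from an earlier map, sketched here as an assumption rather than the Dendrogram Viewer's actual rule, is a majority vote: each new group inherits the earlier label held by most of its previously mapped markers, and groups made only of new markers stay unlabeled. The function name `carry_over_labels` is hypothetical.

```python
from collections import Counter


def carry_over_labels(new_groups, previous):
    """Label each new group with the most common earlier linkage group.

    new_groups: list of lists of marker names.
    previous: dict mapping marker name -> linkage group label on the earlier map.
    Returns one label per group, or None when no marker was mapped before.
    """
    labels = []
    for group in new_groups:
        votes = Counter(previous[m] for m in group if m in previous)
        labels.append(votes.most_common(1)[0][0] if votes else None)
    return labels


groups = [["M1", "M2", "N1"], ["M4", "N2"], ["N3"]]
earlier = {"M1": "LG1", "M2": "LG1", "M3": "LG1", "M4": "LG2"}
print(carry_over_labels(groups, earlier))  # ['LG1', 'LG2', None]
```

A group whose markers split between two earlier labels is probably better flagged for manual review than resolved by the vote alone.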
Zoom-in functionality
into selected region
Detailed information
about selected marker
Lettuce genetic map viewer
is written in PHP and uses the GD library.
The viewer interacts with tables in the
relational MySQL database and creates
graphical output dynamically.
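The core of drawing a linkage group, including the zoom-in into a selected region, is a linear mapping from centimorgan positions to pixel rows. The viewer itself is written in PHP with GD; the Python sketch below only illustrates that arithmetic, and the function name and parameters are hypothetical.

```python
def layout_markers(markers, start_cm, end_cm, top_px, bottom_px):
    """Map marker positions (cM) inside [start_cm, end_cm] to pixel rows.

    markers: list of (name, position_cM). Markers outside the window are
    dropped, which is what zooming into a region amounts to.
    """
    if end_cm <= start_cm:
        raise ValueError("window must have positive length")
    px_per_cm = (bottom_px - top_px) / (end_cm - start_cm)
    return [
        (name, round(top_px + (pos - start_cm) * px_per_cm))
        for name, pos in sorted(markers, key=lambda m: m[1])
        if start_cm <= pos <= end_cm
    ]


lg = [("M2", 50.0), ("M1", 0.0), ("M3", 100.0)]
print(layout_markers(lg, 0, 100, 20, 520))  # [('M1', 20), ('M2', 270), ('M3', 520)]
print(layout_markers(lg, 40, 60, 20, 520))  # [('M2', 270)]
```

Zooming keeps the same drawing height but shrinks the cM window, so the pixels per cM grow (from 5 to 25 in this example) and closely spaced markers separate on screen.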
CheckMatrix 2D plot:
validation of map quality
Displaying and comparing
several different maps simultaneously
CheckMatrix 2D plot is a set of
Python scripts to visualize
and validate genetic maps.
Required input files:
1. Genetic map
2. Recombination scores
3. Raw marker scores
Output:
1. CheckMatrix
diagonal 2D plot of all markers
versus all markers.
Color gradient reflects linkage /
recombination scores:
Red – strong linkage;
Yellow – weak linkage;
Black – no detectable linkage.
2. Visualization of raw scores
with all markers ordered as
on the genetic map.
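The all-versus-all CheckMatrix plot can be approximated as a grid of colors in map order. This sketch assumes the color is driven by the recombination fraction r (0 to 0.5), with red at r = 0, yellow at r = 0.25 and black at r = 0.5, and that pairs missing from the input are unlinked; the actual CheckMatrix scripts may use LOD scores or other breakpoints. Function names are hypothetical.

```python
def linkage_color(r):
    """Red (strong linkage) -> yellow (weak) -> black (no detectable linkage)."""
    r = max(0.0, min(0.5, r))
    if r <= 0.25:
        # Red to yellow: raise the green channel.
        return (255, round(255 * r / 0.25), 0)
    # Yellow to black: fade red and green together.
    v = round(255 * (0.5 - r) / 0.25)
    return (v, v, 0)


def check_matrix(map_order, pair_r):
    """All-markers-versus-all-markers color grid; rows and columns in map order."""
    def r_of(a, b):
        if a == b:
            return 0.0
        return pair_r.get((a, b), pair_r.get((b, a), 0.5))  # missing = unlinked

    return [[linkage_color(r_of(a, b)) for b in map_order] for a in map_order]


grid = check_matrix(["M1", "M2", "M3"], {("M1", "M2"): 0.0, ("M2", "M3"): 0.5})
for row in grid:
    print(row)
```

For a well-ordered map one expects a red band along the diagonal that fades toward the corners; isolated red cells far from the diagonal point to markers that may sit in the wrong position or group. Each tuple can be written out as one pixel with any image library.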
Markers with a high number of
double crossovers are
candidates for re-checking of
their raw scores or map position.
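Counting double crossovers per marker can be sketched as below, assuming raw scores are strings of `A`/`B` calls (one character per RIL, other characters treated as missing) and that markers are given in map order. A marker is counted for a line when its call differs from the nearest scored markers on both sides, which agree with each other; the function name is hypothetical.

```python
CALLED = {"A", "B"}  # assumed genotype codes; other characters mean missing


def double_crossovers(map_order, scores):
    """Count, for each marker, the lines in which it forms a singleton.

    map_order: marker names in genetic map order.
    scores: dict marker -> string of calls, one character per line (RIL).
    """
    if not map_order:
        return {}
    counts = {m: 0 for m in map_order}
    n_lines = len(scores[map_order[0]])
    for line in range(n_lines):
        # Keep only markers with a call for this line, preserving map order.
        called = [(m, scores[m][line]) for m in map_order
                  if scores[m][line] in CALLED]
        for i in range(1, len(called) - 1):
            left, (marker, call), right = called[i - 1][1], called[i], called[i + 1][1]
            if left == right != call:  # A-B-A or B-A-B pattern
                counts[marker] += 1
    return counts


raw = {"M1": "AAAA", "M2": "AB-A", "M3": "AAAB", "M4": "AAAB"}
print(double_crossovers(["M1", "M2", "M3", "M4"], raw))
# {'M1': 0, 'M2': 1, 'M3': 0, 'M4': 0}
```

What counts as a "high" number depends on population size and scoring error rate, so the cutoff for sending a marker back for re-checking is left as a judgment call here.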
Purple – framework
markers
CheckMatrix source code and a
detailed description are available at:
http://cgpdb.ucdavis.edu/XLinkage/
Allele composition