High-Throughput Mapping Pipeline Of EST Based Sequences On Lettuce Genome. Genetic Map Validation, Visualization And Web Presentation.

Alexander Kozik, Steve Edberg, Barnaly Pande, David Caldwell, David Lee, Travis Kleeburg, Fallon Chen, Richard Michelmore
Genome Center, University of California, Davis, CA 95616
http://cgpdb.ucdavis.edu/

[Pipeline flowchart panels: Marker (Contig/EST) selection (gene of interest); Identification of putative polymorphism and oligo design]

Efficient genetic mapping of large numbers of markers requires coordinating the efforts of several people, each of whom is responsible for a different part of the operation. We have developed a pipeline to map new markers on high-density genetic maps. Major goals and features of this pipeline are:
A) Minimizing the number of steps from experimental data generation to entry of data into the database.
B) Automatic error checking and error handling of spreadsheets that contain marker descriptions and raw scores, prior to uploading into the database.
C) Simplified and controlled data flow to and from the database for mapping procedures.

This pipeline includes:
i) Python Contig Viewer for semiautomatic search of EST candidates with putative polymorphisms and automatic design of oligonucleotide primers for selected sequences relative to potential intron positions.
ii) Scripts that provide controlled data flow from spreadsheets into our relational MySQL database.
iii) Web interface (Dendrogram Viewer) to manipulate data in the database and pre-select sets of markers for further mapping while maintaining linkage group designations from earlier maps.
iv) Visualization tools (CheckMatrix) to perform quality control and validate maps.
v) Web interface to display mapping results graphically.

Our database (http://cgpdb.ucdavis.edu/) provides functionality to compare several genetic maps simultaneously, together with the raw data that were used to construct them. These scripts are publicly available.
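Goal B above (error checking of score spreadsheets before upload) can be illustrated with a minimal Python sketch. It is not the pipeline's actual script: the column layout (marker name followed by one score per RIL), the genotype codes A, B, H and "-", and the function name check_score_sheet are all assumptions made for illustration, and the sheet is assumed to have been exported from the Excel template as CSV.

```python
import csv
import io

# Assumed genotype codes (not specified in the poster): parent A, parent B,
# heterozygous H, and "-" for a missing score.
ALLOWED_SCORES = {"A", "B", "H", "-"}


def check_score_sheet(handle, n_lines):
    """Return a list of (row_number, message) problems found in a score sheet.

    Assumed layout: a header row, then one row per marker whose first
    column is the marker name and whose remaining n_lines columns hold
    one score per RIL.
    """
    problems = []
    seen = set()
    reader = csv.reader(handle)
    next(reader, None)  # skip the header row
    for row_number, row in enumerate(reader, start=2):
        if not row or not row[0].strip():
            problems.append((row_number, "missing marker name"))
            continue
        name = row[0].strip()
        scores = [cell.strip().upper() for cell in row[1:]]
        if name in seen:
            problems.append((row_number, f"duplicate marker {name}"))
        seen.add(name)
        if len(scores) != n_lines:
            problems.append(
                (row_number, f"{name}: expected {n_lines} scores, found {len(scores)}")
            )
        bad = sorted(set(scores) - ALLOWED_SCORES)
        if bad:
            problems.append((row_number, f"{name}: invalid score(s) {bad}"))
    return problems


# Hypothetical three-line sheet with one invalid code and one duplicate, short row.
example = io.StringIO("marker,RIL1,RIL2,RIL3\nM1,A,B,H\nM2,A,X,B\nM1,A,B\n")
for row_number, message in check_score_sheet(example, n_lines=3):
    print(f"row {row_number}: {message}")
```

Collecting every problem rather than stopping at the first lets the person who scored the markers correct the whole sheet in one pass before it is uploaded.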
Python ContigViewer: http://cgpdb.ucdavis.edu/SNP_Discovery/Py_ContigViewer/

Validation of polymorphism and scoring (genotyping) using RILs: wet lab experiment.

Entering raw score data and marker information into the database: MS Excel templates with defined fields; error checking using custom Python scripts.

MAPPING USING JOINMAP AND CUSTOM PROGRAMS

1. Pairwise comparison: finding recombination/LOD score values for all pairs of markers (Tools: custom Python scripts or JoinMap)
2. Group analysis: assigning particular markers to specific chromosomes (Tools: PHP Dendrogram Viewer)
3. Pairwise comparison within each group (Tools: JoinMap)
4. Mapping (Tools: JoinMap)
5. Validation of the constructed map using CheckMatrix (custom Python scripts): http://cgpdb.ucdavis.edu/XLinkage/
6. Entering map data into the database

Dendrogram Viewer manipulates data in the database and selects sets of markers for further mapping while maintaining linkage group designations from earlier maps. In essence, it performs group (clustering) analysis.
[Dendrogram Viewer screenshot labels: map and linkage group selection; decreasing recombination value threshold]

Web interface to access lettuce genetic maps:
http://cgpdb.ucdavis.edu/database/genome_viewer/viewer/
http://cgpdb.ucdavis.edu/XLinkage/Genetic_Map_PyMad_Matrix.html

The lettuce genetic map viewer is written in PHP and uses the GD library. The viewer interacts with tables in the relational MySQL database and creates graphical output dynamically.
[Map viewer screenshot labels: query submission and data visualization; zoom-in functionality into a selected region; detailed information about a selected marker; displaying and comparing several different maps simultaneously]

CheckMatrix 2D plot: validation of map quality (web link to large image)

CheckMatrix is a set of Python scripts to visualize and validate genetic maps.
Required input files:
1. Genetic map
2. Recombination scores
3. Raw marker scores
Output:
1. CheckMatrix diagonal 2D plot of all markers versus all markers.
Color gradient reflects linkage/recombination scores: Red – strong linkage; Yellow – weak linkage; Black – no detectable linkage.
2. Visualization of raw scores with all markers ordered as on the genetic map. Markers with a high number of double crossovers are candidates for re-checking of raw scores or map position. Purple – framework markers.

CheckMatrix source code and a detailed description are available at: http://cgpdb.ucdavis.edu/XLinkage/

[Figure label: Allele composition]
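The two quantities behind the CheckMatrix outputs, pairwise recombination between markers and double crossovers along the map, can be sketched in a few lines of Python. This is an illustrative simplification, not the CheckMatrix source: the genotype codes (A and B for the parents, anything else treated as uninformative), the function names, and the rule that a double crossover is a score differing from two agreeing immediate neighbors are all assumptions.

```python
# Assumed codes for the two parental genotypes; any other score
# (heterozygous, missing) is treated as uninformative.
PARENTAL = {"A", "B"}


def recombinant_fraction(scores_a, scores_b):
    """Fraction of lines with different parental scores at two markers.

    Only lines scored as A or B at both markers are counted.
    Returns None when no line is informative.
    """
    pairs = [
        (a, b)
        for a, b in zip(scores_a, scores_b)
        if a in PARENTAL and b in PARENTAL
    ]
    if not pairs:
        return None
    return sum(a != b for a, b in pairs) / len(pairs)


def double_crossover_counts(ordered_markers, scores):
    """Count apparent double crossovers per marker.

    ordered_markers: marker names in map order.
    scores: dict mapping marker name to a sequence of per-line scores.
    A line contributes to a marker's count when its two immediate
    neighbors on the map agree with each other but not with the marker.
    """
    counts = {name: 0 for name in ordered_markers}
    triples = zip(ordered_markers, ordered_markers[1:], ordered_markers[2:])
    for left, middle, right in triples:
        for l, m, r in zip(scores[left], scores[middle], scores[right]):
            if {l, m, r} <= PARENTAL and l == r and m != l:
                counts[middle] += 1
    return counts


# Hypothetical scores for three markers across five RILs.
raw = {"M1": "AABBA", "M2": "ABBBA", "M3": "AABBB"}
print(recombinant_fraction(raw["M1"], raw["M2"]))
print(double_crossover_counts(["M1", "M2", "M3"], raw))
```

Markers with unusually high counts are the ones the poster flags for re-checking of raw scores or map position; a production version would also need to look past neighbors with missing scores, which this sketch does not do.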