SNPPEB 1.1 Software Design Document
(current document version 1.101)

Document update history:

Version 1.0
  Created by Tony on Aug 4, 2004
  Description: First draft for general idea

Version 1.01
  Modified by Tony on September 29, 2004
  Draw flowchart to show design, and also address the issues in SRS 1.01

Version 1.011
  Modified by Tony on Oct 1, 2004
  Still addresses SRS 1.01
  More detailed module design in section 5.

Version 1.012
  Modified by Tony on Oct 6, 2004
  Still addresses SRS 1.01
  Modify module design in section 5 from version 1.011.

Version 1.1
  Modified by Tony on Jan 31, 2005
  Addresses SRS 1.1
  Major changes:
  1. Provide service to the new “Beckman GenomeLab™ SNPstream® Genotyping System”
  2. Set up local databases instead of using XML files from NCBI

Version 1.101
  Modified by Tony on Mar 10, 2005
  Still addresses SRS 1.1
  Redefine DB design

1. Description

The requirements in the SRS will be fully addressed in this software design document, or an alternative solution will be given. We will use reference sequence data from NCBI, in FASTA files and XML files, to set up our local databases. Also, "Primer3" (http://frodo.wi.mit.edu/primer3/primer3_code.html) will be integrated into our application within the usage conditions of its copyright document.

2. Function Design

This version of the design document presents a primary design that addresses the issues in SRS 1.1, together with a design flowchart.

a. Input and criteria

Input:
  List of SNP ids
  Two locations in a chromosome (Two STS markers?)

Criteria:
  Orientation: Original, all forward, all reverse
  SNP types: (any combination) 6 types
  Exclude coding SNPs?
  Flanking sequence length?
  Number of SNPs to be separated

Prototype is available at: http://bioinfo.vipbg.vcu.edu/SNPPEB/prototypes/

b. Query local databases

Database name: snppeb

i.
ER:

genome_contig
  PK        accession
            ctg_id
            tax_id
            ctg_length
            chr
            chr_from
            chr_to
            orientation
            assembly

genome_contig_set
  PK,FK1    accession
  PK        segment_id
            ctg_from
            ctg_to
            fragment_seq

snp_flanking
  PK,FK1    id
  PK        side
  PK        fragment
            seq

snp_info (keys marked in the diagram: PK, FK1)
  PK        id
            tax_id
            build_create
            build_update
            allele_1
            allele_1_frq
            allele_2
            allele_2_frq
            frq_count
            validated_pop
            validated_frq
            validated_clu
            validated_2h2
            validated_hap
            ctg_accession
            ctg_chr
            ctg_loc
            chr_loc
            ctg_ori
            ctg_fxn

ii. Table and column definition:

Table genome_contig:
  accession:   accession.version format, example: ‘NT_077402.1’
  ctg_id:      internal ID, example: ‘CONTIG:77451’
  tax_id:      9606 is Homo sapiens
  ctg_length:  length of contig
  chr:         chromosome; ‘Un’ means not placed on any chromosome
  chr_from:    chromosome coordinate, reported in 1-base coordinates, starts from 1.
               0 means not localized or placed on any chromosome
  chr_to:      chromosome coordinate, reported in 1-base coordinates.
               0 means not localized or placed on any chromosome
  orient:      +, -, 0, where 0 indicates uncertainty in orientation
  assembly:    this value is used to associate contigs with a particular assembly
               (e.g., reference assembly vs alternate assemblies provided by other
               groups or representing other haplotypes)

Table genome_contig_set (accession, segment_id, #ctg_from, #ctg_to, seq):
  accession:   accession.version format, example: ‘NT_077402.1’
  segment_id:  associated with ctg_from and ctg_to. For any contig position m
               with ctg_from ≤ m ≤ ctg_to:
                 segment_id = int((m - 1)/200 + 1)
  #ctg_from:   contig coordinate, reported in 1-base coordinates, starts from 1.
               Not stored in the DB; can be calculated from segment_id:
                 ctg_from = 200 * (segment_id - 1) + 1
  #ctg_to:     contig coordinate, reported in 1-base coordinates. Not stored in the DB.
               Can be calculated from segment_id and seq:
                 ctg_to = 200 * (segment_id - 1) + seq.length
  seq:         sequence segment from contig; lower case means repetitive

Table snp_info:
  id:                  rs#
  tax_id:              species id, 9606 for human
  build_create:        build in which this SNP was created
  build_update:        last build to update this SNP
  allele_1, allele_2:  nucleotides at the SNP site; 1 and 2 are in alphabetical
                       order (example: A C, not C A)
  allele_1_frq:        average frequency of allele_1
  allele_2_frq:        average frequency of allele_2
  frq_count:           number of all chromosomes contributing to the frequency calculation
  validated_pop:       T|F, at least one ss in cluster was validated by independent assay
  validated_frq:       T|F, at least one subsnp in cluster has frequency data submitted
  validated_clu:       T|F, cluster has 2+ submissions, with 1+ submission assayed
                       with a non-computational method
  validated_2h2:       T|F, all alleles have been observed in 2+ chromosomes
  validated_hap:       T|F, validated by HapMap project
  ctg_accession:       mapping contig in accession.version format, example: ‘NT_077402.1’
  ctg_chr:             chromosome of mapping contig
  ctg_loc:             SNP location mapped to contig
  chr_loc:             SNP location mapped to chromosome
  ctg_ori:             orientation of SNP and flanking sequence relative to contig
  ctg_fxn:             functional relationship of SNP to genes at contig location:
                       locus-region | coding | coding-synon | coding-nonsynon |
                       mrna-utr | intron | splice-site | reference | exception

Table snp_flanking:
  id:          rs#
  side:        5|3, 5’ or 3’ side
  fragment:    index number of a fragment of a flanking sequence, in order.
               The 5’ side starts from the far end and runs toward the SNP site;
               the 3’ side starts from the site immediately neighboring the SNP
  seq:         fragment of flanking sequence

c. Information retrieval

SNP information displayed:
  Checkbox for further primer design
  SNP id
  Allele
  Allele frequencies
  Flanking sequences, length and orientation
  Verification information
  Function class (coding nonsynon, coding synon, ...)
  Location info (chr, contig, ...)

Prototype is available at: http://bioinfo.vipbg.vcu.edu/SNPPEB/prototypes/

d.
Primer design

i. Generate text file for autoprimer.com

ii. Primer in batch (call Primer3)

Parameter setup
  This page will be similar to the Primer3 web application and is used to set the parameters for running Primer3. Default values will be given according to suggestions from our lab specialists.

Display result
  This page will display the primers for a list of SNPs. The format will be customized by our lab specialists.

3. Flowchart

Flowchart boxes, in reading order:
  List of SNP ids
  Two STS markers
  Query conditions
  SNP database (also reference genome database and STS marker database)
  Display SNP info and choose SNPs to get primers
  Parameter setup for primer design
  Call Primer3
  Display primer design result
  Generate file for Beckman program

4. System Requirement and Running Environment

Programming tool: Java, PHP, Perl, CGI, BioPerl, XML::Twig
Primer design software: Primer3
Running environment: Red Hat Enterprise Linux WS 3, Dell Precision 670 workstation
Database server: MySQL
Server: bioinfo.vipbg.vcu.edu/SNPPEB
Client: IE, Mozilla, or Netscape browser and internet connection
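
Illustration for table genome_contig_set (section 2.b): the 200-base segment arithmetic can be sketched in a few lines. This is a minimal sketch, not the project's code. It assumes 1-based inclusive bounds, so that segment 1 covers contig positions 1 to 200, which is the reading that agrees with segment_id = int((m - 1)/200 + 1). The function names and the dictionary that stands in for the table rows are hypothetical.

```python
from typing import Dict, Tuple

SEGMENT_LENGTH = 200  # segment size used in the genome_contig_set formulas


def segment_id_for(m: int) -> int:
    """Return the segment_id that holds 1-based contig position m."""
    if m < 1:
        raise ValueError("contig positions are 1-based")
    return (m - 1) // SEGMENT_LENGTH + 1


def segment_bounds(segment_id: int, seq_length: int) -> Tuple[int, int]:
    """Return (ctg_from, ctg_to), assuming 1-based inclusive coordinates."""
    ctg_from = SEGMENT_LENGTH * (segment_id - 1) + 1
    ctg_to = SEGMENT_LENGTH * (segment_id - 1) + seq_length
    return ctg_from, ctg_to


def extract(segments: Dict[int, str], start: int, end: int) -> str:
    """Cut contig positions start..end (inclusive) out of stored segments.

    `segments` maps segment_id -> seq for one contig accession; it is a
    stand-in for rows fetched from genome_contig_set.
    """
    parts = []
    for sid in range(segment_id_for(start), segment_id_for(end) + 1):
        seq = segments[sid]
        ctg_from, ctg_to = segment_bounds(sid, len(seq))
        lo = max(start, ctg_from) - ctg_from       # 0-based offset in seq
        hi = min(end, ctg_to) - ctg_from + 1       # exclusive end offset
        parts.append(seq[lo:hi])
    return "".join(parts)
```

With this layout, a flanking region that crosses a segment boundary needs only the rows whose segment_id lies between segment_id_for(start) and segment_id_for(end).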
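
Illustration for the batch primer step (section 2.d.ii): the flanking fragments stored in snp_flanking could be joined into one template and written as a Primer3 input record roughly as follows. This is a sketch under stated assumptions, not the implemented design. It assumes that side is stored as the string "5" or "3", that fragment is an integer index, and that fragments of each side read 5’ to 3’ when sorted by index, as the column definition describes. The tag names (SEQUENCE_ID, SEQUENCE_TEMPLATE, SEQUENCE_TARGET, PRIMER_FIRST_BASE_INDEX) follow the Boulder-IO input of Primer3 2.x; older releases use different tag names, so check them against the installed version. All function names are hypothetical.

```python
# Only A, C, G, T and N are complemented; other symbols pass through unchanged.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a nucleotide string."""
    return seq.translate(_COMPLEMENT)[::-1]


def assemble_flank(rows, side):
    """Join the fragments of one side in ascending fragment order.

    `rows` is an iterable of (side, fragment, seq) tuples for a single rs#,
    standing in for rows fetched from snp_flanking.
    """
    picked = sorted((frag, seq) for s, frag, seq in rows if s == side)
    return "".join(seq for _, seq in picked)


def primer3_record(snp_id, rows, allele="N", reverse=False):
    """Build one Boulder-IO record with the SNP site marked as the target."""
    five = assemble_flank(rows, "5")
    three = assemble_flank(rows, "3")
    template = five + allele + three
    target_start = len(five) + 1          # 1-based position of the SNP site
    if reverse:
        template = reverse_complement(template)
        target_start = len(three) + 1     # SNP position after reversal
    lines = [
        "SEQUENCE_ID={}".format(snp_id),
        "SEQUENCE_TEMPLATE={}".format(template),
        "PRIMER_FIRST_BASE_INDEX=1",      # make target positions 1-based
        "SEQUENCE_TARGET={},1".format(target_start),
        "=",                              # record terminator
    ]
    return "\n".join(lines) + "\n"
```

Records for a list of SNPs can be concatenated and piped to the Primer3 command-line program in one call. The parameter values chosen on the setup page would be added as further TAG=VALUE lines before the terminating "=".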