Software Design Document

SNPPEB 1.1
Software Design Document
(current document version 1.101)
Document update history:
Version 1.0
Created by Tony on Aug 4, 2004
Description: First draft for general idea
Version 1.01
Modified by Tony on September 29, 2004
Drew a flowchart to show the design; also addresses the issues in SRS 1.01
Version 1.011
Modified by Tony on Oct 1, 2004
Still addresses SRS 1.01
More detailed module design in section 5.
Version 1.012
Modified by Tony on Oct 6, 2004
Still addresses SRS 1.01
Modified the module design in section 5 from version 1.011.
Version 1.1
Modified by Tony on Jan 31, 2005
Addresses SRS 1.1
Major changes:
1. Provide service to the new “Beckman GenomeLab™ SNPstream® Genotyping System”
2. Set up local databases instead of using XML files from NCBI
Version 1.101
Modified by Tony on Mar 10, 2005
Still addresses SRS 1.1
Redefined the DB design
1. Description
The requirements in the SRS will be fully addressed in this software design document; where a
requirement cannot be met as stated, an alternative solution will be given. We will use reference
sequence data from NCBI, in FASTA and XML files, to set up our local databases. Also, "Primer3"
(http://frodo.wi.mit.edu/primer3/primer3_code.html) will be integrated into our application within
the usage conditions stated in its copyright document.
2. Function Design
In this version of the design document, we present a preliminary design that addresses the issues
in SRS 1.1, together with a design flowchart.
a. Input and criteria
Input:
• List of SNP ids
• Two locations in a chromosome (Two STS markers?)
Criteria:
• Orientation: Original, all forward, all reverse
• SNP types: 6 types (any combination)
• Exclude coding SNPs?
• Flanking sequence length?
• Number of SNPs to be separated
Prototype is available at: http://bioinfo.vipbg.vcu.edu/SNPPEB/prototypes/
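The criteria above could be carried through the application as one small object. The following is a minimal sketch only, not the design's actual data model. It assumes that the "6 types" are the six unordered pairs of A, C, G and T, and that "coding SNPs" are those whose function class is coding, coding-synon or coding-nonsynon. The names Criteria, snp_type and passes are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: the "6 types" are the six unordered nucleotide pairs.
SNP_TYPES = ("A/C", "A/G", "A/T", "C/G", "C/T", "G/T")

# Assumption: these function classes count as "coding" for the exclusion flag.
CODING_CLASSES = {"coding", "coding-synon", "coding-nonsynon"}


@dataclass
class Criteria:
    """Hypothetical container for the query criteria."""
    orientation: str = "original"            # "original", "forward" or "reverse"
    snp_types: frozenset = frozenset(SNP_TYPES)
    exclude_coding: bool = False
    flank_length: Optional[int] = None       # None means "no length limit"


def snp_type(allele_1, allele_2):
    """Return a label such as 'A/G', with the alleles in alphabetical order."""
    first, second = sorted((allele_1.upper(), allele_2.upper()))
    return f"{first}/{second}"


def passes(snp, criteria):
    """Return True if one SNP record (a dict) satisfies the criteria."""
    if snp_type(snp["allele_1"], snp["allele_2"]) not in criteria.snp_types:
        return False
    if criteria.exclude_coding and snp["ctg_fxn"] in CODING_CLASSES:
        return False
    return True


# Illustrative records with made-up ids.
snps = [
    {"id": 1, "allele_1": "A", "allele_2": "G", "ctg_fxn": "intron"},
    {"id": 2, "allele_1": "C", "allele_2": "T", "ctg_fxn": "coding-synon"},
    {"id": 3, "allele_1": "A", "allele_2": "T", "ctg_fxn": "intron"},
]
wanted = Criteria(snp_types=frozenset({"A/G", "C/T"}), exclude_coding=True)
print([s["id"] for s in snps if passes(s, wanted)])  # [1]
```

Only the SNP-type and coding filters are applied here; orientation and flanking length would be handled when sequences are retrieved.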
b. Query local databases
Database name: snppeb
i. ER:
Table genome_contig
  PK        accession
            ctg_id
            tax_id
            ctg_length
            chr
            chr_from
            chr_to
            orientation
            assembly
Table genome_contig_set
  PK,FK1    accession
  PK        segment_id
            ctg_from
            ctg_to
            fragment_seq
Table snp_info
  PK        id
            tax_id
            build_create
            build_update
            allele_1
            allele_1_frq
            allele_2
            allele_2_frq
            frq_count
            validated_pop
            validated_frq
            validated_clu
            validated_2h2
            validated_hap
            ctg_accession
            ctg_chr
            ctg_loc
            chr_loc
            ctg_ori
            ctg_fxn
  FK1       on one non-key column (presumably ctg_accession, which refers to genome_contig)
Table snp_flanking
  PK,FK1    id
  PK        side
  PK        fragment
            seq
ii. Table and column definition:
Table genome_contig:
accession:    accession.version format, example: ‘NT_077402.1’
ctg_id:       internal ID, example: ‘CONTIG:77451’
tax_id:       9606 is Homo sapiens
ctg_length:   length of contig
chr:          chromosome. ‘Un’ is not placed on any chromosome
chr_from:     chromosome coordinate, reported in 1 base coordinates, starts
              from 1. 0 means not localized or placed on any chromosome
chr_to:       chromosome coordinate, reported in 1 base coordinates. 0 means
              not localized or placed on any chromosome
orient:       +, -, 0, where 0 indicates uncertainty in orientation
assembly:     this value is used to associate contigs with a particular
              assembly (e.g., reference assembly vs alternate assemblies
              provided by other groups or representing other haplotypes)
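When the input is two locations on a chromosome, the contigs covering that region can be found from chr_from and chr_to. The following is a minimal sketch of the overlap test, assuming contig rows are available as dicts keyed by the column names above. The function name contigs_in_region and the sample rows are hypothetical.

```python
def contigs_in_region(contigs, chrom, start, end):
    """Return accessions of placed contigs that overlap [start, end] on chrom.

    Coordinates are 1-based and inclusive. A chr_from or chr_to of 0 means
    the contig is not localized, so such rows are skipped.
    """
    if start > end:
        start, end = end, start
    hits = []
    for contig in contigs:
        if contig["chr"] != chrom:
            continue
        if contig["chr_from"] == 0 or contig["chr_to"] == 0:
            continue
        # Two closed intervals overlap when each starts before the other ends.
        if contig["chr_from"] <= end and contig["chr_to"] >= start:
            hits.append(contig["accession"])
    return hits


# Made-up rows for illustration only.
rows = [
    {"accession": "CTG_A.1", "chr": "1", "chr_from": 1, "chr_to": 5000},
    {"accession": "CTG_B.1", "chr": "1", "chr_from": 5001, "chr_to": 9000},
    {"accession": "CTG_C.1", "chr": "1", "chr_from": 0, "chr_to": 0},
    {"accession": "CTG_D.1", "chr": "2", "chr_from": 1, "chr_to": 9000},
]
print(contigs_in_region(rows, "1", 4000, 6000))  # ['CTG_A.1', 'CTG_B.1']
```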
Table genome_contig_set:
accession:    accession.version format, example: ‘NT_077402.1’
segment_id:   this is associated with ctg_from and ctg_to.
              Let ctg_from ≤ m ≤ ctg_to
              segment_id = int((m - 1) / 200 + 1);
#ctg_from:    contig coordinate, reported in 1 base coordinates, starts from
              1. Not added in DB; can be calculated from segment_id:
              ctg_from = 200 * (segment_id - 1) + 1;
#ctg_to:      contig coordinate, reported in 1 base coordinates.
              Not put in DB; can be calculated from segment_id and seq:
              ctg_to = 200 * (segment_id - 1) + seq.length;
seq:          sequence segment from contig, lower case means repetitive
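The segment arithmetic can be checked with a short sketch. It assumes 1-based, inclusive contig coordinates and 200-base segments, so that segment 1 spans bases 1 to 200 and a segment's first base is 200 * (segment_id - 1) + 1. The function names, and the dict mapping segment_id to seq, are hypothetical stand-ins for a database lookup.

```python
SEGMENT_SIZE = 200  # bases per stored segment


def segment_id_for(m):
    """Segment that holds the 1-based contig position m."""
    return (m - 1) // SEGMENT_SIZE + 1


def segment_bounds(segment_id, seq_length):
    """Return (ctg_from, ctg_to), 1-based and inclusive, for one segment."""
    ctg_from = SEGMENT_SIZE * (segment_id - 1) + 1
    ctg_to = ctg_from + seq_length - 1
    return ctg_from, ctg_to


def fetch_subsequence(segments, start, end):
    """Return contig bases start..end (1-based, inclusive).

    segments maps segment_id to seq, standing in for rows of
    genome_contig_set that share one accession.
    """
    parts = []
    for sid in range(segment_id_for(start), segment_id_for(end) + 1):
        seq = segments[sid]
        seg_from, seg_to = segment_bounds(sid, len(seq))
        lo = max(start, seg_from) - seg_from       # 0-based slice start
        hi = min(end, seg_to) - seg_from + 1       # 0-based slice end (exclusive)
        parts.append(seq[lo:hi])
    return "".join(parts)


# A made-up 450-base contig split into segments of 200, 200 and 50 bases.
contig = "".join("ACGT"[i % 4] for i in range(450))
segments = {sid: contig[(sid - 1) * 200: sid * 200] for sid in (1, 2, 3)}
print(segment_bounds(3, len(segments[3])))  # (401, 450)
```

A request that spans a segment boundary, such as bases 195 to 205, is served by joining the tail of one segment with the head of the next.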
Table snp_info:
id:                   rs#
tax_id:               species id, 9606 for human
build_create:         build to create this SNP
build_update:         last build to update this SNP
allele_1, allele_2:   nucleotides in SNP site, 1 and 2 are in alphabetical order
                      (example: A C, not C A)
allele_1_frq:         average frequency of allele_1
allele_2_frq:         average frequency of allele_2
frq_count:            number of all chromosomes contributing to frequency
                      calculation.
validated_pop:        T|F, at least one ss in cluster was validated by independent
                      assay
validated_frq:        T|F, at least one subsnp in cluster has frequency data
                      submitted
validated_clu:        T|F, cluster has 2+ submissions, with 1+ submission assayed
                      with a non-computational method
validated_2h2:        T|F, all alleles have been observed in 2+ chromosomes
validated_hap:        T|F, validated by HapMap project
ctg_accession:        mapping contig in accession.version format, example:
                      ‘NT_077402.1’
ctg_chr:              chromosome of mapping contig
ctg_loc:              snp location mapped to contig
chr_loc:              snp location mapped to chromosome
ctg_ori:              orientation of snp and flanking sequence to contig
ctg_fxn:              functional relationship of SNP to genes at contig location:
                      locus-region | coding | coding-synon | coding-nonsynon |
                      mrna-utr | intron | splice-site | reference | exception
Table snp_flanking:
id:         rs#
side:       5|3, 5’ or 3’ side
fragment:   number index of fragment of a flanking sequence in order.
            5’ side starts from the far end to SNP site, 3’ side starts
            from the immediate neighboring site of SNP
seq:        fragment of flanking sequence
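Because fragments on both sides are numbered in reading order (the 5’ side from the far end toward the SNP, the 3’ side from the SNP outward), a flanking sequence can be rebuilt by joining its fragments in ascending fragment order. The sketch below illustrates this with Python's built-in sqlite3 module as a stand-in for the MySQL server. The column types, the integer rs# and the function names are assumptions, not part of the design.

```python
import sqlite3


def create_schema(conn):
    """Create a stand-in snp_flanking table (column types are assumed)."""
    conn.execute(
        """
        CREATE TABLE snp_flanking (
            id       INTEGER NOT NULL,  -- rs#
            side     TEXT    NOT NULL,  -- '5' or '3'
            fragment INTEGER NOT NULL,  -- index of the fragment within a side
            seq      TEXT    NOT NULL,  -- fragment of flanking sequence
            PRIMARY KEY (id, side, fragment)
        )
        """
    )


def flanking_sequence(conn, snp_id, side):
    """Join the fragments of one side in ascending fragment order."""
    rows = conn.execute(
        "SELECT seq FROM snp_flanking WHERE id = ? AND side = ? "
        "ORDER BY fragment",
        (snp_id, side),
    )
    return "".join(seq for (seq,) in rows)


conn = sqlite3.connect(":memory:")
create_schema(conn)
# Made-up fragments for a made-up SNP id, inserted out of order on purpose.
conn.executemany(
    "INSERT INTO snp_flanking (id, side, fragment, seq) VALUES (?, ?, ?, ?)",
    [
        (1, "5", 2, "CCCC"),
        (1, "5", 1, "AAAA"),
        (1, "3", 1, "GGGG"),
        (1, "3", 2, "TTTT"),
    ],
)
print(flanking_sequence(conn, 1, "5"))  # AAAACCCC
print(flanking_sequence(conn, 1, "3"))  # GGGGTTTT
```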
c. Information retrieval
SNP Information displayed:
• Checkbox for further primer design
• SNP id
• Allele
• Allele frequencies
• Flanking sequences, length and orientation
• Verification information
• Function class (coding nonsynon, coding synon, ...)
• Location info (chr, contig, ...)
Prototype is available at: http://bioinfo.vipbg.vcu.edu/SNPPEB/prototypes/
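Displaying flanking sequences as "all forward" or "all reverse" means reverse-complementing every SNP whose stored orientation differs from the requested one. The following is a minimal sketch of that step. It assumes ctg_ori is stored as '+' or '-', which the column definition does not state, and the function names are hypothetical.

```python
# Translation table for complementing bases; case is preserved because
# lower case marks repetitive sequence.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a nucleotide string."""
    return seq.translate(_COMPLEMENT)[::-1]


def orient(flank5, allele_1, allele_2, flank3, ctg_ori, wanted):
    """Return (flank5, allele_1, allele_2, flank3) in the wanted orientation.

    wanted is "original", "forward" or "reverse".
    Assumption: ctg_ori is '+' for forward and '-' for reverse.
    """
    current = "forward" if ctg_ori == "+" else "reverse"
    if wanted == "original" or wanted == current:
        return flank5, allele_1, allele_2, flank3
    # Flipping strands swaps the two flanks and complements the alleles;
    # the alleles are re-sorted to keep them in alphabetical order.
    new_1, new_2 = sorted(
        (reverse_complement(allele_1), reverse_complement(allele_2))
    )
    return reverse_complement(flank3), new_1, new_2, reverse_complement(flank5)


print(orient("AAC", "A", "G", "TTG", "-", "forward"))
# ('CAA', 'C', 'T', 'GTT')
```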
d. Primer design
i. Generate text file for autoprimer.com
ii. Primer in batch (call primer3)
Parameter setup
This page will be similar to the Primer3 web application for setting the parameters used to
run Primer3. Default values will be given according to suggestions from our lab
specialists.
Display Result
This page will display the primers for a list of SNPs. The format will be customized
by our lab specialists.
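For the batch run, each selected SNP has to be turned into one Primer3 input record. The sketch below builds such a record as text and does not call Primer3 itself. The tag names follow the Boulder-IO input of Primer3 2.x (SEQUENCE_ID, SEQUENCE_TEMPLATE, SEQUENCE_TARGET, PRIMER_FIRST_BASE_INDEX); older releases may use different tag names, so treat them as an assumption to check against the installed version. Writing the SNP site as N and the function name boulder_record are also illustrative choices.

```python
def boulder_record(snp_id, flank5, flank3, settings=None):
    """Build one Boulder-IO record that targets the SNP site.

    The template is the 5' flank, an N at the SNP site, then the 3' flank.
    settings is an optional dict of extra "TAG: value" pairs, for example
    the values chosen on the parameter setup page.
    """
    template = flank5 + "N" + flank3
    lines = [
        f"SEQUENCE_ID=rs{snp_id}",
        f"SEQUENCE_TEMPLATE={template}",
        # Target is "<start>,<length>"; the SNP is the base after the 5' flank.
        f"SEQUENCE_TARGET={len(flank5) + 1},1",
        # State the indexing explicitly so the target position is 1-based.
        "PRIMER_FIRST_BASE_INDEX=1",
    ]
    for tag, value in (settings or {}).items():
        lines.append(f"{tag}={value}")
    lines.append("=")  # a line holding only "=" ends the record
    return "\n".join(lines) + "\n"


# Made-up SNP id and flanks, for illustration only.
print(boulder_record(1, "ACGT", "TTGA", {"PRIMER_OPT_SIZE": 20}), end="")
```

Records for several SNPs can be concatenated into one input file, since each record ends with its own "=" line.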
3. Flowchart
Flowchart boxes, in reading order:
• Input: list of SNP ids, two STS markers, query conditions
• SNP database (also reference genome database and STS marker database)
• Display SNP info and choose SNPs to get primers
• Parameter setup for primer design
• Call Primer3
• Display primer design result
• Generate file for Beckman program
4. System Requirements and Running Environment
Programming tools: Java, PHP, Perl, CGI, BioPerl, XML::Twig
Primer design software: Primer3
Running environment: Red Hat Enterprise Linux WS 3, Dell Precision 670 workstation
Database server: MySQL
Server: bioinfo.vipbg.vcu.edu/SNPPEB
Client: IE, Mozilla, or Netscape browser and internet connection