SNPs

SNP comparisons
• Using positions only
• Comparing Watson, Venter, dbSNP
– Watson more conservative in calls
– I used Venter method 1 calls (placed unambiguously)
• Venter Indels
• Ensembl made their own SNP calls for Watson and Venter
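A position-only comparison like the one listed above could be sketched as follows. This is a minimal illustration, not the code used for these slides; the function name `compare_by_position` and the toy coordinates are hypothetical.

```python
def compare_by_position(a, b):
    """Compare two SNP call sets using (chromosome, position) only.

    Alleles and genotypes are ignored, so two calls at the same site
    count as the same SNP even if the called alleles differ.
    """
    a, b = set(a), set(b)
    return {
        "shared": a & b,
        "only_a": a - b,
        "only_b": b - a,
    }


# Toy data, made up for illustration (not real calls).
watson = [("chr1", 1000), ("chr1", 2500), ("chr2", 40)]
venter = [("chr1", 1000), ("chr2", 40), ("chr2", 900)]

result = compare_by_position(watson, venter)
print({name: len(sites) for name, sites in result.items()})
```

Matching on position alone is simple and fast, but it cannot tell apart two different alleles called at the same site.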
SNPs
dbSNP 12,350,000
Venter 3,325,000
Watson 2,060,000
[Venn diagram of the three sets; region counts as extracted: 1,032,000; 1,817,000; 233,000; 476,000; 210,000; 10,000]
Venter and Watson each have 3 variants that are in LSDBs.
SNP density (SNPs/kb)
[Bar chart: Watson vs. Venter; y-axis 0.00 to 1.40; categories: total, intergenic, genes, coding exons, conserved tfbs, ORegAnno, Ultra conserved, PRP]
ORegAnno (regions from 1 bp to over 5,000 bp)
– Watson 5,391 (Venter 7,253) SNPs overlap ORegAnno
– 3,519 (Venter 3,956) ORegAnno regions overlap the SNPs
• 2,571 ORegAnno regions are the same
– 27 (Venter 36) are in 1 bp regions
• All are also in dbSNP
• 11 are the same
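The two counts above (SNPs that overlap regions, and regions that overlap SNPs) differ because one region can hold several SNPs. A minimal sketch of counting both, assuming 1-based inclusive coordinates; the function name `overlap_counts` and the toy data are hypothetical, not the method used for the slides:

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


def overlap_counts(regions, snps):
    """Return (SNPs inside at least one region, regions holding at least one SNP).

    regions: iterable of (chrom, start, end), 1-based inclusive (assumed).
    snps:    iterable of (chrom, pos), 1-based (assumed).
    """
    # Sort SNP positions per chromosome so each region is a binary search.
    positions = defaultdict(list)
    for chrom, pos in snps:
        positions[chrom].append(pos)
    for chrom in positions:
        positions[chrom].sort()

    hit_snps = set()
    hit_regions = 0
    for chrom, start, end in regions:
        sorted_pos = positions.get(chrom, [])
        lo = bisect_left(sorted_pos, start)
        hi = bisect_right(sorted_pos, end)
        if hi > lo:
            hit_regions += 1
            hit_snps.update((chrom, p) for p in sorted_pos[lo:hi])
    return len(hit_snps), hit_regions


# Toy data, made up for illustration: one 1 bp region and two longer ones.
regions = [("chr1", 100, 100), ("chr1", 200, 300), ("chr2", 50, 60)]
snps = [("chr1", 100), ("chr1", 250), ("chr1", 299), ("chr2", 10)]
n_snps, n_regions = overlap_counts(regions, snps)
print(n_snps, n_regions)
```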
Venter’s Indels
• 486,598 non-dbSNP variants
– 280,722 SNPs
– 205,875 indels
• Only 94 of these are in coding exons
– 65 are frame shifts
» Found GO terms for 41 entries.
» 37 cellular
» 33 function
» 34 process
» GO:0005515 {14} protein binding (Molecular function)
» GO:0016020 {13} membrane (Cellular component)
» GO:0005634 {13} nucleus (Cellular component)
» GO:0016021 {9} integral to membrane (Cellular component)
» GO:0005622 {7} intracellular (Cellular component)
» GO:0004872 {7} receptor activity (Molecular function)
» GO:0005509 {6} calcium ion binding (Molecular function)
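For reference, an indel inside a coding exon shifts the reading frame when its length is not a multiple of 3. A minimal sketch of that rule, assuming the indel lies entirely within one coding exon (splice sites and exon boundaries are ignored); `is_frameshift` and the lengths below are hypothetical, not taken from the Venter data:

```python
def is_frameshift(indel_length):
    """True if an indel of this length (in bp) shifts the reading frame.

    Assumes the indel lies entirely inside a coding exon.
    """
    if indel_length <= 0:
        raise ValueError("indel length must be a positive number of bases")
    return indel_length % 3 != 0


# Hypothetical indel lengths for illustration.
lengths = [1, 2, 3, 4, 6, 9]
frameshifts = [n for n in lengths if is_frameshift(n)]
print(frameshifts)
```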
Ensembl
• Ensembl made their own calls on the SNPs
from Venter and Watson’s sequences
– Need Ensembl 49 for accurate Venter SNPs
Ensembl Venter and Venter method 1
• Shared: 2,602,178 (95% in dbSNP)
• Ensembl Venter only: 766,195 (64% in dbSNP)
• Venter method 1 only: 723,352 (56% in dbSNP)
dbSNP 128 chromosome reports
Ensembl Watson and Watson
• Shared: 1,602,563 (100% in dbSNP)
• Ensembl Watson only: 798,567 (100% in dbSNP)
• Watson only: 457,981 (53% in dbSNP)
dbSNP 128 chromosome reports
SNP density with Ensembl calls
[Bar chart: Watson, Watson Ensembl, Venter, Venter Ensembl; y-axis 0.00 to 1.40; categories: total, intergenic, genes, coding exons, conserved tfbs, ORegAnno, Ultra conserved, PRP]
Percent of SNPs by location
[Bar chart: dbSNP, Watson, Watson Ensembl, Venter, Venter Ensembl; y-axis 0.00 to 1.20; categories: coding exons, conserved tfbs, ORegAnno, Ultra conserved, PRP]
SNPs in Ultra Conserved Regions
• dbSNP 128 has 168
• Watson and Venter have 25
– 16 of which are in dbSNP
– 9 new (none from Ensembl)
• chr1 115081716 W
• chr1 115081717 W
• chr1 50872068 W
• chr3 153647146 W
• chr6 163911701 V
• chr9 139162434 W
• chr11 8274734 W
• chr15 65665291 V
• chr18 21119518 W
SNPs in Ultra conserved regions
• 3% of Ultra conserved regions have either Watson or Venter SNPs
• 25% of Ultra conserved regions have SNPs from dbSNP 128 (168 SNPs); most have 1 or 2
– With a maximum of 12 SNPs in one region
– A runner-up of 5 SNPs in one region
UC with 12 SNPs
What’s in Ultra Conserved regions?
• Total: 481 regions
– 480 have conserved TFBS
– 168 have Vista Enhancers from LBNL
– 72 have coding exons (only 19 with SNPs)
– 124 have SNPs (dbSNP 128)
• 2 more interesting UCs
– UC without a conserved TFBS
– UC with TFBS, SNP, Coding, and Repeat masker
UC without conserved TFBS
UC with TFBS, SNP, RM, Coding
SNPs in Conserved TFBS
• About 3% of the 3.8 million binding sites have SNPs in dbSNP 128
• About 3% of the 0.8 million binding sites with a z score >= 2.33 have SNPs in dbSNP 128
SNP density
This is computed on a 10kb window. The weighted average is computed for each window.
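A per-window density could be computed along these lines. The slide does not say how the average is weighted or whether the windows overlap, so this sketch is only an assumption: it uses non-overlapping 10kb windows and a plain SNPs/kb value per window. The function name `snp_density_per_window` and the toy positions are hypothetical.

```python
from collections import Counter

WINDOW = 10_000  # window size in bp, from the slide


def snp_density_per_window(positions, chrom_length, window=WINDOW):
    """SNPs per kb in each non-overlapping window along one chromosome.

    positions: 1-based SNP positions on a single chromosome (assumed).
    The last window may be shorter than `window`; its density uses its
    true length.
    """
    counts = Counter((pos - 1) // window for pos in positions)
    n_windows = -(-chrom_length // window)  # ceiling division
    densities = []
    for i in range(n_windows):
        length = min(window, chrom_length - i * window)
        densities.append(counts.get(i, 0) / (length / 1000))
    return densities


# Toy positions on a 25 kb chromosome, made up for illustration.
densities = snp_density_per_window([1, 5000, 10000, 10001, 25000], 25_000)
print(densities)
```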
SNP density with Ensembl
SNPs in ORegAnno regions
Of the 9,427 ORegAnno regions (excluding TFBS), 1,603
of them have no SNPs from dbSNP 126 (from UCSC).
SNPs per exon
Venter SNPs
Percentages are close in both Venter’s set and Ensembl’s.
93% none
5% 1 SNP
2% >1 SNP
The End
SNP locations nt counts
• Watson 2,060,544
• Venter method 1 3,325,530
SNP density
Min and Max from last slide, plus dbSNP 126
[Bar chart: SNP density min, SNP density max, dbSNP 126; y-axis 0.00 to 5.00; categories: total, intergenic, genes, coding exons, conserved tfbs, ORegAnno, Ultra conserved, PRP]
Percent of SNPs by location
[Stacked bar chart, 0% to 100%; legend: genes, intergenic. Bar labels (upper / lower segment):]
dbSNP 37 / 63
Watson 42 / 58
Watson Ensembl 41 / 59
Venter 39 / 61
Venter Ensembl 40 / 60
SNP coverage
This is computed on 10kb non-overlapping windows. A window with at least 1 SNP gets a 1; otherwise it gets a 0. The darker areas indicate sections where the value jumps between 0 and 1 more often. A white area with a blue line at the top has at least 1 SNP per 10kb consistently.
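The coverage track described above can be sketched as a 0/1 flag per window, with the number of 0/1 switches as a rough measure of how "dark" a stretch would look. This is a minimal illustration under the assumption of 1-based positions on one chromosome; `snp_coverage`, `count_jumps`, and the toy data are hypothetical.

```python
def snp_coverage(positions, chrom_length, window=10_000):
    """Return a 0/1 flag per non-overlapping window: 1 if it holds any SNP."""
    n_windows = -(-chrom_length // window)  # ceiling division
    flags = [0] * n_windows
    for pos in positions:  # 1-based positions (assumed)
        flags[(pos - 1) // window] = 1
    return flags


def count_jumps(flags):
    """Number of places where neighbouring windows switch between 0 and 1."""
    return sum(1 for a, b in zip(flags, flags[1:]) if a != b)


# Toy positions on a 50 kb chromosome, made up for illustration.
flags = snp_coverage([500, 9000, 31000, 45000], 50_000)
jumps = count_jumps(flags)
print(flags, jumps)
```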
UCSC Genome Browser