ppt format - people.vcu.edu - Virginia Commonwealth University

Lives of the Scientist
Genetic Basis of Differentiation
Events in time and space . . .
Genetic Basis of Differentiation
Nostoc
NH3
N2
NH3
Events in time and space . . .
. . . driven by patterned gene expression
Genetic Basis of Differentiation
How?
Environmental Signal
NH3
Histidine Kinase
Developmental Response
Genetic Basis of Differentiation
How?
Environmental Signal
Developmental Response
NH3
histidine
Histidine Kinase
Response Regulator
Genetic Basis of Differentiation
How?
Environmental Signal
Developmental Response
NH3
Histidine Kinase
Response Regulator
NpR3010
???
Genetic Basis of Differentiation
AATAAAGCTTTACAAACCAA
How?
ACTCTGGCTTCAATTGTGTAA
Environmental Signal
Developmental Response
CCCAAGCTTTGATTCTTTCCT
NH3
CTGTTAAATCGGATTGATTAT
CTTCATCAAGGGCAAGACCT
ACAAATTTACCATCACGAAC
Histidine Kinase
Response Regulator
AGCTTTAGACTCACTGAATT
NpR3010
???
CATAACCTTCTGTAGGCCAA
TAGCCAACTGTTTCACCACC
Genes Functionally Related to His Kinase
Histidine Kinase
Nostoc punctiforme
NpR3010
Anabaena PCC 7120
Trichodesmium
Synechocystis PCC 6803
Find similar genes
. . . (13 total)
Conserved
Blast
>npun_22dec03_Contig1_revised_geneNpR3010
MWHIQDSIITLSNHNQYLTFYKNQVKNPERFCRNVNQFDSQIDFVSCDIL
ELKDGRFFEQYSKPLRLAEEIIGTVWSFRDITESQQAKEENRRIIQQEKQ
LAEDRAYFTSMIFHEFRNPLNIISYSTSLLKRHSHHWSEEKKLQCLQNLQ
TAVEQINQFTDEVLIIESVEAGKLQYELKPIDLNLFCREVLAEMSLYTKG
ASQFLLFQNK*
MWHIQDSIITLSNHNQYLTFYKNQVKNPERFCRNVNQFDSQIDFVSCDIL
ELKDGRFFEQYSKPLRLAEEIIGTVWSFRDITESQQAKEENRRIIQQEKQ
LAEDRAYFTSMIFHEFRNPLNIISYSTSLLKRHSHHWSEEKKLQCLQNLQ
TAVEQINQFTDEVLIIESVEAGKLQYELKPIDLNLFCREVLAEMSLYTKG
ASQFLLFQNK
>npun_22dec03_Contig1_revised_geneNpR3008
LSPYLEACCLRISASVSYQRAAEDIEYLTGVEVSKSVQQRLVHRQNFELP
QVESTVEELSVDGGNIRIRTIKGQVCDWKGYKATCLHEKQAIAASFQENS
LVIDWVKSQSIAPILTCLGDGHDGIWNIVRDFAPEHQRREVLDWFHLMEN
LHKIGGSNQRLNQAKILLWQGKVDDAIAVFADCQLKQAFNFCTYLEKHRH
RIVNYQYYQAEQICSIGSGAIESTVKQIDRRTKISGAQWKSDNVPQVLAQ
RQSLSQWINLCSLNKNWDAPMKSSVERLSDYPVAR*
A new family of proteins?!
A type of transposase?
TRANSPOSON
transposase
...ATTTCTCTAGAAAGGCTGAAGGGGGGACAAGCACCCGAAAGCCTTTGTGCT...
...TAAAGAGATCTTTCCGACTTCCCCCCTGTTCGTGGGCTTTCGGAAACACGA...
...ATACAGTCAGCTTTATAGGCTTCATGTCGCCCCTTCAGCTAGAAAGGTACATA...
...TATGTCAGTCGAAATATCCGAAGTACAGCGGGGAAGTCGATCTTTCCATGTAT...
A new family of proteins?!
A type of transposase?
TRANSPOSON
transposase
Is NpR3008 a transposase?
AATAAAGCTTTACAAA
CCAAACTCTGGCTTCA
ATTGTGTAACCCAAGC
TTTGATTCTTTCCTCTG
TTAAATCGGATTGATT
ATCTTCATCAAGGGCA
AGACCTACAAATTTAC
Observation
* Photos courtesy of www.webshots.com and Peter Smallwood
Filters: Information reducers
Squirrel filter
Filters: Information reducers
Molecular filter
Filters: Information reducers
Sequence filter
TCTACTTATA
AAGAGTCTGT
TTCTGTCTGC
TGGATTTCGG
GAACCTTAGT
CTCCGTAAAC
TGAATAAACT
AAGAGTTTAA
AAACCTGTAT
TTATATATTT
CCCCAGCTGT
GACAGCACTG
GCTGAAATTC
CCCTGCACCA
ATGAATGACT
TATGAGGCAA
CTCGGGAGCG
CCTTTAGATG
AGGCCGGAGG
CCCCGGCCTA
TTCCCTGGGC
TTCAATCCAC
TGAATGAACA
TCTGACCTCT
AACTCTAGCC
GACTTCTGCT
CTCTAACATG
TTGTTAAAGG
AGTTAAAAAC
GGTTACATGA
TAAGAAATTA
CATTAAAAAG
ACCCTCAAGA
CGCTGAGAGC
GGTCTTTCCT
GAACGAACGA
TCACAGCATC
CACGGCTCTA
CAAGAAGGAG
GTCAAGAACT
AGGCTGCCTG
TCGGCGGGAC
AGGGCTACAC
CATACATGGT
GGCAGCTTTC
TGCCCCACTC
ATACCAAAGT
ATGTCAGCAA
TACAAATGAA
GAATTGCAGT
ACTGCCTAAA
ATTGCAATTA
AGGCAAATAC
AGGCACCGGC
AGAGTGGTAC
GTGGGCACTG
TTGAATGAAA
AGGTGACCTT
AAGAGGCCCA
GAAACAGCTC
CTCCACCGGC
TGCTATAAAT
AGATAACATG
CTAGTTCTTG
TTATCTGTTT
CACTAGTTTC
TTAGATAAAC
CTCCACGCCC
ATATTAAAAA
AATTAGCAAA
CATTCTAGGG
AAACAAGCTA
ATTTCCTGGG
AGCCAAGGAC
TGACAGACAG
ATTGAACCCT
AGTGCAGACA
AGAAATGAGA
AGTATCTATT
TATCCAGGCA
GAAATCCCTG
GGCAGCGGCC
ACGCGGCCCA
AATGTGCCCT
CTCCGTAAAC CTCTAAC...
How do Biologists use Bioinformation?
TCTACTTATA
AAGAGTCTGT
TTCTGTCTGC
TGGATTTCGG
GAACCTTAGT
CTCCGTAAAC
TGAATAAACT
AAGAGTTTAA
AAACCTGTAT
TTATATATTT
CCCCAGCTGT
GACAGCACTG
GCTGAAATTC
CCCTGCACCA
ATGAATGACT
TTCAATCCAC
TGAATGAACA
TCTGACCTCT
AACTCTAGCC
GACTTCTGCT
CTCTAACATG
TTGTTAAAGG
AGTTAAAAAC
GGTTACATGA
TAAGAAATTA
CATTAAAAAG
ACCCTCAAGA
CGCTGAGAGC
GGTCTTTCCT
GAACGAACGA
AGGGCTACAC
CATACATGGT
GGCAGCTTTC
TGCCCCACTC
ATACCAAAGT
ATGTCAGCAA
TACAAATGAA
GAATTGCAGT
ACTGCCTAAA
ATTGCAATTA
AGGCAAATAC
AGGCACCGGC
AGAGTGGTAC
GTGGGCACTG
TTGAATGAAA
What genes are in my organism?
Gene finder
Interpolated
Markov model
Candidate genes
Predicted genes
How do Biologists use Bioinformation?
TCTACTTATA
AAGAGTCTGT
TTCTGTCTGC
TGGATTTCGG
GAACCTTAGT
CTCCGTAAAC
TGAATAAACT
AAGAGTTTAA
AAACCTGTAT
TTATATATTT
CCCCAGCTGT
GACAGCACTG
GCTGAAATTC
CCCTGCACCA
ATGAATGACT
TTCAATCCAC
TGAATGAACA
TCTGACCTCT
AACTCTAGCC
GACTTCTGCT
CTCTAACATG
TTGTTAAAGG
AGTTAAAAAC
GGTTACATGA
TAAGAAATTA
CATTAAAAAG
ACCCTCAAGA
CGCTGAGAGC
GGTCTTTCCT
GAACGAACGA
AGGGCTACAC
CATACATGGT
GGCAGCTTTC
TGCCCCACTC
ATACCAAAGT
ATGTCAGCAA
TACAAATGAA
GAATTGCAGT
ACTGCCTAAA
ATTGCAATTA
AGGCAAATAC
AGGCACCGGC
AGAGTGGTAC
GTGGGCACTG
TTGAATGAAA
What genes are in my organism?
Gene finder
Interpolated
Markov model
Challenge
accepted
beliefs
Candidate genes
Conform to
standard model
Predicted genes
How do Biologists use Bioinformation?
TCTACTTATA
AAGAGTCTGT
TTCTGTCTGC
TGGATTTCGG
GAACCTTAGT
CTCCGTAAAC
TGAATAAACT
AAGAGTTTAA
AAACCTGTAT
TTATATATTT
CCCCAGCTGT
GACAGCACTG
GCTGAAATTC
CCCTGCACCA
ATGAATGACT
TTCAATCCAC
TGAATGAACA
TCTGACCTCT
AACTCTAGCC
GACTTCTGCT
CTCTAACATG
TTGTTAAAGG
AGTTAAAAAC
GGTTACATGA
TAAGAAATTA
CATTAAAAAG
ACCCTCAAGA
CGCTGAGAGC
GGTCTTTCCT
GAACGAACGA
AGGGCTACAC
CATACATGGT
GGCAGCTTTC
TGCCCCACTC
ATACCAAAGT
ATGTCAGCAA
TACAAATGAA
GAATTGCAGT
ACTGCCTAAA
ATTGCAATTA
AGGCAAATAC
AGGCACCGGC
AGAGTGGTAC
GTGGGCACTG
TTGAATGAAA
What genes are in my organism?
Gene finder
Interpolated
Markov model
Conform to
standard model
Candidate genes
Predicted genes
Filters are powerful
Highly filtered output
• Easy to grasp
• High-level insights
Filters Constrain New Discovery
Highly filtered output
• Easy to grasp
• High-level insights
Unfiltered output
• Confusing
• Basic insights
Filters are tempting
The Death of Science
Current State of Affairs
1. Need high-level filters
2. Need access to raw phenomena
AATAAAGCTTTACAAACCAAACTCTGGCTTCA
TTGTGTAACCCAAGCTTTGATTCTTTCCTCTGTT
AAATCGGATTGATTATCTTCATCAAGGGCAAG
CCTACAAATTTACCATCACGAACAGCTTTAGA
TCACTGAATTCATAACCTTCTGTAGGCCAATAG
CCAACTGTTTCACCACCATTTTCTGAAATTTTTT
CCTCTAGAATACCGCAACACTATCACCACCAA
ACTCCTTCTGAATTATTTCTGATTCAGTTTGGGT
ATTGCCTGTTTGAGTACCAAAAAATAAACCAA
Current State of Affairs
1. Need high-level filters
2. Need access to raw phenomena
3. Need ability to build new tools
ASSIGN K12-set FROM Gene-finder (K12-DNA)
ASSIGN O157-set FROM Gene-finder (O157-DNA)
CONSIDER EACH protein IN O157-set
WHEN Constituent-of (K12-set, protein) = FALSE
COLLECT protein
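The pseudocode above amounts to a set difference: collect each O157 protein that is not a constituent of the K12 set. A minimal Python sketch, under two assumptions that are not in the slides: the gene finder has already produced lists of protein identifiers (the names below are hypothetical), and membership is tested by exact identifier match, whereas a real Constituent-of test would presumably compare sequences by similarity.

```python
def proteins_unique_to(query_set, reference_set):
    """Return the items of query_set that are absent from reference_set, in order."""
    reference = set(reference_set)
    return [protein for protein in query_set if protein not in reference]


# Hypothetical identifiers for illustration only; not real gene-finder output.
k12_set = ["protA", "protB", "protC"]
o157_set = ["protA", "toxX", "protC", "toxY"]

unique_to_o157 = proteins_unique_to(o157_set, k12_set)
print(unique_to_o157)
```

Converting the reference list to a set keeps each membership test fast, which matters once the lists hold thousands of proteins.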
We need…
Biologists . . .
. . . and Programmers
Current State of Affairs
1. Need high-level filters
2. Need access to raw phenomena
3. Need ability to build new tools
Need biologist programmers
AATAAAGCTTTACAAACCAAA
CTCTGGCTTCAATTGTGTAACC
CAAGCTTTGATTCTTTCCTCTG
TTAAATCGGATTGATTATCTTC
ATCAAGGGCAAGACCTACAAA
TTTACCATCACGAACAGCTTTG
ARYGACTCACTGAATTCLARAT
AACCTTCTGTAGGCCASONATA
GCCAACTGTTTCACCACCATTT
TATTCAAAATGAATTATATCGGTAACTTTAGTACAGAAAATGACGTTAAGA
ATATCTGCAACTTTAAACCTGAATGATATTATTATTGGCGGGCCTCCATGCCAG
GGATTTAGTATTGCTGGGCCAGCCCAAAEALAVGIASTCCTAAAGATCCTAGAAATG
GTTTAGAATTTTCATCAACTTTGCACAATGGATAAAATTTCTTGAACCTAAAGCGTTTGTC
ATGGAAAACGTGAATTCAAAAGGATTGCTATCAAGGAAAAATGCAGAAGGTTTTAAAGTTATAG
ATATTATTAAGAAAACATTTGGAATTCGAGAACTTGGTTATTTTGTCGAAGTATGGGTTTTAAATGCTG
CGGAATATGGCATTCCGCAAATTAGAGAACGGAATTCGATTTTTATTGTTGGCAATAAAAAAGGTAAAGTACT
AGGTATTCCTAAAAAAACACATTCTCTGCAATTTTTAAGAATTCGATTTAAATAGGTCTCAATTATCGATCTTCGATGAT
ATGAGTATTATACCTGCACTAACTTTGTGGGACGCAATATCAGACTTACGAATTCGACAGAACTTAATGCGCGTGAAGGAAGTGAA
GAGCAACCCTATCATTTAAAACCTCAAAATACTTATCAGACTTGGGCTAGAAATGGTAGTGGAATTCGATACGCTTTACAATCATGTTGCAAT
GGAACATTCTGACCGTTTAGTAGAACGTTTCCGGCATATAAAATGGGGTGAATCCAGTTCGGATGTATCTAAAGAAGAATTCGACATGGAGCTAGACGACGT
AGTGGTAATGGTGAATTATCAAACAAATCATATGATCAGAATAATCGCCGTTTAAATCCTCATAAACCGGAATTCGAATTCTCACACTATTGCTGCGTCATTCTATGCTAATTTTG
TCCATCCTTTTCAACATCGAAATTTAACAGCCCGTGAAGGAGCTAGAATCCAATCTTTTCCAGATAACTATAGATTTTTTGGAAAAGAATTCGAATTCAAACTGTCGTATCTCATAAACTATTGCATCGA
GAAGAAAGATTTGATGAAAAATTTCTTTGTCAATATAATCAAATCGGTAATGCTGTACCCCCTCTTCTCGCTAAAGTAATTGCACATCATCTTCTAGAGAAATTAGGAATTCGAATTCAGTTATGCCAACAACTGATAGAAATCCTCTA
GTGCATGGATCAAATCTTGAACAAAAAGAGAATCATCGTACAAAATACAGAGATACTGAAAGCAGGACTTTCCTTAGAGAAATCAGAACTGAATATGACAAATGGCATAAAGCAAATATGAACCTGGAATTCGAATTCGAGTTGGACCAAAATCAGAAATTACTGACCA
AGATGATTCAATTATTACTCAAAGAGTGGAACTTCTCACTAAATATAAAGATTTTTTAGATCAGCAGCATTATGCAGAAAAATTTGATTCAAGATCCAACCTTCATTCTAGTGTTTTAGAGACCATTTATAAAGTAAATCTTTAGACGACTAGACGACGTAGCGAATTCGAATTCGAATTCATAATACGAGTCATAACGGCATATATG
GCAGCCTCACTCATTTCTGGGAGACGCTCATAATCCTTACTGAGACGACGGTACTGGTTTAACCAGCCAAATGTTCTTTCTACTACCCACCGTTTGGGCAAAACCTGAAATTCTTGATTAGTACGCCGGATTACCTCAACATGAGCTTGAATCATCAGCCAAACAGAGAGCGCAAATTTATCACCGTCATAGCCGGAATCAACCCAGATGACTTGAATTCGAATTCGAATTCGAACAACTTTTTCCAGTAATT CTGGAC
GCTCTTCTAACAGTTCCATCAAAGTATAGGCGGCAAGTAATCTTTCTCCAGCATTTGCTTCACTTACAACCACTTTTAACAAAAGTCCCAGACTATCAACCAAAGTTTGCCGCTTTCGTCCTTTTACCTTCTTGCCACCATCAAAACCGTACACATCCCCCTTTTTTCAGTCGTTTTTACCGACTGGCTGTCTGCCGCGATCGCCGTGGGTTGAGTTGACTTCCCCATTTTTTGACGAACTTGATCGCGCAAAGTATGATTCATTTCAGTTGAACTAGGAGGAAAATCCCCTGGAAGCATATCCCACTGAATTCGAATTCGAATTCGAATTCGAATTCGA
CAACCTGTTTTCAGATGGTAGTAGATAGCGTTGCATACTTCTCGCATATCAGTTGTTCGGGGATGCCCACCGCATTTAGCGGGTGGAATCAAAGGAGCTAAAATTGCCCATTCTGAGTCATTAAGG TCTGTAGAATAAGACTTTCGTCTCATTGTTTCCTATGTAAATACACTCTACAAACAGTATCTTATCGCTGCCTTTTTATCTTAGCTCTCCTTTAGATTTACTTTATAAATAGCCTCTTAGAAGAATTTCTTTATTATTTATTTAAAGATTTAGTACAAGATTTCGGGCAGAACGCTCTTATTGGTAAGTCACACACGTTCAAAGATATTTTCTTCGTACCACCAAAATATTCTGAAATGCTCAAGCGACCTTATGCGCGAATTGAGAGAAAAGATCATGATTTCGTAATTGGTGCAACTGTTCAAGCATCGCTTGAAGCAGCACCTCCTCCAGAACAAAACCATGCTTGAGGGATCTTCACGCGCAGCAGAGGATTTAA
Why hasn’t this happened?
Part of bioinformatic program written in C
    /* Read from stdin when no input file name is given. */
    if (pcInFile == NULL)
        pfInFile = stdin;
    else
        pfInFile = fopen(pcInFile, "r");
    pfOutFile = fopen(pcOutFile, "w");
    if (pfInFile == NULL) {
        fprintf(stderr, "ERROR opening %s\n", pcInFile);
        exit(1);
    }
    if (pfOutFile == NULL) {
        fprintf(stderr, "ERROR opening %s\n", pcOutFile);
        exit(1);
    }
    /* deal with first '>' in file */
    fputc(fgetc(pfInFile), pfOutFile);
    /* Alternate identifier and sequence records until either one fails. */
    for ( ; ; ) {
        if (!processIdentifier(pfInFile, pfOutFile))
            break;
        if (!processSequence(pfInFile, pfOutFile))
            break;
    }
    fclose(pfInFile);
    fclose(pfOutFile);
}
Why hasn’t this happened?
Part of bioinformatic program written in Perl
sub match_positions {
    my $pattern;
    local $_;
    ($pattern, $_) = @_;
    my @results;
    local $matchStart;
    my $instrumentedPattern = qr/(?{ $matchStart = pos() })$pattern/;
    while (/$instrumentedPattern/g) {
        my $nextStart = pos();
        push @results, "[$matchStart..$nextStart)";
        pos() = $matchStart+1;
    }
    return @results;
}
Why hasn’t this happened?
Biologists will not come to programming
Programming must come to biologists
BioLingua
Genetic Basis of Differentiation
Environmental Signal
Developmental Response
NH3
P
Histidine Kinase
Response Regulator
NpR3010
???
Genetic Basis of Differentiation
NpR3010
RR
HK-upstream HK HK-downstream
Genetic Basis of Differentiation
NpR3010
RR
HK-upstream
HK HK-downstream
BioLingua
<1>> (GENES-DESCRIBED-BY "response regulator" IN Npun)
:: (#$Npun.NpF0304 #$Npun.NpR0355 #$Npun.NpR0450 #$Npun.NpF0484
#$Npun.NpR0589 #$Npun.NpF0832 #$Npun.NpF0906 #$Npun.NpR0956
#$Npun.NpF1084 #$Npun.NpF1085 #$Npun.NpR1109 #$Npun.NpF1184
#$Npun.NpF1278 #$Npun.NpR1450 #$Npun.NpF1453 #$Npun.NpF1516
#$Npun.NpR1633 #$Npun.NpR1678 #$Npun.NpR1683 #$Npun.NpR1688
#$Npun.NpF1776 #$Npun.NpR1779 #$Npun.NpF1800 #$Npun.NpR1903
#$Npun.NpR2091 #$Npun.NpF2162 #$Npun.NpR2263 #$Npun.NpF2346
#$Npun.NpF2364 #$Npun.NpR2420 #$Npun.NpR2902 #$Npun.NpF2972
#$Npun.NpR3053 #$Npun.NpF3084 #$Npun.NpR3197 #$Npun.NpR3241
#$Npun.NpF3659 #$Npun.NpF3676 #$Npun.NpR3733 #$Npun.NpF3829
#$Npun.NpR3907 #$Npun.NpR3959 #$Npun.NpF3972 #$Npun.NpR4101
#$Npun.NpR4160 #$Npun.NpR4165 #$Npun.NpF4214 #$Npun.NpR4435
#$Npun.NpF4460 #$Npun.NpR4503 #$Npun.NpR4743 #$Npun.NpR4768
#$Npun.NpF4909 #$Npun.NpR5015 #$Npun.NpF5034 #$Npun.NpF5044
#$Npun.NpR5135 #$Npun.NpR5136 #$Npun.NpR5316 #$Npun.NpF5361
#$Npun.NpF5636 #$Npun.NpF5682 #$Npun.NpF5759 #$Npun.NpF5763
#$Npun.NpF5788 #$Npun.NpR6014 #$Npun.NpR6015 #$Npun.NpR6228
#$Npun.NpF6321 #$Npun.NpR6360 #$Npun.NpF6363 #$Npun.pNpAF075
#$Npun.pNpBR039 #$Npun.pNpBF139 #$Npun.pNpBF146 #$Npun.pNpBR169
#$Npun.pNpBR170 #$Npun.pNpBF205 #$Npun.pNpEF003)
<2>> (GENE-UPSTREAM-OF NpF0304)
BioLingua
<2>> (GENE-UPSTREAM-OF NpF0304)
:: #$Npun.NpF0303
<3>> (GENES-UPSTREAM-OF (RESULT 1))
:: (#$Npun.NpF0303 #$Npun.NpF0356 #$Npun.NpF0451 #$Npun.NpF0483
#$Npun.NpR0590 #$Npun.NpF0831 #$Npun.NpF0905 #$Npun.NpF0957
#$Npun.NpR1083 #$Npun.NpF1084 #$Npun.NpR1110 #$Npun.NpF1183
#$Npun.NpF1277 #$Npun.NpR1451 #$Npun.NpR1452 #$Npun.NpR1515
#$Npun.NpF1634 #$Npun.NpR1679 #$Npun.NpF1684 #$Npun.NpR1689
#$Npun.NpF1775 #$Npun.NpF1780 #$Npun.NpF1799 #$Npun.NpR1904
#$Npun.NpR2092 #$Npun.NpF2161 #$Npun.NpR2264 #$Npun.NpR2345
#$Npun.NpF2363 #$Npun.NpR2421 #$Npun.NpR2903 #$Npun.NpR2971
#$Npun.NpR3054 #$Npun.NpR3083 #$Npun.NpR3198 #$Npun.NpF3242
#$Npun.NpR3658 #$Npun.NpF3675 #$Npun.NpR3734 #$Npun.NpR3828
#$Npun.NpF3908 #$Npun.NpR3960 #$Npun.NpF3971 #$Npun.NpF4102
#$Npun.NpR4161 #$Npun.NpF4166 #$Npun.NpR4213 #$Npun.NpR4436
#$Npun.NpF4459 #$Npun.NpR4504 #$Npun.NpR4744 #$Npun.NpR4769
#$Npun.NpR4908 #$Npun.NpF5016 #$Npun.NpF5033 #$Npun.NpF5043
#$Npun.NpR5136 #$Npun.NpF5137 #$Npun.NpF5317 #$Npun.NpF5360
#$Npun.NpR5635 #$Npun.NpF5681 #$Npun.NpF5758 #$Npun.NpR5762
#$Npun.NpR5787 #$Npun.NpR6015 #$Npun.NpR6016 #$Npun.NpR6229
#$Npun.NpR6320 #$Npun.NpF6361 #$Npun.NpF6362 #$Npun.pNpAF074
#$Npun.pNpBR040 #$Npun.pNpBF138 #$Npun.pNpBF145 #$Npun.pNpBR170
#$Npun.pNpBR171 #$Npun.pNpBR204 #$Npun.pNpER002)
<4>> (DESCRIPTIONS-OF *)
BioLingua
<4>> (DESCRIPTIONS-OF *)
:: ("two-component sensor histidine kinase [Nostoc sp. PCC 7120] gi|25531611|p
"unknown protein [Nostoc sp. PCC 7120] gi|25534386|pir||AH1981 hypothetical p
"tmRNA-binding protein [Nostoc sp. PCC 7120] gi|22096164|sp|Q8YM70|SSRP_ANASP
"GTP-binding protein era homolog"
"unknown protein [Nostoc sp. PCC 7120] gi|25533156|pir||AF2229 hypothetical p
"ORF_ID:tlr0160~similar to ferredoxin [Thermosynechococcus elongatus BP-1]
"hypothetical protein [Nostoc sp. PCC 7120] gi|25367067|pir||AH2295 hypotheti
"two-component hybrid sensor and regulator [Nostoc sp. PCC 7120] gi|25532444|
"hypothetical protein [Nostoc sp. PCC 7120] gi|25358966|pir||AG2158 hypotheti
"two-component response regulator [Nostoc sp. PCC 7120] gi|25533086|pir||AF21
"probable two-component sensor histidine kinase [Gloeobacter violaceus] gi|35
"phytochrome-like protein [Tolypothrix sp. PCC 7601]"
"two-component sensor histidine kinase [Nostoc sp. PCC 7120] gi|25530471|pir|
NIL NIL NIL
"hypothetical protein [Nostoc sp. PCC 7120] gi|25535333|pir||AI2179 hypotheti
NIL
"unknown protein [Nostoc sp. PCC 7120] gi|25535440|pir||AI2275 hypothetical p
"transcriptional regulator [Nostoc sp. PCC 7120] gi|25302898|pir||AB2544 tran
"similar to two-component sensor histidine kinase [Nostoc sp. PCC 7120] gi|25
"putative gluconolactonase precursor [Sinorhizobium meliloti] gi|25369832|pir
"similar to two-component sensor histidine kinase [Nostoc sp. PCC 7120] gi|25
"hypothetical protein [Nostoc sp. PCC 7120] gi|25530521|pir||AC1903 hypotheti
. . .
BioLingua
<5>> (DEFINE RR-class AS
(GENES-DESCRIBED-BY "response regulator" IN Npun)
DISPLAY off)
:: "List of length 79 suppressed"
<6>> (DEFINE HK-class AS
(GENES-DESCRIBED-BY "histidine kinase" IN Npun)
DISPLAY off)
:: "List of length 89 suppressed"
<7>> (DEFINE HK-upstream AS
(GENES-UPSTREAM-OF HK-class) DISPLAY off)
:: "List of length 89 suppressed"
<8>> (DEFINE HK-downstream AS
(GENES-DOWNSTREAM-OF HK-class) DISPLAY off)
:: "List of length 89 suppressed"
<9>> (DEFINE HK-adjacent AS
(UNION-OF (HK-upstream HK-downstream)) DISPLAY off)
:: "List of length 178 suppressed"
<10>>(INTERSECTION-OF (HK-adjacent RR-class))
BioLingua
<10>> (INTERSECTION-OF (HK-adjacent RR-class))
::
22 elements in INTERSECTION
> (#$Npun.pNpBF205 #$Npun.pNpBF139 #$Npun.NpR6228 #$Npun.NpR5316
#$Npun.NpF4214 #$Npun.NpF3676 #$Npun.NpF3084 #$Npun.NpR3053
#$Npun.NpR1779 #$Npun.NpR0589 #$Npun.NpF0304 #$Npun.NpR1109
#$Npun.NpF1278 #$Npun.NpF1776 #$Npun.NpF1800 #$Npun.NpR2420
#$Npun.NpR2902 #$Npun.NpR3197 #$Npun.NpR4503 #$Npun.NpF5763
#$Npun.NpF6363 #$Npun.pNpBF146)
<11>>(DEFINE RR-candidates AS (SET-DIFFERENCE RR-class (RESULT 10))
DISPLAY off)
:: "List of length 57 suppressed"
<12>>
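Steps 5 through 11 combine a neighbor lookup with ordinary set operations. The same logic can be sketched in Python under loud assumptions: the toy genome, gene names and descriptions below are hypothetical, and "upstream" is taken to mean the preceding gene on the forward strand and the following gene on the reverse strand, which is a guess at how GENES-UPSTREAM-OF treats orientation.

```python
# Toy genome in chromosomal order: (name, strand, description).
# All names and annotations are hypothetical.
GENOME = [
    ("g1", "+", "histidine kinase"),
    ("g2", "+", "response regulator"),
    ("g3", "-", "hypothetical protein"),
    ("g4", "-", "response regulator"),
    ("g5", "+", "histidine kinase"),
    ("g6", "+", "transporter"),
    ("g7", "+", "response regulator"),
]
INDEX = {name: i for i, (name, _, _) in enumerate(GENOME)}


def genes_described_by(text):
    """Names of all genes whose description contains the given text."""
    return {name for name, _, desc in GENOME if text in desc}


def neighbor(name, direction):
    """Adjacent gene; direction is -1 for upstream, +1 for downstream,
    both taken relative to the strand of the gene itself."""
    strand = GENOME[INDEX[name]][1]
    step = direction if strand == "+" else -direction
    i = INDEX[name] + step
    return GENOME[i][0] if 0 <= i < len(GENOME) else None


def neighbors_of(names, direction):
    """Set of neighbors of every gene in names, skipping chromosome ends."""
    found = {neighbor(n, direction) for n in names}
    found.discard(None)
    return found


rr_class = genes_described_by("response regulator")
hk_class = genes_described_by("histidine kinase")
hk_adjacent = neighbors_of(hk_class, -1) | neighbors_of(hk_class, +1)  # UNION-OF
rr_paired = hk_adjacent & rr_class                                     # INTERSECTION-OF
rr_candidates = rr_class - rr_paired                                   # SET-DIFFERENCE
print(sorted(rr_paired), sorted(rr_candidates))
```

Unlike the BioLingua lists, which keep one entry per gene, Python sets drop duplicates, so the sizes of intermediate results need not match list lengths.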
Genes Functionally Related to His Kinase
Histidine Kinase
Nostoc punctiforme
NpR3010
Anabaena PCC 7120
Trichodesmium
Synechocystis PCC 6803
Find similar genes
. . . (13 total)
Conserved
BioLingua
<10>> (INTERSECTION-OF (RR-adjacent HK-class))
::
22 elements in INTERSECTION
> (#$Npun.pNpBF205 #$Npun.pNpBF139 #$Npun.NpR6228 #$Npun.NpR5316
#$Npun.NpF4214 #$Npun.NpF3676 #$Npun.NpF3084 #$Npun.NpR3053
#$Npun.NpR1779 #$Npun.NpR0589 #$Npun.NpF0304 #$Npun.NpR1109
#$Npun.NpF1278 #$Npun.NpF1776 #$Npun.NpF1800 #$Npun.NpR2420
#$Npun.NpR2902 #$Npun.NpR3197 #$Npun.NpR4503 #$Npun.NpF5763
#$Npun.NpF6363 #$Npun.pNpBF146)
<11>>(DEFINE RR-candidates AS (SET-DIFFERENCE RR-class (RESULT 10))
DISPLAY off)
:: "List of length 57 suppressed"
<12>>(CONTEXT-OF NpF0304)
::
(<- #$Npun.NpR0302 potassium-dependent ATPase sub) 523
(-> #$Npun.NpF0303 two-component sensor histidine) 85
(-> #$Npun.NpF0304 two-component response regulat) 473
(-> #$Npun.NpF0305 hypothetical protein glr0895 [) 85
(<- #$Npun.NpR0306 primosomal protein N' [Nostoc )
> (#$Npun.NpR0302 #$Npun.NpF0303 #$Npun.NpF0304 #$Npun.NpF0305
#$Npun.NpR0306)
<13>>(ALL-ORTHOLOGS-OF *)
BioLingua
<12>>(CONTEXT-OF NpF0304)
::
(<- #$Npun.NpR0302 potassium-dependent ATPase sub) 523
(-> #$Npun.NpF0303 two-component sensor histidine) 85
(-> #$Npun.NpF0304 two-component response regulat) 473
(-> #$Npun.NpF0305 hypothetical protein glr0895 [) 85
(<- #$Npun.NpR0306 primosomal protein N' [Nostoc )
> (#$Npun.NpR0302 #$Npun.NpF0303 #$Npun.NpF0304 #$Npun.NpF0305
#$Npun.NpR0306)
<13>> (ALL-ORTHOLOGS-OF *)
:: ((#$S7942.sef0159 #$Npun.NpR0302 #$Gvi.glr0573 #$A29413.Av?3368
#$A7120.all3154)
(#$S6803.sll1590 #$Npun.NpF0303 #$Gvi.gll0572 #$A29413.Av?1247
#$A7120.alr3155)
(#$S6803.sll1592 #$P9313.PMT1405 #$Npun.NpF0304 #$Gvi.gll0571
#$A29413.Av?1248 #$A7120.alr3156)
(#$Tery.Te?7017 #$Npun.NpF0305 #$Cwat.Cw?3050)
(#$Tery.Te?2243 #$TeBP1.tll0415 #$S6803.sll0270 #$S8102.SynW1782
#$S7942.sef1895 #$PRO1375.Pro0497 #$P9313.PMT1271 #$PMED4.PMM0497
#$Npun.NpR0306 #$Gvi.gll0025 #$Cwat.Cw?3016 #$A29413.Av?5206
#$A7120.all4248))
<14>>
A new family of proteins?!
A type of transposase?
TRANSPOSON
transposase
Is NpR3008 a transposase?
BioLingua
<14>>(DEFINE extended-NpR3008 AS
(SEQUENCE-OF NpR3008 FROM -700 TO-END +700)
DISPLAY off)
:: "Results suppressed"
<15>> (BLAST extended-NpR3008 Npun)
::
     Query    Q-Start  Q-End  Subject            S-Start  S-End    E-value
 1.  "Seq 1"  1        2258   #$Npun.chromosome  3706846  3704589  0.0
 2.  "Seq 1"  293      1511   #$Npun.chromosome  4008429  4009647  0.0
 3.  "Seq 1"  293      1512   #$Npun.chromosome  7932036  7930817  0.0
 4.  "Seq 1"  293      1510   #$Npun.chromosome  4228111  4229328  0.0
 5.  "Seq 1"  293      1510   #$Npun.chromosome  3971285  3972502  0.0
 6.  "Seq 1"  293      1510   #$Npun.chromosome  4027833  4029050  0.0
 7.  "Seq 1"  293      1511   #$Npun.chromosome  2121987  2123204  0.0
 8.  "Seq 1"  293      1510   #$Npun.chromosome  2136737  2135521  0.0
 9.  "Seq 1"  397      1510   #$Npun.chromosome  2030748  2031861  0.0
10.  "Seq 1"  1537     2258   #$Npun.pNpB        42015    42737    4.6d-83
11.  "Seq 1"  1331     1420   #$Npun.chromosome  8036134  8036045  1.8d-8
12.  "Seq 1"  1319     1385   #$Npun.chromosome  5915424  5915358  2.7d-4
13.  "Seq 1"  1319     1385   #$Npun.chromosome  2577387  2577453  2.7d-4
> (#$Temp27 #$Temp28 #$Temp29 #$Temp30 #$Temp31 #$Temp32 #$Temp33
#$Temp34 #$Temp35 #$Temp36 #$Temp37 #$Temp38 #$Temp39)
<16>>
BioLingua
<14>>(DEFINE extended-NpR3008 AS
(SEQUENCE-OF NpR3008 FROM -700 TO-END +700)
DISPLAY off)
:: "Results suppressed"
<15>> (BLAST extended-NpR3008 Npun)
::
     Query    Q-Start  Q-End  Subject            S-Start  S-End    E-value
 1.  "Seq 1"  1        2258   #$Npun.chromosome  3706846  3704589  0.0
 2.  "Seq 1"  293      1511   #$Npun.chromosome  4008429  4009647  0.0
 . . .
<16>> (FOR-EACH hit IN *
         AS (subj S-start)
            = (GET-ELEMENTS (subject Subject-start) FROM hit)
         AS start = (- S-start 15)
         AS end = (+ S-start 40)
         AS left-end = (SEQUENCE-OF subj FROM start TO end)
         COLLECT left-end)
::
> ("TACGCTCTATCTTCAGCAAGTTGTTTTTCTTGCTGTATAATTCGGCGATTCTCTTC"
"AAAGAAACGCTAGAGGGGTGCATCCCAGTTTTTATTATTCCAAAACAAATAAATAA"
"AAACTGGGATGCACCCCTTATTAATGCTCTTTGGAGTCAATACTAATTTTGCCAAA"
"TACCTTTGTGATAGGGGGTGCATCCCAGTTTTTATTATTCCAAAACAAATAAATAA"
"AAATTAGTTTATTATGGGTGCATCCCAGTTTTTATTATTCCAAAACAAATAAATAA"
"CACCGATTCACTAATGGGTGCATCCCAGTTTTTATTATTCCAAAACAAATAAATAA"
"ACTATTGTAGAGACTGGGTGCATCCCAGTTTTTATTATTCCAAAACAAATAAATAA"
. . .
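The FOR-EACH loop in step 16 takes, for every BLAST hit, the stretch from 15 bases before the subject start to 40 bases after it, which gives the 56-base strings shown. A minimal Python sketch of that window extraction follows. The replicon string and hit coordinates are hypothetical, coordinates are assumed to be 1-based and inclusive, and hits on the reverse strand (where S-Start exceeds S-End) are not treated specially here.

```python
def window_around(sequence, position, before=15, after=40):
    """Return the bases from position-before to position+after, inclusive.

    Coordinates are 1-based, as in the BLAST hit table. The window is
    clipped at the ends of the sequence rather than wrapping around.
    """
    start = max(position - before, 1)
    end = min(position + after, len(sequence))
    return sequence[start - 1:end]


# Hypothetical replicon and hit coordinates, for illustration only.
replicon = "ACGT" * 50  # 200 bases
hits = [{"subject_start": 60}, {"subject_start": 120}]

left_ends = [window_around(replicon, hit["subject_start"]) for hit in hits]
for left_end in left_ends:
    print(len(left_end), left_end)
```

The collected windows can then be handed to an alignment tool to look for a shared motif at the element boundary.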
BioLingua
<17>>(ALIGNMENT-OF * LINE-LENGTH 60 SEGMENT-LENGTH 60)
::
Seq 4      1 TACCTTTGT-GATAGGGGGTGCATCCCAGTTTTTATTAT--TCCAAAACAAATAAATAA--
Seq 7      1 -ACTATTGTAGAGACTGGGTGCATCCCAGTTTTTATTAT--TCCAAAACAAATAAATAA--
Seq 2      1 -AAAGAAACGCTAGAGGGGTGCATCCCAGTTTTTATTAT--TCCAAAACAAATAAATAA--
Seq 5      1 AAATTAGTTTATTA-TGGGTGCATCCCAGTTTTTATTAT--TCCAAAACAAATAAATAA--
Seq 6      1 -CACCGATTCACTAATGGGTGCATCCCAGTTTTTATTAT--TCCAAAACAAATAAATAA--
Seq 8      1 ----------AAACTGGGATGCA-CCCAGTCTCTACAATAGTTCTAGA-GAACACATAACGT
Seq 3      1 ----------AAACTGGGATGCACCCC--TTATTAATGCTCTTTGGAGTCAATAC-TAATTT
Seq 9      1 -----------CATTGTCGCCCCTTGAAGTCATCAAGAC-----TAGGTGTATCAATGACTC
Seq 12     1 ------------------GTTCAGCTTGGTAATAGCTGTAGTTAATAATGCGAGAGCGATGT
Seq 1      1 ---------TACGCTCTATCTTCAGCAAGTTGTTTTTCT--TGCTGTATAATTCGGCGATTC
Seq 10     1 --------------GGTCGGGAAATTGCGAGATTATTCAGTGGCGAAGTAGTGGGAGAACTA
Seq 11     1 ------------TTGAACAAATTTGTTCGTGGAAATGGTAATTGGAAATTTGCTGCGGAATG
Seq 13     1 ------------ATTATTAACTACAGCTATTACCAAGCTGAACAACTGTGTTCTATTGGTTC
consensus  1
Genetic Basis of Differentiation
Nostoc
NH3
+ Anabaena
N2
NH3
Not Synechocystis, Trichodesmium,…
BioLingua
<18>>(DEFINE diff-cb AS (Npun Avar A7120) DISPLAY off)
:: "List of length 3 suppressed"
<19>>(DEFINE non-diff-cb AS
(REMOVE-FROM-SET *loaded-organisms* diff-cb) DISPLAY off)
:: "List of length 10 suppressed"
<20>>(DEFINE diff-cb-specific AS
(COMMON-ORTHOLOGS-OF diff-cb NOT-IN non-diff-cb) DISPLAY off)
:: "List of length 661 suppressed"
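Step 20 asks for ortholog groups shared by every differentiating cyanobacterium and absent from all the others. One way to picture that filter in Python, assuming a hypothetical table that maps each ortholog group to the organisms carrying a member (the group names and their contents are invented for illustration):

```python
def common_orthologs(groups, required, excluded):
    """Ortholog groups found in every required organism and in no excluded one."""
    required, excluded = set(required), set(excluded)
    return sorted(
        group
        for group, organisms in groups.items()
        if required <= organisms and not (organisms & excluded)
    )


# Hypothetical ortholog groups: group name -> organisms that carry a member.
groups = {
    "grpA": {"Npun", "Avar", "A7120"},
    "grpB": {"Npun", "Avar", "A7120", "S6803"},
    "grpC": {"Npun", "A7120"},
    "grpD": {"Npun", "Avar", "A7120", "Tery"},
}
diff_cb = ["Npun", "Avar", "A7120"]
loaded_organisms = ["Npun", "Avar", "A7120", "S6803", "Tery"]
non_diff_cb = [org for org in loaded_organisms if org not in diff_cb]

diff_cb_specific = common_orthologs(groups, diff_cb, non_diff_cb)
print(diff_cb_specific)
```

Only the group present in all three differentiating organisms and in none of the rest survives the filter.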
BioLingua
• Provides knowledge in accessible form
• Provides tools accessed in common way
• Provides results that can be manipulated
• Provides a programming language that speaks
to biologists
The Death of Science
Credits
West Coast
VCU
- Jeff Shrager
- JP Massar
- Mike Travers
- Austin Hess
- James Mastros
- Sarah Cousins
- Yue Zhao
BioLingua: http://ramsites.net/~biolingua/help
Jeff Elhai:
Center for the Study of Biological Complexity
Virginia Commonwealth University
Phone: 828-0794
E-mail: ElhaiJ@VCU.Edu