Protein sequence - Purdue Genomics Wiki

Seq5part2(50244-10000)
Yichun Qian
The genes predicted by FGENESH are better, but the genes predicted by Augustus are also included
for comparison.
Gene Prediction
FGENESH
FGENESH predicted 7 genes. The predicted genes were translated into peptides, and these
peptides were used as BLASTP queries against the Swiss-Prot database. Three of them had
significant hits.
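The translation step can be sketched in a few lines of Python. This is only an illustration, not the tool used for this page: it assumes the standard genetic code and assumes that, in the sequences below, uppercase bases are exons and lowercase bases are introns. The helper names `spliced_cds` and `translate` are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def spliced_cds(annotated: str) -> str:
    """Keep uppercase bases only (assumed exons); drop lowercase bases
    (assumed introns), line breaks, and any other characters."""
    return "".join(ch for ch in annotated if ch in "ACGT")


def translate(cds: str) -> str:
    """Translate a coding sequence codon by codon, stopping at the first
    stop codon. Unrecognised codons are reported as 'X'."""
    peptide = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE.get(cds[i:i + 3], "X")
        if aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)


# First 18 bases of FGENESH Segment 1, as listed on this page.
print(translate(spliced_cds("ATGGCGACCGACAACTCG")))  # MATDNS

# Toy sequence (not from this page) with a lowercase intron between exons.
print(translate(spliced_cds("ATGGCGgtaagtcagACCGACTAG")))  # MATD
```

The first result agrees with the start of the Segment 1 protein sequence (MATDNS...). The resulting peptides would then be submitted to BLASTP separately.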
Segment 1
62665 - 64698 Forward Strand
2 exons:
1 CDSf 62665 - 63676 1011bp
2 CDSl 63853 - 64698 846bp
Retrotrans_gag[pfam03732], Retrotransposon gag protein; Gag or Capsid-like proteins from LTR
retrotransposons.
>ATGGCGACCGACAACTCGCCCGCCGGCGGCGGAATCGACGACGTCTTCCCCGCGCGGTGGAAGAACAAC
ATTCGAGCTTGCCTCGTCCCCTCCCCCGCCGACGGAGGAGGAGGCGGGGCAACCCAAGGCCAAGCAGGA
GGCGGCACCTCGTCGGCTGTCGAGCGAGTCGACGGTGCCAGCGCCCCAATGGGGGGCACGTCGGGCATC
GACCTCGCGTCTGAGACGAAGACGAGCGCCGTCTCCCCGCAACACGTCAACCCCAAGCAAACGGACGACG
CCAACACGCTCGCAAGGGACTTGCTGGGCGTCACCCTCGTACCTGAGACGGCGGTGCAGTCTACCCCTGAC
GTGACTTCGTCACCGCCCGTCGACCAAGAGGTACCGACCGATTCCCATCTCGCGCCTTTTGGATTCAGCCTC
AACCCCCCAAGCGACTTCGCTTTGGTGGACGCTCTCATAGAGGCGAGTCCAAACCCTCTGGGGTATCGTATG
CGGTCACCATGGGACCGGCTGACGGCCGTCTCAACCTACGGGCCCTTAGGGTCCGAGGAAGATGACGAGC
CCGACTTTAGTTGGGATTTCTCTGGACTTGGTAACCCCAGTGCCATGCGGGACTTTATGACCGCGTGCGACT
ACTGCCTTTCCGACTGTTCCGACGGTAGCCGCAGCCTCGGCGACAAGGACTGCGGCCCAAGTCGTGAATGT
TTTCACGTCGATCTAGGGGGTCCCGACGAAGGCAACCATCTTGGTATGCCAGAGAATGGTGACCTTCCTAG
GCCTGTGCCTCACGTTGACATCCTTCGGGAGCTAGCTGTGGTCCCCGTTCCGGCAGGGGGTCATGACCCAC
AACTCGAGCAAATCCGCGAGATGCAGGCCAGGCTCGACGAGGGAGCAGGAACACTTGAGCCGTTCCGCC
GGGACAATAGGCAGGAATGGGCGGGCCAACCTCTGGCCGGAGAAGTGCGTCATCTACCCCAGGGCATCCA
GCACCGCGTCGCCGACGATGTCAGGgtaaggccgccaccggtttccagtggggtcggccagaacctggctgcagcggcaatact
tctccgcgcgatgccggagccatcaaccaccgaggggcggcgtatccagggagagctcaagaacctcctggaggacgccgcggtctgacg
ggccgaaagctccgcctcccgaaggcagGGGTACCCCTCGGAACATCGCGCCGCGACTTCCCGATTCATGCGGGAAG
CCTCGGTCCACACCGGCCGCATGCGTAACATAGCGCATGCGGCCCCGGGTCGCCTCGGCAACGAGCACCAT
CACCATAACTGTTGGGCCCACCTCGACGAGAGGGTGCGCCGAGGCTACCACCCCAGGCGTGGGGGACGCT
ACGACAGCGGGGAGGATCGGAGTCCCTCGCCCAAACCACCTGGTCCGCAGGCTTTCAACCGCGCCATACG
ACGGGCGCCGTTCCCGACCCGGTTCCGAACCCCGACTACTATCACAAAGTACTCGGGGGAGACGAGACCG
GAACTGTGGCTCGCAGACTACCGGCTGGCCTGCCAGCTGGGTGGAACGGACGATGACAACCTCATCATCTG
CAACCTCCCCCTGTTCCTTTCCGACACCGCTCGCGCCTGGCTGGAGCACCTGCCTCCGGGGCAGATCTCCAA
CTGGGACGACCTGGTCCAAGCCTTCGCCGGTAATTTCCAGGGCACGTACGTGCGCCCTGGAAACTCCTGGG
ATCTCCGAAGCTGCCGCCAGCAGCCGGGGGGGTCTCTCCGGGACTACATCCGGCGATTCTCGAAGCAGCG
CACCGAGCTGCCCAACATCGCCGATTCGGATGTCATCGGCGCGTTCCTCGCCGGCACCACCTGCCGTGACCT
GGTGAGCAAGCTGGGTCGCAAGACCCCCACCAGGGCGAGCGAGCTGATGGACATCGCCACCAAGTTCGCC
TCTGGCCAGGAGGCGGTTGAGGCCATCTTCCGGAAGGACAAGCAGCCCCAGGGCCGCCCACCGGAAGAT
GTCCCCGAGGCGTCAACTTAG
Protein sequence:
MATDNSPAGGGIDDVFPARWKNNIRACLVPSPADGGGGGATQGQAGGGTSSAVERVDGASAPMGGTSGID
LASETKTSAVSPQHVNPKQTDDANTLARDLLGVTLVPETAVQSTPDVTSSPPVDQEVPTDSHLAPFGFSLNPPS
DFALVDALIEASPNPLGYRMRSPWDRLTAVSTYGPLGSEEDDEPDFSWDFSGLGNPSAMRDFMTACDYCLSDC
SDGSRSLGDKDCGPSRECFHVDLGGPDEGNHLGMPENGDLPRPVPHVDILRELAVVPVPAGGHDPQLEQIRE
MQARLDEGAGTLEPFRRDNRQEWAGQPLAGEVRHLPQGIQHRVADDVRGYPSEHRAATSRFMREASVHTG
RMRNIAHAAPGRLGNEHHHHNCWAHLDERVRRGYHPRRGGRYDSGEDRSPSPKPPGPQAFNRAIRRAPFPT
RFRTPTTITKYSGETRPELWLADYRLACQLGGTDDDNLIICNLPLFLSDTARAWLEHLPPGQISNWDDLVQAFAG
NFQGTYVRPGNSWDLRSCRQQPGGSLRDYIRRFSKQRTELPNIADSDVIGAFLAGTTCRDLVSKLGRKTPTRAS
ELMDIATKFASGQEAVEAIFRKDKQPQGRPPEDVPEAST
Segment 2
66287 - 69085 Forward Strand
3 exons:
1 CDSf 66287 - 67405 1119bp
2 CDSi 67439 - 67615 177bp
3 CDSl 68270 - 69085 816bp
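The exon lengths listed above for Segment 2 are consistent with inclusive coordinates, that is, length = end - start + 1. A minimal sketch of that check follows; the function name `exon_length` is hypothetical, and the values are copied from the three exon lines above.

```python
def exon_length(start: int, end: int) -> int:
    """Length in bp of an exon given inclusive 1-based start and end."""
    return end - start + 1


# (start, end, reported length in bp) for FGENESH Segment 2.
segment2_exons = [
    (66287, 67405, 1119),
    (67439, 67615, 177),
    (68270, 69085, 816),
]

for start, end, reported in segment2_exons:
    computed = exon_length(start, end)
    status = "ok" if computed == reported else "differs from reported"
    print(f"{start} - {end}: computed {computed}bp, reported {reported}bp ({status})")
```

The same check can be applied to the other segments to spot coordinates or lengths that may have been mistyped.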
RNase_HI_archaeal_like[cd09279], RNAse HI family that includes Archaeal RNase HI;
RT_LTR[cd01647], RT_LTR: Reverse transcriptases (RTs) from retrotransposons and retroviruses
which have long terminal repeats (LTRs) in their DNA copies but not in their RNA template.
rve[pfam00665], Integrase core domain
RVT_3[pfam13456], Reverse transcriptase-like; This domain is found in plants and appears to be
part of a retrotransposon.
RNase_HI_RT_Ty3[cd09274], Ty3/Gypsy family of RNase HI in long-term repeat retroelements;
RNase_H[cd06222], RNase H is an endonuclease that cleaves the RNA strand of an RNA/DNA
hybrid in a sequence non-specific manner
RNase_H[pfam00075], RNase H; RNase H digests the RNA strand of an RNA/DNA hybrid.
Important enzyme in retroviral replication cycle.
RVT_1[pfam00078], Reverse transcriptase (RNA-dependent DNA polymerase)
PRK07238[PRK07238], bifunctional RNase H/acid phosphatase
PRK07708[PRK07708], hypothetical protein; Validated
>ATGCCATTCAGTTTGAGGAATGCGGGTGCAACGTACCAACGGTGCATGAACCACATGTTCGGCGAACACA
TTGGCCGAACGGTCGAGGCCTACGTCGATGACATCGTAGTCAAGACGAGGAAAGCCTCCGACCTCCTTTCC
GACCTTGAAGCGACATTCCGATGTCTCAAGGCGAAAGGCGTGAAGCTCAATCCCGAGAAATGTGTCTTCGG
GGTTCCACGAGGCATGCTCTTGGGGTTCATCGTCTCCGAGCGGGGCATCGAGGCCAACCCGGAGAAGATC
GCGGCCAACACCAGCATGGGGCCCATCAAGGACTTGAAAGGCGTACAGAGAGTCACAGGATGCCTTGCGG
CTCTGAGCCGTTTCATCTCGCGCCTCGGCGAAAGAGGCCTACCTCTGTACCGCCTCTTAAGGAAGGCCGAGT
GCTTCACTTGGACCCCTGAGGCCGAGGAAGCCCTCGGGAACCTGAAGGCGCTCCTCACGAACGCGCCCAT
CTTGGTGCCCCCCGCTGCCGGAGAAGCCCTCTTGATCTACGTCACCACGACCACTCAGGTGGTTAGCGCCG
CGATTGTGGTTGAGAGACGAGAAGAGGGGCATGCATTGCCCGTACAGAGGCCAGTCTACTTCATCAGTGAG
GTACTGTCCGAGACCAAGATCCGCTACCCACAAATTCAGAAGCTGCTGTACGCAGTGATCCTGACACGACGG
AAGTTGCGACACTACTTCAAGTCTCATCCGGTGACTGTGGTGTCATCCTTCCCCCTGGGGGAGATCATCCAG
TGCCGAGAGGCCTCGGCTAGAATTGCAAAGTGGGCGGTGGAAATCATGGGCGAGACGATCTCGTTCGCCC
CTCGGAAGGCCATCAAGTCCCAGGTCTTGGCGGACTTTGTGGCTGAATGGGTCGACACCCAGCTCCCAACA
GCTCCGATCCAACCGGAACTCTGGACCATGTTTTTCGACGGGTCACTGATGAAGACAGGAGCAGGCGCAG
GCCTGCTCTTGATCTCGCCCCTCAAGAAGCACCTACGCTACGTGCTACGCCTCCACTTCCCGGCGTCCAACAA
TGTGGCTAAGTACGAGGCTCTAGTCAACGGGTTGCGCATCGCCATCGAGCTGGGGgtctgacgcctcgacgctcgt
ggtgactcgcagCTCGTCATCGACCAAGTCATGAAGAACTCCCACTGCCACGACCCGAAGATGGAGGCCTACTG
CGATGAGGTTCGGCGCCTGGAAGACAAGTTCTACGGGCTCGAGCTCAACCACATCGCCCGACGCCACAAC
GAGACTGCGGACGAGCTGGCTAAAATAGCCTCGGGGCGAACAACGgttcccccagacgtcttctcccgagacctgcat
caaccctccgtcaagaccgacgacacgcccgagcccgagacaccctcggcttagtccgaggcaccctcggctcagtccgaggcgccatcgg
ctcggcccgaggcaccctcggctcaacccgaggcaccctcggcccccgagggtgaggcactgcgcatcgaggaggagcggagaggggtc
atgcctaatcgaaactggcagaccccgtacctgcaatatctccgccgaggagagctacccctcgaccaagccgaagcttggcggttggcgc
ggcgcgccaagtcgttcgtcttgctgggagacgagaaggagctctaccaccgcagcccctcgggcatcctccagcgatgcatttccatcgcc
gaaggccaggagctcctacaagagatacactcgggggcttgtggccatcacgcagcacctcgagcccttgttggaaacgccttccgacaag
gtttctactggccgacggcggtggccgacaccactagaattgtccgcacctgcgaagggtgtcagttctacacaaggcagacccacctaccc
gcttaggccctgcagaccatacccatcacctggtcatttgttgtgtggggtctggacctagttggccccttgcagAAGGCACCCGGGGG
CTACACGCATCTGTTGGTCGCCATCGACAAATTCTCCAAGTGGATCGAGGTCCGACCCCTAAACAGCATCAG
GTCCGAACAGGCGGTGGCGTTCTTCACCAACATCATCCATCGCTTTGGGGTCCCGAACTCCATCATCACCGA
CAACGGCACCCAGTTCACCGGCAGAAAGTTCCTGGACTTCTGCGAGGATCACCACATCTGGGTGGACTGG
GCCGCCGTGGCTCACCCCATGACGAATGGGCAAGTAGAGCGTGCCAACGGCATGATTCTACAAGGACTCAA
GCCTCGAATCTACAACGACCTCAACAAGTTCGGCAAGCGGTGGATGAAGGAACTCCCCTCGGTGGTCTGGA
GTCTGAGGACGACGCTGAGCCGGGCCACGGGCTTCACACCGTTCTTTCTAGTCTATGGGGCCGAGACCGTC
TTGCCCATAGACTTAGAATACGGTTCCCCGAGGACGAGGGCCTACGACGACCAAAGCAATCGAGCTAATCG
AGAAGACTCACCGGACCAGCTGGAAGAGGCTCGGGACATGGCCTTACTACACTCGGCGCGGTACCAGCAG
TCCTTGCGACGCTACCACGCCCGAGGGGTTCGGTCCCGAGACCTCCAGGTGGGCGACCTGGTGCTTCGGCT
GCGACAAGACGCCCGAGGGCGGCACAAGCTCATGCCTCCCTGGGAAGGGTCGTTCGTCATCGCCAAAGTT
CTGAAGCCTGGGACGTACAAGCTGGCCAACAGTCAAGGCGAGGTCTACAGCAACGCTTGGAACATCCGAC
AGCTACGTCGCTTCTACCCTTAA
Protein sequence:
MPFSLRNAGATYQRCMNHMFGEHIGRTVEAYVDDIVVKTRKASDLLSDLEATFRCLKAKGVKLNPEKCVFGVP
RGMLLGFIVSERGIEANPEKIAANTSMGPIKDLKGVQRVTGCLAALSRFISRLGERGLPLYRLLRKAECFTWTPEA
EEALGNLKALLTNAPILVPPAAGEALLIYVTTTTQVVSAAIVVERREEGHALPVQRPVYFISEVLSETKIRYPQIQKLL
YAVILTRRKLRHYFKSHPVTVVSSFPLGEIIQCREASARIAKWAVEIMGETISFAPRKAIKSQVLADFVAEWVDTQL
PTAPIQPELWTMFFDGSLMKTGAGAGLLLISPLKKHLRYVLRLHFPASNNVAKYEALVNGLRIAIELGLVIDQVM
KNSHCHDPKMEAYCDEVRRLEDKFYGLELNHIARRHNETADELAKIASGRTTKAPGGYTHLLVAIDKFSKWIEVR
PLNSIRSEQAVAFFTNIIHRFGVPNSIITDNGTQFTGRKFLDFCEDHHIWVDWAAVAHPMTNGQVERANGMIL
QGLKPRIYNDLNKFGKRWMKELPSVVWSLRTTLSRATGFTPFFLVYGAETVLPIDLEYGSPRTRAYDDQSNRAN
REDSPDQLEEARDMALLHSARYQQSLRRYHARGVRSRDLQVGDLVLRLRQDARGRHKLMPPWEGSFVIAKVL
KPGTYKLANSQGEVYSNAWNIRQLRRFYP
Segment 3
82383 - 88664 Forward Strand
7 exons:
Exon 1: 82383 - 83722 1338bp
Exon 2: 84124 - 84298 174bp
Exon 3: 84369 - 85018 684bp
Exon 4: 85130 - 85433 303bp
Exon 5: 85920 - 86500 579bp
Exon 6: 86862 - 87035 174bp
Exon 7: 87327 - 88664 1338bp
RT_LTR[cd01647], RT_LTR: Reverse transcriptases (RTs) from retrotransposons and retroviruses.
RNase_HI_archaeal_like[cd09279], RNAse HI family that includes Archaeal RNase HI;
rve[pfam00665], Integrase core domain;
DUF4370[pfam14290], Domain of unknown function (DUF4370);
RT_DIRS1[cd03714], RT_DIRS1: Reverse transcriptases (RTs) occurring in the DIRS1 group of
retrotransposons.
RVT_1[pfam00078], Reverse transcriptase (RNA-dependent DNA polymerase); A reverse
transcriptase gene is usually indicative of a mobile element such as a retrotransposon or
retrovirus.
PRK12829[PRK12829], short chain dehydrogenase; Provisional
PHA03307[PHA03307], transcriptional regulator ICP4; Provisional
>ATGGCGGCCGACAACCCGCCCGCCGGCGGCGGAATCGATGACGTCTTCCCCACGTGGCGGAAGAACGAC
ATTCGGGCTTGTCCCGTCCCCTCCCCCGTCGACGGAGGAGGAGGCGGGGCAACCAAGGCCAAGCAGGAG
GCGGCACCTCGTCGGCTATCGAGCGAGTCGACGGCGCCGGTGCCCCCAACGAGGGGCGCGATGGGCATCG
ACATCGCGTCTGAGACGAAGACGAGCGCCGTCTCCCCGCAACACGCCAACTCCAAGCAAACGGACGACGC
CAGCACGCTCGCAAAAGACTTGTTGGGCGTCACCCTCGTACCTGAGACGACGGTGCAGTCTACCCCTGACG
TGACTTCGTCACCGCCCGTCGACCAAGACGTACCGACCGATTCCCATCTCGCGCCTTTTGGATTCAGCCTCG
ACCCACCAAGCGACTTCGCTTTGGTGGACGCTTTCATAGAGGCGAGTCCAAACCCTCCGGGGTATCGTGTG
CGGTCACCCTGGGACCGGCTGACAGCCGTCTCGACCTACGGGCCCTCGGGTTCCGAGGAAGATGACGAGC
CCGACTTTTGTTGGGATTTCTCTGGACTTGGTAACCCCAGTGCCATGCGGGACTTCATGACCACATGCGACT
ACTGCCTTTCCGACTGTTCCGACGGTAGCCGCAGCCTCGGCGACGAGGACTATGGCCCAAGTCGTGAATGT
TTCCACGTCGACCTAGGGGGTCCCGGCGAAGGAAACCATCCTGGTATACCGGAAAATGGTGATCCCCCTAG
GCCTGCGCCTCGCGTTGACATCCTACGGGAGCTAGCTGTGGTCCCAGTCCCTGCGGGGGTCAGGACTCACA
GCTCGAGCAAATCTGCGAGATGCAGGCCAGGCTCGACGAGGGAGCAGGAACACTTGAGCCGTTCCGCCG
GGACATCGGGCAGGAATGGGCAGGCCAACCTCCGGCCGGAGAAGCGCGCCATCTACCCCAGGGCATCCAA
CACCGCATCGCCGACGATGTCAGGGCAAGGCCGCCACCGGCCTCCAGTGGGGTCGGCCAGAACCTGGCTG
CAGCGGCAATACTTCTCCGCGCGATGCCGGAGCCATCTACCACCGAGGGGCGGCGTATCCAGGGAGAGCTC
AAGAATCTCCTGGAGGATGTCGCGGTCCGACGGGCCGAAAGCTCCGCCTCCCGAAGGCAGGGGTACCCCT
CGGAACATCGCGCCGCGACTTCCCAATTCATGCGGAAAGCCTCGGTCCACACCGGGCGCACGCGCAACACA
GCGCCTGCGGCCCTGGGTCGCCTCGGCAACGAACACCCTCACCGCAACCGTCGAACCCACCTCGACGAGA
gggtgcgccgaggctaccaccccaggcgtgggggacgctacgacagcggggaggattggagtccctcgcccgaaccacccggtccgcag
gctttcagccgggccatacgacgggcgccgttcccgacccggttccgaaccccgactactatcacaaagtactcgggggagacgagaccgg
aactgtggctcgcggactaccggctagcctgccacctgggtggaacagacgatgacaatctcatcatccggaacctccccctgttcctctccg
acaccgctcgagcctggctggagcacctgcctccggggcagatctccaactaggacgacctggtccaagccttcgccggcaacttccagggt
acgtatgtgtgccctgggaactcctgggatctccaaaGCTGCCGCCAGCAGCCGGGGGAGTCTCTCTGGGACTACATCC
GGCAATTCTCGAAGCAGCGCACCGAGTTGCCCAATGTCACCGACTCGGATGTCATCGGCGCGTTCCTCGCC
GACACCACTTGCCGCGACCTGGTTAGCAAGCTGGGTCGCAAGACCCCCACCAGGGCGAGTGaggtgatggac
atcgccaccaagttcgcctctggctaGGATGCGGTTGAGGCCATCTTCCGGAAGGACAAGCAGCCCCAGGGCCGCC
CACCGGAAGATGTCCCCGAGGCGTCAACTCAGCGCGGCATCAAGAAGAAAGGCAAGAAGAAGTCGCAAG
CAAAACGCGACGCCGCCGATGCGAACTTTGTCGCCGCCGCCGAGTACAAGAACCCTCGGAAACCTCCTGG
AGGTGCCAATCTCTTCGACAAGATGCTCAAGGAGCCGTGCCCCTGTCATCAGGGGCCCGTCAAGCACACCC
TTGAGGAGTGCGCCATGCTTCGGCGCCACTTTCACAAAGCCGGGCCACCTGCGGAGGGTGGCCGGGCCCG
CGACGACGATAAGAAGGAGGATCACAAGGCAGGAGAGTTCCCCGAGGTCCACGACTGCTTCATGATCTAC
GGTGGGCAAGTGGCGAACGCCTCGGCTCGGCACCACAAGCAAGAGCGTCGGGAGGTCTGCTCGGTAAAG
GTGGCGGCGCCAGTCTACCTAGACTGGTCCGACAAGCCCATCACCTTCGACCAGGGCGACCACCCCGACCG
CGTGCCGAGCCTGGGGAAGTACCCGCTCGTTGTCGACCCCGTCATCGGCAACGTCAGGCTCACCAAGGTCC
TCATGGACGGAGGCAGCAGCCTCAACGTCATCTACGCCAAGACCCTCGGGCTCCTGCGGATCGATCTGTCCT
Cggtacgggcaggagctgcgccttttcacgggatcatccctgggaagcgcgtccagcccctcggacaactcgatctacccgtctgctttggg
acaccctccaacttctgaaagGAGACCCTCACGTTCGAGGTGGTCGGGTTTCGAGGAACCTACCACGCAGTGCTG
AGGAGGCCATGCTACGCCAAGTTCATGGTCGTCCCCAACTACACCTACCACAAGCTAAAGATGCCAGGCCCC
AACGGGGTCATCACCGTCGGCCCCACGTACCGACACGCGTACGAATGCGACGTGGAGTGCATGGAGTACGC
CGAGGCCCTCGCCAAATCCGAGGCCCTCATCGCCGACCTGGAGAGCCTCTCCAAGGAGGCGCCAGACGTG
AAGCGCCACACCAGCAACTTCGAGCCAACGGAGATGggtaagttcgtccctctcaacaccagcaacgatacctccaagctg
atccggatcgggctccgagctcgaccccaaataggaagcagtctcgtcgactttctccgtgcaaacaccgatgtttttgcatggaatccctcgg
acatgcccggcataccgagggatgtcgccgagcactcgctggatatccgagctagagcccgacccgtgaagcagcctctgcgccggttcga
cgaagaaaagcgcagagccataggcgaggagatccacaagctaatggcggtagggttcatcaaagaggtattccatcccgagtggcttgc
caaccctgtgcttgtgagaaagaaaggagggaaatggcgtatgtgtgtagactacactggtctaaacaaagcatgtccaaaagttccctacc
ctctgcctcgcatcgatcaaatcgtggattccactgctgggtgcgaaaccctgtctttcctcgatgcctactcagGGTATCGCCAAATCA
GGATGAAAGAGTCCGACCAGCTCGCGACTTCTTTCATCACACCTTTCGGCATGTACTGCTATGTTACCATGTC
GTTTGGTTTGAGGAATGCGGGTGCGACATACCAAAGGTGCATGAACCACGTGTTCGGCGAACACATTGGTC
GAACGGTCGAGGCTTACATCGATGACATCGTAGTCAAGACGAGGAAAGCCTCTGACCTCCTTTCCGACCTTG
AAACGACATTCTGGTGTCTCAAGGCGAAAGGTGTAAAGCTCAATCCCGAGAAGTGCGTCTTCGGGGTCCCC
CAAGGCTTGCTCTTGGGGTTTATCGTCTCCGAGCGGGGCATCGAGGCCAACCCAGAGAAAATCGTGGCCAT
CACCAACATGGGGCCCATCAAGGACTTGAAAGGCGTACAGAGGGTCACGGGGTGCCTTGCGGCTCTGAGC
CGTTTCATCTCACGCCTCGGCGAAAGAGGCCTGCCTCTGTACCGCCTCTTAAGGAAGGCCGAGTGCTTCACT
TGGACCCCTGAGGCCGAGGAAGCCCTCGGGAACCTGAAGGCGCTCCTCACGAACGCGCCCATCTtggtgccc
ccgcggccggagaagccctcttgatctacgtcgccgctaccactcaggtggtcagcgccgcgatcgtggttgagagacgagaagagggaca
tgcattgcctgtccagaggccagtctacttcgtcagtgaggtactgtccgagaccaagatccgctacccacaaattccgagtctcatccggtga
ctgtggtgtcatctttccccctgggggagatcatccagtgccgagaggcctcgggtaggattgcaaagtgggcggtggaaatcatgggcgag
acaatctcgttcgccactcgtaaggccataaagtcccaagtcttggcggactttgtggctgaatgggtcgatacccaGCTCCCGACAGC
TCCGATCCAACCGGAACTCTGGACCATGTTTTTTGACGGGTCGCTGATGAAGACAGGGGCAGGCGCGGGC
CTGCTCTTCATCTCGCCCCTCGGGAAGCACCTACGCTACGTGCTACGCCTCCACTTCCCGGCGTCCAACAATG
TGGCCGAGTACGAGGCTCTggtcaacgggttgcgcgtcgccatcgagctagggatccgacgtctcgacgctcgcggtgactcgtagc
tcgtcattgactaagtcatgaagaactcccacttctgcgactcgaagatggaagcctactgcgatgaggttcggcgcctggaggacaagttct
atgggctcgagttcaaccacatcgcccgacgctacaacgagactgcggacaagctggctaagatagcctcggggcaaacaacggttccccc
ggacgtcttctcctgagacctgcatcaaccctccgtcaagACCGACGACACGCCCGAGCCCGAGAAGGCCTCGGCCCAGC
CCGAGGCACCCTCGGCCCCCGAGGATGAGGCACTGCGTGTCGAGGAGGAGCGGAGCGGGGTCACGCCTA
ATCGAAACTGGCAGACCCCGAACCTGCAATATCTCCACCGAGGAGAGCTACCCCTCGACCGAGCCGAAGCT
CGGCGGTTGGCGCGGCGTGCCAAGTCGTTCGTCTTGCTGGGGGACGGGAAGGAGCTCTACCATCGCAGCC
CCTCAGGCATCCTCCAGCAATGCATATCCATCACCGAAGGCCAGGAGCTCTTACAAGAAATACACTCGGGGG
CTTGCGGGCATCACGCGGCGCCCCGAGCCCTTGTTGGGAACGCCTTCCGACAAGGTTTCTACTGGCCAACC
GCGGTGGCCGACGCCACTAGAATTGTTCGCACCTGCCAGGGGTGTCAATTCTACGCAAGGCAGACTCACCT
TCCCGCCCAGGCTCTACAGACCATACCCATCACCTGGTCGTTTGCTGTGTGGGGTCTGGACCTCGTCGGCAC
CTTGCAGAAGGCACCCGGGGGCTACACGCACCTGCTGGTCGCCATCGACAAATTCTCCAAGTGGATCGAGG
TCCGACCCCTAAACAGCATCAGGTCTGAACAGGCGGTGGCGTTCTTCACCAACATCATCCATCGCTTTGGGG
TCCCGAACTCCATCATCACCGACAACGACACCCAGTTCACCGACAGAAAGTTCCTGGACTTCTGCGAGGATC
ACCACATCCGGGTGGACTGGGCCGCCGTGGCTCACCCCATGACGAATGGGCAAGTAGAGCGTGCCAACGG
CATGATCCTGCAAGGACTCAAGCCGTGGATCTACAACAACCTTAACAAGTTCGGCAAGCGATGGATGAAGG
AGCTCCCCTCGGTGGTCTGGAGTCTGAGGACAACGCCGAGCCGAGCCACGGGCTTCACACCGTTCTTTCTA
GTCTATGGGGCCGAGGCCATCTTGCCCATAGACTTAGAATACGGTTCCCCAAGGACGAGGGCCTACAACGA
CCAAAGCAATCGAGCTAACCGAGAAGACTCACTGGACCAGCTGGAAGAGGCTCGGAACATGGCCTTCCTA
CACTCGGCGCGGTATCAGCAGTCCCTGCGACGCTACCACGCCCGAAGGGTTCGGTCCCGAGACCTCCAGGT
GGGCGACTTGGTGCTTCGGCTGCGACAAGACGCCCGAGGGCGGCACAAGCTCACGCCTCCCTGGGAAGG
GTCGTTCGTCATCGCCAAGGTTCTGAAGCCCGGGACGTATAAGCTGGCCAACAGTCAAGGCGAGGTCTACA
ACAACGCTTGGAACATCCGATAG
Protein sequence:
MAADNPPAGGGIDDVFPTWRKNDIRACPVPSPVDGGGGGATKAKQEAAPRRLSSESTAPVPPTRGAMGIDIA
SETKTSAVSPQHANSKQTDDASTLAKDLLGVTLVPETTVQSTPDVTSSPPVDQDVPTDSHLAPFGFSLDPPSDFA
LVDAFIEASPNPPGYRVRSPWDRLTAVSTYGPSGSEEDDEPDFCWDFSGLGNPSAMRDFMTTCDYCLSDCSDG
SRSLGDEDYGPSRECFHVDLGGPGEGNHPGIPENGDPPRPAPRVDILRELAVVPVPAGVRTHSSSKSARCRPGS
TREQEHLSRSAGTSGRNGQANLRPEKRAIYPRASNTASPTMSGQGRHRPPVGSARTWLQRQYFSARCRSHLP
PRGGVSRESSRISWRMSRSDGPKAPPPEGRGTPRNIAPRLPNSCGKPRSTPGARATQRLRPWVASATNTLTATV
EPTSTRGCRQQPGESLWDYIRQFSKQRTELPNVTDSDVIGAFLADTTCRDLVSKLGRKTPTRASEDAVEAIFRKD
KQPQGRPPEDVPEASTQRGIKKKGKKKSQAKRDAADANFVAAAEYKNPRKPPGGANLFDKMLKEPCPCHQG
PVKHTLEECAMLRRHFHKAGPPAEGGRARDDDKKEDHKAGEFPEVHDCFMIYGGQVANASARHHKQERREV
CSVKVAAPVYLDWSDKPITFDQGDHPDRVPSLGKYPLVVDPVIGNVRLTKVLMDGGSSLNVIYAKTLGLLRIDLS
SETLTFEVVGFRGTYHAVLRRPCYAKFMVVPNYTYHKLKMPGPNGVITVGPTYRHAYECDVECMEYAEALAKSE
ALIADLESLSKEAPDVKRHTSNFEPTEMGYRQIRMKESDQLATSFITPFGMYCYVTMSFGLRNAGATYQRCMN
HVFGEHIGRTVEAYIDDIVVKTRKASDLLSDLETTFWCLKAKGVKLNPEKCVFGVPQGLLLGFIVSERGIEANPEKI
VAITNMGPIKDLKGVQRVTGCLAALSRFISRLGERGLPLYRLLRKAECFTWTPEAEEALGNLKALLTNAPILLPTAPI
QPELWTMFFDGSLMKTGAGAGLLFISPLGKHLRYVLRLHFPASNNVAEYEALTDDTPEPEKASAQPEAPSAPED
EALRVEEERSGVTPNRNWQTPNLQYLHRGELPLDRAEARRLARRAKSFVLLGDGKELYHRSPSGILQQCISITEG
QELLQEIHSGACGHHAAPRALVGNAFRQGFYWPTAVADATRIVRTCQGCQFYARQTHLPAQALQTIPITWSFA
VWGLDLVGTLQKAPGGYTHLLVAIDKFSKWIEVRPLNSIRSEQAVAFFTNIIHRFGVPNSIITDNDTQFTDRKFLD
FCEDHHIRVDWAAVAHPMTNGQVERANGMILQGLKPWIYNNLNKFGKRWMKELPSVVWSLRTTPSRATGF
TPFFLVYGAEAILPIDLEYGSPRTRAYNDQSNRANREDSLDQLEEARNMAFLHSARYQQSLRRYHARRVRSRDL
QVGDLVLRLRQDARGRHKLTPPWEGSFVIAKVLKPGTYKLANSQGEVYNNAWNIR
Augustus gene prediction
Augustus predicted 13 genes. The predicted genes were translated into peptides, and these
peptides were used as BLASTP queries against the Swiss-Prot database. Only two of them had
significant hits: one belongs to the reverse transcriptase (RT) superfamily, and the other belongs
to the RNase H superfamily.
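To compare the two predictors, one simple check is how far the Augustus gene models overlap the FGENESH ones on the same coordinates. The sketch below uses the segment start and end positions given on this page and treats them as inclusive. The function name `overlap_bp` is hypothetical, and this is not necessarily how the comparison was originally done.

```python
# Segment coordinates (start, end) as listed on this page.
fgenesh = {
    "Segment 1": (62665, 64698),
    "Segment 2": (66287, 69085),
    "Segment 3": (82383, 88664),
}
augustus = {
    "Segment 1": (65858, 67411),
    "Segment 2": (86898, 88664),
}


def overlap_bp(a: tuple[int, int], b: tuple[int, int]) -> int:
    """Number of positions shared by two inclusive intervals (0 if disjoint)."""
    start = max(a[0], b[0])
    end = min(a[1], b[1])
    return max(0, end - start + 1)


for a_name, a_span in augustus.items():
    for f_name, f_span in fgenesh.items():
        shared = overlap_bp(a_span, f_span)
        if shared:
            print(f"Augustus {a_name} overlaps FGENESH {f_name} by {shared}bp")
```

On these coordinates, Augustus Segment 1 overlaps FGENESH Segment 2, and Augustus Segment 2 lies within FGENESH Segment 3. FGENESH Segment 1 overlaps neither of the two Augustus genes listed here.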
Segment 1: 65858 - 67411
1 exon:
1 CDS 65858 - 67411 1553bp
RT_LTR[cd01647]: Reverse transcriptases (RTs) from retrotransposons and retroviruses which
have long terminal repeats (LTRs) in their DNA copies but not in their RNA template.
RT_Rtv[cd01645]: Reverse transcriptases (RTs) from retroviruses (Rtvs).
RT_ZFREV_like[cd03715]: A subfamily of reverse transcriptases (RTs) found in sequences similar
to the intact endogenous retrovirus ZFERV from zebrafish and to Moloney murine leukemia virus
RT.
>ATGCCCGGCATACCGAGGGATGTCGCCGAGCACTCGCTGGATATCCGAGCTGGAGCCCGACCCGTGAAGC
AGCCTTTGCGCCGATTCGACGAAGAAAAGCGCAGAGCCATAGGCGAGGAGATCCACAAGCTAATGGCGGC
AGGGTTCATCAAAGAGGTATTCCACCCCGAATGGCTTGCCAACCCTGTGCTTGTGAGAAAGAAAGGAGGG
AAATGGCGGATGTGTGTAGACTACACTGGTCTAAACAAAGCATGTCCGAAAGTTCCCTACCCTCTACCTCGCA
TCGATCAAATCGTGGATTCCACTGCTGGGTGCGAAACCCTATCTTTCCTTGATGCCTACTCGGGGTATCACCA
GATCAGGATGAAAGAGTCCGACCAGCTCGCGACTTCTTTCATCACACCCTTCGGCATGTACTGTTATGTTACC
ATGCCATTCAGTTTGAGGAATGCGGGTGCAACGTACCAACGGTGCATGAACCACATGTTCGGCGAACACATT
GGCCGAACGGTCGAGGCCTACGTCGATGACATCGTAGTCAAGACGAGGAAAGCCTCCGACCTCCTTTCCGA
CCTTGAAGCGACATTCCGATGTCTCAAGGCGAAAGGCGTGAAGCTCAATCCCGAGAAATGTGTCTTCGGGG
TTCCACGAGGCATGCTCTTGGGGTTCATCGTCTCCGAGCGGGGCATCGAGGCCAACCCGGAGAAGATCGC
GGCCAACACCAGCATGGGGCCCATCAAGGACTTGAAAGGCGTACAGAGAGTCACAGGATGCCTTGCGGCT
CTGAGCCGTTTCATCTCGCGCCTCGGCGAAAGAGGCCTACCTCTGTACCGCCTCTTAAGGAAGGCCGAGTG
CTTCACTTGGACCCCTGAGGCCGAGGAAGCCCTCGGGAACCTGAAGGCGCTCCTCACGAACGCGCCCATCT
TGGTGCCCCCCGCTGCCGGAGAAGCCCTCTTGATCTACGTCACCACGACCACTCAGGTGGTTAGCGCCGCG
ATTGTGGTTGAGAGACGAGAAGAGGGGCATGCATTGCCCGTACAGAGGCCAGTCTACTTCATCAGTGAGGT
ACTGTCCGAGACCAAGATCCGCTACCCACAAATTCAGAAGCTGCTGTACGCAGTGATCCTGACACGACGGA
AGTTGCGACACTACTTCAAGTCTCATCCGGTGACTGTGGTGTCATCCTTCCCCCTGGGGGAGATCATCCAGT
GCCGAGAGGCCTCGGCTAGAATTGCAAAGTGGGCGGTGGAAATCATGGGCGAGACGATCTCGTTCGCCCC
TCGGAAGGCCATCAAGTCCCAGGTCTTGGCGGACTTTGTGGCTGAATGGGTCGACACCCAGCTCCCAACAG
CTCCGATCCAACCGGAACTCTGGACCATGTTTTTCGACGGGTCACTGATGAAGACAGGAGCAGGCGCAGG
CCTGCTCTTGATCTCGCCCCTCAAGAAGCACCTACGCTACGTGCTACGCCTCCACTTCCCGGCGTCCAACAAT
GTGGCTAAGTACGAGGCTCTAGTCAACGGGTTGCGCATCGCCATCGAGCTGGGGGTCTGA
Protein sequence:
MPGIPRDVAEHSLDIRAGARPVKQPLRRFDEEKRRAIGEEIHKLMAAGFIKEVFHPEWLANPVLVRKKGGKWR
MCVDYTGLNKACPKVPYPLPRIDQIVDSTAGCETLSFLDAYSGYHQIRMKESDQLATSFITPFGMYCYVTMPFSL
RNAGATYQRCMNHMFGEHIGRTVEAYVDDIVVKTRKASDLLSDLEATFRCLKAKGVKLNPEKCVFGVPRGMLL
GFIVSERGIEANPEKIAANTSMGPIKDLKGVQRVTGCLAALSRFISRLGERGLPLYRLLRKAECFTWTPEAEEALG
NLKALLTNAPILVPPAAGEALLIYVTTTTQVVSAAIVVERREEGHALPVQRPVYFISEVLSETKIRYPQIQKLLYAVILT
RRKLRHYFKSHPVTVVSSFPLGEIIQCREASARIAKWAVEIMGETISFAPRKAIKSQVLADFVAEWVDTQLPTAPI
QPELWTMFFDGSLMKTGAGAGLLLISPLKKHLRYVLRLHFPASNNVAKYEALVNGLRIAIELG
Segment 2: 86898 - 88664
2 exons:
1 CDS 86898 - 87090 192bp
2 CDS 87304 - 88664 1360bp
RNase_HI_archaeal_like[cd09279], RNAse HI family that includes Archaeal RNase HI
RVT_3[pfam13456], Reverse transcriptase-like; This domain is found in plants and appears to be
part of a retrotransposon.
RNase_H[cd06222], RNase H is an endonuclease that cleaves the RNA strand of an RNA/DNA
hybrid in a sequence non-specific manner
RnhA[COG0328], Ribonuclease HI [DNA replication, recombination, and repair]
PRK07238[PRK07238], bifunctional RNase H/acid phosphatase; Provisional
PRK07708[PRK07708], hypothetical protein; Validated
>ATGTTTTTTGACGGGTCGCTGATGAAGACAGGGGCAGGCGCGGGCCTGCTCTTCATCTCGCCCCTCGGGA
AGCACCTACGCTACGTGCTACGCCTCCACTTCCCGGCGTCCAACAATGTGGCCGAGTACGAGGCTCTGGTCA
ACGGGTTGCGCGTCGCCATCGAGCTAGGGATCCGACGTCTCGACGCTCGCggtgactcgtagctcgtcattgactaag
tcatgaagaactcccacttctgcgactcgaagatggaagcctactgcgatgaggttcggcgcctggaggacaagttctatgggctcgagttca
accacatcgcccgacgctacaacgagactgcggacaagctggctaagatagcctcggggcaaacaacggttcccccggacgtcttctcctg
agaCCTGCATCAACCCTCCGTCAAGACCGACGACACGCCCGAGCCCGAGAAGGCCTCGGCCCAGCCCGAGG
CACCCTCGGCCCCCGAGGATGAGGCACTGCGTGTCGAGGAGGAGCGGAGCGGGGTCACGCCTAATCGAA
ACTGGCAGACCCCGAACCTGCAATATCTCCACCGAGGAGAGCTACCCCTCGACCGAGCCGAAGCTCGGCGG
TTGGCGCGGCGTGCCAAGTCGTTCGTCTTGCTGGGGGACGGGAAGGAGCTCTACCATCGCAGCCCCTCAG
GCATCCTCCAGCAATGCATATCCATCACCGAAGGCCAGGAGCTCTTACAAGAAATACACTCGGGGGCTTGCG
GGCATCACGCGGCGCCCCGAGCCCTTGTTGGGAACGCCTTCCGACAAGGTTTCTACTGGCCAACCGCGGTG
GCCGACGCCACTAGAATTGTTCGCACCTGCCAGGGGTGTCAATTCTACGCAAGGCAGACTCACCTTCCCGCC
CAGGCTCTACAGACCATACCCATCACCTGGTCGTTTGCTGTGTGGGGTCTGGACCTCGTCGGCACCTTGCAG
AAGGCACCCGGGGGCTACACGCACCTGCTGGTCGCCATCGACAAATTCTCCAAGTGGATCGAGGTCCGACC
CCTAAACAGCATCAGGTCTGAACAGGCGGTGGCGTTCTTCACCAACATCATCCATCGCTTTGGGGTCCCGAA
CTCCATCATCACCGACAACGACACCCAGTTCACCGACAGAAAGTTCCTGGACTTCTGCGAGGATCACCACAT
CCGGGTGGACTGGGCCGCCGTGGCTCACCCCATGACGAATGGGCAAGTAGAGCGTGCCAACGGCATGATC
CTGCAAGGACTCAAGCCGTGGATCTACAACAACCTTAACAAGTTCGGCAAGCGATGGATGAAGGAGCTCCC
CTCGGTGGTCTGGAGTCTGAGGACAACGCCGAGCCGAGCCACGGGCTTCACACCGTTCTTTCTAGTCTATG
GGGCCGAGGCCATCTTGCCCATAGACTTAGAATACGGTTCCCCAAGGACGAGGGCCTACAACGACCAAAGC
AATCGAGCTAACCGAGAAGACTCACTGGACCAGCTGGAAGAGGCTCGGAACATGGCCTTCCTACACTCGGC
GCGGTATCAGCAGTCCCTGCGACGCTACCACGCCCGAAGGGTTCGGTCCCGAGACCTCCAGGTGGGCGAC
TTGGTGCTTCGGCTGCGACAAGACGCCCGAGGGCGGCACAAGCTCACGCCTCCCTGGGAAGGGTCGTTCG
TCATCGCCAAGGTTCTGAAGCCCGGGACGTATAAGCTGGCCAACAGTCAAGGCGAGGTCTACAACAACGCT
TGGAACATCCGATAG
Protein sequence:
MFFDGSLMKTGAGAGLLFISPLGKHLRYVLRLHFPASNNVAEYEALVNGLRVAIELGIRRLDARDLHQPSVKTDD
TPEPEKASAQPEAPSAPEDEALRVEEERSGVTPNRNWQTPNLQYLHRGELPLDRAEARRLARRAKSFVLLGDG
KELYHRSPSGILQQCISITEGQELLQEIHSGACGHHAAPRALVGNAFRQGFYWPTAVADATRIVRTCQGCQFYAR
QTHLPAQALQTIPITWSFAVWGLDLVGTLQKAPGGYTHLLVAIDKFSKWIEVRPLNSIRSEQAVAFFTNIIHRFGV
PNSIITDNDTQFTDRKFLDFCEDHHIRVDWAAVAHPMTNGQVERANGMILQGLKPWIYNNLNKFGKRWMK
ELPSVVWSLRTTPSRATGFTPFFLVYGAEAILPIDLEYGSPRTRAYNDQSNRANREDSLDQLEEARNMAFLHSAR
YQQSLRRYHARRVRSRDLQVGDLVLRLRQDARGRHKLTPPWEGSFVIAKVLKPGTYKLANSQGEVYNNAWNI
R