Seq5part2(50244-10000) Yichun Qian The genes predicted by FGENESH are better. But I also attached the genes predicted by Augustus for comparison. Gene Prediction FGENESH FGENESH predicted 7 genes. The predicted genes were then translated into peptides. These peptides were used as queries to run Blastp in the swissprot database. 3 of them had significant hits. Segment 1 62665 - 64698 Forward Strand 1 CDSf: 62665 - 63676 1011bp 2 CDSl: 63853 - 64698 846bp Retrotrans_gag[pfam03732], Retrotransposon gag protein; Gag or Capsid-like proteins from LTR retrotransposons. >ATGGCGACCGACAACTCGCCCGCCGGCGGCGGAATCGACGACGTCTTCCCCGCGCGGTGGAAGAACAAC ATTCGAGCTTGCCTCGTCCCCTCCCCCGCCGACGGAGGAGGAGGCGGGGCAACCCAAGGCCAAGCAGGA GGCGGCACCTCGTCGGCTGTCGAGCGAGTCGACGGTGCCAGCGCCCCAATGGGGGGCACGTCGGGCATC GACCTCGCGTCTGAGACGAAGACGAGCGCCGTCTCCCCGCAACACGTCAACCCCAAGCAAACGGACGACG CCAACACGCTCGCAAGGGACTTGCTGGGCGTCACCCTCGTACCTGAGACGGCGGTGCAGTCTACCCCTGAC GTGACTTCGTCACCGCCCGTCGACCAAGAGGTACCGACCGATTCCCATCTCGCGCCTTTTGGATTCAGCCTC AACCCCCCAAGCGACTTCGCTTTGGTGGACGCTCTCATAGAGGCGAGTCCAAACCCTCTGGGGTATCGTATG CGGTCACCATGGGACCGGCTGACGGCCGTCTCAACCTACGGGCCCTTAGGGTCCGAGGAAGATGACGAGC CCGACTTTAGTTGGGATTTCTCTGGACTTGGTAACCCCAGTGCCATGCGGGACTTTATGACCGCGTGCGACT ACTGCCTTTCCGACTGTTCCGACGGTAGCCGCAGCCTCGGCGACAAGGACTGCGGCCCAAGTCGTGAATGT TTTCACGTCGATCTAGGGGGTCCCGACGAAGGCAACCATCTTGGTATGCCAGAGAATGGTGACCTTCCTAG GCCTGTGCCTCACGTTGACATCCTTCGGGAGCTAGCTGTGGTCCCCGTTCCGGCAGGGGGTCATGACCCAC AACTCGAGCAAATCCGCGAGATGCAGGCCAGGCTCGACGAGGGAGCAGGAACACTTGAGCCGTTCCGCC GGGACAATAGGCAGGAATGGGCGGGCCAACCTCTGGCCGGAGAAGTGCGTCATCTACCCCAGGGCATCCA GCACCGCGTCGCCGACGATGTCAGGgtaaggccgccaccggtttccagtggggtcggccagaacctggctgcagcggcaatact tctccgcgcgatgccggagccatcaaccaccgaggggcggcgtatccagggagagctcaagaacctcctggaggacgccgcggtctgacg ggccgaaagctccgcctcccgaaggcagGGGTACCCCTCGGAACATCGCGCCGCGACTTCCCGATTCATGCGGGAAG CCTCGGTCCACACCGGCCGCATGCGTAACATAGCGCATGCGGCCCCGGGTCGCCTCGGCAACGAGCACCAT CACCATAACTGTTGGGCCCACCTCGACGAGAGGGTGCGCCGAGGCTACCACCCCAGGCGTGGGGGACGCT 
ACGACAGCGGGGAGGATCGGAGTCCCTCGCCCAAACCACCTGGTCCGCAGGCTTTCAACCGCGCCATACG ACGGGCGCCGTTCCCGACCCGGTTCCGAACCCCGACTACTATCACAAAGTACTCGGGGGAGACGAGACCG GAACTGTGGCTCGCAGACTACCGGCTGGCCTGCCAGCTGGGTGGAACGGACGATGACAACCTCATCATCTG CAACCTCCCCCTGTTCCTTTCCGACACCGCTCGCGCCTGGCTGGAGCACCTGCCTCCGGGGCAGATCTCCAA CTGGGACGACCTGGTCCAAGCCTTCGCCGGTAATTTCCAGGGCACGTACGTGCGCCCTGGAAACTCCTGGG ATCTCCGAAGCTGCCGCCAGCAGCCGGGGGGGTCTCTCCGGGACTACATCCGGCGATTCTCGAAGCAGCG CACCGAGCTGCCCAACATCGCCGATTCGGATGTCATCGGCGCGTTCCTCGCCGGCACCACCTGCCGTGACCT GGTGAGCAAGCTGGGTCGCAAGACCCCCACCAGGGCGAGCGAGCTGATGGACATCGCCACCAAGTTCGCC TCTGGCCAGGAGGCGGTTGAGGCCATCTTCCGGAAGGACAAGCAGCCCCAGGGCCGCCCACCGGAAGAT GTCCCCGAGGCGTCAACTTAG Protein sequences: MATDNSPAGGGIDDVFPARWKNNIRACLVPSPADGGGGGATQGQAGGGTSSAVERVDGASAPMGGTSGID LASETKTSAVSPQHVNPKQTDDANTLARDLLGVTLVPETAVQSTPDVTSSPPVDQEVPTDSHLAPFGFSLNPPS DFALVDALIEASPNPLGYRMRSPWDRLTAVSTYGPLGSEEDDEPDFSWDFSGLGNPSAMRDFMTACDYCLSDC SDGSRSLGDKDCGPSRECFHVDLGGPDEGNHLGMPENGDLPRPVPHVDILRELAVVPVPAGGHDPQLEQIRE MQARLDEGAGTLEPFRRDNRQEWAGQPLAGEVRHLPQGIQHRVADDVRGYPSEHRAATSRFMREASVHTG RMRNIAHAAPGRLGNEHHHHNCWAHLDERVRRGYHPRRGGRYDSGEDRSPSPKPPGPQAFNRAIRRAPFPT RFRTPTTITKYSGETRPELWLADYRLACQLGGTDDDNLIICNLPLFLSDTARAWLEHLPPGQISNWDDLVQAFAG NFQGTYVRPGNSWDLRSCRQQPGGSLRDYIRRFSKQRTELPNIADSDVIGAFLAGTTCRDLVSKLGRKTPTRAS ELMDIATKFASGQEAVEAIFRKDKQPQGRPPEDVPEAST Segment 2 66287 - 69085 Forward Strand 3 exons: 1 CDSf 66287 - 67405 1119bp 2 CDSi 67439 - 67615 177bp 3 CDSl 68270 - 69085 816bp RNase_HI_archaeal_like[cd09279], RNAse HI family that includes Archaeal RNase HI; RT_LTR[cd01647], RT_LTR: Reverse transcriptases (RTs) from retrotransposons and retroviruses which have long terminal repeats (LTRs) in their DNA copies but not in their RNA template. rve[pfam00665], Integrase core domain RVT_3[pfam13456], Reverse transcriptase-like; This domain is found in plants and appears to be part of a retrotransposon. 
RNase_HI_RT_Ty3[cd09274], Ty3/Gypsy family of RNase HI in long-term repeat retroelements; RNase_H[cd06222], RNase H is an endonuclease that cleaves the RNA strand of an RNA/DNA hybrid in a sequence non-specific manner RNase_H[pfam00075], RNase H; RNase H digests the RNA strand of an RNA/DNA hybrid. Important enzyme in retroviral replication cycle. RVT_1[pfam00078], Reverse transcriptase (RNA-dependent DNA polymerase) PRK07238[PRK07238], bifunctional RNase H/acid phosphatase PRK07708[PRK07708], hypothetical protein; Validated >ATGCCATTCAGTTTGAGGAATGCGGGTGCAACGTACCAACGGTGCATGAACCACATGTTCGGCGAACACA TTGGCCGAACGGTCGAGGCCTACGTCGATGACATCGTAGTCAAGACGAGGAAAGCCTCCGACCTCCTTTCC GACCTTGAAGCGACATTCCGATGTCTCAAGGCGAAAGGCGTGAAGCTCAATCCCGAGAAATGTGTCTTCGG GGTTCCACGAGGCATGCTCTTGGGGTTCATCGTCTCCGAGCGGGGCATCGAGGCCAACCCGGAGAAGATC GCGGCCAACACCAGCATGGGGCCCATCAAGGACTTGAAAGGCGTACAGAGAGTCACAGGATGCCTTGCGG CTCTGAGCCGTTTCATCTCGCGCCTCGGCGAAAGAGGCCTACCTCTGTACCGCCTCTTAAGGAAGGCCGAGT GCTTCACTTGGACCCCTGAGGCCGAGGAAGCCCTCGGGAACCTGAAGGCGCTCCTCACGAACGCGCCCAT CTTGGTGCCCCCCGCTGCCGGAGAAGCCCTCTTGATCTACGTCACCACGACCACTCAGGTGGTTAGCGCCG CGATTGTGGTTGAGAGACGAGAAGAGGGGCATGCATTGCCCGTACAGAGGCCAGTCTACTTCATCAGTGAG GTACTGTCCGAGACCAAGATCCGCTACCCACAAATTCAGAAGCTGCTGTACGCAGTGATCCTGACACGACGG AAGTTGCGACACTACTTCAAGTCTCATCCGGTGACTGTGGTGTCATCCTTCCCCCTGGGGGAGATCATCCAG TGCCGAGAGGCCTCGGCTAGAATTGCAAAGTGGGCGGTGGAAATCATGGGCGAGACGATCTCGTTCGCCC CTCGGAAGGCCATCAAGTCCCAGGTCTTGGCGGACTTTGTGGCTGAATGGGTCGACACCCAGCTCCCAACA GCTCCGATCCAACCGGAACTCTGGACCATGTTTTTCGACGGGTCACTGATGAAGACAGGAGCAGGCGCAG GCCTGCTCTTGATCTCGCCCCTCAAGAAGCACCTACGCTACGTGCTACGCCTCCACTTCCCGGCGTCCAACAA TGTGGCTAAGTACGAGGCTCTAGTCAACGGGTTGCGCATCGCCATCGAGCTGGGGgtctgacgcctcgacgctcgt ggtgactcgcagCTCGTCATCGACCAAGTCATGAAGAACTCCCACTGCCACGACCCGAAGATGGAGGCCTACTG CGATGAGGTTCGGCGCCTGGAAGACAAGTTCTACGGGCTCGAGCTCAACCACATCGCCCGACGCCACAAC GAGACTGCGGACGAGCTGGCTAAAATAGCCTCGGGGCGAACAACGgttcccccagacgtcttctcccgagacctgcat 
caaccctccgtcaagaccgacgacacgcccgagcccgagacaccctcggcttagtccgaggcaccctcggctcagtccgaggcgccatcgg ctcggcccgaggcaccctcggctcaacccgaggcaccctcggcccccgagggtgaggcactgcgcatcgaggaggagcggagaggggtc atgcctaatcgaaactggcagaccccgtacctgcaatatctccgccgaggagagctacccctcgaccaagccgaagcttggcggttggcgc ggcgcgccaagtcgttcgtcttgctgggagacgagaaggagctctaccaccgcagcccctcgggcatcctccagcgatgcatttccatcgcc gaaggccaggagctcctacaagagatacactcgggggcttgtggccatcacgcagcacctcgagcccttgttggaaacgccttccgacaag gtttctactggccgacggcggtggccgacaccactagaattgtccgcacctgcgaagggtgtcagttctacacaaggcagacccacctaccc gcttaggccctgcagaccatacccatcacctggtcatttgttgtgtggggtctggacctagttggccccttgcagAAGGCACCCGGGGG CTACACGCATCTGTTGGTCGCCATCGACAAATTCTCCAAGTGGATCGAGGTCCGACCCCTAAACAGCATCAG GTCCGAACAGGCGGTGGCGTTCTTCACCAACATCATCCATCGCTTTGGGGTCCCGAACTCCATCATCACCGA CAACGGCACCCAGTTCACCGGCAGAAAGTTCCTGGACTTCTGCGAGGATCACCACATCTGGGTGGACTGG GCCGCCGTGGCTCACCCCATGACGAATGGGCAAGTAGAGCGTGCCAACGGCATGATTCTACAAGGACTCAA GCCTCGAATCTACAACGACCTCAACAAGTTCGGCAAGCGGTGGATGAAGGAACTCCCCTCGGTGGTCTGGA GTCTGAGGACGACGCTGAGCCGGGCCACGGGCTTCACACCGTTCTTTCTAGTCTATGGGGCCGAGACCGTC TTGCCCATAGACTTAGAATACGGTTCCCCGAGGACGAGGGCCTACGACGACCAAAGCAATCGAGCTAATCG AGAAGACTCACCGGACCAGCTGGAAGAGGCTCGGGACATGGCCTTACTACACTCGGCGCGGTACCAGCAG TCCTTGCGACGCTACCACGCCCGAGGGGTTCGGTCCCGAGACCTCCAGGTGGGCGACCTGGTGCTTCGGCT GCGACAAGACGCCCGAGGGCGGCACAAGCTCATGCCTCCCTGGGAAGGGTCGTTCGTCATCGCCAAAGTT CTGAAGCCTGGGACGTACAAGCTGGCCAACAGTCAAGGCGAGGTCTACAGCAACGCTTGGAACATCCGAC AGCTACGTCGCTTCTACCCTTAA Protein sequence: MPFSLRNAGATYQRCMNHMFGEHIGRTVEAYVDDIVVKTRKASDLLSDLEATFRCLKAKGVKLNPEKCVFGVP RGMLLGFIVSERGIEANPEKIAANTSMGPIKDLKGVQRVTGCLAALSRFISRLGERGLPLYRLLRKAECFTWTPEA EEALGNLKALLTNAPILVPPAAGEALLIYVTTTTQVVSAAIVVERREEGHALPVQRPVYFISEVLSETKIRYPQIQKLL YAVILTRRKLRHYFKSHPVTVVSSFPLGEIIQCREASARIAKWAVEIMGETISFAPRKAIKSQVLADFVAEWVDTQL PTAPIQPELWTMFFDGSLMKTGAGAGLLLISPLKKHLRYVLRLHFPASNNVAKYEALVNGLRIAIELGLVIDQVM KNSHCHDPKMEAYCDEVRRLEDKFYGLELNHIARRHNETADELAKIASGRTTKAPGGYTHLLVAIDKFSKWIEVR 
PLNSIRSEQAVAFFTNIIHRFGVPNSIITDNGTQFTGRKFLDFCEDHHIWVDWAAVAHPMTNGQVERANGMIL QGLKPRIYNDLNKFGKRWMKELPSVVWSLRTTLSRATGFTPFFLVYGAETVLPIDLEYGSPRTRAYDDQSNRAN REDSPDQLEEARDMALLHSARYQQSLRRYHARGVRSRDLQVGDLVLRLRQDARGRHKLMPPWEGSFVIAKVL KPGTYKLANSQGEVYSNAWNIRQLRRFYP
Segment 3 82383 - 88664 Forward Strand
Exon 1: 82383 - 83722 1338bp
Exon 2: 84124 - 84298 174bp
Exon 3: 84369 - 85018 684bp
Exon 4: 85130 - 85433 303bp
Exon 5: 85920 - 86500 579bp
Exon 6: 86862 - 87035 174bp
Exon 7: 87327 - 88664 1338bp
RT_LTR[cd01647], RT_LTR: Reverse transcriptases (RTs) from retrotransposons and retroviruses.
RNase_HI_archaeal_like[cd09279], RNAse HI family that includes Archaeal RNase HI;
rve[pfam00665], Integrase core domain;
DUF4370[pfam14290], Domain of unknown function (DUF4370);
RT_DIRS1[cd03714], RT_DIRS1: Reverse transcriptases (RTs) occurring in the DIRS1 group of retransposons.
RVT_1[pfam00078], Reverse transcriptase (RNA-dependent DNA polymerase); A reverse transcriptase gene is usually indicative of a mobile element such as a retrotransposon or retrovirus.
PRK12829[PRK12829], short chain dehydrogenase; Provisional PHA03307[PHA03307], transcriptional regulator ICP4; Provisional >ATGGCGGCCGACAACCCGCCCGCCGGCGGCGGAATCGATGACGTCTTCCCCACGTGGCGGAAGAACGAC ATTCGGGCTTGTCCCGTCCCCTCCCCCGTCGACGGAGGAGGAGGCGGGGCAACCAAGGCCAAGCAGGAG GCGGCACCTCGTCGGCTATCGAGCGAGTCGACGGCGCCGGTGCCCCCAACGAGGGGCGCGATGGGCATCG ACATCGCGTCTGAGACGAAGACGAGCGCCGTCTCCCCGCAACACGCCAACTCCAAGCAAACGGACGACGC CAGCACGCTCGCAAAAGACTTGTTGGGCGTCACCCTCGTACCTGAGACGACGGTGCAGTCTACCCCTGACG TGACTTCGTCACCGCCCGTCGACCAAGACGTACCGACCGATTCCCATCTCGCGCCTTTTGGATTCAGCCTCG ACCCACCAAGCGACTTCGCTTTGGTGGACGCTTTCATAGAGGCGAGTCCAAACCCTCCGGGGTATCGTGTG CGGTCACCCTGGGACCGGCTGACAGCCGTCTCGACCTACGGGCCCTCGGGTTCCGAGGAAGATGACGAGC CCGACTTTTGTTGGGATTTCTCTGGACTTGGTAACCCCAGTGCCATGCGGGACTTCATGACCACATGCGACT ACTGCCTTTCCGACTGTTCCGACGGTAGCCGCAGCCTCGGCGACGAGGACTATGGCCCAAGTCGTGAATGT TTCCACGTCGACCTAGGGGGTCCCGGCGAAGGAAACCATCCTGGTATACCGGAAAATGGTGATCCCCCTAG GCCTGCGCCTCGCGTTGACATCCTACGGGAGCTAGCTGTGGTCCCAGTCCCTGCGGGGGTCAGGACTCACA GCTCGAGCAAATCTGCGAGATGCAGGCCAGGCTCGACGAGGGAGCAGGAACACTTGAGCCGTTCCGCCG GGACATCGGGCAGGAATGGGCAGGCCAACCTCCGGCCGGAGAAGCGCGCCATCTACCCCAGGGCATCCAA CACCGCATCGCCGACGATGTCAGGGCAAGGCCGCCACCGGCCTCCAGTGGGGTCGGCCAGAACCTGGCTG CAGCGGCAATACTTCTCCGCGCGATGCCGGAGCCATCTACCACCGAGGGGCGGCGTATCCAGGGAGAGCTC AAGAATCTCCTGGAGGATGTCGCGGTCCGACGGGCCGAAAGCTCCGCCTCCCGAAGGCAGGGGTACCCCT CGGAACATCGCGCCGCGACTTCCCAATTCATGCGGAAAGCCTCGGTCCACACCGGGCGCACGCGCAACACA GCGCCTGCGGCCCTGGGTCGCCTCGGCAACGAACACCCTCACCGCAACCGTCGAACCCACCTCGACGAGA gggtgcgccgaggctaccaccccaggcgtgggggacgctacgacagcggggaggattggagtccctcgcccgaaccacccggtccgcag gctttcagccgggccatacgacgggcgccgttcccgacccggttccgaaccccgactactatcacaaagtactcgggggagacgagaccgg aactgtggctcgcggactaccggctagcctgccacctgggtggaacagacgatgacaatctcatcatccggaacctccccctgttcctctccg acaccgctcgagcctggctggagcacctgcctccggggcagatctccaactaggacgacctggtccaagccttcgccggcaacttccagggt acgtatgtgtgccctgggaactcctgggatctccaaaGCTGCCGCCAGCAGCCGGGGGAGTCTCTCTGGGACTACATCC 
GGCAATTCTCGAAGCAGCGCACCGAGTTGCCCAATGTCACCGACTCGGATGTCATCGGCGCGTTCCTCGCC GACACCACTTGCCGCGACCTGGTTAGCAAGCTGGGTCGCAAGACCCCCACCAGGGCGAGTGaggtgatggac atcgccaccaagttcgcctctggctaGGATGCGGTTGAGGCCATCTTCCGGAAGGACAAGCAGCCCCAGGGCCGCC CACCGGAAGATGTCCCCGAGGCGTCAACTCAGCGCGGCATCAAGAAGAAAGGCAAGAAGAAGTCGCAAG CAAAACGCGACGCCGCCGATGCGAACTTTGTCGCCGCCGCCGAGTACAAGAACCCTCGGAAACCTCCTGG AGGTGCCAATCTCTTCGACAAGATGCTCAAGGAGCCGTGCCCCTGTCATCAGGGGCCCGTCAAGCACACCC TTGAGGAGTGCGCCATGCTTCGGCGCCACTTTCACAAAGCCGGGCCACCTGCGGAGGGTGGCCGGGCCCG CGACGACGATAAGAAGGAGGATCACAAGGCAGGAGAGTTCCCCGAGGTCCACGACTGCTTCATGATCTAC GGTGGGCAAGTGGCGAACGCCTCGGCTCGGCACCACAAGCAAGAGCGTCGGGAGGTCTGCTCGGTAAAG GTGGCGGCGCCAGTCTACCTAGACTGGTCCGACAAGCCCATCACCTTCGACCAGGGCGACCACCCCGACCG CGTGCCGAGCCTGGGGAAGTACCCGCTCGTTGTCGACCCCGTCATCGGCAACGTCAGGCTCACCAAGGTCC TCATGGACGGAGGCAGCAGCCTCAACGTCATCTACGCCAAGACCCTCGGGCTCCTGCGGATCGATCTGTCCT Cggtacgggcaggagctgcgccttttcacgggatcatccctgggaagcgcgtccagcccctcggacaactcgatctacccgtctgctttggg acaccctccaacttctgaaagGAGACCCTCACGTTCGAGGTGGTCGGGTTTCGAGGAACCTACCACGCAGTGCTG AGGAGGCCATGCTACGCCAAGTTCATGGTCGTCCCCAACTACACCTACCACAAGCTAAAGATGCCAGGCCCC AACGGGGTCATCACCGTCGGCCCCACGTACCGACACGCGTACGAATGCGACGTGGAGTGCATGGAGTACGC CGAGGCCCTCGCCAAATCCGAGGCCCTCATCGCCGACCTGGAGAGCCTCTCCAAGGAGGCGCCAGACGTG AAGCGCCACACCAGCAACTTCGAGCCAACGGAGATGggtaagttcgtccctctcaacaccagcaacgatacctccaagctg atccggatcgggctccgagctcgaccccaaataggaagcagtctcgtcgactttctccgtgcaaacaccgatgtttttgcatggaatccctcgg acatgcccggcataccgagggatgtcgccgagcactcgctggatatccgagctagagcccgacccgtgaagcagcctctgcgccggttcga cgaagaaaagcgcagagccataggcgaggagatccacaagctaatggcggtagggttcatcaaagaggtattccatcccgagtggcttgc caaccctgtgcttgtgagaaagaaaggagggaaatggcgtatgtgtgtagactacactggtctaaacaaagcatgtccaaaagttccctacc ctctgcctcgcatcgatcaaatcgtggattccactgctgggtgcgaaaccctgtctttcctcgatgcctactcagGGTATCGCCAAATCA GGATGAAAGAGTCCGACCAGCTCGCGACTTCTTTCATCACACCTTTCGGCATGTACTGCTATGTTACCATGTC GTTTGGTTTGAGGAATGCGGGTGCGACATACCAAAGGTGCATGAACCACGTGTTCGGCGAACACATTGGTC 
GAACGGTCGAGGCTTACATCGATGACATCGTAGTCAAGACGAGGAAAGCCTCTGACCTCCTTTCCGACCTTG AAACGACATTCTGGTGTCTCAAGGCGAAAGGTGTAAAGCTCAATCCCGAGAAGTGCGTCTTCGGGGTCCCC CAAGGCTTGCTCTTGGGGTTTATCGTCTCCGAGCGGGGCATCGAGGCCAACCCAGAGAAAATCGTGGCCAT CACCAACATGGGGCCCATCAAGGACTTGAAAGGCGTACAGAGGGTCACGGGGTGCCTTGCGGCTCTGAGC CGTTTCATCTCACGCCTCGGCGAAAGAGGCCTGCCTCTGTACCGCCTCTTAAGGAAGGCCGAGTGCTTCACT TGGACCCCTGAGGCCGAGGAAGCCCTCGGGAACCTGAAGGCGCTCCTCACGAACGCGCCCATCTtggtgccc ccgcggccggagaagccctcttgatctacgtcgccgctaccactcaggtggtcagcgccgcgatcgtggttgagagacgagaagagggaca tgcattgcctgtccagaggccagtctacttcgtcagtgaggtactgtccgagaccaagatccgctacccacaaattccgagtctcatccggtga ctgtggtgtcatctttccccctgggggagatcatccagtgccgagaggcctcgggtaggattgcaaagtgggcggtggaaatcatgggcgag acaatctcgttcgccactcgtaaggccataaagtcccaagtcttggcggactttgtggctgaatgggtcgatacccaGCTCCCGACAGC TCCGATCCAACCGGAACTCTGGACCATGTTTTTTGACGGGTCGCTGATGAAGACAGGGGCAGGCGCGGGC CTGCTCTTCATCTCGCCCCTCGGGAAGCACCTACGCTACGTGCTACGCCTCCACTTCCCGGCGTCCAACAATG TGGCCGAGTACGAGGCTCTggtcaacgggttgcgcgtcgccatcgagctagggatccgacgtctcgacgctcgcggtgactcgtagc tcgtcattgactaagtcatgaagaactcccacttctgcgactcgaagatggaagcctactgcgatgaggttcggcgcctggaggacaagttct atgggctcgagttcaaccacatcgcccgacgctacaacgagactgcggacaagctggctaagatagcctcggggcaaacaacggttccccc ggacgtcttctcctgagacctgcatcaaccctccgtcaagACCGACGACACGCCCGAGCCCGAGAAGGCCTCGGCCCAGC CCGAGGCACCCTCGGCCCCCGAGGATGAGGCACTGCGTGTCGAGGAGGAGCGGAGCGGGGTCACGCCTA ATCGAAACTGGCAGACCCCGAACCTGCAATATCTCCACCGAGGAGAGCTACCCCTCGACCGAGCCGAAGCT CGGCGGTTGGCGCGGCGTGCCAAGTCGTTCGTCTTGCTGGGGGACGGGAAGGAGCTCTACCATCGCAGCC CCTCAGGCATCCTCCAGCAATGCATATCCATCACCGAAGGCCAGGAGCTCTTACAAGAAATACACTCGGGGG CTTGCGGGCATCACGCGGCGCCCCGAGCCCTTGTTGGGAACGCCTTCCGACAAGGTTTCTACTGGCCAACC GCGGTGGCCGACGCCACTAGAATTGTTCGCACCTGCCAGGGGTGTCAATTCTACGCAAGGCAGACTCACCT TCCCGCCCAGGCTCTACAGACCATACCCATCACCTGGTCGTTTGCTGTGTGGGGTCTGGACCTCGTCGGCAC CTTGCAGAAGGCACCCGGGGGCTACACGCACCTGCTGGTCGCCATCGACAAATTCTCCAAGTGGATCGAGG TCCGACCCCTAAACAGCATCAGGTCTGAACAGGCGGTGGCGTTCTTCACCAACATCATCCATCGCTTTGGGG 
TCCCGAACTCCATCATCACCGACAACGACACCCAGTTCACCGACAGAAAGTTCCTGGACTTCTGCGAGGATC ACCACATCCGGGTGGACTGGGCCGCCGTGGCTCACCCCATGACGAATGGGCAAGTAGAGCGTGCCAACGG CATGATCCTGCAAGGACTCAAGCCGTGGATCTACAACAACCTTAACAAGTTCGGCAAGCGATGGATGAAGG AGCTCCCCTCGGTGGTCTGGAGTCTGAGGACAACGCCGAGCCGAGCCACGGGCTTCACACCGTTCTTTCTA GTCTATGGGGCCGAGGCCATCTTGCCCATAGACTTAGAATACGGTTCCCCAAGGACGAGGGCCTACAACGA CCAAAGCAATCGAGCTAACCGAGAAGACTCACTGGACCAGCTGGAAGAGGCTCGGAACATGGCCTTCCTA CACTCGGCGCGGTATCAGCAGTCCCTGCGACGCTACCACGCCCGAAGGGTTCGGTCCCGAGACCTCCAGGT GGGCGACTTGGTGCTTCGGCTGCGACAAGACGCCCGAGGGCGGCACAAGCTCACGCCTCCCTGGGAAGG GTCGTTCGTCATCGCCAAGGTTCTGAAGCCCGGGACGTATAAGCTGGCCAACAGTCAAGGCGAGGTCTACA ACAACGCTTGGAACATCCGATAG Protein sequence: MAADNPPAGGGIDDVFPTWRKNDIRACPVPSPVDGGGGGATKAKQEAAPRRLSSESTAPVPPTRGAMGIDIA SETKTSAVSPQHANSKQTDDASTLAKDLLGVTLVPETTVQSTPDVTSSPPVDQDVPTDSHLAPFGFSLDPPSDFA LVDAFIEASPNPPGYRVRSPWDRLTAVSTYGPSGSEEDDEPDFCWDFSGLGNPSAMRDFMTTCDYCLSDCSDG SRSLGDEDYGPSRECFHVDLGGPGEGNHPGIPENGDPPRPAPRVDILRELAVVPVPAGVRTHSSSKSARCRPGS TREQEHLSRSAGTSGRNGQANLRPEKRAIYPRASNTASPTMSGQGRHRPPVGSARTWLQRQYFSARCRSHLP PRGGVSRESSRISWRMSRSDGPKAPPPEGRGTPRNIAPRLPNSCGKPRSTPGARATQRLRPWVASATNTLTATV EPTSTRGCRQQPGESLWDYIRQFSKQRTELPNVTDSDVIGAFLADTTCRDLVSKLGRKTPTRASEDAVEAIFRKD KQPQGRPPEDVPEASTQRGIKKKGKKKSQAKRDAADANFVAAAEYKNPRKPPGGANLFDKMLKEPCPCHQG PVKHTLEECAMLRRHFHKAGPPAEGGRARDDDKKEDHKAGEFPEVHDCFMIYGGQVANASARHHKQERREV CSVKVAAPVYLDWSDKPITFDQGDHPDRVPSLGKYPLVVDPVIGNVRLTKVLMDGGSSLNVIYAKTLGLLRIDLS SETLTFEVVGFRGTYHAVLRRPCYAKFMVVPNYTYHKLKMPGPNGVITVGPTYRHAYECDVECMEYAEALAKSE ALIADLESLSKEAPDVKRHTSNFEPTEMGYRQIRMKESDQLATSFITPFGMYCYVTMSFGLRNAGATYQRCMN HVFGEHIGRTVEAYIDDIVVKTRKASDLLSDLETTFWCLKAKGVKLNPEKCVFGVPQGLLLGFIVSERGIEANPEKI VAITNMGPIKDLKGVQRVTGCLAALSRFISRLGERGLPLYRLLRKAECFTWTPEAEEALGNLKALLTNAPILLPTAPI QPELWTMFFDGSLMKTGAGAGLLFISPLGKHLRYVLRLHFPASNNVAEYEALTDDTPEPEKASAQPEAPSAPED EALRVEEERSGVTPNRNWQTPNLQYLHRGELPLDRAEARRLARRAKSFVLLGDGKELYHRSPSGILQQCISITEG QELLQEIHSGACGHHAAPRALVGNAFRQGFYWPTAVADATRIVRTCQGCQFYARQTHLPAQALQTIPITWSFA 
VWGLDLVGTLQKAPGGYTHLLVAIDKFSKWIEVRPLNSIRSEQAVAFFTNIIHRFGVPNSIITDNDTQFTDRKFLD FCEDHHIRVDWAAVAHPMTNGQVERANGMILQGLKPWIYNNLNKFGKRWMKELPSVVWSLRTTPSRATGF TPFFLVYGAEAILPIDLEYGSPRTRAYNDQSNRANREDSLDQLEEARNMAFLHSARYQQSLRRYHARRVRSRDL QVGDLVLRLRQDARGRHKLTPPWEGSFVIAKVLKPGTYKLANSQGEVYNNAWNIR
Augustus gene prediction
Augustus predicted 13 genes. The predicted genes were translated into peptides, and the peptides were used as BLASTP queries against the Swiss-Prot database. Only 2 of them had significant hits: one belongs to the reverse transcriptase (RT) superfamily, the other to the RNase H superfamily.
Segment 1: 65858 - 67411
CDS: 65858 - 67411 1553bp
RT_LTR[cd01647]: Reverse transcriptases (RTs) from retrotransposons and retroviruses which have long terminal repeats (LTRs) in their DNA copies but not in their RNA template.
RT_Rtv[cd01645]: Reverse transcriptases (RTs) from retroviruses (Rtvs).
RT_ZFREV_like[cd03715]: A subfamily of reverse transcriptases (RTs) found in sequences similar to the intact endogenous retrovirus ZFERV from zebrafish and to Moloney murine leukemia virus RT.
>ATGCCCGGCATACCGAGGGATGTCGCCGAGCACTCGCTGGATATCCGAGCTGGAGCCCGACCCGTGAAGC AGCCTTTGCGCCGATTCGACGAAGAAAAGCGCAGAGCCATAGGCGAGGAGATCCACAAGCTAATGGCGGC AGGGTTCATCAAAGAGGTATTCCACCCCGAATGGCTTGCCAACCCTGTGCTTGTGAGAAAGAAAGGAGGG AAATGGCGGATGTGTGTAGACTACACTGGTCTAAACAAAGCATGTCCGAAAGTTCCCTACCCTCTACCTCGCA TCGATCAAATCGTGGATTCCACTGCTGGGTGCGAAACCCTATCTTTCCTTGATGCCTACTCGGGGTATCACCA GATCAGGATGAAAGAGTCCGACCAGCTCGCGACTTCTTTCATCACACCCTTCGGCATGTACTGTTATGTTACC ATGCCATTCAGTTTGAGGAATGCGGGTGCAACGTACCAACGGTGCATGAACCACATGTTCGGCGAACACATT GGCCGAACGGTCGAGGCCTACGTCGATGACATCGTAGTCAAGACGAGGAAAGCCTCCGACCTCCTTTCCGA CCTTGAAGCGACATTCCGATGTCTCAAGGCGAAAGGCGTGAAGCTCAATCCCGAGAAATGTGTCTTCGGGG TTCCACGAGGCATGCTCTTGGGGTTCATCGTCTCCGAGCGGGGCATCGAGGCCAACCCGGAGAAGATCGC GGCCAACACCAGCATGGGGCCCATCAAGGACTTGAAAGGCGTACAGAGAGTCACAGGATGCCTTGCGGCT CTGAGCCGTTTCATCTCGCGCCTCGGCGAAAGAGGCCTACCTCTGTACCGCCTCTTAAGGAAGGCCGAGTG CTTCACTTGGACCCCTGAGGCCGAGGAAGCCCTCGGGAACCTGAAGGCGCTCCTCACGAACGCGCCCATCT TGGTGCCCCCCGCTGCCGGAGAAGCCCTCTTGATCTACGTCACCACGACCACTCAGGTGGTTAGCGCCGCG ATTGTGGTTGAGAGACGAGAAGAGGGGCATGCATTGCCCGTACAGAGGCCAGTCTACTTCATCAGTGAGGT ACTGTCCGAGACCAAGATCCGCTACCCACAAATTCAGAAGCTGCTGTACGCAGTGATCCTGACACGACGGA AGTTGCGACACTACTTCAAGTCTCATCCGGTGACTGTGGTGTCATCCTTCCCCCTGGGGGAGATCATCCAGT GCCGAGAGGCCTCGGCTAGAATTGCAAAGTGGGCGGTGGAAATCATGGGCGAGACGATCTCGTTCGCCCC TCGGAAGGCCATCAAGTCCCAGGTCTTGGCGGACTTTGTGGCTGAATGGGTCGACACCCAGCTCCCAACAG CTCCGATCCAACCGGAACTCTGGACCATGTTTTTCGACGGGTCACTGATGAAGACAGGAGCAGGCGCAGG CCTGCTCTTGATCTCGCCCCTCAAGAAGCACCTACGCTACGTGCTACGCCTCCACTTCCCGGCGTCCAACAAT GTGGCTAAGTACGAGGCTCTAGTCAACGGGTTGCGCATCGCCATCGAGCTGGGGGTCTGA Protein sequence: MPGIPRDVAEHSLDIRAGARPVKQPLRRFDEEKRRAIGEEIHKLMAAGFIKEVFHPEWLANPVLVRKKGGKWR MCVDYTGLNKACPKVPYPLPRIDQIVDSTAGCETLSFLDAYSGYHQIRMKESDQLATSFITPFGMYCYVTMPFSL RNAGATYQRCMNHMFGEHIGRTVEAYVDDIVVKTRKASDLLSDLEATFRCLKAKGVKLNPEKCVFGVPRGMLL GFIVSERGIEANPEKIAANTSMGPIKDLKGVQRVTGCLAALSRFISRLGERGLPLYRLLRKAECFTWTPEAEEALG NLKALLTNAPILVPPAAGEALLIYVTTTTQVVSAAIVVERREEGHALPVQRPVYFISEVLSETKIRYPQIQKLLYAVILT 
RRKLRHYFKSHPVTVVSSFPLGEIIQCREASARIAKWAVEIMGETISFAPRKAIKSQVLADFVAEWVDTQLPTAPI QPELWTMFFDGSLMKTGAGAGLLLISPLKKHLRYVLRLHFPASNNVAKYEALVNGLRIAIELG Segment 2: 86898 --- 88664 2 exons 1 CDS 2 CDS 86898---87090 87304---88664 192bp 1360bp RNase_HI_archaeal_like[cd09279], RNAse HI family that includes Archaeal RNase HI RVT_3[pfam13456], Reverse transcriptase-like; This domain is found in plants and appears to be part of a retrotransposon. RNase_H[cd06222], RNase H is an endonuclease that cleaves the RNA strand of an RNA/DNA hybrid in a sequence non-specific manner RnhA[COG0328], Ribonuclease HI [DNA replication, recombination, and repair] PRK07238[PRK07238], bifunctional RNase H/acid phosphatase; Provisional PRK07708[PRK07708], hypothetical protein; Validated >ATGTTTTTTGACGGGTCGCTGATGAAGACAGGGGCAGGCGCGGGCCTGCTCTTCATCTCGCCCCTCGGGA AGCACCTACGCTACGTGCTACGCCTCCACTTCCCGGCGTCCAACAATGTGGCCGAGTACGAGGCTCTGGTCA ACGGGTTGCGCGTCGCCATCGAGCTAGGGATCCGACGTCTCGACGCTCGCggtgactcgtagctcgtcattgactaag tcatgaagaactcccacttctgcgactcgaagatggaagcctactgcgatgaggttcggcgcctggaggacaagttctatgggctcgagttca accacatcgcccgacgctacaacgagactgcggacaagctggctaagatagcctcggggcaaacaacggttcccccggacgtcttctcctg agaCCTGCATCAACCCTCCGTCAAGACCGACGACACGCCCGAGCCCGAGAAGGCCTCGGCCCAGCCCGAGG CACCCTCGGCCCCCGAGGATGAGGCACTGCGTGTCGAGGAGGAGCGGAGCGGGGTCACGCCTAATCGAA ACTGGCAGACCCCGAACCTGCAATATCTCCACCGAGGAGAGCTACCCCTCGACCGAGCCGAAGCTCGGCGG TTGGCGCGGCGTGCCAAGTCGTTCGTCTTGCTGGGGGACGGGAAGGAGCTCTACCATCGCAGCCCCTCAG GCATCCTCCAGCAATGCATATCCATCACCGAAGGCCAGGAGCTCTTACAAGAAATACACTCGGGGGCTTGCG GGCATCACGCGGCGCCCCGAGCCCTTGTTGGGAACGCCTTCCGACAAGGTTTCTACTGGCCAACCGCGGTG GCCGACGCCACTAGAATTGTTCGCACCTGCCAGGGGTGTCAATTCTACGCAAGGCAGACTCACCTTCCCGCC CAGGCTCTACAGACCATACCCATCACCTGGTCGTTTGCTGTGTGGGGTCTGGACCTCGTCGGCACCTTGCAG AAGGCACCCGGGGGCTACACGCACCTGCTGGTCGCCATCGACAAATTCTCCAAGTGGATCGAGGTCCGACC CCTAAACAGCATCAGGTCTGAACAGGCGGTGGCGTTCTTCACCAACATCATCCATCGCTTTGGGGTCCCGAA CTCCATCATCACCGACAACGACACCCAGTTCACCGACAGAAAGTTCCTGGACTTCTGCGAGGATCACCACAT 
CCGGGTGGACTGGGCCGCCGTGGCTCACCCCATGACGAATGGGCAAGTAGAGCGTGCCAACGGCATGATC CTGCAAGGACTCAAGCCGTGGATCTACAACAACCTTAACAAGTTCGGCAAGCGATGGATGAAGGAGCTCCC CTCGGTGGTCTGGAGTCTGAGGACAACGCCGAGCCGAGCCACGGGCTTCACACCGTTCTTTCTAGTCTATG GGGCCGAGGCCATCTTGCCCATAGACTTAGAATACGGTTCCCCAAGGACGAGGGCCTACAACGACCAAAGC AATCGAGCTAACCGAGAAGACTCACTGGACCAGCTGGAAGAGGCTCGGAACATGGCCTTCCTACACTCGGC GCGGTATCAGCAGTCCCTGCGACGCTACCACGCCCGAAGGGTTCGGTCCCGAGACCTCCAGGTGGGCGAC TTGGTGCTTCGGCTGCGACAAGACGCCCGAGGGCGGCACAAGCTCACGCCTCCCTGGGAAGGGTCGTTCG TCATCGCCAAGGTTCTGAAGCCCGGGACGTATAAGCTGGCCAACAGTCAAGGCGAGGTCTACAACAACGCT TGGAACATCCGATAG protein sequence: MFFDGSLMKTGAGAGLLFISPLGKHLRYVLRLHFPASNNVAEYEALVNGLRVAIELGIRRLDARDLHQPSVKTDD TPEPEKASAQPEAPSAPEDEALRVEEERSGVTPNRNWQTPNLQYLHRGELPLDRAEARRLARRAKSFVLLGDG KELYHRSPSGILQQCISITEGQELLQEIHSGACGHHAAPRALVGNAFRQGFYWPTAVADATRIVRTCQGCQFYAR QTHLPAQALQTIPITWSFAVWGLDLVGTLQKAPGGYTHLLVAIDKFSKWIEVRPLNSIRSEQAVAFFTNIIHRFGV PNSIITDNDTQFTDRKFLDFCEDHHIRVDWAAVAHPMTNGQVERANGMILQGLKPWIYNNLNKFGKRWMK ELPSVVWSLRTTPSRATGFTPFFLVYGAEAILPIDLEYGSPRTRAYNDQSNRANREDSLDQLEEARNMAFLHSAR YQQSLRRYHARRVRSRDLQVGDLVLRLRQDARGRHKLTPPWEGSFVIAKVLKPGTYKLANSQGEVYNNAWNI R
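
The step "the predicted genes were then translated into peptides" can be sketched in a few lines of Python. This is only an illustration, not the tool actually used for this report. It assumes that uppercase letters in the listings above are exons and lowercase letters are introns, that exon coordinates are inclusive at both ends, and that the standard genetic code applies. The function names (exon_length, spliced_cds, translate) are hypothetical helpers introduced here.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def exon_length(start, end):
    """Length in bp of an exon, assuming both coordinates are inclusive."""
    return end - start + 1


def spliced_cds(listing):
    """Keep only uppercase bases (assumed exons); drop lowercase introns,
    whitespace and the leading '>' marker."""
    return "".join(ch for ch in listing if ch.isupper())


def translate(cds):
    """Translate a coding sequence codon by codon, stopping at the first stop codon.
    Codons containing anything other than A, C, G, T become 'X'."""
    peptide = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE.get(cds[i:i + 3], "X")
        if aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)


if __name__ == "__main__":
    # First 18 bases of the FGENESH Segment 1 CDS.
    print(translate("ATGGCGACCGACAACTCG"))  # MATDNS
    # Exon junction of Segment 1 with a shortened, illustrative intron.
    print(translate(spliced_cds("GATGTCAGGgtaaggccgccaggcagGGGTACCCC")))  # DVRGYP
    # First exon of FGENESH Segment 2 (66287 - 67405).
    print(exon_length(66287, 67405))  # 1119
```

With inclusive coordinates, exon_length(66287, 67405) gives 1119, matching the 1119bp listed for that exon. For 62665 - 63676 it gives 1012 where the listing reports 1011bp, which may reflect a different counting convention or a transcription slip.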