DUBLIN INSTITUTE OF TECHNOLOGY
KEVIN STREET
DUBLIN 8

BSc. (Honours) Degree in Computing
Year 4

Sample Paper

Bioinformatics 1

Mr. Denis Manley

TBA

Answer Question 1 and two other questions.
Question 1 is worth 40 marks. Questions 2, 3, 4 and 5 are worth 30 marks each.

Q1
(a) Explain, using examples, how dynamic arrays, pattern matching/extraction and substitution, hash tables and two other functions can be used in the field of bioinformatics. (10 marks)

(b) Given the following file format (refer to the appendix for a sample):

    > gene id, gene name, organism, date of sequence, number of bases
    DNA sequence [each line is 60 characters in length]

Given two such FASTA files called:
1. Elastase_Human.fasta
2. Elastase_Monkey.fasta

write Perl scripts that:
i. Print the contents of the descriptor line of each file: the gene id, gene name, organism, date of sequence and number of bases. (6 marks)
ii. Determine the percentage match between the sequences in both files, stating any assumptions you make. (9 marks)
iii. Determine the length of the largest ORF in the first reading frame of both of the above FASTA files, stating any assumptions you make. (15 marks)

Q2
(a) Describe, using suitable examples, the two steps of the "Central Dogma" of genetics involved in converting a DNA strand into its corresponding amino acid strand. (10 marks)
(b) Describe, using a suitable example, the three types of point mutation and how they affect the conversion of a DNA sequence into an amino acid sequence. (10 marks)
(c) Explain why certain (genotypic) substitution mutations have no (phenotypic) impact on the organism. (10 marks)

Q3
(a) Distinguish, using suitable examples, between inducible and repressible feedback transcriptional control in prokaryotic organisms. (15 marks)
(b) Explain, using a suitable example, any two types of alternative splicing and their impact on the resulting amino acid sequence. (15 marks)

Q4
(a) Explain the steps involved in attempting to find ORFs in DNA sequences. (10 marks)
(b) ORFs in eukaryotic coding sequences can occur over different reading frames. Explain, using suitable examples, how this impacts on finding exons in eukaryotic DNA sequences. (12 marks)
(c) Explain two techniques that can be used to find the promoter sequences of genes. (8 marks)

Q5
(a) What are the three types of "pair-wise" matching that can occur in nucleotide sequences? (6 marks)
(b) Explain how the "shot-gun" approach can be used to help determine the sequence of an organism's genome. (12 marks)
(c) Explain, using a suitable example, how a dot plot matrix can be used to align pairs of sequences. (8 marks)
(d) What are the two important considerations in the reconstruction of shot-gun sequences? (4 marks)

Appendix 1

The codon translation hash table:

%conversion_code = (
    "TTT" => "F", "TCT" => "S", "TAT" => "Y", "TGT" => "C",
    "TTC" => "F", "TCC" => "S", "TAC" => "Y", "TGC" => "C",
    "TTA" => "L", "TCA" => "S", "TAA" => "_", "TGA" => "_",
    "TTG" => "L", "TCG" => "S", "TAG" => "_", "TGG" => "W",
    "CTT" => "L", "CCT" => "P", "CAT" => "H", "CGT" => "R",
    "CTC" => "L", "CCC" => "P", "CAC" => "H", "CGC" => "R",
    "CTA" => "L", "CCA" => "P", "CAA" => "Q", "CGA" => "R",
    "CTG" => "L", "CCG" => "P", "CAG" => "Q", "CGG" => "R",
    "ATT" => "I", "ACT" => "T", "AAT" => "N", "AGT" => "S",
    "ATC" => "I", "ACC" => "T", "AAC" => "N", "AGC" => "S",
    "ATA" => "I", "ACA" => "T", "AAA" => "K", "AGA" => "R",
    "ATG" => "M", "ACG" => "T", "AAG" => "K", "AGG" => "R",
    "GTT" => "V", "GCT" => "A", "GAT" => "D", "GGT" => "G",
    "GTC" => "V", "GCC" => "A", "GAC" => "D", "GGC" => "G",
    "GTA" => "V", "GCA" => "A", "GAA" => "E", "GGA" => "G",
    "GTG" => "V", "GCG" => "A", "GAG" => "E", "GGG" => "G"
);

Sample FASTA file (only a partial DNA sequence is shown):

>gi|171361, E. Coli, gamma-lyase(CYS3) gene, 5124bp
GCAGCGCACGACAGCTGTGCTATCCCGGCGAGCCCGTGGCAGAGGACCTCGCTTGCGAAA
GCTACAGAGCCAACCCGGTGGACAAACTCGAAGTCATTGTGGACCGAATGAGGCTCAATAA
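A minimal sketch of one possible answer to Q1(b)(i), assuming the descriptor is the first line beginning with ">" and that its fields are comma-separated in the order the question gives (gene id, gene name, organism, date of sequence, number of bases). The subroutine names read_descriptor and parse_descriptor are hypothetical. The sample file in the appendix has only four fields, in a different order, so the labels would need adjusting for a file laid out like that.

```perl
use strict;
use warnings;

# Field labels in the order stated in Q1(b). This order is an assumption
# taken from the question, not from any FASTA standard.
my @FIELDS = ('Gene id', 'Gene name', 'Organism', 'Date of sequence', 'Number of bases');

# Return the text of the first ">" line of a FASTA file, without the ">".
sub read_descriptor {
    my ($filename) = @_;
    open(my $fh, '<', $filename) or die "Cannot open $filename: $!";
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^>\s*(.*)$/) {
            my $descriptor = $1;
            close $fh;
            return $descriptor;
        }
    }
    close $fh;
    die "No descriptor line found in $filename\n";
}

# Split a descriptor on commas and trim whitespace around each field.
sub parse_descriptor {
    my ($descriptor) = @_;
    return map { s/^\s+|\s+$//gr } split /,/, $descriptor;
}

# Print each labelled field for every file named on the command line.
foreach my $file (@ARGV) {
    my @values = parse_descriptor(read_descriptor($file));
    print "$file\n";
    for my $i (0 .. $#FIELDS) {
        my $value = defined $values[$i] ? $values[$i] : '(missing)';
        print "  $FIELDS[$i]: $value\n";
    }
}
```

It would be run as, for example, `perl descriptor.pl Elastase_Human.fasta Elastase_Monkey.fasta`, where the script name is hypothetical.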
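For Q1(b)(ii), a minimal sketch under the simplest set of assumptions: the two sequences are compared base by base from the first position, with no gaps and no alignment, over the length of the shorter sequence, and the percentage is taken relative to that length. The names read_sequence and percent_identity are hypothetical.

```perl
use strict;
use warnings;

# Read every sequence line of a FASTA file (skipping ">" descriptor lines)
# and join them into one upper-case string with all whitespace removed.
sub read_sequence {
    my ($filename) = @_;
    open(my $fh, '<', $filename) or die "Cannot open $filename: $!";
    my $seq = '';
    while (my $line = <$fh>) {
        next if $line =~ /^>/;
        $line =~ s/\s+//g;
        $seq .= uc $line;
    }
    close $fh;
    return $seq;
}

# Percentage of positions holding the same base, compared position by
# position (assumption: no gaps) over the length of the shorter sequence.
sub percent_identity {
    my ($s1, $s2) = @_;
    my $len = length($s1) < length($s2) ? length($s1) : length($s2);
    return 0 if $len == 0;
    my $matches = 0;
    for my $i (0 .. $len - 1) {
        $matches++ if substr($s1, $i, 1) eq substr($s2, $i, 1);
    }
    return 100 * $matches / $len;
}

# Expects exactly two FASTA files, e.g. the human and monkey elastase files.
if (@ARGV == 2) {
    my ($human, $monkey) = map { read_sequence($_) } @ARGV;
    printf "Matching: %.2f%%\n", percent_identity($human, $monkey);
}
```

Because nothing is aligned, a single insertion or deletion near the start shifts every later base and sharply lowers the score; aligning the sequences first (for example with a dot plot, as in Q5(c)) would give a more meaningful figure.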
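A minimal sketch for Q1(b)(iii), assuming that frame 1 means codons read from the first base, that an ORF runs from an ATG start codon up to and including the first in-frame stop codon (TAA, TAG or TGA), and that an ORF still open at the end of the sequence is not counted. The subroutine name longest_orf_frame1 is hypothetical.

```perl
use strict;
use warnings;

# Standard stop codons.
my %STOP = (TAA => 1, TAG => 1, TGA => 1);

# Length in bases of the longest ORF in reading frame 1, counting from the
# ATG up to and including the stop codon. Returns 0 if no complete ORF exists.
sub longest_orf_frame1 {
    my ($seq) = @_;
    $seq = uc $seq;
    my $longest = 0;
    my $start;    # index of the current ATG; undef when outside an ORF
    for (my $i = 0; $i + 3 <= length($seq); $i += 3) {
        my $codon = substr($seq, $i, 3);
        if (!defined $start) {
            $start = $i if $codon eq 'ATG';
        } elsif ($STOP{$codon}) {
            my $len = $i + 3 - $start;
            $longest = $len if $len > $longest;
            undef $start;
        }
    }
    return $longest;
}

foreach my $file (@ARGV) {
    open(my $fh, '<', $file) or die "Cannot open $file: $!";
    # Drop descriptor lines and join the sequence lines without whitespace.
    my $seq = join '', map { s/\s+//gr } grep { !/^>/ } <$fh>;
    close $fh;
    printf "%s: longest ORF in frame 1 = %d bases\n", $file, longest_orf_frame1($seq);
}
```

An ATG met while an ORF is already open is treated as part of that longer ORF, not as a new start, so nested ORFs never exceed the one that contains them.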
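The appendix table can be used to translate DNA codon by codon, which illustrates the translation step asked about in Q2(a). This sketch builds an equivalent %conversion_code hash from the standard genetic code instead of typing out all 64 entries. It uses "_" for stop codons as the appendix does and maps ATG to M (methionine); the subroutine name translate_dna and the use of "X" for unrecognised codons are illustrative choices, not part of the paper.

```perl
use strict;
use warnings;

# Amino acids for all 64 codons in TCAG order (TTT, TTC, TTA, TTG, TCT, ...),
# following the standard genetic code; "_" marks the stop codons TAA, TAG, TGA.
my $amino_acids = 'FFLLSSSSYY__CC_WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
my @bases = qw(T C A G);
my %conversion_code;
my $index = 0;
for my $first (@bases) {
    for my $second (@bases) {
        for my $third (@bases) {
            $conversion_code{"$first$second$third"} = substr($amino_acids, $index++, 1);
        }
    }
}

# Translate DNA codon by codon from the first base (frame 1). Trailing bases
# that do not form a full codon are ignored; unknown codons (e.g. containing N)
# are translated as "X".
sub translate_dna {
    my ($dna) = @_;
    $dna = uc $dna;
    my $protein = '';
    for (my $i = 0; $i + 3 <= length($dna); $i += 3) {
        my $codon = substr($dna, $i, 3);
        $protein .= exists $conversion_code{$codon} ? $conversion_code{$codon} : 'X';
    }
    return $protein;
}
```

Building the hash from a 64-character string keeps it short, but the explicit table in the appendix is easier to check by eye; both give the same mapping.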
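For Q5(c), a minimal sketch of a dot plot: the matrix holds a 1 wherever base i of the first sequence equals base j of the second, and an unbroken run of 1s along a diagonal marks a region where the two sequences align. This version compares single bases with no window or filtering, so short chance matches also show up as scattered dots. The subroutine names dot_plot and print_dot_plot and the example sequences are illustrative.

```perl
use strict;
use warnings;

# Build the dot plot matrix: $matrix->[$i][$j] is 1 when base $i of $seq1
# equals base $j of $seq2, and 0 otherwise.
sub dot_plot {
    my ($seq1, $seq2) = @_;
    my @matrix;
    for my $i (0 .. length($seq1) - 1) {
        my $base = substr($seq1, $i, 1);
        $matrix[$i] = [ map { substr($seq2, $_, 1) eq $base ? 1 : 0 } 0 .. length($seq2) - 1 ];
    }
    return \@matrix;
}

# Print the matrix with $seq2 across the top and $seq1 down the side,
# using '*' for a match and '.' for a mismatch.
sub print_dot_plot {
    my ($seq1, $seq2) = @_;
    my $matrix = dot_plot($seq1, $seq2);
    print '  ', join(' ', split(//, $seq2)), "\n";
    for my $i (0 .. length($seq1) - 1) {
        my $row = join ' ', map { $_ ? '*' : '.' } @{ $matrix->[$i] };
        print substr($seq1, $i, 1), ' ', $row, "\n";
    }
}

# Example: two sequences differing at one position.
print_dot_plot('GATTACA', 'GATCACA');
```

In the printed example the main diagonal is unbroken except at the mismatched fourth base, which is how a dot plot reveals both the alignment and the point where the sequences differ.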