DUBLIN INSTITUTE OF TECHNOLOGY
KEVIN STREET, DUBLIN 8
________________

BSc. (Honours) Degree in Information Technology
Year 4
_______________

Sample Paper
____________

Bioinformatics

Mr. Denis Manley

TBA

Answer question 1 and two other questions.
Question 1 is worth 40 marks.
Questions 2, 3, 4 and 5 are worth 30 marks each.

Q 1.
(a) Explain, using examples, how dynamic arrays, pattern matching/extraction and substitution, hash tables and two other functions can be used in the field of bioinformatics.
(10 marks)

(b) Given the following file format (refer to the appendix for a sample):

    > gene id, gene name, organism, date of sequence, number of bases
    DNA sequences [each line is 60 characters in length]

    Given two such fasta files called:
    1. Elastase_Human.fasta
    2. Elastase_Primate.fasta

    write Perl scripts that:

    i. Print the contents of the descriptor line of each file: gene id, gene name, organism, date of sequence and number of bases.
    (6 marks)

    ii. Determine the % of matching between both sequences, stating any assumptions you make.
    (9 marks)

    iii. Determine and print the length of all the open reading frames (ORFs) in the first reading frame of both of the above fasta files, stating any assumptions you make.
    (15 marks)

Q 2.
(a) Describe, using suitable examples, the two steps involved in the “Central Dogma of genetics” that convert a DNA strand into its corresponding amino acid strand.
(10 marks)

(b) Describe, using a suitable example, the three types of point mutation and how they affect the conversion of a DNA sequence into an amino acid sequence.
(10 marks)

(c) Explain why certain (genotype) substitution mutations have no (phenotypic) impact on the organism.
(10 marks)

Q 3.
(a) Distinguish, using suitable examples, between inducible and repressible feedback transcriptional control in prokaryotic organisms.
(15 marks)

(b) Explain what you understand by the term Alternative Splicing in
(5 marks)

(c) Explain, using a suitable example, any two types of alternative splicing and their impact on the resulting amino acid sequence.
(10 marks)

Q 4.
(a) Explain the steps involved in attempting to find ORFs in DNA sequences.
(10 marks)

(b) ORFs in eukaryotic coding sequences can occur over different reading frames. Explain, using suitable examples, how this affects finding exons in eukaryotic DNA sequences.
(12 marks)

(c) How could the dot plot matrix be used to facilitate finding true ORFs? [You can assume that the experimental amino acid sequence is known.]
(8 marks)

Q 5.
(a) What is a pairwise sequence alignment score?
(3 marks)

(b) Explain, using suitable examples, the significance of the three main sets of values in the P.A.M. or Blosum matrix.
(9 marks)

(c) Distinguish between the numbers associated with each matrix, e.g. Blosum 80 and P.A.M. 180.
(10 marks)

(d) The P.A.M. and Blosum matrices can have different numbers associated with them. Discuss the most appropriate type of sequence similarity analysis for any two of the following matrices: P.A.M. 40, P.A.M. 250, Blosum 80 and Blosum 62.
(8 marks)

Appendix 1

The codon translation hash table:

# Maps each DNA codon to its one-letter amino acid code.
# "_" marks a stop codon; ATG (methionine, "M") is also the start codon.
my %conversion_code = (
    "TTT" => "F", "TCT" => "S", "TAT" => "Y", "TGT" => "C",
    "TTC" => "F", "TCC" => "S", "TAC" => "Y", "TGC" => "C",
    "TTA" => "L", "TCA" => "S", "TAA" => "_", "TGA" => "_",
    "TTG" => "L", "TCG" => "S", "TAG" => "_", "TGG" => "W",
    "CTT" => "L", "CCT" => "P", "CAT" => "H", "CGT" => "R",
    "CTC" => "L", "CCC" => "P", "CAC" => "H", "CGC" => "R",
    "CTA" => "L", "CCA" => "P", "CAA" => "Q", "CGA" => "R",
    "CTG" => "L", "CCG" => "P", "CAG" => "Q", "CGG" => "R",
    "ATT" => "I", "ACT" => "T", "AAT" => "N", "AGT" => "S",
    "ATC" => "I", "ACC" => "T", "AAC" => "N", "AGC" => "S",
    "ATA" => "I", "ACA" => "T", "AAA" => "K", "AGA" => "R",
    "ATG" => "M", "ACG" => "T", "AAG" => "K", "AGG" => "R",
    "GTT" => "V", "GCT" => "A", "GAT" => "D", "GGT" => "G",
    "GTC" => "V", "GCC" => "A", "GAC" => "D", "GGC" => "G",
    "GTA" => "V", "GCA" => "A", "GAA" => "E", "GGA" => "G",
    "GTG" => "V", "GCG" => "A", "GAG" => "E", "GGG" => "G",
);

Sample Fasta file (only partial DNA sequence is shown)

>gi|171361, homo sapiens, gamma-lyase(CYS3) gene, 5124bp
GCAGCGCACGACAGCTGTGCTATCCCGGCGAGCCCGTGGCAGAGGACCTCGCTTGCGAAA
GCTACAGAGCCAACCCGGTGGACAAACTCGAAGTCATTGTGGACCGAATGAGGCTCAATAA