Sample Paper - Dublin Institute of Technology

DUBLIN INSTITUTE OF TECHNOLOGY
KEVIN STREET DUBLIN 8
________________
BSc. (Honours) Degree in Information Technology
Year 4
_______________
Sample Paper
____________
Bioinformatics
Mr. Denis Manley
TBA
Answer question 1 and two other questions.
Question 1 is worth 40 marks.
Questions 2, 3, and 4 are worth 30 marks each.
Q1
(a)
Explain, using examples, how dynamic arrays, pattern matching/extraction and
substitution, hash tables, and two other functions can be used in the field of bioinformatics.
(10 marks)
(b)
Given the following file format (refer to the appendix for a sample):
> gene id, gene name, organism, date of sequence, number of bases
DNA sequences [each line is 60 characters in length]
Given two such fasta files called:
1. Elastase_Human.fasta
2. Elastase_Primate.fasta
write Perl scripts that:
i.
Print the contents of the descriptor line of each file: the gene id, the gene name, the
organism, the date of sequence, and the number of bases.
(6 marks)
ii.
Determine the percentage match between the two sequences, stating any assumptions you make.
(9 marks)
iii.
Determine and print the length of each open reading frame (ORF) in the first
reading frame of both of the above FASTA files, stating any assumptions you make.
(15 marks)
Q 2. (a) Describe, using suitable examples, the two steps of the “Central Dogma of
genetics” that convert a DNA strand into its corresponding amino acid strand.
(10 marks)
(b)
Describe, using a suitable example, the three types of point mutations and how they
affect the conversion of a DNA sequence into an amino acid sequence.
(10 marks)
(c)
Explain why certain (genotype) substitution mutations have no (phenotypic) impact on
the organism.
(10 marks)
Q 3. (a)
Distinguish, using suitable examples, between inducible and repressible
feedback transcriptional control in prokaryotic organisms.
(15 marks)
(b)
Explain what you understand by the term Alternative Splicing in
(c)
Explain, using a suitable example, any two types of alternative splicing and their
impact on the resulting amino acid sequence.
(10 marks)
Q 4. (a)
Explain the steps involved in attempting to find ORFs in DNA sequences.
(5 marks)
(10 marks)
(b)
ORFs in eukaryotic coding sequences can occur over different reading frames. Explain,
using suitable examples, how this affects finding exons in eukaryotic DNA
sequences.
(12 marks)
(c)
How could the dot plot matrix be used to facilitate finding true ORFs? [You can
assume that the experimental amino acid sequence is known.]
(8 marks)
Q 5. (a)
What is a pairwise sequence alignment score?
(3 marks)
(b)
Explain, using suitable examples, the significance of the three main sets of values in the
P.A.M. or Blosum matrix.
(9 marks)
(c)
Distinguish between the numbers associated with each matrix, e.g. Blosum 80 and
P.A.M. 180.
(10 marks)
(d)
The P.A.M. and Blosum matrices can have different numbers associated
with them. Discuss the most appropriate type of sequence similarity analysis for any two
of the following matrices: P.A.M. 40, P.A.M. 250, Blosum 80, and Blosum
62.
(8 marks)
Appendix 1
The codon translation hash table:
%conversion_code = (
"TTT"=> "F", "TCT"=> "S", "TAT"=> "Y", "TGT"=> "C",
"TTC"=> "F", "TCC"=> "S", "TAC"=> "Y", "TGC"=> "C",
"TTA"=> "L", "TCA"=> "S", "TAA"=> "_", "TGA"=> "_",
"TTG"=> "L", "TCG"=> "S", "TAG"=> "_", "TGG"=> "W",
"CTT"=> "L", "CCT"=> "P", "CAT"=> "H", "CGT"=> "R",
"CTC"=> "L", "CCC"=> "P", "CAC"=> "H", "CGC"=> "R",
"CTA"=> "L", "CCA"=> "P", "CAA"=> "Q", "CGA"=> "R",
"CTG"=> "L", "CCG"=> "P", "CAG"=> "Q", "CGG"=> "R",
"ATT"=> "I", "ACT"=> "T", "AAT"=> "N", "AGT"=> "S",
"ATC"=> "I", "ACC"=> "T", "AAC"=> "N", "AGC"=> "S",
"ATA"=> "I", "ACA"=> "T", "AAA"=> "K", "AGA"=> "R",
"ATG"=> "M", "ACG"=> "T", "AAG"=> "K", "AGG"=> "R",
"GTT"=> "V", "GCT"=> "A", "GAT"=> "D", "GGT"=> "G",
"GTC"=> "V", "GCC"=> "A", "GAC"=> "D", "GGC"=> "G",
"GTA"=> "V", "GCA"=> "A", "GAA"=> "E", "GGA"=> "G",
"GTG"=> "V", "GCG"=> "A", "GAG"=> "E", "GGG"=> "G"
);
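As an illustration of how such a hash table might be used, the following minimal sketch translates a short DNA string codon by codon in the first reading frame. It is not part of the examination questions. The helper name translate_frame1, the five-entry excerpt of the table, and the test sequence are all hypothetical, and the excerpt assumes the standard genetic code, in which ATG encodes methionine (M) and "_" marks a stop codon.

```perl
use strict;
use warnings;

# Excerpt of the codon table (assumption: standard genetic code; "_" = stop).
my %conversion_code = (
    "ATG" => "M", "GCA" => "A", "TTT" => "F",
    "GGA" => "G", "TAA" => "_",
);

# Hypothetical helper: translate a DNA string in reading frame 1.
sub translate_frame1 {
    my ($dna) = @_;
    $dna = uc $dna;                      # accept lower-case input
    my $protein = "";
    # Step through the sequence three bases at a time;
    # a trailing partial codon is ignored.
    for (my $i = 0; $i + 3 <= length($dna); $i += 3) {
        my $codon = substr($dna, $i, 3);
        # Codons missing from the excerpt are shown as "?".
        $protein .= exists $conversion_code{$codon}
                  ? $conversion_code{$codon}
                  : "?";
    }
    return $protein;
}

print translate_frame1("ATGGCATTTGGATAA"), "\n";   # prints MAFG_
```

With the full 64-entry table in place of the excerpt, the "?" fallback would only be reached for codons containing characters other than A, C, G, and T.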
Sample FASTA file (only a partial DNA sequence is shown)
>gi|171361, homo sapiens, gamma-lyase(CYS3) gene, 5124bp
GCAGCGCACGACAGCTGTGCTATCCCGGCGAGCCCGTGGCAGAGGACCTCGCTTGCGAAA
GCTACAGAGCCAACCCGGTGGACAAACTCGAAGTCATTGTGGACCGAATGAGGCTCAATAA