Bioinformatics 1 - Computing - Dublin Institute of Technology

DUBLIN INSTITUTE OF TECHNOLOGY
KEVIN STREET DUBLIN 8
________________
BSc. (Honours) Degree in Computing
Year 4
_______________
Sample Paper
____________
Bioinformatics 1
Mr. Denis Manley
TBA
Answer question 1 and two other questions.
Question 1 is worth 40 marks.
Questions 2, 3, 4 and 5 are worth 30 marks each.
Q1
(a)
Explain, using examples, how dynamic arrays, pattern matching/extraction and
substitution, hash tables and two other functions can be used in the field of bioinformatics.
(10 marks)
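As a hedged illustration for Q1(a), the Perl sketch below applies a dynamic array, pattern matching/extraction, substitution and a hash table to a short made-up DNA string. `reverse` and `tr///` are offered as two possible "other functions". The sequence and all variable names are illustrative and are not part of the paper.

```perl
use strict;
use warnings;

my $dna = 'ATGGCGTATAAAGCGTGA';   # illustrative sequence, not real data

# Dynamic array: grow a list of codons one element at a time with push
my @codons;
for (my $i = 0; $i + 3 <= length $dna; $i += 3) {
    push @codons, substr($dna, $i, 3);
}

# Pattern matching/extraction: record where each (non-overlapping) GCG motif starts
my @motif_positions;
while ($dna =~ /GCG/g) {
    push @motif_positions, pos($dna) - 3;   # pos() is the offset just after the match
}

# Substitution: transcribe DNA to mRNA by replacing every T with U
(my $mrna = $dna) =~ s/T/U/g;

# Hash table: count how often each base occurs
my %base_count;
$base_count{$_}++ for split //, $dna;

# Two further built-ins: reverse plus tr/// give the reverse complement
(my $revcomp = reverse $dna) =~ tr/ACGT/TGCA/;

print "Codons: @codons\n";
print "GCG found at: @motif_positions\n";
print "mRNA: $mrna\n";
print "Base counts: ", join(', ', map { "$_=$base_count{$_}" } sort keys %base_count), "\n";
print "Reverse complement: $revcomp\n";
```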
(b)
Given the following file format (refer to the appendix for a sample):
>gene id, gene name, organism, date of sequence, number of bases
DNA sequences [each line is 60 characters in length]
Given two such fasta files called:
1. Elastase_Human.fasta
2. Elastase_Monkey.fasta
write Perl scripts that:
i.
Print the contents of the descriptor line of each file: the gene id, gene name,
organism, date of sequence and number of bases.
(6 marks)
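A minimal sketch for part (i), assuming the descriptor fields are comma-separated in exactly the order given above. The appendix sample uses a different order and omits the date, so the field list may need adjusting for real files. The labels in `@FIELDS` and the sub name `parse_header` are hypothetical.

```perl
use strict;
use warnings;

# Assumed field order, taken from the format stated in the question.
my @FIELDS = ('Gene id', 'Gene name', 'Organism', 'Date of sequence', 'Number of bases');

# Split a descriptor line such as ">id, name, organism, date, bases" into a hash.
sub parse_header {
    my ($line) = @_;
    chomp $line;
    $line =~ s/^>\s*//;                                       # drop the leading '>'
    my @values = map { s/^\s+|\s+$//gr } split /,/, $line;    # trim each field
    my %header;
    @header{@FIELDS} = @values;                               # missing fields stay undef
    return \%header;
}

# Read only the first line (the descriptor) of each file named on the command line.
for my $file (@ARGV) {
    open my $fh, '<', $file or die "Cannot open $file: $!";
    my $first = <$fh>;
    close $fh;
    die "$file does not start with a FASTA descriptor line\n"
        unless defined $first && $first =~ /^>/;
    my $h = parse_header($first);
    print "$file\n";
    printf "  %-17s %s\n", $_, $h->{$_} // '(missing)' for @FIELDS;
}
```

Run as, for example, `perl print_headers.pl Elastase_Human.fasta Elastase_Monkey.fasta` (the script name is hypothetical).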
ii.
Determine the percentage match between the sequences in the two files, stating any
assumptions you make.
(9 marks)
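One possible answer to part (ii), under stated assumptions: the two sequences are already aligned with no gaps, bases are compared position by position (case-insensitively), and the percentage is taken over the length of the shorter sequence. The sub names are illustrative.

```perl
use strict;
use warnings;

# Read a FASTA file and return its sequence as one upper-case string.
sub read_fasta_sequence {
    my ($file) = @_;
    open my $fh, '<', $file or die "Cannot open $file: $!";
    my $seq = '';
    while (my $line = <$fh>) {
        next if $line =~ /^>/;     # skip the descriptor line
        $line =~ s/\s+//g;         # remove the newline and any spaces
        $seq .= uc $line;
    }
    close $fh;
    return $seq;
}

# Percentage of positions with identical bases, comparing base i with base i
# (no gaps) over the length of the shorter sequence.
sub percent_match {
    my ($s1, $s2) = @_;
    my $len = length($s1) < length($s2) ? length($s1) : length($s2);
    return 0 if $len == 0;
    my $matches = 0;
    for my $i (0 .. $len - 1) {
        $matches++ if substr($s1, $i, 1) eq substr($s2, $i, 1);
    }
    return 100 * $matches / $len;
}

if (@ARGV == 2) {
    my ($seq1, $seq2) = map { read_fasta_sequence($_) } @ARGV;
    printf "Match: %.2f%%\n", percent_match($seq1, $seq2);
}
```

If the two elastase sequences differ in length or contain insertions or deletions, they would first need to be aligned; otherwise this position-by-position count understates their similarity.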
iii.
Determine the length of the largest ORF in the first reading frame of each of the
above FASTA files, stating any assumptions you make.
(15 marks)
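A sketch for part (iii), assuming an ORF runs from an ATG start codon to the first in-frame stop codon (TAA, TAG or TGA), that the first reading frame begins at the first base, and that length is counted in bases including the stop codon. An ATG with no in-frame stop before the end of the sequence is ignored.

```perl
use strict;
use warnings;

my %STOP = map { $_ => 1 } qw(TAA TAG TGA);

# Longest ATG...stop ORF in reading frame 1 (codons start at offset 0).
# Returns its length in bases, stop codon included, or 0 if none is found.
sub longest_orf_frame1 {
    my ($seq) = @_;
    my ($start, $best) = (undef, 0);
    for (my $i = 0; $i + 3 <= length $seq; $i += 3) {
        my $codon = substr($seq, $i, 3);
        if (!defined $start) {
            $start = $i if $codon eq 'ATG';      # first ATG opens an ORF
        } elsif ($STOP{$codon}) {
            my $len = $i + 3 - $start;           # ATG through stop codon
            $best = $len if $len > $best;
            $start = undef;                      # look for the next ATG
        }
    }
    return $best;
}

for my $file (@ARGV) {
    open my $fh, '<', $file or die "Cannot open $file: $!";
    # Join all sequence lines (descriptor skipped) into one upper-case string.
    my $seq = uc(join '', map { s/\s+//gr } grep { !/^>/ } <$fh>);
    close $fh;
    print "$file: longest frame-1 ORF is ", longest_orf_frame1($seq), " bases\n";
}
```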
Q 2. (a) Describe, using suitable examples, the two steps of the “Central Dogma of
genetics” that convert a DNA strand into its corresponding amino acid strand.
(10 marks)
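The two steps can be sketched in Perl as follows. This assumes the input is the coding (sense) strand, so transcription is modelled simply as replacing T with U, and it builds the standard genetic code from a compact 64-letter string rather than the full hash in Appendix 1. ATG is translated as methionine (M) and `_` marks stop codons; the example sequence is made up.

```perl
use strict;
use warnings;

# Standard genetic code: within each block of 16 the first base is fixed,
# the second base changes every 4 entries and the third base varies fastest.
my @BASES = qw(T C A G);
my $AMINO = 'FFLLSSSSYY__CC_W' . 'LLLLPPPPHHQQRRRR'
          . 'IIIMTTTTNNKKSSRR' . 'VVVVAAAADDEEGGGG';
my %CODON;
my $k = 0;
for my $first (@BASES) {
    for my $second (@BASES) {
        for my $third (@BASES) {
            $CODON{"$first$second$third"} = substr($AMINO, $k++, 1);
        }
    }
}

# Step 1, transcription: assuming the coding strand is given, mRNA = DNA with U for T.
sub transcribe {
    my ($dna) = @_;
    (my $rna = uc $dna) =~ tr/T/U/;
    return $rna;
}

# Step 2, translation: read codons from the start and stop at the first stop codon.
sub translate {
    my ($rna) = @_;
    (my $dna = $rna) =~ tr/U/T/;                     # the table is keyed on DNA codons
    my $protein = '';
    for (my $i = 0; $i + 3 <= length $dna; $i += 3) {
        my $aa = $CODON{substr($dna, $i, 3)} // 'X'; # X = unrecognised codon
        last if $aa eq '_';
        $protein .= $aa;
    }
    return $protein;
}

my $mrna = transcribe('ATGGCCTTTTGGTAA');            # made-up example
print "mRNA:    $mrna\n";                            # AUGGCCUUUUGGUAA
print "Protein: ", translate($mrna), "\n";           # MAFW
```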
(b)
Describe, using a suitable example, the three types of point mutation and how each
affects the conversion of a DNA sequence into an amino acid sequence.
(10 marks)
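To illustrate (b), the sketch below applies one substitution, one insertion and one deletion to a made-up 15-base coding sequence and translates each result with the standard genetic code. The substitution changes a single amino acid (GCC to GAC, alanine to aspartate), whereas the single-base insertion and deletion shift the reading frame so that every downstream codon changes.

```perl
use strict;
use warnings;

# Standard genetic code ('_' = stop), built with the third base varying fastest.
my @B  = qw(T C A G);
my $AA = 'FFLLSSSSYY__CC_W' . 'LLLLPPPPHHQQRRRR'
       . 'IIIMTTTTNNKKSSRR' . 'VVVVAAAADDEEGGGG';
my %code;
my $n = 0;
for my $x (@B) { for my $y (@B) { for my $z (@B) { $code{"$x$y$z"} = substr($AA, $n++, 1) } } }

# Translate every complete codon, reading from the first base.
sub translate {
    my ($dna) = @_;
    my $protein = '';
    for (my $i = 0; $i + 3 <= length $dna; $i += 3) {
        $protein .= $code{substr($dna, $i, 3)};
    }
    return $protein;
}

my $original     = 'ATGGCCTTTTGGAAA';                             # made up: MAFWK
my $substitution = $original; substr($substitution, 4, 1) = 'A';  # GCC -> GAC: MDFWK
my $insertion    = $original; substr($insertion, 3, 0) = 'A';     # frameshift: MSLLE
my $deletion     = $original; substr($deletion, 3, 1)  = '';      # frameshift: MPFG

for my $pair (['original', $original], ['substitution', $substitution],
              ['insertion', $insertion], ['deletion', $deletion]) {
    printf "%-12s %-16s %s\n", $pair->[0], $pair->[1], translate($pair->[1]);
}
```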
(c)
Explain why certain (genotype) substitution mutations have no (phenotypic) impact on
the organism.
(10 marks)
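A short sketch for (c): it lists the single-base substitutions of a codon that leave the encoded amino acid unchanged, showing the redundancy (degeneracy) of the genetic code that makes such mutations silent. The codons chosen are illustrative.

```perl
use strict;
use warnings;

# Standard genetic code ('_' = stop), built with the third base varying fastest.
my @B  = qw(T C A G);
my $AA = 'FFLLSSSSYY__CC_W' . 'LLLLPPPPHHQQRRRR'
       . 'IIIMTTTTNNKKSSRR' . 'VVVVAAAADDEEGGGG';
my %code;
my $n = 0;
for my $x (@B) { for my $y (@B) { for my $z (@B) { $code{"$x$y$z"} = substr($AA, $n++, 1) } } }

# Return the single-base substitutions of $codon that encode the same amino acid.
sub synonymous_substitutions {
    my ($codon) = @_;
    my @silent;
    for my $pos (0 .. 2) {
        for my $base (@B) {
            next if $base eq substr($codon, $pos, 1);
            my $mut = $codon;
            substr($mut, $pos, 1) = $base;
            push @silent, $mut if $code{$mut} eq $code{$codon};
        }
    }
    return @silent;
}

for my $codon (qw(GCC TTA ATG)) {
    my @s = synonymous_substitutions($codon);
    printf "%s (%s): %s\n", $codon, $code{$codon}, @s ? join(' ', @s) : 'none';
}
```

For alanine (GCC) every third-position change is silent; for leucine (TTA) one first-position change (CTA) is also silent; methionine (ATG) has no synonymous codon, so any substitution in it changes the protein.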
Q 3. (a)
Distinguish, using suitable examples, between inducible and repressible feedback
control of transcription in prokaryotic organisms.
(15 marks)
(b)
Explain, using a suitable example, any two types of alternative splicing and their
impact on the resulting amino acid sequence.
(15 marks)
Q 4. (a)
Explain the steps involved in attempting to find ORFs in DNA sequences.
(10 marks)
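One common sequence of steps (take both strands, step through each of the three reading frames on each, open an ORF at ATG and close it at the first in-frame stop codon, then keep only ORFs above a minimum length) can be sketched in Perl as below. The minimum-length threshold and the ATG-only start rule are assumptions; start offsets are 0-based on the strand being read, and the example sequence is made up.

```perl
use strict;
use warnings;

my %STOP = map { $_ => 1 } qw(TAA TAG TGA);

sub revcomp {
    my ($seq) = @_;
    (my $rc = reverse $seq) =~ tr/ACGT/TGCA/;
    return $rc;
}

# Return ORFs (ATG ... stop, stop included) of at least $min_len bases in all six
# reading frames, each as [strand, frame 1-3, start offset on that strand, length].
sub find_orfs {
    my ($seq, $min_len) = @_;
    my @orfs;
    my %strands = ('+' => uc $seq, '-' => revcomp(uc $seq));
    for my $strand ('+', '-') {
        my $s = $strands{$strand};
        for my $frame (0 .. 2) {
            my $start;
            for (my $i = $frame; $i + 3 <= length $s; $i += 3) {
                my $codon = substr($s, $i, 3);
                if (!defined $start) {
                    $start = $i if $codon eq 'ATG';
                } elsif ($STOP{$codon}) {
                    my $len = $i + 3 - $start;
                    push @orfs, [$strand, $frame + 1, $start, $len] if $len >= $min_len;
                    undef $start;
                }
            }
        }
    }
    return @orfs;
}

# Illustrative sequence: one ORF (ATG AAA TAG) on the forward strand, frame 2.
for my $orf (find_orfs('GATGAAATAGC', 6)) {
    printf "strand %s  frame %d  start %d  length %d\n", @$orf;
}
```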
(b)
ORFs in eukaryotic coding sequences can occur in different reading frames. Explain,
using suitable examples, how this affects the finding of exons in eukaryotic DNA
sequences.
(12 marks)
(c)
Explain two techniques that can be used to find the promoter sequences of genes.
(8 marks)
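One technique, searching for consensus motifs, can be sketched with a regular expression. This assumes the commonly quoted E. coli sigma-70 consensus (a -35 box TTGACA and a -10 box TATAAT separated by a spacer of roughly 17 bases; 15 to 19 is allowed here) and requires exact matches, which real promoters seldom give. A scoring approach such as a position weight matrix would tolerate mismatches. The example sequence is synthetic.

```perl
use strict;
use warnings;

# Report every place where a -35 box (TTGACA) is followed, after a 15-19 base
# spacer, by a -10 box (TATAAT). Exact consensus only (an assumption).
sub find_promoter_candidates {
    my ($seq) = @_;
    my @hits;
    # The lookahead lets candidates that overlap each other all be reported.
    while ($seq =~ /(?=(TTGACA)([ACGT]{15,19})(TATAAT))/g) {
        push @hits, [ $-[0], length $2 ];   # start of the -35 box, spacer length
    }
    return @hits;
}

my $example = 'GG' . 'TTGACA' . ('C' x 17) . 'TATAAT' . 'GG';   # synthetic, not a real promoter
printf "-35 box at %d, spacer %d bases\n", @$_ for find_promoter_candidates($example);
```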
Q 5. (a) What are the three types of “pair-wise” matching that can occur in nucleotide
sequences?
(6 marks)
(b)
Explain how the “shot-gun” approach can be used to help determine the
sequence of an organism’s genome.
(12 marks)
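The reassembly step of the shot-gun approach can be illustrated with a greedy overlap merge: repeatedly join the two reads with the longest suffix/prefix overlap. This is a simplified stand-in, not a production assembler. It assumes error-free reads from one strand, no repeats and no read wholly contained in another; the minimum overlap and the reads are made up.

```perl
use strict;
use warnings;

# Length of the longest suffix of $x that equals a prefix of $y (0 if below $min).
sub overlap {
    my ($x, $y, $min) = @_;
    my $max = length($x) < length($y) ? length($x) : length($y);
    for (my $len = $max; $len >= $min; $len--) {
        return $len if substr($x, -$len) eq substr($y, 0, $len);
    }
    return 0;
}

# Greedy assembly: merge the pair of reads with the largest overlap until no
# pair overlaps by at least $min bases; returns the resulting contigs.
sub greedy_assemble {
    my ($min, @reads) = @_;
    while (@reads > 1) {
        my ($best, $bi, $bj) = (0);
        for my $i (0 .. $#reads) {
            for my $j (0 .. $#reads) {
                next if $i == $j;
                my $o = overlap($reads[$i], $reads[$j], $min);
                ($best, $bi, $bj) = ($o, $i, $j) if $o > $best;
            }
        }
        last unless $best;
        my $merged = $reads[$bi] . substr($reads[$bj], $best);
        @reads = ((map { $reads[$_] } grep { $_ != $bi && $_ != $bj } 0 .. $#reads), $merged);
    }
    return @reads;
}

# Overlapping reads cut from a made-up 15-base sequence, given out of order.
my @contigs = greedy_assemble(3, 'ACGTTAGC', 'ATGGCGTA', 'GCGTACGTT');
print "$_\n" for @contigs;
```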
(c)
Explain, using a suitable example, how a dot plot matrix can be used to align pairs of
sequences.
(8 marks)
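A minimal dot-plot sketch: a cell is marked when base i of the first sequence equals base j of the second, so runs along a diagonal reveal aligned regions, and breaks or shifts in a diagonal reveal mismatches or gaps. It compares single bases with no window or match threshold, whereas practical dot-plot tools usually apply a sliding window to reduce noise; the two sequences are made up.

```perl
use strict;
use warnings;

# Build a dot-plot matrix: cell [i][j] is 1 when base i of $s1 equals base j of $s2.
sub dot_plot {
    my ($s1, $s2) = @_;
    my @m;
    for my $i (0 .. length($s1) - 1) {
        for my $j (0 .. length($s2) - 1) {
            $m[$i][$j] = substr($s1, $i, 1) eq substr($s2, $j, 1) ? 1 : 0;
        }
    }
    return \@m;
}

# Print the matrix with $s2 across the top and $s1 down the side ('*' = match).
sub print_dot_plot {
    my ($s1, $s2) = @_;
    my $m = dot_plot($s1, $s2);
    print '  ', join(' ', split //, $s2), "\n";
    for my $i (0 .. length($s1) - 1) {
        print substr($s1, $i, 1), ' ',
              join(' ', map { $m->[$i][$_] ? '*' : '.' } 0 .. length($s2) - 1), "\n";
    }
}

print_dot_plot('GATTACA', 'GATCACA');
```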
(d)
What are the two important considerations in the reconstruction of shot-gun
sequences?
(4 marks)
Appendix 1
The codon translation hash table (stop codons are shown as "_"):
%conversion_code = (
"TTT"=> "F", "TCT"=> "S", "TAT"=> "Y", "TGT"=> "C",
"TTC"=> "F", "TCC"=> "S", "TAC"=> "Y", "TGC"=> "C",
"TTA"=> "L", "TCA"=> "S", "TAA"=> "_", "TGA"=> "_",
"TTG"=> "L", "TCG"=> "S", "TAG"=> "_", "TGG"=> "W",
"CTT"=> "L", "CCT"=> "P", "CAT"=> "H", "CGT"=> "R",
"CTC"=> "L", "CCC"=> "P", "CAC"=> "H", "CGC"=> "R",
"CTA"=> "L", "CCA"=> "P", "CAA"=> "Q", "CGA"=> "R",
"CTG"=> "L", "CCG"=> "P", "CAG"=> "Q", "CGG"=> "R",
"ATT"=> "I", "ACT"=> "T", "AAT"=> "N", "AGT"=> "S",
"ATC"=> "I", "ACC"=> "T", "AAC"=> "N", "AGC"=> "S",
"ATA"=> "I", "ACA"=> "T", "AAA"=> "K", "AGA"=> "R",
"ATG"=> "M", "ACG"=> "T", "AAG"=> "K", "AGG"=> "R",
"GTT"=> "V", "GCT"=> "A", "GAT"=> "D", "GGT"=> "G",
"GTC"=> "V", "GCC"=> "A", "GAC"=> "D", "GGC"=> "G",
"GTA"=> "V", "GCA"=> "A", "GAA"=> "E", "GGA"=> "G",
"GTG"=> "V", "GCG"=> "A", "GAG"=> "E", "GGG"=> "G",
);
Sample FASTA file (only part of the DNA sequence is shown)
>gi|171361, E. Coli, gamma-lyase(CYS3) gene, 5124bp
GCAGCGCACGACAGCTGTGCTATCCCGGCGAGCCCGTGGCAGAGGACCTCGCTTGCGAAA
GCTACAGAGCCAACCCGGTGGACAAACTCGAAGTCATTGTGGACCGAATGAGGCTCAATAA