
CSEN5303 - Midterm Exam Summer2015
College of Engineering
Department of Electrical Engineering and Computer Science
Course: CSEN5303 – Bioinformatics
Teacher: Ashraf Yaseen
Student Name
Student K#
Section
Date
Total points: 30
Mark your answers with an X in the grid below. (You will only receive points for answers marked here.)
Multiple choice (25 pts): answer grid for Q1 to Q25, one column per choice (A, B, C, D).
True/False (5 pts): answer grid for Q26 to Q35, columns TRUE and FALSE.
Good Luck
Multiple Choice
1. To use blastp to find all the proteins containing a stretch of 10 prolines (PPPPPPPPPP), you have
to ________________________ .
A. select a specific protein database
B. uncheck the Low Complexity filtering box
C. properly set the substitution matrix
D. check the Low Complexity filtering box
2. By default, the criterion used by most programs to assemble a multiple sequence alignment is ____ .
A. sequence similarity
B. functional similarity
C. structural similarity
D. evolutionary similarity
3. In a protein family, key amino acids like those involved in the catalytic sites ____ .
A. are not especially conserved or variable
B. are often mutated so the protein function can evolve
C. are allowed only a very specific type of variation
D. are highly conserved because of their importance
4. Which multiple-sequence alignment method should you use if you want to combine the output of
several methods into one single alignment?
A. Tcoffee
B. MCOFFEE
C. EXPRESSO
D. ClustalW2
5. What is the largest amino acid, often found in the hydrophobic core, and usually very conserved?
A. W, tryptophan
B. A, alanine
C. G, glycine
D. L, leucine
6. Gene duplication has been found to be one of the major reasons for genome expansion in
eukaryotes. In general, what would be the selective advantage of gene duplication?
A. Larger genomes are more resistant to spontaneous mutations.
B. Duplicated genes will make more of the protein product.
C. If one gene copy is nonfunctional, a backup is available.
D. Gene duplication will lead to new species evolution.
7. If you want literature information, what is the best website to visit?
A. OMIM
B. PubMed
C. Entrez
D. PROSITE
8. You have two distantly related proteins. Which BLOSUM or PAM matrix is best to use to compare
them?
A. BLOSUM45 or PAM250
B. BLOSUM80 or PAM250
C. BLOSUM45 or PAM10
D. BLOSUM80 or PAM10
9. Which of the following best describes the difference between global and local alignment?
A. Global alignment is usually used for DNA while local alignment is used for protein
B. Global alignment has gaps while local alignment does not
C. Global alignment finds the global maxima while local alignment finds the local maxima
D. Global alignment aligns the whole sequence while local alignment finds the best subsequence
that aligns
10. You have a DNA sequence. You want to know which protein in the (NR) database is most similar to
some protein encoded by your DNA. Which blast version should you use?
A. blastn
B. blastx
C. blastp
D. tblastn
11. In PSSM, the score of any amino acid residue is assigned based on:
A. PAM or BLOSUM scoring matrix
B. Its background frequency of occurrence
C. Its frequency of occurrence in a multiple sequence alignment
D. The score of its neighboring residues
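For reference, a position-specific scoring matrix can be sketched as per-column log-odds scores derived from a multiple sequence alignment. The sketch below is a simplified illustration, not the PSI-BLAST procedure: the function name `build_pssm`, the uniform background frequency, and the fixed pseudocount are all assumptions made for the example, and sequence weighting is omitted.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_pssm(alignment, pseudocount=1.0, background=None):
    """Build a simple log-odds PSSM from equal-length aligned sequences.

    Assumptions (illustrative only): uniform background frequencies when
    none are given, and one fixed pseudocount per residue per column.
    Gap characters are ignored when counting.
    """
    if background is None:
        background = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
    pssm = []
    for column in zip(*alignment):
        # Count residues observed in this column of the alignment.
        counts = Counter(res for res in column if res in background)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        # Log-odds: observed column frequency versus background frequency.
        scores = {
            aa: math.log2(((counts[aa] + pseudocount) / total) / background[aa])
            for aa in AMINO_ACIDS
        }
        pssm.append(scores)
    return pssm

# Hypothetical toy alignment of four sequences, three columns.
toy_alignment = ["ACD", "ACE", "ACD", "GCD"]
pssm = build_pssm(toy_alignment)
```

In this toy alignment the fully conserved second column gives C a positive score and every other residue a negative one, which is the behaviour a PSSM is meant to capture.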
12. Which multiple-sequence alignment method should you use if you want to use the structural
information associated with some of your sequences?
A. Tcoffee
B. MCOFFEE
C. EXPRESSO
D. MAFFT
13. Given this segment from a blast result, what is the total raw score for this alignment? Assume that
the score of a match is +2, similar is +1, not similar is -2, open gap is -2, and extending gap is -1.
A. 0
B. -2
C. -3
D. 3
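As an illustration of the scoring scheme stated in question 13, the sketch below scores a BLAST-style three-line alignment (query, midline, subject). The alignment used is hypothetical and is not the segment from the exam. The function name `raw_score` is an assumption, as is the gap convention: the first position of a gap costs the opening penalty and each further position costs the extension penalty.

```python
def raw_score(query, midline, subject,
              match=2, similar=1, mismatch=-2, gap_open=-2, gap_extend=-1):
    """Sum the raw score of a BLAST-style alignment.

    midline convention: a letter marks an identical pair, '+' marks a
    similar pair, and a space marks a mismatch or a gap position.
    Assumed gap convention: first gap position = gap_open, each
    additional consecutive gap position = gap_extend.
    """
    score = 0
    in_gap = False
    for q, m, s in zip(query, midline, subject):
        if q == "-" or s == "-":
            score += gap_extend if in_gap else gap_open
            in_gap = True
        else:
            in_gap = False
            if m == "+":
                score += similar
            elif m == " ":
                score += mismatch
            else:
                score += match
    return score

# Hypothetical alignment: 4 matches, 2 similar pairs, 1 mismatch, 1 gap of length 2.
example = raw_score("MKTW--LAG",
                    "MK+   +AG",
                    "MKSAGGIAG")
# 4*2 + 2*1 + (-2) + (-2) + (-1) = 5
```

If a course or tool charges the opening and extension penalties together for the first gap position, the gap branch would need to be adjusted accordingly.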
14. PAM matrices are based on global alignments of related proteins having at least ___ amino acid
identity.
A. 85%
B. 25%
C. 95%
D. 50%
15. If you want to find the optimal local alignment between two sequences then you should use:
A. BLAST
B. FASTA
C. Smith-Waterman algorithm
D. Needleman-Wunsch algorithm
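For reference, the dynamic-programming recurrence used by the Smith-Waterman algorithm can be sketched as follows. This is a minimal version under stated assumptions: a linear gap penalty rather than affine gaps, simple match/mismatch scores instead of a substitution matrix, and hypothetical score values. The function name `smith_waterman` is chosen for the example.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best score, aligned a, aligned b) for a local alignment.

    Assumptions: linear gap penalty and fixed match/mismatch scores.
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the matrix; scores are never allowed to drop below zero,
    # which is what makes the alignment local.
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if H[i][j] == diag:
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))

# Hypothetical sequences sharing the subsequence ACGT.
result = smith_waterman("TTTACGTAAA", "GGACGTCC")
# result == (8, "ACGT", "ACGT")
```

Removing the zero floor and tracing back from the bottom-right corner instead of the best cell would turn the same recurrence into a global (Needleman-Wunsch style) alignment.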
16. If you have a very large dataset (more than 500 sequences), which of these multiple-sequence
alignment methods is the most suitable?
A. Tcoffee
B. MUSCLE
C. EXPRESSO
D. ProbCons
17. Which of the following sentences is true?
A. Mutations of important positions (such as active sites) are almost always harmful
B. You can recognize important positions because they never mutate
C. MSAs reveal these conserved positions
D. All of the above
18. When comparing your sequence with itself you can discover:
A. Repeated domains
B. Motifs repeated many times (low complexity)
C. Mirror regions (palindromes) in nucleic acids
D. All of the above
19. Usually E values smaller than a certain threshold are considered to demonstrate homology. This
threshold is usually about
A. 10e+4
B. 10e-40
C. 10e-4
D. 4e-10
20. Protein maturation can involve:
A. Removal of some fragments
B. Addition of lipids or sugars (glycosylation)
C. Chemical modifications
D. Any of the above
21. Which elements make up the secondary structure of proteins?
A. Hydrogen bonds, Van der Waals interactions, and disulfide bridges.
B. Multiple protein chains interacting to form one macromolecule.
C. Alpha helices, beta sheets, and loops.
D. Nucleotide binding motifs, protein channels, hydrophobic domains, and other like motifs.
22. Which one of the following is a database for proteins?
A. UniProt
B. GenBank
C. PubMed
D. ENSEMBL
23. RNA contains 4 nucleotides:
A. A, G, C, U
B. G, C, U, T
C. A, G, C, T
D. A, D, G, C
24. To retrieve all protein sequences similar to yours, you should use
A. blastp
B. Dotlet
C. Google
D. ClustalW
25. Which of the following is used to experimentally determine protein 3D structure?
A. X-ray crystallography
B. Nuclear magnetic resonance (NMR) Spectroscopy
C. Cryo-electron microscopy
D. All of the above
True/False
26. Bioinformatics is about the application of techniques from computer science to solve problems in
molecular biology.
True
27. A dot plot is a graphic representation of pairwise similarity
True
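As an illustration of the dot plot described in question 27, the sketch below builds a text dot plot in which a `*` marks every pair of positions whose windows match exactly. The function name `dot_plot` and the exact-match window rule are assumptions made for the example; tools such as Dotlet score windows with a substitution matrix and a threshold instead.

```python
def dot_plot(seq_x, seq_y, window=1):
    """Return the dot plot as a list of strings, one per position in seq_y.

    A '*' is placed where the window starting at that position in seq_y
    equals the window starting at the column position in seq_x.
    """
    rows = []
    for i in range(len(seq_y) - window + 1):
        row = ""
        for j in range(len(seq_x) - window + 1):
            same = seq_y[i:i + window] == seq_x[j:j + window]
            row += "*" if same else "."
        rows.append(row)
    return rows

# Comparing a hypothetical sequence with itself: the main diagonal is
# always filled, and the repeated "ABC" shows up as off-diagonal dots.
plot = dot_plot("ABCABC", "ABCABC", window=3)
# plot == ["*..*", ".*..", "..*.", "*..*"]
```

The off-diagonal marks in the self-comparison are how repeated regions become visible in a dot plot.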
28. Two proteins that share 30% amino acid identity are 30% homologous
False
29. The default word size in blast is 2
False
30. PSI-BLAST will look deeper into the database for matches to your query protein sequence by
employing a scoring matrix that is customized to your query
True
31. BLOSUM matrices are based on global alignments
False
32. About 70% of a prokaryotic genome codes for proteins, while about 5% of a eukaryotic genome codes
for proteins
True
33. In general, homologous protein sequences have a common ancestor, a similar 3D structure, and
often a similar function
True
34. The aim of research in Bioinformatics is to understand the functioning of living things – to “improve
the quality of life”.
True
35. You can use PSI-Blast to find the optimal local alignment between two sequences
False