CSEN5303 - Midterm Exam Summer2015
College of Engineering
Department of Electrical Engineering and Computer Science

Course: CSEN5303 – Bioinformatics
Teacher: Ashraf Yaseen
Student Name:
Student K#:
Section:
Date:

Total points: 30

Mark your answer with X (You will only receive points for your answers in here)

25-pts: Questions Q1 to Q25, Choice A / B / C / D
5-pts: Questions Q26 to Q35, TRUE / FALSE

Good Luck

Multiple Choices

1. To use blastp to find all the proteins containing a stretch of 10 prolines (PPPPPPPPPP), you have to ________________________ .
A. select a specific protein database
B. uncheck the Low Complexity filtering box
C. properly set the substitution matrix
D. check the Low Complexity filtering box

2. By default, the criterion used by most programs to assemble a multiple sequence alignment is ____ .
A. sequence similarity
B. functional similarity
C. structural similarity
D. evolutionary similarity

3. In a protein family, key amino acids like those involved in the catalytic sites ____ .
A. are not especially conserved or variable
B. are often mutated so the protein function can evolve
C. are allowed only a very specific type of variation
D. are highly conserved because of their importance

4. Which multiple-sequence alignment method should you use if you want to combine the output of several methods into one single alignment?
A. Tcoffee
B. MCOFFEE
C. EXPRESSO
D. ClustalW2

5. What is the largest amino acid, often found in the hydrophobic core, and usually very conserved?
A. W, tryptophan
B. A, alanine
C. G, glycine
D. L, leucine

6. Gene duplication has been found to be one of the major reasons for genome expansion in eukaryotes. In general, what would be the selective advantage of gene duplication?
A. Larger genomes are more resistant to spontaneous mutations.
B. Duplicated genes will make more of the protein product.
C. If one gene copy is nonfunctional, a backup is available.
D. Gene duplication will lead to new species evolution.

7. If you want literature information, what is the best website to visit?
A. OMIM
B. PubMed
C. Entrez
D. PROSITE

8. You have two distantly related proteins. Which BLOSUM or PAM matrix is best to use to compare them?
A. BLOSUM45 or PAM250
B. BLOSUM80 or PAM250
C. BLOSUM45 or PAM10
D. BLOSUM80 or PAM10

9. Which of the following best describes the difference between global and local alignment?
A. Global alignment is usually used for DNA while local alignment is used for protein
B. Global alignment has gaps while local alignment does not
C. Global alignment finds the global maxima while local alignment finds the local maxima
D. Global alignment aligns the whole sequence while local alignment finds the best subsequence that aligns

10. You have a DNA sequence. You want to know which protein in the (NR) database is most similar to some protein encoded by your DNA. Which BLAST version should you use?
A. blastn
B. blastx
C. blastp
D. tblastn

11. In a PSSM, the score of any amino acid residue is assigned based on:
A. PAM or BLOSUM scoring matrix
B. Its background frequency of occurrence
C. Its frequency of occurrence in a multiple sequence alignment
D. The score of its neighboring residues

12. Which multiple-sequence alignment method should you use if you want to use the structural information associated with some of your sequences?
A. Tcoffee
B. MCOFFEE
C. EXPRESSO
D. MAFFT

13. Given this segment from a BLAST result, what is the total raw score for this alignment? Assume that the score of a match is +2, similar is +1, not similar is -2, open gap is -2, and extending gap is -1.
[Figure: alignment segment from the BLAST result]
A. 0
B. -2
C. -3
D. 3

14. PAM matrices are based on global alignments of related proteins having at least ___ amino acid identity.
A. 85%
B. 25%
C. 95%
D. 50%

15. If you want to find the optimal local alignment between two sequences then you should use:
A. BLAST
B. FASTA
C. Smith-Waterman algorithm
D. Needleman-Wunsch algorithm

16. If you have a very large dataset (more than 500 sequences), which of these multiple-sequence alignment methods is the most suitable?
A. Tcoffee
B. MUSCLE
C. EXPRESSO
D. ProbCons

17. Which of the following sentences is true?
A. Mutations of important positions (such as active sites) are almost always harmful
B. You can recognize important positions because they never mutate
C. MSAs reveal these conserved positions
D. All of the above

18. When comparing your sequence with itself you can discover:
A. Repeated domains
B. Motifs repeated many times (low complexity)
C. Mirror regions (palindromes) in nucleic acids
D. All of the above

19. Usually E values smaller than a certain threshold are considered to demonstrate homology. This threshold is usually about:
A. 10e+4
B. 10e-40
C. 10e-4
D. 4e-10

20. Protein maturation can involve:
A. Removal of some fragments
B. Addition of lipids or sugars (glycosylation)
C. Chemical modifications
D. Any of the above

21. Which elements make up the secondary structure of proteins?
A. Hydrogen bonds, Van der Waals interactions, and disulfide bridges.
B. Multiple protein chains interacting to form one macromolecule.
C. Alpha helices, beta sheets, and loops.
D. Nucleotide binding motifs, protein channels, hydrophobic domains, and other like motifs.

22. Which one of the following is a database for proteins?
A. UniProt
B. GenBank
C. PubMed
D. ENSEMBL

23. RNA contains 4 nucleotides:
A. A, G, C, U
B. G, C, U, T
C. A, G, C, T
D. A, D, G, C

24. To retrieve all protein sequences similar to yours, you should use:
A. blastp
B. Dotlet
C. Google
D. ClustalW

25. Which of the following is used to experimentally determine protein 3D structure?
A. X-ray crystallography
B. Nuclear magnetic resonance (NMR) spectroscopy
C. Cryo-electron microscopy
D. All of the above

True/False

26. Bioinformatics is about the application of techniques from computer science to solve problems in molecular biology. True
27. A dot plot is a graphic representation of pairwise similarity. True
28. Two proteins that share 30% amino acid identity are 30% homologous. False
29. The default word size in BLAST is 2. False
30. PSI-BLAST will look deeper into the database for matches to your query protein sequence by employing a scoring matrix that is customized to your query. True
31. BLOSUM matrices are based on global alignments. False
32. ~70% of the prokaryotic genome codes for proteins, while ~5% of the eukaryotic genome codes for proteins. True
33. In general, homologous protein sequences have a common ancestor, a similar 3D structure, and often a similar function. True
34. The aim of research in Bioinformatics is to understand the functioning of living things – to “improve the quality of life”. True
35. You can use PSI-BLAST to find the optimal local alignment between two sequences. False
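
Study note for Questions 9 and 15: the difference between global alignment (Needleman-Wunsch) and local alignment (Smith-Waterman) can be seen in a small dynamic-programming sketch. This is only an illustration and not part of the exam. The function name `align_score`, the example sequences, and the scoring values (match +2, mismatch -2, a single linear gap penalty of -2 instead of separate open and extend penalties) are assumptions chosen to keep the example short.

```python
def align_score(a, b, match=2, mismatch=-2, gap=-2, local=False):
    """Best alignment score by dynamic programming with a linear gap penalty.

    local=False: Needleman-Wunsch style (global, both sequences end to end).
    local=True:  Smith-Waterman style (local, best-scoring pair of substrings).
    The scoring values are illustrative assumptions, not BLAST defaults.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    if not local:
        # Global alignment: leading gaps are penalized.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap

    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            cell = max(
                score[i - 1][j - 1] + pair,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # gap in b
                score[i][j - 1] + gap,       # gap in a
            )
            if local:
                # Local alignment: a negative prefix is dropped (restart at 0).
                cell = max(cell, 0)
                best = max(best, cell)
            score[i][j] = cell

    # Global: score of the full alignment; local: best cell anywhere.
    return best if local else score[-1][-1]


if __name__ == "__main__":
    x, y = "GGGACGTCCC", "TTTACGTAAA"  # hypothetical sequences sharing "ACGT"
    print("global:", align_score(x, y))
    print("local: ", align_score(x, y, local=True))
```

With these hypothetical sequences the local score comes only from the shared core, while the global score also pays for the unrelated flanks. That is why a local method is preferred when only part of two sequences is related.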