HW3

Joshua Wu SID 11174269

CS6325 Introduction to Bioinformatics

Homework Assignment III

Due on April 22nd, 2008 in class

Please type the answers to the questions 1-10.

1. What are the four levels of protein structure?

Primary structure, secondary structure, tertiary structure and quaternary structure.

2. Why is protein structure prediction important?

Protein structure prediction is one of the most important goals pursued by bioinformatics and theoretical chemistry. Its aim is the prediction of the three-dimensional structure of proteins from their amino acid sequences, sometimes including additional relevant information such as the structures of related proteins. In other words, it deals with the prediction of a protein's tertiary structure from its primary structure. Protein structure prediction is of high importance in medicine (for example, in drug design) and biotechnology (for example, in the design of novel enzymes).

3. What is the relationship between protein sequence, protein structure and protein function?

Protein structure is primarily determined by protein sequence.

Protein function is primarily determined by protein structure.

4. What are protein secondary structure motifs? List some popular secondary structure motifs.

Secondary structure motifs are combinations of a few secondary structure elements (α-helices and β-strands). For example: helix-turn-helix, helix-loop-helix, β-hairpin, Greek key and β-α-β.

5. What are protein domains? List some protein domains.

Protein Domains:

1. Regions that display significant levels of sequence similarity.
2. The minimal part of a gene that is capable of performing a function.
3. A region of a protein with an experimentally assigned function.
4. A region of a protein structure that recurs in different contexts and proteins.
5. A compact, spatially distinct region of a protein.

Examples (the structural classes of domains used in the SCOP classification):

1. α domains: bundles of helices connected by loops.
2. β domains: mainly antiparallel sheets, usually with two sheets forming a sandwich.
3. α/β domains: mainly parallel sheets with intervening helices; also mixed sheets.
4. α+β domains: mainly segregated helices and sheets.
5. Multidomain (α & β): containing domains from more than one class.
6. Membrane & cell-surface proteins.

6. What is data mining?

Data mining is knowledge discovery from data: the extraction of interesting patterns or knowledge from huge amounts of data.

7. What is the difference between single-linkage, complete-linkage and average-linkage hierarchical clustering methods?

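In hierarchical clustering, the three methods differ in how the distance between two clusters is computed from the distance matrix. Single-linkage uses the minimum distance between any pair of objects drawn from the two clusters; complete-linkage uses the maximum such distance; average-linkage uses the average distance over all such pairs. In the agglomerative (bottom-up) version, the two clusters with the smallest linkage distance are merged at each step.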
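A minimal sketch of the three linkage rules, assuming a precomputed symmetric distance matrix given as a list of lists. The names `cluster_distance` and `agglomerate` are hypothetical, and the naive search over all cluster pairs is only suitable for tiny inputs; in practice an optimized routine such as SciPy's `scipy.cluster.hierarchy.linkage` (with `method="single"`, `"complete"` or `"average"`) would normally be used.

```python
from itertools import combinations


def cluster_distance(a, b, dist, method):
    """Distance between clusters a and b (tuples of point indices) under a linkage rule."""
    pair_dists = [dist[i][j] for i in a for j in b]
    if method == "single":    # closest pair of objects
        return min(pair_dists)
    if method == "complete":  # farthest pair of objects
        return max(pair_dists)
    if method == "average":   # mean over all cross-cluster pairs
        return sum(pair_dists) / len(pair_dists)
    raise ValueError(f"unknown linkage method: {method}")


def agglomerate(dist, method):
    """Naive agglomerative clustering; returns the merges as (cluster_a, cluster_b, distance)."""
    clusters = [(i,) for i in range(len(dist))]
    merges = []
    while len(clusters) > 1:
        # pick the pair of clusters with the smallest linkage distance
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: cluster_distance(pair[0], pair[1], dist, method))
        d = cluster_distance(a, b, dist, method)
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(tuple(sorted(a + b)))
        merges.append((a, b, d))
    return merges


# Toy example: four points on a line, distance = absolute difference.
points = [0.0, 1.0, 4.0, 6.0]
D = [[abs(p - q) for q in points] for p in points]
for method in ("single", "complete", "average"):
    print(method, agglomerate(D, method)[-1])
```

On these toy points all three rules first merge the points 0 and 1, then the points 4 and 6, but the final merge between the two groups happens at distance 3 under single-linkage, 6 under complete-linkage and 4.5 under average-linkage.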

8. What is the difference between k-means and k-medoids clustering algorithms?

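The k-means algorithm uses the mean value of the objects in a cluster (the centroid) as the cluster's reference point, whereas k-medoids uses a medoid, the most centrally located actual object in the cluster. Because a medoid must be one of the data objects, k-medoids is less sensitive to outliers and noise than k-means.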
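To make the difference concrete, here is a small sketch on one-dimensional data, assuming absolute difference as the distance and fixed initial centers. The k-medoids part uses a simple alternating update (assign points, then pick each cluster's medoid) rather than the swap-based search of PAM, the classic k-medoids algorithm; the function names are hypothetical.

```python
def assign(points, centers):
    """Group each point with its nearest center (ties go to the lower center index)."""
    clusters = [[] for _ in centers]
    for p in points:
        nearest = min(range(len(centers)), key=lambda c: abs(p - centers[c]))
        clusters[nearest].append(p)
    return clusters


def mean(cluster):
    """k-means reference point: the average of the cluster's values."""
    return sum(cluster) / len(cluster)


def medoid(cluster):
    """k-medoids reference point: the object with the smallest total distance to the others
    (ties go to the earlier object)."""
    return min(cluster, key=lambda m: sum(abs(m - q) for q in cluster))


def fit(points, init, update, max_iter=100):
    """Alternate assignment and center updates until the centers stop changing."""
    centers = list(init)
    for _ in range(max_iter):
        clusters = assign(points, centers)
        # keep the old center if a cluster happens to be empty
        new_centers = [update(c) if c else centers[i] for i, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


points = [1, 2, 3, 10, 11, 12, 100]  # two natural groups plus one outlier
print("k-means centers:  ", fit(points, [1, 10], mean))
print("k-medoids centers:", fit(points, [1, 10], medoid))
```

With the outlier 100 in the data and both runs started from centers 1 and 10, the mean is pulled toward the outlier, and k-means ends with centers 6.5 and 100.0 (the outlier takes a cluster of its own). The k-medoids sketch keeps medoids 2 and 11, one in each natural group.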

9. What is a max frequent pattern and what is a closed frequent pattern? Determine if a closed pattern must be a max pattern or if a max pattern must be a closed pattern.

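An itemset X is closed if X is frequent and there exists no proper super-pattern Y ⊃ X with the same support as X.

An itemset X is a max-pattern if X is frequent and there exists no frequent proper super-pattern Y ⊃ X.

Therefore, a max pattern must be a closed pattern: if X is a max pattern, every proper super-pattern Y of X is infrequent, so support(Y) < min_sup ≤ support(X), and no super-pattern can have the same support as X. The converse does not hold: a closed pattern may have a frequent super-pattern with smaller support, in which case it is closed but not max.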
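The two definitions can be checked by brute force on a toy transaction database. This is a minimal sketch, assuming transactions are Python sets and `min_sup` is an absolute support count; enumerating every itemset is exponential, so practical miners (for example Apriori or FP-growth based methods) avoid it. The function names and the transactions are made up for illustration.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_sup):
    """Brute-force support counting; returns {itemset: support} for support >= min_sup."""
    items = sorted(set().union(*transactions))
    frequent = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            s = frozenset(candidate)
            support = sum(1 for t in transactions if s <= t)
            if support >= min_sup:
                frequent[s] = support
    return frequent


def closed_and_max(frequent):
    """Split frequent itemsets into closed and max patterns by checking proper supersets."""
    closed, maximal = set(), set()
    for x, sup in frequent.items():
        supersets = [y for y in frequent if x < y]  # frequent proper supersets of x
        # An infrequent superset has support < min_sup <= sup, so it can never
        # match x's support; checking only frequent supersets is enough.
        if all(frequent[y] != sup for y in supersets):
            closed.add(x)
        if not supersets:
            maximal.add(x)
    return closed, maximal


transactions = [{"a", "b", "c"}, {"a", "b", "c"}, {"a", "b"}, {"a"}]
freq = frequent_itemsets(transactions, min_sup=2)
closed, maximal = closed_and_max(freq)
print("closed:", sorted("".join(sorted(x)) for x in closed))
print("max:   ", sorted("".join(sorted(x)) for x in maximal))
```

Here {a} and {a, b} are closed but not max (each has a frequent superset with lower support), while the only max pattern, {a, b, c}, is also closed.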

10. What is a bicluster? Why is biclustering microarray data important?

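A bicluster is a subset of genes together with a subset of conditions (samples) over which those genes show coherent expression behavior. Biclustering clusters the rows and the columns of the expression matrix simultaneously, which allows detection of signals that conventional clustering of only genes or only conditions can miss.

It is important because it gives better representations of:

- interrelated clusters (genes may belong to more than one cluster)
- local signals (genes correlated over only a few conditions)
- noisy data (erratic genes are allowed to belong to no cluster)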
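The answer does not say how the coherence of a bicluster is measured. One widely used score, from Cheng and Church's biclustering work, is the mean squared residue: it is 0 when every gene in the bicluster follows the same profile up to an additive shift, and it grows as the submatrix departs from that pattern. The sketch assumes the expression matrix is a list of rows (genes) by columns (conditions); the function name and the toy matrix are hypothetical.

```python
def mean_squared_residue(matrix, rows, cols):
    """Cheng-Church mean squared residue of the submatrix with the given rows and columns."""
    sub = [[matrix[i][j] for j in cols] for i in rows]
    n_rows, n_cols = len(sub), len(sub[0])
    row_means = [sum(r) / n_cols for r in sub]
    col_means = [sum(sub[i][j] for i in range(n_rows)) / n_rows for j in range(n_cols)]
    overall = sum(row_means) / n_rows  # mean of all entries in the submatrix
    total = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            # residue of entry (i, j) after removing row, column and overall effects
            residue = sub[i][j] - row_means[i] - col_means[j] + overall
            total += residue ** 2
    return total / (n_rows * n_cols)


# Toy expression matrix (genes x conditions), made up for illustration.
# Genes 0-2 over conditions 0-2 follow an additive pattern: each row is
# the profile [1, 3, 4] shifted by a constant (0, +2, +5).
expression = [
    [1, 3, 4, 9],
    [3, 5, 6, 0],
    [6, 8, 9, 2],
    [7, 1, 5, 3],
]
print(mean_squared_residue(expression, [0, 1, 2], [0, 1, 2]))  # close to 0
print(mean_squared_residue(expression, [0, 1, 2, 3], [0, 1, 2, 3]))
```

Adding gene 3 and condition 3, which do not follow the pattern, raises the residue well above zero; Cheng and Church's algorithm searches for submatrices whose residue stays below a chosen threshold.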
