HW3

Introduction to Bioinformatics

Homework assignment 3

Ensembl

In this question you will learn some features of the Ensembl genome browser that were not covered in class. Don’t be afraid to 'wander' a bit…

Go to the Ensembl genome browser (http://www.ensembl.org/index.html).

(a) How many primate genomes have already been sequenced and are listed in Ensembl?

(b) Go to the Homo sapiens link. Search for ADARB1. Choose the Ensembl protein_coding gene ENSG00000197381 (it should be the second result).

1) What is the genomic location?

2) How many transcripts can you see in the Ensembl report?

3) How do they differ?

4) What is the length (amino-acid sites) of the longest protein sequence?

5) How many SNPs are reported for the P78563-4 transcript? What are their locations and what are the SNP types? (hint: look in the [peptide info] section)
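
The questions in (b) can also be cross-checked programmatically. The following is a minimal sketch, assuming the public Ensembl REST server at https://rest.ensembl.org and its lookup/id endpoint with expand=1; the field names used (seq_region_name, start, end, Transcript, Translation, length) reflect our understanding of that response and should be verified against the REST documentation. The record in the usage example is a made-up placeholder, not the real ADARB1 data.

```python
from urllib.parse import urlencode

# Assumption: the public Ensembl REST server; check the REST docs before relying on it.
ENSEMBL_REST = "https://rest.ensembl.org"


def lookup_url(stable_id):
    """Build a lookup URL for a stable ID, asking for its transcripts as well."""
    query = urlencode({"content-type": "application/json", "expand": 1})
    return f"{ENSEMBL_REST}/lookup/id/{stable_id}?{query}"


def summarise_gene(record):
    """Summarise a parsed lookup record (a dict decoded from the JSON response)."""
    location = f"{record['seq_region_name']}:{record['start']}-{record['end']}"
    transcripts = record.get("Transcript", [])
    # Only protein-coding transcripts are expected to carry a Translation entry.
    lengths = [t["Translation"]["length"] for t in transcripts if "Translation" in t]
    return {
        "location": location,
        "n_transcripts": len(transcripts),
        "longest_protein": max(lengths, default=None),
    }


# Hypothetical placeholder record (values invented for illustration only).
toy_record = {
    "seq_region_name": "1",
    "start": 100,
    "end": 900,
    "Transcript": [
        {"id": "T1", "Translation": {"length": 250}},
        {"id": "T2", "Translation": {"length": 310}},
        {"id": "T3"},  # a non-coding transcript without a translation
    ],
}

print(lookup_url("ENSG00000197381"))
print(summarise_gene(toy_record))
```

To use it on real data, fetch the printed URL (in a browser or with any HTTP client), decode the JSON, and pass the resulting dictionary to summarise_gene. The numbers may differ from what the browser shows if the Ensembl release has changed since this assignment was written.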

(c) Scroll down to the Orthologue prediction section.

1) How many transcripts can you see for Macaca mulatta (rhesus monkey)?

2) Go back to the Orthologue prediction section of the human gene. Use the [align] option next to Macaca mulatta. How many gaps/indels can you find between human and rhesus monkey?
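
When counting gaps/indels in an alignment, it helps to decide whether a run of consecutive gap characters counts once (one indel event) or once per column. The sketch below reports both, assuming the two aligned sequences are available as equal-length strings with "-" as the gap character. The sequences shown are toy data, not the human/rhesus alignment.

```python
def count_gaps(aligned_a, aligned_b, gap="-"):
    """Return (indel_events, gap_columns) for a pairwise alignment.

    A run of consecutive gaps in the same sequence counts as one event.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    events = 0
    columns = 0
    previous = None  # which sequence held the gap in the previous column
    for a, b in zip(aligned_a, aligned_b):
        current = "a" if a == gap else "b" if b == gap else None
        if current is not None:
            columns += 1
            if current != previous:
                events += 1  # a new gap run starts here
        previous = current
    return events, columns


# Toy alignment: a two-column gap in the first sequence, a one-column gap in the second.
toy_a = "ACGT--ACGTA"
toy_b = "ACGTTTAC-TA"
print(count_gaps(toy_a, toy_b))  # (2, 3)
```

State in your answer which of the two counts you are reporting.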

(d) Return to the human gene found in (b). Scroll down to the Paralogue prediction section.

1) What is the difference between a paralogue and an orthologue?

2) What is the level of similarity between ADARB1 and ADARB2? (hint: you don’t have to follow the ADARB2 link)

3) What function do they have in common?

Gene Ontology

Find human trypsin III in Swiss-Prot. What are the GO annotations for this trypsin?

1. Follow the “cellular component” ontology link and examine the file. Are there more specific terms than “extracellular space” (does it contain any child terms)?

2. Can you find out what other child terms exist for proteins which are “cellular component” but not “extracellular”?

3. Search GO to find the GO accession for the nucleolus cellular component.