Assignment II: Using NCBI databases

1. Find the protein with the accession number P23367 in the NCBI protein database. (10 points)
   a. How many amino acids are in the protein? 615
   b. What is the function of the protein? DNA mismatch repair

2. Find the gene mutL of Escherichia coli. (15 points)
   a. How many records did you retrieve in the NCBI Gene database? 33
   b. How many mutL genes does one Escherichia coli genome have? 1

3. Searching for the Homo sapiens g6pd protein in the NCBI protein database will result in records from both RefSeq and GenBank. (10 points)
   a. How many records are from GenBank, RefSeq, and SwissProt? GenBank: 124; RefSeq: 3; SwissProt: 1
   b. Read about RefSeq and GenBank (e.g., in http://www.ncbi.nlm.nih.gov/RefSeq/RSfaq.html#rsgbdiff). In which database do you expect to find more records? Why? GenBank. It contains data submitted by numerous individual laboratories, which gives it wide coverage but also increases its redundancy. RefSeq data is curated and more tightly regulated, so it holds fewer records.

4. Find the tumor suppressor pp32r1 gene (accession number AF008216) in the nucleotide database. (15 points)
   a. What is the source organism and the chromosome from which the sequence has been obtained? Homo sapiens; chromosome 4
   b. At which nucleotide does translation start? 4453
   c. How many amino acids are in the protein? 234

5. Using the NCBI cross-database search, find all entries for Human immunodeficiency virus 2 (HIV-2). (15 points)
   a. In which database will you be able to find how many coding sequences are in its genome? Entrez Genome
   b. How many coding sequences are there? 9

6. Using the NCBI genome database, find the entry for the Aquifex aeolicus VF5 genome (without plasmids). (10 points)
   a. What is the GC content of its chromosome? 43%
   b. What is the length of its genome? 1,551,335 bp

7. Using the NCBI Genome Project database, answer the following questions: (10 points)
   a. How many chromosomes are in the genome of Saccharomyces cerevisiae? 16
   b. How many different Saccharomyces species can be found in the Genome Project database? 8 species (search: Saccharomyces [orgn]; limits: Genome overview)

8. Mutations in the BRCA1 gene have been reported to be associated with the early onset of breast cancer. (15 points)
   a. How many non-synonymous mutations of human BRCA1 are in the SNP database? Limiting the BRCA1 search in the SNP database to "coding nonsynonymous" and Homo sapiens yields 1282 results.
   Examine record rs70953662 in the SNP database.
   b. What are the nucleotide alleles? T to A
   c. What is the amino acid change between these alleles? Ile to Asn
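
The answers to 8b and 8c can be checked against the standard genetic code. The sketch below is only an illustration, not part of the dbSNP record: it assumes the T and A alleles are reported on the coding strand, and the helper names (`substitute`, `t_to_a_changes`) and the partial codon table are hypothetical, introduced just for this check. It lists what each isoleucine codon becomes after every possible single T-to-A substitution.

```python
# Partial codon table (standard genetic code): only the codons needed here.
CODON_TO_AA = {
    "ATT": "Ile", "ATC": "Ile", "ATA": "Ile",
    "AAT": "Asn", "AAC": "Asn",
    "AAA": "Lys",
}


def substitute(codon, position, new_base):
    """Return the codon with the base at `position` (0-based) replaced."""
    return codon[:position] + new_base + codon[position + 1:]


def t_to_a_changes(codon):
    """List (position, mutant codon, amino acid) for each single T->A change."""
    results = []
    for i, base in enumerate(codon):
        if base == "T":
            mutant = substitute(codon, i, "A")
            results.append((i, mutant, CODON_TO_AA.get(mutant, "?")))
    return results


if __name__ == "__main__":
    for codon in ("ATT", "ATC", "ATA"):
        for pos, mutant, aa in t_to_a_changes(codon):
            print(f"{codon} ({CODON_TO_AA[codon]}) -> {mutant} ({aa}), "
                  f"codon position {pos + 1}")
```

Under these assumptions, a T-to-A change at the second position of ATT or ATC yields AAT or AAC, both of which encode Asn. This is consistent with the Ile to Asn answer, while ATA would give Lys instead.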