BT203IU Bioinformatics
Assignment 1: Accessing NCBI databases
Due date: 19:00 Mon. 28 Feb. 2022 (Submission via email: Bioinfosiu@gmail.com)

Aim of assignment 1
Learn how to access and use NCBI databases.

Question 1: Search the Taxonomy database for: 1) Homo sapiens, 2) Heterodoxus macropus, 3) Escherichia coli.
a. What is the common name of each species?
b. How many nucleotide or protein sequence records do you find for each species? (Show your search results as cropped screenshots.)

Question 2: Use the name “plague thrips” to search the Nucleotide database.
a. What is the scientific name of the plague thrips?
b. How many sequence records do you find?
c. Which genes or genomes of the plague thrips have been sequenced?
d. Provide information on the most recent publication that reported the mitochondrial genome of the plague thrips, including the authors, year and title of the publication, title of the journal, volume and page numbers.

Question 3: Search PubMed for “Thanh NM” (International University).
a. How many publications by Thanh NM have been deposited in PubMed?
b. List the common names of two aquatic animals that Thanh NM has worked on.
c. Provide information on a publication by Thanh NM: year and title of the publication, title of the journal, volume and page numbers.

Question 4: Search the Genome database for Homo sapiens.
a. How many genome assembly records did your search find?
b. Provide the RefSeq accession number for chromosome 1 of Homo sapiens and the size of chromosome 1.
c. Provide information on the most recent publication that reported chromosome 1 (from the search above), including the authors, year and title of the publication, title of the journal, volume and page numbers.

Question 5: Use the accession number “CU329670” to search the Nucleotide database.
a. What is the type of sequence? What is the length of the sequence? What is the full name of the database division?
b. What is the scientific name of the organism?
Go to the FEATURES section of the record.
Follow the link to the CDS to access the first 5662 nucleotides of the sequence.
c. What are the names of the gene and its protein product, and what is the length of the protein?
d. Write the first four amino acids of the protein.
e. Write the nucleotide sequence of the coding strand that corresponds to these amino acids.
f. Write the nucleotide sequence of the template strand that corresponds to these amino acids.
(Note: the coding strand is the DNA strand within the gene whose sequence is identical to the transcript, except that it contains T where the RNA contains U; the template strand is the strand complementary to the coding strand.)
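Parts d to f of Question 5 can be cross-checked with a short script. The Python sketch below is only an illustration under stated assumptions: it uses the standard genetic code (NCBI translation table 1), assumes the first 12 nucleotides of the CDS are the first four codons (beginning at the start codon and not interrupted by an intron), and uses a made-up sequence, ATGGCTTCAAAA, in place of the real CU329670 sequence. The helpers translate, template_3_to_5 and template_5_to_3 are hypothetical names chosen for this example, not part of any NCBI tool. The template strand is printed both 3' to 5' (aligned base by base under the coding strand) and 5' to 3' (the reverse complement).

```python
# Hypothetical sketch for Question 5 d-f. The sequence below is a made-up
# placeholder, NOT the CU329670 CDS; paste your own 12 nt in its place.

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); '*' marks a stop codon.
# Codons are enumerated TTT, TTC, TTA, TTG, TCT, ... to match this string.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))

# Watson-Crick base pairing for DNA.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(coding: str, n_codons: int) -> str:
    """Translate the first n_codons codons of a 5'->3' coding-strand sequence."""
    seq = coding.upper()
    if len(seq) < 3 * n_codons:
        raise ValueError("sequence is shorter than the requested number of codons")
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, 3 * n_codons, 3))


def template_3_to_5(coding: str) -> str:
    """Template strand written base by base under the coding strand (3'->5')."""
    return coding.upper().translate(COMPLEMENT)


def template_5_to_3(coding: str) -> str:
    """The same template strand written 5'->3' (the reverse complement)."""
    return template_3_to_5(coding)[::-1]


if __name__ == "__main__":
    coding = "ATGGCTTCAAAA"  # hypothetical first 12 coding nucleotides
    print("First 4 amino acids:", translate(coding, 4))       # MASK
    print("Coding   5'-" + coding + "-3'")
    print("Template 3'-" + template_3_to_5(coding) + "-5'")  # TACCGAAGTTTT
    print("Template 5'-" + template_5_to_3(coding) + "-3'")  # TTTTGAAGCCAT
```

Labelling the 5' and 3' ends explicitly matters in part f: the same template strand can be written in either direction, and an answer without the ends marked is ambiguous.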