Uploaded by Nguyễn Ngọc Vân

Bioinformatics NCBI Database Assignment

BT203IU Bioinformatics
Assignment 1
Accessing NCBI databases
Due date: 19:00 Mon. 28 Feb. 2022
(Submission via email: Bioinfosiu@gmail.com)
Aim of assignment 1
Learn how to access and use NCBI databases
Question 1: Search the Taxonomy database for: 1) Homo sapiens, 2) Heterodoxus macropus, 3)
Escherichia coli
a. What is the common name of each species?
b. How many nucleotide or protein sequence records do you find for each species (show your search
results as cropped screenshots)?
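The record counts asked for in Question 1b can also be cross-checked programmatically. The following is a minimal sketch, assuming the NCBI E-utilities ESearch endpoint with JSON output and the [Organism] field tag; the helper names build_esearch_url, parse_count and count_records are hypothetical and chosen only for illustration. Counts change as new records are deposited, so the cropped search results from the web interface remain the expected answer.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Base address of the E-utilities ESearch service.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(db, term):
    """Build an ESearch URL that requests JSON and no ID list (count only)."""
    params = {"db": db, "term": term, "retmode": "json", "retmax": 0}
    return EUTILS_ESEARCH + "?" + urlencode(params)


def parse_count(response_text):
    """Extract the number of matching records from an ESearch JSON response."""
    return int(json.loads(response_text)["esearchresult"]["count"])


def count_records(db, term):
    """Query NCBI (requires network access) and return the record count."""
    with urlopen(build_esearch_url(db, term)) as response:
        return parse_count(response.read().decode("utf-8"))
```

For example, count_records("nucleotide", "Heterodoxus macropus[Organism]") would return the current number of Nucleotide records for that species, and passing "protein" as the database would do the same for protein records. When looping over several species, keep the request rate low (NCBI asks for no more than three requests per second without an API key).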
Question 2: Use the name “plague thrips” to search the Nucleotide database.
a. What is the scientific name of the plague thrips?
b. How many sequence records do you find?
c. Which genes or genomes of the plague thrips have been sequenced?
d. Provide details of the most recent publication that reported the mitochondrial genome of the
plague thrips, including the authors, year and title of the publication, title of the journal, volume and
page numbers.
Question 3: Search PubMed for “Thanh NM” (International University).
a. How many publications by Thanh NM are indexed in PubMed?
b. List the common names of 2 aquatic animals that Thanh NM worked on.
c. Provide details of a publication by Thanh NM: year and title of the publication, title of the
journal, volume and page numbers.
Question 4: Search the Genome database for Homo sapiens.
a. How many records of genome assemblies did your search find?
b. Provide the RefSeq accession number for chromosome 1 of Homo sapiens and the size of
chromosome 1.
c. Provide details of the most recent publication that reported chromosome 1 (from the above
search), including the authors, year and title of the publication, title of the journal, volume and page
numbers.
Question 5: Use accession number “CU329670” to search the Nucleotide database.
a. What is the type of sequence? What is the length of the sequence? What is the full name of the database?
b. What is the scientific name of the organism?
Go to the FEATURES section of the record. Follow the CDS link to view the first 5662
nucleotides of the sequence.
c. Give the names of the gene and its protein product, and the length of the protein.
d. Write the first four amino acids.
e. Write the nucleotide sequence of the coding strand that corresponds to these amino acids.
f. Write the nucleotide sequence of the template strand that corresponds to these amino acids.
(Note that the coding strand is defined as the strand of DNA within the gene whose sequence is identical
to the transcript, with T in place of U, and the template strand is the strand that is complementary to
the coding strand.)
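The relationship between the coding strand, the template strand and the transcript described in the note can be sketched in a few lines of Python. This is an illustrative sketch only: the 12-nucleotide sequence below is hypothetical (it encodes Met-Ala-Lys-Gly under the standard genetic code) and is not taken from record CU329670, and the function names are chosen for this example.

```python
# Translation table mapping each DNA base to its complementary base.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement_3to5(coding_5to3):
    """Template strand read 3'->5', i.e. aligned base-by-base under the coding strand."""
    return coding_5to3.upper().translate(COMPLEMENT)


def template_strand(coding_5to3):
    """Template strand written 5'->3' (the reverse complement of the coding strand)."""
    return complement_3to5(coding_5to3)[::-1]


def transcript(coding_5to3):
    """mRNA sequence: same as the coding strand, with U in place of T."""
    return coding_5to3.upper().replace("T", "U")


# Hypothetical coding sequence for four codons: ATG GCT AAA GGT.
coding = "ATGGCTAAAGGT"
print("coding   5'->3':", coding)
print("template 3'->5':", complement_3to5(coding))
print("template 5'->3':", template_strand(coding))
print("mRNA     5'->3':", transcript(coding))
```

When writing the template strand for part f, state the direction explicitly, since the same strand reads differently depending on whether it is written 3'->5' (aligned under the coding strand) or 5'->3'.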