Practical Exercise

Practical 1b: Retrieving Biological Information from the Internet
Introduction
During the last practical, you explored a number of biological databases and acquired
some database searching skills. As your needs for biological information change (for
instance, when your research interests change direction), you should ideally be able to
find suitable databases on your own.
Options for finding these databases include, but are not limited to, (1) searching the
Internet with a general-purpose search engine such as Google using well-chosen
keywords, (2) searching PubMed for publications about databases, and (3) consulting
catalogues of databases.
The journal Nucleic Acids Research publishes a special annual Database Issue, which
is an example of a database catalogue. In this practical, you will inspect the most recent
Database Issue. Note that a single Database Issue does not describe every biological
database.
In addition, using the database identification and searching skills you have acquired,
you will learn how to gather information from appropriate databases in order to address
various research needs.
General Objectives
By the end of this practical, you will:
- have carried out a brief survey of the databases catalogued by NAR
- know how to form an opinion on the completeness of certain databases
- know how to independently identify useful databases for your research needs
- know how to answer research questions using existing information in the
literature and public databases
Practical Exercise
The Nucleic Acids Research (NAR) Database Issue: Identifying useful databases
1 Using either PubMed or Google, find the January 2011 Database Issue of
Nucleic Acids Research. Alternatively, go to www.nar.oxfordjournals.org and
click the “2011 Database Issue” link on the right-hand side.
2 Check that you are at the correct URL:
http://nar.oxfordjournals.org/content/39/suppl_1
3 The Nucleic Acids Research (NAR) Database Issue is a special supplementary
issue of the Nucleic Acids Research journal, published yearly. It features
descriptions of both new and existing databases of molecular biology data.
Scroll through the list of articles. In which volume of Nucleic Acids Research
is the 2011 Database Issue published?
4 In the first article of this volume, “The 2011 Nucleic Acids Research
Database Issue and the online Molecular Biology Database Collection” by
Michael Y. Galperin and Guy R. Cochrane (pp. D1-D6), click the “Database
Summaries” link.
If you would like to read the Galperin and Cochrane (2011) paper, a copy is
available in the Miscellaneous folder of the Student Workbin on IVLE. Please
use this copy rather than downloading the paper from the NAR website, as
downloading it there incurs unnecessary charges to the university.
5 Read the description on the Database Summaries page. Is it a
comprehensive collection of all available biological databases?
6 After clicking on the “Database Summaries” link, click on the “Complete
Category/Summary Paper List” link for a detailed listing (see below).
7 The Category/Summary Paper List is a catalogue of the databases published in
the Database Issue. From this list, locate the databases that you used in the
last practical.
Are they all there?
Do any of them appear more than once?
Are there differences between related entries? If so, describe some of the
differences. (Hint: read the database descriptions.)
Examples:
“PDBe”, “PDBsum” and “PDB” under the “Protein Structure” section
“NCBI Protein Database” under “Protein Sequence Database”
“UniProt”, “UniParc”, “UniRef” and “Swiss-Prot” under “Protein Sequence Database”
8 Give a few examples of biological databases for each of the following data types.
Read the database summaries for information about the data type.
- Protein sequence
- Nucleic acid sequence
- Protein-protein interactions
- Pathways
- Genome information
- Taxonomy information
9 Open a new webpage and go to p53.bii.a-star.edu.sg.
What kind of information is shown there?
Is this database in the NAR database list?
What does this tell you about the comprehensiveness of the NAR database
collection?
Problem Scenario
Having acquired some knowledge of online biological databases and basic
database searching skills, this week you will learn how to find information on
specific organisms.
The NCBI Coffee Break (http://www.ncbi.nlm.nih.gov/books/NBK2345/) is an
interesting e-book which contains articles reporting recent biomedical discoveries and
highlighting NCBI databases and tools used in the research process.
Your supervisor has read the e-book and found the first two articles on the Neanderthal
man (“Neanderthal man lives on in some of us”) and the woolly mammoth (“From
Africa to the Arctic”) particularly interesting and relevant to your training in
bioinformatics.
He has therefore asked you to collect as much information as you can on these
two extinct organisms. As your supervisor believes you are now independent
enough to carry out the task, he has instructed you to select appropriate
databases and find the relevant information on your own.
Task
Through last week's practical exercises, you have acquired useful database
identification and searching skills. Use these skills to find information on the
Neanderthal man and the woolly mammoth, then write a short summary of these
organisms (with proper citations and references to specific databases).
To find information on a specific organism, what kind of database should you search in?
(Hint: You can look through the NAR database catalogue to get the answer)
For example, go to the NCBI Taxonomy Browser and search for “Neanderthal man” and
“woolly mammoth”.
Hint: Good starting points are the UniProt Taxonomy Browser
(http://www.uniprot.org/taxonomy/) and the NCBI Taxonomy Browser
(http://www.ncbi.nlm.nih.gov/Taxonomy/taxonomyhome.html/).
In addition to providing information about specific organisms, these databases also
provide many cross-links to other relevant databases including protein sequence,
nucleotide sequence and genomic sequence databases.
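The same taxonomy searches can also be run through NCBI's E-utilities, which is handy once you need to repeat a lookup for several organisms. The sketch below assumes Python 3 and the public esearch endpoint; it only builds the query URLs and does not contact NCBI. The helper name `esearch_url` is hypothetical, and whether a common name such as "Neanderthal man" returns a hit depends on the names NCBI has recorded, so fall back to the scientific name if it does not.

```python
from urllib.parse import urlencode

# Public base URL of NCBI's E-utilities "esearch" service.
ESEARCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(db: str, term: str) -> str:
    """Build an esearch query URL for an Entrez database (no network access).

    Hypothetical helper for illustration only.
    """
    # urlencode escapes the query string; spaces become "+".
    return ESEARCH_BASE + "?" + urlencode({"db": db, "term": term})


if __name__ == "__main__":
    # Search the NCBI Taxonomy database by the common names used in this task.
    for name in ["Neanderthal man", "woolly mammoth"]:
        print(esearch_url("taxonomy", name))
```

Opening a printed URL in a browser should return an XML document whose `<Id>` elements hold the matching TaxonIDs. Changing `db` to another Entrez database, such as `nucleotide`, `protein` or `gene`, reuses the same pattern for the cross-linked records.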
Can you find and list other taxonomy databases in the NAR database
catalogue?
Do all the taxonomy databases listed contain information on all taxa (organisms) in
the Tree of Life?
Ideally, for each organism, you should at least be able to answer the following
questions:
– What is the scientific name of these organisms?
– What is the full taxonomic lineage of these organisms?
– Are these organisms prokaryotes or eukaryotes?
– What is the TaxonID for each organism?
– Has the genome of each organism been sequenced?
– List the accession number(s) for the genome sequence of each.
Observe the format of the accession numbers of the genome records. Do
they share a common format? Which database are these records from?
– List the sequencing center(s) for the full genome sequences.
– List the title and PubMed ID (if available) of the paper(s) with the genome
sequencing results.
– List the name, protein sequence accession number and GeneID of some
protein coding genes present in each organism.
Do both organisms share the same protein coding genes?
– List the names and GeneID of some non-protein coding genes in each
organism.
What are the functions of these genes?
– Provide a short description of the features of the organism. (hint: You can
search in Google or PubMed)
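For the question about accession formats, a rough pattern check can help you see at a glance which records come from the same source. The sketch below assumes the commonly documented NCBI conventions: RefSeq accessions have a two-letter prefix followed by an underscore (for example, NC_ for complete genomic molecules), whereas GenBank/ENA/DDBJ (INSDC) accessions do not. The patterns are deliberately simplified and incomplete, the function name `classify_accession` is hypothetical, and the sample strings are made up; check NCBI's own accession documentation for the authoritative rules.

```python
import re

# Simplified accession formats (an assumption based on commonly documented
# NCBI/INSDC conventions; not an exhaustive or authoritative list).
ACCESSION_PATTERNS = [
    # RefSeq: two letters, an underscore, digits, optional ".version"
    ("RefSeq", re.compile(r"^[A-Z]{2}_\d+(\.\d+)?$")),
    # INSDC whole-genome shotgun (WGS): 4 or 6 letters followed by 8+ digits
    ("INSDC WGS", re.compile(r"^(?:[A-Z]{4}|[A-Z]{6})\d{8,}(\.\d+)?$")),
    # INSDC (GenBank/ENA/DDBJ) nucleotide: 1 letter + 5 digits,
    # or 2 letters + 6 or 8 digits, with an optional ".version"
    ("INSDC nucleotide",
     re.compile(r"^(?:[A-Z]\d{5}|[A-Z]{2}\d{6}|[A-Z]{2}\d{8})(\.\d+)?$")),
]


def classify_accession(accession: str) -> str:
    """Return a coarse guess at which kind of record an accession belongs to."""
    acc = accession.strip().upper()
    for label, pattern in ACCESSION_PATTERNS:
        if pattern.match(acc):
            return label
    return "unknown"


if __name__ == "__main__":
    # Made-up strings that merely follow each format; not real records.
    for acc in ["NC_000000.1", "AB123456", "ABCD01000001", "A12345", "xyz"]:
        print(acc, "->", classify_accession(acc))
```

Running this on the accession numbers you collect groups them by apparent source; records that fall under "unknown" are a cue to look up the prefix on the record page itself.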
Advanced Section
Your supervisor is happy with your progress in this exercise. He has now asked
you to apply the skills and knowledge you have learnt in this practical to find out
as much as you can about the organism Dictyostelium discoideum, which he
intends to use in his experiments.
Ideally, you should be able to answer at least the following questions:
– What is the full taxonomic lineage of Dictyostelium discoideum?
– Is Dictyostelium discoideum a prokaryote or eukaryote?
– What is the TaxonID for the organism Dictyostelium discoideum?
– What is the common name of Dictyostelium discoideum?
– Has the genome of Dictyostelium discoideum been sequenced?
– List the accession number(s) for the full genome sequence(s).
– List the sequencing center(s) for the full genome sequences.
– List the title and PubMed ID (if available) of the paper(s) with the genome
sequencing results.
– How many chromosomes does Dictyostelium discoideum have?
– List the name and GeneID of some of the genes present in the organism.
– Are there specialized databases that contain information specific to
Dictyostelium discoideum?
– Provide a short description of the features of the organism. (Use PubMed or
the descriptions in specialized databases.)
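The lineage shown for an organism in the NCBI Taxonomy Browser is a list of taxon names separated by semicolons; for cellular organisms it begins with "cellular organisms". A minimal sketch, assuming that format, of how the prokaryote/eukaryote question can be read off such a string: an organism is treated as eukaryotic if "Eukaryota" appears in its lineage and as prokaryotic if "Bacteria" or "Archaea" does. The function names are hypothetical, and the example strings are shortened placeholders rather than copies of real records.

```python
from typing import List


def parse_lineage(lineage: str) -> List[str]:
    """Split a semicolon-separated lineage string into its taxon names."""
    return [taxon.strip() for taxon in lineage.split(";") if taxon.strip()]


def cell_type(lineage: str) -> str:
    """Classify an organism as eukaryote or prokaryote from its lineage."""
    taxa = parse_lineage(lineage)
    if "Eukaryota" in taxa:
        return "eukaryote"
    if "Bacteria" in taxa or "Archaea" in taxa:
        return "prokaryote"
    # Viruses and unclassified entries fall outside both groups.
    return "neither"


if __name__ == "__main__":
    # Shortened, illustrative lineage strings (not copied from a real record).
    examples = [
        "cellular organisms; Eukaryota; ...",
        "cellular organisms; Bacteria; ...",
        "Viruses; ...",
    ]
    for lineage in examples:
        print(lineage, "->", cell_type(lineage))
```

Pasting the full lineage of Dictyostelium discoideum from the Taxonomy Browser into `cell_type` gives a quick cross-check of your answer; the parsed list from `parse_lineage` also makes it easy to count the ranks in the lineage.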