• X-ray crystallography
• NMR
• cryo-EM
• biological molecules
– PDB – Protein Data Bank, http://www.pdb.org (free)
– NDB – Nucleic Acid Database, http://ndbserver.rutgers.edu/
• organic molecules
– CSD – Cambridge Structural Database (paid)
1957
• Myoglobin structure determined
1970s
• Discussions on how to establish an archive of protein structures
• PDB established at Brookhaven
– Oct 1971, 7 structures
1980s
• Technology takes off
– molecular biology, instrumentation, computer hardware and software
• Number of structures increases
• Structural biology is able to focus on medical problems
• IUCr requires data deposition to the PDB
1990s
• Complexity of structures increases
• Structural genomics begins
• 20. 11. 2012 – 86 344 structures in the PDB archive
• 8 225 new structures deposited in 2012 so far
• Depositions by macromolecule type
– 92.6 % Proteins (79 959 structures)
– 2.8 % Nucleic acids (2456 structures)
– 4.5 % Protein-nucleic acid complexes (3905 structures)
• Depositions by experimental technique:
– 88.0% x-ray diffraction (75 957 structures)
– 11.2% solution NMR (9702 structures)
– 0.5% cryo-EM (468 structures)
Data as of 26. 11. 2012, http://www.pdb.org/pdb/static.do?p=general_information/pdb_statistics/index.html
• Each structure in the PDB is represented by a 4-character identifier of the form [1-9][a-z0-9][a-z0-9][a-z0-9] (case-insensitive)
• 1B3T
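The identifier rule can be checked with a regular expression. The sketch below is a minimal illustration. It assumes the classic 4-character form with a first digit in the range 1-9 (the wwPDB documentation excludes 0 as the first character) and treats letters case-insensitively. The helper names `is_pdb_id` and `normalize_pdb_id` are hypothetical.

```python
import re

# Assumed rule: a digit 1-9 followed by three letters or digits, case-insensitive.
PDB_ID_PATTERN = re.compile(r"[1-9][a-z0-9]{3}", re.IGNORECASE)


def is_pdb_id(text: str) -> bool:
    """True if the whole string looks like a classic 4-character PDB ID."""
    return PDB_ID_PATTERN.fullmatch(text) is not None


def normalize_pdb_id(text: str) -> str:
    """Validate an ID and return it in the usual upper-case spelling."""
    if not is_pdb_id(text):
        raise ValueError(f"not a 4-character PDB ID: {text!r}")
    return text.upper()


print(normalize_pdb_id("1b3t"))  # 1B3T
print(is_pdb_id("0abc"), is_pdb_id("1B3"))  # False False
```

Using `fullmatch` rather than `search` rejects strings that merely contain a valid ID, such as `1B3T5`.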
http://mmcif.pdb.org/
– all data released in all three formats
legacy format
http://www.wwpdb.org/docs.html
Fortran-like, 80 columns wide
not structured enough to describe complicated 3D objects
its limits have been exceeded several times
99,999 atoms, 34 (or 58) chains
readable by most programs
model – chain – residue – atom
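Because the legacy format is fixed-width, each ATOM/HETATM record is read by slicing columns rather than splitting on whitespace (adjacent fields may touch). The sketch below is an illustration, not a full parser. It follows the column positions of the wwPDB format description, ignores alternate locations and every record except MODEL, ATOM and HETATM, and groups atoms into the model – chain – residue – atom hierarchy. The function names and the sample coordinates are made up.

```python
def parse_atom_line(line: str) -> dict:
    """Slice one ATOM/HETATM record by fixed columns (1-based columns in comments)."""
    return {
        "record": line[0:6].strip(),      # cols 1-6
        "serial": int(line[6:11]),        # cols 7-11
        "name": line[12:16].strip(),      # cols 13-16
        "alt_loc": line[16].strip(),      # col 17
        "res_name": line[17:20].strip(),  # cols 18-20
        "chain_id": line[21],             # col 22
        "res_seq": int(line[22:26]),      # cols 23-26
        "i_code": line[26].strip(),       # col 27
        "x": float(line[30:38]),          # cols 31-38
        "y": float(line[38:46]),          # cols 39-46
        "z": float(line[46:54]),          # cols 47-54
    }


def build_hierarchy(lines) -> dict:
    """Group atoms as {model: {chain: {(res_seq, i_code, res_name): [atoms]}}}."""
    models = {}
    model_id = 1  # files without MODEL records hold a single model
    for line in lines:
        if line.startswith("MODEL"):
            model_id = int(line[10:14])  # model serial in cols 11-14
        elif line.startswith(("ATOM  ", "HETATM")):
            atom = parse_atom_line(line)
            chains = models.setdefault(model_id, {})
            residues = chains.setdefault(atom["chain_id"], {})
            key = (atom["res_seq"], atom["i_code"], atom["res_name"])
            residues.setdefault(key, []).append(atom)
    return models


# Made-up records laid out in the fixed columns described above.
sample = [
    "ATOM      1  N   MET A   1      27.340  24.430   2.614  1.00  9.67",
    "ATOM      2  CA  MET A   1      26.266  25.413   2.842  1.00 10.38",
    "ATOM      3  N   GLY B   5      10.000  -3.500   0.250  1.00 20.00",
]
structure = build_hierarchy(sample)
for chain_id, residues in structure[1].items():
    for (res_seq, _, res_name), atoms in residues.items():
        print(chain_id, res_name, res_seq, [a["name"] for a in atoms])
# A MET 1 ['N', 'CA']
# B GLY 5 ['N']
```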
based on community-agreed definitions
allows adding new features and customization
mmCIF categories are easily transformed to database tables
not designed to be read by humans, data should be viewed through programs and databases
http://ich.vscht.cz/~cechp/mmcif/
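To illustrate how an mmCIF category maps onto a database table, the sketch below reads `loop_` blocks from a tiny made-up file and loads each category into an in-memory SQLite table whose columns are the item names. It assumes a simplified subset of the syntax: whitespace-separated values, no quoted strings, no `;`-delimited multi-line values and no single-value (non-loop) items. The helper names are hypothetical; real files call for a dedicated mmCIF reader.

```python
import sqlite3


def parse_loops(text: str) -> dict:
    """Collect every loop_ block as {category: (item_names, rows)}.

    Simplified subset only: whitespace-separated values, no quotes,
    no ';'-delimited multi-line values, no non-loop items.
    """
    lines = [ln.strip() for ln in text.splitlines()]
    tables = {}
    i = 0
    while i < len(lines):
        if lines[i] != "loop_":
            i += 1
            continue
        i += 1
        tags = []
        while i < len(lines) and lines[i].startswith("_"):
            tags.append(lines[i])
            i += 1
        values = []
        while (i < len(lines) and lines[i] not in ("", "#", "loop_")
               and not lines[i].startswith(("_", "data_"))):
            values.extend(lines[i].split())
            i += 1
        if not tags:
            continue
        category = tags[0].split(".")[0].lstrip("_")   # "_atom_site.id" -> "atom_site"
        items = [tag.split(".", 1)[1] for tag in tags]  # "_atom_site.id" -> "id"
        rows = [values[k:k + len(items)] for k in range(0, len(values), len(items))]
        tables[category] = (items, rows)
    return tables


def load_into_sqlite(tables: dict) -> sqlite3.Connection:
    """Create one table per category, one column per item."""
    conn = sqlite3.connect(":memory:")
    for category, (items, rows) in tables.items():
        columns = ", ".join(f'"{item}"' for item in items)
        placeholders = ", ".join("?" for _ in items)
        conn.execute(f'CREATE TABLE "{category}" ({columns})')
        conn.executemany(f'INSERT INTO "{category}" VALUES ({placeholders})', rows)
    return conn


# Made-up example file, not a real PDB entry.
example = """data_demo
loop_
_atom_site.group_PDB
_atom_site.id
_atom_site.label_atom_id
_atom_site.label_comp_id
_atom_site.label_asym_id
_atom_site.Cartn_x
ATOM 1 N  MET A 27.340
ATOM 2 CA MET A 26.266
ATOM 3 N  GLY B 10.000
#
"""

conn = load_into_sqlite(parse_loops(example))
print(conn.execute(
    "SELECT id, label_atom_id FROM atom_site WHERE label_asym_id = 'B'"
).fetchall())  # [('3', 'N')]
```

All values are stored as text here; a fuller loader would use the types defined in the mmCIF dictionary.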
http://www.pubmed.gov
http://www.pubmed.org
National Institutes of Health (NIH) – U.S. government
National Library of Medicine (NLM)
National Center for Biotechnology Information (NCBI)
NCBI (founded 1988, http://www.ncbi.nlm.nih.gov/ )
• GenBank (genomic sequences) – open-access, annotated collection of all publicly available nucleotide sequences; its size doubles roughly every 18 months (October 2008: 97 381 682 336 bp), a new release appears every 2 months, and an accession number (e.g. U49845) is required upon publication
• OMIM – Online Mendelian Inheritance in Man, db of diseases together with their genetic components
• PubChem (http://pubchem.ncbi.nlm.nih.gov/) – db of small organic molecules, includes information about their bioactivities
• Entrez (http://www.ncbi.nlm.nih.gov/sites/gquery) – federated search engine offering unified access to all NCBI databases
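Entrez databases can also be queried programmatically through NCBI's E-utilities web service. The sketch below only builds an ESearch URL; fetching it needs network access and should respect NCBI's usage guidelines. `esearch_url` is a hypothetical helper, while the base URL and the `db`, `term`, `retmax` and `retmode` parameters belong to the documented ESearch interface. For `db="pubmed"` the returned IDs are PMIDs.

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build an ESearch URL that returns matching IDs as JSON."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return EUTILS_BASE + "esearch.fcgi?" + urlencode(params)


# The same call shape works for other Entrez databases, e.g. db="nuccore" (GenBank nucleotides).
url = esearch_url("pubmed", "salmonella AND (hamburger OR eggs)", retmax=5)
print(url)
# https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=salmonella+AND+%28hamburger+OR+eggs%29&retmax=5&retmode=json
```

`urlencode` takes care of escaping spaces, parentheses and field-tag brackets in the query term.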
• MEDLINE – journal citations and abstracts for biomedical literature
• since 1997, free access to MEDLINE via PubMed
• PubMed – Web-based retrieval system developed by the NCBI at the NLM; part of NCBI's Entrez.
• PubMed contains
– abstracts
– links to full-text articles
– links to other databases
– …and much more
• Most PubMed records are MEDLINE citations:
– citations and author abstracts from approx. 5 200 biomedical journals
– diverse topics: microbiology, delivery of health care, nutrition, pharmacology and environmental health.
– currently over 19 million references dating back to 1948
– new material added Tuesday through Saturday
– about 90% of records are from English-language sources or have English abstracts
– approximately 79% of citations include the published abstract
• PubMed Central (PMC)
– http://www.pubmedcentral.nih.gov/
– db of free full texts
– since 2007, papers funded by the NIH must be made freely available through PMC no later than 12 months after publication
• NCBI Bookshelf
– http://www.ncbi.nlm.nih.gov/sites/entrez?db=books
– free biomedical books (biochemistry, molecular biology, …)
• created 1960 by NLM
• MeSH – "Medical Subject Headings"
– the authority list of the biomedical terms
– used for indexing journal articles for MEDLINE
• It imposes uniformity and consistency on the indexing of biomedical literature.
• MeSH Tree
• Citations are indexed manually.
• http://www.nlm.nih.gov/bsd/disted/video/index.html
• MeSH vocabulary is organized into 16 main branches:
1. Anatomy
2. Organisms
3. Diseases
4. Chemicals and Drugs
5. Analytical, Diagnostic and Therapeutic Techniques and Equipment
6. Psychiatry and Psychology
7. Biological Sciences
8. Natural Sciences
9. Anthropology, Education, Sociology and Social Phenomena
10. Technology, Industry, Agriculture
11. Humanities
12. Information Science
13. Named Groups
14. Health Care
15. Publication Characteristics
16. Geographic Locations
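Each MeSH descriptor has one or more tree numbers, and a term's narrower terms are those whose tree numbers extend one of its own; PubMed uses this to "explode" a MeSH search so that more specific headings are included. The sketch below shows only the prefix rule, on placeholder data: the term names and tree numbers are invented, not real MeSH entries, and `explode` is a hypothetical helper.

```python
# Placeholder data: invented term names and tree numbers, NOT real MeSH entries.
TREE = {
    "Parent term": ["Z01"],
    "Child term": ["Z01.100"],
    "Grandchild term": ["Z01.100.200"],
    "Sibling term": ["Z01.300"],
    "Other branch term": ["Z02.100"],
    "Term with two positions": ["Z02.200", "Z01.100.300"],
}


def explode(term: str, tree: dict) -> list:
    """Return the term and every term filed below any of its tree numbers."""
    roots = tree[term]
    result = []
    for name, numbers in tree.items():
        for number in numbers:
            # The "." check stops "Z01" from matching an unrelated "Z010".
            if any(number == root or number.startswith(root + ".") for root in roots):
                result.append(name)
                break  # list each term once even if several positions match
    return result


print(explode("Child term", TREE))
# ['Child term', 'Grandchild term', 'Term with two positions']
```

A term filed in several places of the tree is reached from each of its parents, which is why it carries a list of tree numbers.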
• each citation has a unique PubMed ID (PMID), www.pubmed.org/PMID
• Boolean operators
– must be UPPERCASE!
– AND is the default
– parentheses group terms: salmonella AND (hamburger OR eggs)
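Since operators must be upper case and alternatives need parentheses, a query can be assembled from small helpers so that neither rule is forgotten. The helper names below are hypothetical; the output is an ordinary PubMed query string.

```python
def any_of(*terms: str) -> str:
    """Join alternatives with an upper-case OR and wrap them in parentheses."""
    return "(" + " OR ".join(terms) + ")"


def all_of(*parts: str) -> str:
    """Join parts with an explicit, upper-case AND."""
    return " AND ".join(parts)


query = all_of("salmonella", any_of("hamburger", "eggs"))
print(query)  # salmonella AND (hamburger OR eggs)
```

Writing AND explicitly is redundant for PubMed, but it keeps the grouping visible when queries are generated.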
• phrase searching
– “kidney failure”, kidney failure*, kidney failure[tw]
• author names
– natural or inverted order (“julia wong”, “wong julia”)
– searching last name only – use the [au] tag (wheeler[au])
• [ad] – affiliation of the first author
• [all] – all fields
• [au] – author
• [dp] – date of publication, yyyy/mm/dd (month and day are optional)
• [ta] – journal title (abbreviated, full), see Journals database http://www.ncbi.nlm.nih.gov/journals
• [mh] - MeSH term
• [majr] – MeSH major topic
• [ti] – title
• [tiab] – title + abstract
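Field tags are appended directly to a term, so tagged queries can also be generated. In the sketch below the helpers are hypothetical; it assumes PubMed's colon syntax for date ranges on [dp] (e.g. 2008:2012[dp]) and quotes multi-word terms so they are searched as phrases.

```python
def tagged(term: str, tag: str) -> str:
    """Attach a field tag; multi-word terms are quoted so they stay a phrase."""
    if " " in term:
        term = f'"{term}"'
    return f"{term}[{tag}]"


def published_between(start: str, end: str) -> str:
    """Date range on [dp]; each date is yyyy, yyyy/mm or yyyy/mm/dd."""
    return f"{start}:{end}[dp]"


query = " AND ".join([
    tagged("wheeler", "au"),
    tagged("kidney failure", "tiab"),
    published_between("2008/01/01", "2012/12/31"),
])
print(query)  # wheeler[au] AND "kidney failure"[tiab] AND 2008/01/01:2012/12/31[dp]
```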
• citation sensor
– choi blood 2008
• related articles
– sorted from most to least relevant
• All, Review, Free full text