Bioinformatics
Essentials
Stephanie Tatem Murphy
smurphy@bcc.ctc.edu
DNA
ATGCATTTCGGT
TTACGCCATATA
GCTCGGGAATCA
TGCATCGATCGA
GTAGCTAGCTAG
Model organisms
Protein
PNSADADNDFEDRL
RAGLCDHDKEVQGL
QVRCAVUEEHMHK
KQQEFENIRLDAQRL
EFFAYIFQKEHMKR
What is Bioinformatics?
[Figure: a DNA coding sequence displayed codon by codon, shown twice: TGT ATT AGA ACA ATA TGT GCA ATT AAT ATA CAT TGG AAT AAT GTA ATA AGT AAT CAT CTT CCT AGT AGT AAT TAT TGT AAA TTT ACG TAT ACC TGT ATT GTT TTT AAC AAT GTT GTT GTT TTC TGT AAA CTG ATT ATT TGT CTG]
Which genes are turned off, then on?
Courtesy of Dr. Young Moo Lee
UC Davis
Human Genome Program, U.S. Department of Energy, Genomics and Its Impact on Medicine and Society: A 2001 Primer, 2001
Genome
Transcriptome
Proteome
Fundamental Dogma
Although a few databases already exist to distribute molecular information, the post-genomic era will need many more to collect, manage, and publish the coming flood of new findings.
[Diagram: layers of biological information (Map, DNA, RNA, Proteins, Pathways, Phenotypes, Populations) with existing databases (GenBank, EMBL, DDBJ for DNA; PDB, SwissPROT, PIR for proteins) and areas still needing databases: Gene Expression?, Development?, Regulatory Pathways?, Metabolism?, Clinical Data?, Neuroanatomy?, Biodiversity?, Molecular Epidemiology?, Comparative Genomics?]
[Figure: a gene with regions labeled a, b, c, d, and e]
…ATGGCCCTGTGGATGCGCCTCCTGCCCCTG…
DNA base sequence: the recipe for amino acids
Met-Ala-Leu-Trp-Met-Arg-Leu-Leu-Pro-Leu
Amino acid sequence = protein = trait
Art by Yelena Ponirovskaya
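The idea that a DNA base sequence is read three bases at a time as a recipe for amino acids can be sketched in a few lines of Python. This is a minimal sketch, assuming the standard genetic code (NCBI translation table 1), reading frame 1, and uppercase A/C/G/T input; the names `CODON_TABLE`, `THREE_LETTER`, and `translate` are illustrative, not taken from any library.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), one letter per codon.
# Codons are ordered TTT, TTC, TTA, TTG, TCT, ..., GGG; "*" marks a stop codon.
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# One-letter to three-letter amino acid codes, as used on the slide.
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}

def translate(dna: str) -> list[str]:
    """Translate DNA in reading frame 1 into three-letter amino acid codes.

    Stops at the first stop codon; leftover bases that do not fill a codon are ignored.
    """
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":  # stop codon ends the protein
            break
        protein.append(THREE_LETTER[aa])
    return protein

print("-".join(translate("ATGGCCCTGTGGATGCGCCTCCTGCCCCTG")))
# prints Met-Ala-Leu-Trp-Met-Arg-Leu-Leu-Pro-Leu
```

Running it on the fragment above reproduces the slide's amino acid chain. A base other than A, C, G, or T raises a `KeyError`; real tools also handle ambiguous bases such as N.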
The Biology Project University of Arizona
http://www.biology.arizona.edu
DNA activity – RFLP, Inheritance
http://www.biology.arizona.edu/human_bio/activities/blackett/introduction.html
DNA replication fork
http://www.biology.arizona.edu/molecular_bio/problem_sets/nucleic_acids/03t.html
DNA base pairing
http://www.biology.arizona.edu/molecular_bio/problem_sets/nucleic_acids/08t.html
DNA translation
http://www.biology.arizona.edu/molecular_bio/problem_sets/nucleic_acids/10t.html
The Genetic Code
http://www.biology.arizona.edu/molecular_bio/problem_sets/nucleic_acids/12t.html
http://www.biology.arizona.edu/molecular_bio/problem_sets/nucleic_acids/13t.html
DNA transcription
http://www.biology.arizona.edu/molecular_bio/problem_sets/nucleic_acids/15t.html
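Two of the activities above, base pairing and transcription, reduce to simple string operations. A minimal sketch, assuming uppercase DNA written 5' to 3'; `reverse_complement` and `transcribe` are hypothetical helper names. Transcription is modeled on the coding strand, so the mRNA is simply that sequence with U in place of T.

```python
# Watson-Crick base pairing: A pairs with T, G pairs with C.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(dna: str) -> str:
    """Return the partner strand, also written 5' to 3' (hence the reversal)."""
    return dna.upper().translate(COMPLEMENT)[::-1]

def transcribe(coding_strand: str) -> str:
    """mRNA carries the coding-strand sequence with uracil (U) in place of thymine (T)."""
    return coding_strand.upper().replace("T", "U")

print(reverse_complement("ATGCATTTCGGT"))  # prints ACCGAAATGCAT
print(transcribe("ATGCATTTCGGT"))          # prints AUGCAUUUCGGU
```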
Bioinformatics – a Definition
bio – informatics: bioinformatics is conceptualizing biology in terms of molecules and applying “informatics techniques” to understand and organise the information associated with these molecules, on a large scale.
In short, bioinformatics is a management information system for molecular biology and has many practical applications.
As submitted to the Oxford English Dictionary.
What is Bioinformatics? N. M. Luscombe, et al. Yale University
Methods Inf Med 4/2001
Bioinformatics – a Definition
The field of science in which biology, computer science, and information technology merge into a single discipline. NCBI, Aug 2001
[Diagram: Biology, Computer Science, and Information Technology overlapping, with Bioinformatics at the intersection]
What’s in a name?
• Multiple Sequence Alignment
• Database Homology Searching
• Sequence Analysis
• Genome Mapping
• Protein Analysis
• Proteomics
• Life Science Informatics
• Sample Registration & Tracking
• 3D Modeling
• Homology Modeling
• Docking
• Intellectual Property Auditing
• Integrated Data Repositories
• Common Visual Interfaces
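Sequence alignment, the first item in the list, scores how well two sequences line up. Below is a minimal sketch of global alignment scoring by dynamic programming (the Needleman-Wunsch approach), assuming a simple hypothetical scoring scheme of +1 per match, -1 per mismatch, and -1 per gap; real tools use substitution matrices such as BLOSUM62 and more elaborate gap penalties.

```python
def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -1) -> int:
    """Needleman-Wunsch global alignment score with a linear gap penalty.

    score[i][j] holds the best score for aligning a[:i] with b[:j].
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap  # a[:i] aligned against gaps only
    for j in range(1, cols):
        score[0][j] = j * gap  # b[:j] aligned against gaps only
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Best of: pair the two bases, or put a gap in either sequence.
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    return score[-1][-1]

print(global_alignment_score("GATTACA", "GCATGCU"))  # prints 0
```

Filling the table once costs time proportional to the product of the two lengths, which is why database searches rely on faster heuristics for homology searching.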
Bioinformatics Needs
Multidisciplinary teams
biologists, mathematicians, computer scientists,
laboratory technicians
Users and Developers to use / create
scalable database infrastructure
standards to control vocabulary and annotation
new ways of visualizing, analyzing and searching data
new ways of delivering information, tools and results
Faster and larger computer systems
Demo Bioinformatics Company
Onconomics Corporation
http://www.bscs.org/onco/default.htm
From the nonprofit BSCS (Biological Sciences Curriculum Study)
Growth of Bioinformatics
Biological research | Computing
50 yrs ago: DNA & protein structure | computer programming
20 yrs ago: PCR | personal computers / Internet
Last 10 yrs: Human Genome Project | World Wide Web
Now: all fields use computers (art, law, communication)
Bioinformatics Computer Skills
www.oreilly.com
Why informatics?
Large size of data sets
Allow students to ask questions of data
Integrate current research into classroom
http://www.ncbi.nlm.nih.gov/Genbank/genbankstats.html
>100,000 species are represented in GenBank:
all species: 128,941
viruses: 6,137
bacteria: 31,262
archaea: 2,100
eukaryota: 87,147
The most sequenced organisms in GenBank (GenBank release 142.0, updated 8-12-04), in bases:
Homo sapiens: 10.7 billion
Mus musculus: 6.5 billion
Rattus norvegicus: 5.6 billion
Danio rerio: 1.7 billion
Zea mays: 1.4 billion
Oryza sativa: 0.8 billion
Drosophila melanogaster: 0.7 billion
Gallus gallus: 0.5 billion
Arabidopsis thaliana: 0.5 billion
Table 2-2, page 18
Online datasets for all the Life Sciences
Environment and Ecology
Population
http://www.prb.org
Water
http://www.waterontheweb.org/
http://www.neptune.washington.edu/
Geography
http://nhd.usgs.gov/
http://data.geocomm.com/
Chemistry
Physics
Biology
Anatomy & Physiology
Earth
http://www.dlese.org/educators/usingdata.html
Agriculture
Nutrition
Plant
http://allometra.com/ath_fasta_mpss.shtml
Why use Bioinformatics?
Data mining: generate a testable hypothesis about the function or structure of a gene or protein by identifying similar sequences in better-characterized organisms.
To help uncover phylogenetic relationships and evolutionary patterns.
www.tigr.org
What is Bioinformatics? N. M. Luscombe, et al. Yale University. Methods Inf Med 4/2001
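Finding similar sequences and uncovering phylogenetic relationships both start from a measure of how different two sequences are. A minimal sketch, assuming the two sequences are already aligned to the same length: the p-distance is the fraction of differing sites, and the Jukes-Cantor formula d = -3/4 ln(1 - 4p/3) corrects it for multiple substitutions at the same site. These are standard textbook measures chosen here for illustration, not methods taken from the cited source, and the sequences in the usage line are made up.

```python
import math

def p_distance(seq1: str, seq2: str) -> float:
    """Proportion of aligned sites that differ; sequences must already be aligned."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    diffs = sum(1 for x, y in zip(seq1, seq2) if x != y)
    return diffs / len(seq1)

def jukes_cantor(p: float) -> float:
    """Jukes-Cantor distance d = -3/4 * ln(1 - 4p/3); defined only for p < 0.75."""
    if p >= 0.75:
        raise ValueError("p must be below 0.75 for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)

p = p_distance("ACGTACGTAC", "ACGTTCGTAA")  # hypothetical aligned sequences
print(p, round(jukes_cantor(p), 4))  # prints 0.2 0.2326
```

Computing such distances for every pair of sequences gives the matrix that distance-based tree-building methods start from.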
Biotechnology
Did You or Will You Ever?
Ride in a car? Genetically engineered micro-organisms will someday be used to extract oil
from rocks. Micro-organisms that break down oil spills are already in use.
Drink tap water? Genetically engineered micro-organisms will someday be used to attract
and filter out harmful substances from drinking water.
Have a dog or cat? Vaccines for a number of pet diseases such as rabies will be improved by
genetic engineering.
Wear brightly colored clothes? Many clothing dyes can be made less expensively with
biotechnology, and will last longer.
Take vitamins? Vitamins can be made more potent, and more cheaply, with biotechnology.
Go to the bathroom? Micro-organisms are already an important part of sewage treatment;
genetic engineering will produce bacteria that are more efficient at breaking down wastes.
What Good is Recombinant DNA?
People with diabetes need to take a drug called insulin. In the past, this drug was extracted and purified from ground-up animal glands. It takes several pounds of cow or pig glands to produce a fraction of an ounce of insulin.
Today, the DNA with the instructions for making insulin can be spliced into a plasmid, and the insulin can be produced by bacteria. It’s faster, easier, and cheaper this way.
There are still many technical problems to be solved. Not all gene splices work, and some that do may fail over time.
There are also social and environmental concerns about biotechnology. Some people fear we will upset the balance of nature if “genetically engineered” organisms escape. Others fear that recombinant DNA will be used to influence human size, race, or intelligence.
The best way for people to enjoy the benefits and avoid the problems is to stay informed and up to date about what’s happening in biotechnology.
http://www.chourave.ch/init/kid/cartoon-00.html
How Do You Make Recombinant DNA?
First, you need to isolate a specific bit of DNA
with the instructions you want. To do this, you
use restriction enzymes that break up DNA
strands in specific places.
After you have DNA fragments, you sort them by
size, using a gel. DNA is loaded onto the top of the
gel, and then electricity is passed through it. This
causes the DNA pieces to migrate down, and the
small pieces travel farther than the large pieces.
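Cutting DNA at specific sites and then sorting the pieces by size can be simulated on a single strand of text. A minimal sketch, assuming the EcoRI recognition site GAATTC, which EcoRI cuts between the G and the first A; the function names and the example sequence are hypothetical, and the model ignores the second strand and its staggered ends.

```python
def digest(dna: str, site: str = "GAATTC", cut_offset: int = 1) -> list[str]:
    """Cut dna wherever `site` occurs (one strand only).

    cut_offset is the cut position inside the site; EcoRI cuts G^AATTC,
    that is, after the first base.
    """
    dna = dna.upper()
    cuts = []
    start = dna.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = dna.find(site, start + 1)
    fragments, prev = [], 0
    for cut in cuts:
        fragments.append(dna[prev:cut])
        prev = cut
    fragments.append(dna[prev:])  # piece after the last cut (or the whole sequence)
    return fragments

def gel_order(fragments: list[str]) -> list[str]:
    """Order fragments as on a gel, top to bottom: large pieces stay near the top,
    small pieces travel farthest."""
    return sorted(fragments, key=len, reverse=True)

pieces = digest("AAGAATTCTTTTTTGAATTCCC")
print([len(f) for f in gel_order(pieces)])  # prints [12, 7, 3]
```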
Next, you need to add the DNA fragment into a carrier, or vector. In most research, the vector is a plasmid, a ring of DNA found in some bacteria.
The plasmid DNA has to be cut with restriction enzymes to make “sticky ends” that will attach to the fragment. After you mix the new and plasmid DNA fragments, you need to add an enzyme (DNA ligase) that will glue them together.
If you used a plasmid as a vector, you need to put it back into a bacterium. When the bacterium replicates itself, it will copy the new DNA too. A small population of “gene-spliced” bacteria can develop into a large population in just a few days.
http://www.gene.com/gene/research/biotechnology
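How a few bacteria become a large population so quickly follows from doubling: a culture that doubles every g minutes holds N = N0 × 2^(t/g) cells after t minutes. A minimal sketch, assuming unlimited nutrients; the 20-minute doubling time in the example is an assumption (roughly what E. coli manages in rich lab media), and real cultures level off long before this arithmetic says they would.

```python
def cells_after(start_cells: int, minutes: float, doubling_minutes: float) -> float:
    """Exponential growth: each doubling period multiplies the population by 2."""
    doublings = minutes / doubling_minutes
    return start_cells * 2 ** doublings

# One cell, an assumed 20-minute doubling time, 10 hours = 30 doublings.
print(f"{cells_after(1, 10 * 60, 20):,.0f}")  # prints 1,073,741,824
```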
What is an Enzyme?
Enzymes are molecules that speed up biological reactions.
For example, the enzyme carbonic anhydrase
enables red blood cells to pick up and dump
carbon dioxide 1 million times faster than
they could without it.
Some characteristics of enzymes:
Enzymes increase the rate
of a chemical reaction.
Enzymes are not used up by the reaction. They come out of it physically unchanged, so a single enzyme can act thousands of times.
Enzymes are highly specific. Like a wrench
that will only fit a 5/16-inch bolt, each
enzyme generally works with only a
particular kind of molecule.
An enzyme increases the odds that two
molecules will meet, so an enzyme is a
“matchmaker”.
Why try to Design Better Enzymes?
Enzymes are fragile… they lose their shape (denature) if the temperature or acidity changes even a little. They also denature in alcohol or oils.
This is a drag! If you’re adding an enzyme to a
laundry detergent you’d like it to function in hot
water, with bleach!
As we understand more and more about DNA and how it is decoded, we can rewrite the instructions for making some enzymes.
By altering their shapes, we may be able to make
enzymes that are sturdier and able to function under
harsher conditions. We may even be able to invent
some completely new enzymes!
Examples of Enzymes
Subtilisin: This enzyme is added to laundry detergent. It breaks down proteins (like yucky egg yolk stains or gross dried blood) into tiny fragments that can be rinsed away from the fibers of the cloth.
Papain: This enzyme breaks up proteins, and is extracted from the papaya fruit. It’s now added to contact lens cleaner solution to help dissolve away gross crusty things from soft contact lenses.
Ceredase: Several thousand people in the United States have Gaucher disease (low levels of a crucial enzyme that dissolves fatty deposits in the liver, spleen and bone marrow). They suffer from bone pain, fractures, swelling and bleeding. Ceredase is a variation of the enzyme, produced in the laboratory, which can be used to treat the disease.
Vianain: Originally derived from pineapples, this enzyme offers hope to burn victims. It helps prepare burned areas for skin grafts by safely dissolving damaged skin layers that would otherwise have to be removed surgically.
Journals & Books
Public Library of Science - Open Access Journals
http://www.plosbiology.org
International Society for Computational Biology – Book Reviews
http://www.iscb.org/bioinformaticsBooks.shtml
Free Journals: Biotechniques http://www.BioTechniques.com
Genomeweb http://www.genomeweb.com
Books:
The Cartoon Guide to Genetics, Larry Gonick & Mark Wheelis
ISBN 0062730991 Harper 1983
Introduction to Bioinformatics, Arthur Lesk http://www.oup.com/uk/lesk/bioinf
ISBN 0199251967 Oxford 2002
Fundamental Concepts of Bioinformatics, Dan Krane & Michael Raymer
ISBN 0805346333 Benjamin Cummings 2003
Discovering Genomics, Proteomics, & Bioinformatics, A. Campbell & L. Heyer
ISBN 0805347224 Benjamin Cummings 2002
Understanding Biotechnology, George Acquaah
ISBN 0130945005 Pearson Prentice Hall 2004
Understanding Biotechnology, A. Borem, F. Santos, D. Bowen
ISBN 0131010115 Pearson Prentice Hall 2003
Human Genome Project
http://www.ornl.gov/sci/techresources/Human_Genome/publicat/primer2001/index.shtml
Genomics and Its Impact on Science and Society:
The Human Genome Project and Beyond
U.S. Department of Energy Genome Programs
http://doegenomes.org
National Center for Biotechnology Information, www.ncbi.nlm.nih.gov
A user’s guide to the human genome
Nature Genetics www.nature.com/ng/
vol 32, pg 1-79, 01 Sep 2002
• Introduction: putting it together
• Question 8: How can one find all the members of a human gene family?
• Question 12: How does a user find characterized mouse mutants corresponding to human genes?
• Web resources: Internet resources featured in this guide
Get Schooled for Bioinformatics
• Biology
– Know basics & Have sense of
biological experimentation
• Computer Science
– Programming
C, C++, Perl, Java, SAS, CGI
– Database construction, UNIX, Linux
– Algorithm design
• Math/Statistics
– Probability, Experimental design
• Ethics
• “Core Bioinformatics”
– LIMS (laboratory information management systems)
– EST (expressed sequence tag) clustering
– Sequence analysis & annotation
Fundamental Dogma
Although a few databases already exist to distribute molecular information, the post-genomic era will need many more to collect, manage, and publish the coming flood of new findings.
Biological research: to enable the discovery of new biological insights as well as to create a global perspective from which unifying principles in biology can be discerned. NCBI, Aug 2001
[Diagram: layers of biological information (Map, DNA, RNA, Proteins, Circuits, Phenotypes, Populations) with existing databases (GenBank, EMBL, DDBJ for DNA; PDB, SwissPROT, PIR for proteins) and areas still needing databases: Gene Expression?, Development?, Regulatory Pathways?, Metabolism?, Clinical Data?, Neuroanatomy?, Biodiversity?, Molecular Epidemiology?, Comparative Genomics?]
Ultra-conserved element
– Only 6 SNPs
– mouse, rat, human
TGATCCCGGACTCTATGAATTATTGATGAGATATGAGCGTTGATTTCCCCTTTCAG
GATGCAAACTCCATTATATTGTTAAAATGGCGATTTAATCGTTGAGAATAGCTTTG
GTGTGGGTTTTTTCCCCCAACTCATTTGCGCCTCCTTCCTTTTCATTTAACTCTCT
TAATTAAATCCTTTAACAGATTTTAATCACTTTTTGGAG