Document

advertisement
An Introduction to Bioinformatics Algorithms
www.bioalgorithms.info
Molecular Biology Primer
Angela Brooks, Raymond Brown, Calvin Chen, Mike Daly,
Hoa Dinh, Erinn Hama, Robert Hinman, Julio Ng, Michael
Sneddon, Hoa Troung, Jerry Wang, Che Fung Yung
An Introduction to Bioinformatics Algorithms
www.bioalgorithms.info
All Life depends on 3 critical molecules
• DNAs
• Hold information on how cell works
• RNAs
• Act to transfer short pieces of information to different parts
of cell
• Provide templates to synthesize into protein
• Proteins
• Form enzymes that send signals to other cells and regulate
gene activity
• Form body’s major components (e.g. hair, skin, etc.)
An Introduction to Bioinformatics Algorithms
DNA: The Basis of Life
• Humans have about 3 billion base
pairs.
• How do you package it into a cell?
• How does the cell know where in
the highly packed DNA where to
start transcription?
• Special regulatory sequences
• DNA size does not mean more
complex
• Complexity of DNA
• Eukaryotic genomes consist of
variable amounts of DNA
• Single Copy or Unique DNA
• Highly Repetitive DNA
www.bioalgorithms.info
An Introduction to Bioinformatics Algorithms
www.bioalgorithms.info
Discovery of DNA
•
•
DNA Sequences
• Chargaff and Vischer, 1949
• DNA consisting of A, T, G, C
• Adenine, Guanine, Cytosine, Thymine
• Chargaff Rule
• Noticing #A#T and #G#C
• A “strange but possibly meaningless”
phenomenon.
Wow!! A Double Helix
• Watson and Crick, Nature, April 25, 1953
1 Biologist
•
1 Physics Ph.D. Student
900 words
Nobel Prize
•
Rich, 1973
• Structural biologist at MIT.
• DNA’s structure in atomic resolution.
Crick
Watson
An Introduction to Bioinformatics Algorithms
www.bioalgorithms.info
DNA
• Stores all information of life
• 4 “letters” base pairs. AGTC (adenine, guanine, thymine,
cytosine ) which pair A-T and C-G on complimentary strands.
• DNA is a polymer
• Sugar-Phosphate-Base
• Bases held together by H bonding to the opposite strand
http://www.lbl.gov/Education/HGP-images/dna-medium.gif
An Introduction to Bioinformatics Algorithms
The Purines
www.bioalgorithms.info
The Pyrimidines
An Introduction to Bioinformatics Algorithms
www.bioalgorithms.info
DNA - Deoxyribonucleic acid
AMP
O
HO
P
O
N
N
HO
O
P
O
O
N
NH2
O 5´ N
1´
3´
2´
NH
N
HO
OH
NH2
Base
CH3
OH
4´
HO
Phosphate Sugar
P
O
NH2
N
O
OH
HO
GMP
N
NH2
O
O
NH
N
4´
HO
O
P
O
HO
OH
TMP
O
N
O 5´ N
1´
3´
2´
HO
CMP
O
An Introduction to Bioinformatics Algorithms
www.bioalgorithms.info
DNA, continued
Sugar
Phosphate
Base (A,T, C or G)
http://www.bio.miami.edu/dana/104/DNA2.jpg
An Introduction to Bioinformatics Algorithms
www.bioalgorithms.info
DNA, continued
• DNA has a double helix structure. However,
it is not symmetric. It has a “forward” and
“backward” direction. The ends are labeled
5’ and 3’ after the Carbon atoms in the sugar
component.
5’ AATCGCAAT 3’
3’ TTAGCGTTA 5’
DNA always reads 5’ to 3’ for transcription
replication
An Introduction to Bioinformatics Algorithms
www.bioalgorithms.info
Basic Structure Implications
• DNA is (-) charged due to phosphate:
gel electrophoresis, DNA sequencing (Sanger method)
• H-bonds form between specific bases:
hybridization – replication, transcription, translation
DNA microarrays, hybridization blots, PCR
C-G bound tighter than A-T due to triple H-bond
• DNA-protein interactions (via major & minor grooves):
transcriptional regulation
• DNA polymerization:
5’ to 3’ – phosphodiester bond formed between 5’ phosphate
and 3’ OH
An Introduction to Bioinformatics Algorithms
www.bioalgorithms.info
Double helix of DNA
• The double helix of DNA has these features:
• Concentration of adenine (A) is equal to thymine (T)
• Concentration of cytidine (C) is equal to guanine (G).
• Watson-Crick base-pairing A will only base-pair with T, and C with G
• base-pairs of G and C contain three H-bonds,
• Base-pairs of A and T contain two H-bonds.
• G-C base-pairs are more stable than A-T base-pairs
• Two polynucleotide strands wound around each other.
• The backbone of each consists of alternating deoxyribose and
phosphate groups
An Introduction to Bioinformatics Algorithms
www.bioalgorithms.info
DNA - replication
• DNA can replicate by
splitting, and rebuilding
each strand.
• Note that the rebuilding
of each strand uses
slightly different
mechanisms due to the
5’ 3’ asymmetry, but
each daughter strand is
an exact replica of the
original strand.
http://users.rcn.com/jkimball.ma.ultranet/BiologyPages/D/DNAReplication.html
An Introduction to Bioinformatics Algorithms
Superstructure
Lodish et al. Molecular Biology of the Cell (5th ed.). W.H. Freeman & Co., 2003.
www.bioalgorithms.info
An Introduction to Bioinformatics Algorithms
www.bioalgorithms.info
Superstructure Implications
• DNA in a living cell is in a highly compacted and
structured state
• Transcription factors and RNA polymerase need
ACCESS to do their work
• Transcription is dependent on the structural
state – SEQUENCE alone does not tell the
whole story
An Introduction to Bioinformatics Algorithms
www.bioalgorithms.info
RNA
• RNA is similar to DNA chemically. It is usually only
a single strand. T(hyamine) is replaced by U(racil)
• Some forms of RNA can form secondary structures
by “pairing up” with itself. This can have change its
properties
dramatically.
DNA and RNA
can pair with
each other.
tRNA linear and 3D view:
http://www.cgl.ucsf.edu/home/glasfeld/tutorial/trna/trna.gif
An Introduction to Bioinformatics Algorithms
www.bioalgorithms.info
An Introduction to Bioinformatics Algorithms
www.bioalgorithms.info
RNA, continued
• Several types exist, classified by function
• mRNA – this is what is usually being referred
to when a Bioinformatician says “RNA”. This
is used to carry a gene’s message out of the
nucleus.
• tRNA – transfers genetic information from
mRNA to an amino acid sequence
• rRNA – ribosomal RNA. Part of the ribosome
which is involved in translation.
An Introduction to Bioinformatics Algorithms
www.bioalgorithms.info
Proteins….
If there is a job to be
done in the molecular
world of our cells,
usually that job is
done by a protein.
CATALASE
A protein hormone which helps
to regulate your blood sugar
levels
Source:
An enzyme which removes
Hydrogen peroxide from your
body so it does not become toxic
http://courses.washington.edu/conj/protein/insulin2.gif
http://www.biochem.ucl.ac.uk/bsm/pdbsum/1gwf/main.html
18
An Introduction to Bioinformatics Algorithms
www.bioalgorithms.info
Proteins can be
fibrous or globular
Let’s explore the diversity of protein structure and function by
investigating some examples
19
An Introduction to Bioinformatics Algorithms
www.bioalgorithms.info
Fibrous proteins have a structural
role
•Collagen
is the most abundant protein in
vertebrates. Collagen fibers are a major
portion of tendons, bone and skin. Alpha
helices of collagen make up a triple helix
structure giving it tough and flexible
properties.
•Fibroin
fibers make the silk spun by
spiders and silk worms stronger weight
for weight than steel! The soft and
flexible properties come from the beta
structure.
•Keratin
is a tough insoluble protein that
makes up the quills of echidna, your hair
and nails and the rattle of a rattle snake.
The structure comes from alpha helices
that are cross-linked by disulfide bonds.
Source:http://www.prideofindia.net/images/nails.jpg
http://opbs.okstate.edu/~petracek/2002%20protein%20structure%20function/CH06/Fig%2006-12.GIF
http://my.webmd.com/hw/health_guide_atoz/zm2662.asp?printing=true
20
An Introduction to Bioinformatics Algorithms
www.bioalgorithms.info
The globular proteins
The globular proteins have a number of biologically important roles. They
include:
Cell motility – proteins link together to form filaments which make movement
possible.
Organic catalysts in biochemical reactions – enzymes
Regulatory proteins – hormones, transcription factors
Membrane proteins – MHC markers, protein channels, gap junctions
Defense against pathogens – poisons/toxins, antibodies, complement
Transport and storage – hemoglobin and myosin
21
An Introduction to Bioinformatics Algorithms
www.bioalgorithms.info
Proteins for cell motility
Above: Myosin (red) and actin filaments
(green) in coordinated muscle
contraction.
Right: Actin bound to the mysoin
binding site (groove in red part of
myosin protein).
Add energy (ATP) and myosin moves,
moving actin with it.
Source: http://www.ebsa.org/npbsn41/maf_home.html
http://sun0.mpimf-
22
An Introduction to Bioinformatics Algorithms
www.bioalgorithms.info
Proteins in the Cell Cytoskeleton
Eukaryote cells have a cytoskeleton made up of straight
hollow cylinders called microtubules (bottom left).
Tubulin
forms
helical
filaments
They help cells maintain their shape, they act like conveyer
belts moving organelles around in the cytoplasm, and they
participate in forming spindle fibres in cell division.
Microtubules are composed of filaments of the protein,
tubulin (top left) . These filaments are compressed like
springs allowing microtubules to ‘stretch and contract’.
13 of these filaments attach side to side, a little like the slats in
a barrel, to form a microtubule. This barrel shaped structure
gives strength to the microtubule.
Source:
heidelberg.mpg.de/shared/docs/staff/user/0001/24.php3?department=01&LA
NG=en
http://www.fz-juelich.de/ibi/ibi-1/Cellular_signaling/
23
An Introduction to Bioinformatics Algorithms
www.bioalgorithms.info
Proteins speed up reactions Enzymes
2
2
Catalase speeds up the
breakdown of
hydrogen peroxide,
(H2O2) a toxic by
product of metabolic
reactions, to the
harmless substances,
water and oxygen.
The reaction is extremely
rapid as the enzyme
lowers the energy
needed to kick-start
the reaction (activation
energy)
+
No catalyst =
Input of 71kJ energy required
Energy
Activation
Energy
With catalase
= Input of 8 kJ energy required
Substrate
Product
Progress of reaction
24
An Introduction to Bioinformatics Algorithms
www.bioalgorithms.info
Proteins can regulate metabolism –
hormones
When your body detects an increase in the sugar
content of blood after a meal, the hormone
insulin is released from cells in the pancreas.
Insulin binds to cell membranes and this triggers the
cells to absorb glucose for use or for storage as
glycogen in the liver.
Proteins span membranes –protein
channels
The CFTR membrane protein is an ion channel that
regulates the flow of chloride ions.
Not enough of this protein gets inserted into the
membranes of people suffering Cystic fibrosis. This
causes secretions to become thick as they are not
hydrated. The lungs and secretory ducts become
blocked as a consequence.
Source: http://www.biology.arizona.edu/biochemistry/tutorials/chemistry/page2.html
http://www.cbp.pitt.edu/bradbury/projects.htm
25
An Introduction to Bioinformatics Algorithms
www.bioalgorithms.info
Proteins Defend us against pathogens –
antibodies
Left: Antibodies like IgG found in
humans, recognise and bind to
groups of molecules or epitopes
found on foreign invaders.
Right: The binding site of an
antigen protein (left) interacting
with the epitope of a foreign
antigen (green)
Source: http://www.biology.arizona.edu/immunology/tutorials/antibody/FR.html
http://tutor.lscf.ucsb.edu/instdev/sears/immunology/info/sears-ab.htm
http://www.spilya.com/research/
http://www.umass.edu/microbio/chime/
26
An Introduction to Bioinformatics Algorithms
www.bioalgorithms.info
Protein structure and human disease
In some cases, a single amino acid substitution
can induce a dramatic change in protein structure.
For example, the DF508 mutation of CFTR alters
the a helical content of the protein, and disrupts
intracellular trafficking.
Other changes are subtle. The E6V mutation in the
gene encoding hemoglobin beta causes sicklecell anemia. The substitution introduces a
hydrophobic patch on the protein surface,
leading to clumping of hemoglobin molecules.
Page 27311
An Introduction to Bioinformatics Algorithms
www.bioalgorithms.info
Protein structure and human disease
Disease
Cystic fibrosis
Sickle-cell anemia
“mad cow” disease
Alzheimer disease
Protein
CFTR
hemoglobin beta
prion protein
amyloid precursor protein
Table 9.5
Page 28312
An Introduction to Bioinformatics Algorithms
www.bioalgorithms.info
Proteins
• Complex organic molecules made up of amino acid
subunits
• there are 20 different types of amino acids.
• different sequences of amino acids fold into different 3-D shapes.
• Proteins can range from fewer than 20 to more than 5000 amino
acids in length.
• Each protein that an organism can produce is encoded in a piece of
the DNA called a “gene”.
• the single-celled bacterium E.coli has about 4300 different genes.
• Humans are believed to have about 30,000 different genes (the exact
number as yet unresolved)
An Introduction to Bioinformatics Algorithms
www.bioalgorithms.info
Amino acids
20 possible groups
R
|
H2N--C--COOH
|
H
An Introduction to Bioinformatics Algorithms
Dipeptide
R
|
H2N--C--COOH
|
H
R
|
H2N--C--COOH
|
H
www.bioalgorithms.info
An Introduction to Bioinformatics Algorithms
www.bioalgorithms.info
Dipeptide
This is a peptide bond
R O
R
| II
|
H2N--C--C--NH--C--COOH
|
|
H
H
An Introduction to Bioinformatics Algorithms
www.bioalgorithms.info
Proteins – Amino Acids
Name
1-letter code
Triplet
Glycine
G
GGT,GGC,GGA,GGG
Alanine
A
GCT,GCC,GCA,GCG
Valine
V
GTT,GTC,GTA,GTG
Leucine
L
TTG,TTA,CTT,CTC,CTA,CTG
Isoleucine
I
ATT,ATC,ATA
Histidine
H
CAT,CAC
Serine
S
TCT,TCC,TCA,TCG,AGT,AGC
Threonine
T
ACT,ACC,ACA,ACG
Cysteine
C
TGT,TGC
Methionine
M
ATG
Glutamic Acid
E
GAA,GAG
Aspartic Acid
D
GAT,GAC,AAT,AAC
Lysine
K
AAA,AAG
Arginine
R
CGT,CGC,CGA,CGG,AGA,AGG
Asparagine
N
AAT,AAC
Glutamine
Q
CAA,CAG
Phenylalanine
F
TTT,TTC
Tyrosine
Y
TAT,TAC
Tryptophan
W
TGG
Proline
P
CCT,CCC,CCA,CCG
Terminator (Stop)
*
TAA,TAG,TGA
Protein-Sequence
(Alphaet:
ACDEFGHIKLMNPQRSTV
WY):
• a typical human cell
contains about 100 million
proteins of about 10,000
types
An Introduction to Bioinformatics Algorithms
3/12/2016
www.bioalgorithms.info
34
An Introduction to Bioinformatics Algorithms
www.bioalgorithms.info
Proteins – Amino Acids
Properties of amino acids:
• play a role in the construction of 3-D stuctures in proteins
An Introduction to Bioinformatics Algorithms
www.bioalgorithms.info
Proteins - Summary
• DNA sequence determines protein sequence
• Protein sequence determines protein
structure
• Protein structure determines protein folding
and function
An Introduction to Bioinformatics Algorithms
www.bioalgorithms.info
Polypeptide v. Protein
• A protein is a polypeptide, however to
understand the function of a protein given
only the polypeptide sequence is a very
difficult problem.
• Protein folding an open problem. The 3D
structure depends on many variables.
• Current approaches often work by looking at
the structure of homologous (similar)
proteins.
• Improper folding of a protein is believed to be
the cause of mad cow disease.
http://www.sanger.ac.uk/Users/sgj/thesis/node2.html for more information on folding
An Introduction to Bioinformatics Algorithms
Protein Folding
• Proteins tend to fold into the lowest
free energy conformation.
• Proteins begin to fold while the
peptide is still being translated.
• Proteins bury most of its hydrophobic
residues in an interior core to form an
α helix.
• Most proteins take the form of
secondary structures α helices and β
sheets.
• Molecular chaperones, hsp60 and hsp
70, work with other proteins to help
fold newly synthesized proteins.
• Much of the protein modifications and
folding occurs in the endoplasmic
reticulum and mitochondria.
www.bioalgorithms.info
An Introduction to Bioinformatics Algorithms
www.bioalgorithms.info
Protein Folding
• Proteins are not linear structures, though they are
built that way
• The amino acids have very different chemical
properties; they interact with each other after the
protein is built
• This causes the protein to start fold and adopting it’s
functional structure
• Proteins may fold in reaction to some ions, and several
separate chains of peptides may join together through their
hydrophobic and hydrophilic amino acids to form a polymer
An Introduction to Bioinformatics Algorithms
Protein Folding (cont’d)
• The structure that a
protein adopts is vital to
it’s chemistry
• Its structure determines
which of its amino acids
are exposed carry out
the protein’s function
• Its structure also
determines what
substrates it can react
with
www.bioalgorithms.info
Download