Protein Folding


Proteins

What is a protein?

• A protein is a molecule consisting of amino acids linked in a linear chain through peptide bonds.

Protein primary structure

Peptide formation

There are many kinds of proteins.

• Structural--determine shape and function of cells

• Enzymes--speed up chemical reactions

• Ligand-binding--bind small molecules and transport them to other locations

[Figure: cells, e.g. muscle and nerve cells]

Structural proteins

• collagen -- in connective tissue such as cartilage

• elastin -- in connective tissue such as cartilage

• keratin--in hair and nails

• actin -- in muscle

• myosin -- in muscle to generate mechanical forces

Enzymes

• glucose isomerase--convert glucose into fructose

• rennin--used to make cheese

• cellulase--break down cellulose into sugars to make ethanol

• amylase--ingredient of detergents for machine dishwashing

Ligand-binding proteins

• hemoglobin--transport oxygen from the lungs to the rest of the body

• antibodies--bind foreign substances for destruction

The string of amino acids tends to “fold” into a shape.

Hemoglobin structure

Heart of Steel (Hemoglobin) by Julian Voss-Andreae

Protein views (Triose phosphate isomerase)

Visualizing proteins

Amino acids

• There are 20 different standard amino acids

• The different amino acids differ in chemical properties.

Amino acid      3-letter  1-letter  Polarity  Acidity           Hydrophobicity index (Kyte-Doolittle)
Alanine         Ala       A         nonpolar  neutral            1.8
Arginine        Arg       R         polar     basic (strongly)  -4.5
Asparagine      Asn       N         polar     neutral           -3.5
Aspartic acid   Asp       D         polar     acidic            -3.5
Cysteine        Cys       C         nonpolar  neutral            2.5
Glutamic acid   Glu       E         polar     acidic            -3.5
Glutamine       Gln       Q         polar     neutral           -3.5
Glycine         Gly       G         nonpolar  neutral           -0.4
Histidine       His       H         polar     basic (weakly)    -3.2
Isoleucine      Ile       I         nonpolar  neutral            4.5
Leucine         Leu       L         nonpolar  neutral            3.8
Lysine          Lys       K         polar     basic             -3.9
Methionine      Met       M         nonpolar  neutral            1.9
Phenylalanine   Phe       F         nonpolar  neutral            2.8
Proline         Pro       P         nonpolar  neutral           -1.6
Serine          Ser       S         polar     neutral           -0.8
Threonine       Thr       T         polar     neutral           -0.7
Tryptophan      Trp       W         nonpolar  neutral           -0.9
Tyrosine        Tyr       Y         polar     neutral           -1.3
Valine          Val       V         nonpolar  neutral            4.2

Hydrophobicity index.

• The larger the index, the stronger the tendency of the amino acid to be buried inside the protein; the lower the index, the stronger the tendency to appear near the protein surface.

• Amino acids with a high index are called hydrophobic; those with a low index are called hydrophilic.
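As an illustration of how the index can be used, the sketch below looks up the values from the amino-acid table and either averages them over a sequence or labels each amino acid H or P. The dictionary, the function names, and the cutoff of 0 between H and P are our own assumptions, not part of the slides; it is one crude way to produce H/P strings like those in the simple model later on.

```python
# Hydrophobicity index for each amino acid (one-letter code), copied from the
# amino-acid table; these are the Kyte-Doolittle values.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "E": -3.5, "Q": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydrophobicity(protein: str) -> float:
    """Average hydrophobicity index over a one-letter protein sequence."""
    return sum(HYDROPATHY[aa] for aa in protein) / len(protein)


def hp_pattern(protein: str, threshold: float = 0.0) -> str:
    """Label each amino acid H if its index is above the threshold, else P.

    The default threshold of 0.0 is an illustrative assumption.
    """
    return "".join("H" if HYDROPATHY[aa] > threshold else "P" for aa in protein)


if __name__ == "__main__":
    start = "MVHLTPEEK"  # first nine amino acids of beta hemoglobin
    print(hp_pattern(start))                     # HHPHPPPPP
    print(round(mean_hydrophobicity(start), 2))  # -0.72
```

Moving the threshold changes which amino acids count as H; tryptophan, for example, has a slightly negative index on this scale and so is labeled P here.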

What is the shape of the protein?

• This is the “protein folding problem.”

• The geometry and chemistry of the parts of the protein determine how it behaves in the cell.

DNA

• DNA is deoxyribonucleic acid.

• It occurs as long molecules in a double helix.

DNA is a long molecule in a double helix

What makes DNA?

• DNA consists of sequences of nucleotides.

• There are 4 kinds of nucleotide:

• Adenine (A), Cytosine (C), Guanine (G), and Thymine (T)

Matching

• Each A has weak (“hydrogen”) bonds with T on the other chain.

• Each C has weak (“hydrogen”) bonds with G on the other chain.

A single chain carries the information

• For example, the two strings might be

ACGGTCAG

TGCCAGTC

• Hence all the information is in the order of A, C, G, T in one of the chains.

• We write DNA as a (long) string of A, C, G, T, for example AGGCTACATAG…
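The pairing rule can be written as a small lookup. This is a minimal sketch with hypothetical function names: `partner_strand` pairs the bases position by position, exactly as in the ACGGTCAG / TGCCAGTC example. In real DNA the two strands run in opposite directions, so the partner strand read in its own direction is the reverse of that string; `reverse_complement` shows this.

```python
# Pairing rule from the slides: A pairs with T, C pairs with G.
PAIR = str.maketrans("ACGT", "TGCA")


def partner_strand(strand: str) -> str:
    """Base on the other chain opposite each position, read in the same order."""
    return strand.translate(PAIR)


def reverse_complement(strand: str) -> str:
    """Partner strand read in its own direction (the strands are antiparallel)."""
    return partner_strand(strand)[::-1]


if __name__ == "__main__":
    print(partner_strand("ACGGTCAG"))      # TGCCAGTC
    print(reverse_complement("ACGGTCAG"))  # CTGACCGT
```

Pairing twice gives back the original string, which is why one chain carries all the information.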

Human DNA

• Humans have 46 chromosomes.

• Each chromosome is essentially a double helix of DNA, containing from 50,000,000 to 250,000,000 base pairs.

• Counting one copy of each chromosome, there are about 2,860,000,000 nucleotide pairs in total.

Genes

• A gene is a portion of the DNA that tells how to make a protein.

DNA for beta hemoglobin

• ATGGTGCATCTGACTCCTGAGGAGAAGTCTGCCGTTACTG

CCCTGTGGGGCAAGGTGAACGTGGATGAAGTTGGTGGTGA

GGCCCTGGGCAGGCTGCTGGTGGTCTACCCTTGGACCCA

GAGGTTCTTTGAGTCCTTTGGGGATCTGTCCACTCCTGATG

CTGTTATGGGCAACCCTAAGGTGAAGGCTCATGGCAAGAA

AGTGCTCGGTGCCTTTAGTGATGGCCTGGCTCACCTGGAC

AACCTCAAGGGCACCTTTGCCACACTGAGTGAGCTGCACT

GTGACAAGCTGCACGTGGATCCTGAGAACTTCAGGCTCCT

GGGCAACGTGCTGGTCTGTGTGCTGGCCCATCACTTTGGC

AAAGAATTCACCCCACCAGTGCAGGCTGCCTATCAGAAAG

TGGTGGCTGGTGTGGCTAATGCCCTGGCCCACAAGTATCA

CTAA


DNA determines the order of amino acids

• ATGGTGCATCTGACTCCTGAGGAGAAGTCTGCCGTTACTG

CCCTGTGGGGCAAGGTGAACGTGGATGAAGTTGGTGGTGA

GGCCCTGGGCAGGCTGCTGGTGGTCTACCCTTGGACCCA

GAGGTTCTTTGAGTCCTTTGGGGATCTGTCCACTCCTGATG

CTGTTATGGGCAACCCTAAGGTGAAGGCTCATGGCAAGAA

AGTGCTCGGTGCCTTTAGTGATGGCCTGGCTCACCTGGAC

AACCTCAAGGGCACCTTTGCCACACTGAGTGAGCTGCACT

GTGACAAGCTGCACGTGGATCCTGAGAACTTCAGGCTCCT

GGGCAACGTGCTGGTCTGTGTGCTGGCCCATCACTTTGGC

AAAGAATTCACCCCACCAGTGCAGGCTGCCTATCAGAAAG

TGGTGGCTGGTGTGGCTAATGCCCTGGCCCACAAGTATCA

CTAA

Primary structure for beta hemoglobin--the order

• MVHLTPEEKSAVTALWGKVNVDEVG

GEALGRLLVVYPWTQRFFESFGDLSTP

DAVMGNPKVKAHGKKVLGAFSDGLA

HLDNLKGTFATLSELHCDKLHVDPEN

FRLLGNVLVCVLAHHFGKEFTPPVQA

AYQKVVAGVANALAHKYH

Hemoglobin structure

How does DNA determine the order of amino acids?

• Three successive nucleotides form a “codon.”

• Each codon stands for one amino acid (or for STOP), and most amino acids are encoded by several different codons.

Translating codons

• Ala/A GCT, GCC, GCA, GCG

• Arg/R CGT, CGC, CGA, CGG, AGA, AGG

• Asn/N AAT, AAC

• Asp/D GAT, GAC

• Cys/C TGT, TGC

• Gln/Q CAA, CAG

• Glu/E GAA, GAG

• Gly/G GGT, GGC, GGA, GGG

• His/H CAT, CAC

• Ile/I ATT, ATC, ATA

• START ATG

• Leu/L TTA, TTG, CTT, CTC, CTA, CTG

• Lys/K AAA, AAG

• Met/M ATG

• Phe/F TTT, TTC

• Pro/P CCT, CCC, CCA, CCG

• Ser/S TCT, TCC, TCA, TCG, AGT, AGC

• Thr/T ACT, ACC, ACA, ACG

• Trp/W TGG

• Tyr/Y TAT, TAC

• Val/V GTT, GTC, GTA, GTG

• STOP TAG, TGA, TAA
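Reading a gene with this codon table is a lookup repeated every three letters. The sketch below assumes the sequence starts exactly at the ATG start codon and that reading stops at the first STOP codon; the dictionary layout and the name `translate` are our own. ATG is read as M, which is why the primary structure of beta hemoglobin begins with M: the first four codons ATG GTG CAT CTG give MVHL.

```python
# Codon table from the slide, grouped by one-letter amino-acid code; "*" marks STOP.
CODONS_BY_AMINO_ACID = {
    "A": "GCT GCC GCA GCG", "R": "CGT CGC CGA CGG AGA AGG",
    "N": "AAT AAC", "D": "GAT GAC", "C": "TGT TGC", "Q": "CAA CAG",
    "E": "GAA GAG", "G": "GGT GGC GGA GGG", "H": "CAT CAC",
    "I": "ATT ATC ATA", "L": "TTA TTG CTT CTC CTA CTG", "K": "AAA AAG",
    "M": "ATG", "F": "TTT TTC", "P": "CCT CCC CCA CCG",
    "S": "TCT TCC TCA TCG AGT AGC", "T": "ACT ACC ACA ACG", "W": "TGG",
    "Y": "TAT TAC", "V": "GTT GTC GTA GTG", "*": "TAG TGA TAA",
}

# Invert it into a codon -> amino acid lookup (64 entries).
CODON_TABLE = {
    codon: aa
    for aa, codons in CODONS_BY_AMINO_ACID.items()
    for codon in codons.split()
}


def translate(dna: str) -> str:
    """Translate codon by codon from the first base, stopping at STOP."""
    dna = "".join(dna.split()).upper()  # tolerate line breaks and spaces
    protein = []
    for i in range(0, len(dna) - 2, 3):  # complete codons only
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    print(translate("ATG GTG CAT CTG"))  # MVHL
```

The grouping by amino acid mirrors the slide, so the table can be checked against it line by line before it is inverted.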

DNA for beta hemoglobin

• ATG GTG CAT CTG ACTCCTGAGGAGAAGTCTGCCGTTACTG

CCCTGTGGGGCAAGGTGAACGTGGATGAAGTTGGTGGTGA

GGCCCTGGGCAGGCTGCTGGTGGTCTACCCTTGGACCCA

GAGGTTCTTTGAGTCCTTTGGGGATCTGTCCACTCCTGATG

CTGTTATGGGCAACCCTAAGGTGAAGGCTCATGGCAAGAA

AGTGCTCGGTGCCTTTAGTGATGGCCTGGCTCACCTGGAC

AACCTCAAGGGCACCTTTGCCACACTGAGTGAGCTGCACT

GTGACAAGCTGCACGTGGATCCTGAGAACTTCAGGCTCCT

GGGCAACGTGCTGGTCTGTGTGCTGGCCCATCACTTTGGC

AAAGAATTCACCCCACCAGTGCAGGCTGCCTATCAGAAAG

TGGTGGCTGGTGTGGCTAATGCCCTGGCCCACAAGTATCA

CTAA

Primary structure for beta hemoglobin

• M V H L TPEEKSAVTALWGKVNVDEVG

GEALGRLLVVYPWTQRFFESFGDLSTP

DAVMGNPKVKAHGKKVLGAFSDGLA

HLDNLKGTFATLSELHCDKLHVDPEN

FRLLGNVLVCVLAHHFGKEFTPPVQA

AYQKVVAGVANALAHKYH

Hemoglobin structure

The order of amino acids is important

• Consider what may happen when the “wrong” amino acid is in a certain position.

Primary structure for beta hemoglobin

• MVHLTPEEKSAVTALWGKVNVDEVG

GEALGRLLVVYPWTQRFFESFGDLSTP

DAVMGNPKVKAHGKKVLGAFSDGLA

HLDNLKGTFATLSELHCDKLHVDPEN

FRLLGNVLVCVLAHHFGKEFTPPVQA

AYQKVVAGVANALAHKYH

Sickle cell anemia beta hemoglobin

• MVHLTP V EKSAVTALWGKVNVDEVG

GEALGRLLVVYPWTQRFFESFGDLSTP

DAVMGNPKVKAHGKKVLGAFSDGLA

HLDNLKGTFATLSELHCDKLHVDPEN

FRLLGNVLVCVLAHHFGKEFTPPVQA

AYQKVVAGVANALAHKYH
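Comparing the normal and sickle-cell sequences position by position finds the change mechanically. In this minimal sketch (the function name is ours), run on the first line of each sequence, the only difference is E replaced by V at position 7, counting the initial M as position 1. The mature protein loses that initial methionine, which is why this mutation is usually written Glu6Val; in the DNA it corresponds to the codon GAG becoming GTG. By the amino-acid table, glutamic acid is polar and acidic (index -3.5) while valine is nonpolar (index 4.2), so a hydrophilic amino acid is replaced by a hydrophobic one.

```python
def substitutions(normal: str, variant: str) -> list:
    """(position, normal amino acid, variant amino acid) wherever two
    equal-length sequences differ; positions count from 1."""
    if len(normal) != len(variant):
        raise ValueError("this sketch only compares sequences of equal length")
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip(normal, variant), start=1)
        if a != b
    ]


if __name__ == "__main__":
    normal = "MVHLTPEEKSAVTALWGKVNVDEVG"  # first line of normal beta hemoglobin
    sickle = "MVHLTPVEKSAVTALWGKVNVDEVG"  # first line of sickle cell beta hemoglobin
    print(substitutions(normal, sickle))  # [(7, 'E', 'V')]
```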


Simple model

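• Pretend there are only 2 kinds of amino acid--H and P.

• H stands for “hydrophobic” and P for “polar” (hydrophilic).

• Pretend that the chain must be laid out on a square grid: one amino acid per grid point, consecutive amino acids on neighboring points, and no point used twice.

• Example: HHPPPPPPPHH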

A folding of HHPPPPPPPHH

[Figure: the chain HHPPPPPPPHH laid out on the grid]

Another folding of HHPPPPPPPHH

[Figure: a different layout of HHPPPPPPPHH on the grid]

Energy

• Two H's on neighboring grid points (an HH pair) have energy -1.

• PP, HP, and PH pairs have energy 0.

• The protein folds so as to minimize the total energy.

A folding of HHPPPPPPPHH with energy -2

[Figure: a layout of HHPPPPPPPHH on the grid with energy -2]

A folding of HHPPPPPPPHH with energy -4

[Figure: a layout of HHPPPPPPPHH on the grid with energy -4]
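The -2 and -4 foldings of HHPPPPPPPHH pin down how the energy is counted: with only four H's, -4 is reachable only if the two HH pairs that are already neighbors along the chain also count. The sketch below therefore charges -1 for every pair of H's on neighboring grid points, consecutive or not. (Many versions of this model count only non-consecutive pairs; for a fixed chain the two conventions differ by a constant, so they agree on which foldings are best.) Encoding a folding as a string of moves R, L, U, D, and all function names, are our own choices, and the exhaustive search is only practical for short chains.

```python
from itertools import combinations

# Unit moves on the square grid (our encoding of a folding).
STEPS = {"R": (1, 0), "L": (-1, 0), "U": (0, 1), "D": (0, -1)}


def place(moves: str) -> list:
    """Grid points visited when the chain starts at (0, 0) and follows the moves."""
    x, y = 0, 0
    points = [(x, y)]
    for m in moves:
        dx, dy = STEPS[m]
        x, y = x + dx, y + dy
        points.append((x, y))
    return points


def energy(chain: str, moves: str) -> int:
    """-1 for every pair of H's on neighboring grid points, consecutive or not."""
    points = place(moves)
    if len(points) != len(chain):
        raise ValueError("need exactly len(chain) - 1 moves")
    if len(set(points)) != len(points):
        raise ValueError("a folding may not use a grid point twice")
    total = 0
    for i, j in combinations(range(len(chain)), 2):
        (xi, yi), (xj, yj) = points[i], points[j]
        if chain[i] == chain[j] == "H" and abs(xi - xj) + abs(yi - yj) == 1:
            total -= 1
    return total


def best_folding(chain: str) -> tuple:
    """Try every self-avoiding folding; feasible only for short chains."""
    if len(chain) < 2:
        return 0, ""
    best = [1, ""]  # any real folding has energy <= 0

    def extend(moves: str, points: list) -> None:
        if len(points) == len(chain):
            e = energy(chain, moves)
            if e < best[0]:
                best[0], best[1] = e, moves
            return
        x, y = points[-1]
        for m, (dx, dy) in STEPS.items():
            nxt = (x + dx, y + dy)
            if nxt not in points:
                extend(moves + m, points + [nxt])

    # Fix the first move to "R": rotating a folding does not change its energy.
    extend("R", [(0, 0), (1, 0)])
    return best[0], best[1]


if __name__ == "__main__":
    chain = "HHPPPPPPPHH"
    print(energy(chain, "RRRRRRRRRR"))  # -2: straight line, only the end pairs touch
    print(energy(chain, "RDLLUUURDR"))  # -4: the four H's form a 2x2 square
    print(best_folding(chain)[0])       # -4
```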

A folding of HHPPPPPPPHH: what is its energy?

[Figure: a third layout of HHPPPPPPPHH on the grid]

The real problem

• There are 20 amino acids.

• Each pair of amino-acid types has its own interaction energy.

• A protein typically has a hundred or more amino acids.

• The protein is in 3 dimensions.

• The amino acids are not restricted to a grid.

• The problem must be solved on a computer.
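Even the simple grid model explodes combinatorially. The sketch below (our own illustration, not from the slides) counts the self-avoiding layouts of a chain on the square grid with the first amino acid fixed at the origin. Each extra amino acid multiplies the count by roughly 2.6, so trying every folding stops being feasible long before 100 amino acids, and the real problem, in 3 dimensions and off the grid, is larger still.

```python
def count_foldings(n_steps: int) -> int:
    """Number of self-avoiding walks of n_steps unit steps on the square grid,
    starting at the origin: the layouts of a chain of n_steps + 1 amino acids
    (rotations and reflections counted separately)."""

    def extend(x: int, y: int, visited: set, remaining: int) -> int:
        if remaining == 0:
            return 1
        total = 0
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (x + dx, y + dy)
            if nxt not in visited:
                visited.add(nxt)
                total += extend(nxt[0], nxt[1], visited, remaining - 1)
                visited.remove(nxt)  # backtrack
        return total

    return extend(0, 0, {(0, 0)}, n_steps)


if __name__ == "__main__":
    for n in range(1, 11):
        print(f"{n + 1:2d} amino acids: {count_foldings(n):6d} layouts")
```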

The Direct Approach

• Write down a formula for the energy E, taking into account the (variable) locations of all amino acids, all charges and electrostatic attractions and repulsions, and all constraints.

• Minimize E.
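As a one-variable caricature of the direct approach, the sketch below writes the energy of two atoms as a formula in their distance r and minimizes it by gradient descent. The Lennard-Jones formula is a stand-in we chose for illustration, not the energy the slide has in mind (a real energy function sums many terms over thousands of coordinates), and the step size and iteration count are arbitrary. For this formula the minimum is known exactly, at r = 2^(1/6) σ with energy -ε, which makes the result easy to check.

```python
def lj_energy(r: float, eps: float = 1.0, sigma: float = 1.0) -> float:
    """Toy pair energy: steep repulsion at short range, weak attraction farther out."""
    s6 = (sigma / r) ** 6
    return 4.0 * eps * (s6 * s6 - s6)


def lj_gradient(r: float, eps: float = 1.0, sigma: float = 1.0) -> float:
    """Derivative dE/dr of lj_energy."""
    s6 = (sigma / r) ** 6
    return 4.0 * eps * (-12.0 * s6 * s6 + 6.0 * s6) / r


def minimize(r0: float, step: float = 0.01, iters: int = 5000) -> float:
    """Plain gradient descent: repeatedly move r a little way downhill."""
    r = r0
    for _ in range(iters):
        r -= step * lj_gradient(r)
    return r


if __name__ == "__main__":
    r_best = minimize(1.5)
    print(r_best, 2 ** (1 / 6))  # the two numbers should agree closely
    print(lj_energy(r_best))     # close to -eps = -1
```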

Indirect Methods

• Statistics of amino acids in known structures

• Neural network models

• Nearest neighbor methods

• Hidden Markov models

Does a method work?

• We want to be able to check some answers, to see whether a method appears to work.

• Professor Zhijun Wu works on some problems related to this.

NMR

• NMR is Nuclear Magnetic Resonance

• Using NMR one can often find the distances between some particular atoms in a protein.

Distances

[Figure: four atoms A1, A2, A3, A4, with the distances d(1,4) and d(2,3) marked]

• Here d(1,4) is the distance between the first and fourth atoms.

Locations

• A1 is at (x11, x12, x13).

• A2 is at (x21, x22, x23).

• A3 is at (x31, x32, x33).

• A4 is at (x41, x42, x43).

• Once you know all the locations, you know the shape of the protein.

Position Matrix

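• Form the matrix X whose ith row holds the coordinates of atom Ai:

$$X = \begin{pmatrix} x_{11} & x_{12} & x_{13} \\ x_{21} & x_{22} & x_{23} \\ x_{31} & x_{32} & x_{33} \\ x_{41} & x_{42} & x_{43} \end{pmatrix}$$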

Matrix Equation

• It turns out that $X X^T = D$, where D is a matrix that can be computed from the distances d(i,j) alone.

The matrix D

• If there are n atoms and the last one is at the origin, then the entry of D in the ith row and jth column is

$$D_{ij} = \frac{d(i,n)^2 - d(i,j)^2 + d(j,n)^2}{2}$$
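Why this works: with atom n at the origin, d(i,n)^2 = |x_i|^2, and d(i,j)^2 = |x_i|^2 + |x_j|^2 - 2 x_i·x_j, so the formula for the ith-row, jth-column entry of D reduces to x_i·x_j, which is exactly that entry of X X^T. The sketch below checks this numerically on made-up coordinates. It assumes every pairwise distance is known, whereas NMR typically supplies only some of them, and it does not attempt the harder step of recovering X from D. The function names are our own.

```python
import math
import random


def distance_matrix(points: list) -> list:
    """All pairwise distances d(i, j)."""
    return [[math.dist(p, q) for q in points] for p in points]


def matrix_d(d: list) -> list:
    """D[i][j] = (d(i,n)^2 - d(i,j)^2 + d(j,n)^2) / 2, where n is the last atom."""
    n = len(d) - 1  # 0-based index of the last atom
    size = len(d)
    return [
        [(d[i][n] ** 2 - d[i][j] ** 2 + d[j][n] ** 2) / 2 for j in range(size)]
        for i in range(size)
    ]


if __name__ == "__main__":
    random.seed(1)
    # Four hypothetical atom positions; none of them is at the origin yet.
    atoms = [[random.uniform(-5.0, 5.0) for _ in range(3)] for _ in range(4)]
    D = matrix_d(distance_matrix(atoms))  # built from distances only

    # Shift every atom so the last one sits at the origin; rows of X are positions.
    X = [[a - b for a, b in zip(p, atoms[-1])] for p in atoms]
    XXt = [[sum(a * b for a, b in zip(p, q)) for q in X] for p in X]

    worst = max(abs(D[i][j] - XXt[i][j]) for i in range(4) for j in range(4))
    print(f"largest difference between D and X X^T: {worst:.1e}")  # round-off level
```

Because distances do not change when the whole molecule is shifted, D can be computed from the original positions even though X uses the shifted ones.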

Solving the matrix equation

• Professor Zhijun Wu studies ways to solve such matrix equations rapidly.


What is the best folding of HPPHPPHPHPPHPHPHHH?

• (Careful: the answer follows.)

HPPHPPHPHPPHPHPHHH

[Figure: a folding of HPPHPPHPHPPHPHPHHH on the grid, with energy -11]
