Protein Sequences
The Genetic Code
The natural extension of the genetic code…
1. Overall amino acid structure
2. Amino acid stereochemistry
3. Amino acid sidechain structure & classification
4. ‘Non-standard’ amino acids
5. Amino acid ionization
6. Formation of the peptide bond
7. Disulfide bonds
8. Comparing protein sequences to describe evolutionary processes.
Q: How many amino acids are there?
[Figure labels: α, β, γ]
The twenty alpha-amino acids that are encoded by the
genetic code share the generic structure…
[Figure label: α]
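Atom nomenclature within amino acids (as used within the PDB)
[Figure: two residues with atom labels. One carries N, CA, C, O, CB, OG1, CG2 (the atom names of Thr); the other carries N, CA, C, O/OXT, CB, CG, CD, CE, NZ (the atom names of Lys).]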
The .pdb file format
Columns, left to right: Record name, Atom number, Atom name, Residue name, Chain ID, Residue number, X-coordinate, Y-coordinate, Z-coordinate, Occupancy, B-factor (aka Temp factor), Atom type
ATOM      1  N   PRO A   2      22.126  26.173   0.149  1.00 28.61           N
ATOM      2  CA  PRO A   2      21.848  26.169   1.597  1.00 27.50           C
ATOM      3  C   PRO A   2      20.582  25.363   1.875  1.00 26.69           C
ATOM      4  O   PRO A   2      19.724  25.215   0.973  1.00 26.48           O
ATOM      5  CB  PRO A   2      21.874  27.626   1.981  1.00 28.55           C
ATOM      6  CG  PRO A   2      21.899  28.434   0.721  1.00 29.65           C
ATOM      7  CD  PRO A   2      21.761  27.465  -0.440  1.00 28.77           C
ATOM      8  N   LYS A   3      20.499  24.795   3.073  1.00 22.80           N
ATOM      9  CA  LYS A   3      19.360  23.972   3.469  1.00 22.07           C
ATOM     10  C   LYS A   3      18.610  24.700   4.597  1.00 18.49           C
ATOM     11  O   LYS A   3      19.262  25.140   5.536  1.00 17.98           O
ATOM     12  CB  LYS A   3      19.669  22.668   4.145  1.00 24.58           C
ATOM     13  CG  LYS A   3      20.495  21.675   3.360  1.00 36.59           C
ATOM     14  CD  LYS A   3      20.652  20.419   4.220  1.00 48.23           C
ATOM     15  CE  LYS A   3      19.341  19.779   4.628  1.00 53.43           C
ATOM     16  NZ  LYS A   3      19.502  19.003   5.891  1.00 57.07           N
ATOM     17  N   ALA A   4      17.319  24.698   4.389  1.00 17.98           N
ATOM     18  CA  ALA A   4      16.468  25.371   5.384  1.00 17.19           C
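To make the column layout of an ATOM record concrete, here is a minimal Python sketch that writes and then re-reads one record using the fixed character positions of the PDB format. The names `Atom`, `format_atom_line`, and `parse_atom_line` are hypothetical helpers made up for this illustration, and the sketch ignores optional fields such as alternate locations, insertion codes, and charges.

```python
from typing import NamedTuple


class Atom(NamedTuple):
    serial: int        # atom number
    name: str          # atom name, e.g. "CA"
    res_name: str      # residue name, e.g. "LYS"
    chain_id: str      # chain ID, e.g. "A"
    res_seq: int       # residue number
    x: float
    y: float
    z: float
    occupancy: float
    b_factor: float    # aka temperature factor
    element: str       # atom type (element symbol)


def format_atom_line(a: Atom) -> str:
    """Write one ATOM record in fixed-width columns (78 characters)."""
    # Names shorter than 4 characters start in the second column of the name field.
    name = a.name if len(a.name) == 4 else f" {a.name:<3}"
    return (
        f"{'ATOM':<6}{a.serial:>5} {name} {a.res_name:>3} {a.chain_id}"
        f"{a.res_seq:>4}{'':4}{a.x:8.3f}{a.y:8.3f}{a.z:8.3f}"
        f"{a.occupancy:6.2f}{a.b_factor:6.2f}{'':10}{a.element:>2}"
    )


def parse_atom_line(line: str) -> Atom:
    """Slice one ATOM record by character position rather than by whitespace."""
    return Atom(
        serial=int(line[6:11]),
        name=line[12:16].strip(),
        res_name=line[17:20].strip(),
        chain_id=line[21],
        res_seq=int(line[22:26]),
        x=float(line[30:38]),
        y=float(line[38:46]),
        z=float(line[46:54]),
        occupancy=float(line[54:60]),
        b_factor=float(line[60:66]),
        element=line[76:78].strip(),
    )


# Atom 16 (NZ of Lys 3) from the records shown on the slide.
lys_nz = Atom(16, "NZ", "LYS", "A", 3, 19.502, 19.003, 5.891, 1.00, 57.07, "N")
line = format_atom_line(lys_nz)
print(line)
print(parse_atom_line(line))
```

Slicing by position instead of splitting on whitespace matters in real files, where neighboring fields can touch (for example, large coordinates or four-character atom names).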
Atom nomenclature within amino acids (as used within the PDB)
-The alpha carbon (CA) is immediately adjacent to the most oxidized carbon (which is the carboxylate carbon, CO2-, in amino acids).
-All the other heavy (non-hydrogen) atoms are named according to the Greek alphabet, counting outward from CA: B (beta), G (gamma), D (delta), E (epsilon), Z (zeta), H (eta).
-Put otherwise, LYS can be described by: CA, CB, CG, CD, CE, and NZ.
[Figure: Lys and Arg with atom labels]
To Do: Learn how to name the atoms of all amino acids.
Hint: look at any generic PDB file to get a list of atom types.
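The naming rule can be read mechanically: the first letter is the element, the second is the Greek position, and any trailing digit is a branch number. The sketch below is only an illustration of that reading; `describe_atom` is a hypothetical helper, and it assumes heavy atoms of the standard amino acids with one-letter element symbols (no hydrogens, no selenium).

```python
# Greek position letters used in PDB atom names, counting outward from CA.
GREEK = {"A": "alpha", "B": "beta", "G": "gamma", "D": "delta",
         "E": "epsilon", "Z": "zeta", "H": "eta"}

# Backbone atoms that carry no Greek position letter.
BACKBONE = {"N", "C", "O", "OXT"}


def describe_atom(name):
    """Split a PDB atom name into (element, position, branch number)."""
    if name in BACKBONE:
        return (name[0], "backbone", "")
    element, letter, branch = name[0], name[1], name[2:]
    return (element, GREEK[letter], branch)


# The Lys atoms listed on the slide, plus one branched example (CD1).
for atom in ["CA", "CB", "CG", "CD", "CE", "NZ", "CD1"]:
    print(atom, describe_atom(atom))
```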
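Numbers are used to discriminate between similar positions…
[Figure: three side chains with atom labels. The label sets match Leu (CB, CG, CD1, CD2), Thr (CB, OG1, CG2), and Asn (CB, CG, OD1, ND2).]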
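Here are some harder examples…
[Figure: ring-containing side chains with atom labels. The labels shown include CB, CG, CD1, CD2, ND1, NE1, NE2, CE1, CE2, CE3, CZ, CZ2, CZ3, CH2, and OH, consistent with His, Tyr, and Trp.]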
Side-chain torsion angles
-With the exception of Ala and Gly, all sidechains also have torsion angles (the chi angles: chi1, chi2, …).
-To Do on your own:
- Count the # of chi’s in each amino acid.
- Determine why Ala doesn’t have a chi angle.
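A torsion angle is defined by four consecutive atoms, so it can be computed directly from PDB coordinates. The sketch below uses the standard atan2 formula and, as an example, the N, CA, CB, and CG atoms of Lys 3 from the ATOM records earlier in the deck (chi1 is conventionally N-CA-CB-CG). It assumes the three coordinate columns of the flattened table are x, y, z in that order; the helper names are made up for this illustration.

```python
import math


def sub(p, q):
    return tuple(a - b for a, b in zip(p, q))


def dot(p, q):
    return sum(a * b for a, b in zip(p, q))


def cross(p, q):
    return (p[1] * q[2] - p[2] * q[1],
            p[2] * q[0] - p[0] * q[2],
            p[0] * q[1] - p[1] * q[0])


def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees (-180 to 180) about the p1-p2 bond."""
    b1, b2, b3 = sub(p1, p0), sub(p2, p1), sub(p3, p2)
    n1, n2 = cross(b1, b2), cross(b2, b3)   # normals of the two planes
    y = math.sqrt(dot(b2, b2)) * dot(b1, n2)
    x = dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Lys 3 atoms from the ATOM records (column order x, y, z is an assumption).
N = (20.499, 24.795, 3.073)
CA = (19.360, 23.972, 3.469)
CB = (19.669, 22.668, 4.145)
CG = (20.495, 21.675, 3.360)

chi1 = dihedral(N, CA, CB, CG)
print(f"Lys 3 chi1 = {chi1:.1f} degrees")
```

The same function works for chi2 (CA-CB-CG-CD) and beyond: shift the four-atom window one bond further along the side chain.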
[Figure: Fischer projection]
Terminologies
• Hydrophobic: amino acids whose side chains do not like to reside in an aqueous environment. Hence, these amino acids are generally buried within the hydrophobic core of the protein.
– Aliphatic: a hydrophobic group that contains only carbon and hydrogen atoms.
– Aromatic: a side chain is considered aromatic when it contains an aromatic ring system.
• Polar: amino acids whose side chains prefer to reside in an aqueous environment and hence are generally found exposed on the surface of a protein.
It’s actually a bit more complicated…
Twenty Amino acids
Hydrophobic (non polar)
  Aliphatic (ALA, VAL, LEU, ILE, MET, PRO)
  Aromatic (PHE, TRP)
Polar
  Polar Neutral
    Amide (ASN, GLN)
    -OH (THR, SER)
    -SH (CYS)
  Charged
    Acidic (ASP, GLU)
    Basic (HIS, LYS, ARG)
TYR: Amphipathic
GLY: Unclassifiable
HINT: You should definitely know this!!!
These amino acids are not uncommon in biochemistry, but they are not encoded
within the genetic code (meaning they are not incorporated into proteins)…
Primary structure = the complete set of covalent bonds within a protein
Polypeptides
Linear arrangement of n amino acid residues linked by peptide bonds.
Polymers composed of two, three, a few, and many amino acid residues are called
dipeptides, tripeptides, oligopeptides, and polypeptides, respectively.
Proteins are molecules that consist of one or more polypeptide chains.
Q: Why is the pentapeptide SGYAL different from LAYGS?
Amino acid to Dipeptide
[Figure: Amino Acid 1 + Amino Acid 2 → dipeptide. Note: this chemistry will not work as drawn!]
Peptide bond
The peptide bond is the amide linkage formed between two amino acids, with the (net) release of one molecule of water (H2O).
The four atoms in the yellow box form a rigid planar unit and, as we will see next, there is no free rotation around the C-N bond.
The peptide bond has partial double-bond character, estimated at 40% under typical conditions. It is this fact that makes the peptide bond planar and rigid.
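Since each peptide bond forms with the net release of one water, the mass of a peptide is the sum of the free amino acid masses minus one water per bond. The sketch below works that arithmetic through for the two pentapeptides from the earlier question. The masses are rounded average molecular masses of the free amino acids, and `peptide_mass` is a hypothetical helper written for this illustration.

```python
# Average molecular masses (g/mol) of the free amino acids, rounded.
FREE_AA_MASS = {"S": 105.09, "G": 75.07, "Y": 181.19, "A": 89.09, "L": 131.17}
WATER = 18.015


def peptide_mass(sequence):
    """Sum of free amino acid masses minus one water per peptide bond."""
    total = sum(FREE_AA_MASS[aa] for aa in sequence)
    n_bonds = len(sequence) - 1
    return total - n_bonds * WATER


for seq in ["SGYAL", "LAYGS"]:
    print(seq, round(peptide_mass(seq), 2))
```

Both sequences have the same composition and therefore the same computed mass, so whatever distinguishes SGYAL from LAYGS has to come from the order in which the residues are bonded, not from what they are made of.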
A quick aside…
[Figure: two reaction schemes, one labeled "A horrible leaving group" and one labeled "A viable leaving group"]
-- The primary structure is a complete description of the covalent bond
network within a protein.
-- This is almost(!) completely described by the sequence of amino acids.
-- If you know that the protein is AVG…, you can look up the structures
of A, V and G; together with what you know about peptide bonding, that
allows you to complete the covalent bond structure.
-- So, when does the primary structure not fully describe the covalent
bond network?
-- BTW, this is a HUGE pet peeve of mine…there is no such thing as a
primary sequence, despite its rather common usage (including in
journal article titles…UGG!).
A primary sequence implies a secondary sequence, which is nonsense.
While there are of course primary, secondary, tertiary and quaternary
structures, there is only the “sequence”.
Multiple sequence alignments
Given the sequences:
INDUSTRY
INTERESTING
IMPORTANT
One example of a MSA is:
IN-DUST--RY
INTERESTING
IMPOR--TANT
But is it better than:
INDU--ST-RY
INTERESTING
IMPOR-T-ANT
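"Better" only has a meaning once a scoring scheme is chosen. As one possible illustration, the sketch below scores both alignments with a simple sum-of-pairs scheme. The weights (match +1, mismatch 0, gap -1) are arbitrary assumptions for this example, not values from the slides, and a different choice can change which alignment wins.

```python
from itertools import combinations


def sum_of_pairs(rows, match=1, mismatch=0, gap=-1):
    """Score an alignment column by column over all pairs of rows."""
    if len({len(row) for row in rows}) != 1:
        raise ValueError("all rows must have the same length")
    score = 0
    for column in zip(*rows):
        for a, b in combinations(column, 2):
            if a == "-" and b == "-":
                continue          # gap against gap: ignored
            if a == "-" or b == "-":
                score += gap      # residue against gap
            elif a == b:
                score += match
            else:
                score += mismatch
    return score


first = ["IN-DUST--RY", "INTERESTING", "IMPOR--TANT"]
second = ["INDU--ST-RY", "INTERESTING", "IMPOR-T-ANT"]
print("first: ", sum_of_pairs(first))
print("second:", sum_of_pairs(second))
```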
Multiple sequence alignments

I-N-DU-ST-RY
I-NTERESTING
IMPO-R--TANT

I--NDU-ST-RY
I--NTERESTING
I-MPO-R--TANT

IN-DUTS--RY
INTERESTING
IMPOR--TANT

INDU--ST-RY
INTERESTING
IMPOR-T-ANT

I-NDUS--T-RY
INT-ERES-TING
IMPOR--TAN--T

I-N--D--U-S-T-RY
I-N-TE-RE-S-TING
IM-PO--RTA-NT---
Multiple sequence alignments

Possible MSA:
I-N-DU-ST-RY
I-NTERESTING
IMPO-R--TANT

Entire column can NOT have only gaps!
I--NDU-ST-RY
I--NTERESTING
I-MPO-R--TANT

Can NOT move residues around:
IN-DUTS--RY
INTERESTING
IMPOR--TANT

Possible:
INDU--ST-RY
INTERESTING
IMPOR-T-ANT

Very few matches!
I-NDUS--T-RY
INT-ERES-TING
IMPOR--TAN--T

Too many gaps!
I-N--D--U-S-T-RY
I-N-TE-RE-S-TING
IM-PO--RTA-NT---
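The two hard rules on this slide (no column made only of gaps, no reordering of residues) can be checked mechanically. The sketch below is one way to do it; `check_msa` is a hypothetical helper, and the `all_gap` example is constructed here for illustration by adding a gap-only column to one of the valid alignments.

```python
def check_msa(rows, sequences):
    """Return a list of rule violations for a candidate alignment (empty if none)."""
    problems = []
    if len({len(row) for row in rows}) != 1:
        problems.append("rows differ in length")
    for row, seq in zip(rows, sequences):
        # Removing the gaps must give back the original sequence, in order.
        if row.replace("-", "") != seq:
            problems.append(f"{row!r} does not spell {seq!r} once gaps are removed")
    # zip stops at the shortest row, which is fine for this column check.
    if any(all(ch == "-" for ch in column) for column in zip(*rows)):
        problems.append("a column contains only gaps")
    return problems


SEQS = ["INDUSTRY", "INTERESTING", "IMPORTANT"]

ok = ["INDU--ST-RY", "INTERESTING", "IMPOR-T-ANT"]
moved = ["IN-DUTS--RY", "INTERESTING", "IMPOR--TANT"]
# Constructed for illustration: `ok` with an extra gap-only second column.
all_gap = ["I-NDU--ST-RY", "I-NTERESTING", "I-MPOR-T-ANT"]

for label, rows in [("ok", ok), ("moved", moved), ("all_gap", all_gap)]:
    print(label, check_msa(rows, SEQS))
```

"Very few matches" and "too many gaps" are not rule violations in this sense; they are quality judgments that need a scoring scheme.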
Which alignment pairs make the most sense?

AVGTLE
VLASID
VS.
AVGTLE
EKWVKV
More similar amino acids

A-VT-G-R-L-E
AA-TA-Q-V-IE
VS.
AVTG-RLE
AATAQ-IE
Fewer gaps

AVWF----VLIM
ALWFAMVFILIM
VS.
ESQG----KTD
DTQADGKCRTD
Gap location makes more sense because gaps are less frequent in nonpolar regions.
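The first two criteria (more similar amino acids, fewer gaps) can be counted directly. The sketch below tallies gaps, identities, and same-class substitutions for a pairwise alignment, using the side-chain classes from the "Twenty Amino acids" slide translated to one-letter codes. The `summarize` helper and the choice to count a class match only when the residues are not identical are assumptions of this illustration.

```python
# One-letter codes grouped by the side-chain classes from the classification slide.
CLASSES = {
    "aliphatic": "AVLIMP",
    "aromatic": "FW",
    "amide": "NQ",
    "hydroxyl": "TS",
    "thiol": "C",
    "acidic": "DE",
    "basic": "HKR",
    "amphipathic": "Y",
    "unclassifiable": "G",
}
CLASS_OF = {aa: name for name, members in CLASSES.items() for aa in members}


def summarize(top, bottom):
    """Count gaps, identities, and same-class pairs in a pairwise alignment."""
    gaps = identical = same_class = 0
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            gaps += 1
        elif a == b:
            identical += 1
        elif CLASS_OF[a] == CLASS_OF[b]:
            same_class += 1
    return {"gaps": gaps, "identical": identical, "same_class": same_class}


print(summarize("AVGTLE", "VLASID"))
print(summarize("AVGTLE", "EKWVKV"))
print(summarize("A-VT-G-R-L-E", "AA-TA-Q-V-IE"))
print(summarize("AVTG-RLE", "AATAQ-IE"))
```

The third criterion, where the gaps fall, is not captured by these counts; it would need position-aware scoring, for example a gap penalty that depends on the class of the neighboring residues.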
A multiple sequence alignment:
-CAPSRPLNENDDGR-QAFELIGTAVNM...
-CVPGRGEMEHDD-RDQVLELFGTVVNL...
-AVPKRAALQNDDGR-QGWELYGTVSAQ...
-AVPTKMNCFNDDGR-QSVNLIGTVSGN...
-ILPARTSMCNDDGR-QTIEMKGTPAGG...
--APGK--NGHKLV--Q-FELKGTYSRT...
AFAPRRIKMVNKLGR-QNFTLLGTFERT...
AYRPDRCNTCNKLGR-QDVELMGTDART...
-YRPEEWFGENKLGR-QSAELIGTDERS...
--APL-ETYWPKLGR-QTGALAGTNSAV...
--RPY-KAGWNKLGR-QSYELGGTNPYI...
---PARAKNMG---R-QSYHL--TMEWQ...
Chothia & Lesk. EMBO J. 5:823-826 (1986).
AN EXAMPLE MULTIPLE SEQUENCE ALIGNMENT.
Conserved residues are indicated by color. Note that gaps tend to cluster together.
Also gaps at the N- and C-terminal ends are more common. Why?
REGULAR EXPRESSIONS AND SEQUENCE LOGOS.
Regular expressions provide a coarse-grained summary of an alignment segment.
Sequence logos essentially do the same, but without information loss
(cf. http://en.wikipedia.org/wiki/Sequence_logo).
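To see what "coarse-grained" means here, the sketch below builds a regular expression from four columns excerpted from the first five rows of the alignment above, and also computes a per-column information value of the kind a sequence logo plots. The information formula used (log2(20) minus the column entropy, in bits, with no small-sample correction and no gap handling) is a simplification assumed for this example; the function names are hypothetical.

```python
import math
import re
from collections import Counter

# Four columns excerpted from the first five rows of the alignment.
segment = ["LIGT", "LFGT", "LYGT", "LIGT", "MKGT"]


def column_regex(rows):
    """Collapse each column to a character class (residue frequencies are discarded)."""
    parts = []
    for column in zip(*rows):
        letters = "".join(sorted(set(column)))
        parts.append(letters if len(letters) == 1 else f"[{letters}]")
    return "".join(parts)


def column_information(rows, alphabet_size=20):
    """Information per column in bits: log2(alphabet size) minus Shannon entropy."""
    info = []
    for column in zip(*rows):
        counts = Counter(column)
        total = len(column)
        entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
        info.append(math.log2(alphabet_size) - entropy)
    return info


pattern = column_regex(segment)
print(pattern)
print(bool(re.fullmatch(pattern, "MYGT")))   # a sequence that is not in the alignment
print([round(bits, 2) for bits in column_information(segment)])
```

The regular expression accepts any combination of the observed residues and no longer records that L appears four times and M once in the first column; the per-column information values retain that difference.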
A PHYLOGENETIC TREE DESCRIBES AN EVOLUTIONARY PROCESS.
But from a more pragmatic viewpoint, it also visually describes the similarities and
dissimilarities between sequences within a multiple alignment.
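The similarities and dissimilarities a tree summarizes can be written down as pairwise distances between aligned rows. The sketch below computes a simple p-distance (fraction of differing residues over columns where neither row has a gap) for three rows taken from the alignment shown earlier. The row labels are hypothetical, the rows are truncated exactly as on the slide, and a real tree-building method would add a substitution model and a clustering step on top of distances like these.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of differing residues over columns where neither row has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("rows share no ungapped columns")
    return sum(x != y for x, y in pairs) / len(pairs)


# Rows 1, 2, and 7 of the example alignment (labels are made up).
rows = {
    "seq1": "-CAPSRPLNENDDGR-QAFELIGTAVNM",
    "seq2": "-CVPGRGEMEHDD-RDQVLELFGTVVNL",
    "seq7": "AFAPRRIKMVNKLGR-QNFTLLGTFERT",
}

for (name_a, a), (name_b, b) in combinations(rows.items(), 2):
    print(name_a, name_b, round(p_distance(a, b), 3))
```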