Protein Sequences

The Genetic Code
The natural extension of the genetic code…

1. Overall amino acid structure
2. Amino acid stereochemistry
3. Amino acid sidechain structure & classification
4. ‘Non-standard’ amino acids
5. Amino acid ionization
6. Formation of the peptide bond
7. Disulfide bonds
8. Comparing protein sequences to describe evolutionary processes.

Q: How many amino acids are there?

The twenty alpha-amino acids that are encoded by the genetic code share the generic structure…
[Figure: generic amino acid structure with the α, β and γ positions labeled]

Atom nomenclature within amino acids (as used within the PDB)
[Figure: two residues labeled with PDB atom names: N, CA, C, O, CB, OG1, CG2; and N, CA, C, O/OXT, CB, CG, CD, CE, NZ]

The .pdb file format
Columns, left to right: Record name, Atom number, Atom name, Residue name, Chain ID, Residue number, X-coordinate, Y-coordinate, Z-coordinate, Occupancy, B-factor (aka Temp factor), Atom type.

ATOM      1  N   PRO A   2      22.126  26.173   0.149  1.00 28.61   N
ATOM      2  CA  PRO A   2      21.848  26.169   1.597  1.00 27.50   C
ATOM      3  C   PRO A   2      20.582  25.363   1.875  1.00 26.69   C
ATOM      4  O   PRO A   2      19.724  25.215   0.973  1.00 26.48   O
ATOM      5  CB  PRO A   2      21.874  27.626   1.981  1.00 28.55   C
ATOM      6  CG  PRO A   2      21.899  28.434   0.721  1.00 29.65   C
ATOM      7  CD  PRO A   2      21.761  27.465  -0.440  1.00 28.77   C
ATOM      8  N   LYS A   3      20.499  24.795   3.073  1.00 22.80   N
ATOM      9  CA  LYS A   3      19.360  23.972   3.469  1.00 22.07   C
ATOM     10  C   LYS A   3      18.610  24.700   4.597  1.00 18.49   C
ATOM     11  O   LYS A   3      19.262  25.140   5.536  1.00 17.98   O
ATOM     12  CB  LYS A   3      19.669  22.668   4.145  1.00 24.58   C
ATOM     13  CG  LYS A   3      20.495  21.675   3.360  1.00 36.59   C
ATOM     14  CD  LYS A   3      20.652  20.419   4.220  1.00 48.23   C
ATOM     15  CE  LYS A   3      19.341  19.779   4.628  1.00 53.43   C
ATOM     16  NZ  LYS A   3      19.502  19.003   5.891  1.00 57.07   N
ATOM     17  N   ALA A   4      17.319  24.698   4.389  1.00 17.98   N
ATOM     18  CA  ALA A   4      16.468  25.371   5.384  1.00 17.19   C

Atom nomenclature within amino acids (as used within the PDB)
- The alpha carbon (CA) is immediately adjacent to the most oxidized carbon (which is the CO2- in amino acids).
- All the other heavy atoms are named outward from CA according to the Greek alphabet (B = beta, G = gamma, D = delta, E = epsilon, Z = zeta, H = eta).
- Put otherwise, the LYS sidechain can be described by: CA, CB, CG, CD, CE, and NZ.
[Figure: Lys and Arg sidechains labeled with atom names]

To Do: Learn how to name the atoms of all amino acids. Hint: look at any generic PDB file to get a list of atom types.

Numbers are used to discriminate between similar positions…
[Figure: three sidechains labeled CB, CG, CD1, CD2; CB, OG1, CG2; and CB, CG, OD1, ND2]

Here are some harder examples…
[Figure: ring-containing sidechains labeled with atom names, including CB, CG, ND1, CD2, CE1, NE2; CB, CG, CD1, CD2, CE1, CE2, CZ, OH; and CB, CG, CD1, CD2, NE1, CE2, CE3, CZ2, CZ3, CH2]

Side-chain torsion angles
- With the exception of Ala and Gly, all sidechains also have torsion angles (chi angles).
- To Do on your own:
  - Count the # of chi’s in each amino acid.
  - Determine why Ala doesn’t have a chi angle.

Fischer projection

Terminologies
• Hydrophobic: Hydrophobic amino acids are those with side chains that do not like to reside in an aqueous environment. Hence, these amino acids are generally buried within the hydrophobic core of the protein.
  – Aliphatic: A hydrophobic group that contains only carbon and hydrogen atoms.
  – Aromatic: A side chain is considered aromatic when it contains an aromatic ring system.
• Polar: Polar amino acids are those with side chains that prefer to reside in an aqueous environment and hence are generally found exposed on the surface of a protein.

It’s actually a bit more complicated…

Twenty Amino acids
  Hydrophobic (non polar)
    Aliphatic (ALA, VAL, LEU, ILE, MET, PRO)
    Aromatic (PHE, TRP)
  Polar
    Polar Neutral
      Amide (ASN, GLN)
      -OH (THR, SER)
      -SH (CYS)
    Charged
      Acidic (ASP, GLU)
      Basic (HIS, LYS, ARG)
  TYR: Amphipathic
  GLY: Unclassifiable
HINT: You should definitely know this!!!

These amino acids are not uncommon in biochemistry, but they are not encoded within the genetic code (meaning they are not incorporated into proteins)…

Primary structure = the complete set of covalent bonds within a protein

Polypeptides
Linear arrangement of n amino acid residues linked by peptide bonds.
Polymers composed of two, three, a few, and many amino acid residues are called dipeptides, tripeptides, oligopeptides and polypeptides, respectively.
Proteins are molecules that consist of one or more polypeptide chains.
Q: Why is the pentapeptide SGYAL different from LAYGS?

Amino acid to Dipeptide
[Figure: Amino Acid 1 and Amino Acid 2 joining to form a dipeptide]
Note: this chemistry will not work as drawn!

Peptide bond
The peptide bond is the amide linkage that is formed between two amino acids, which results in the (net) release of a molecule of water (H2O).
The four atoms in the yellow box form a rigid planar unit and, as we will see next, there is essentially no rotation around the C-N bond.
The peptide bond has partial double bond character, estimated at 40% under typical conditions. It is this fact that makes the peptide bond planar and rigid.

A quick aside…
[Figure: two reaction schemes, one with a horrible leaving group and one with a viable leaving group]

-- The primary structure is a complete description of the covalent bond network within a protein.
-- This is almost(!) completely described by the sequence of amino acids.
-- If you know that the protein is AVG…, you can look up the structures of A, V and G; combined with what you know about peptide bonding, that allows you to complete the covalent bond structure.
-- So, when does the primary structure not fully describe the covalent bond network?
-- BTW, this is a HUGE pet peeve of mine…there is no such thing as a primary sequence, despite its rather common usage (including in journal article titles…UGG!). A primary sequence implies a secondary sequence, which is nonsense. While there are of course primary, secondary, tertiary and quaternary structures, there is only the “sequence”.
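The AVG… argument above (sequence plus residue structures plus peptide bonding gives the covalent bond network) can be sketched in a few lines of Python. This is a minimal illustration, not a complete tool: the names `SIDECHAIN_BONDS`, `covalent_bonds` and `peptide_bonds` are hypothetical, the residue table covers only the handful of amino acids used here, hydrogens are ignored, and any bond that the sequence alone does not specify is left out.

```python
# Sidechain bonds between heavy atoms, written with PDB atom names.
# Illustrative subset only: just the residues needed for the examples below.
SIDECHAIN_BONDS = {
    "G": [],
    "A": [("CA", "CB")],
    "S": [("CA", "CB"), ("CB", "OG")],
    "V": [("CA", "CB"), ("CB", "CG1"), ("CB", "CG2")],
    "L": [("CA", "CB"), ("CB", "CG"), ("CG", "CD1"), ("CG", "CD2")],
    "Y": [("CA", "CB"), ("CB", "CG"), ("CG", "CD1"), ("CG", "CD2"),
          ("CD1", "CE1"), ("CD2", "CE2"), ("CE1", "CZ"), ("CE2", "CZ"),
          ("CZ", "OH")],
}
BACKBONE_BONDS = [("N", "CA"), ("CA", "C"), ("C", "O")]


def covalent_bonds(sequence):
    """Return heavy-atom bonds as ((residue_number, atom), (residue_number, atom))."""
    if not sequence:
        return []
    bonds = []
    last = len(sequence)
    for i, residue in enumerate(sequence, start=1):
        # Bonds inside residue i: backbone first, then the sidechain.
        for atom_a, atom_b in BACKBONE_BONDS + SIDECHAIN_BONDS[residue]:
            bonds.append(((i, atom_a), (i, atom_b)))
        # Peptide bond from the carbonyl C of residue i to N of residue i + 1.
        if i < last:
            bonds.append(((i, "C"), (i + 1, "N")))
    # The C-terminal residue carries the extra carboxylate oxygen, OXT.
    bonds.append(((last, "C"), (last, "OXT")))
    return bonds


def peptide_bonds(sequence):
    """Keep only the bonds that connect two different residues."""
    return [bond for bond in covalent_bonds(sequence) if bond[0][0] != bond[1][0]]


if __name__ == "__main__":
    print(len(peptide_bonds("SGYAL")))  # 5 residues are joined by 4 peptide bonds
    # Same composition, opposite direction: the bond networks are not the same.
    print(set(covalent_bonds("SGYAL")) == set(covalent_bonds("LAYGS")))  # False
```

Because residues are numbered from the N-terminus, reversing the sequence attaches each sidechain to a different backbone position, which is one way to see why SGYAL and LAYGS are different molecules.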
Multiple sequence alignments
Given the sequences:
INDUSTRY
INTERESTING
IMPORTANT

One example of an MSA is:
IN-DUST--RY
INTERESTING
IMPOR--TANT

But is it better than:
INDU--ST-RY
INTERESTING
IMPOR-T-ANT

Multiple sequence alignments
Candidate alignments of the same three sequences:

Possible MSA
I-N-DU-ST-RY
I-NTERESTING
IMPO-R--TANT

Entire column can NOT have only gaps!
I--NDU-ST-RY
I--NTERESTING
I-MPO-R--TANT

Can NOT move residues around
IN-DUTS--RY
INTERESTING
IMPOR--TANT

Possible
INDU--ST-RY
INTERESTING
IMPOR-T-ANT

Very few matches!
I-NDUS--T-RY
INT-ERES-TING
IMPOR--TAN--T

Too many gaps!
I-N--D--U-S-T-RY
I-N-TE-RE-S-TING
IM-PO--RTA-NT---

Which alignment pairs make the most sense?

AVGTLE        AVGTLE
VLASID   VS.  EKWVKV
The first pairs more similar amino acids.

A-VT-G-R-L-E        AVTG-RLE
AA-TA-Q-V-IE   VS.  AATAQ-IE
The second has fewer gaps.

AVWF----VLIM        ESQG----KTD
ALWFAMVFILIM   VS.  DTQADGKCRTD
The gap location in the second makes more sense because gaps are less frequent in nonpolar regions.

A multiple sequence alignment:
-CAPSRPLNENDDGR-QAFELIGTAVNM...
-CVPGRGEMEHDD-RDQVLELFGTVVNL...
-AVPKRAALQNDDGR-QGWELYGTVSAQ...
-AVPTKMNCFNDDGR-QSVNLIGTVSGN...
-ILPARTSMCNDDGR-QTIEMKGTPAGG...
--APGK--NGHKLV--Q-FELKGTYSRT...
AFAPRRIKMVNKLGR-QNFTLLGTFERT...
AYRPDRCNTCNKLGR-QDVELMGTDART...
-YRPEEWFGENKLGR-QSAELIGTDERS...
--APL-ETYWPKLGR-QTGALAGTNSAV...
--RPY-KAGWNKLGR-QSYELGGTNPYI...
---PARAKNMG---R-QSYHL--TMEWQ...
Chothia & Lesk. EMBO J. 5:823-826 (1986).

AN EXAMPLE MULTIPLE SEQUENCE ALIGNMENT. Conserved residues are indicated by color. Note that gaps tend to cluster together. Also, gaps at the N- and C-terminal ends are more common. Why?

REGULAR EXPRESSIONS AND SEQUENCE LOGOS. Regular expressions provide a coarse-grain summary of an alignment segment. Sequence logos essentially do the same, but they also show how frequent each residue is at each position, so much less information is lost (cf. http://en.wikipedia.org/wiki/Sequence_logo).

A PHYLOGENETIC TREE DESCRIBES AN EVOLUTIONARY PROCESS. But from a more pragmatic viewpoint, it also visually describes the similarities and dissimilarities between sequences within a multiple alignment.
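The alignment rules and the regular-expression summary discussed above can be tried out on the INDUSTRY / INTERESTING / IMPORTANT example. The sketch below is only an illustration under simple assumptions: the function names `is_valid_msa`, `identity_score` and `column_regex` are hypothetical, the score just counts identical residue pairs per column (real methods use substitution matrices and gap penalties), and the regular expression keeps only which residues occur in each column, not how often.

```python
import re
from itertools import combinations

GAP = "-"


def is_valid_msa(rows, originals):
    """Equal row lengths, residues neither moved nor changed, no all-gap column."""
    if len({len(row) for row in rows}) != 1:
        return False
    if any(row.replace(GAP, "") != seq for row, seq in zip(rows, originals)):
        return False
    return all(set(column) != {GAP} for column in zip(*rows))


def identity_score(rows):
    """Count identical residue pairs within each column (gaps never match)."""
    return sum(
        a == b and a != GAP
        for column in zip(*rows)
        for a, b in combinations(column, 2)
    )


def column_regex(rows):
    """Summarize each column as a character class; '?' marks columns with gaps."""
    parts = []
    for column in zip(*rows):
        residues = sorted(set(column) - {GAP})
        if not residues:
            continue  # an all-gap column carries no residues to summarize
        part = residues[0] if len(residues) == 1 else "[" + "".join(residues) + "]"
        if GAP in column:
            part += "?"
        parts.append(part)
    return "".join(parts)


sequences = ["INDUSTRY", "INTERESTING", "IMPORTANT"]
first = ["IN-DUST--RY", "INTERESTING", "IMPOR--TANT"]
second = ["INDU--ST-RY", "INTERESTING", "IMPOR-T-ANT"]
moved = ["IN-DUTS--RY", "INTERESTING", "IMPOR--TANT"]  # T and S swapped

if __name__ == "__main__":
    print(is_valid_msa(first, sequences), is_valid_msa(moved, sequences))
    print(identity_score(first), identity_score(second))
    pattern = column_regex(first)
    print(pattern)
    print(all(re.fullmatch(pattern, seq) for seq in sequences))
```

Under this crude identity count the second alignment scores slightly higher than the first (8 matching pairs versus 7), which hints at why "which is better?" depends on the scoring scheme. The derived pattern matches all three input sequences but would also match many sequences that are not in the alignment, which is the sense in which a regular expression is a coarse-grain summary.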