CHEM 4202 Homework 1 Due 2/18/15 NAME__________________________

Calculations of the percent ionization of a particular functional group are vital to determining proper protein interactions. We have learned that the Henderson-Hasselbalch equation is useful for any specific buffer. However, we can expand this calculation to take into account ALL ionizable groups in a molecule. This exercise will help us better understand those calculations and, with the help of a computer, make many calculations quickly.

Our first order of business is to remember where the Henderson-Hasselbalch equation comes from. Given a simple acid equilibrium:

RCOOH ↔ H+ + RCOO−

The equilibrium constant (or acid dissociation constant) for the reaction is expressed by:

$$K_a = \frac{[\mathrm{H^+}][\mathrm{RCOO^-}]}{[\mathrm{RCOOH}]} \tag{1}$$

where Ka is the acid dissociation constant. If we divide both sides of eq 1 by ([H+]Ka) we obtain the following:

$$\frac{1}{[\mathrm{H^+}]} = \frac{1}{K_a}\left(\frac{[\mathrm{RCOO^-}]}{[\mathrm{RCOOH}]}\right) \tag{2}$$

Taking the log of both sides of eq 2 yields:

$$\log\frac{1}{[\mathrm{H^+}]} = \log\left(\frac{1}{K_a}\left(\frac{[\mathrm{RCOO^-}]}{[\mathrm{RCOOH}]}\right)\right) \tag{3}$$

Since log(xy) = logx + logy and log(x/y) = logx − logy, we can rewrite eq 3:

$$\log 1 - \log[\mathrm{H^+}] = \log 1 - \log K_a + \log\left(\frac{[\mathrm{RCOO^-}]}{[\mathrm{RCOOH}]}\right) \tag{4}$$

Finally, we simplify eq 4 by removing the “log1” terms (since log1 = 0) and substituting in the working definitions that pH = −log[H+] and pKa = −logKa:

$$\mathrm{pH} = \mathrm{p}K_a + \log\frac{[\mathrm{RCOO^-}]}{[\mathrm{RCOOH}]} \tag{5}$$

Eq 5 is a form of the Henderson-Hasselbalch equation, which has been modified to apply to our example.

It would be really useful to us if we could look at the overall charge of this species. Yes, our acid has no charge, and the conjugate base has a charge of -1, but how much is charged at different pH values? Since we are interested in the best way to represent our charged compound at different pH values, we see that if we isolate the ratio [RCOO−]/[RCOOH] we can make progress in this regard.
By taking the inverse log of both sides of eq 5 we get:

$$\frac{[\mathrm{RCOO^-}]}{[\mathrm{RCOOH}]} = 10^{(\mathrm{pH} - \mathrm{p}K_a)} \tag{6}$$

In our example spreadsheet, there is a column marked pH. Enter pH values in 0.1 increments from 0 to 14. (HINT: Use functions to make the process faster.) In the cell beside “pKa=”, type your pK value. Let’s assume that for a generic acid the value is 5. In the ratio column, we can now use eq 6 to calculate the ratio of charged conjugate base to uncharged acid. Remember, to enter a formula we must start with “=”. Since our ratio is equal to 10^(pH−pK), we can use cell references to make our calculation quick. (HINT: A “$” in front of the letter or number of a cell reference locks that part of the reference. This way, when you drag your formula, you will not change certain aspects of the equation.) Fill in the entire column for the ratio at each pH value. What do you notice about the values that you obtained? Do they make sense? What do they mean?

While the calculation above may provide some information to us, ultimately a percent charged would be more helpful. To get there, we must reexamine eq 6. A percent is a part out of the whole, but we are only looking at a ratio. Really what we want is:

$$\frac{[\mathrm{RCOO^-}]}{[\mathrm{RCOOH}] + [\mathrm{RCOO^-}]} \tag{7}$$

Without going into all of the math, we can get this result by manipulating eq 6 (divide the numerator and denominator of eq 7 by [RCOOH], then substitute eq 6 for the ratio) to be:

$$\frac{[\mathrm{RCOO^-}]}{[\mathrm{RCOOH}] + [\mathrm{RCOO^-}]} = \frac{10^{(\mathrm{pH} - \mathrm{p}K_a)}}{10^{(\mathrm{pH} - \mathrm{p}K_a)} + 1} \tag{8}$$

Use eq 8 to fill in the fraction column. What do you notice about these values? Do they make sense? In reality, if we are looking at the fraction charged, we would want the value to be negative since the charge is negative. Therefore, you can fill in the (-)fraction column with the negative of the fraction column. You should now have all of the columns filled in for a general acid.

Next, we need to look at what happens with a base.
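As a cross-check on the general acid columns (eqs 6 and 8), the same table can be sketched in Python. This is only an illustration and not a substitute for the spreadsheet; the function names are invented here, and pKa = 5 is the practice value suggested in the text.

```python
# Minimal sketch of the general-acid columns: pH, ratio, fraction, (-)fraction.
# Function names are hypothetical; pKa = 5 is the practice value from the text.

def acid_ratio(pH, pKa):
    """Eq 6: [RCOO-]/[RCOOH] = 10^(pH - pKa)."""
    return 10 ** (pH - pKa)

def acid_fraction_charged(pH, pKa):
    """Eq 8: fraction of the acid present as the charged conjugate base."""
    ratio = acid_ratio(pH, pKa)
    return ratio / (ratio + 1)

pKa = 5.0
# pH from 0 to 14 in 0.1 increments (141 rows), rounded to avoid float drift.
ph_values = [round(0.1 * i, 1) for i in range(141)]

acid_table = [
    (pH,
     acid_ratio(pH, pKa),
     acid_fraction_charged(pH, pKa),
     -acid_fraction_charged(pH, pKa))   # negative sign marks the -1 charge
    for pH in ph_values
]

# Print a few rows around the pKa to inspect the trend.
for pH, ratio, fraction, signed in acid_table[40:61:5]:
    print(f"pH {pH:4.1f}  ratio {ratio:10.4f}  fraction {fraction:.4f}  (-)fraction {signed:.4f}")
```

Computing the fraction from the ratio mirrors how eq 8 is built from eq 6, so the two columns stay consistent with each other.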
We must reexamine our equations because our base will start with a positive charge before it loses its proton to become neutral. Therefore, our starting equation will look a little different:

$$\mathrm{pH} = \mathrm{p}K_a + \log\frac{[\mathrm{RNH_2}]}{[\mathrm{RNH_3^+}]} \tag{9}$$

The problem is that we want the charged species in the numerator, since we want the fraction charged. We can fix this problem by flipping the ratio. Remember your log math: flipping the ratio switches all of the signs in the equation, so that it looks like:

$$-\mathrm{pH} = -\mathrm{p}K_a + \log\frac{[\mathrm{RNH_3^+}]}{[\mathrm{RNH_2}]} \tag{10}$$

If we want to solve for our charged ratio as we did for our acid, we end up with an equation that looks like this:

$$\frac{[\mathrm{RNH_3^+}]}{[\mathrm{RNH_2}]} = 10^{(\mathrm{p}K_a - \mathrm{pH})} \tag{11}$$

After you have filled in the pH column for your general base and picked a pK value (I suggest 10 for practice), you can put your new formula from eq 11 in for the ratio. What do you notice about the values? How do they compare to the values you obtained for the acid ratios?

If we want to fill in our fraction charged column for the base, we must make our equation look similar to eq 8. Doing the same substitutions, we should end up with:

$$\frac{[\mathrm{RNH_3^+}]}{[\mathrm{RNH_2}] + [\mathrm{RNH_3^+}]} = \frac{10^{(\mathrm{p}K_a - \mathrm{pH})}}{10^{(\mathrm{p}K_a - \mathrm{pH})} + 1} \tag{12}$$

Use eq 12 to fill in your fraction charged column. What do you notice about your values? How do they compare to the acid fraction we did earlier? Ultimately, the (+1)fraction column holds the same values as the fraction column, since the charged species carries a positive charge.

Now that we understand how to determine the fraction charged, we can apply the same exact concept to any chemical, even if it has more than one ionizable group. For example, what if we wanted to know the overall charge of an amino acid at any pH? Remember, amino acids are zwitterions, meaning they have both a positively and a negatively charged group. However, they may not both be charged at all pH values. How would we figure that out?
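For comparison with the acid case, the general base columns (eqs 11 and 12) can be sketched the same way. The function names are invented for this illustration, and pK = 10 is the practice value suggested in the text.

```python
# Minimal sketch of the general-base columns: pH, ratio, (+1)fraction.
# Function names are hypothetical; pK = 10 is the practice value from the text.

def base_ratio(pH, pKa):
    """Eq 11: [RNH3+]/[RNH2] = 10^(pKa - pH)."""
    return 10 ** (pKa - pH)

def base_fraction_charged(pH, pKa):
    """Eq 12: fraction of the base present as the protonated, +1 form."""
    ratio = base_ratio(pH, pKa)
    return ratio / (ratio + 1)

pKa = 10.0
ph_values = [round(0.1 * i, 1) for i in range(141)]  # 0.0, 0.1, ..., 14.0

# The fraction is already positive, so it doubles as the (+1)fraction column.
base_table = [(pH, base_ratio(pH, pKa), base_fraction_charged(pH, pKa))
              for pH in ph_values]

# Print a few rows around the pK to inspect the trend.
for pH, ratio, fraction in base_table[90:111:5]:
    print(f"pH {pH:4.1f}  ratio {ratio:10.4f}  (+1)fraction {fraction:.4f}")
```

The only change from the acid version is the sign of the exponent, which is the sign flip introduced in eq 10.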
Well, we could determine the fraction charged of each ionizable group and then add them together. Let’s look at the amino acid alanine. We can look up the pK1 and pK2 values from our book. We can also fill in the pH column. What we now want to do is determine what information is important from our general acid or general base equations.

If we want to fill in the COO− column, we could imagine that this is just like the general acid, where we are creating a charged conjugate base. Therefore, we just have to enter the formula, eq 8, that we used for the fraction of general acid charged into that column. Remember, we want the value to be negative so that we are saying it has a negative charge. You can do this in one cell by putting your negative sign in front of the equation.

The next column is for the NH3+ portion of alanine. This looks like our general base, where the acid is charged with a neutral conjugate base. Therefore, we want to put eq 12 into this column. Remember, we want this to be a positive value representing the positive charge.

If we want to know the overall, or net, charge for alanine at any pH, all we need to do is sum the charges of the pieces at each pH. In the net charge column, add the values of the two charged fractions. We now have the ability to find the charge at any pH.

We will ultimately want to know when the amino acid, or later the protein, has zero net charge. Due to the limited precision of this example spreadsheet, we will not see zero in the net charge column. However, we can find where zero would be: between the last positive value and the first negative value. Find the pH where the net charge is zero for alanine. This is the pI, or isoelectric point.

We can calculate the pI for any of our amino acids if we examine all of the parts. The example spreadsheet is set up for you to try glutamic acid and lysine. Both have pKR values that will have to be included in the calculation. How will you determine which equation to use?
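The alanine procedure above (sum the two signed fractions, then look for the sign change) can be sketched in Python as follows. The pK values 2.34 and 9.69 are assumed here as typical textbook numbers for alanine; your book may list slightly different ones. The function names and the bisection refinement are additions for illustration and are not part of the example spreadsheet.

```python
# Minimal sketch: net charge of alanine vs. pH and an estimate of its pI.
# ASSUMPTION: pK1 = 2.34 (COOH) and pK2 = 9.69 (NH3+); check your textbook.
PK1 = 2.34
PK2 = 9.69

def fraction_negative(pH, pK):
    """Eq 8: fraction of an acid group carrying a -1 charge."""
    ratio = 10 ** (pH - pK)
    return ratio / (ratio + 1)

def fraction_positive(pH, pK):
    """Eq 12: fraction of a base group carrying a +1 charge."""
    ratio = 10 ** (pK - pH)
    return ratio / (ratio + 1)

def net_charge(pH):
    """Sum of the NH3+ column and the (negative) COO- column."""
    return fraction_positive(pH, PK2) - fraction_negative(pH, PK1)

def bracket_zero(step=0.1, ph_max=14.0):
    """Return the last pH with positive net charge and the first with negative."""
    n = int(round(ph_max / step))
    grid = [round(i * step, 10) for i in range(n + 1)]
    for lo, hi in zip(grid, grid[1:]):
        if net_charge(lo) > 0 >= net_charge(hi):
            return lo, hi
    return None

def refine_pI(lo, hi, iterations=50):
    """Bisection between the bracket ends; net charge falls as pH rises."""
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if net_charge(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

lo, hi = bracket_zero()
pI = refine_pI(lo, hi)
print(f"zero crossing between pH {lo} and {hi}; pI is about {pI:.3f}")
```

Bisection is used only because the net charge decreases steadily with pH; shrinking the pH step in the spreadsheet around the bracket reaches the same answer.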
You can check your work by looking up the pI online to make sure that you are correct for these amino acids.

Now that we have an understanding of how we can calculate net charge, let us examine how the calculation would look for an entire protein. Open the pK values spreadsheet. This spreadsheet gives us a table with all pK values pre-entered so that we do not need to enter them. In addition, the net charge column has a calculation in it. We first need to understand the calculation. If you click on the first calculation box under net charge, you can see the formula. Other than being quite long, it should look familiar. On our practice spreadsheet, we had an independent column for each possible fraction charged. In this spreadsheet, the calculation has been combined into one cell.

What needs to be calculated in that cell? Well, let’s think about all of the possible ways that a protein can ionize. There are pK1, pK2, and pKR values. If we think about pK1 and pK2, those are not going to matter much, as those groups will not be ionizable in a protein due to peptide bonds. We do, however, need to consider the first amine group and the last carboxylic group, as they are not involved in peptide bonds. The first amine group is the first part of the calculation; the carboxylic group is the second part of the calculation. Each part of the formula is separated by a “+”, since we need to add all of the charges together. The remaining portions of the calculation are for each R group that is ionizable. They are in alphabetical order by one-letter abbreviation.

Our formula is set up to calculate the contribution of each ionizable group. But what if we have the same ionizable group more than one time in a protein, such as 3 lysines? Well, we can use the spreadsheet to add each of the occurrences of that charge, or multiply that portion by the number of times the amino acid appears in the sequence.
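The structure of the combined net charge formula (terminal amine, terminal carboxyl, then each ionizable R group multiplied by its count) can be sketched in Python. All pK values below are assumptions taken from a typical textbook table and stand in for the ones in the pK values spreadsheet; the function names and the peptide AKKKA are invented for illustration.

```python
from collections import Counter

# ASSUMED side-chain pKR values (typical textbook numbers); replace them with
# the values from your own pK values spreadsheet.
ACIDIC_PKR = {"C": 8.18, "D": 3.65, "E": 4.25, "Y": 10.07}  # neutral -> -1
BASIC_PKR = {"H": 6.00, "K": 10.53, "R": 12.48}             # +1 -> neutral

def fraction_negative(pH, pK):
    """Eq 8: fraction of an acidic group carrying a -1 charge."""
    ratio = 10 ** (pH - pK)
    return ratio / (ratio + 1)

def fraction_positive(pH, pK):
    """Eq 12: fraction of a basic group carrying a +1 charge."""
    ratio = 10 ** (pK - pH)
    return ratio / (ratio + 1)

def net_charge(sequence, pH, pK_amine_end, pK_carboxyl_end):
    """Add the charge of every ionizable group, weighted by its count."""
    counts = Counter(sequence)  # occurrences of each one-letter code
    charge = fraction_positive(pH, pK_amine_end)      # first amine group
    charge -= fraction_negative(pH, pK_carboxyl_end)  # last carboxyl group
    for residue, pK in BASIC_PKR.items():
        charge += counts[residue] * fraction_positive(pH, pK)
    for residue, pK in ACIDIC_PKR.items():
        charge -= counts[residue] * fraction_negative(pH, pK)
    return charge

# Hypothetical peptide with 3 lysines; the terminal pK values are assumed
# (alanine's pK2 for the first amine, alanine's pK1 for the last carboxyl).
peptide = "AKKKA"
for pH in (2.0, 7.0, 12.0):
    print(f"pH {pH:4.1f}  net charge {net_charge(peptide, pH, 9.69, 2.34):+.3f}")
```

Multiplying by the count is equivalent to adding one term per occurrence, which is why a group that appears zero times simply drops out of the sum.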
We can actually have the spreadsheet count the number of times each amino acid occurs in the sequence too. This makes our spreadsheet almost automatic. If we paste the sequence of the protein into cell B33, the number of occurrences of each amino acid will self-populate below. These numbers are then multiplied by the fraction charged in the formula. Can you find where each amino acid is represented in the formula? Why are there only certain amino acids listed below the sequence? Now you only need to change the references for the first and last amino acids in the sequence, since the spreadsheet does not automatically do that.

Now that you have a feel for the full spreadsheet to calculate the pI of a protein, let’s do some problems. Use the following protein sequence to answer questions 1-3:

ALIEDKMACRGNCTPG

1. Using Excel, calculate the pI for the protein to three decimal places. (HINT: this does not mean that I just want you to add zeros!) (2 pts)

2. We know that two sulfhydryl groups can bond together to form disulfide bonds. Let us assume that the sequence forms disulfide bonds. How does this change your calculation (answer in words)? What is the new value of the pI for the protein to three decimal places? (4 pts)

3. Let us now say that our protein has two chains that are linked by the disulfide bond. Chain one runs from A to R and chain two runs from G to G. How does this change your calculation (answer in words)? What is the new value of the pI for the protein to three decimal places? (4 pts)

4. Turn in your completed Example Titration spreadsheet. This can be done electronically. (9 pts)

5. In the real world, we may want to calculate the pI of a protein. While we do not know yet how to find protein sequences, I have included several for you to use. Please calculate the pI of each protein.
(6 pts) alpha-amylase [Drosophila melanogaster] MFLAKSIVCLALLAVANAQFDTNYASGRSGMVHLFEWKWDDIAAECENFLGPNGYAGVQVSPVNENAVKDSRPWWERYQPISYKLETRS GNEEQFASMVKRCNAVGVRTYVDVVFNHMAADGGTYGTGGSTASPSSKSYPGVPYSSLDFNPTCAISNYNDANEVRNCELVGLRDLNQG NSYVQDKVVEFLDHLIDLGVAGFRVDAAKHMWPADLAVIYGRLKNLNTDHGFASGSKAYIVQEVIDMGGEAISKSEYTGLGAITEFRHS DSIGKVFRGKDQLQYLTNWGTAWGFAASDRSLVFVDNHDNQRGHGAGGADVLTYKVPKQYKMASAFMLAHPFGTPRVMSSFSFTDTDQG PPTTDGHNIASPIFNSDNSCSGGWVCEHRWRQIYNMVAFRNTVGSDEIQNWWDNGSNQISFSRGSRGFVAFNNDNYDLNSSLQTGLPAG TYCDVISGSKSGSSCTGKTVTVGSDGRASINIGSSEDDGVLAIHVNAKL son of sevenless protein [Drosophila melanogaster] MFSGPSGHAHTISYGGGIGLGTGGGGGSGGSGSGSQGGGGGIGIGGGGVAGLQDCDGYDFTKCENAARWRGLFTPSLKKVLEQVHPRVT AKEDALLYVEKLCLRLLAMLCAKPLPHSVQDVEEKVNKSFPAPIDQWALNEAKEVINSKKRKSVLPTEKVHTLLQKDVLQYKIDSSVSA FLVAVLEYISADILKMAGDYVIKIAHCEITKEDIEVVMNADRVLMDMLNQSEATSCPVPCHFPRSASATYEETVKELIHDEKQYQRDLH MIIRVFREELVKIVSDPRELEPIFSNIMDIYEVTVTLLGSLEDVIEMSQEQSAPCVGSCFEELAEAEEFDVYKKYAYDVTSQASRDALN NLLSKPGASSLTTAGHGFRDAVKYYLPKLLLVPICHAFVYFDYIKHLKDLSSSQDDIESFEQVQGLLHPLHCDLEKVMASLSKERQVPV SGRVRRQLAIERTRELQMKVEHWEDKDVGQNCNEFIREDSLSKLGSGKRIWSERKVFLFDGLMVLCKANTKKQTPSAGATAYDYRLKEK YFMRRVDINDRPDSDDLKNSFELAPRMQPPIVLTAKNAQHKHDWMADLLMVITKSMLDRHLDSILQDIERKHPLRMPSPEIYKFAVPDS GDNIVLEERESAGVPMIKGATLCKLIERLTYHIYADPTFVRTFLTTYRYFCSPQQLLQLLVERFNIPDPSLVYQDTGTAGAGGMGGVGG DKEHKNSHREDWKRYRKEYVQPVQFRVLNVLRHWVDHHFYDFEKDPMLLEKLLNFLEHVNGKSMRKWVDSVLKIVQRKNEQEKSNKKIV YAYGHDPPPIEHHLSVPNDEITLLTLHPLELARQLTLLEFEMYKNVKPSELVGSPWTKKDKEVKSPNLLKIMKHTTNVTRWIEKSITEA ENYEERLAIMQRAIEVMMVMLELNNFNGILSIVAAMGTASVYRLRWTFQGLPERYRKFLEECRELSDDHLKKYQERLRSINPPCVPFFG RYLTNILHLEEGNPDLLANTELINFSKRRKVAEIIGEIQQYQNQPYCLNEESTIRQFFEQLDPFNGLSDKQMSDYLYNESLRIEPRGCK TVPKFPRKWPHIPLKSPGIKPRRQNQTNSSSKLSNSTSSVAAAAAASSTATSIATASAPSLHASSIMDAPTAAAANAGSGTLAGEQSPQ HNPHAFSVFAPVIIPERNTSSWSGTPQHTRTDQNNGEVSVPAPHLPKKPGAHVWANNNSTLASASAMDVVFSPALPEHLPPQSLPDSNP FASDTEAPPSPLPKLVVSPRHETGNRSPFHGRMQNSPTHSTASTVTLTGMSTSGGEEFCAGGFYFNSAHQGQPGAVPISPHVNVPMATN 
MEYRAVPPPLPPRRKERTESCADMAQKRQAPDAPTLPPRDGELSPPPIPPRLNHSTGISYLRQSHGKSKEFVGNSSLLLPNTSSIMIRR NSAIEKRAAATSQPNQAAAGPISTTLVTVSQAVATDEVLPLPISPAASSSTTTSPLTPAMSPMSPNIPSHPVESTSSSYAHQLRMRQQQ QQQTHPAIYSQHHQHHATHLPHHPHQHHSNPTQSRSSPKEFFPIATSLEGTPKLPPKPSLSANFYNNPDKGTMFLYPSTNEE collagen type IV, isoform C [Drosophila melanogaster] MLPFWKRLLYAAVIAGALVGADAQFWKTAGTAGSIQDSVKHYNRNEPKFPIDDSYDIVDSAGVARGDLPPKNCTAGYAGCVPKCIAEKG NRGLPGPLGPTGLKGEMGFPGMEGPSGDKGQKGDPGPYGQRGDKGERGSPGLHGQAGVPGVQGPAGNPGAPGINGKDGCDGQDGIPGLE GLSGMPGPRGYAGQLGSKGEKGEPAKENGDYAKGEKGEPGWRGTAGLAGPQGFPGEKGERGDSGPYGAKGPRGEHGLKGEKGASCYGPM KPGAPGIKGEKGEPASSFPVKPTHTVMGPRGDMGQKGEPGLVGRKGEPGPEGDTGLDGQKGEKGLPGGPGDRGRQGNFGPPGSTGQKGD RGEPGLNGLPGNPGQKGEPGRAGATGKPGLLGPPGPPGGGRGTPGPPGPKGPRGYVGAPGPQGLNGVDGLPGPQGYNGQKGGAGLPGRP GNEGPPGKKGEKGTAGLNGPKGSIGPIGHPGPPGPEGQKGDAGLPGYGIQGSKGDAGIPGYPGLKGSKGERGFKGNAGAPGDSKLGRPG TPGAAGAPGQKGDAGRPGTPGQKGDMGIKGDVGGKCSSCRAGPKGDKGTSGLPGIPGKDGARGPPGERGYPGERGHDGINGQTGPPGEK GEDGRTGLPGATGEPGKPALCDLSLIEPLKGDKGYPGAPGAKGVQGFKGAEGLPGIPGPKGEFGFKGEKGLSGAPGNDGTPGRAGRDGY PGIPGQSIKGEPGFHGRDGAKGDKGSFGRSGEKGEPGSCALDEIKMPAKGNKGEPGQTGMPGPPGEDGSPGERGYTGLKGNTGPQGPPG VEGPRGLNGPRGEKGNQGAVGVPGNPGKDGLRGIPGRNGQPGPRGEPGISRPGPMGPPGLNGLQGEKGDRGPTGPIGFPGADGSVGYPG DRGDAGLPGVSGRPGIVGEKGDVGPIGPAGVAGPPGVPGIDGVRGRDGAKGEPGSPGLVGMPGNKGDRGAPGNDGPKGFAGVTGAPGKR GPAGIPGVSGAKGDKGATGLTGNDGPVGGRGPPGAPGLMGIKGDQGLAGAPGQQGLDGMPGEKGNQGFPGLDGPPGLPGDASEKGQKGE PGPSGLRGDTGPAGTPGWPGEKGLPGLAVHGRAGPPGEKGDQGRSGIDGRDGINGEKGEQGLQGVWGQPGEKGSVGAPGIPGAPGMDGL PGAAGAPGAVGYPGDRGDKGEPGLSGLPGLKGETGPVGLQGFTGAPGPKGERGIRGQPGLPATVPDIRGDKGSQGERGYTGEKGEQGER GLTGPAGVAGAKGDRGLQGPPGASGLNGIPGAKGDIGPRGEIGYPGVTIKGEKGLPGRPGRNGRQGLIGAPGLIGERGLPGLAGEPGLV GLPGPIGPAGSKGERGLAGSPGQPGQDGFPGAPGLKGDTGPQGFKGERGLNGFEGQKGDKGDRGLQGPSGLPGLVGQKGDTGYPGLNGN DGPVGAPGERGFTGPKGRDGRDGTPGLPGQKGEPGMLPPPGPKGEPGQPGRNGPKGEPGRPGERGLIGIQGERGEKGERGLIGETGNVG
RPGPKGDRGEPGERGYEGAIGLIGQKGEPGAPAPAALDYLTGILITRHSQSETVPACSAGHTELWTGYSLLYVDGNDYAHNQDLGSPGS CVPRFSTLPVLSCGQNNVCNYASRNDKTFWLTTNAAIPMMPVENIEIRQYISRCVVCEAPANVIAVHSQTIEVPDCPNGWEGLWIGYSF LMHTAVGNGGGGQALQSPGSCLEDFRATPFIECNGAKGTCHFYETMTSFWMYNLESSQPFERPQQQTIKAGERQSHVSRCQVCMKNSS projectin [Drosophila melanogaster] VKAINAAGPGEPSDASKPIITKPRKLAPKILDPTKNIRTYNFKSGEPIFLDINISGEPAPDVTWNQNNKSVQTTSFSHIENLPYNTKYI NNNPERKDTGLYKISAHNFYGQDQVEFQINIITKPGKPGGPLEVSEVHKDGCKLKWKKPKDDGGEPVESYLVEKFDPDTGIWLPVGRSD GPEYNVDGLVPGHDYKFRVKAVNKEGESEPLETLGSIIAKDPFSVPTKPGVPEPTDWTRNKVELAWPEPASDGGSPIQGYIVEVKDKYS PLWEKALETNSPTPTATVQGLIEGNEYQFRVVALNKGGLSEPSDPSKIFTAKPRYLAPKIDRRNLRNITLSSGTALKLDANITGEPAPK VEWKLSNYHLQSGKNVTIETPDYYTKLVIRPTQRSDSGEYLVTATNTSGKDSVLVNVVITDKPSPPNGPLQISDVHKEGCHLKWKAPSD DGGTPIEYFQIDKLEPETGCWIPSCRSTEPQVDVTGLSPGNEYKFRVSAVNAEGESQPLVGDESIVARNPFDEPGKPENLKATDWDKDH VDLAWTPPLIDGGSPISCYIIEKQDKYGKWERALDVPADQCKATIPDLVEGQTYKFRVSAVNAAGTGEPSDSTPPIIAKARNKPPIIDR SSLVEVRIKAGQSFTFDCKVSGEPAPQTKWLLKKKEVYSKDNVKVTNVDYNTKLKVNSATRSDSGIYTVFRENANGEDSADVKVTVIDK PAPPNGPLKVDEINSESCTLHWNPPDDDGGQPIDNYVVGKLDETTGRWMTAGETDGPVTALKVGGLTPGHKYKFRVRAKNAQGTSEPLT TAQAIIAKNPFDVPTKPGTPTIKDFDKEFVDLEWTRPEADGGSPITGYVVEKRDKFSPDWEKCAEISDDITNAHVPDLIEGLKYEFRVR AVNKAGPGSPSDATETHVARPKNTPPKIDRNFMSDIKIKAGNVFEFDVPVTGEPLPSKDWTHEGNMIINTDRVKISNFDDRTKIRILDA TSDTGVYTLTARNINGTDRHNVKVTILDAPSVPEPALRNGDVSKNSIVLRWRPPKDDGGSEITHYVVEKMDNEAMRWVPVGDCTDTEIR ADNLIENHDYSFRVRAVNKQGQSQPLTTSQPITAKDPYSHPDKPGQPQATDWGKHFVDLEWSTPKRDGGAPISSYIIEKRPKFGQWERA AVVLGDNCKAHVPELTNGGEYEFRVIAVNRGGPSDPSDPSSTIICKPRFLAPFFDKSLLNDITVHAGKRLGWTLPIEASPRPLITWLYN GKEIGSNSRGESGLFQNELTFEIVSSLRSDEGRYTLILKNEHGSFDASAHATVLDRPSPPKGPLDITKITRDGCHLTWNVPDDDGGSPI LHYIIEKMDLSRSTWSDAGMSTHIVHDVTRLVHRKEYLFRVKAVNAIGESDPLEAVNTIIAKNEFDEPDAPGKPIITDWDRDHIDLQWA VPKSDGGAPISEYIIQKKEKGSPYWTNVRHVPSNKNTTTIPELTEGQEYEFRVIAVNQAGQSEPSEPSDMIMRKPRYLPPKIITPLNEV RIKCGLIFHTDIHFIGEPAPEATWTLNSNPLLSNDRSTITSIGHHSVVHTVNCQRSDSGIYHLLLRNSSGIDEGSFELVVLDRPGPPEG 
PMEYEEITANSVTISWKPPKDNGGSEISSYVIEKRDLTHGGGWVPAVNYVSAKYNHAVVPRLLEGTMYELRVMAENLQGRSDPLTSDQP VVAKSQYTVPGAPGKPELTDSDKNHITIKWKQPISNGGSCRVDLQACKLGT