Example of regression by RBF-ANN

Prediction of charge on peptides after electrospray ionization in mass spectrometry
What are the best attributes to predict charge?

Review of molecular biology
  DNA sequence determines protein sequence

What are amino acids?
  [Figure: amino acid structure labeled with N-terminus, C-terminus, and side chain]

Amino acids with different side chains have different names

  name            3-letter  1-letter
  glycine         gly       G
  alanine         ala       A
  valine          val       V
  leucine         leu       L
  isoleucine      ile       I
  methionine      met       M
  proline         pro       P
  phenylalanine   phe       F
  tryptophan      trp       W
  serine          ser       S
  cysteine        cys       C
  threonine       thr       T
  glutamine       gln       Q
  asparagine      asn       N
  histidine       his       H
  tyrosine        tyr       Y
  glutamic acid   glu       E
  aspartic acid   asp       D
  lysine          lys       K
  arginine        arg       R

Chemical properties of amino acids

More properties of amino acids

  code  mass (Da)   pI     pK1   pK2    charge  hydrophobic?  polar?
  A      89.09404    6.01  2.35   9.87  0       T             F
  R     174.20274   10.76  1.82   8.99  +       F             F
  N     132.1190     5.41  2.14   8.72  0       F             T
  D     133.10384    2.85  1.99   9.9   -       F             F
  C     121.15404    5.05  1.92  10.7   0       F             T
  E     147.13       3.15  2.1    9.47  -       F             F
  Q     146.14594    5.65  2.17   9.13  0       F             T
  G      75.06714    6.06  2.35   9.78  0       T             F
  H     155.15634    7.6   1.8    9.33  +       F             T
  I     131.17464    6.05  2.32   9.76  0       T             F
  L     131.17464    6.01  2.33   9.74  0       T             F
  K     146.18934    9.6   2.16   9.06  +       F             F
  M     149.20784    5.74  2.13   9.28  0       T             F
  F     165.1918     5.49  2.2    9.31  0       T             F
  P     115.13194    6.3   1.95  10.64  0       T             F
  S     105.09344    5.68  2.19   9.21  0       F             T
  T     119.12034    5.6   2.09   9.1   0       F             T
  W     204.22844    5.89  2.46   9.41  0       T             T
  Y     181.19124    5.64  2.2    9.21  0       F             T
  V     117.14784    6.0   2.39   9.74  0       T             F

  pK1: alpha-carboxyl group; pK2: alpha-amino group

Amino Acids Polymerize to Form Proteins (polypeptides)
  Formation of peptide bond
  [Figure: backbone -N-C-C-N-C-C- of two residues joined by a peptide bond, side chains shown as R]

Proteases: enzymes that cut proteins at the peptide bond
  [Figure: backbone -N-C-C-N-C-C- with the peptide bond as the cleavage site]
  Most proteases have cleavage specificity.
  Trypsin cleaves mainly at arginine (R) and lysine (K)
  Digestion of a protein with trypsin produces peptides of various lengths

Analysis of digestion mixture yields information about proteins in sample
  Liquid chromatography coupled to mass spectrometry
  [Figure: digested protein mixture -> LC column -> electrospray ionization -> mass spectrometer]
  Peptides are retained for differing times on the LC column

Peptides may have multiple charges
  Charges in dataset are averages from several runs
  First 4 of ~23,000 data pairs are

  Sequence                            Charge
  AAAAAAPDDVAAQLVVADLDLVGGHVEDAFAR    2.8
  AAAAADLANR                          2
  AAAAAQASASAAAK                      1.714286
  AAAAAVAQGGPIEDAER                   2

Can peptide sequence be an input?
  What inputs can we calculate from the input sequence?

Some suggestions for inputs from properties of amino acids
  Length of peptide
  Mass of peptide
  First amino acid
  Last amino acid
  Fractions of amino acids of each type
  Fractions of hydrophobic, polar, and charged residues
  Net formal charge
  Average isoelectric point
  Average dissociation constant

MLP with default options
  600 examples reserved for test set
  Poor results

Other regression options
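
To make the trypsin digestion step concrete, here is a minimal sketch of an in-silico digest. It assumes the simplest possible rule, a cut on the C-terminal side of every arginine (R) and lysine (K), and ignores known exceptions such as reduced cleavage before proline and missed cleavages. The function name tryptic_digest and the example protein (two peptides from the dataset joined together, plus a short tail) are hypothetical and serve only as illustration.

```python
def tryptic_digest(protein):
    """Split a protein sequence after every K or R (simplified trypsin rule)."""
    peptides = []
    start = 0
    for i, residue in enumerate(protein):
        if residue in "KR":
            # Cut after the current residue, so K or R ends the peptide.
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        # Whatever follows the last K or R is the C-terminal peptide.
        peptides.append(protein[start:])
    return peptides


# Hypothetical protein built from two dataset peptides and a two-residue tail.
example = "AAAAADLANR" + "AAAAAQASASAAAK" + "GG"
print(tryptic_digest(example))
# ['AAAAADLANR', 'AAAAAQASASAAAK', 'GG']
```

Under this rule every peptide except possibly the last ends in K or R, which agrees with the four example sequences in the dataset.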
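
The suggested inputs can be computed directly from a peptide sequence and the amino acid property table. The sketch below is one possible way to do that, not the method used to produce the results in these slides. Its assumptions: the table masses are for free amino acids, so one water (about 18.015) is subtracted per peptide bond; the mass for E uses the standard molar mass of glutamic acid (about 147.13); the pK columns are left out for brevity; and the names TABLE, PROPS, and peptide_features are hypothetical.

```python
# Columns: code, mass, pI, formal charge, hydrophobic?, polar?
TABLE = """
A 89.09404 6.01 0 T F
R 174.20274 10.76 + F F
N 132.1190 5.41 0 F T
D 133.10384 2.85 - F F
C 121.15404 5.05 0 F T
E 147.13 3.15 - F F
Q 146.14594 5.65 0 F T
G 75.06714 6.06 0 T F
H 155.15634 7.6 + F T
I 131.17464 6.05 0 T F
L 131.17464 6.01 0 T F
K 146.18934 9.6 + F F
M 149.20784 5.74 0 T F
F 165.1918 5.49 0 T F
P 115.13194 6.3 0 T F
S 105.09344 5.68 0 F T
T 119.12034 5.6 0 F T
W 204.22844 5.89 0 T T
Y 181.19124 5.64 0 F T
V 117.14784 6.0 0 T F
"""

CHARGE = {"0": 0, "+": 1, "-": -1}
WATER = 18.015  # assumed mass of the water lost per peptide bond

# Parse the table into code -> (mass, pI, charge, hydrophobic, polar).
PROPS = {}
for line in TABLE.split("\n"):
    if line.strip():
        code, mass, pi, charge, hydrophobic, polar = line.split()
        PROPS[code] = (float(mass), float(pi), CHARGE[charge],
                       hydrophobic == "T", polar == "T")


def peptide_features(seq):
    """Return candidate regression inputs for a non-empty peptide sequence."""
    n = len(seq)
    rows = [PROPS[aa] for aa in seq]
    return {
        "length": n,
        "mass": sum(r[0] for r in rows) - (n - 1) * WATER,
        "first": seq[0],
        "last": seq[-1],
        "net_charge": sum(r[2] for r in rows),
        "avg_pI": sum(r[1] for r in rows) / n,
        "frac_hydrophobic": sum(r[3] for r in rows) / n,
        "frac_polar": sum(r[4] for r in rows) / n,
        "frac_charged": sum(r[2] != 0 for r in rows) / n,
        "frac_by_type": {aa: seq.count(aa) / n for aa in sorted(PROPS)},
    }


features = peptide_features("AAAAADLANR")
print(features["length"], features["first"], features["last"], features["net_charge"])
# 10 A R 0
```

The first and last amino acids are categorical, so they would still need a numeric encoding (for example one-of-20 indicator columns) before being passed to an MLP or RBF network.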