“Proteomics & Bioinformatics”
MBI, Master's Degree Program in Helsinki, Finland
Lecture 2, 8 May 2007
Sophia Kossida, BRF, Academy of Athens, Greece
Esa Pitkänen, University of Helsinki, Finland
Juho Rousu, University of Helsinki, Finland

Gel Image Analysis
• Staining
• Image acquisition
• Image analysis / quantification
Image analysis is extracting perceptible data from the 2DE image and storing it in a database. It involves detecting spots and warping separate images so that spots of the same protein align. Spot data come from the level of spot darkness, which is proportional to the amount of protein stain or of dye label on particular amino acids.
(Reiner Westermeier, GE Healthcare Life Sciences, Munich, Germany)

Image analysis software
Some commercially available packages:
• ImageMaster 2D / Melanie: http://www.2d-gel-analysis.com/ and http://au.expasy.org/melanie/
• PDQuest (Bio-Rad, USA): http://www.bio-rad.com/
• Progenesis (Nonlinear, UK): http://www.nonlinear.com/products/progenesis/
• Delta2D (Decodon, Germany): http://www.decodon.com/Solutions/Delta2D/

Staining / Detection
• Pre-labeling: radioisotopes, stable isotopes, fluorescence
• Intermediate labeling: fluorescence during equilibration of the IPG strip
• Staining of gel background: imidazole-zinc staining
• Staining of proteins: organic dyes, silver, fluorescent dyes
• Blotting: immuno / affinity detection

Scanning
• Avoiding background, artifacts and noise is essential (insufficient destaining, contamination by fingerprints, fluorescent sprinkles, bubbles, gel breakage, gel pieces).
• Use grayscale (complete range) instead of color images.
• Scan all gel images in the same orientation, placing each gel at the same position on the scanner plate.
• Avoid scanning too much of the area around the gel.
• Limit post-processing to cropping, mirroring and rotation by 90, 180 or 270 degrees.
• Avoid producing TIFF files if you can process calibrated image file formats such as *.IMG/INF and *.GEL.
  Usually, TIFF files are produced without grayscale calibration. This means you lose precision, or the grayscales are distorted nonlinearly, making quantitation questionable.
• Avoid using JPEG files for quantitative analysis.

Image analysis
• Manipulation of image / normalization: separation of overlapping spots, removing lines and speckles
• Spot detection / quantification: background subtraction, spot segmentation, landmarking, spot matching
• Gel comparison: matching of gels (e.g. normal, diseased, treated), alignment
• Data analysis: changes in expression
• Data representation: annotation of spots, linking of data (spots, intensity, MS data)

Organizing experiments
• Organizing the experiment: creating projects, folders and subfolders
• Importing gel images
Melanie/ImageMaster 2D Platinum 6.0

Import gels
• Toolbox for easy manipulation of gels

Viewing and manipulating images
• Adjusting contrast
• Intensity variations in x- and y-direction
• 3D view
• Automatically subtracted background

Spot detection
• Adjust the separation between spots
• Split overlapping spots
• Eliminate artifacts / noise
• Stain saturation
• Incomplete resolution

Spots report
A spot report summarizes the information about the selected spots.

Detection / matching
• Spot detection
• Spot matching
• Normalization of spot intensities
• PTM? Downregulation?
Modified from: mouse cardiac; 250 µg loading; pH 3-10 IEF strips; 12.5% SDS-PAGE; file ID: sc5bcon vs. sc15iso

Matching
• Reference gel
• Combining 2D gel images: creating a master gel, a “typical profile”

Master gels (Melanie/ImageMaster 2D Platinum 6.0)
Combine several images to create the master image:
• all the spots on a single image, even those that will never be expressed at the same time
• a summary of groups of replicate gels (average gel)

Delta 2D
Any point on a gel can be labeled and automatically transferred from one gel to another.
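As a minimal sketch of the “normalization of spot intensities” step: one common convention, assumed here and not necessarily what any of the packages above does, expresses each spot volume as a percentage of the total spot volume on its gel, so that differences in loading or staining between gels cancel before spots are compared. The spot names, volumes and function names are hypothetical.

```python
def normalize_volumes(volumes):
    """Express each spot volume as a percentage of the gel's total spot volume."""
    total = sum(volumes.values())
    if total <= 0:
        raise ValueError("total spot volume must be positive")
    return {spot: 100.0 * volume / total for spot, volume in volumes.items()}


def fold_changes(reference, sample):
    """Ratio of normalized volumes (sample / reference) for spots matched on both gels."""
    ref_norm = normalize_volumes(reference)
    smp_norm = normalize_volumes(sample)
    changes = {}
    for spot in sorted(ref_norm.keys() & smp_norm.keys()):
        if ref_norm[spot] > 0:  # skip spots with no signal on the reference gel
            changes[spot] = smp_norm[spot] / ref_norm[spot]
    return changes


# Hypothetical raw spot volumes (arbitrary units) for two matched gels.
control = {"spot_1": 200.0, "spot_2": 300.0, "spot_3": 500.0}
treated = {"spot_1": 400.0, "spot_2": 600.0, "spot_3": 3000.0}

changes = fold_changes(control, treated)
for spot, ratio in changes.items():
    print(spot, round(ratio, 2))
```

Total-volume normalization assumes that most spots do not change between gels. In this toy data one strongly increased spot (spot_3) pulls the normalized values of the other two down, which is why real analyses often check the normalization against a set of stable spots.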
Gel image warping
• Variations in migration, protein separation, stain artifacts and stain saturation complicate gel matching and quantitation.
• Warping compensates for running differences between gels.
• After warping, corresponding spots will have the same position on every image.

Expression
• Comparison of individual experimental gels to master gels
• Identification of variant spots

Miscellaneous (Delta 2D)
• Automatic retrieval of web information: send out a “Scout” to the web and bring back corresponding data such as pI, MW, sequence, function
• Create a PowerPoint slide from a gel image

2D Gel Databases
• Swiss-2DPAGE: www.expasy.ch
• GelBank: http://www.gelscape.ualberta.ca:8080/htm/gdbIndex.html
• Cornea 2D-PAGE: http://www.cornea-proteomics.com/
• World-2DPAGE, index of 2D gel databases: http://ca.expasy.org/ch2d/2d-index.html
[Screenshots: Swiss 2D PAGE viewer, GelBank, Cornea 2D-PAGE, World-2DPAGE]

Make 2D database
A software package to create, convert, publish, interconnect and keep up to date 2DE databases. Provided by ExPASy.
• The database is queryable via description, accession or spot clicking.
• Cross-references are provided to other federated 2D PAGE database entries, Medline and SWISS-PROT.
• Entries are linked to images showing the experimentally determined and theoretical protein locations.
• Search via clickable images or keywords.
• It runs on most UNIX-based operating systems (Linux, Solaris/SunOS, IRIX).
• Being continuously developed, the tool is evolving in concert with the current Proteomics Standards Initiative of the Human Proteome Organization (HUPO).
• Data can be marked as public, or as fully or partially private.
• A highly secured administration Web interface makes external data integration, data export, data privacy control, database publication and version control easy to perform.
Federated databases
A collection of databases that are treated as one entity and viewed through a single user interface (pc.mag.com).
• Robustness
• Consistency
• Maintenance of the database
• Data quality

Limitations of current databases:
• They do not contain strict/detailed descriptions of the protocol (buffers, sample volume and staining techniques are all important information for gel comparisons).
• They are designed as 2D (not proteomics) databases and are therefore not readily expandable to incorporate other proteomics data, e.g. MS, MDLC.
• They are designed for reference gels, not ongoing projects.

Guidelines for building a federated 2-DE database
http://ca.expasy.org/ch2d/fed-rules.html
• Individual entries in the database must be accessible by a keyword search. Other methods are possible but not required.
• The database must be linked to other databases by active hypertext cross-references, linking together all related databases. Database entries must be linked at least to the main index.
• A main index has to be supplied that provides a means of querying all databases through one unique query point.
• Individual protein entries must be available through clickable images.
• 2DE analysis software designed for use with federated databases must be able to access individual entries in any federated 2DE database.
(For a complete reference, see Appel et al., Electrophoresis 17, 540-546, 1996.)

SWISS 2D PAGE
http://au.expasy.org/ch2d/
[Screenshots: Swiss 2D PAGE viewer, choosing which gel to look at; estimated position of Vimentin_human (P08670) in a human liver sample]

Peptide Mass Fingerprinting
A protein identification technique that correlates experimental data with theoretical data.
Experimental route: protein, proteolytic digestion, “experimental” MS
Theoretical route: protein sequence from database, in silico digestion, theoretical MS
The two are compared by computer search.

Peptide Mass Fingerprinting
• Protein digestion with protease (trypsin)
• Determination of the mass by MS
  - Calibration
• Database searching
  - Generation of the peptide map
• Comparison with theoretical peptide maps of known proteins
  - In silico digestion
• Identification of the protein on a probabilistic basis
  - percent coverage, similarity, etc.

Protein digestion with protease (trypsin)
The molecule is cleaved at all possible sites, producing a set of peptides of varying masses that is characteristic of that protein. The mass of each peptide is the sum of the masses of the amino acid residues present, including any modifications those residues have undergone.
Trypsin cleaves on the C-terminal side of lysine and arginine, unless either is followed by proline.
(From a tutorial written by Dr J. R. Jefferies, Parasitology Group, Institute of Biological Sciences, University of Wales, UK)

Determination of mass
• MALDI-MS is used to measure the masses of the proteolytic peptide fragments.
• Every peak corresponds to the m/z of a peptide ion.
• Select monoisotopic peaks, [M+H]+, i.e. singly charged.
Peak list: 1051.54, 1086.52, 1094.56, 1111.59, 1244.64, 1421.7, 1476.67, 1542.84, 1613.88, 1664.97, 1763.79, 1777.82

Isotopes
Isotopes are different forms of an element, each having a different atomic mass. Their nuclei have the same number of protons (same atomic number) but different numbers of neutrons.
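A small sketch of what this means for mass values: the average atomic mass of an element is the abundance-weighted mean of its isotope masses, while the monoisotopic value is the mass of its most abundant isotope. The carbon masses and natural abundances used here are commonly tabulated values; the function names are illustrative only.

```python
# (isotope mass in Da, natural abundance in percent) for carbon
CARBON = [(12.0, 98.93), (13.0033548378, 1.07)]


def average_mass(isotopes):
    """Abundance-weighted mean of the isotope masses."""
    total_abundance = sum(abundance for _, abundance in isotopes)
    return sum(mass * abundance for mass, abundance in isotopes) / total_abundance


def monoisotopic_mass(isotopes):
    """Mass of the most abundant isotope."""
    return max(isotopes, key=lambda isotope: isotope[1])[0]


print(round(average_mass(CARBON), 4))
print(monoisotopic_mass(CARBON))
```

For a peptide with dozens of carbon atoms, the small difference per atom adds up, which is why the monoisotopic and average masses of a peptide can differ by a sizeable fraction of a dalton or more.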
Naturally occurring isotopes

Isotope (A)  Mass           Abundance, %  Isotope (A+1)  Mass           Abundance, %  Isotope (A+2)  Mass
12C          12             98.93         13C            13.0033548378  1.07          14C            14.003241988
1H           1.0078250321   99.9885       2H             2.0141017780   0.0115        3H             3.0160492675
14N          14.0030740052  99.632        15N            15.0001088984  0.368
16O          15.9949146221  99.757        17O            16.99913150    0.038         18O            17.9991604

Modified from: http://www.ionsource.com/Card/Mass/mass.htm

Monoisotopic / average mass
The monoisotopic mass is the mass of the isotopic peak whose elemental composition consists of the most abundant isotope of each element.
[Figure: a simulated isotopic distribution of the [M+H]+ ion of a compound (polyalanine); average mass 1156.3]
Monoisotopic mass is expressed in atomic mass units (amu) or in daltons (Da).

Accuracy
The higher the accuracy, the better and more specific the protein hit.
• Accurate measurements of peptide masses
• Accurate databases: identification relies on the ability to search data already present in various databases

Effect of mass accuracy and mass tolerance

Search m/z   Mass tolerance (Da)   # hits
1529         1                     478
1529.7       0.1                   164
1529.73      0.01                  25
1529.734     0.001                 4
1529.7348    0.0001                2

Tryptic digestion of human hemoglobin alpha chain yields 14 tryptic peptides, of which the peptide VGAHAGEYHAEALER has an exact monoisotopic mass of 1528.7348 Da. The singly charged ion of this peptide has an m/z value of 1529.7348. The table shows the result of searching the SWISS-PROT database against all human and mouse proteins.
(Liebler, Introduction to Proteomics)

Database search
Peptide mass fingerprinting provides evidence for the most probable identity of a protein. Ideally, the genome of the organism you are working on has been sequenced and verified. If not, the next best situation is that good cDNA data are available. If neither is the case, it is worth checking whether any expressed sequence tags (ESTs) can be used.
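The search itself compares measured [M+H]+ values with theoretical peptide masses obtained by in silico digestion. The sketch below works under simplifying assumptions: no missed cleavages, no modifications, a made-up 13-residue sequence and made-up peaks. It applies the trypsin rule, computes monoisotopic [M+H]+ values from standard residue masses, and keeps the peaks that fall within a mass tolerance. It is illustrative only and not how any particular search engine is implemented.

```python
# Monoisotopic residue masses in Da (standard values, rounded to 5 decimals).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # one H2O per peptide (free N- and C-termini)
PROTON = 1.00728   # charge carrier of the singly charged [M+H]+ ion


def tryptic_digest(sequence):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal peptide
    return peptides


def mh_plus(peptide):
    """Monoisotopic m/z of the singly protonated, unmodified peptide."""
    return sum(RESIDUE_MASS[residue] for residue in peptide) + WATER + PROTON


def match_peaks(peaks, peptides, tolerance_da):
    """Return (peak, peptide) pairs whose masses agree within the tolerance."""
    matches = []
    for peak in peaks:
        for peptide in peptides:
            if abs(peak - mh_plus(peptide)) <= tolerance_da:
                matches.append((peak, peptide))
    return matches


protein = "GASPKVTLRPNDK"           # hypothetical sequence
peaks = [459.26, 942.60, 1200.00]   # hypothetical measured [M+H]+ values

peptides = tryptic_digest(protein)
loose = match_peaks(peaks, peptides, tolerance_da=0.1)
tight = match_peaks(peaks, peptides, tolerance_da=0.01)
print(peptides)
print(len(loose), "matches at 0.1 Da;", len(tight), "at 0.01 Da")
```

Tightening the tolerance removes matches, the same effect the mass tolerance table shows at database scale. A tighter window only helps if the measurement is at least as accurate as the window.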
The quality of the protein identification will depend upon:
• Quality of the mass spectrometry data
• The accuracy of the database
• The power of the search algorithms and software used

Tools for fingerprinting
• Mascot (Matrix Science)
• Aldente (ExPASy)
• Profound
• MS-Fit (Prospector; UCSF)
Several of the available peptide mass fingerprinting programs use more sophisticated scoring algorithms. These correct for scoring bias due to protein size, in which larger proteins give rise to a greater number of peptides, and for the tendency of smaller peptides in databases to have a greater number of matches with search m/z values. Some of these algorithms also apply probability-based statistics to better define the significance of protein identifications.

Mascot PMF
• Mascot PMF score: matches with a greater than 5% probability of being a random event are of no significance.
• The significance of the result depends on the size of the database being searched.

Mascot PMF results
[Screenshot labels: probability of being random, entry name, coverage, similarity]
Coverage: % of the protein length covered by the experimental peptides.
[Screenshot: Mascot protein view]

ALDENTE
Aldente is a tool to identify proteins from peptide mass fingerprinting data (http://www.expasy.org/tools/aldente/).
[Screenshots: Aldente protein window, Aldente peptide view]

Aldente, results
S = S1 * S2
S = final score for this entry
S1 = sum of each peptide score
S2 = protein-level score
The scoring is tunable: the weight of each parameter in the score can be defined independently.
[Screenshot labels: NEWT, Swiss-Prot entry; Aldente results]

Profound
http://prowl.rockefeller.edu/prowl-cgi/profound.exe
[Screenshots: Profound results, graphic display of results]

MS-FIT
University of California, San Francisco (UCSF) Mass Spectrometry Facility
http://prospector.ucsf.edu/prospector/4.0.8/html/msfit.htm
[Screenshots: MS-Fit, MS-Fit detailed report]
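Coverage, described above as the percentage of the protein length covered by the experimental peptides, can be sketched in a few lines. This is a simplified illustration, not the implementation used by any of the tools listed: the sequence and peptides are hypothetical, and each matched peptide is located by exact substring search.

```python
def sequence_coverage(protein, matched_peptides):
    """Percentage of protein residues covered by at least one matched peptide."""
    if not protein:
        raise ValueError("protein sequence must not be empty")
    covered = [False] * len(protein)
    for peptide in matched_peptides:
        if not peptide:
            continue  # ignore empty entries
        start = protein.find(peptide)
        while start != -1:  # mark every occurrence of the peptide
            for position in range(start, start + len(peptide)):
                covered[position] = True
            start = protein.find(peptide, start + 1)
    return 100.0 * sum(covered) / len(protein)


protein = "GASPKVTLRPNDK"   # hypothetical 13-residue sequence
matched = ["GASPK"]         # hypothetical peptides matched by the search

print(round(sequence_coverage(protein, matched), 1))
```

Residues are marked rather than peptide lengths summed, so overlapping peptides (for example from missed cleavages) are not counted twice.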