How do we analyze proteomic data?
Session III
NTU Biotechnology Education Reform Summer Course

Topics for session III
1. Image analysis (for 2-DE gels)
2. Mass data analysis
3. Protein structure analysis

1. Image analysis

Examples of 2-DE results
[Figure: 2-DE gels of a healthy control and patient D; digest to peptide fragments, then MS analysis]

Unexpected variation between gels
Interior variation between 2-DE experiments
– Same loading amount?
– Same gel condition?
– Same staining condition?
Exterior variation after the gel is developed
– Unwanted spots (dye or reagent deposits)
– Dirty spots (hair, dust)

What does image analysis software do?
– spot detection
– unwanted spot filtering
– background subtraction
– normalization
– image matching
– expression comparison
– pI/MW calibration
– data organization

Currently available 2-DE image analysis software
– Melanie 4 (Swiss Institute of Bioinformatics, SIB)
– Phoretix 2D (Nonlinear Dynamics)
– Progenesis (Nonlinear Dynamics)
– Z3 and Z4000 (Compugen)
– Delta2D (Sunergia group)
– A-GelFox 2D (Alpha Innotech)
– Flicker (NCI, through internet)

Spot detection
One of the first and most important steps in 2-DE analysis:
– locating the spots in the gel image
– defining their shape
– calculating measurement information (volume and area)

Filtering
Removal of unwanted spots. What counts as an unwanted spot:
– dust on the gel
– stain deposits
– bulky spots

Background subtraction
General background subtraction methods:
– No background
– Mode of non-spot
– Manual background
– Lowest boundary
– Average boundary

Normalization
General normalization methods:
– Total spot volume
– Single spot
– Total volume ratio

Image matching

Expression comparison
A two-fold increase or decrease in expression is considered significant.

pI/MW calibration
– Observed (experimental) pI/MW
– pI calibration
– MW calibration

Data organization
– Spot annotation

2. Mass data analysis

Useful proteomic resources
– ExPASy: http://tw.expasy.org/
– Matrix Science, Mascot: http://www.matrixscience.com/

Three major functions in Mascot
– Peptide Mass Fingerprint (PMF): the experimental data are a list of peptide mass values from an enzymatic digest of a protein. (MALDI-TOF)
– Sequence Query: one or more peptide mass values associated with information such as partial or ambiguous sequence strings, amino acid composition information, MS/MS fragment ion masses, etc. A super-set of a sequence tag query.
– MS/MS Ion Search: identification based on raw MS/MS data from one or more peptides. (LC/MS/MS)

Difference between MALDI-TOF and LC/MS/MS
[Figure: MALDI-TOF compared with LC/MS/MS]

2-1 PMF analysis

Raw data for PMF

m/z          Relative intensity
 899.2076    2980.8123
 905.2126    1471.3723
 909.1917    2312.2317
 915.2181    1533.8486
 925.4637    1881.7635
 938.3972    1528.9462
1044.3007    2111.9482
1050.3141    2396.1550
1060.2797    4689.0698
1066.2889    7302.0029
1072.2991    5688.8511
1078.3169    8919.1113
1084.2657    1474.5900
1088.2793    3180.5122
1094.2947    4573.5195
1104.2638    1546.4652
1110.2837    1470.9734
1271.3163    1498.0187

Mascot PMF query form

Mascot PMF parameters
– Your name; Email
– Search title
– Database
– Taxonomy
– Enzyme
– Monoisotopic or Average
– Modifications
– Protein Mass
– Peptide tol. ±
– Mass values
– Missed cleavages
– Data file
– Query

Database

Database     Comment
EST          EST divisions of GenBank (currently EST_human, EST_mouse, EST_others)
MSDB         Comprehensive, non-identical protein database
NCBInr       Comprehensive, non-identical protein database
OWL          Non-identical protein database (obsolete)
Random       Random sequences for verifying scoring statistics
SwissProt    High quality, curated protein database

Taxonomy
Restricting the search to a selected species:
– ensures the hit list contains only entries from the selected species
– speeds up the search
– can bring out a weak match

Enzyme

Name            Cleave    Don't cleave    N or C term
Trypsin         KR        P               CTERM
Arg-C           R         P               CTERM
Asp-N           BD        -               NTERM
Asp-N_ambic     DE        -               NTERM
Chymotrypsin    FYWL      P               CTERM
CNBr            M         -               CTERM
Formic_acid     D         -               CTERM
Lys-C           K         P               CTERM
Lys-C/P         K         -               CTERM
PepsinA         FL        -               CTERM
Tryp-CNBr       KRM       P               CTERM
TrypChymo       FYWLKR    P               CTERM
Trypsin/P       KR        -               CTERM
V8-DE           BDEZ      P               CTERM
V8-E            EZ        P               CTERM
CNBr+Trypsin    M         -               CTERM
                KR        P               CTERM
None            see notes
semiTrypsin     see notes

Enzyme (notes)
"None" is not an allowed choice for a Peptide Mass Fingerprint, where the specificity of an enzyme is essential. If a search fails to produce a positive match, try again with semiTrypsin before resorting to "None".
"semiTrypsin" means that Mascot will search for peptides that show tryptic specificity (KR, not P) at one terminus, but where the other terminus may be a non-tryptic cleavage. This is a half-way house between choosing "Trypsin" and "None".

Monoisotopic or Average
– Nominal mass values: calculated from integer atomic weights (H=1, C=12, N=14, O=16); not practical in proteomics.
– Average mass values: equivalent to taking the centroid of the complete isotopic envelope.
– Monoisotopic mass values: the mass of the first peak of the isotope distribution.
For peptides and proteins, the difference between an average and a monoisotopic weight is approximately 0.06%.
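The trypsin rule from the enzyme table (cleave C-terminal to K or R, but not before P) and the monoisotopic/average distinction can be sketched together as a small in-silico digest. This is only an illustration under stated assumptions, not how Mascot itself is implemented: the residue masses are standard textbook values that the slides do not list, the function names `tryptic_digest` and `peptide_mass` are made up for this example, and the sequence in the usage lines is hypothetical.

```python
# Residue masses in Da (standard values; an assumption, not taken from the slides).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
AVG = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167,
    "V": 99.1326, "T": 101.1051, "C": 103.1388, "L": 113.1594,
    "I": 113.1594, "N": 114.1038, "D": 115.0886, "Q": 128.1307,
    "K": 128.1741, "E": 129.1155, "M": 131.1926, "H": 137.1411,
    "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MONO, WATER_AVG = 18.01056, 18.01528


def tryptic_digest(seq, missed_cleavages=0):
    """Cleave after K or R unless the next residue is P (the Trypsin row)."""
    pieces, start = [], 0
    for i, aa in enumerate(seq):
        nxt = seq[i + 1] if i + 1 < len(seq) else ""
        if aa in "KR" and nxt != "P":
            pieces.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        pieces.append(seq[start:])
    # Join neighbouring pieces to model up to `missed_cleavages` skipped sites.
    peptides = []
    for i in range(len(pieces)):
        last = min(i + missed_cleavages, len(pieces) - 1)
        for j in range(i, last + 1):
            peptides.append("".join(pieces[i:j + 1]))
    return peptides


def peptide_mass(peptide, average=False):
    """Neutral peptide mass: sum of residue masses plus one water."""
    table, water = (AVG, WATER_AVG) if average else (MONO, WATER_MONO)
    return sum(table[aa] for aa in peptide) + water


if __name__ == "__main__":
    # Hypothetical sequence, used only to exercise the functions.
    for pep in tryptic_digest("AKRPGKM", missed_cleavages=1):
        mono = peptide_mass(pep)
        avg = peptide_mass(pep, average=True)
        print(f"{pep:8s} mono={mono:.4f} avg={avg:.4f} diff={100 * (avg - mono) / mono:.3f}%")
```

The printed relative difference between average and monoisotopic mass should come out in the neighbourhood of the 0.06% quoted above, which is why the two mass types must not be mixed within one search.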
[Figure: insulin (5.8 kD) and albumin (66.4 kD); panels mark the monoisotopic MW and the average MW with tolerance windows of 1 Da and 2 Da; one label reads "Monoisotopic MW -1.01"]

Modifications
Most protein samples exhibit some degree of modification.
– Natural post-translational modifications, such as phosphorylation and glycosylation.
– Deliberate modifications, introduced during sample work-up, such as cysteine derivatisation.

Fixed modifications are applied universally, to every instance of the specified residue(s) or terminus.
Example: Carboxymethyl (Cys) means that all calculations will use 161 Da as the mass of cysteine.

Variable modifications are those which may or may not be present.
Example: if Oxidation (Met) is selected and a peptide contains 3 methionines, Mascot will test for a match with the experimental data for that peptide containing 0, 1, 2, or 3 oxidised methionine residues.

Peptide tol. ±
The error window on experimental peptide mass values.

Unit    Meaning
%       fraction expressed as a percentage
mmu     absolute milli-mass units, i.e. units of 0.001 Da
ppm     fraction expressed as parts per million
Da      absolute units of Da

Missed cleavages
– Missed cleavages = 0: complete digestion
– Missed cleavages >= 1: incomplete digestion

Submit and processing

Concise protein summary

Protein summary

PMF protein view (I)
– Protein name
– Score and Expect
– MW and pI
– Coverage

PMF protein view (II)
– Matched peptides
– RMS error
– Unmatched peptides
– Protein information

2-2 MS/MS analysis

Raw data for MS/MS
– Parent ion
– Daughter ions

Mascot MS/MS query form

Protein summary
Most likely candidate

MS/MS protein view (I)
Protein score: the sum of the highest scores within each peptide group.

MS/MS protein view (II)

Peptide view

3. Protein structure analysis

Research Collaboratory for Structural Bioinformatics (RCSB) Protein Data Bank (PDB)

Proteasome

Example: 3D structure of the proteasome

Stereo view

Rotation

Secondary structure
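Besides the graphical secondary-structure view, a PDB-format file also carries the same information as text in its HELIX and SHEET records. The sketch below reads those records using the fixed column positions of the PDB file format. It is an illustration only: the function name `secondary_structure_segments` is made up here, and the two sample records are generic examples of the record layout, not taken from the proteasome entry shown in the slides.

```python
def secondary_structure_segments(pdb_text):
    """Return (kind, chain, first_residue, last_residue) per HELIX/SHEET record.

    Column positions follow the fixed-width PDB file format:
    HELIX: chain in column 20, start in 22-25, end in 34-37.
    SHEET: chain in column 22, start in 23-26, end in 34-37.
    """
    segments = []
    for line in pdb_text.splitlines():
        if line.startswith("HELIX "):
            segments.append(("helix", line[19], int(line[21:25]), int(line[33:37])))
        elif line.startswith("SHEET "):
            segments.append(("strand", line[21], int(line[22:26]), int(line[33:37])))
    return segments


# Illustrative records only (assumed layout examples, not real proteasome data).
SAMPLE = "\n".join([
    "HELIX    1  HA GLY A   86  GLY A   94  1",
    "SHEET    1   A 5 THR A 107  ARG A 117  0",
])

if __name__ == "__main__":
    for kind, chain, first, last in secondary_structure_segments(SAMPLE):
        print(f"{kind:6s} chain {chain}: residues {first}-{last} ({last - first + 1} residues)")
```

To try it on a real entry, one would pass the text of a PDB-format file downloaded from the RCSB site instead of `SAMPLE`.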