Bioinformatics, Genomics, and Proteomics (Part II)

Proteomics
• The comprehensive study of all the proteins of a cell, tissue, body fluid, or organism from a variety of perspectives, including structure, function, expression, profiling, and protein-protein interactions.
• Insight into the proteins that are present in a cell or tissue under particular biological conditions can aid in our understanding of the cell’s activities.
• Genomic sequence alone has limitations.

Limitations in Genomics
• Some annotated open reading frames (ORFs) are subsequently found not to encode proteins.
• Others encode proteins whose functions cannot be predicted from the sequence.
• Post-translational modifications that influence protein function and cellular localization often cannot be predicted from the sequence.
• mRNA levels do not always correlate with protein levels, and interactions between proteins cannot be assessed by genomics.

Proteomics
• On the other hand, a protein’s function can sometimes be inferred by determining the conditions under which it is expressed and active.
• From a practical standpoint, proteomics can be used to track clinical disorders and detect targets for therapeutic treatments.

Proteomics - Complications
• In eukaryotes, there are many more proteins than genes because of alternative splicing, post-translational modifications, and post-transcriptional modifications to RNA (RNA editing).
• It is impossible to account experimentally for every member of a proteome with a single technique because proteins are susceptible to degradation, differ in properties such as solubility, and range considerably in abundance.

2D PAGE
• First dimension: isoelectric focusing separates the proteins in a mixture on the basis of their net charge.
• The protein mixture is applied to a pH gradient gel. When an electric current is applied, the proteins migrate toward either the anode (+) or the cathode (-), depending on their net charge.
• As proteins move through the pH gradient, they gain or lose protons until they reach a point in the gel where their net charge is zero.
• The pH at this position of the gel equals the protein's isoelectric point (pI).

2D PAGE
• Second dimension: separation by molecular weight.
• Several proteins in a sample may have the same isoelectric point and therefore migrate to the same position in the gel.
• Proteins are further separated on the basis of differences in their molecular weights (MW) by electrophoresis, at a right angle to the first dimension, through a sodium dodecyl sulfate (SDS) polyacrylamide gel. The gel is visualized with Coomassie blue or silver protein stain.
• A 2D polyacrylamide gel can resolve up to 2,000 different proteins.

2D PAGE
• The pattern of stained spots is captured by densitometric scanning.
• Databases have been established with images of 2D PAGE gels from different cell types.
• Software packages are available for detecting spots, matching patterns between gels, and quantifying the protein content of the spots.
• The next task after separation is to excise the individual proteins from the gel and to identify as many of them as possible using mass spectrometry (MS).

2D PAGE - Limitations
• Proteins with either low or high molecular weights, those that are found in cellular membranes, and those that are present in small amounts are not readily resolved by 2D PAGE.
• Highly charged proteins, such as ribosomal proteins and histones, are not separated under standard conditions.

MALDI-MS
• A spot is excised from the gel and treated with trypsin.
• Purified tryptic peptides are separated by mass using MALDI time-of-flight (TOF) MS.
• The set of peptide masses from the unknown protein is used to search a database, and the best match is determined.

ESI-MS-MS
• A spot is excised from the gel and treated with trypsin.
• Purified tryptic peptides are separated according to their mass/charge (m/z) ratios, and the amino acid sequence of a selected peptide is determined in a second MS step.
• The unknown protein is identified by searching a protein database with the amino acid sequences from two or more peptides.

Protein Expression Profiling
• The 2D differential in-gel electrophoresis (DIGE) method is used for quantitative analysis of protein expression.
• The proteins of two proteomes are labeled with the fluorescent dyes Cy3 and Cy5, respectively.
• The samples are combined and run on 2D PAGE.
• The gel is scanned for each fluorescent dye, and the relative levels of the two dyes in each protein spot are recorded.
• The gel is stained with a protein dye, and the unknown spot is excised and treated with trypsin.
• The peptides are separated by ESI-MS-MS, and the amino acid sequences are determined.

ICAT-LC-MS-MS
• Proteins from two proteomes are labeled with light and heavy ICAT reagents, respectively.
• The samples are combined and treated with trypsin.
• The peptides are captured by affinity chromatography using avidin and fractionated by LC.
• The light:heavy ratio is determined by MS.
• Amino acid sequences are determined by ESI-MS-MS.

Protein Microarray
• Conceptually, protein microarrays are similar to DNA microarrays.
• They consist of large numbers of proteins individually immobilized in known positions on the coated surface of a glass slide or silicon chip.
• The arrayed proteins can be antibodies specific for each protein in an organism, purified recombinant proteins, or short synthetic peptides.
• There are many ways of attaching a protein to a support surface.
• The major objective of any coupling system is to maintain protein structure and function.

Protein Microarray
• Some systems bind proteins to a chemical group that coats the surface of the support.
• With other protocols, recombinant proteins are prepared with a short amino acid sequence (tag) at the N or C terminus that binds to a recognition sequence on the support. In this case, all the protein molecules are uniformly oriented.
• Instead of spotting proteins on a flat surface, some microarrays are engineered with tiny depressions (nanowells) that keep each protein moist and prevent mixing with adjacent proteins.

Protein Microarray
• The purpose of protein microarray analyses is to detect, on a large scale, the molecules that a protein interacts with.
• These interacting molecules can be other proteins, nucleic acid sequences, or low-molecular-weight compounds.
• Protein populations from different samples can be compared, for example, control versus treated samples or normal versus diseased tissues.

Protein Microarray - Visualizing
• Direct labeling: the test samples are labeled directly with a fluorescent dye, and the labeled molecules that bind to the proteins of a microarray are detected with a laser scanner. A two-dye strategy (e.g., Cy3 and Cy5) can be used to compare proteins in two different samples on a single array.
• Sandwich style: the sample molecules are biotinylated, and after the initial incubation, a streptavidin-fluorescent-dye conjugate is applied; it binds to biotin and so facilitates the detection of the sample molecules.

[Figure: Protein array detection method]

Analytical vs. Functional
• Analytical protein microarray. Different types of ligands with high affinity and specificity, including antibodies, antigens, DNA or RNA aptamers, carbohydrates, or small molecules, are spotted onto a surface.
• These chips can be used for monitoring protein expression levels, protein profiling, and clinical diagnostics.
• Similar to the procedure in DNA microarray experiments, protein samples from the two biological states to be compared are separately labeled with red or green fluorescent dyes, mixed, and incubated with the chips.
• Spots in red or green identify an excess of protein from one state over the other.

Analytical vs. Functional
• Functional protein microarray. Native proteins or peptides are individually purified or synthesized using high-throughput approaches and arrayed onto a suitable surface to form the functional protein microarray.
• These chips are used to analyze protein activities, binding properties, and post-translational modifications.
• With the proper detection method, functional protein microarrays can be used to identify the substrates of enzymes of interest.
• Consequently, this class of chips is particularly useful in drug and drug-target identification and in building biological networks.

[Figure: http://www.nature.com/nature/journal/v422/n6928/images/nature01512-f1.2.jpg]

Analytical microarray
• Analytical microarrays are used for protein profiling, that is, detection and quantification of the proteins present in a sample.
• They can be antibody microarrays or antigen microarrays.
• Antibody microarrays are often probed with proteins from biological sources, such as plasma or serum, or with proteins secreted from cells in culture, to determine disease-specific profiles.
• For example, antibody microarrays that specifically detect cytokines have been formulated.

Analytical microarray
• Cytokine antibody microarrays are used to examine cytokines in both normal and diseased states, and from a variety of sources after various treatments.
• A sandwich immunoassay is often used to detect cytokines that bind to immobilized antibodies.
• After the microarray is treated with the sample, biotinylated cytokine antibodies are added and bind to the corresponding captured cytokines.
• For visualization, a streptavidin-fluorescent-dye conjugate attaches to the biotin of the secondary antibody.
• The signals are detected with a laser scanner.
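
The two-dye comparison used with analytical microarrays (a red or green spot marks an excess of protein from one state) can be sketched in a few lines. This is only an illustration: the function name, the intensity values, and the two-fold cutoff are assumptions and do not come from the slides.

```python
import math

def classify_spot(red, green, fold=2.0):
    """Label a spot from its two background-corrected dye intensities.

    Assumption: a spot counts as an excess only if the intensity ratio
    reaches `fold` in either direction (two-fold is an arbitrary choice).
    """
    if red <= 0 or green <= 0:
        raise ValueError("intensities must be positive")
    log_ratio = math.log2(red / green)   # symmetric around 0
    cutoff = math.log2(fold)
    if log_ratio >= cutoff:
        return "red"       # excess in the red-labeled state
    if log_ratio <= -cutoff:
        return "green"     # excess in the green-labeled state
    return "similar"       # comparable levels in both states

# Hypothetical spot intensities (red, green)
for spot in [(400, 100), (100, 400), (120, 100)]:
    print(spot, classify_spot(*spot))
```

Working with the log ratio rather than the raw ratio treats a four-fold excess in either dye symmetrically (+2 versus -2).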
[Figure: Cytokine antibody microarray]

Analytical microarray
• Plasma samples from individuals with Alzheimer disease and from individuals with no dementia were applied to a microarray made up of antibodies against 120 cytokines.
• Eighteen cytokines were found to be associated with Alzheimer disease.
• The levels of 7 of these were higher and 11 were lower in individuals with Alzheimer disease than in the subjects without dementia.
• The Alzheimer disease-specific cytokine signature may provide the basis for a diagnostic test.

Analytical microarray - Antigen
• Another type of analytical microarray is the protein (antigen) microarray. Proteins are attached to a solid support and then probed with antibodies, mostly in serum samples.
• The purpose of these studies is to discover whether the production of antibodies against specific proteins correlates with particular diseases or biological processes.
• A microarray of 5,000 different human proteins was created and used to determine whether serum from ovarian cancer patients has a distinctive set of antibodies in comparison to the antibody population of healthy individuals.

Analytical microarray - Antigen
• The initial results revealed 94 proteins that were specifically recognized by antibodies in the sera from the ovarian cancer patients.
• With further testing, three proteins were consistently found to be specific for ovarian cancer.
• The ovarian-cancer-specific proteins may help in the early detection of the disease.
• The earlier ovarian cancer is diagnosed, the better the chance of survival.

Analytical microarray
• Analytical antibody microarrays are also used to detect whether post-translational modifications, such as phosphorylation of tyrosine or glycosylation, are associated with specific diseases.
• Proteins are first captured by primary antibodies immobilized on a microarray.
• Then, the microarray is flooded with biotinylated antiphosphotyrosine antibody.
• Next, streptavidin conjugated with a fluorescent dye is added, and the fluorescent protein spots are detected.
• Detection of glycan groups is performed in a similar manner.

Analytical microarray - Reverse phase
• A multiprotein sample, for example, from a cell lysate or tissue specimen, is immobilized in a single spot on a support.
• Several such multiprotein samples are spotted on the microarray.
• Then, the microarray is probed with a single target molecule.
• The advantage is that a large number of samples can be compared at one time.
• With a reverse-phase microarray, the presence of specific proteins in multiple complex samples can be readily determined.

[Figure: Reverse-phase microarray format]

Functional protein microarray
• Functional protein microarrays feature large sets of individual proteins that are used predominantly to determine interactions with other proteins or with low-molecular-weight compounds, such as lipids, drugs, and metabolites.
• Ideally, the functional protein array should consist of all possible proteins of the proteome under study.
• To obtain comprehensive representation of a proteome, a library containing all of the protein-coding sequences is first constructed.
• A library of cloned protein-encoding ORFs has been dubbed an ORFeome.

Functional protein microarray
• The starting point for producing an ORFeome is usually PCR amplification of the coding sequences for cloning into a vector.
• For prokaryotic organisms, the protein-coding sequences can often be readily identified from genomic sequences.
• On the other hand, full-length cDNA libraries are the primary sources of the coding sequences of a eukaryotic proteome.

[Figure: Integration and excision of bacteriophage λ into and from the E. coli genome via recombination between attachment (att) sites in the bacterial and bacteriophage DNA.]

[Figure: Primer pair used to amplify ORFs for recombinational cloning to generate an ORFeome.]
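
As a rough illustration of how candidate protein-coding sequences might be picked out of a prokaryotic genomic sequence before PCR amplification, the sketch below scans for ATG-to-stop open reading frames. It is deliberately simplified: it reads only the forward strand, accepts only ATG as a start codon, and uses an arbitrary minimum length. The function name and the test sequence are assumptions for illustration. A hit is also only a candidate, since some annotated ORFs turn out not to encode proteins.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=3):
    """Return (start, end, frame) tuples for ATG...stop ORFs.

    Simplifications (assumptions, not a real gene finder):
    forward strand only, ATG as the sole start codon, and
    `min_codons` counts codons before the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Step through the sequence one codon at a time in this frame
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # first ATG opens an ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None                   # look for the next ORF
    return orfs

# Hypothetical 19-nt sequence containing one short ORF
demo = "CCATGAAATTTGGGTAACC"
for start, end, frame in find_orfs(demo):
    print(frame, demo[start:end])   # prints: 2 ATGAAATTTGGGTAA
```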
Recombinational cloning
• A primer pair is used to amplify each ORF, resulting in a PCR-amplified ORF with attachment sites (attB1 and attB2).
• Recombination between the PCR-amplified ORF and a donor vector with attP1 and attP2 sites on either side of the ccdB gene results in an entry clone in which the ORF is flanked by attL1 and attL2 sites.
• The selectable marker (SM1) selects transformed cells with an entry clone.
• The protein encoded by ccdB is toxic to transformed cells that carry a non-recombined donor vector molecule.

Recombinational cloning
• The next step is the expression of each cloned ORF.
• Recombination between the entry clone with attL1 and attL2 sites and a destination vector with attR1 and attR2 sites results in an expression clone with attB1 and attB2 sites flanking the ORF.
• The selectable marker (SM2) selects transformed cells with an expression clone.
• Cells with an intact destination vector that did not undergo recombination are killed by the CcdB protein.
• For construction of a microarray, each protein encoded by an ORF is isolated by affinity purification.

Protein-protein interaction mapping
• Proteins seldom act alone. On average, one protein interacts with five others.
• The two-hybrid method is used to determine pairwise protein-protein interactions.
• The underlying principle of this assay is that the physical connection between two proteins reconstitutes an active transcription factor that initiates the expression of a reporter gene.
• Generally, transcription factors have two domains: a DNA binding domain and an activation domain.
• These two domains need not be part of the same protein to function.

Yeast Two-Hybrid System
• The availability of complete genome sequences makes it possible to use the yeast two-hybrid system to screen for all possible interactions between the proteins in an organism rather than to test one bait at a time.
• The ORFs from an organism’s genome are cloned into two plasmid vectors, one that expresses the bait (target) and another that produces the prey (the interacting proteins to be identified).
• Each is introduced into yeast cells by transformation.
• A high-throughput mating method is then used to combine each bait plasmid with each prey plasmid in yeast cells, and the hybrids are screened for expression of the reporter gene.

http://www.sumanasinc.com/webcontent/animations/content/yeasttwohybrid.html

[Figure: Complementation assay for detecting pairwise protein interactions in mammalian cells.]

[Figure: Large-scale screens for protein interactions using the yeast two-hybrid system. Two libraries are prepared, one containing genomic DNA fragments or cDNAs fused to the coding sequence for the DNA binding domain (bait library) and the other fused to the coding sequence for the activation domain (prey library).]

[Figure: Protein interaction map of calcium signaling protein clusters of D. melanogaster.]

Protein Arrays
• The yeast two-hybrid system is powerful, but it has some shortcomings.
• The assay is based on transcription, so the bait and the prey proteins must enter the nucleus and interact in a cellular location that may be very different from their normal environment.
• Microarrays can also be used to screen for protein-protein interactions.

Protein Arrays
• All the ORFs from the yeast genome are used to express each yeast protein tagged with a glutathione S-transferase (GST) epitope.
• GST-tagged proteins are purified and spotted onto glass slides to generate protein microarrays.
• The protein under investigation is labeled and added to the array under gentle conditions that allow the proteins to interact.
• The spots on the microarray are then analyzed for the intensity of the signal from the labeled interacting test protein.
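
Whether positives come from two-hybrid reporter expression or from spot signals on a protein array, the pairwise hits can be assembled into an interaction map. The sketch below is one minimal way to do that and to compute the average number of partners per protein. The protein names and pairs are invented for illustration, and the functions are assumptions, not part of any screening software.

```python
from collections import defaultdict

def build_interaction_map(positive_pairs):
    """Turn (bait, prey) positives into an undirected map.

    Each protein maps to the set of partners it was found with,
    regardless of which one served as bait.
    """
    network = defaultdict(set)
    for bait, prey in positive_pairs:
        network[bait].add(prey)
        network[prey].add(bait)
    return dict(network)

def mean_partners(network):
    """Average number of interaction partners per protein."""
    if not network:
        return 0.0
    return sum(len(p) for p in network.values()) / len(network)

# Hypothetical positives from a screen
pairs = [("A", "B"), ("A", "C"), ("B", "C"), ("D", "A")]
network = build_interaction_map(pairs)
print(sorted(network["A"]))      # ['B', 'C', 'D']
print(mean_partners(network))    # 2.0
```

Storing partners as sets means a pair detected in both bait/prey orientations is counted once.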
TAP tag procedure for protein interactions
• Two DNA sequences (tag 1 and tag 2), each encoding a short amino acid sequence with high affinity for a specific molecule, are cloned together and fused in frame to the 3’ end of a cDNA (cDNA X).
• The tagged cDNA construct is introduced into a host cell, where it is transcribed and translated.
• Other cellular proteins bind to the protein encoded by cDNA X. The complex of interacting proteins is isolated by the binding of tag 1 to its affinity partner.
• The cluster is eluted from the affinity partner by cleaving off tag 1.

TAP tag procedure for protein interactions
• A second purification step is carried out with tag 2 and its affinity partner.
• The proteins of the cluster are separated by one-dimensional PAGE.
• Single bands are excised from the gel and treated with trypsin.
• Peptide amino acid sequences are obtained with ESI-MS-MS and searched against a protein database.
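
The final identification step (trypsin treatment, then a database search with the peptide sequences) can be mimicked with a toy example. The sketch assumes the usual trypsin rule of cleavage after lysine (K) or arginine (R) unless the next residue is proline (P), and it requires two or more matching peptides, as in the ESI-MS-MS slide. The database entries, names, and exact-match scoring are hypothetical. Real search engines score spectra rather than compare strings.

```python
def tryptic_digest(protein):
    """Split a protein sequence after K or R, except before P."""
    peptides, start = [], 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])   # C-terminal remainder
    return peptides

def identify_proteins(observed_peptides, database, min_peptides=2):
    """Return database entries matched by at least `min_peptides`
    observed peptide sequences, best match first."""
    observed = set(observed_peptides)
    hits = {}
    for name, sequence in database.items():
        matched = observed & set(tryptic_digest(sequence))
        if len(matched) >= min_peptides:
            hits[name] = len(matched)
    return sorted(hits, key=hits.get, reverse=True)

# Hypothetical two-entry database and observed peptide sequences
database = {"protX": "MAKRPLLKGG", "protY": "MSTKAAAR"}
print(tryptic_digest("MAKRPLLKGG"))                    # ['MAK', 'RPLLK', 'GG']
print(identify_proteins(["MAK", "RPLLK"], database))   # ['protX']
```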