
Genomics and Proteomics Lecture Notes

Eukaryotic genomes:
All of the eukaryotic nuclear genomes that have been studied are, like the human
version, divided into two or more linear DNA molecules, each contained in a different
chromosome; all eukaryotes also possess smaller, usually circular, mitochondrial genomes.
The only general eukaryotic feature not illustrated by the human genome is the
presence in plants and other photosynthetic organisms of a third genome, located in
the chloroplasts.
Eukaryotic nuclear genomes vary greatly in size, the smallest being less than 10 Mb in length and the largest over
100 000 Mb.
The nuclear genome of the yeast S. cerevisiae, at 12 Mb, is about 0.004 times the size
of the human nuclear genome.
In yeast the genes themselves are more compact, having fewer introns, and the spaces
between the genes are relatively short, with much less space taken up by genome-wide repeats and other non-coding sequences.
More complex organisms tend to have less compact genomes.
Organelle genome:
Human mitochondrial genome:
o Small in size (about 16.6 kb).
o Limited in function.
o 13 protein encoding genes.
o Genes encode proteins related to electron transport activity.
Chloroplast genome:
o Partial set of genes involved in photosynthesis. (Involved in light and dark reactions).
Genomics of Microbes and Microbiomes:
o Microbial genomics is largely the identification and characterization of the genetic
composition of microbes. The ability to process and analyse the genomic data collected from
microbial organisms is a cornerstone of modern bioinformatics.
o The microbiome is the collection of all microbes, such as bacteria, fungi, viruses, and
their genes, that naturally live on our bodies and inside us. Although microbes are so
small that they require a microscope to see them, they contribute in big ways to
human health and wellness.
o A person’s core microbiome is formed in the first years of life but can change over
time in response to different factors including diet, medications, and environmental exposures.
o Differences in the microbiome may lead to different health effects from
environmental exposures and may also help determine individual susceptibility to
certain illnesses.
o Some microbes alter environmental substances in ways that make them more toxic,
while others act as a buffer and make environmental substances less harmful.
Genome sequencing technologies: Next Gen Sequencing:
Introduction to early sequencing techniques:
In 1975, Sanger introduced a rapid method of DNA sequencing that determines
sequences by primed synthesis with DNA polymerase.
Later, in 1977, Allan Maxam and Walter Gilbert proposed sequencing DNA by chemical
degradation, in which terminally labelled DNA fragments are chemically cleaved at
specific bases and separated by gel electrophoresis.
Later, ABI introduced commercial automated DNA sequencers, among them the
ABI Prism 3700 with 96 capillaries.
The first human genome was sequenced in 2003, taking an effort of 13 years, with an
estimated cost of 2.5 billion USD.
Birth of Next gen sequencing:
The first NGS device, named GS20, was introduced in 2005. It combined single-molecule
emulsion PCR with pyrosequencing (a shotgun, sequencing-by-synthesis procedure) and
was used to sequence the entire genome of Mycoplasma genitalium.
In pyrosequencing, DNA synthesis is performed within a complex reaction that
includes ATP sulfurylase and luciferase enzymes and adenosine 5′ phosphosulfate and
luciferin substrates, so that the pyrophosphate group released upon addition of a
nucleotide results in the production of detectable light. Later, GS FLX Titanium was
developed in 2006.
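The light-per-nucleotide principle of pyrosequencing can be sketched as decoding a flowgram. This is a minimal illustration, assuming nucleotides are flowed in a fixed cyclic order and that the light intensity of each flow is roughly proportional to the number of bases incorporated. The flow order "TACG" and the intensity values are hypothetical, not taken from any instrument.

```python
def decode_flowgram(intensities, flow_order="TACG"):
    """Turn per-flow light intensities into a base sequence.

    Each flow delivers one nucleotide; the rounded intensity is taken as
    the number of identical bases incorporated in that flow (0 = no signal).
    """
    bases = []
    for i, signal in enumerate(intensities):
        nucleotide = flow_order[i % len(flow_order)]  # nucleotide of this flow
        count = int(round(signal))                    # homopolymer length
        bases.append(nucleotide * count)
    return "".join(bases)


# Hypothetical intensities for two cycles of T, A, C, G flows
print(decode_flowgram([1.02, 0.05, 1.9, 0.0, 0.1, 1.1, 0.0, 2.95]))  # TCCAGGG
```

Rounding the signal is the weak point of this sketch: long homopolymers give intensities that are hard to tell apart, which is a known source of error in this chemistry.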
2nd Gen sequencing of HT NGS:
The second-generation HT-NGS platforms can generate about five hundred million
bases of raw sequence (Roche) to billions of bases in a single run (Illumina, SOLiD).
The principle is based on emulsion PCR amplification of DNA fragments, which makes the
light signal strong enough for reliable base detection by the CCD cameras. PCR
amplification has revolutionized DNA analysis, but it also has drawbacks.
3rd Gen Sequencing of HT NGS:
In some instances, the PCR amplification used in 2nd Gen sequencing may introduce base
sequence errors or favour certain sequences, thus changing the relative frequency and
abundance of the various DNA fragments that existed before amplification.
To overcome this, the ultimate miniaturization into the nanoscale and the minimal use
of biochemicals, would be achievable if the sequence could be determined directly from a
single DNA molecule, without the need for PCR amplification and its potential for distortion
of abundance levels.
This sequencing from a single DNA molecule is now called the “third generation of
HT-NGS technology”.
Genome Assembly:
o Genome assembly refers to the process of putting nucleotide sequence into the correct
order. Genome assembly is made easier by the existence of public databases.
o If a genome assembly is finished to the level of whole linear chromosomes, the ends
will contain the tandem (consecutive) repeat sequences found within telomeres, ranging
from 5-mer to 27-mer units repeated several thousand times, which both protect the end of
the chromosome from deterioration, chromosomal fusion, or recombination, and act as a
mechanism for senescence and triggering apoptosis.
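Putting reads into the correct order can be illustrated with a greedy overlap merge. This is only a sketch under simplifying assumptions: error-free reads, a single strand, and no repeats. The function names, the `min_len` threshold and the reads are hypothetical.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the longest overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best = (0, None, None)
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best[0]:
                        best = (k, i, j)
        k, i, j = best
        if k == 0:
            break  # no overlaps left; remaining reads stay as separate contigs
        merged = reads[i] + reads[j][k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (i, j)] + [merged]
    return reads


print(greedy_assemble(["ATGGCGT", "GCGTACG", "TACGTTA"]))  # ['ATGGCGTACGTTA']
```

A greedy merge like this collapses or misjoins tandem repeats such as the telomeric repeats mentioned above, which is one reason repeat-rich regions are hard to finish.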
Comparative Genomics and its applications:
o Comparative genomics is the direct comparison of complete genetic material of one
organism against that of another to gain a better understanding of how species
evolved and to determine the function of genes and noncoding regions in genomes.
o It includes a comparison of gene number, gene content, and gene location, the length
and number of coding regions (called exons) within genes, the amount of non-coding
DNA in each genome, and conserved regions maintained in both prokaryotic and
eukaryotic groups of organisms.
o Comparative genomics can reveal not only the evolutionary relationships between
organisms but also the differences and similarities within and between species.
Genome Correspondence:
The method of determining the correct correspondence of chromosomal segments and
functional elements across the species compared is the first step in comparative genomics.
This involves determining orthologous segments of DNA that descend from the same
region in the common ancestor of the species compared, and paralogous regions that arose by
duplication events prior to the divergence of the species compared.
1. Once genome correspondence is established, comparative genomics can aid gene identification.
2. Comparative genomics provides a powerful way to distinguish regulatory motifs
from non-functional patterns based on their conservation.
3. Comparative genomics has wide applications in the field of molecular medicine
and molecular evolution.
4. Comparative genomics is used for the identification of drug targets in many infectious diseases.
5. Comparative analysis of genomes of individuals with genetic disease against
healthy individuals may reveal clues for eliminating that disease.
6. Comparative genomics helps in selecting model organisms.
7. Comparative genomics also helps in the clustering of regulatory sites, which can
help in the recognition of unknown regulatory regions in other genomes.
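The idea of telling candidate regulatory motifs from non-functional patterns by their conservation can be sketched as a per-column conservation score over orthologous sequences that are already aligned. The sequences, function names and the `min_len` threshold are hypothetical, and fully identical columns are used as a deliberately strict stand-in for "conserved".

```python
def column_conservation(aligned):
    """Fraction of sequences sharing the most common character in each column."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to equal length")
    scores = []
    for col in zip(*aligned):
        if "-" in col:
            scores.append(0.0)  # treat gapped columns as not conserved
            continue
        most_common = max(set(col), key=col.count)
        scores.append(col.count(most_common) / len(col))
    return scores


def conserved_blocks(aligned, min_len=4):
    """Half-open (start, end) ranges in which every column is identical."""
    scores = column_conservation(aligned)
    blocks, start = [], None
    for i, s in enumerate(scores + [0.0]):  # sentinel closes a trailing block
        if s == 1.0 and start is None:
            start = i
        elif s != 1.0 and start is not None:
            if i - start >= min_len:
                blocks.append((start, i))
            start = None
    return blocks


# Three hypothetical orthologous upstream regions, already aligned
aligned = ["ACGTTGACCA", "TCGATGACCT", "GCGCTGACCG"]
for start, end in conserved_blocks(aligned):
    print(start, end, aligned[0][start:end])
```

Real analyses weight conservation by evolutionary distance; identical columns across three sequences alone would not be enough evidence for a functional motif.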
Computational Tools for gene expression analysis:
Computational tools for data integration:
1. Desiderata:
If researcher A wants to use a database kept and maintained by researcher B,
the “quick and dirty” solution is for researcher A to write a program that will
translate data from one format into another.
2. Data Standards:
Technical standards that define representations of data and hence provide an
understanding of data that is common to all database developers.
Standards are more relevant to future data sets.
Standards are indeed an essential element of efforts to achieve data integration
of future datasets.
3. Data Normalization:
Data normalization is the process through which data taken on the “same”
biological phenomenon by different instruments, procedures, or researchers
can be rendered comparable.
4. Data Warehousing:
Data warehousing is a centralized approach to data integration.
The maintainer of the data warehouse obtains data from other sources and
converts them into a common format, with a global data schema and indexing
system for integration and navigation.
5. Data Federation: Data federation calls for scientists to maintain their own
specialized databases encapsulating their particular areas of expertise and retain
control of the primary data, while still making it available to other researchers.
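Data normalization, as described in item 3, can be illustrated with z-score scaling, which is only one of several possible schemes and is an assumption here, not a method prescribed by these notes. The readings and variable names are hypothetical.

```python
from statistics import mean, stdev


def zscore(values):
    """Rescale one instrument's readings to mean 0 and unit standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


# The "same" three samples measured on two instruments with different scales
instrument_a = [10.0, 20.0, 30.0]
instrument_b = [100.0, 200.0, 300.0]

print(zscore(instrument_a))
print(zscore(instrument_b))
```

After scaling, both instruments report the samples on a common, unitless scale, so the measurements can be compared or pooled.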
Data Presentation:
1. Graphical interfaces: Graphical interface is used for molecular visualization.
2. Tangible physical interfaces: Tangible, physical models that a human being can
manipulate directly with his or her hands are an extension of the two-dimensional
graphical environment.
3. Automated literature searching: the availability of full-text articles in digital
formats such as PDF, HTML, or TIF files has limited the possibilities for
computer searching and retrieval of full text in databases. In the future, wider use
of structured documents tagged with XML will make intelligent searching of full
text feasible, fast, and informative and will allow readers to locate, retrieve, and
manipulate specific parts of a publication.
Hierarchical Clustering:
o Hierarchical clustering is a method of cluster analysis that seeks to build a hierarchy
of clusters. Strategies for hierarchical clustering generally fall into two categories:
1. Agglomerative: This is a "bottom-up" approach: each observation starts in
its own cluster, and pairs of clusters are merged as one moves up the hierarchy.
2. Divisive: This is a "top-down" approach: all observations start in one
cluster, and splits are performed recursively as one moves down the hierarchy.
o MATLAB, NCSS and Stata are some examples of software for hierarchical clustering.
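The agglomerative ("bottom-up") strategy can be sketched in a few lines. This minimal version assumes one-dimensional data and single linkage (the distance between two clusters is the distance between their closest members). Both choices, and the expression values, are hypothetical.

```python
def agglomerative(points, n_clusters):
    """Merge the two closest clusters until n_clusters remain (single linkage)."""
    clusters = [[p] for p in points]  # every observation starts in its own cluster

    def linkage(c1, c2):
        return min(abs(a - b) for a in c1 for b in c2)

    while len(clusters) > n_clusters:
        # find the pair of clusters with the smallest linkage distance
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


# Hypothetical expression values for five genes
print(agglomerative([1.0, 1.2, 5.0, 5.1, 9.0], n_clusters=3))
```

Recording the order of the merges, rather than stopping at a fixed number of clusters, would give the full hierarchy that is usually drawn as a dendrogram.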
Sequence tagged sites (STS):
o STS is expanded as “Sequence tagged sites”.
o It is a relatively short sequence (200 to 500 bp) which can be
specifically amplified by PCR and detected in the presence of all other genomic
sequences and whose location in the genome is mapped.
o It was first introduced by Olson et al in 1989.
o STS-based PCR produces a simple and reproducible pattern on agarose or
polyacrylamide gel.
o The DNA sequence of an STS may contain repetitive elements, sequences that appear
elsewhere in the genome, but as long as the sequences at both ends of the site are
unique and conserved, researchers can uniquely identify this portion of genome using
tools usually present in any laboratory.
o STS include markers such as microsatellites (SSRs, STMS or SSRPs), SCARs, CAPs,
and ISSRs.
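The requirement that the sequences at both ends of an STS be unique can be checked with a toy "in silico PCR". This sketch assumes exact primer matches and searches only the given strand; the genome, primers and function names are hypothetical, and the toy amplicon is far shorter than the 200 to 500 bp of a real STS.

```python
def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_all(text, pattern):
    """Start positions of every (possibly overlapping) occurrence of pattern."""
    positions, start = [], text.find(pattern)
    while start != -1:
        positions.append(start)
        start = text.find(pattern, start + 1)
    return positions


def sts_product(genome, fwd_primer, rev_primer):
    """Return (start, length) of the amplicon if both primers bind uniquely, else None."""
    fwd_sites = find_all(genome, fwd_primer)
    rev_sites = find_all(genome, revcomp(rev_primer))  # reverse primer binds the other strand
    if len(fwd_sites) != 1 or len(rev_sites) != 1:
        return None  # a primer site is missing or not unique
    start = fwd_sites[0]
    end = rev_sites[0] + len(rev_primer)
    if end <= start:
        return None  # primers do not face each other
    return start, end - start


genome = "TTTT" + "ACGTAC" + "GGGGGGGG" + "CATTGA" + "AAAA"
print(sts_product(genome, "ACGTAC", "TCAATG"))  # (4, 20)
```

The sequence between the two primer sites may be repetitive (here a run of G) without affecting the result, which mirrors the point made above about repetitive elements inside an STS.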
Expressed sequence tags (ESTs):
o Expressed sequence tags (ESTs) are fragments of mRNA sequences derived through
single sequencing reactions performed on randomly selected clones from cDNA libraries.
o To date, over 45 million ESTs have been generated from over 1400 different species
of eukaryotes.
o EST projects are used to either complement existing genome projects or serve as low-cost alternatives for purposes of gene discovery.
o However, with improvements in accuracy and coverage, they are beginning to find
application in fields such as phylogenetics, transcript profiling and proteomics.
Genome survey sequences (GSS):
o The GSS division of GenBank is similar to the EST division, with the exception that
most of the sequences are genomic in origin, rather than cDNA (mRNA).
o It should be noted that two classes (exon trapped products and gene trapped products)
may be derived via a cDNA intermediate.
o Care should be taken when analyzing sequences from either of these classes, as a
splicing event could have occurred and the sequence represented in the record may be
interrupted when compared to genomic sequence.
o The GSS division contains (but is not limited to) the following types of data:
o random "single pass read" genome survey sequences.
o cosmid/BAC/YAC end sequences
o exon trapped genomic sequences
o Alu PCR sequences
o transposon-tagged sequences.
Transcriptome analysis:
o Transcriptome analysis is the study of the transcriptome, the complete set
of RNA transcripts that are produced by the genome under specific
circumstances or in a specific cell, using high-throughput methods.
o Transcriptomic techniques have been particularly useful in identifying the
functions of genes.
o Transcriptomics also allows identification of pathways that respond to or
ameliorate environmental stresses.
Uses of Transcriptome Analysis:
o Transcriptome analysis is most commonly used to compare specific pairs of
samples. The differences may be due to different external environmental conditions.
o Transcriptome studies can classify cancer beyond anatomical location and
histopathology. Outcome predictions can establish gene-based benchmarks to
predict tumor prognosis and therapy response.
o The transcriptomes of stem cells help to understand the processes of cellular
differentiation or embryonic development.
o Because of its very broad approach transcriptome analysis is a great source for
identifying targets for treatment.
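Comparing a pair of samples is often summarised as a log2 fold change per gene. The sketch below assumes already-normalized expression counts and adds a pseudocount to avoid division by zero; the gene names, counts and the pseudocount of 1 are hypothetical choices.

```python
from math import log2


def log2_fold_changes(control, treated, pseudocount=1.0):
    """log2(treated / control) for every gene measured in both samples."""
    return {
        gene: log2((treated[gene] + pseudocount) / (control[gene] + pseudocount))
        for gene in control
        if gene in treated
    }


control = {"geneA": 7, "geneB": 15, "geneC": 3}
treated = {"geneA": 31, "geneB": 15, "geneC": 0}

for gene, lfc in log2_fold_changes(control, treated).items():
    print(gene, lfc)
```

Positive values indicate higher expression in the treated sample and negative values lower expression. A real analysis would also need replicates and a statistical test before calling a gene differentially expressed.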
Unit III
Molecular Systems Biology:
o Molecular systems Biology integrates many types of molecular knowledge, which can
best be achieved by the synergistic use of models and experimental data.
o Two main approaches of systems biology can be distinguished.
o Top-down systems biology is a method to characterize cells using system-wide data originating from the omics in combination with modelling.
o Bottom-up systems biology does not start with data but with a detailed model
of a molecular network on the basis of its molecular properties.
o In this approach, molecular networks can be quantitatively studied leading to
predictive models that can be applied in drug design and optimization of
product formation in bioengineering.
Kinetic models:
o Kinetic model approach is a type of modelling often used in systems biology.
o Kinetic models use experimentally determined kinetic parameters and network
structure, and have proven to be very promising.
o Many kinetic models of this type are found in the database for systems biology
o The first kinetic model was developed by Hoefnagel et al. for pyruvate metabolism in
Lactococcus lactis.
o When precise and detailed knowledge of the kinetics of the molecular components is
available, so-called computer experimentation can be carried out which serves as an
adequate substitute for true experimentation.
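A "computer experiment" with a kinetic model can be sketched for the simplest possible network: a single enzyme converting substrate S to product P with Michaelis-Menten kinetics, integrated with the explicit Euler method. This is not the Hoefnagel model; the parameter values, units and step size are hypothetical.

```python
def simulate_michaelis_menten(s0, vmax, km, dt=0.01, t_end=10.0):
    """Integrate dS/dt = -vmax*S/(km+S), dP/dt = +vmax*S/(km+S) by Euler steps."""
    s, p, t = s0, 0.0, 0.0
    trajectory = [(t, s, p)]
    for _ in range(int(round(t_end / dt))):
        rate = vmax * s / (km + s)  # Michaelis-Menten rate law
        s -= rate * dt
        p += rate * dt
        t += dt
        trajectory.append((t, s, p))
    return trajectory


# Hypothetical parameters: 5 mM substrate, vmax = 1 mM/min, Km = 0.5 mM
trajectory = simulate_michaelis_menten(s0=5.0, vmax=1.0, km=0.5)
t, s, p = trajectory[-1]
print(f"t={t:.1f} min  S={s:.3f} mM  P={p:.3f} mM")
```

A full kinetic model couples many such rate laws through the network structure, and would normally be solved with an adaptive ODE solver rather than fixed Euler steps.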
Biomass objective function:
o Biomass objective function describes the rate at which all of the biomass precursors
are made in the correct proportions.
o One can formulate the biomass objective function at different levels of detail.
Basic Level:
o The formulation process starts with defining the macromolecular content of the cell
and then the metabolites that make up each macromolecular group.
o With this information, it is possible to detail the required number of metabolites that
are needed along with associated reaction pathways.
Intermediate level:
o This level calculates the biosynthetic energy that is needed to synthesize the
macromolecules whose building blocks are directly accounted for in a curated
metabolic network.
Advanced Level:
o Advanced biomass objective functions can be formed by detailing the necessary
vitamins, elements, and cofactors required for growth as well as determining core
components necessary for cellular viability.
o Inclusion of vitamins, elements, and cofactors allows for the analysis of a broader
coverage of network functionality and required network activity.
o Another advanced approach is to not only define the wild-type biomass content of the
cell, but to generate a separate biomass objective function that contains the minimally
functional content of the cell.
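The basic-level formulation (macromolecular content, then the metabolites making up each group) can be sketched as a conversion from weight fractions to precursor coefficients in mmol per gram dry weight (gDW). All numbers below are hypothetical, and the sketch uses free-monomer molar masses, ignoring the water released on polymerization.

```python
def biomass_coefficients(weight_fraction, mole_fractions, molar_masses):
    """mmol of each building block per gDW for one macromolecule class.

    weight_fraction: g of the macromolecule class per g dry weight
    mole_fractions:  molar composition of the class (should sum to 1)
    molar_masses:    g/mol of each building block
    """
    avg_mass = sum(mole_fractions[m] * molar_masses[m] for m in mole_fractions)  # g/mol
    total_mmol = 1000.0 * weight_fraction / avg_mass  # mmol of monomers per gDW
    return {m: total_mmol * x for m, x in mole_fractions.items()}


# Hypothetical "protein" made of only two amino acids, 50 % of dry weight
coefficients = biomass_coefficients(
    weight_fraction=0.5,
    mole_fractions={"ala": 0.6, "gly": 0.4},
    molar_masses={"ala": 89.09, "gly": 75.07},
)
print(coefficients)
```

Repeating this for every macromolecule class (protein, RNA, DNA, lipid and so on) gives the stoichiometric coefficients of the biomass reaction. The intermediate and advanced levels then add energy requirements and cofactors on top.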
Biotechnological applications of systems biology:
o Identifying or engineering microbial strains.
o Transcriptomics.
o Proteomics and metabolomics
o Predictive computational models and machine learning
o Re-engineering strains
o Bioprocessing
o Meta-omics.
Pharmacogenomics and Drug Discovery:
o Pharmacogenomics (sometimes called pharmacogenetics) is a field of research that
studies how a person’s genes affect how he or she responds to medications.
o Pharmacogenetic studies can be used at various stages of drug development.
o The effect of drug target polymorphisms on drug response can be assessed.
o This prevents the occurrence of severe adverse drug reactions and helps in better
outcome of clinical trials.
o The variations in drug response can be better studied with the wider application of
pharmacogenomic methods like genome wide scans, haplotype analysis and candidate
gene approaches.
o The efficacy of a drug, to a great extent, is determined by appropriate target selection,
which can be guided by pharmacogenomic methods.
o Pharmacogenomics can also be used to identify the target population that would
benefit the most from the drug.
Agricultural Genomics:
o Agricultural genomics is a rich field that has been contributing to advances in crop
development for decades.
o From sequencing reference genomes to genotyping for genome-wide association
studies to genomic prediction, advances in technology and applications have led to
breakthroughs in crop improvement.
o These innovations have resulted in elite cultivars that have been selected for
agriculturally desirable traits, including high yield, stress tolerance and pest resistance.
o One potential way for genomics to lend itself to crop improvement and food security
is through the collection-wide sequencing and classification of established seed banks
or gene banks, in which important agricultural species are stored and maintained in
large collections organized by taxonomy and origin.
o The preservation of seed resources ensures that the natural genetic diversity captured
by the collections will not be lost.
o Accurately cataloguing these resources by using genomic data provides precise and
usable information for breeders and scientists, while simplifying the effort by
identifying redundancy in the collection.
Qualitative proteome technology:
o Proteomics is crucial for early disease diagnosis, prognosis and monitoring of the disease.
o Proteomics is one of the most significant methodologies for understanding gene
function, although it is much more complex than genomics.
[Figure: overview of proteomic technology]
Gel Based approaches:
1. SDS-PAGE:
o SDS-PAGE is a high-resolution technique for the separation of proteins according to
their size, which facilitates the approximation of molecular weight.
o Different proteins in a mixture migrate with different velocities according to the
ratio between their charge and mass.
o Example:
The protein profiling of Mycoplasma bovis and Mycoplasma
agalactiae through SDS-PAGE has high diagnostic value as these
species are difficult to differentiate with routine diagnostic procedures.
2. 2D Gel Electrophoresis:
o Two-dimensional polyacrylamide gel electrophoresis (2D-PAGE) is an
efficient and reliable method for separation of proteins on the basis of their
mass and charge.
o 2D-PAGE is capable of resolving ~5,000 different proteins successively,
depending on the size of gel.
o The proteins are separated by charge in the first dimension while in second
dimension separated on the basis of differences between their mass.
o The 2-DE is successfully applied for the characterization of post-translational
modifications, mutant proteins and evaluation of metabolic pathways.
o Example:
o Listeria monocytogenes involved in the host–pathogen interactions
were analyzed with 2-DE and 30 different proteins of two strains were
3. 2D-DIGE:
o It is expanded as “Two-dimensional differential gel electrophoresis”.
o 2D-DIGE utilizes the proteins labeled with CyDye that can be easily
visualized by exciting the dye at a specific wavelength.
o Example:
o Cell wall proteins (CWPs) of toxic dinoflagellates Alexandrium
catenella labeled with Cy3 have been identified through 2D-DIGE.
o The 2-DE remains a method of choice in proteomic research, though certain
limitations weaken its potential as a principal separation technique in modern proteomics.
Gel Free Approaches:
o In recent years, most developmental endeavours have been focused on alternative
approaches, such as promising gel-free proteomics.
o With the appearance of MS-based proteomics, an entirely new toolbox has become
available for quantitative analysis. Other important examples include ICAT labelling,
MS based SILAC (Stable Isotopic Labeling with Amino Acids in Cell Culture).
o These novel approaches were initially pitched as replacements for gel-based methods.
Quantitative Proteome Technology:
1. ICAT labelling:
o The ICAT is an isotopic labeling method in which chemical labelling reagents are
used for quantification of proteins.
o The ICAT has also expanded the range of proteins that can be analyzed and
permits the accurate quantification and sequence identification of proteins from
complex mixtures.
o The ICAT reagents comprise affinity tag for isolation of labeled peptides,
isotopically coded linker and reactive group.
o Example:
o Systemic proteome quantification was carried out through
ICAT during the cell cycle of Saccharomyces cerevisiae, which supported the
understanding of gene functions.
2. Stable Isotopic Labelling with Amino Acids in Cell Culture:
o SILAC is an MS-based approach for quantitative proteomics that depends on
metabolic labelling of whole cellular proteome.
o SILAC has been developed as an expedient technique to study
post-translational modifications.
o Additionally, SILAC is a vital technique for secreted pathways and secreted
proteins in cell culture.
o Example:
o SILAC was used for quantitative proteome analysis of B. subtilis in
two physiological states such as growth during phosphate and
succinate starvation.
o The intracellular stability of almost 600 proteins from human adenocarcinoma
cells has been analysed through “dynamic SILAC” and the overall protein
turnover rate was determined.
3. Isobaric tag for relative and absolute quantitation:
o iTRAQ is a multiplex protein labelling technique for protein quantification
based on tandem mass spectrometry.
o This technique relies on labelling the protein with isobaric tags (4-plex and 8-plex) for relative and absolute quantitation.
o The technique comprises labelling of the N-terminus and side chain amine
groups of proteins, fractionated through liquid chromatography and finally
analysed through MS.
o It is essential to find the gene regulation to understand the disease mechanism,
therefore protein quantitation using iTRAQ is an appropriate method that
helps to identify and quantify the protein simultaneously.
o Example:
iTRAQ has been applied for quantitative analysis of membrane and
cellular proteins of Thermobifida fusca grown in the absence and
presence of cellulose.
o iTRAQ was a useful tool for determination of molecular process
involved in development and function of natural killer (NK) cells.
4. X-ray crystallography:
o X-ray crystallography is the most preferred technique for three dimensional
structure determination of proteins.
o The highly purified crystallized samples are exposed to X-rays and the
subsequent diffraction patterns are processed to produce information about the
size of the repeating unit that forms the crystal and crystal packing symmetry.
o Example:
o X-ray crystallography revealed the structure of C-terminal fragment of
FtsZ and binding complex of FtsZ-ZipA of E. coli.
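The relative quantitation performed by the label-based methods above (ICAT, SILAC, iTRAQ) comes down to comparing signal intensities of differently labelled versions of the same peptide. The sketch below assumes SILAC-style light/heavy peptide pairs and summarises each protein by the log2 of the median heavy/light ratio. The protein names and intensities are hypothetical, and using the median is a design assumption.

```python
from math import log2
from statistics import median


def protein_ratios(peptide_intensities):
    """Map {protein: [(light, heavy), ...]} to {protein: log2 median heavy/light}."""
    result = {}
    for protein, pairs in peptide_intensities.items():
        # skip peptide pairs with a missing (zero) channel
        ratios = [heavy / light for light, heavy in pairs if light > 0 and heavy > 0]
        if ratios:
            result[protein] = log2(median(ratios))
    return result


# Hypothetical peptide intensity pairs (light channel, heavy channel)
data = {
    "P1": [(100.0, 200.0), (50.0, 100.0), (10.0, 25.0)],
    "P2": [(80.0, 20.0)],
}
print(protein_ratios(data))
```

Taking the median over several peptides makes the protein-level ratio less sensitive to a single poorly measured peptide.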
Functional Proteome Technology:
o Functional proteomics constitutes an emerging research area in the proteomic field
that is “focused to monitor and analyse the spatial and temporal properties of the
molecular networks and fluxes involved in living cells”.
o The 2 major targets of functional proteomics are:
1. The elucidation of biological functions of unknown proteins.
2. The definition of cellular mechanisms at the molecular level.
Conclusion and Futures:
o Understanding protein function and unravelling cellular mechanisms at the molecular
level constitute a major need in modern biology.
o With the availability of full genome sequences, these goals can be achieved by
determining which macromolecules interact with a given protein in a specific manner.
o Proteomic analyses of the protein complexes occurring in vivo will disclose the
identity of the individual components and whether they differ from one territory to another.
Methods, algorithms and tools in computational proteomics:
o In the fifties amino acid sequencing was already possible using Edman degradation
and the first computer programs for amino acid sequencing appeared.
o Iterative searching forms the basis of many of the modern and more complex software
algorithms for correlating MS data to peptide and sequence databases.
o The first MS database search engine with probability-based scoring function was
released in 1999 and formed the basis of the popular semicommercial search engine
Spectrum Interpretation:
o Proteins and peptides can be identified from both MS (peptide mass fingerprinting, PMF) and MS/MS (peptide fragment fingerprinting, PFF) data.
o In MS (PMF) the mass pattern obtained from measuring the masses of purified or
simple protein mixture, enzymatically digested or chemically cleaved, is compared to
theoretical mass patterns generated in silico from a protein database.
o In MS/MS both intact mass of the peptide (parent ion mass) and fragment ions are
o De novo sequencing can also be used for validating search results obtained from
database dependent search algorithms.
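The PMF comparison described above can be sketched as an in silico tryptic digest followed by mass matching. This is a minimal illustration: it assumes complete cleavage after K or R (except before P), neutral monoisotopic masses, no modifications, and a simple count of matched masses as the score. The two-entry "database", the tolerance and the function names are hypothetical.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def tryptic_digest(protein):
    """Cleave after K or R unless the next residue is P."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])  # C-terminal peptide
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass: residue masses plus one water."""
    return sum(MONO[aa] for aa in peptide) + WATER


def pmf_score(observed, protein, tol=0.02):
    """Number of observed masses matching a theoretical peptide within tol Da."""
    theoretical = [peptide_mass(p) for p in tryptic_digest(protein)]
    return sum(any(abs(o - t) <= tol for t in theoretical) for o in observed)


database = {"protA": "MKWVTFR", "protB": "GGKAAR"}  # hypothetical sequences
observed = [peptide_mass("MK"), peptide_mass("WVTFR")]
best = max(database, key=lambda name: pmf_score(observed, database[name]))
print(best)  # protA
```

Search engines replace the simple match count with a probability-based score and also allow for missed cleavages and modifications.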
Sequence tagging approach:
o The sequence tagging approach, a method similar to de novo sequencing, aims at
finding short tags and delta masses which can be used to search a sequence database.
o The sequence tag approach has recently regained momentum due to the high mass
accuracy of modern mass spectrometers.
o The sequence tagging approach makes the computational task of finding the correct
de novo sequence path less complex, more accurate, and computationally less
expensive for modified peptides.
1. Accessing tools: CPTAC, GDC, PDC, TCIA
2. Data Processing: CDAP, COSMO, DREAM AI, MassQC, MSInspector, OMICS
3. Data analysis: ARHT, Black Sheep, ChEA3.
Proteome Database:
neXtProt:
o neXtProt is a database of the human proteome containing information on over 20,000
human proteins, the vast majority of which have had their existence determined
through direct identification or through the identification of their mRNA transcripts.
o Link: https://www.nextprot.org/
o Operating Institution: Swiss Institute of Bioinformatics
Proteopedia:
o Proteopedia is a wiki-style database containing information on a wide array of
biological macromolecules.
o The site offers generalized information about the molecule of interest in addition to
providing 3D visuals on the structure of the molecule and information on its
interactions, locations, and targets.
o Link: http://proteopedia.org/wiki/index.php/Main_Page.
o Operating Institution: The Israel Structural Proteomics Center.
Protein Data Bank:
o The Protein Data Bank is a database of proteins and nucleic acids.
o The site offers scientists and educators the ability to both upload new information
on known sequences and proteins or completely novel entities, and download
sequence and other scientific information as well.
o The database allows for searching based on sequence, function, ligand, and drug
targets as well as offering visual representations of the 3D structure of proteins, the
pathways they are involved in, mechanism of drug and ligand interaction, and where
their sequence lies on human chromosomes.
o Link: https://www.rcsb.org/
o Operating Institutions: Rutgers University, University of California San Diego (San Diego
Supercomputer Center), University of California San Francisco.
Human Protein Atlas:
o The Human Protein Atlas is a database containing the location of all known human
proteins with three major sub-divisions.
o The Tissue Atlas shows the protein expression levels and their localization in all
human tissues, the Cell Atlas shows the expression and sub-cellular localization of
proteins in 64 human cell lines, and the Pathology Atlas shows the aberrant expression
patterns of proteins in 18 different types of human cancers.
o In addition to these Atlases the database contains information on wide classes of
human proteins, protocols and methods for human protein experimentation, and
access to antibodies and cell lines for research.
o Link: https://www.proteinatlas.org/
o Operating Institutions: KTH Royal Institute of Technology, Uppsala University,
Science for Life Laboratory.
STRING:
o STRING is a proteomic database focusing on the networks and interactions of
proteins in a wide array of species.
o STRING allows for the searching of one or multiple proteins at a time with the ability
to additionally limit the search to the desired species.
o Link: https://string-db.org/
o Operating Institutions: Swiss Institute of Bioinformatics, Novo Nordisk Foundation
Centre for Protein Research, European Molecular Biology Laboratory - Heidelberg,
University of Zurich.
Techniques to Study Protein-Protein Interaction:
Types of Protein-Protein interaction:
o Protein interactions are fundamentally characterized as “stable or transient”, and both
types of interactions can be either strong or weak.
o Stable interactions are those associated with proteins that are purified as multi-subunit
complexes, and the subunits of these complexes can be identical or different.
o Transient interactions are temporary in nature and typically require a set of conditions
that promote the interaction, such as phosphorylation, conformational changes or
localization to discrete areas of the cell.
Methods to Study Protein-Protein Interaction:
1. Co-immunoprecipitation:
o The interacting protein is bound to the target antigen, which is bound by the
antibody that is immobilized to the support.
o Immunoprecipitated proteins and their binding partners are commonly
detected by sodium dodecyl sulfate–polyacrylamide gel electrophoresis (SDS-PAGE) and western blot analysis.
2. Pull Down Assays:
o Pull-down assays are similar in methodology to co-immunoprecipitation
because of the use of beaded support to purify interacting proteins.
o The difference between these two approaches, though, is that while co-IP uses
antibodies to capture protein complexes, pull-down assays use a "bait" protein
to purify any proteins in a lysate that bind to the bait.
o Pull-down assays are ideal for studying strong or stable interactions or those
for which no antibody is available for co-immunoprecipitation.
3. Crosslinking Protein Interaction Analysis:
o Crosslinking interacting proteins is an approach to stabilize or permanently
adjoin the components of interaction complexes.
o Once the components of an interaction are covalently crosslinked, other steps can be
used to analyse the interaction while maintaining the original complex.
4. Label transfer protein interaction analysis:
o Label transfer involves crosslinking interacting molecules (i.e., bait and prey
proteins) with a labelled crosslinking agent and then cleaving the linkage
between the bait and prey so that the label remains attached to the prey.
o This method is particularly valuable because of its ability to identify proteins
that interact weakly or transiently with the protein of interest.
5. Far western blot analysis:
o Just as pull-down assays differ from co-IP in the detection of protein–protein
interactions by using tagged proteins instead of antibodies, so far-western
blot analysis differs from western blot analysis, as protein–protein
interactions are detected by incubating electrophoresed proteins with a
purified, tagged bait protein instead of a target protein-specific antibody.
o The term "far" was adopted to emphasize this distinction.
Interactome databases:
o Several public databases collect published PPI data and provide researchers access to
their curated datasets. These usually reference the original publication and the
experimental method that determined every individual interaction.
o The 6 databases for interactome studies are
1. Biological General Repository for Interaction Datasets (BioGRID).
2. Molecular INTeraction database (MINT).
3. Biomolecular Interaction Network Database (BIND)
4. Database of Interacting Proteins (DIP)
5. IntAct molecular interaction database
6. Human Protein Reference Database (HPRD)
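Curated interaction pairs of the kind these databases provide can be assembled into a network and queried, for example for the most connected proteins. The sketch below uses an in-memory list of hypothetical protein identifiers and does not call the API of any of the databases listed.

```python
from collections import defaultdict


def build_network(interactions):
    """Undirected adjacency sets from (protein_a, protein_b) pairs."""
    network = defaultdict(set)
    for a, b in interactions:
        if a != b:  # ignore self-interactions in this sketch
            network[a].add(b)
            network[b].add(a)
    return network


def hubs(network, top=1):
    """Proteins with the most distinct partners (ties broken alphabetically)."""
    return sorted(network, key=lambda p: (-len(network[p]), p))[:top]


# Hypothetical curated pairs; duplicates in either orientation are collapsed
pairs = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P2", "P3"), ("P3", "P1")]
network = build_network(pairs)
print(hubs(network))  # ['P1']
```

Because the same interaction is often reported by several publications and methods, collapsing duplicates as done here is a usual first step before any network analysis.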
o Definition: The study of posttranslational modifications of the proteins associated
with a particular genome.
o Posttranslational modifications of proteins possess key functions in the regulation of
various cellular processes.
Application of Proteomics in Clinical and biomedicine:
Application in Biomarker Discovery:
o In medicine, a biomarker can be a traceable substance that is introduced into an
organism as a means to examine organ function or other aspects of health.
o Proteomics technology has been extensively used in the molecular medicine
especially for biomarker discovery.
o By analysing global protein profiles in body fluids, proteomics can identify
invaluable disease-specific biomarkers.
o Expression proteomics provides biomarker detection through comparison of
protein expression profiles between normal and disease-affected samples.
o After biomarkers are identified by a mass spectrometry-based approach, they need
to be processed using bioinformatics analyses and also need to be reproduced in different
Application in Drug Discovery:
o As drug discovery is an inherently complex and costly process, new
emerging technologies such as proteomics can facilitate and accelerate discovery.
o Drug discovery has many stages and is a multidisciplinary field using
genomics, proteomics, metabolomics, bioinformatics, and systems biology.
o Proteomics plays a major role in the target identification step.
o Proteomics studies also are useful for drug action, toxicity, resistance, and its efficacy
under examination.
Application of Proteomics in Agriculture:
o Since proteins are the main constituents of plants and plant-derived foods, proteomic technology
can monitor and characterize the protein content of foods and their changes during
production using two-dimensional polyacrylamide gel electrophoresis (2D-PAGE)
and chromatography techniques in combination with mass-spectrometry.
o Concerning plant science, plant-proteomics methods can help to identify quality
biomarkers to design better and safer breeding.
o Proteomics is a useful tool for identification of microbial contaminations in plants.
o Proteomic techniques are increasingly used for breed quality control.
o Proteomics studies have identified numerous proteins that play crucial roles in plant
growth and development.
o Proteomics is used to study seed growth regulators.
o Proteomics is used to evaluate genetically modified crops.
o Also proteomics approach is an efficient tool to analyse agriculture crop biomass.