Protein Analysis Tools - HSLS

Protein Analysis Tools
2nd April, 2012
Ansuman Chattopadhyay, PhD,
Head, Molecular Biology Information Service
Health Sciences Library System
University of Pittsburgh
ansuman@pitt.edu
http://www.hsls.pitt.edu/guides/genetics
What we’ll do:

Brief overview of CLC Main Workbench

find genomic context of a protein sequence

search for the presence of conserved domains

create a multiple sequence alignment plot
What we’ll do:

analyze primary structure, such as hydrophobicity, hydrophilicity, antigenicity, repeat sequence detection, etc.

predict secondary structure


predict post-translational modifications, such as phosphorylation, glycosylation, …
search for interacting partners

predict domain driven protein-protein interactions
Workshop Resources
http://www.hsls.pitt.edu/molbio/tutorials
HSLS MolBio Videos
Sequence Analysis Software Suites




Wisconsin GCG
Vector NTI
DNASTAR Lasergene
Geneious
CLC Main
Why CLC Main?







Windows
Mac
Linux
DNA, RNA, Protein,
Microarray Data Analysis
Regular Update
HSLS Licensed
CLC Main Access

HSLS CLC Main Registration


Link: http://www.hsls.pitt.edu/molbio/clcmain
Access via Pitt - Network Connect

Instruction video: http://goo.gl/JNjMt
CLC Main Workbench Overview



Graphical User Interface
Protein sequence import
Sequence Navigation

CLC Main Graphical User Interface (GUI)
CLC Main
Navigate a protein sequence
Videos



CLC Main: getting started (basic navigation steps):
http://media.hsls.pitt.edu/media/molbiovideos/clc-navigation-ac0312.swf
CLC Main Workbench Walkthrough (Part 1):
http://media.hsls.pitt.edu/media/molbiovideos/clcmain-walkthrough-part1-ac0112.swf
CLC Main Workbench Walkthrough (Part 2):
http://media.hsls.pitt.edu/media/molbiovideos/clcmain-walkthrough-part2-ac0112.swf
Import a Protein Sequence
Protein Sequence

Human PLCg1




RefSeq no.: NP_002651
UniProt Accession Number: P19174
FASTA file
Raw sequence
CLC features:
Search, Import, Create new sequence
Videos

Import a DNA/Protein sequence into CLC Main (Part 1):
http://media.hsls.pitt.edu/media/molbiovideos/clc-import-part1-ac0112.swf

Import a DNA/Protein sequence into CLC Main (Part 2):
http://media.hsls.pitt.edu/media/molbiovideos/clc-import-part2-ac0112.swf
CLC protein sequence
Protein sequence manipulation

Create a new protein with PLCg1 SH2-SH2-SH3 domains
Sequence Alignment

Pair-wise Alignment



Global
Local
Multiple Sequence Alignment
Sequence Alignment
Pair-wise Sequence Alignment
Multiple Sequence Alignment
Multiple Sequence Alignment

Tools: ClustalW and T-Coffee
PLCg1 Orthologous sequences

PLCg1:

H: NP_002651
Mouse: NP_067255
Rat: NP_037319
Cow: NP_776850
Dog: XP_542998
Zebrafish: NP_919388

NP_067255,NP_037319,NP_776850,XP_542998,NP_919388,NP_002651
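The comma-separated accession list above can be handed to a sequence retrieval service in one request. The following is a minimal sketch, assuming NCBI's E-utilities `efetch` endpoint is used to download the protein records as FASTA; the workshop itself imports the sequences through CLC Main, so the function names here (`efetch_fasta_url`, `fetch_fasta`) are illustrative only.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# NCBI E-utilities efetch endpoint (an assumption of this sketch; the slides
# import the sequences through CLC Main instead).
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

# PLCg1 orthologs listed on the slide.
PLCG1_ORTHOLOGS = {
    "Human": "NP_002651",
    "Mouse": "NP_067255",
    "Rat": "NP_037319",
    "Cow": "NP_776850",
    "Dog": "XP_542998",
    "Zebrafish": "NP_919388",
}


def efetch_fasta_url(accessions):
    """Build an efetch URL that requests the given protein records as FASTA text."""
    query = urlencode({
        "db": "protein",
        "id": ",".join(accessions),
        "rettype": "fasta",
        "retmode": "text",
    })
    return f"{EFETCH}?{query}"


def fetch_fasta(accessions):
    """Download the records (requires network access)."""
    with urlopen(efetch_fasta_url(accessions)) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Print the URL only; call fetch_fasta(...) to actually download.
    print(efetch_fasta_url(PLCG1_ORTHOLOGS.values()))
```

The downloaded FASTA text can then be saved to a file and used as input for ClustalW, T-Coffee, or CLC Main.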




Videos

Create a multiple sequence alignment plot using CLC (part 1):
http://media.hsls.pitt.edu/media/molbiovideos/msf-clcmain-ac0212part1.swf

Create a multiple sequence alignment plot using CLC (part 2):
http://media.hsls.pitt.edu/media/molbiovideos/msf-clcmain-ac0212part2.swf

Create a multiple sequence alignment plot:
http://media.hsls.pitt.edu/media/clres2705/msa.swf

Compare two peptide sequences:
http://media.hsls.pitt.edu/media/clres2705/blast2.swf

Starting with a short peptide sequence find:


the whole protein sequence
orthologs in other species (nematode)
Tool:
UCSC BLAT
NCBI BLAST against SwissProt
Peptide to whole protein

Peptide seq: SPEGCWGPEPRDCVSCRNVSRGRECVDKCNLLEGEPR
Videos

Place an mRNA or peptide sequence into the human genome (BLAT):
http://www.hsls.pitt.edu/molbio/videos/play?v=12e

Find homologous sequences:
http://media.hsls.pitt.edu/media/clres2705/blast.swf
Find homologous sequence
SPEGCWGPEPRDCVSCRNVSRGRECVDKCNLLEGEPR
Sequence Manipulation & Format Conversion

Sequence Manipulation Suite


http://bioinformatics.org/sms2/
Readseq

http://thr.cit.nih.gov/molbio/readseq/
GenePept
FASTA
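For readers who want to see what the FASTA side of such a conversion involves, here is a minimal sketch of reading and writing FASTA text in Python. It does not parse GenePept records; the function names (`parse_fasta`, `to_fasta`) and the 60-character line width are assumptions made for illustration, not part of Sequence Manipulation Suite or Readseq.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text.

    Sequence lines that are wrapped over several lines are joined back together.
    """
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def to_fasta(header, sequence, width=60):
    """Format one record as FASTA, wrapping the sequence at `width` characters."""
    lines = [">" + header]
    lines += [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join(lines)


if __name__ == "__main__":
    # The peptide used earlier in the slides, wrapped over two lines.
    text = ">peptide\nSPEGCWGPEPRDCVSCRNVS\nRGRECVDKCNLLEGEPR\n"
    for name, seq in parse_fasta(text):
        print(name, len(seq))
        print(to_fasta(name, seq, width=30))
```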
Hands-On

Retrieve the amino acid sequence between positions 25 and 45 of Sequence A (MS Word doc)

Identify the rat gene that encodes this peptide fragment and retrieve its whole protein sequence
Find the fruit fly homolog of this protein

What % identity does the fruit fly protein share with its rat homolog?
Predict potential MAPK phosphorylation sites in the fruit fly protein
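Two of the hands-on steps above reduce to simple string operations once the sequences are in hand. The sketch below shows one way to extract a residue range using 1-based, inclusive positions, and one way to compute % identity from two sequences that have already been aligned (for example by BLAST or CLC Main). It assumes identity is counted as identical columns divided by total alignment length, which is one common convention; other tools may divide by a different length. Sequence A itself is not reproduced in the slides, so the input in the usage example is a placeholder.

```python
def subsequence(sequence, start, end):
    """Return residues start..end of a sequence, using 1-based inclusive positions."""
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("positions must satisfy 1 <= start <= end <= len(sequence)")
    return sequence[start - 1:end]


def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity of two pre-aligned sequences of equal length.

    Assumption: identical, non-gap columns divided by the total number of
    alignment columns (gap columns count toward the length).
    """
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("aligned sequences must be non-empty and of equal length")
    matches = sum(1 for a, b in zip(aligned_a, aligned_b) if a == b and a != gap)
    return 100.0 * matches / len(aligned_a)


if __name__ == "__main__":
    # Placeholder input; substitute Sequence A from the workshop handout.
    sequence_a = "M" * 60
    print(subsequence(sequence_a, 25, 45))
    print(percent_identity("ACDEF", "AC-EF"))
```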
Protein Domain Search: InterProScan

InterPro is a database of protein families, domains, regions, repeats and sites in which identifiable features found in known proteins can be applied to new protein sequences.
>gi|72198189|ref|NP_000624.2| B-cell lymphoma protein 2 alpha isoform
MAHAGRTGYDNREIVMKYIHYKLSQRG
YEWDAGDVGAAPPGAAPAPGIFSSQPG
HTPHPAASRDPVARTSPLQTPAAPGAAA
GPALSPVPPVVHLTLRQAGDDFSRRYRR
DFAEMSSQLHLTPFTARGRFATVVEELFRD
GVNWGRIVAFFEFGGVMCVESVNREMS
PLVDNIALWMTEYLNRHLHTWIQDNGG
WDAFVELYGPSMRPLFDFSWLSLKTLLS
LALVGACITLGAYLGHK
Videos:



Find protein domains, PTMs, secondary structure, etc.:
http://media.hsls.pitt.edu/media/clres2705/uniprot.swf
Start with a protein pattern and find which proteins possess it:
http://media.hsls.pitt.edu/media/clres2705/scanprosite.swf
Search for protein domains, repeats and sites:
http://media.hsls.pitt.edu/media/clres2705/interpro.swf
Protein Domain Search: ScanProsite
Pattern Search

[AC]-x-V-x(4)-{ED}:

This pattern is translated as: [Ala or Cys]-any-Val-any-any-any-any-{any but Glu or Asp}

F-[GSTV]-P-R-L-[G>]
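The two patterns above use PROSITE syntax: elements are separated by `-`, `x` is any residue, `[...]` lists allowed residues, `{...}` lists forbidden residues, and `(n)` or `(n,m)` gives a repeat count. In the second pattern, `>` inside the brackets means the match may end either with Gly or at the C-terminus of the sequence. The following is a minimal sketch of how such a pattern could be translated into a Python regular expression. It is not ScanProsite's implementation, it covers only the syntax elements just listed, and the function names are made up for illustration.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression string."""
    regex = []
    for element in pattern.strip().rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):      # '<' anchors the match at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):        # '>' anchors the match at the C-terminus
            suffix, element = "$", element[:-1]
        # Split off an optional repeat count such as (4) or (2,4).
        core, low, high = re.fullmatch(
            r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element
        ).groups()
        if core == "x":                  # any residue
            token = "."
        elif core.startswith("[") and core.endswith("]"):
            residues = core[1:-1]
            if ">" in residues:          # e.g. [G>]: Gly or the C-terminus
                token = "(?:[" + residues.replace(">", "") + "]|$)"
            else:
                token = "[" + residues + "]"
        elif core.startswith("{") and core.endswith("}"):
            token = "[^" + core[1:-1] + "]"   # any residue except those listed
        else:
            token = core                 # a single required residue
        if low:
            token += "{" + low + ("," + high if high else "") + "}"
        regex.append(prefix + token + suffix)
    return "".join(regex)


def find_matches(pattern, sequence):
    """Return (1-based start position, matched text) for non-overlapping matches."""
    compiled = re.compile(prosite_to_regex(pattern))
    return [(m.start() + 1, m.group()) for m in compiled.finditer(sequence)]


if __name__ == "__main__":
    print(prosite_to_regex("[AC]-x-V-x(4)-{ED}"))
    print(prosite_to_regex("F-[GSTV]-P-R-L-[G>]"))
```

Calling `find_matches` with one of these patterns and the BCL2 sequence from the slides gives a quick local check before submitting the same query to ScanProsite.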
Pattern Search
Protein Primary Structure Analysis

Tool: ExPASy from SIB




Calculated Mol Wt
Theoretical pI
Extinction coefficients
Estimated half-life


Hydropathicity plot: Kyte & Doolittle
Hydrophilicity plot: Hopp & Woods
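A Kyte & Doolittle hydropathicity plot is a sliding-window average of per-residue hydropathy values along the sequence. The sketch below shows the calculation, assuming a window of 9 residues (the window size is a user choice) and plain unweighted averaging; the ExPASy tools offer further options that are not reproduced here. The `gravy` helper computes the overall average hydropathy of a sequence.

```python
# Kyte & Doolittle (1982) hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=9):
    """Unweighted sliding-window mean hydropathy; one value per window position."""
    scores = [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]
    if window < 1 or window > len(scores):
        raise ValueError("window must be between 1 and the sequence length")
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def gravy(sequence):
    """Grand average of hydropathy: mean Kyte & Doolittle value over all residues."""
    return sum(KYTE_DOOLITTLE[aa] for aa in sequence.upper()) / len(sequence)


if __name__ == "__main__":
    # C-terminal stretch of the BCL2 sequence used in the slides.
    tail = "FSWLSLKTLLSLALVGACITLGAYLGHK"
    profile = hydropathy_profile(tail, window=9)
    peak = max(range(len(profile)), key=profile.__getitem__)
    print(f"most hydrophobic window starts at residue {peak + 1} of the fragment")
```

Positive averages indicate hydrophobic stretches, which is why this kind of plot is often read alongside transmembrane predictions.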
Antigenic Site Prediction

Tool: Emboss Antigenic
EMBOSS Antigenic

Antigenic predicts potentially antigenic regions of a protein sequence, using the method of Kolaskar and Tongaonkar. Analysis of data from experimentally determined antigenic sites on proteins has revealed that the hydrophobic residues Cys, Leu and Val, if they occur on the surface of a protein, are more likely to be part of antigenic sites. Kolaskar and Tongaonkar developed a semi-empirical method to predict antigenic determinants on proteins; it uses the physicochemical properties of amino acid residues and their frequencies of occurrence in experimentally known segmental epitopes. Applied to a large number of proteins, the method predicts antigenic determinants with about 75% accuracy, which is better than most of the known methods. Because it is based on a single parameter, it is very simple to use.
Transmembrane Region prediction
Transmembrane Site Prediction
Tool: TMHMM Server
Protein Secondary Structure
Protein-Protein Interactions Prediction
Tool: STRING
Hands-on

Take the human BCL2 protein sequence and





Find its domain architecture
Predict the topology of its transmembrane region
Select a suitable antigenic site for antibody generation
What is its calculated Mol Wt and Ext Coefficient?
Predict its secondary structure


What % of this protein possesses alpha helical structure?
Predict its potential interacting partners
Hands-on



Prediction of potential phosphorylation sites
present in a protein sequence.
Sequence: human BCL2
>gi|72198189|ref|NP_000624.2| B-cell lymphoma protein 2 alpha isoform
MAHAGRTGYDNREIVMKYIHYKLSQRGYEWDAGDVGAAPPGAAPAPGIF
SSQPGHTPHPAASRDPVARTSPLQTPAAPGAAAGPALSPVPPVVHLTLR
QAGDDFSRRYRRDFAEMSSQLHLTPFTARGRFATVVEELFRD
GVNWGRIVAFFEFGGVMCVESVNREMSPLVDNIALWMTEYLNRHLHTWI
QDNGGWDAFVELYGPSMRPLFDFSWLSLKTLLSLALVGACITLGAYLGHK
Phosphorylation Site Prediction:
Tool: NetPhos
Phosphorylation Site Prediction:
Tool: GPS
Thank you!
Any questions?
Carrie Iwema
iwema@pitt.edu
412-383-6887
Ansuman Chattopadhyay
ansuman@pitt.edu
412-648-1297
http://www.hsls.pitt.edu/guides/genetics