Corrections

SEQUENCE 4

>seq4
MSTNNYQTLSQNKADRMGPGGSRRPRNSQHATASTPSASSCKEQQKDVEH
EFDIIAYKTTFWRTFFFYALSFGTCGIFRLFLHWFPKRLIQFRGKRCSVE
NADLVLVVDNHNRYDICNVYYRNKSGTDHTVVANTDGNLAELDELRWFKY
RKLQYTWIDGEWSTPSRAYSHVTPENLASSAPTTGLKADDVALRRTYFGP
NVMPVKLSPFYELVYKEVLSPFYIFQAISVTVWYIDDYVWYAALIIVMSL
YSVIMTLRQTRSQQRRLQSMVVEHDEVQVIRENGRVLTLDSSEIVPGDVL
VIPPQGCMMYCDAVLLNGTCIVNESMLTGESIPITKSAISDDGHEKIFSI
DKHGKNIIFNGTKVLQTKYYKGQNVKALVIRTAYSTTKGQLIRAIMYPKP
ADFKFFRELMKFIGVLAIVAFFGFMYTSFILFYRGSSIGKIIIRALDLVT
IVVPPALPAVMGIGIFYAQRRLRQKSIYCISPTTINTCGAIDVVCFDKTG
TLTEDGLDFYALRVVNDAKIGDNIVQIAANDSCQNVVRAIATCHTLSKIN
NELHGDPLDVIMFEQTGYSLEEDDSESHESIESIQPILIRPPKDSSLPDC
QIVKQFTFSSGLQRQSVIVTEEDSMKAYCKGSPEMIMSLCRPETVPENFH
DIVEEYSQHGYRLIAVAEKELVVGSEVQKTPRQSIECDLTLIGLVALENR
LKPVTTEVIQKLNEANIRSVMVTGDNLLTALSVARECGIIVPNKSAYLIE
HENGVVDRRGRTVLTIREKEDHHTERQPKIVDLTKMTNKDCQFAISGSTF
SVVTHEYPDLLDQLVLVCNVFARMAPEQKQLLVEHLQDVGQTVAMCGDGA
NDCAALKAAHAGISLSEAEASIAAPFTSKVADIRCVITLISEGRAALVTS
YSAFLCMAGYSLTQFISILLLYWIATSYSQMQFLFIDIAIVTNLAFLSSK
TRAHKELASTPPPTSILSTASMVSLFGQLAIGGMAQVAVFCLITMQSWFI
PFMPTHHDNDEDRKSLQGTAIFYVSLFHYIVLYFVFAAGPPYRASIASNK
AFLISMIGVTVTCIAIVVFYVTPIQYFLGCLQMPQEFRFIILAVATVTAV
ISIIYDRCVDWISERLREKIRQRRKGA

Compute pI/Mw tool: be careful!
If you choose the wrong format for the sequence…
With the correct format:

ProtParam
• SAPS
[Figures: SAPS (1), SAPS (2)]
doi: 10.1093/bioinformatics/bti797

Coiled coil: http://en.wikipedia.org/wiki/Coiled_coil
The coiled-coil domains are annotated according to 3D structure data (experimental data).

Coiled-coil prediction
• Coils: http://www.ch.embnet.org/software/COILS_form.html
  [Figure: Coils prediction]
• PairCoil (not always working…)
  [Figure: PairCoil prediction]
• PairCoil2
  [Figure: PairCoil2 results]
• Sliding window (ProtScale)
  [Figure: sliding-window amino acid scale example]
  Bad results.
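The Compute pI/Mw warning about the sequence format can be illustrated with a small sketch: if a FASTA header such as ">seq4" is read as raw sequence, its letters (S, E, Q) are counted as residues and the mass shifts. This is a minimal sketch under stated assumptions, not the ExPASy implementation: the helper names `clean_sequence` and `average_mw` are hypothetical, the residue masses are the commonly tabulated average residue masses (ExPASy's own table may differ in the last decimals), and the pI is not computed because it depends on the chosen pKa set.

```python
"""Sketch: average molecular weight of a protein, with FASTA-aware input cleaning.

Assumption: commonly tabulated average residue masses (amino acid minus one
water, in Da); this only approximates ExPASy Compute pI/Mw / ProtParam.
"""

# Average residue masses in daltons (free amino acid mass minus one water).
AVG_RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01524  # one water added back for the free N- and C-termini


def clean_sequence(text: str) -> str:
    """Drop a FASTA header line, then keep only letters (no spaces, digits, newlines)."""
    lines = text.strip().splitlines()
    if lines and lines[0].startswith(">"):
        lines = lines[1:]
    return "".join(ch for ch in "".join(lines).upper() if ch.isalpha())


def average_mw(seq: str) -> float:
    """Sum of average residue masses plus one water; rejects non-standard codes."""
    unknown = sorted(set(seq) - set(AVG_RESIDUE_MASS))
    if unknown:
        raise ValueError(f"non-standard residue codes: {unknown}")
    return sum(AVG_RESIDUE_MASS[aa] for aa in seq) + WATER


if __name__ == "__main__":
    fasta = ">seq4\nMSTNNYQTLSQNKADRMGPGGSRRPRNSQHATASTPSASSCKEQQKDVEH"
    good = clean_sequence(fasta)
    # Wrong format: the header is treated as sequence, so S, E, Q of "seq4" are counted.
    bad = "".join(ch for ch in fasta.upper() if ch.isalpha())
    print(len(good), round(average_mw(good), 2))
    print(len(bad), round(average_mw(bad), 2))
```

The cleaning step is deliberately strict: it keeps any letter, so ambiguous codes such as X or B reach `average_mw` and raise an error instead of being silently ignored.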
Sliding windows and amino acid scales
Transmembrane domain: alpha helix of 20 amino acids (hydrophobic)
• -> amino acid scales: hydrophobicity and alpha helix
• -> sliding window size: 20 amino acids

ProtScale
Amino acid scale: Kyte & Doolittle (hydrophobicity)
Sliding window size: 21 amino acids

ProtScale
Amino acid scale: Chou & Fasman (alpha helix)
Sliding window size: 21 amino acids

Methods based on HMMs or neural networks (NN)

HMMTOP
• Protein: seq4
• Length: 1127
• N-terminus: IN
• Number of transmembrane helices: 8
• Transmembrane helices: 65-82, 409-432, 445-468, 916-940, 970-993, 1020-1039, 1052-1072, 1089-1106

[Figures: TMHMM (1), TMHMM (2), TMpred (1)]
PSORT II (1): look for the presence of a signal peptide. No signal peptide.
Signal peptides are often predicted as "transmembrane" domains (or vice versa) because both are made of amino acids with similar biochemical properties (hydrophobic, alpha helix).

Transmembrane: summary
• HMMTOP: 8 TM, N-terminus in, one big loop
• PSORT II: 10 TM
• TMpred: 10 TM, N-terminus in
• TMHMM: 11 TM, N-terminus out
• ? missed TM

Multiple alignment of the N-terminal region of seq4 with other family members:

P39986    ...........mtkksfvsspivrdstllvpksliakpyvlpffplyatfaqlyfqqydryikgpewtfvylgtlvslnilvmLmpaW.nvkikakfny.........
O14072    ...........mgskalitspdissgqlyiklptffhlyvwpfalfvypyigyvyqnklyseevryltyiavg...tihalfwLageW.ntkvyclmtc.........
Q9HD20    ..........................................................................................mekweelnshqp......
Q9Y139    lvqyvslhvriptpltgvvlpfvplylsafylwinvtggqendttnndvitadnqtttdnittwndvgfigvlaiaflhiltlLfcyW.svhvlafltc.........
P90747    ................mgvdqlvetiipynlrsiathlyvppftiitaiwtyvwlnifgyeey.yelgmlgyaaifvilalvlLfchW.mmpvrcflmc.........
Q9LT02    ............mssfrvggkvvekvdlcrkkqlvwrldvwpfailytvwlttivpsidfsd.....acialgglsafhilvlLfttW.svdfkcfvqfskmnlellv
Q12697    rsasennrgsfsghddvhnqhseylkpdyhekfypqyapnlhyqrfyiaeedlvigiaayqts.kfwyiiynlccfltfglvyLltrW.lphlkvkly..........
O74431    ngsgvysdeeeitemmleelnihpvlrresvgeaaglsedgccqilylveedlevgiagyktn.ksryrlyqaiclltlglayLifrW.lpkyfirfv..........
Q9N323    ...mstnnyqtlsqnkadrmgpggsrrprnsqhatastpsassckeqqkdvehefdiiayktt.fwrtfffyalsfgtcgifrLflhW.fpkrliqfr..........
Q27533    .............................................mtlesgdhtltlfayrtg.pfrtilfyaltvltlgifrLilhW.kqkwdvkmr..........
Q21286    mtsereplldtttrnrvydttdnpstkimkrekdnpkakttsfnqgklnigeetcdlyayket.igrqilfwlltivtlgfyqLlayW.vkslfvkvr..........
Q9NQ11    ....................msadssplvgstptgygtltigtsidplsssvssvrlsgycgs.pwrvigyhvvvwmmagiplLlfrW.kplwgvrlr..........
O14022    ...................mdsielkqlvpendsepgtprqllfqhydisneetigikpfksi.pakvyilrvteiltlgllhLiltW.lpefrlkwi..........
Q95050    ...........................mrvssieaemenpidvdktdvegelkikqvtllren.........ivkkivfflvaifcsd.rpsvlkkvfy.........
Q9XXW1    aleffiffllsltitygiliirkhiqslflkpsllkdsdyviiytineeyntfyntnyfkkyishinhmihtfikkkkknikknikkWnfqkynilflqfvcnlldii
Consensus -----------------------------------------------------------------------------------L---W--------------------
Segment / Helix tracks: two helical segments, marked a and b.

• The protein is known to contain 12 TM: one TM is missing at the N-terminus.
• The possible ways to find the correct protein topology are a multiple alignment with other family members, or 3D experiments (which are difficult with proteins containing transmembrane domains).

SEQ4 = Q9N323 (Kristian Axelsen: personal communication)

P0AER0
The aquaglyceroporin contains half (½) transmembrane regions, which cannot be predicted by programs because they are too short (less than 20 amino acids).
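The sliding-window principle behind ProtScale, and the reason why short "half" membrane segments such as those of P0AER0 are missed, can be sketched as follows. This is a minimal sketch under stated assumptions: `hydropathy_profile` is a hypothetical helper (not ProtScale's interface), the window mean is reported at the central residue, the 1.6 cut-off is only illustrative (it is the value commonly quoted from Kyte & Doolittle for a window of 19), and the poly-Leu/poly-Lys test sequences are toy examples, not real proteins.

```python
"""Sketch: Kyte-Doolittle hydropathy profile with a sliding window (ProtScale-style).

Assumptions: mean reported at the central residue (1-based position);
the 1.6 threshold is illustrative only.
"""

# Kyte & Doolittle hydropathy scale.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 21) -> list[tuple[int, float]]:
    """Return (central position, mean hydropathy) for every full window."""
    if window % 2 == 0 or window > len(seq):
        raise ValueError("window must be odd and no longer than the sequence")
    half = window // 2
    profile = []
    for start in range(len(seq) - window + 1):
        segment = seq[start:start + window]
        mean = sum(KD[aa] for aa in segment) / window
        profile.append((start + half + 1, mean))
    return profile


if __name__ == "__main__":
    flank = "K" * 20
    full_tm = flank + "L" * 21 + flank  # helix-length hydrophobic stretch (toy)
    half_tm = flank + "L" * 10 + flank  # short "half" membrane segment (toy)
    for name, seq in (("21-residue stretch", full_tm), ("10-residue stretch", half_tm)):
        peak = max(score for _, score in hydropathy_profile(seq))
        print(f"{name}: peak {peak:.2f}, above 1.6: {peak > 1.6}")
```

With these toy sequences, the 21-residue stretch peaks at 3.8 (the value for Leu), whereas the 10-residue stretch never rises above about -0.23, because every 21-residue window dilutes it with at least 11 lysines. Shortening the window would raise that peak, but would also make the profile noisier.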
There is no way to predict such transmembrane regions except by 3D experiments: 3D structure determination is the only way to confirm, and correctly "predict", transmembrane domains. Similarity analysis can then help to predict such regions in other proteins of the same family.
M3 and M7 are "half" (demi) transmembrane segments: not predictable.
http://www.uniprot.org/uniprot/P0AER0#section_features

Look for the transmembrane regions of P31243 (try the different transmembrane prediction programs): your conclusions?
No transmembrane domain is found by any program, because this protein, a porin, is anchored in the membrane by a specific 3D structure called a beta barrel, which does not contain any alpha helix.

Beta barrel
Mainly composed of beta sheets in a 16-stranded beta-barrel formation that forms a pore 1.7-2.5 nm in diameter in the membrane. Note that the orientation of the strands is such that side chains alternately point into the interior and the exterior of the pore; the former are strongly polar residues, while the latter are very hydrophobic.
[Figure: beta barrel of the porin from Rhodobacter]

[Figure: alignment of the two isoforms]
The gene has two in-frame initiation codons, and two different proteins are made by alternative initiation of translation. According to this publication (PubMed: 11274159), there is "Dual targeting of spinach protoporphyrinogen oxidase II to mitochondria and chloroplasts by alternative use of two inframe initiation codons".
[Figure: immunoblot analysis of Protox II in spinach leaf; fractions: total leaf, chloroplasts, mitochondria. Watanabe N et al. J. Biol. Chem. 2001;276:20474-20481. ©2001 by American Society for Biochemistry and Molecular Biology]

Q94IG7, long isoform
• wolfPSORT: chloroplast
• TargetP: chloroplast (CH score: 0.826; MI score: 0.026; ER score: 0.101; other location: 0.060)
• SignalP-NN: not secreted (D score: 0.285)
• SignalP-HMM: not secreted (SP probability: 6.2%; SA probability: 0.2%)
• ChloroP: chloroplast (prediction score: 0.549)
• MITOPROT: mitochondria!
  (exported to mitochondria with a probability of 0.71, unlike all the other predictors)

Q94IG7, short isoform
• wolfPSORT: mitochondrial
• TargetP: mitochondrial (CH score: 0.123; MI score: 0.504; ER score: 0.048; other location: 0.400)
• SignalP-NN: not secreted (D score: 0.195)
• SignalP-HMM: not secreted (SP probability: 3.1%; SA probability: 5%)
• ChloroP: not in chloroplast (prediction score: 0.473)
• MITOPROT: other location!
  (exported to mitochondria with a probability of only 0.33, unlike all the other predictors)

Cysteine (61 modifications) and serine (46 modifications) are the amino acids with the highest number of known associated PTMs.
Beware: RESID considers selenocysteine as a PTM… this is not the case! (Selenocysteine is inserted during translation, at a recoded UGA codon.)

Phosphorylation: P03372
UniProt data: experimentally proven sites.
http://www.phosphosite.org/proteinAction.do?id=968&showAllSites=true
The phosphorylation sites are located on the "surface" of the protein (homodimer), where the amino acids are accessible to the kinases.

O-glycosylation: P02724

Myristoylation: P51876
NMT, Myristoylator (not predictable…)

Secreted protein: fibronectin (P02751)
[Figure: features of fibronectin, labelled "predictable…" (twice) and "not predictable…"]

Can be predicted:
- Subcellular location (PSORT, TargetP)
- Domains (InterPro)
- Signal peptide
- Sulfation
- N-glycosylation
- O-glycosylation
- Phosphorylation (not predictable…)
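N-glycosylation appears in the "can be predicted" list; in its simplest form, that prediction is a motif scan. This is a minimal sketch assuming the PROSITE N-glycosylation site pattern PS00001, N-{P}-[ST]-{P}; `find_sequons` is a hypothetical helper name, and a match only marks a potential site (the asparagine must also be exposed to the lumen of the secretory pathway to be glycosylated).

```python
"""Sketch: scan a protein sequence for potential N-glycosylation sites.

Assumption: PROSITE pattern PS00001, N-{P}-[ST]-{P}. A lookahead regex is
used so that overlapping motifs (e.g. in "NNSSA") are all reported.
"""
import re

# N, then any residue except P, then S or T, then any residue except P.
SEQUON = re.compile(r"(?=(N[^P][ST][^P]))")


def find_sequons(seq: str) -> list[tuple[int, str]]:
    """Return (1-based position of the N, matched 4-residue motif) pairs."""
    return [(m.start() + 1, m.group(1)) for m in SEQUON.finditer(seq.upper())]


if __name__ == "__main__":
    # Residues 301-350 of seq4 (the sequence given at the start of this section).
    fragment = "VIPPQGCMMYCDAVLLNGTCIVNESMLTGESIPITKSAISDDGHEKIFSI"
    for pos, motif in find_sequons(fragment):
        print(pos, motif)
```

For this fragment the scan reports NGTC at position 17 and NESM at position 23, i.e. residues 317 and 323 of seq4; whether these sites are actually glycosylated cannot be decided from the motif alone. The lookahead matters because a plain `finditer` would consume four residues per match and skip an overlapping second motif.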