embr201540796-sup-0001-Appendix

Appendix Supplementary Methods
GALNT3 Stable Integration
A synthetic full-length open reading frame (ORF) of GALNT3 (Genewiz, USA), flanked by 5′
and 3′ ApaI and XbaI restriction sites, was inserted into the ApaI/XbaI sites of the
pAAVS1Sig2A-GALNT2-ZFN integration vector (pAAVS1-T2), from which the existing ApaI/XbaI
GALNT2 cassette had been removed. The resulting integration vector was designated
pAAVS1-T3. As previously described for GalNAc-T2, expression of the integrated pAAVS1-T3
construct was driven by the endogenous AAVS1 promoter.
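The cloning strategy above relies on ApaI and XbaI cutting only at the flanks of the ORF. As a minimal sketch, assuming ApaI at the 5′ end, XbaI at the 3′ end and a hypothetical two-base clamp (none of these details is stated in the text), the following checks an ORF for internal sites before adding the flanks. The helper names find_site and flank_orf are illustrative only.

```python
APAI_SITE = "GGGCCC"  # ApaI recognition sequence (cuts GGGCC^C)
XBAI_SITE = "TCTAGA"  # XbaI recognition sequence (cuts T^CTAGA)


def find_site(seq, site):
    """Return all 0-based start positions of `site` in `seq`.

    Both sites are palindromic, so scanning one strand also covers
    the reverse complement.
    """
    seq = seq.upper()
    hits = []
    start = seq.find(site)
    while start != -1:
        hits.append(start)
        start = seq.find(site, start + 1)
    return hits


def flank_orf(orf, clamp="GC"):
    """Hypothetical helper: add a 5' ApaI and a 3' XbaI site around an ORF.

    `clamp` is an assumed short extra sequence that lets the enzymes cut
    close to the fragment ends; the actual synthetic construct is not given.
    """
    for name, site in (("ApaI", APAI_SITE), ("XbaI", XBAI_SITE)):
        if find_site(orf, site):
            raise ValueError(f"ORF contains an internal {name} site")
    return clamp + APAI_SITE + orf.upper() + XBAI_SITE + clamp
```

An ORF with an internal ApaI or XbaI site would be cut during ApaI/XbaI digestion, which is why the check precedes the flanking step.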
Sample Processing and Lectin Weak Affinity Chromatography
Secretomes: Cell culture supernatant was cleared of cell debris, diluted 2-fold in LACA+H
buffer, and loaded twice onto a 0.8 mL VVA (Vicia villosa agglutinin; Vector Laboratories)
lectin column, followed by a wash with 20 column volumes (CV) of the same buffer. Finally,
bound glycoproteins were eluted by heating the lectin slurry in 0.02% RapiGest (Waters).
Total cell lysates: Packed cells were lysed in 0.1% RapiGest (Waters) in 50 mM ammonium
bicarbonate using a Sonic Dismembrator (Fisher Scientific), and the lysates were cleared by
centrifugation (1,000 × g, 10 min). The cleared lysate and secretome samples were heated for
10 min at 80°C, followed by reduction (5 mM DTT, 60°C, 0.5 h), alkylation (10 mM
iodoacetamide, room temperature, 30 min), and digestion with trypsin (25 µg/sample; Roche)
(37°C, overnight). Digests were treated with concentrated TFA (6 µl, 37°C, 20 min) and
cleared by centrifugation.
Quantitative O-glycoproteomic Strategy
We evaluated glycopeptide quantification based on medium/light (M/L) isotope-labeled doublet
ratios to estimate a meaningful cut-off ratio for substantial changes. First, we labeled equal
amounts of a HeLa cell digest (Thermo Scientific) with light (L) or medium (M) isotopes, mixed
these at a 1:1 ratio, and analyzed them by nLC-MS/MS. Assuming a normal (Gaussian)
distribution, we determined that 99% of all quantified doublet M/L ratios fell within ±1
(log10 scale) (Fig. 1). Next, we labeled and mixed 1:1 digests from HepG2SC and HepG2SCT2,
and subjected these to LWAC separation. Comparing the distribution of labeled peptides in the
LWAC flow-through with that of labeled glycopeptides in the LWAC elution fractions showed that
the quantified peptide M/L ratios were normally distributed, with 99% falling within ±1
(log10), whereas the glycopeptide M/L ratios followed a more asymmetric distribution with left
and right “shoulders”, indicating a higher abundance of “light” or “medium” labeled
glycopeptides. Importantly, the labeled glycopeptides produced doublets with varying ratios of
the isotopic ions as well as a significant number of single precursor ions without evidence of
ion pairs. These arose from either the light or the medium labeled sample and so could not be
quantified. We therefore classified identified labeled precursor ions as doublets or singlets.
Doublets are glycopeptides for which the precursor peak areas of both the light and the medium
labeled forms, and hence their relative abundance, could be calculated; singlets are unique
glycopeptides for which only the light or only the medium precursor was identified. Because
standard proteomics software (Proteome Discoverer) cannot automatically recognize singlets as
representing differential expression, we evaluated and assigned singlets manually. When
analysing differential O-glycoproteomes of isogenic cells with and without a single GalNAc-T
isoform, quantitative differences in doublets would suggest that loss or gain of the GalNAc-T
isoform resulted in partial changes in glycosylation capacity, whereas the occurrence of
singlets would suggest complete loss of glycosylation capacity. Singlets could, however, also
derive from experimental (or technical) errors, including sample variation and ions hidden by
chemical noise, or from true biological variation, such as changes in the proteome with loss or
gain of expression of particular proteins or their processed products. Nevertheless, it is
reasonable to consider singlets as candidates for isoform-specific O-glycosylation events.
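The ±1 (log10) cut-off can be reproduced in outline from a list of doublet M/L ratios. The sketch below, which assumes the ratios are available as plain numbers and uses invented demo data, fits a normal distribution to the log10 ratios and returns the symmetric interval expected to contain 99% of them, together with the empirical fraction of ratios inside ±1. It illustrates the reasoning only and is not the procedure used to produce Fig. 1.

```python
import math
import random
import statistics


def log10_ratios(ratios):
    """Convert M/L ratios to the log10 scale used for the cut-off."""
    return [math.log10(r) for r in ratios]


def gaussian_cutoff(ratios, coverage=0.99):
    """Interval (log10 scale) expected to hold `coverage` of the ratios,
    assuming their log10 values are normally distributed."""
    logs = log10_ratios(ratios)
    mu = statistics.fmean(logs)
    sd = statistics.stdev(logs)
    z = statistics.NormalDist().inv_cdf(0.5 + coverage / 2)  # about 2.576 for 99%
    return mu - z * sd, mu + z * sd


def fraction_within(ratios, bound=1.0):
    """Empirical fraction of ratios with |log10(M/L)| <= bound."""
    logs = log10_ratios(ratios)
    return sum(abs(x) <= bound for x in logs) / len(logs)


# Hypothetical 1:1 mix: log10 ratios scattered around 0 with sd 0.3.
random.seed(0)
demo = [10 ** random.gauss(0.0, 0.3) for _ in range(5000)]
print(gaussian_cutoff(demo))
print(fraction_within(demo))
```

A ratio whose log10 value falls outside the fitted interval would be treated as a substantial change; the asymmetric glycopeptide distribution is exactly what this symmetric model does not capture.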
Isoelectric focusing
LWAC fractions from total cell lysate digests were screened by preliminary LC-MS for
glycopeptide content, and the fractions most enriched in glycopeptides were pooled, dried by
vacuum centrifugation, reconstituted in IPG rehydration buffer, and subjected to IEF
fractionation [1]. Isoelectric focusing was performed on a 3100 OFFGEL Fractionator
(Agilent) using pH 3–10 strips (GE Healthcare). Twelve fractions were collected, desalted on
custom StageTips (C18 sorbent, Empore, 3M), and analyzed by LC-MS with HCD/ETD-MS/MS as
described below.
Mass spectrometry
An EASY-nLC 1000 UHPLC (Thermo Scientific) interfaced via a nanoSpray Flex ion source to an
LTQ-Orbitrap Velos Pro mass spectrometer (Thermo Scientific) was used for glycopeptide
analysis. A precursor MS1 scan (m/z 350–1,700) of intact peptides was acquired in the
Orbitrap at a nominal resolution setting of 30,000, followed by Orbitrap HCD-MS2 and ETD-MS2
(m/z 100–2,000) of the five most abundant multiply charged precursors in the MS1 spectrum. A
minimum MS1 signal threshold of 50,000 was used to trigger data-dependent fragmentation
events, and MS2 spectra were acquired at a resolution of 7,500 for HCD MS2 and 15,000 for ETD
MS2. Activation times were 30 and 200 ms for HCD and ETD fragmentation, respectively; the
isolation width was 4 mass units, and usually 1 microscan was collected for each spectrum.
Automatic gain control (AGC) targets were 1,000,000 ions for Orbitrap MS1 and 100,000 for MS2
scans, and the AGC target for the fluoranthene reagent ion used for ETD was 300,000.
Supplemental activation (20%) of the charge-reduced species was used in the ETD analysis to
improve fragmentation. Dynamic exclusion for 60 s was used to prevent repeated analysis of the
same components. Polysiloxane ions at m/z 445.12003 were used as a lock mass in all runs.
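The data-dependent precursor selection described above (top five multiply charged ions above 50,000, then 60 s dynamic exclusion) can be modeled roughly as follows. This is a simplified sketch, not the instrument's firmware logic: the Precursor type and select_precursors function are hypothetical, and the 10 ppm window used to match excluded masses is an assumption, since the text gives only the exclusion time.

```python
from dataclasses import dataclass


@dataclass
class Precursor:
    mz: float         # precursor m/z
    charge: int       # assigned charge state
    intensity: float  # MS1 signal


def select_precursors(ms1, now, exclusion, top_n=5, min_signal=50_000,
                      exclusion_s=60.0, tol_ppm=10.0):
    """Choose up to `top_n` precursors from one MS1 scan for HCD and ETD MS2.

    exclusion: list of (mz, expires_at) pairs, updated in place.
    tol_ppm: hypothetical m/z window for matching excluded masses.
    """
    # Forget exclusion entries whose 60 s window has elapsed.
    exclusion[:] = [(mz, t) for mz, t in exclusion if t > now]

    def is_excluded(mz):
        return any(abs(mz - ex) / ex * 1e6 <= tol_ppm for ex, _ in exclusion)

    candidates = [p for p in ms1
                  if p.charge >= 2                # multiply charged only
                  and p.intensity >= min_signal   # MS1 trigger threshold
                  and not is_excluded(p.mz)]
    candidates.sort(key=lambda p: p.intensity, reverse=True)
    chosen = candidates[:top_n]
    # Each selected precursor is excluded for the next `exclusion_s` seconds.
    for p in chosen:
        exclusion.append((p.mz, now + exclusion_s))
    return chosen
```

Calling select_precursors on successive MS1 scans with the same exclusion list shows how dynamic exclusion lets lower-abundance ions be picked once the most intense ones have been fragmented.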
Data analysis
Data processing was performed using Proteome Discoverer 1.4 software (Thermo Scientific)
as previously described, with small changes [2]. Because of its higher processing speed, we
used Sequest HT instead of Sequest. All spectra were initially searched with full cleavage
specificity; spectra assigned medium or low confidence, or left unassigned, were then searched
again with semi-specific enzymatic cleavage. In all cases the precursor mass tolerance was set
to 6 ppm and the fragment ion mass tolerance to 50 mmu.
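The precursor and fragment tolerances are expressed in different units: ppm scales with m/z, whereas mmu is an absolute window. A minimal sketch of both checks (the function names are illustrative, not Proteome Discoverer settings):

```python
def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def precursor_match(observed_mz, theoretical_mz, tol_ppm=6.0):
    """Relative tolerance, as used for precursor ions."""
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm


def fragment_match(observed_mz, theoretical_mz, tol_mmu=50.0):
    """Absolute tolerance, as used for fragment ions (1 mmu = 0.001)."""
    return abs(observed_mz - theoretical_mz) * 1000 <= tol_mmu
```

For example, at m/z 1,000 a 6 ppm tolerance corresponds to ±0.006, whereas 50 mmu is ±0.05 at any m/z.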
Carbamidomethylation on cysteine residues was used as a fixed modification. Methionine
oxidation and HexNAc attachment to serine, threonine and tyrosine were used as variable
modifications for ETD MS2. All HCD MS2 spectra were pre-processed as described [2] and
searched under the same conditions, using only methionine oxidation as a variable
modification. All spectra were searched against a concatenated forward/reverse human-specific
database (UniProt, January 2013, containing 20,232 canonical entries and another 251 common
contaminants) using a target false discovery rate (FDR) of 1%. FDR was calculated using the
Target Decoy PSM Validator node of the Proteome Discoverer workflow. The resulting list was
filtered to include only peptides carrying glycosylation as a modification, yielding a final
list of glycoproteins, each identified by at least one unique glycopeptide. ETD MS2 data were
used for unambiguous site assignment. HCD MS2 data were used for unambiguous site assignment
only if the number of GalNAc residues equaled the number of potential sites on the peptide.
Glycopeptide M/L ratios were determined using the dimethyl 2plex method. The M/L ratio of each
detected precursor ion doublet was calculated by sequentially applying the Event Detector node
and the Precursor Ions Quantifier node of the Proteome Discoverer workflow. The Event Detector
node quantified peak areas by clustering the isotopes of precursor ions that elute at the same
retention time, and the isotopically labeled ions were then quantified using the Precursor
Ions Quantifier node.
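The Target Decoy PSM Validator node performs the 1% FDR estimation internally. The sketch below shows only the general target-decoy idea, assuming that the estimated FDR at a score cut-off is the number of decoy PSMs divided by the number of target PSMs at or above it; the node's exact computation (score used, tie handling, q-values) is not modeled, and the function names are hypothetical.

```python
def fdr_score_threshold(psms, max_fdr=0.01):
    """Loosest score cut-off whose estimated FDR (decoys / targets) <= max_fdr.

    psms: iterable of (score, is_decoy) pairs; higher scores are better.
    Returns None if no cut-off reaches the requested FDR.
    """
    targets = decoys = 0
    threshold = None
    for score, is_decoy in sorted(psms, key=lambda p: p[0], reverse=True):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= max_fdr:
            threshold = score
    return threshold


def accepted_targets(psms, max_fdr=0.01):
    """Scores of target PSMs passing the FDR-derived cut-off."""
    cut = fdr_score_threshold(psms, max_fdr)
    if cut is None:
        return []
    return [score for score, is_decoy in psms if not is_decoy and score >= cut]
```

Because decoy sequences come from the reversed half of the concatenated database, their count above a cut-off serves as an estimate of the false target matches above that same cut-off.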
Manual validation of precursor ions without reported quantitative values (singlets) in a
selected subset of the data revealed that a minimum signal threshold of at least 5 × 10^5 is
required to distinguish substantial up- or down-regulation from chemical noise. We inspected
all such cases manually by extracting ion chromatograms (XICs) of the assigned precursor ions
and of the respective potential dimethyl pair partner in the raw data, and reported them as
outliers (“singlets”).
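The doublet/singlet logic and the 5 × 10^5 threshold can be summarized in a small decision function. This is a hypothetical sketch: it assumes that the threshold applies to the signal of the observed precursor and reuses the ±1 (log10) cut-off from the 1:1 HeLa experiment. Singlets it flags would still require the manual XIC inspection described above.

```python
import math
from typing import Optional, Tuple

SINGLET_MIN_SIGNAL = 5e5  # minimum precursor signal stated in the text
LOG10_CUTOFF = 1.0        # +/-1 log10 cut-off from the 1:1 HeLa experiment


def classify_precursor(light: Optional[float],
                       medium: Optional[float]) -> Tuple[str, Optional[float]]:
    """Classify one labeled glycopeptide precursor.

    `light` and `medium` are the precursor signals of the L and M forms,
    or None when that form was not detected.
    Returns (category, log10 M/L ratio or None).
    """
    if light and medium:
        # Doublet: both forms seen, so a ratio can be computed.
        log_ratio = math.log10(medium / light)
        if abs(log_ratio) > LOG10_CUTOFF:
            return "doublet, substantial change", log_ratio
        return "doublet", log_ratio
    # Only one form seen: a singlet candidate if its signal is high enough.
    signal = light or medium
    if signal is None or signal < SINGLET_MIN_SIGNAL:
        return "below threshold", None
    return ("light singlet" if light else "medium singlet"), None
```

Keeping singlets separate from doublets mirrors the interpretation in the quantitative strategy: doublets report partial changes, while singlets are candidates for complete loss of glycosylation at a site.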
RNA transcriptomic analysis
The analysis was performed on total RNA from two clones each of HepG2WTT1, T2, T3,
and two pools of HepG2WT cells. The Beijing Genomics Institute (BGI) performed the RNA
transcriptomic analysis (RNAseq). Briefly, libraries were constructed using the Illumina
TruSeq RNA Sample Preparation Kit. High-quality total RNA was treated with DNase I, and mRNA
was isolated by oligo(dT) magnetic bead enrichment, followed by mRNA fragmentation, cDNA
synthesis, addition of a single adenine (A) nucleotide, and adapter ligation. Agarose
gel-purified fragments (~160 bp) were PCR-amplified, subjected to quality control (QC), and
sequenced on an Illumina HiSeq 2000 system (Illumina, USA). Raw reads were quality-filtered,
and the resulting clean reads were aligned to human reference sequences with
SOAPaligner/SOAP2 [3]. Each sample yielded approximately 60 million reads, with an average of
75% of reads mapped and an average of 50% perfect-match reads. Gene coverage was 90–100% for
60% of genes and 0–10% for 8% of genes.
Bioinformatic analysis
Raw read counts from the RNAseq analysis were analysed using the DESeq [4] and edgeR
[5] packages for R and Bioconductor. Biological coefficients of variation (BCV) were derived
(WT, 0.03; T1KO, 0.24; T2KO, 0.29; T3KI, 0.35). DESeq and edgeR analyses were run using
default parameters, following previously defined methods [6]. For the edgeR analysis, counts
were filtered to retain only genes with more than 1 count per million in at least 2 samples.
DESeq cutoffs were set at an adjusted p-value of 0.1, and edgeR cutoffs at a p-value of 0.05,
as per the referenced method. A further filtering step was applied in which the difference in
log2 RPKM between samples was required to lie between -1 and 1 (i.e. only transcripts with
less than a 2-fold difference in RPKM between the samples were retained); transcripts that did
not pass this filter were rejected. Finally, the Entrez gene identifiers present in both the
edgeR and DESeq differential lists were selected as the final list of differentially
expressed genes.
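The filtering steps above can be sketched as follows, assuming count and RPKM values are held in plain Python dicts or sets keyed by Entrez gene ID (edgeR and DESeq themselves are R packages and are not reproduced here). Which pair of samples enters the RPKM comparison, and how zero values were handled, are not stated in the text; the zero handling below is an assumption, and the function names are hypothetical.

```python
import math


def cpm_filter(counts, min_cpm=1.0, min_samples=2):
    """Keep genes with more than `min_cpm` counts per million in at least
    `min_samples` samples.

    counts: dict gene_id -> list of raw counts, one per sample (same order).
    Library sizes are taken as the per-sample sums over all genes.
    """
    n_samples = len(next(iter(counts.values())))
    lib_sizes = [sum(row[j] for row in counts.values()) for j in range(n_samples)]
    keep = set()
    for gene, row in counts.items():
        n_above = sum(c * 1e6 / lib_sizes[j] > min_cpm for j, c in enumerate(row))
        if n_above >= min_samples:
            keep.add(gene)
    return keep


def rpkm_within_twofold(rpkm_a, rpkm_b):
    """True if |log2(rpkm_a) - log2(rpkm_b)| < 1 (less than 2-fold apart).

    A zero RPKM is treated as failing (an assumption; not stated in the text).
    """
    if rpkm_a <= 0 or rpkm_b <= 0:
        return False
    return abs(math.log2(rpkm_a) - math.log2(rpkm_b)) < 1


def final_de_genes(deseq_hits, edger_hits, rpkm_pass):
    """Entrez IDs called by both DESeq and edgeR that also pass the RPKM filter."""
    return set(deseq_hits) & set(edger_hits) & set(rpkm_pass)
```

Taking the intersection of the two callers keeps only genes on which both statistical models agree, trading some sensitivity for a more conservative final list.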
GO enrichment analysis was performed using the GO and GOstats [7] packages available in
Bioconductor. For initial enrichments, conditional testing was used so that, where possible,
the most specific terms were reported as enriched. Both the GO database and the UniProt GO
annotation were used as sources of GO annotations: UniProt identifiers were first mapped to
Entrez IDs, and the UniProt GO annotation was combined with the GO.db annotations. A
hypergeometric test was applied to this combined annotation to calculate GO enrichment for
sets of genes and proteins.
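The hypergeometric test at the core of the enrichment can be written directly. This sketch computes the plain, unconditional one-sided p-value for a single GO term; GOstats' conditional testing, which additionally discounts genes already explained by significant, more specific child terms, is not reproduced. The function name and argument order are illustrative.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: annotated genes in the universe, K: universe genes carrying the GO term,
    n: genes in the query list, k: query genes carrying the term.
    """
    total = comb(N, n)
    upper = min(K, n)
    # Sum the probabilities of observing k, k+1, ... term genes in the list.
    return sum(comb(K, i) * comb(N - K, n - i)
               for i in range(max(k, 0), upper + 1)) / total
```

A small p-value means that more genes in the list carry the term than would be expected if the list had been drawn at random from the annotated universe.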
Additional references
1. Vakhrushev SY, Steentoft C, Vester-Christensen MB, Bennett EP, Clausen H, Levery SB (2013)
Enhanced mass spectrometric mapping of the human GalNAc-type O-glycoproteome with
SimpleCells. Mol Cell Proteomics 12: 932-44
2. Steentoft C, Vakhrushev SY, Joshi HJ, Kong Y, Vester-Christensen MB, Schjoldager KT,
Lavrsen K, Dabelsteen S, Pedersen NB, Marcos-Silva L, et al. (2013) Precision mapping of the
human O-GalNAc glycoproteome through SimpleCell technology. EMBO J 32: 1478-88
3. Li R, Yu C, Li Y, Lam TW, Yiu SM, Kristiansen K, Wang J (2009) SOAP2: an improved ultrafast
tool for short read alignment. Bioinformatics 25: 1966-7
4. Anders S, Huber W (2010) Differential expression analysis for sequence count data. Genome
Biol 11: R106
5. McCarthy DJ, Chen Y, Smyth GK (2012) Differential expression analysis of multifactor
RNA-Seq experiments with respect to biological variation. Nucleic Acids Res 40: 4288-97
6. Anders S, McCarthy DJ, Chen Y, Okoniewski M, Smyth GK, Huber W, Robinson MD (2013)
Count-based differential expression analysis of RNA sequencing data using R and
Bioconductor. Nat Protoc 8: 1765-86
7. Falcon S, Gentleman R (2007) Using GOstats to test gene lists for GO term association.
Bioinformatics 23: 257-8