Appendix

Supplementary Methods

GALNT3 Stable Integration

A synthetic full open reading frame (ORF) of GALNT3 (Genewiz, USA), flanked by 5′ ApaI and 3′ XbaI restriction sites, was inserted into the ApaI/XbaI sites of the pAAVS1Sig2A-GALNT2-ZFN integration vector (pAAVS1-T2), from which the existing ApaI/XbaI GALNT2 cassette had been removed. The resulting integration vector was designated pAAVS1-T3. Expression of the integrated pAAVS1-T3 construct was driven by the endogenous AAVS1 promoter, as previously described for GalNAc-T2.

Sample Processing and Lectin Weak Affinity Chromatography (LWAC)

Secretomes: Cell culture supernatant was cleared of cell debris, diluted 2-fold with LACA+H buffer, and loaded twice onto a 0.8 mL VVA (Vicia villosa agglutinin) lectin column (Vector Laboratories), followed by a wash with 20 column volumes (CV) of the same buffer. Finally, bound glycoproteins were eluted by heating the lectin slurry in 0.02% RapiGest (Waters).

Total cell lysates: Packed cells were lysed in 0.1% RapiGest (Waters) in 50 mM ammonium bicarbonate using a Sonic Dismembrator (Fisher Scientific), and the solutions were cleared by centrifugation (1,000 × g for 10 min).

The cleared lysate and secretome samples were heated for 10 min at 80 °C, followed by reduction (5 mM DTT, 60 °C, 0.5 h), alkylation (10 mM iodoacetamide, room temperature, 30 min), and digestion with trypsin (25 µg/sample; Roche) at 37 °C overnight. Digests were treated with concentrated TFA (6 µl, 37 °C, 20 min) and cleared by centrifugation.

Quantitative O-glycoproteomic Strategy

Glycopeptide quantification based on M/L isotope-labeled doublet ratios was evaluated to estimate a meaningful cut-off ratio for substantial changes. First, we labeled equal amounts of a HeLa cell digest (Thermo Scientific) with light (L) or medium (M) isotopes, mixed them in a 1:1 ratio, and analyzed the mixture by nLC-MS/MS. We determined that 99% of all quantified doublet M/L ratios fell within ±1 (log10 scale), consistent with a normal (Gaussian) distribution (Fig. 1).
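The cut-off evaluation above can be illustrated with a minimal sketch. It assumes that the light and medium precursor peak areas of each doublet are available as plain number pairs; the function names and the input layout are hypothetical and are not part of the Proteome Discoverer workflow used in this study.

```python
import math
import statistics


def log10_ratios(pairs):
    """Return log10(M/L) for each (light_area, medium_area) pair.

    Pairs with a non-positive area are skipped because no ratio can be formed.
    """
    return [math.log10(m / l) for l, m in pairs if l > 0 and m > 0]


def fraction_within(log_ratios, cutoff=1.0):
    """Fraction of log10 ratios whose absolute value is at most `cutoff`."""
    if not log_ratios:
        raise ValueError("no ratios to evaluate")
    return sum(abs(r) <= cutoff for r in log_ratios) / len(log_ratios)


def gaussian_interval(log_ratios, coverage=0.99):
    """Central interval covering `coverage` of a normal fit to the log ratios."""
    mu = statistics.fmean(log_ratios)
    sigma = statistics.stdev(log_ratios)
    z = statistics.NormalDist().inv_cdf(0.5 + coverage / 2)
    return mu - z * sigma, mu + z * sigma
```

For a 1:1 mixture, `fraction_within(log10_ratios(pairs), cutoff=1.0)` would be expected to be close to 0.99 if the ±1 cut-off is appropriate, and `gaussian_interval` gives the corresponding interval under the normality assumption.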
Next, we labeled and mixed 1:1 digests from HepG2SC and HepG2SCT2 and subjected them to LWAC separation. Comparing the distribution of labeled peptides from the LWAC flow-through with that of labeled glycopeptides from the LWAC elution fractions showed that the quantified peptide M/L ratios were normally distributed, with 99% falling within ±1 (log10), whereas the glycopeptide M/L ratios followed a more asymmetric distribution with left and right “shoulders”, indicating a higher abundance of “light” or “medium” labeled glycopeptides. Importantly, the labeled glycopeptides produced doublets with varying ratios of the isotopic ions as well as a significant number of single precursor ions without evidence of ion pairs. These arose from either the light- or the medium-labeled sample and could therefore not be quantified as ratios.

We therefore classified identified labeled precursor ions as doublets or singlets. Doublets are glycopeptides for which precursor peak areas of both the light- and the medium-labeled peptide were detected, so that relative abundances could be calculated. Singlets are unique glycopeptides for which only one of the light or medium peptide precursors was identified. Singlets are not automatically recognized as representing differential expression by standard proteomics software (Proteome Discoverer), so we inspected them manually and assigned them as such. When analysing differential O-glycoproteomes of isogenic cells with and without a single GalNAc-T isoform, quantitative differences in doublets would suggest that loss or gain of a GalNAc-T isoform resulted in partial changes in glycosylation capacity, while the occurrence of singlets would suggest complete loss of glycosylation capacity.
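The doublet/singlet classification can be sketched as follows. This is an illustrative simplification that assumes each identified glycopeptide carries an optional light and an optional medium precursor peak area; the function and label names are hypothetical, and the manual inspection of singlets performed in this study is not reproduced.

```python
from typing import Optional


def classify_precursor(light_area: Optional[float],
                       medium_area: Optional[float]) -> str:
    """Label a glycopeptide as a doublet or singlet from its precursor areas.

    A channel counts as detected only if an area is present and positive.
    """
    has_light = light_area is not None and light_area > 0
    has_medium = medium_area is not None and medium_area > 0
    if has_light and has_medium:
        return "doublet"          # M/L ratio can be calculated
    if has_light:
        return "singlet_light"    # only the light precursor was found
    if has_medium:
        return "singlet_medium"   # only the medium precursor was found
    return "undetected"
```

In this scheme only precursors labeled "doublet" yield an M/L ratio; both singlet labels mark candidates that still require manual inspection of the raw data.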
Singlets could also arise from experimental or technical errors, including sample variation and ions hidden by chemical noise, as well as from true biological variation, such as a change in the proteome with loss or gain of expression of particular proteins or their processed products. Nevertheless, it is reasonable to consider singlets as candidates for isoform-specific O-glycosylation events.

Isoelectric focusing

LWAC fractions from total cell lysate digests were screened by preliminary LC-MS for glycopeptide content, and the fractions most enriched in glycopeptides were pooled, dried by vacuum centrifugation, reconstituted in IPG rehydration buffer, and submitted to IEF fractionation [1]. Isoelectric focusing was performed on a 3100 OFFGEL fractionator (Agilent) using pH 3-10 strips (GE Healthcare). Twelve fractions were collected, desalted on custom Stage Tips (C18 sorbent from Empore, 3M), and submitted to LC-MS and HCD/ETD-MS/MS as described below.

Mass spectrometry

An EASY-nLC 1000 UHPLC (Thermo Scientific) interfaced via a nanoSpray Flex ion source to an LTQ-Orbitrap Velos Pro spectrometer (Thermo Scientific) was used for glycopeptide analysis. A precursor MS1 scan (m/z 350–1,700) of intact peptides was acquired in the Orbitrap at a nominal resolution setting of 30,000, followed by Orbitrap HCD-MS2 and ETD-MS2 (m/z 100–2,000) of the five most abundant multiply charged precursors in the MS1 spectrum. A minimum MS1 signal threshold of 50,000 was used for triggering data-dependent fragmentation events. MS2 spectra were acquired at a resolution of 7,500 for HCD-MS2 and 15,000 for ETD-MS2. Activation times were 30 and 200 ms for HCD and ETD fragmentation, respectively; the isolation width was 4 mass units, and usually 1 microscan was collected for each spectrum.
Automatic gain control targets were 1,000,000 ions for Orbitrap MS1 and 100,000 for MS2 scans, and the automatic gain control target for the fluoranthene ion used for ETD was 300,000. Supplemental activation (20%) of the charge-reduced species was used in the ETD analysis to improve fragmentation. Dynamic exclusion for 60 s was used to prevent repeated analysis of the same components. Polysiloxane ions at m/z 445.12003 were used as a lock mass in all runs.

Data analysis

Data processing was performed using Proteome Discoverer 1.4 software (Thermo Scientific) as previously described, with small changes [2]. Because of its higher processing speed, we used Sequest HT instead of Sequest. All spectra were initially searched with full cleavage specificity and filtered by confidence level; those at the medium, low, and unassigned levels were then searched again with semi-specific enzymatic cleavage. In all cases the precursor mass tolerance was set to 6 ppm and the fragment ion mass tolerance to 50 mmu. Carbamidomethylation of cysteine residues was used as a fixed modification. Methionine oxidation and HexNAc attachment to serine, threonine, and tyrosine were used as variable modifications for ETD-MS2. All HCD-MS2 spectra were pre-processed as described [2] and searched under the same conditions, using only methionine oxidation as a variable modification. All spectra were searched against a concatenated forward/reverse human-specific database (UniProt, January 2013, containing 20,232 canonical entries and another 251 common contaminants) using a target false discovery rate (FDR) of 1%. FDR was calculated using the Target Decoy PSM Validator node of the Proteome Discoverer workflow. The resulting list was filtered to include only peptides with glycosylation as a modification, giving a final list of glycoproteins each identified by at least one unique glycopeptide. ETD-MS2 data were used for unambiguous site assignment.
HCD-MS2 data were used for unambiguous site assignment only if the number of GalNAc residues was equal to the number of potential sites on the peptide. Glycopeptide M/L ratios were determined using the dimethyl 2plex method. The M/L ratio of each detected precursor ion doublet was calculated by applying, in sequence, the Event Detector node and the Precursor Ions Quantifier node of the Proteome Discoverer workflow. The Event Detector node was used for peak area quantification, clustering isotopes of precursor ions that co-elute. Isotopically labeled ions were then quantified using the Precursor Ions Quantifier node. Manual validation of precursor ions without reported quantitative values (singlets) within a selected pool of data revealed that a minimum signal threshold of 5E5 was required to distinguish substantial up- or down-regulation from chemical noise. We inspected all such cases manually by extracting ion chromatograms (XIC) of the assigned precursor ions and of the respective potential dimethyl pair in the raw data, and reported them as outliers (“singlets”).

RNA transcriptomic analysis

The analysis was performed on total RNA from two clones each of HepG2WTT1, T2, T3, and two pools of HepG2WT cells. The Beijing Genetics Institute (BGI) performed the RNA transcriptomic analysis (RNAseq). Briefly, libraries were constructed using the Illumina TruSeq RNA Sample Preparation Kit. High-quality total RNA was treated with DNase I, and mRNA was isolated by oligo(dT) magnetic bead enrichment, followed by mRNA fragmentation, cDNA synthesis, single-nucleotide A (adenine) addition, and adapter ligation. Agarose gel-purified fragments (~160 bp) were subjected to PCR amplification and quality control (QC) before being subjected to next-generation sequencing on an Illumina HiSeq 2000 system (Illumina, USA). Raw reads were quality-checked, and filtered clean reads were aligned to human reference sequences with SOAPaligner/SOAP2 [3].
Total reads per sample were around 60 million, with an average of 75% total mapped reads and 50% perfect-match reads. Gene coverage was 90-100% for 60% of genes and 0-10% for 8% of genes.

Bioinformatic analysis

Raw read counts from the RNAseq analysis were analysed using the DESeq [4] and edgeR [5] packages for R and Bioconductor. Biological coefficients of variation were derived (WT, 0.03; T1KO, 0.24; T2KO, 0.29; T3KI, 0.35). DESeq and edgeR analyses were run using default parameters, following previously defined methods [6]. For the edgeR analysis, counts were filtered so that at least 2 samples had more than 1 count per million. DESeq cutoffs were set at an adjusted p-value of 0.1, while edgeR cutoffs were set at a p-value of 0.05, as per the referenced method. A further filtering step was applied, in which the difference in log2 RPKM between samples was required to lie between -1 and 1 (i.e., only transcripts with a less than 2-fold difference in RPKM between the samples were retained); transcripts that did not pass this filter were rejected. Finally, the Entrez gene identifiers present in both the edgeR and DESeq differential lists were selected as the final list of differentially expressed genes.

GO enrichment was performed using the GO.db and GOstats [7] packages available in Bioconductor. For initial enrichments, conditional testing was used to ensure that the most specific terms were enriched where possible. Both the GO database and the UniProt GO annotation were used as sources of GO annotations. UniProt identifiers were first mapped to Entrez IDs, and the UniProt GO annotation was combined with the GO.db annotations. A hypergeometric test was applied to this combined annotation to calculate GO enrichment for sets of genes and proteins.

Additional references
1.
Vakhrushev SY, Steentoft C, Vester-Christensen MB, Bennett EP, Clausen H, Levery SB (2013) Enhanced mass spectrometric mapping of the human GalNAc-type O-glycoproteome with SimpleCells. Mol Cell Proteomics 12: 932-44
2. Steentoft C, Vakhrushev SY, Joshi HJ, Kong Y, Vester-Christensen MB, Schjoldager KT, Lavrsen K, Dabelsteen S, Pedersen NB, Marcos-Silva L, et al. (2013) Precision mapping of the human O-GalNAc glycoproteome through SimpleCell technology. The EMBO Journal 32: 1478-88
3. Li R, Yu C, Li Y, Lam TW, Yiu SM, Kristiansen K, Wang J (2009) SOAP2: an improved ultrafast tool for short read alignment. Bioinformatics 25: 1966-7
4. Anders S, Huber W (2010) Differential expression analysis for sequence count data. Genome Biology 11: R106
5. McCarthy DJ, Chen Y, Smyth GK (2012) Differential expression analysis of multifactor RNA-Seq experiments with respect to biological variation. Nucleic Acids Research 40: 4288-97
6. Anders S, McCarthy DJ, Chen Y, Okoniewski M, Smyth GK, Huber W, Robinson MD (2013) Count-based differential expression analysis of RNA sequencing data using R and Bioconductor. Nature Protocols 8: 1765-86
7. Falcon S, Gentleman R (2007) Using GOstats to test gene lists for GO term association. Bioinformatics 23: 257-8