Supplementary data

WGS500 AUTHORSHIP

Steering Committee

Peter Donnelly¹ (Chair), John Bell², David Bentley³, Gil McVean¹, Peter Ratcliffe¹, Jenny Taylor¹,⁴, Andrew Wilkie⁴,⁵

Operations Committee

Peter Donnelly¹ (Chair), John Broxholme¹, David Buck¹, Jean-Baptiste Cazier¹, Richard Cornall¹, Lorna Gregory¹, Julian Knight¹, Gerton Lunter¹, Gilean McVean¹, Jenny Taylor¹,⁴, Ian Tomlinson¹,⁴, Andrew Wilkie⁴,⁵

Sequencing & Experimental Follow up

David Buck¹ (Lead), Christopher Allan¹, Moustafa Attar¹, Angie Green¹, Lorna Gregory¹, Sean Humphray³, Zoya Kingsbury³, Sarah Lamble¹, Lorne Lonie¹, Alistair Pagnamenta¹, Paolo Piazza¹, Guadelupe Polanco¹, Amy Trebes¹

Data Analysis

Gil McVean¹ (Lead), Peter Donnelly¹, Jean-Baptiste Cazier¹, John Broxholme¹, Richard Copley¹, Simon Fiddy¹, Russell Grocock³, Edouard Hatton¹, Chris Holmes¹, Linda Hughes¹, Peter Humburg¹, Alexander Kanapin¹, Stefano Lise¹, Gerton Lunter¹, Hilary Martin¹, Lisa Murray³, Davis McCarthy¹, Andy Rimmer¹, Natasha Sahgal¹, Ben Wright¹, Chris Yau⁶

¹ The Wellcome Trust Centre for Human Genetics, Roosevelt Drive, Oxford, OX3 7BN, UK

² Office of the Regius Professor of Medicine, Richard Doll Building, Roosevelt Drive, Oxford, OX3 7LF, UK

³ Illumina Cambridge Ltd., Chesterford Research Park, Little Chesterford, Essex, CB10 1XL, UK

⁴ NIHR Oxford Biomedical Research Centre, Oxford, UK

⁵ Weatherall Institute of Molecular Medicine, University of Oxford, John Radcliffe Hospital, Headington, Oxford, OX3 9DS, UK

⁶ Imperial College London, South Kensington Campus, London, SW7 2AZ, UK

Supplementary Methods

Whole genome sequencing and analysis in detail

2 µg of DNA was fragmented using a Covaris S2 system with the following settings: Duty Cycle = 10%, Intensity = 5%, Cycles/burst = 200. Distribution of fragments after shearing was determined using a Tapestation 1DK system (Agilent/Lab901). Libraries were constructed using the NEBNext DNA Sample Prep Master Mix Set 1 Kit (NEB) with minor modifications. Ligation of adapters was performed using 6 µl of Illumina Adapters (Multiplexing Sample Preparation Oligonucleotide Kit). Ligated libraries were size selected using 2% E-Gel® EX (Invitrogen) and the distribution of fragments in the purified fraction was determined using a Tapestation 1DK system (Agilent/Lab901). Each library was PCR enriched with 25 µM each of the following custom primers:

Multiplex PCR primer 1.0

5'-AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT-3'

Index primer

5'-CAAGCAGAAGACGGCATACGAGAT[INDEX]CAGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT-3'

Indexes were 8 bp long and part of an indexing system developed in-house. Four independent PCR reactions per sample were prepared, each using 25% of the pre-PCR library volume. After 8 cycles of PCR (cycling conditions as per Illumina recommendations) the four reactions were pooled and purified with AMPure XP beads. The final size distribution was determined using a Tapestation 1DK system (Agilent/Lab901). The concentration of each library was determined by real-time PCR using the Agilent qPCR Library Quantification Kit and an MX3005P instrument (Agilent).
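As an illustration of how the [INDEX] placeholder in the index primer is filled, the following minimal Python sketch substitutes an 8 bp index into the primer sequence given above. The function name and the example index ACGTACGT are hypothetical; the in-house index sequences are not listed in this supplement.

```python
# Index primer template copied from the text; "[INDEX]" marks the index position.
INDEX_PRIMER_TEMPLATE = (
    "CAAGCAGAAGACGGCATACGAGAT[INDEX]CAGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT"
)
PLACEHOLDER = "[INDEX]"


def build_index_primer(index_seq, index_length=8):
    """Return the index primer (5'->3') with the placeholder replaced by index_seq."""
    index_seq = index_seq.upper()
    if len(index_seq) != index_length:
        raise ValueError(f"index must be {index_length} bp, got {len(index_seq)}")
    if set(index_seq) - set("ACGT"):
        raise ValueError("index may contain only A, C, G and T")
    return INDEX_PRIMER_TEMPLATE.replace(PLACEHOLDER, index_seq)


# Hypothetical index for illustration only; the in-house index set is not given.
example_primer = build_index_primer("ACGTACGT")
print(example_primer)
```

The length and alphabet checks are there only to catch a malformed index before a primer is ordered; they are not part of the published protocol.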

Sequencing was performed on a HiSeq2000 as 100 bp paired-end reads, using version 3 sequencing chemistry. A PhiX control was spiked into the library. We ran 2 lanes of the original library. Then, to "top up" to the required coverage, we ran the library in a multiplex of 4.

WGS reads were mapped to the human reference genome (GRCh37d5/hg19) using STAMPY¹ and duplicate reads were removed using Picard. After duplicate read removal, the mean coverage across the genome was 25.6x with 90.4% of bases covered at 15x or more. The mean coverage over the 17.1 Mb region of homozygosity (ROH) identified by SNP analysis was 25.9x with 93.4% of bases covered at 15x or more. Coverage was calculated with custom scripts and the BEDTOOLS package².
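The two coverage statistics reported above (mean depth and the fraction of bases covered at 15x or more) can be computed from per-base depths as in the minimal Python sketch below. This is not the custom script used in the study; the function name and the toy depth values are assumptions for illustration, and the input is assumed to be one depth value per reference base in the region of interest.

```python
def coverage_summary(depths, min_depth=15):
    """Return (mean depth, fraction of bases with depth >= min_depth)."""
    n_bases = len(depths)
    if n_bases == 0:
        raise ValueError("no bases supplied")
    mean_depth = sum(depths) / n_bases
    fraction_covered = sum(1 for d in depths if d >= min_depth) / n_bases
    return mean_depth, fraction_covered


# Toy per-base depths (hypothetical values, not study data).
toy_depths = [0, 10, 15, 20, 30, 45]
mean_depth, fraction_covered = coverage_summary(toy_depths)
print(f"mean coverage: {mean_depth:.1f}x")
print(f"bases at >=15x: {fraction_covered:.1%}")
```

Restricting the input to the bases inside a given interval yields the interval-specific figures in the same way.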

Identification of variant sites and alleles was performed with the in-house software Platypus³, which can detect SNPs and short (<50 bp) indels.

Exome sequencing and analysis in detail for Saudi Arabian kinship

5 µg of DNA was fragmented using a Covaris S2 system with the following settings: Duty Cycle = 20%, Intensity = 4%, Cycles/burst = 200, time = 118 s. Distribution of fragments after shearing was determined using a 2100 Bioanalyser (Agilent). Libraries were constructed using the TruSeq Exome enrichment capture technology (Illumina) as per the manufacturer’s protocol in a technical service provided by Eurofins MWG Operon (www.eurofinsdna.com). Sequencing was performed on the Illumina HiSeq2000 platform as 2x100 bp paired-end reads. Raw sequencing reads were aligned to the consensus genome (hg19), sorted and converted to a BAM file using Mosaik (version 1.1.21; http://bioinformatics.bc.edu/marthlab/Mosaik). The BAM file was indexed and variants were called using samtools (version 0.1.16; Li H.*, Handsaker B.*, Wysoker A., Fennell T., Ruan J., Homer N., Marth G., Abecasis G., Durbin R. and 1000 Genome Project Data Processing Subgroup (2009) The Sequence alignment/map (SAM) format and SAMtools. Bioinformatics, 25, 2078-9). The alignments were optimised for indel calling and indels were called using dindel (version 1.0.12; CA Albers, G Lunter, Daniel G MacArthur, Gilean McVean, Willem H Ouwehand, Richard Durbin. Dindel: Accurate indel calls from short-read data. Genome Research 2010). The resulting list of variants was visualised and assessed using the UCSC Genome Browser (Kent WJ, Sugnet CW, Furey TS, Roskin KM, Pringle TH, Zahler AM, Haussler D. The human genome browser at UCSC. Genome Res. 2002 Jun;12(6):996-1006).

Putative disease-causing variants were verified by PCR amplification and Sanger sequencing.

Genomic DNA was extracted from blood samples by automated DNA extraction on the M48 BioRobot using the MagAttract DNA blood Mini M48 kit (Qiagen 951336) as part of the routine service performed by the Northern Region Genetics service molecular laboratory, and amplified by the Moltaq PCR kit (Molzym P-010-1000). Primer sequences are available on request. All sequencing was performed using bi-directional fluorescent sequencing on an ABI 3730 XL 96 capillary sequencer, with BigDye Version 3.1 chemistry.

1. Lunter G, Goodson M. Stampy: a statistical algorithm for sensitive and fast mapping of Illumina sequence reads. Genome Research 2011; 21:936-939.

2. Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 2010; 26(6):841-842.

3. Rimmer A, Mathieson I, McVean G, Lunter G. Platypus program: Integrated Variant Caller. The Wellcome Trust Centre for Human Genetics, University of Oxford. 2012. http://www.well.ox.ac.uk/platypus/

Table S1 Analysis of whole exome sequence data for Cases 1&2 (ALG14 mutation)

                                                     Sibling 1   Sibling 2
All variants                                         1511        1548
Synonymous substitutions                             1178        1192
Non-synonymous substitutions and mutations in
3’UTR, 5’UTR and splicing mutations                  333         356
Homozygous mutations                                 70          87
Genes with two or more mutations                     36          38

Shared between two siblings / Shared between two siblings but absent from other in-house exomes: 52, 2, 0, 1-Alg14

Table S2 Analysis of whole exome sequence data for Case 5 (ALG2 mutation, c.283_296delGGGGACTGGCTGCinsAGTCCCCGGC)

                                      Candidate interval (9p31.1)   Whole exome
Total SNVs                            72                            59068
Exomic SNVs                           18                            13676
Genomic indels                        12                            6844
Exomic indels                         0                             203
Total exonic variants                 18                            13879
Homozygous                            18                            939
Not in dbSNP                          0                             354
Homozygous and not in dbSNP           0                             12
Segregating with disease phenotype    0                             0

SNV: single nucleotide variation

Table S3. Analysis of whole genome sequencing data for Case 7 (ALG2 mutation, c.203T>G) – Homozygous variants

                                      Homozygous variants   After filtering*
Stopgain                              2                     0
Non-synonymous                        14                    3
Splicing                              3                     0
Insertion (non-frameshift)            7                     0
Insertion (frameshift)                5                     0
Deletion (non-frameshift)             6                     0
Deletion (frameshift)                 1                     0
Other exonic (ncRNA, synonymous)      44                    0
3'UTR                                 113                   0
5'UTR                                 31                    0
TOTAL                                 226                   3

* Removal of variants where sequences were misaligned due to segmental duplication, non-homozygous variants, and variants located on the X chromosome
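The three exclusion criteria named in the note to Table S3 can be expressed as a simple filter, sketched below in Python. This is an illustrative assumption about how such a filter might be written, not the procedure used in the study; the record fields (variant_class, chrom, is_homozygous, in_segmental_duplication) and the toy variants are hypothetical.

```python
from collections import Counter


def keep_variant(variant):
    """Keep a variant only if it passes all three criteria from the table note."""
    return (
        variant["is_homozygous"]
        and not variant["in_segmental_duplication"]
        and variant["chrom"] != "X"
    )


def count_by_class(variants):
    """Tally variants per annotation class, as in the rows of the table."""
    return Counter(v["variant_class"] for v in variants)


# Toy records (hypothetical, not study data).
toy_variants = [
    {"variant_class": "Non-synonymous", "chrom": "9",
     "is_homozygous": True, "in_segmental_duplication": False},
    {"variant_class": "Non-synonymous", "chrom": "X",
     "is_homozygous": True, "in_segmental_duplication": False},
    {"variant_class": "Stopgain", "chrom": "1",
     "is_homozygous": True, "in_segmental_duplication": True},
    {"variant_class": "3'UTR", "chrom": "2",
     "is_homozygous": False, "in_segmental_duplication": False},
]

before = count_by_class(toy_variants)
after = count_by_class(v for v in toy_variants if keep_variant(v))
print("before filtering:", dict(before))
print("after filtering:", dict(after))
```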

Figure S1

[Gel image; labels: II i, ii, iii and III i, ii, iii]

Segregation of the ALG2 c.203T>G variant within the pedigree. Restriction digest analysis of DNA from family members. For numbering of family members see Figure 5A. ALG2 Exon 1 was amplified from genomic DNA and digested with Age I, and products separated on an agarose gel. c.203T>G results in loss of the Age I site. IIii = index case.
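The logic of the restriction digest assay, in which a single base change abolishes an Age I recognition site (ACCGGT), can be sketched in Python as follows. The toy sequence is hypothetical and is not the ALG2 exon 1 sequence; the helper names are likewise assumptions made for illustration.

```python
AGEI_SITE = "ACCGGT"  # Age I recognition sequence; palindromic, so one strand suffices


def has_site(seq, site=AGEI_SITE):
    """Return True if the recognition site occurs anywhere in seq."""
    return site in seq.upper()


def substitute(seq, pos, ref, alt):
    """Return seq with the base at 0-based position pos changed from ref to alt."""
    if seq[pos].upper() != ref:
        raise ValueError(f"expected {ref} at position {pos}, found {seq[pos]}")
    return seq[:pos] + alt + seq[pos + 1:]


# Hypothetical amplicon fragment containing one Age I site (positions 4-9).
toy_reference = "GGCAACCGGTCATG"
toy_variant = substitute(toy_reference, 9, "T", "G")  # T>G at the last base of the site

print(has_site(toy_reference))  # site present: the amplicon would be cut
print(has_site(toy_variant))    # site lost: the amplicon would stay uncut
```

On a gel, a homozygous variant carrier would therefore show only the uncut product, while carriers of the reference allele would show cleaved fragments.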

Figure S2

Reads from whole genome sequencing showing the homozygous variant ALG2 c.203T>G.
