Genome Sequencer FLX+ Whole Genome Sequencing

For Life Science Research use only. Not for diagnostic purposes

Project Information

Client    C1383 - SV912148
          Patrick Delahunty
          Medicinal Genomics

Sample Information

SID      Sample                Sample Type
18052    1 (rec'd 27JUNE11)    Genomic DNA

Quote    SW0001MEDG0511

Deliverable

1. 454 will prepare one library for the customer-supplied plant sample.
2. 454 will sequence this library in 12 runs using the GS FLX+ chemistry. Typical yield is 900,000 to 1.3 million reads per run; however, actual yield may vary. PLEASE NOTE: Customer would like 454 to provide run QC information demonstrating performance of the library.
3. 454 will assemble the resulting data set using the latest released version of the GS Assembler software. PLEASE NOTE: Customer would like to receive initial assembly results following the completion of 6 runs in order to assess the need for the 6 additional sequencing runs.
4. 454 will provide all quality-filtered reads and associated quality scores in FASTA format along with assembly results.

Delivery Date    August 01, 2011

Sequencing Results

Run Name
R_2011_07_12_13_43_18_sc20_kontoudp_400_7075_93839220_SV912148_18052
D_2011_07_13_10_42_02_frontend2_fullProcessing (2.6)

Run Statistics

Region  SID    Sample              HQ Reads  HQ Bases     Avg Read Length  Mode Read Length  %Mixed  %Dots
1       18052  1 (rec'd 27JUNE11)  700,573   448,334,655  640              748               9.41%   3.75%
2       18052  1 (rec'd 27JUNE11)  712,483   448,772,334  630              732               9.96%   3.65%

Table 1.0 - Summary of run metrics
%Mixed displays the percentage of reads filtered out by the mixed filter, where a mixed read is the result of simultaneously sequencing a mixture of different DNA molecules. %Dots shows the percentage of reads filtered by the dots filter. A dot is an instance of 3 successive nucleotide flows that record no incorporation.

Read Length Distribution

Figure 1.0 - Read length distribution of high quality reads.
R1 = 1 (rec'd 27JUNE11), R2 = 1 (rec'd 27JUNE11).

Run Name
R_2011_07_25_16_28_56_sc17_bonvinl1_400_7075_93821020_912148_18052
D_2011_07_26_13_34_02_frontend2_fullProcessing (2.6)

Run Statistics

Region  SID    Sample              HQ Reads  HQ Bases     Avg Read Length  Mode Read Length  %Mixed  %Dots
1       18052  1 (rec'd 27JUNE11)  605,878   355,813,952  587              718               12.95%  3.68%
2       18052  1 (rec'd 27JUNE11)  552,550   335,963,273  608              750               14.69%  5.48%

Table 2.0 - Summary of run metrics
%Mixed displays the percentage of reads filtered out by the mixed filter, where a mixed read is the result of simultaneously sequencing a mixture of different DNA molecules. %Dots shows the percentage of reads filtered by the dots filter. A dot is an instance of 3 successive nucleotide flows that record no incorporation.

Read Length Distribution

Figure 2.0 - Read length distribution of high quality reads. R1 = 1 (rec'd 27JUNE11), R2 = 1 (rec'd 27JUNE11).

Run Name
R_2011_07_25_12_12_32_sc21_bonvinl1_400_7075_93821020_912148_18052
D_2011_07_26_13_30_02_frontend2_fullProcessing (2.6)

Run Statistics

Region  SID    Sample              HQ Reads  HQ Bases     Avg Read Length  Mode Read Length  %Mixed  %Dots
1       18052  1 (rec'd 27JUNE11)  631,895   376,001,214  595              749               9.67%   3.78%
2       18052  1 (rec'd 27JUNE11)  564,899   328,045,722  581              757               12.41%  3.97%

Table 3.0 - Summary of run metrics
%Mixed displays the percentage of reads filtered out by the mixed filter, where a mixed read is the result of simultaneously sequencing a mixture of different DNA molecules. %Dots shows the percentage of reads filtered by the dots filter. A dot is an instance of 3 successive nucleotide flows that record no incorporation.

Read Length Distribution

Figure 3.0 - Read length distribution of high quality reads. R1 = 1 (rec'd 27JUNE11), R2 = 1 (rec'd 27JUNE11).

Run Name
R_2011_07_27_15_26_35_sc17_bonvinl1_400_7075_93821020_912148_18052
D_2011_07_28_12_32_02_frontend2_fullProcessing (2.6)

Run Statistics

Region  SID    Sample              HQ Reads  HQ Bases     Avg Read Length  Mode Read Length  %Mixed  %Dots
1       18052  1 (rec'd 27JUNE11)  588,357   335,946,095  571              688               14.57%  3.75%
2       18052  1 (rec'd 27JUNE11)  596,933   340,072,586  570              710               13.26%  4.01%

Table 4.0 - Summary of run metrics
%Mixed displays the percentage of reads filtered out by the mixed filter, where a mixed read is the result of simultaneously sequencing a mixture of different DNA molecules. %Dots shows the percentage of reads filtered by the dots filter. A dot is an instance of 3 successive nucleotide flows that record no incorporation.

Read Length Distribution

Figure 4.0 - Read length distribution of high quality reads. R1 = 1 (rec'd 27JUNE11), R2 = 1 (rec'd 27JUNE11).

Data Package

The following data files are provided as the delivered product and are available for download at the Roche sFTP site. Please use sftp to obtain the files using a Unix/Linux server. Be sure to type 'bin' to specify binary mode before you 'get' the file. (Alternatively, you can use a browser from Windows or Mac.)

User Name : X
Password  : X
Server    : X

The files are compressed and packaged in .tgz format in Linux, and uploaded to the sFTP server. After downloading (make sure binary file format is specified during download), the .tgz file can be unpacked on a Linux/Unix machine by using the command:

    tar zxvf samplefile.tgz

The archive can also be unpacked by the WinZip or WinRAR programs on the Windows platform. However, it is recommended that the customer use a Linux/Unix-based operating system to manipulate the files, since the volume of data provided is usually much larger than can be comfortably opened and viewed using Microsoft Word or Excel.
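For customers who script their downloads, the unpacking step described above can also be done from Python. The following is a minimal sketch, not part of the delivered package: the function name unpack_archive and the destination directory are hypothetical, and it uses only the standard-library tarfile module.

```python
import sys
import tarfile
from pathlib import Path


def unpack_archive(archive_path, dest_dir="."):
    """Extract a gzip-compressed tar archive (.tgz) and return its member names.

    Hypothetical helper for illustration; roughly equivalent to
    running `tar zxvf <archive>` inside dest_dir.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    root = dest.resolve()
    with tarfile.open(archive_path, mode="r:gz") as archive:
        names = archive.getnames()
        # Refuse members that would be written outside dest_dir
        # (absolute paths or ".." components).
        for name in names:
            target = (dest / name).resolve()
            if root not in (target, *target.parents):
                raise ValueError(f"unsafe path in archive: {name}")
        archive.extractall(dest)
    return names


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python unpack.py samplefile.tgz
    for member in unpack_archive(sys.argv[1]):
        print(member)
```

The printed member names should correspond to the directory layout listed under "File contents" for each archive.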
File contents: (files shown are after unpacking with tar)

R20110712sc20_sffread.tgz
  R_2011_07_12_13_43_18_sc20_kontoudp_400_7075_93839220_SV912148_18052/
  R_2011_07_12_13_43_18_sc20_kontoudp_400_7075_93839220_SV912148_18052/D_2011_07_13_10_42_02_frontend2_fullProcessing/
  R_2011_07_12_13_43_18_sc20_kontoudp_400_7075_93839220_SV912148_18052/D_2011_07_13_10_42_02_frontend2_fullProcessing/sff/
  R_2011_07_12_13_43_18_sc20_kontoudp_400_7075_93839220_SV912148_18052/D_2011_07_13_10_42_02_frontend2_fullProcessing/sff/G5UGEGY02.sff
  R_2011_07_12_13_43_18_sc20_kontoudp_400_7075_93839220_SV912148_18052/D_2011_07_13_10_42_02_frontend2_fullProcessing/sff/G5UGEGY01.sff
  R_2011_07_12_13_43_18_sc20_kontoudp_400_7075_93839220_SV912148_18052/D_2011_07_13_10_42_02_frontend2_fullProcessing/reads/
  R_2011_07_12_13_43_18_sc20_kontoudp_400_7075_93839220_SV912148_18052/D_2011_07_13_10_42_02_frontend2_fullProcessing/reads/1.454Reads.fna
  R_2011_07_12_13_43_18_sc20_kontoudp_400_7075_93839220_SV912148_18052/D_2011_07_13_10_42_02_frontend2_fullProcessing/reads/2.454Reads.fna
  R_2011_07_12_13_43_18_sc20_kontoudp_400_7075_93839220_SV912148_18052/D_2011_07_13_10_42_02_frontend2_fullProcessing/reads/2.454Reads.qual
  R_2011_07_12_13_43_18_sc20_kontoudp_400_7075_93839220_SV912148_18052/D_2011_07_13_10_42_02_frontend2_fullProcessing/reads/1.454Reads.qual

R20110725sc17_sffread.tgz
  R_2011_07_25_16_28_56_sc17_bonvinl1_400_7075_93821020_912148_18052/
  R_2011_07_25_16_28_56_sc17_bonvinl1_400_7075_93821020_912148_18052/D_2011_07_26_13_34_02_frontend2_fullProcessing/
  R_2011_07_25_16_28_56_sc17_bonvinl1_400_7075_93821020_912148_18052/D_2011_07_26_13_34_02_frontend2_fullProcessing/sff/
  R_2011_07_25_16_28_56_sc17_bonvinl1_400_7075_93821020_912148_18052/D_2011_07_26_13_34_02_frontend2_fullProcessing/sff/G6IQQIE02.sff
  R_2011_07_25_16_28_56_sc17_bonvinl1_400_7075_93821020_912148_18052/D_2011_07_26_13_34_02_frontend2_fullProcessing/sff/G6IQQIE01.sff
  R_2011_07_25_16_28_56_sc17_bonvinl1_400_7075_93821020_912148_18052/D_2011_07_26_13_34_02_frontend2_fullProcessing/reads/
  R_2011_07_25_16_28_56_sc17_bonvinl1_400_7075_93821020_912148_18052/D_2011_07_26_13_34_02_frontend2_fullProcessing/reads/1.454Reads.fna
  R_2011_07_25_16_28_56_sc17_bonvinl1_400_7075_93821020_912148_18052/D_2011_07_26_13_34_02_frontend2_fullProcessing/reads/2.454Reads.fna
  R_2011_07_25_16_28_56_sc17_bonvinl1_400_7075_93821020_912148_18052/D_2011_07_26_13_34_02_frontend2_fullProcessing/reads/2.454Reads.qual
  R_2011_07_25_16_28_56_sc17_bonvinl1_400_7075_93821020_912148_18052/D_2011_07_26_13_34_02_frontend2_fullProcessing/reads/1.454Reads.qual

R20110725sc21_sffread.tgz
  R_2011_07_25_12_12_32_sc21_bonvinl1_400_7075_93821020_912148_18052/
  R_2011_07_25_12_12_32_sc21_bonvinl1_400_7075_93821020_912148_18052/D_2011_07_26_13_30_02_frontend2_fullProcessing/
  R_2011_07_25_12_12_32_sc21_bonvinl1_400_7075_93821020_912148_18052/D_2011_07_26_13_30_02_frontend2_fullProcessing/sff/
  R_2011_07_25_12_12_32_sc21_bonvinl1_400_7075_93821020_912148_18052/D_2011_07_26_13_30_02_frontend2_fullProcessing/sff/G6IEU6N02.sff
  R_2011_07_25_12_12_32_sc21_bonvinl1_400_7075_93821020_912148_18052/D_2011_07_26_13_30_02_frontend2_fullProcessing/sff/G6IEU6N01.sff
  R_2011_07_25_12_12_32_sc21_bonvinl1_400_7075_93821020_912148_18052/D_2011_07_26_13_30_02_frontend2_fullProcessing/reads/
  R_2011_07_25_12_12_32_sc21_bonvinl1_400_7075_93821020_912148_18052/D_2011_07_26_13_30_02_frontend2_fullProcessing/reads/1.454Reads.fna
  R_2011_07_25_12_12_32_sc21_bonvinl1_400_7075_93821020_912148_18052/D_2011_07_26_13_30_02_frontend2_fullProcessing/reads/2.454Reads.fna
  R_2011_07_25_12_12_32_sc21_bonvinl1_400_7075_93821020_912148_18052/D_2011_07_26_13_30_02_frontend2_fullProcessing/reads/2.454Reads.qual
  R_2011_07_25_12_12_32_sc21_bonvinl1_400_7075_93821020_912148_18052/D_2011_07_26_13_30_02_frontend2_fullProcessing/reads/1.454Reads.qual

R20110727sc17_sffread.tgz
  R_2011_07_27_15_26_35_sc17_bonvinl1_400_7075_93821020_912148_18052/
  R_2011_07_27_15_26_35_sc17_bonvinl1_400_7075_93821020_912148_18052/D_2011_07_28_12_32_02_frontend2_fullProcessing/
  R_2011_07_27_15_26_35_sc17_bonvinl1_400_7075_93821020_912148_18052/D_2011_07_28_12_32_02_frontend2_fullProcessing/sff/
  R_2011_07_27_15_26_35_sc17_bonvinl1_400_7075_93821020_912148_18052/D_2011_07_28_12_32_02_frontend2_fullProcessing/sff/G6MC6LA02.sff
  R_2011_07_27_15_26_35_sc17_bonvinl1_400_7075_93821020_912148_18052/D_2011_07_28_12_32_02_frontend2_fullProcessing/sff/G6MC6LA01.sff
  R_2011_07_27_15_26_35_sc17_bonvinl1_400_7075_93821020_912148_18052/D_2011_07_28_12_32_02_frontend2_fullProcessing/reads/
  R_2011_07_27_15_26_35_sc17_bonvinl1_400_7075_93821020_912148_18052/D_2011_07_28_12_32_02_frontend2_fullProcessing/reads/1.454Reads.fna
  R_2011_07_27_15_26_35_sc17_bonvinl1_400_7075_93821020_912148_18052/D_2011_07_28_12_32_02_frontend2_fullProcessing/reads/2.454Reads.fna
  R_2011_07_27_15_26_35_sc17_bonvinl1_400_7075_93821020_912148_18052/D_2011_07_28_12_32_02_frontend2_fullProcessing/reads/2.454Reads.qual
  R_2011_07_27_15_26_35_sc17_bonvinl1_400_7075_93821020_912148_18052/D_2011_07_28_12_32_02_frontend2_fullProcessing/reads/1.454Reads.qual

The following is a description of all the files listed above.

File Name               Description
region.454Reads.fna     FASTA file of the individual sequence reads.
region.454Reads.qual    Corresponding quality score values for each base in the sequence reads.
sfffilename.sff         Sequence Flowgram Format (SFF) file that represents all quality-filtered sequences. It contains information on 454 flowgram signals, basecalls and quality scores, and can be used to compile a package suitable for submission to the NCBI trace archive. A description of the SFF format can be found at NCBI at http://www.ncbi.nlm.nih.gov/Traces/trace.cgi?cmd=show&f=formats&m=doc&s=formats
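As a rough illustration of how the delivered .fna and .qual files relate to the run metrics, the sketch below counts reads and bases for one region and derives an average and mode read length. It is not the software used to produce this report. It assumes that both files use FASTA-style ">" header lines, that the read identifier is the first token of the header, and that the .qual file stores whitespace-separated integer scores, one per base; the function names are hypothetical. The reported averages appear consistent with HQ Bases divided by HQ Reads (for example, 448,334,655 / 700,573 is approximately 640), but the rounding used here is an assumption.

```python
from collections import Counter


def read_fasta_records(lines):
    """Yield (read_id, body_lines) for each '>' record in FASTA-style text."""
    read_id, body = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if read_id is not None:
                yield read_id, body
            # Assumption: the identifier is the first token after '>'.
            read_id, body = line[1:].split()[0], []
        else:
            body.append(line)
    if read_id is not None:
        yield read_id, body


def summarize_reads(fna_lines, qual_lines):
    """Return read count, base count, average and mode read length."""
    lengths = {}
    for read_id, body in read_fasta_records(fna_lines):
        lengths[read_id] = sum(len(chunk) for chunk in body)

    # Each base should have exactly one quality score in the .qual file.
    for read_id, body in read_fasta_records(qual_lines):
        n_scores = sum(len(chunk.split()) for chunk in body)
        if lengths.get(read_id) != n_scores:
            raise ValueError(f"length mismatch for read {read_id}")

    total = sum(lengths.values())
    counts = Counter(lengths.values())
    return {
        "reads": len(lengths),
        "bases": total,
        "avg_length": round(total / len(lengths)) if lengths else 0,
        "mode_length": counts.most_common(1)[0][0] if counts else 0,
    }
```

To use it on a delivered region, open the matching pair of files (for example 1.454Reads.fna and 1.454Reads.qual from the same reads/ directory) and pass the two file objects to summarize_reads. Both files are read line by line, but the per-read lengths are held in memory, so a full region needs memory proportional to its read count.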