Class_4_RNAseq-final - Genome Projects at University of

advertisement
4
E S S E N T I A L S O F N E X T G E N E R A T I O N
S E Q U E N C I N G W O R K S H O P 2 0 1 5
U N I V E R S I T Y O F K E N T U C K Y A G T C
Class
RNAseq
Goal: Learn how to use various tool to extract information from RNAseq reads.
Input(s):
magnaporthe_oryzae_70-15_8_supercontigs.fasta
Moryzae_70-15_*_RNA_sample_{1-2}.fastq
magnaporthe_oryzae-70-15_8_transcripts.gtf
Output(s):
70-15_RNA_sample_{1-3}_thout
70-15_RNA_sample_{1-3}_clout
merged.gtf file
gene_exp.diff file
directory
directory
4.1 Mapping RNAseq Reads to a Genome Assembly
We will use TopHat2 to align RNAseq reads to a genome assembly of the fungal strain from which they
were derived (strain 70-15).
Trapnell et al. (2009) TopHat: discovering splice junctions with RNAseq. Bioinformatics 25:11051111. http://tophat.cbcb.umd.edu/
TopHat2 uses the Bowtie2 alignment engine to map RNA seq reads to the genome assembly.
Bowtie2 utilizes an indexed transformation of the genome assembly to perform its alignment, so the
first step is to create the relevant indexes.
Usage: bowtie2-build [options] -f <reference_genome> <index_prefix>
Where <reference genome> is the path to the genome multifasta file and <index_prefix> is the name
to be given to the index.
 Change to the rnaseq directory. Remember, there is no need to leave this directory. All
operations, such as listing of subdirectories, etc. can be performed from this location.
Essentials of Next Generation Sequencing 2015
 Generate the bowtie index:

bowtie2-build –f magnaporthe_oryzae_70-15_8_supercontigs.fasta \
Moryzae
specifies the name of a multifasta file, or a directory containing multiple fasta files
-f
 Create a new directory called index and place the resulting index files inside it (note: the relevant
files will have a .bt2 suffix).
 Use Tophat2 to map each set of RNAseq reads to the bowtie index:
Usage: tophat2 [options] –o <output_dir> <path-to-indexes> <input-file(s)>

tophat2 -p 1 -o 70-15_mycelial_RNA_sample_1_thout index/Moryzae \
Moryzae_70-15_mycelial_RNA_sample_1.fastq
-p
number of processors to use (select 1). Note: you only have one available to you
for this exercise but normally you would run as many as are available.
-o
name of output directory
TopHat2 invoked with the above command will produce an output folder
(70-15_mycelial_RNA_sample1_thout) containing several files and a subdirectory containing log files:
accepted_hits.bam:
contains alignment information for all of the reads that were successfully
mapped to the genome.
align_summary.txt
provides an overall summary of alignment statistics
left_kept_reads_info: minimum read length, maximum read length; total reads; successfully
mapped read.
insertions.bed:
lists nucleotide insertions in the input sequences
deletions.bed:
lists nucleotide deletions in the input sequences
junctions.bed:
lists splice junctions
logs:
records summary data from intermediate steps
prep_reads.info
provides information on filtering of reads
unmapped.bam
contains .bam entries for unmapped reads
 Use a command line function to take a look at the results in the accepted_hits.bam file.
Hint: to view the file, you will either need to change into the output directory created by
TopHat, or specific the complete path to the file you wish to view.
Essentials of Next Generation Sequencing 2015
Page 2 of 16
 Does the output make any sense? No? Let’s quit the Unix function (^c) and use samtools to
convert the .bam file into the human-readable .sam format:

samtools view 70-15_mycelial_RNA_sample_1_thout/accepted_hits.bam
 Whoa! Did you catch all that? Quit the process (^c) and try piping the results through the more
command line function.
 Next use re-direction to write the output to a file.
 Repeat the mapping process for the remaining sequence files (remember that you need to be in
the rnaseq directory):
Moryzae_70-15_mycelial_RNA_sample_2.fastq
Moryzae_70-15_spore_RNA_sample_1.fastq
Hint: you can use the up arrow key to “copy” the previous command to the current
command line buffer. However, you must remember to change the input and output
names to prevent overwriting of previous results.
4.2 Assembling Transcripts From RNAseq Data
We will use cufflinks to build transcripts from RNAseq reads and compare expression profiles between
different RNA samples:
Trapnell et al. (2010) Transcript assembly and quantification by RNA-Seq reveals
unannotated transcripts and isoform switching during cell differentiation. Nature
Biotechnology 28:511-515. http://cufflinks.cbcb.umd.edu/
The first step in differential gene expression analysis is to identify the gene from which each sequence read
is derived. Cufflinks examines the raw RNAseq mapping results and attempts to reconstruct complete
transcripts and identify transcript isoforms based on overlapping alignments.
Usage: cufflinks [options] –o <output_dir> <path/to/accepted_hits.bam>
 Make sure you are in the rnaseq directory.
 Run cufflinks, providing a reference transcriptome in the form of a .gtf file. All one line:

cufflinks –p 1 –g magnaporthe_oryzae_70-15_8_transcripts.gtf \
–o 70-15_mycelial_RNA_sample_1_clout \
70-15_mycelial_RNA_sample_1_thout/accepted_hits.bam
-o
name of output directory
–p
number of processors to use
-g/--gtf-guide
tells cufflinks to use the provided reference annotation to guide transcript
assembly but also to report novel transcripts/isoforms
Essentials of Next Generation Sequencing 2015
Page 3 of 16
Notes:
A) Omitting the –g option (and accompanying .gtf file specification) from the above command would
tell the program to generate a de novo transcript assembly. Alternatively, one can use -G/--GTF which
will tell the program to assemble only those reads that correspond to previously identified
genes/transcripts.
B) The developers recommend that you assemble your replicates individually, i) to speed computation;
and ii) to simplify junction identification. Therefore, you will need to run cufflinks separately for each
of your .bam files. With the above example, the results will be saved in a directory named “7015_mycelial_RNA_sample_1_clout”
 Re-run cufflinks using each of your accepted_hits.bam files, remembering to change
“sample_1” in both the input and output folder names.
 Examine one of the .gtf files produced by cufflinks. See if you can determine what information
is contained in the various columns.
4.3 Merging Transcript Assemblies
We will use cuffmerge to generate a “super-assembly” of transcripts based on the mapping
information from all three RNAseq datasets. Cuffmerge identifies overlaps between alignment data
for different RNAseq datasets. In this way, it can assemble complete transcripts for genes whose
expression levels are too low to allow full transcript reconstruction from a single sequencing lane.
Usage: cuffmerge [options] <list_of_gtf_files>
 Make sure you are in the rnaseq directory
 Open a text editor and create a list of the .gtf files that will be incorporated into the superassembly. The list should have the following format:
./70-15_mycelial_RNA_sample_1_clout/transcripts.gtf
./70-15_mycelial_RNA_sample_2_clout/transcripts.gtf
./70-15_spore_RNA_sample_1_clout/transcripts.gtf …etc.
Here is another example of where a file created by a standard text editor such as Word will not be read
properly by the cuffmerge program and will produce an error.
 Include the .gtf files for the three datasets and save the file using the name assemblies.txt.
 Run cuffmerge (changing filenames as necessary):

cuffmerge –p 1 –s magnaporthe_oryzae_70-15_8_supercontigs.fasta \
-g magnaporthe_oryzae-70-15_8_transcripts.gtf assemblies.txt
-s
points to the genome sequence which is used in the classification of transfrags that do not
correspond to known genes
-p
number of processors to use
-g
include the reference annotation in the merging operation
Essentials of Next Generation Sequencing 2015
Page 4 of 16
 Examine
the merged.gtf file produced by cuffmerge inside of merged_asm. Use
command line tools to interrogate the file to identify novel transcripts that have not been
previously identified. Note: these will lack MGG identifiers.
4.4 Differential Gene Expression Analysis
We will use cuffdiff to determine if any genes are differentially expressed in one of the RNAseq datasets.
To compare gene expression levels, it is necessary to have a set of genes that one wants to interrogate.
For our purposes, we will have cuffdiff use the merged.gtf file produced by cuffmerge, which combines
existing gene annotations (if available) with new information (novel transcripts, isoforms, etc.) generated
from the RNAseq data. It then uses the alignment data (in the .bam files) to calculate and compare
abundances.
Usage: cuffdiff [options] <transcripts.gtf>
<sample1.replicate1.bam,sample1.replicate2.bam…>
<sample2.replicate1.bam,sample2.replicate2.bam…>
Note: experimental replicates are separated with commas; datasets being compared are separated by a
space (i.e.: Set1_rep1,Set1_rep2 Set2_rep1,Set2_rep2)
 For our experiment, we will compare transcript abundance in spores versus two replicates of
mycelium
 Run cuffdiff as follows:

cuffdiff -o diff_out –p 1 –L mycelium,spores \
–u merged_asm/merged.gtf \
./70-15_mycelial_RNA_sample_1_thout/accepted_hits.bam,\
./70-15_mycelial_RNA_sample_2_thout/accepted_hits.bam \
./70-15_spore_RNA_sample_1_thout/accepted_hits.bam
-o
output directory where results will be deposited
-p
number of processors to use
-L
Labels to use for the three samples being compared. These labels will appear at the top of the
relevant columns in the various output files.
-u
Tells cufflinks to do an initial estimation procedure to more accurately weight reads mapping to
multiple locations in the genome
Be sure not to put spaces around the comma!
By default cuffdiff writes results to a file named gene_exp.diff, inside of your defined output folder.
 The gene expression differences are written to the file named gene_exp.diff. View the header of
this file and see if you can determine what information is contained in the various columns. If
necessary, look at the description of the output columns in the following Appendix, or look at the
online cuffdiff manual (cufflinks.cbcb.umd.edu/manual.html)
Essentials of Next Generation Sequencing 2015
 Use the command line to produce a list that contains the identities of the genes that show
significant differences in their expression levels (only the names of the genes and nothing
else). Write this list to a file.
Hint: You will need to use awk.
 Examine the junctions.bed file to determine if the RNAseq data support the existence of novel
transcript isoforms (as evidenced by the presence of novel splice junctions).
 Use the command line to determine how many novel junctions are robust (supported by at least
10 sequencing reads).
Essentials of Next Generation Sequencing 2015
Page 6 of 16
APPENDIX
How to interpret a number of useful output files
1. The Sequence Alignment/MAP (SAM) format (Tophat2/bowtie2)
This is the default output format for any NGS alignment program. It, or it’s .bam equivalent, serves
as input to many tertiary analysis programs. .bam files are simply binary versions of .sam files.
a. Mandatory fields in the SAM format
No.
Name
Description
1
QNAME
Query NAME of the read or the read pair
2
FLAG
Bitwise FLAG (pairing, strand, mate strand, etc.)
3
RNAME
Reference sequence NAME
4
POS
1-Based leftmost POSition of clipped alignment
5
MAPQ
MAPping Quality (Phred-scaled)
6
CIGAR
Extended CIGAR string (operations: MIDNSHP)
7
MRNM
Mate Reference NaMe (‘=’ if same as RNAME)
8
MPOS
1-Based leftmost Mate POSition
9
ISIZE
Inferred Insert SIZE
10
SEQ
Query SEQuence on the same strand as the reference
11
QUAL
Query QUALity (ASCII-33=Phred base quality)
1. QNAME and FLAG are required for all alignments. If the mapping position of the query is not
available, RNAME and CIGAR are set as “*”, and POS and MAPQ as 0. If the query is
unpaired or pairing information is not available, MRNM equals “*”, and MPOS and ISIZE equal
0. SEQ and QUAL can both be absent, represented as a star “*”. If QUAL is not a star, it must
be of the same length as SEQ.
2. The name of a pair/read is required to be unique in the SAM file, but one pair/read may appear
multiple times in different alignment records, representing multiple or split hits. The maximum
string length is 254.
3. If SQ is present in the header, RNAME and MRNM must appear in an SQ header record.
4. Field MAPQ considers pairing in calculation if the read is paired. Providing MAPQ is
recommended. If such a calculation is difficult, 255 should be applied, indicating the mapping
quality is not available.
5. If the two reads in a pair are mapped to the same reference, ISIZE equals the difference
between the coordinate of the 5ʼ-end of the mate and of the 5ʼ-end of the current read;
otherwise ISIZE equals 0 (by the “5ʼ-end” we mean the 5ʼ-end of the original read, so for
Illumina short-insert paired end reads this calculates the difference in mapping coordinates of
Essentials of Next Generation Sequencing 2015
Page 7 of 16
APPENDIX
the outer edges of the original sequenced fragment). ISIZE is negative if the mate is mapped to a
smaller coordinate than the current read.
6. Color alignments are stored as normal nucleotide alignments with additional tags describing the
raw color sequences, qualities and color-specific properties (see also Note 5 in section 2.2.4).
7. All mapped reads are represented on the forward genomic strand. The bases are reverse
complemented from the unmapped read sequence and the quality scores and cigar strings are
recorded consistently with the bases. This applies to information in the mate tags (R2, Q2, S2,
etc.) and any other tags that are strand sensitive. The strand bits in the flag simply indicates
whether this reverse complement transform was applied from the original read sequence to
obtain the bases listed in the SAM file.
b. SAM File Header Lines - Record Types and Tags
Type
Tag
VN*
SO
Description
File format version.
Sort order. Valid values are: unsorted, queryname or coordinate.
HD - header
Group order (full sorting is not imposed in a group). Valid values are: none,
GO
query or reference.
Sequence name. Unique among all sequence records in the file. The value
SN*
of this field is used in alignment records.
LN*
Sequence length.
Genome assembly identifier. Refers to the reference genome assembly in an
SQ – Sequence AS
unambiguous form. Example: HG18.
dictionary
MD5 checksum of the sequence in the uppercase (gaps and space are
M5
removed)
UR
URI of the sequence
SP
Species.
Unique read group identifier. The value of the ID field is used in the RG
ID*
tags of alignment records.
SM*
Sample (use pool name where a pool is being sequenced)
LB
Library
DS
Description
Platform unit (e.g. lane for Illumina or slide for SOLiD); should be a full,
RG - read group PU
unambiguous identifier
Predicted median insert size (maybe different from the actual median insert
PI
size)
CN
Name of sequencing center producing the read.
DT
Date the run was produced (ISO 8601 date or date/time).
PL
Platform/technology used to produce the read.
ID*
Program name
PG - Program VN
Program version
CL
Command line
CO - comment
One-line text comments
Essentials of Next Generation Sequencing 2015
Page 8 of 16
APPENDIX
c. Interpretation of bitwise flags in .sam/.bam files
Flag
Description
0x0001
the read is paired in sequencing, no matter whether it is mapped in a pair
0x0002
the read is mapped in a proper pair (depends on the protocol, normally inferred
during alignment) 1
0x0004
the query sequence itself is unmapped
0x0008
the mate is unmapped 1
0x0010
strand of the query (0 for forward; 1 for reverse strand)
0x0020
strand of the mate 1
0x0040
the read is the first read in a pair 1,2
0x0080
the read is the second read in a pair 1,2
0x0100
the alignment is not primary (a read having split hits may have multiple primary
alignment records)
0x0200
the read fails platform/vendor quality checks
0x0400
the read is either a PCR duplicate or an optical duplicate
Essentials of Next Generation Sequencing 2015
Page 9 of 16
APPENDIX
d. CIGAR String Operations
The CIGAR string describes the alignment between the sequence read and the reference genome.
Operation
BAM
Description
M
0
alignment match (can be a sequence match or mismatch)
I
1
insertion to the reference
D
2
deletion from the reference
N
3
skipped region from the reference
S
4
soft clipping (clipped sequences present in SEQ)
H
5
hard clipping (clipped sequences NOT present in SEQ)
P
6
padding (silent deletion from padded reference)
=
7
sequence match
X
8
sequence mismatch

H can only be present as the first and/or last operation.

S may only have H operations between them and the ends of the CIGAR string.

For mRNA-to-genome alignment, an N operation represents an intron. For other types of alignments, the interpretation of N is not defined.

Sum of lengths of the M/I/S/=/X operations shall equal the length of SEQ
Essentials of Next Generation Sequencing 2015
Page 10 of 16
APPENDIX
2. Junctions.bed (Tophat2)
[seqname] [start] [end] [id] [score] [strand] [thickStart] [thickEnd] [r,g,b] [block_count] [block_sizes]
[block_locations]
"start" is the start position of the leftmost read that contains the junction.
"end" is the end position of the rightmost read that contains the junction.
"id" is the junctions id, e.g. JUNC0001
"score" is the number of reads that contain the junction.
"strand" is either + or -.
"thickStart" and "thickEnd" don't seem to have any effect on display for a junctions track. TopHat
sets them as equal to start and end respectively.
"r","g" and "b" are the red, green, and blue values. They affect the color of the display in a browser.
"block_count", "block_sizes" and "block_locations":
The block_count will always be 2. The two blocks specify the regions on either side of the junction.
"block_sizes" tells you how large each region is, and "block_locations" tells you, relative to the
"start" being 0, where the two blocks occur. Therefore, the first block_location will always be zero.
3. Insertions.bed (Tophat2)
[chrom] [chromStart] [chromEnd] [name] [score]:
track name=insertions description="TopHat insertions"
Chromosome_8.1
3314
3314
G
1
Chromosome_8.1
3460
3460
C
1
Chromosome_8.1
3711
3711
C
1
Chromosome_8.1
3760
3760
G
1
Chromosome_8.1
3905
3905
C
1
Chromosome_8.1
4388
4388
A
4
Chromosome_8.1
5143
5143
C
4
Chromosome_8.1
8048
8048
A
1
Chromosome_8.1
8122
8122
A
2
Chromosome_8.1
8125
8125
GA
1
Chromosome_8.1
8769
8769
A
2
Chromosome_8.1
9662
9662
C
1
Chromosome_8.1
9683
9683
T
1
Chromosome_8.1
9692
9692
G
1
Chromosome_8.1
9709
9709
T
1
Chromosome_8.1
9736
9736
GA
37
Chromosome_8.1
9752
9752
T
1
Chromosome_8.1
9756
9756
T
1
Chromosome_8.1
9758
9758
TT
19
Notes: Track name is the name given to the relevant track in a genome browser. ChromStart and
chromEnd indicate the base position where the insertion occurred; name field indicates inserted
base(s); score indicates depth of sequence coverage.
Essentials of Next Generation Sequencing 2015
Page 11 of 16
APPENDIX
4. Deletions.bed (Tophat2)
[chrom] [chromStart] [chromEnd] [name] [score]
track name=deletions description="TopHat
Chromosome_8.1
321
322
Chromosome_8.1
1151
1152
Chromosome_8.1
2205
2206
Chromosome_8.1
2302
2303
Chromosome_8.1
3790
3791
Chromosome_8.1
5261
5262
Chromosome_8.1
9458
9459
Chromosome_8.1
9487
9488
Chromosome_8.1
9747
9749
Chromosome_8.1
9759
9760
Chromosome_8.1
9841
9842
Chromosome_8.1
9847
9848
Chromosome_8.1
9858
9859
-
deletions"
1
2
2
2
3
2
1
1
5
1
2
3
13
Note: Track name is the name given to the relevant track in a genome browser. ChromStart
indicates first deleted base; chromEnd indicates first retained base; name field indicates a deletion;
score indicates depth of sequence coverage.
Essentials of Next Generation Sequencing 2015
Page 12 of 16
APPENDIX
5. GFF/GTF File Format - Definition and supported options
The GFF (General Feature Format) format consists of one line per feature, each containing 9 columns of data, plus optional track
definition lines. The following documentation is based on the Version 2 specifications.
The GTF (General Transfer Format) is identical to GFF version 2.
Fields
Fields must be tab-separated. Also, all but the final field in each feature line must contain a value; "empty" columns should be denoted
with a '.'
1. Seqname
name of the chromosome or scaffold; chromosome names can be given with or without the 'chr' prefix. Important
note: the seqname must be one used within Ensembl, i.e. a standard chromosome name or an Ensembl identifier such
as a scaffold ID, without any additional content such as species or assembly. See the example GFF output below.
2. Source
name of the program that generated this feature, or the data source (database or project name)
3. Feature
feature type name, e.g. Gene, Variation, Similarity
4. Start
Start position of the feature, with sequence numbering starting at 1.
5. End
End position of the feature, with sequence numbering starting at 1.
6. Score
A floating point value.
7. Strand
defined as + (forward) or - (reverse).
8. Frame
One of '0', '1' or '2'. '0' indicates that the first base of the feature is the first base of a codon, '1' that the second base is
the first base of a codon, and so on..
A semicolon-separated list of tag-value pairs, providing additional information about each feature.
9. Attribute
Essentials of Next Generation Sequencing 2015
Page 13 of 16
APPENDIX
Example:
Chromosome_8.1
Cufflinks transcript
51508
51995
1000 .
.
gene_id "CUFF.1";
transcript_id "CUFF.1.1"; FPKM "28.2426370813"; frac "1.000000"; conf_lo "11.338452"; conf_hi "30.235873";
cov "6.323466"; full_read_support "yes";
Chromosome_8.1
Cufflinks exon 51508
51995
1000 .
.
gene_id "CUFF.1"; transcript_id
"CUFF.1.1"; exon_number "1"; FPKM "28.2426370813"; frac "1.000000"; conf_lo "11.338452"; conf_hi
"30.235873"; cov "6.323466";
Chromosome_8.1
Cufflinks transcript
48902
49980
1000 .
gene_id "CUFF.2";
transcript_id "CUFF.2.1"; FPKM "10.0792890004"; frac "0.500000"; conf_lo "5.811789"; conf_hi "14.016667";
cov "2.702250"; full_read_support "yes";
Chromosome_8.1
Cufflinks exon 48902
49980
1000 .
gene_id "CUFF.2"; transcript_id
"CUFF.2.1"; exon_number "1"; FPKM "10.0792890004"; frac "0.500000"; conf_lo "5.811789"; conf_hi
"14.016667"; cov "2.702250";
Chromosome_8.1
Cufflinks transcript
49167
51270
1
.
gene_id "CUFF.2";
transcript_id "MGG_01951T0"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.476481";
cov "0.000000"; full_read_support "no";
Chromosome_8.1
Cufflinks exon 49167
49902
1
.
gene_id "CUFF.2"; transcript_id
"MGG_01951T0"; exon_number "1"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi
"0.476481"; cov "0.000000";
Track lines
Although not part of the formal GFF specification, Ensembl will use track lines to further configure sets of features. Track lines should be
placed at the beginning of the list of features they are to affect.
The track line consists of the word 'track' followed by space-separated key=value pairs - see the example below. Valid parameters used by
Ensembl are:
• name - unique name to identify this track when parsing the file
• description - Label to be displayed under the track in Region in Detail
• priority - integer defining the order in which to display tracks, if multiple tracks are defined.
More information
For more information about this file format, see the documentation on the Sanger Institute website.
Essentials of Next Generation Sequencing 2015
Page 14 of 16
6. CUFFDIFF output (.diff files)
These are the main output file that one queries to identify genes that are differentially expressed between groups. There are 14 columns
containing the following information:
1. test_id
This is an arbitrary name given to describe the test. Defined in the attributes field of the .gtf file
2. gene_id
This is a systematic gene identifier. Defined in the attributes field of the .gtf file. In the above example, the names
under “gene” are actually systematic identifiers (common names have not been assigned to the genes in this
annotation)
3. gene
This would normally be a common gene name (e.g. GAPDH, TRP1, etc.). Defined in the attributes field of the .gtf
file.
4. locus
Position in reference genome
5. sample_1
Common name given to test group 1
6. sample_2
Common name given to test group 2
7. status
Statement about the statistical test (YES – test was performed; or NOTEST)
8. value_1
Average FPKM for samples in test group 1
9. value_2
Average FPKM for samples in test group 2
10. log2(fold_change)
Fold-change in FPKM in group 2, relative to group 1
11. test_stat
Test statistic value
12. p_value
Corresponding p-value
13. q_value
Corresponding q-value (p-value adjusted for false discovery rate due to multiple hypothesis testing)
14. significant
Whether or not q-value is significant
Essentials of Next Generation Sequencing 2015
Page 15 of 16
APPENDIX
Example (gene_exp.diff):
test_id
gene_id gene
p_value q_value
locus sample_1 sample_2 status
significant
value_1
value_2
log2(fold_change) test_stat
XLOC_000001
XLOC_000001
0.811532 1.21676 0.2
MGG_01946 Chromosome_8.1:14798-16581
748 0.815887 no
control
expt
OK
XLOC_000002
0
1
XLOC_000002
1
no
MGG_15984 Chromosome_8.1:42073-42397
control
expt
NOTEST
0
XLOC_000003
6.09575
XLOC_000003
1.20328 0
MGG_01950 Chromosome_8.1:46679-48726
1
1
no
control
expt
NOTES
2.64731
XLOC_000004
6.55666
XLOC_000004
-0.916602 0
MGG_01960 Chromosome_8.1:52092-56123
1
no
control
expt
NOTEST
12.3768
XLOC_000005
inf 0
XLOC_000005
1
1
no
MGG_01963 Chromosome_8.1:59811-60835
control
expt
NOTEST
0
4.61437
XLOC_000006
XLOC_000006
MGG_15986 Chromosome_8.1:73245-75566
-0.846541
-1.69972 0.0869
0.724348 no
control
expt
OK
46.5364
25.8797
Essentials of Next Generation Sequencing 2015
10.5195
18.4625
0
0
Page 16 of 16
Download