Agreement for NGS RNA-seq collaboration
UMR INRA 116 - UEVE - ERL CNRS 8196
Validated by S. Balzergue: XX.XX.XX

General information and conditions of access

Access to mRNA-seq sequencing for transcriptome analysis is possible through partnerships between the URGV and other laboratories, within the limits of the resources available at the URGV. To benefit from this support, interested teams fill in a request form and send it by email to Sandrine Balzergue (URGV) at balzerg@evry.inra.fr. Library preparation and sequencing on Illumina instruments are performed by dedicated platform staff.

The team contributes its expertise in transcriptome analysis and its workforce to the collaboration, but it does not have sufficient financial resources to cover the cost of consumables, which is therefore charged to the partner laboratory:
- 200€ for library construction ("classical" TruSeq Illumina; contact the platform for specific protocols if necessary)
- 1500€ for each lane of HiSeq2000 paired-end 2x100 bp (approximately 150 million reads)
- 200€ for each sample as a contribution to the costs of server maintenance and data storage
- Option: 100€ per species for the functional annotation of a UniGene set

Note: The collaborator must issue a purchase order for the amount specified in this quote within 15 days of receipt of the samples. The collaborator must send the original order to INRA URGV, accounting department, 2 rue Gaston Cremieux, 91000 Evry. For more information, please contact: secretariat@evry.inra.fr

Data exchange format:

The raw data, contigs (if assembly was performed), raw counts, and functional annotation of the UniGene set (if that option was requested) will be made available for download from our secure FTP site or on an external hard drive (1 TB).

Note: The hard drive is provided by the platform. The partner agrees to return it, once the data have been transferred, within 2 weeks of receiving it. Otherwise, the cost of the hard drive plus 10% will be charged.
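As a rough illustration of how the consumable charges listed under "General information and conditions of access" add up, the sketch below totals them for a hypothetical project. The unit prices come from this agreement; the `Project` class, the `estimate_cost` function, and the reading of the 200€ library charge as a per-library price are assumptions made for the example. The actual quote is established with the platform.

```python
from dataclasses import dataclass

# Unit prices in euros, taken from the charges listed in this agreement.
LIBRARY_COST = 200      # library construction (assumed here to be per library)
LANE_COST = 1500        # per HiSeq2000 lane, paired-end 2x100 bp
STORAGE_COST = 200      # per sample, server maintenance and data storage
ANNOTATION_COST = 100   # per species, optional UniGene functional annotation


@dataclass
class Project:
    """Hypothetical description of a project, for illustration only."""
    libraries: int
    lanes: int
    samples: int
    annotated_species: int = 0  # 0 means the annotation option is not requested


def estimate_cost(project: Project) -> int:
    """Return the estimated consumables charge in euros."""
    return (
        project.libraries * LIBRARY_COST
        + project.lanes * LANE_COST
        + project.samples * STORAGE_COST
        + project.annotated_species * ANNOTATION_COST
    )


# Example: 6 samples, one library each, multiplexed on a single lane,
# with functional annotation requested for one species.
example = Project(libraries=6, lanes=1, samples=6, annotated_species=1)
print(estimate_cost(example))  # 6*200 + 1*1500 + 6*200 + 1*100 = 4000
```

The number of lanes depends on the number of reads needed per sample, which is set with the platform when the experimental plan is drawn up.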
Normalized data and the results of the differential analysis will be sent as a password-protected Excel table via FileX.

Data storage:

The URGV agrees to keep the raw data (not the images) for 3 months after completion of the project. After this period, the data will be deleted.

Databases:

The results of the experiments are expected to be integrated into the CATdb database (Gagnot et al., Nucleic Acids Res. January 2008, 36), which is compatible with the MIAME standard (Brazma et al., 2001, Nat Genet. 29(4):365-71), and submitted to the Gene Expression Omnibus (GEO) database of NCBI. GEO will then issue an accession number, which is required for any publication of transcriptome results.

Data release:

The data will be released within 1 year after project completion; you will be notified by email 15 days beforehand. On the same date, the remaining RNA samples and libraries will be returned.

Technical data (protocols, etc.) and a list of current collaborations are available on the URGV website: http://www.versailles.inra.fr/urgv/microarray.htm

Version of 12.03.2013 - E. Delannoy / S. Balzergue

Characteristics of sequencing runs and turnaround times

The runs are performed on the Illumina HiSeq2000 sequencer via the CNS Genomics Institute in Evry. The number of reads per sample is adjusted to your biological question. A phone call or a meeting in Evry will always take place to define the aims and issues of the project and to set up the experimental plan.

Schedule:

The estimated time from receipt of RNA of satisfactory quality and quantity to delivery of the normalized data and differential analysis is 3 to 4 months. This time takes into account:
- Sequencing: about 2 months
- Construction of libraries: 2 to 3 weeks, depending on the number of libraries
- Assembly of reads. The duration of this step is particularly sensitive to the quality of the reads, the number of samples, etc.,
and is therefore very difficult to estimate.
- Counting (on a reference genome or on contigs from the assembly)
- Normalization and differential analysis
- Data integration into CATdb and GEO

Samples preparation

Many factors influence gene expression levels in a plant. Control of the experimental conditions is therefore crucial if a difference in expression is to be linked to the function studied. A control plant must be grown under the same light and nutritional conditions as the treated plant it is compared with. For example, a shift in sampling time during the day will reveal differences due to the circadian expression of many genes. Uneven watering or treatment of the plants can be a source of variability unrelated to the process studied. These considerations should be taken into account to ensure that sampling is reproducible.

Replicates

It is essential to distinguish between technical replicates and biological replicates.

Technical replicates: Technical replicates of a sample are prepared at the same time (sowing, harvest, extraction ...). Samples from different individuals but from the same experiment are considered technical replicates. They make it possible to observe and quantify technical bias (technical variability) and to check the reproducibility of the study and the quality of the data, but the results cannot be generalized. We also sequence biological replicates on different lanes.

Biological replicates: Biological replicates are prepared in independent experiments (sowing, harvest, extraction ...) at least 24 hours apart (beware of the circadian cycle). They make it possible to observe inter-individual and inter-experiment variability. The results can then be generalized.

At least one biological replicate, that is, a repetition of the whole experiment, must be provided. The aim is to characterize the biological variability between replicates and to "remove" it, in order to identify genes whose differential expression is related only to the factor studied.

Quantity and quality needed for experiments

4 µg of total RNA per sample is expected (minimum concentration of 200 ng/µl; contact the platform if you cannot obtain this amount). The purity of the RNA is one of the most important factors for the success of the experiment. It is preferable to use an affinity purification protocol that includes a DNase I treatment step (for example the Qiagen RNeasy kit, except for small RNA sequencing, for which you should contact the platform). For "difficult" samples such as seeds, the addition of PVP is very useful; contact us if needed.

Total RNA will be sent on dry ice, in the elution solution. On arrival at the platform, its quality will be assessed on an Agilent chip and it will be quantified with RiboGreen. Total RNA must be sent together with the fully completed information table (see and print the last page of this agreement).

Process

- Quality control of total RNA (Agilent Bioanalyzer) and quantification (RiboGreen)
- Construction of libraries (RNA-seq, small RNA, directional RNA-seq ...): mainly Illumina protocols
- Preparation of samples for sequencing (cBot)
- Sequencing on Illumina HiSeq2000
- Assembly: contigs (if needed)
- Mapping
- Counting
- Normalization
- Differential analysis

After statistical analysis of the raw results, a list of genes is produced as an Excel file for each comparison. It includes the average count for condition #1, the average count for condition #2, their ratio, and a raw and an adjusted p-value to allow control of false positives.

Results publication

This is a project involving the collaborating scientists and the URGV, in which the platform brings in its expertise.
Only the cost of consumables is borne by the partner laboratory.
- A member of the transcriptomics platform and a member of "Genomics and Predictive Bioinformatics" at the URGV will be co-authors of the first publication in which the transcriptome data are presented.
- You will also be asked to cite the CATdb database in the description of the data (example: "All raw and normalized data are available through the CATdb database (AU_XXXXXXX; Gagnot et al., 2008) and from the Gene Expression Omnibus (GEO) repository at the National Center for Biotechnology Information (NCBI) (T. Barrett et al., NAR 2006): GSE accession number XXXXX.").

Project design (required):

We ask you to provide the following information:
1 - Title of the project
2 - Name and address of the project manager
3 - Name and address of the person responsible for following the analysis in liaison with the URGV
4 - Scientific aims (be as accurate as possible), including: biological question? Annotation, RNA quantification / small RNA, construction of a high-density chip? …
5 - Experimental design, including:
    - Number of reads per sample:
    - Multiplexed samples: yes / no
    - Sequencing: single reads / paired-end
    - Sequencing length: 50 bp, 100 bp, 150 bp
    - Does a reference genome or UniGene set exist?: yes / no
    - Should there be a functional annotation of contigs?: yes / no
    - …
6 - Number of libraries and description of samples per run (organ, stage of sampling according to Boyes et al., Plant Cell 2001, treatment ...)
7 - Expected date of delivery of samples to the URGV

Signature:
Experimental laboratory (URGV)          Collaborator

8 - Table to enclose with the RNA samples when sending:

Tube name | Sample name on experimental design | Concentration (µg/µl) | RNA extraction method? | DNase I?
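The quantity requirement stated under "Quantity and quality needed for experiments" (4 µg of total RNA at a minimum of 200 ng/µl) can be checked against the concentration column of the sample table before shipping. The following is a minimal sketch under stated assumptions: the function name `check_sample` and the returned fields are hypothetical, and the concentration is assumed to be entered in µg/µl, as in the table header, then converted to ng/µl.

```python
REQUIRED_RNA_UG = 4.0        # total RNA expected per sample (from this agreement)
MIN_CONC_NG_PER_UL = 200.0   # minimum concentration (from this agreement)


def check_sample(tube_name: str, conc_ug_per_ul: float) -> dict:
    """Check one row of the sample table (hypothetical helper).

    conc_ug_per_ul is the concentration as entered in the table, in µg/µl.
    """
    if conc_ug_per_ul <= 0:
        raise ValueError("concentration must be positive")
    conc_ng_per_ul = conc_ug_per_ul * 1000.0  # 1 µg = 1000 ng
    # Volume that contains the required 4 µg at this concentration.
    volume_ul = REQUIRED_RNA_UG * 1000.0 / conc_ng_per_ul
    return {
        "tube": tube_name,
        "concentration_ok": conc_ng_per_ul >= MIN_CONC_NG_PER_UL,
        "volume_for_4ug_ul": round(volume_ul, 1),
    }


# Example rows: tube name and concentration in µg/µl (made-up values).
for tube, conc in [("A1", 0.5), ("A2", 0.1)]:
    print(check_sample(tube, conc))
```

A sample at 0.5 µg/µl meets the minimum concentration and needs 8 µl to reach 4 µg. A sample at 0.1 µg/µl is below the minimum, so the platform should be contacted.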