Introduction to ChIPseq analysis

Sean Thomas, Ph.D.

a little about me…
- BS Biology: studying antibiotic resistance
- BA Architecture: evolving buildings in a computer
- PhD Molecular Biology: regulation of gene expression in a eukaryote with no known protein-coding promoters
- postdoc: numerous collaborations with ENCODE and related projects
- timeline markers: 2000, 2006
- collaborative research portfolio
- fee-based consultation
- teaching, training and mentorship

generalized pipeline of a ChIPseq study
- design study
- obtain input chromatin
- perform precipitation
- construct library
- sequence library
- filter sequences
- align sequences
- get tag density tracks
- assess data quality
- understand the data
- proceed to downstream analyses

basic study design principles
- three or more replicates per sample
- technical replicates are generally a waste of time and money
  [diagram: replicates at the sample, library and sequencing levels, marked X and ✓]
- many studies do not account for batch effects
  i. time
  ii. origin
- so if you care about reproducibility
  [diagram: experiment1, experiment2, experiment3… along a time axis; label: libraries, sequencing, etc.]

basic study design principles
- each sample has a matched input
- sequencing one input is not rigorous (different samples have different chromatin backgrounds)
- input samples should ideally be sequenced comparably to ChIP samples
  X ChIP with under-sequenced input
  ✓ ChIP with well-sequenced input

basic study design principles
- do not pool data
  ✓ actual replicates
  X pooled data
- if you need to pool your data, then it is under-sequenced

measure twice, cut once

fastqc - quality control
http://www.bioinformatics.babraham.ac.uk/projects/fastqc/

quality filter

adapter/barcode trimming

tag uniqueness
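The "quality filter" and "tag uniqueness" slides above name the steps without showing them. A minimal Python sketch of both follows. It is an illustration only, not the method used in the talk: it assumes Phred+33 quality encoding and 4-line FASTQ records, and the function names and the mean-quality cutoff of 20 are hypothetical choices.

```python
from collections import Counter

def parse_fastq(lines):
    """Yield (name, sequence, quality) from 4-line FASTQ records."""
    stripped = [ln.strip() for ln in lines if ln.strip()]
    for i in range(0, len(stripped) - 3, 4):
        header, seq, _plus, qual = stripped[i:i + 4]
        yield header[1:], seq, qual

def mean_phred(qual, offset=33):
    """Mean base quality, assuming Phred+33 encoding (an assumption)."""
    return sum(ord(c) - offset for c in qual) / len(qual)

def quality_filter(reads, min_mean_q=20.0):
    """Keep reads whose mean quality meets a (hypothetical) cutoff."""
    return [r for r in reads if r[2] and mean_phred(r[2]) >= min_mean_q]

def unique_fraction(reads):
    """Fraction of reads with a distinct sequence (1.0 = no duplicates)."""
    counts = Counter(seq for _, seq, _ in reads)
    return len(counts) / len(reads) if reads else 0.0

# Tiny made-up example: two identical high-quality reads, one low-quality read.
demo = """@r1
ACGTACGT
+
IIIIIIII
@r2
ACGTACGT
+
IIIIIIII
@r3
TTTTGGGG
+
!!!!!!!!
"""
reads = list(parse_fastq(demo.splitlines()))
kept = quality_filter(reads)
print(len(reads), len(kept), unique_fraction(kept))
```

A low unique fraction after filtering suggests the library contains many duplicate tags. What counts as "low" depends on the experiment and is not specified in the slides.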
Bowtie2 - alignment
http://bowtie-bio.sourceforge.net/bowtie2/index.shtml
i.) build genome database
    format: bowtie2-build [options]* <fasta_reference_in> <bt2_index_base>
    sample code: bowtie2-build myDB.fasta myDB
ii.) test bowtie2 and database
    bowtie2 -x myDB --no-head -c ACATACTTCTTTATATGCCCATA
iii.) run alignment
    bowtie2 -x myDB -U seq.fq -S seq_alignedToMyDB.sam

get tag density tracks
i.) segment the genome into appropriately sized bins (e.g. 20 bp)
    chr1 25 45
    chr1 45 65
    chr1 65 85
    …
    http://code.google.com/p/bedops/
ii.) count the number of tags within 'x' bp of each genomic bin (e.g. 75 bp)
    chr1 25 45 0
    chr1 45 65 2
    chr1 65 85 5
    …
iii.) convert file to BigWig format (e.g. bedGraphToBigWig)
iv.) host file on public web server
v.) generate url link text:
    track type=bigWig name="SRX081879" description="SRX081879" bigDataUrl=http://lighthouse.ucsf.edu/public_files_no_password/sthomas/chicken/SRX081879_gal3.20bpdensity.bw
vi.) load tracks into UCSC genome browser (genome.ucsc.edu)

assess data quality
- tag density distribution
- reproducibility
- similarity of coverage
- …

understand the data
- unexpected signals
- systematic biases
- confounding factors
- new biology

peak calling
- not standardized, and an active field of research
- most respected genomics labs have their own unique methods; there are five distinct methods within the ENCODE project alone:
    http://www.ncbi.nlm.nih.gov/pubmed/22955991
    http://genome.ucsc.edu/cgi-bin/hgTrackUi?g=wgEncodeTfBindingSuper
- appropriate methodologies depend on data type: punctate, mixed signal, broad signal
- SPP - http://compbio.med.harvard.edu/Supplements/ChIP-seq/
- MACS - http://liulab.dfci.harvard.edu/MACS/

what do you need to get to the point of doing sequence tag alignments?
- reproducible experimental system
- molecular biology lab/reagents/expertise
- well conceived study design
- modern computer running bowtie and fastqc
- reliable library construction and sequencing lab/reagents/expertise

in order to build and view tracks on the UCSC genome browser, call ChIP peaks
- mac/linux machine
- a web server
- beginner bioinformatics expertise (~8 hrs of training, for a motivated novice)

in order to do solid downstream analyses
- combination of advanced genomics, bioinformatics and biology experience (either one individual or a team working together)

downstream analysis for labs without current computational capability
unsuccessful projects
- underestimate the importance of proper study design
- fail to appreciate or apply necessary expertise to the problem
successful projects
- effective collaboration with computational scientist
- lab member undergoes intensive mentoring with computational scientist

case study
A Temporal Chromatin Signature in Human Embryonic Stem Cells Identifies Regulators of Cardiac Development
http://www.cell.com/retrieve/pii/S0092867412010586

iterative exploration
- expertise from a range of different fields is necessary to synthesize genomic data into understanding
- exploit all of the information present in your data
  http://www.cell.com/retrieve/pii/S0092867412010586
- experimental validation
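The tag density steps in the deck (segment the genome into bins, count tags near each bin, generate the track line) can be sketched in Python as below. This is a hedged illustration, not the author's implementation. It assumes each aligned tag is reduced to a single position per chromosome, and that "within x bp of a bin" means the tag position falls in [start - x, end + x). The function names, the chromosome size, the tag positions and the example.org URL are all hypothetical. Converting the bedGraph output to BigWig (e.g. with bedGraphToBigWig) is a separate external step not performed here.

```python
from bisect import bisect_left

def make_bins(chrom_sizes, bin_size=20):
    """Segment each chromosome into consecutive fixed-size bins."""
    bins = []
    for chrom, size in chrom_sizes.items():
        for start in range(0, size, bin_size):
            bins.append((chrom, start, min(start + bin_size, size)))
    return bins

def tag_density(bins, tags, window=75):
    """Count tags whose position lies within `window` bp of each bin."""
    sorted_tags = {chrom: sorted(pos) for chrom, pos in tags.items()}
    rows = []
    for chrom, start, end in bins:
        pos = sorted_tags.get(chrom, [])
        lo = bisect_left(pos, start - window)  # first tag >= start - window
        hi = bisect_left(pos, end + window)    # first tag >= end + window
        rows.append((chrom, start, end, hi - lo))
    return rows

def to_bedgraph(rows):
    """Format rows as tab-separated bedGraph lines."""
    return "\n".join(f"{c}\t{s}\t{e}\t{n}" for c, s, e, n in rows)

def track_line(name, url):
    """Build a UCSC custom-track line in the form shown in the slides."""
    return (f'track type=bigWig name="{name}" '
            f'description="{name}" bigDataUrl={url}')

# Made-up example: one 100 bp chromosome, three tags, a small 5 bp window.
chrom_sizes = {"chr1": 100}
tags = {"chr1": [10, 12, 90]}
rows = tag_density(make_bins(chrom_sizes, bin_size=20), tags, window=5)
bedgraph = to_bedgraph(rows)
line = track_line("my_sample", "http://example.org/my_sample.bw")
print(bedgraph)
print(line)
```

Sorting the tag positions once and using binary search keeps each bin's count cheap. For whole genomes, a dedicated tool such as the bedops suite linked in the slides would likely be the more practical choice.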
