Introduction to GenomeGraphs and Bioconductor

advertisement

Intro to

Biocoductor and GeneGraph

by Kyrylo Bessonov

(kbessonov@ulg.ac.be)

9 Oct 2012

Bioconductor repository

• Is the repository with extensions and libraries for R-language found at http://www.bioconductor.org/

• Bioconductor libraries cover

• Micorarray analysis

• Genetic variants analysis (SNPs)

• Sequence analysis (FASTA, RNA-seq)

• Annotation (pathways, genes)

• High-troughput assays (Mass Spec)

• All libraries are free to use and contain documentation (i.e. vignettes )

• Vignettes are short and require some previous knowledge of R and/or other defendant libraries

2

Configuring R for Bioconductor

• To install Bioconductor libraries the R environment needs to be configured source("http://bioconductor.org/biocLite.R")

• To download any Bioconductor library use biocLite( " package_name " ) function biocLite("GenomeGraphs")

• Load the downloaded library functions via the library("lib_name") function library("GenomeGraphs")

• Read help file(s) via the

??

or browseVignettes()

??

library_name (e.g. ??GenomeGraphs) browseVignettes (package = "GenomeGraphs")

3

Genome Data Display and Plotting

The GenomeGraphs library

Authors: Steen Durinck and James Bullard

(Bioconductor library)

Intro to GenomeGraphs library

• Allows to retrieve data from

• Ensembl comprehensive DB on genomes

• intron/exon locations

• sequence genetic variation data

• protein properties (pI, domains, motifs)

• GenomeGraphs Bioc library allows to display data on:

• Gene Expression (expression / location)

• Comparative genomic hybridization (CGH)

• Sequencing (e.g. variations/introns/exons)

5

GenomeGraphs: Plotting with gdPlot

• gdPlot main plotting function gdPlot(gdObjects, minBase, maxBase ...)

• gdObjects = any objects created by GenomeGraphs a) BaseTrack ; b) Gene ; c) GenericArray ; d) RectangleOverlay

• minBase the lowest nt position to be plotted ( optional )

• maxBase the largest nt to display ( optional )

Objects are data structures that hold/organize a set of variables

1) Create an object to plot using makeBaseTrack() function

ObjBaseT = makeBaseTrack (1:100, rnorm(1:100),strand = "+")

2) Plot the newly created object gdPlot (ObjBaseT)

• Display object structure / properties (e.g. of BaseTrack ) attributes (ObjBaseT)

• $ sign next to the name represents object variables

• Change or view individual object variables attr (ObjBaseT, “strand")

6

Manipulating GenomeGraphs class graphical properties

• Changing values of variables (classical way) attr(Object, "variable/parameter") = value attr(ObjBaseT, "strand") = "-“

• If an array, to access individual elements use [element number 1..n] attr(ObjBaseT, "variable/parameter")[number] = new_value attr(ObjBaseT, "base")[1] = 0

• Changing graphical parameters such as color showDisplayOptions (ObjBaseT) alpha = 1 lty = solid color = orange size = 5 lwd = 1 type = p getPar (ObjBaseT, "color")

[1] "orange" setPar (ObjBaseT, "color", "blue")

7

gdPlot composite/group plots

• gdPlot(…) can take any number of gdObjects to plot

• Let’s plot two BaseTrack objects a and b a = makeBaseTrack(1:100, rlnorm(100), strand = "+") b = makeBaseTrack(1:100, rnorm(100), strand = "-") ab=list(a,b) gdplot(ab)

8

Manipulating values of the grouped gdObjects

• ab is list of objects a and b

• To access individual object within the list use [ ] ab[1] will display object a ab[2] will display object b

• To modify the grouped object use double [[ ]]

• To change color of b to red getPar(ab [[ 2 ]] , "color") setPar(ab[[2]], "color", " red ")

-OR(re-create object) b = makeBaseTrack(1:100, rnorm(100), strand = "-", dp=DisplayPars(color=" red ")) ab=list(a,b) gdPlot(ab)

9

Plotting with labels and legend

• gdPlot does not provide functions to label axis

• Trick = “use labeled / tagged” objects

"label" = GenomeGraph object

"+ strand"= makeBaseTrack(1:100, rlnorm(100), strand = "+")

• To display legend use makeLegend("text","color") ab=list("+"=a, "-"=b, makeGenomeAxis (), makeLegend (c("+", "-"),c("orange", "red")) )

10

Retrieving and Displaying Data from

Public Database

Combining capabilities of biomaRt and GenomeGraph libraries

Welcome to biomaRt

• biomaRt library allows to retrieve data from public DBs ensembl snp unimart bacteria_mart_14

ENSEMBL GENES 68 (SANGER UK)

ENSEMBL VARIATION 68 (SANGER UK)

UNIPROT (EBI UK)

ENSEMBL BACTERIA 14 (EBI UK)

***Use listMarts() to see all available databases***

• Let’s retrieve gene data of the Bacillus subtilis strain

• useMart(database,dataset) allows to connect to specified database and dataset within this database db=useMart("bacteria_mart_14") listDatasets(db)

Dataset

Description

… bac_6_gene Bacillus subtilis genes (EB 2 b_subtilis) version

EB 2 b_subtilis db=useMart("bacteria_mart_14", "bac_6_gene")

12

Exploring biomaRt object

• listAttributes() shows all prop. of the biomaRt obj.

• The db object has total of 4175 genes

• Use getBM(attribute, filter, value, biomaRt_obj) to extract values belonging to specified attributes

• attribute: general term such as gene name/chromosome # / strand (+ or - )

• filter: parameter applied on attribute such as genomic region to consider (i.e. start and end in nt)

• value: actual values of the applied filter(s) getBM(c("external_gene_id", "description","start_position",

"end_position", "strand"), filters = c("start", "end"), values = list(1,10000), db)

ID description metS Methionyl-tRNA synthetase ftsH Cell division protease ftsH homolog hslO 33 kDa chaperonin

Dgk Deoxyguanosine kinase start(nt)

45633

76984

79880

23146 end(nt) strand

47627 1

78897

80755

23769

1

1

-1 13

Plotting the selected “Genome Region”

• Create an object with makeGeneRegion() function makeGeneRegion(start,end,chromosome name, strand, biomaRt object, plotting options)

• Find notation used for the chromosome naming getBM("chromosome_name","","",db) chromosome_name

1 Chromosome gRegion = makeGeneRegion(1, 10000, chr = " Chromosome ", strand = "+", biomart = db, dp =

DisplayPars( plotId = TRUE , idRotation = 90, cex = 0.8, idColor = "black")) gdPlot(list(gRegion, makeGenomeAxis(), makeTitle("Position(nt)",cex=3,"black",0.1)))

14

Bacillus subtilis genome region (intron / exon)

ensembl_gene_id

15

Mapping Expression data

RNAseq

GenomeArray()

Intro into RNA-seq data

• HT sequencing technologies allow to sequence mRNA in a series of short contigs of 50-200 bp

• In addition to gene expression analysis it possible to

• Map location of introns (UTRs) / exons

• Principal: one searches for a rapid changes in abundance of the RNA-Seq signal (contigs)

• Integration of sequence + expression information

• Ugrappa Nagalakshmi et. al. 2008 had used this strategy to accurately map yeast genome 1

• Task: Display part of the seqDataEx dataset having both

• abundance of mRNA (cDNA) transcripts and

• annotated yeast genome

1

Ugrappa Nagalakshmi et. al. The transcriptional landscape of the yeast genome defined by RNA sequencing .

Science , 2008

17

Plotting RNA-seq data with GenomeArray()

• Need to get mRNA contig abundance data data("seqDataEx") rnaSeqAb=seqDataEx$david

• Create GeneticArray object with makeGenericArray (intensity, probeStart, probeEnd, trackOverlay, dp = NULL)

• intensity either microaray or RNAseq transcript abundance signal

• probStart start position of the probe (location in nt)

• Plot contig abundance w.r.t. to genomic location gdPlot(list("abun"=makeGenericArray(rnaSeqAb[, "expr", drop = FALSE], rnaSeqAb [, "location"]) , makeGenomeAxis() ) )

18

Adding genomic annotation to the prev plot

• Need to get annotated yeast genome data annotGen = useMart("ensembl", "scerevisiae_gene_ensembl")

• Create mRNA contig abundance mRNAabun = makeGenericArray(rnaSeqAb[, "expr", drop = FALSE], rnaSeqAb [, "location"])

• Create annotated seq. covering the mRNA contigs location annotSeq= makeGeneRegion(start = min(rnaSeqAb[, "location"]), end = max(rnaSeqAb[, "location"]), chr = "IV", strand = "+", biomart = annotGen, dp = DisplayPars(plotId = TRUE, idRotation = 0, cex = 0.85, idColor = "black", size=0.5))

• Combine objects and plot them gdPlot( list("abund" = mRNAabun,makeGenomeAxis(), "+" = annotSeq),

1299000,1312000)

19

Resulting plot:

Transcript abundance w.r.t. to location

20

Overlays of Basic Shapes and Custom Text

• To overlay rectangle use makeRectangleOverlay() makeRectangleOverlay(start, end, region = NULL, coords = c("genomic", "absolute"), dp) rectOver = makeRectangleOverlay( 1301500, 1302200, region=c(1,2), "genomic", DisplayPars(alpha = 0.5))

• To overlay text use makeTextOverlay() tOver = makeTextOverlay("Ribosomal Large subunit",

1302000, 0.95, region = c(1,1), dp = DisplayPars(color = "red"))

• Combine all overlay objects into one vector with c(v1,v2) gdPlot( list("abund" = mRNAabun,makeGenomeAxis(),

"+" = annotSeq), overlays= c (rectOver, tOver) )

21

Overlay of rectangle and text

22

Alternative splicing of transcript

makeTranscript(id, type, biomart, dp = NULL)

Displaying alternative splicing of a gene

• mRNA coming from the same ORF could be spliced in many ways

• E.g. case of VDR genes of IgG

• Given biomaRt object, the makeTranscript() will extract splicing information for given id (i.e. gene)

• Download human genome databse hGenome <- useMart("ensembl", "hsapiens_gene_ensembl")

• Select Ensembl ID to look at head(getBM(c("ensembl_gene_id", "description"),"","", hGenome))

• Get splicing data from biomaRt object (hGenome) spliceObj = makeTranscript("ENSG00000168309",

"ensembl_gene_id" ,hGenome)

• Plot the object with gdPlot gdPlot(list(makeTitle("Transcript ID:

ENSG00000168309"),splicingObj, makeGenomeAxis()) )

24

The final result

25

Conclusion

• GeneGraphs provides a wide range to plot genomic data

• Can use external databases through biomaRt

• Main useful features

• identifies exons/introns

• allows to cross-reference expression / genome data

• flexible albeit complex plotting capabilities

• allows to overlay graphical objects and text

• ability to create custom legends

• annotation capabilities provided by powerful biomaRt

26

Thank you for your patience!

&

Happy Bioconductor/R Exploration!

Download