Data Analysis Using GeneSpring

Introducing GeneSpring GX12
Yun Lian
04-27-2012
GeneSpring Layout
Organization of Elements
• Project: a container for a collection of experiments.
• Experiment: a collection of samples for which
arrays have been run in order to answer a specific
scientific question.
• Sample: either a data file or a sample
created within GeneSpring.
• Experiment Grouping: parameters such as treated vs.
untreated, age, gender, etc.
• Interpretation: defines a particular way of
grouping samples into experimental conditions
for both data visualization and analysis.
• Entity List: comprises a subset of entities (i.e.,
genes, exons, genomic regions, etc.) associated
with a particular technology.
Technology and Biological Genome
• Technology: the package of data on the
array design, together with biological and other
information about the entities. A technology
is available for each individual array type, and all
samples in an experiment must belong to the
same technology.
A technology must first be installed for
each new array type to be analyzed.
Create a technology
Technology and Biological Genome (cont.)
• Annotations Manager (new): is the central
interface for updating, deleting or adding
custom annotations for various experiment
types. It allows the user to organize and view
multiple builds of different organisms and
their respective annotations for genes,
transcripts, SNPs, reference, and other
annotation resources.
Technology and Biological Genome (cont.)
• Translation (new): allows comparison of entity lists
between experiments of different technologies. Users
can compare different technologies for the same
organism, or identify homologues across different
organisms. This cross-organism translation is done
via HomoloGene tables that map Entrez
identifiers in one organism to Entrez identifiers in
the other. ftp://ftp.ncbi.nih.gov/pub/
HomoloGene Table
Serial No. Organism
1 Mus musculus
2 Rattus norvegicus
3 Magnaporthe grisea
4 Kluyveromyces lactis
5 Eremothecium gossypii
6 Arabidopsis thaliana
7 Oryza sativa
8 Schizosaccharomyces pombe
9 Saccharomyces cerevisiae
10 Neurospora crassa
11 Plasmodium falciparum
12 Caenorhabditis elegans
13 Anopheles gambiae
14 Drosophila melanogaster
15 Danio rerio
16 Pan troglodytes
17 Gallus gallus
18 Homo sapiens
19 Canis lupus familiaris
20 Bos taurus
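The HomoloGene-based translation described above can be illustrated with a minimal Python sketch. It assumes input lines in the tab-delimited layout of NCBI's homologene.data file (HomoloGene group ID, taxonomy ID, Entrez Gene ID, gene symbol, …); the function names and the toy rows in the usage example are hypothetical, and this is not GeneSpring's internal implementation. Two genes are treated as homologues when they fall in the same HomoloGene group.

```python
# Minimal sketch of cross-organism translation through HomoloGene groups.
# Assumption: each input line follows the tab-delimited homologene.data
# layout: group ID, taxonomy ID, Entrez Gene ID, gene symbol, ...
from collections import defaultdict

HUMAN_TAXID = 9606   # Homo sapiens (NCBI Taxonomy ID)
MOUSE_TAXID = 10090  # Mus musculus (NCBI Taxonomy ID)


def load_groups(lines):
    """Return (gene_to_group, group_members).

    gene_to_group: (taxonomy ID, Entrez Gene ID) -> HomoloGene group ID
    group_members: group ID -> {taxonomy ID: [Entrez Gene IDs]}
    """
    gene_to_group = {}
    group_members = defaultdict(lambda: defaultdict(list))
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        group_id, tax_id, gene_id = int(fields[0]), int(fields[1]), int(fields[2])
        gene_to_group[(tax_id, gene_id)] = group_id
        group_members[group_id][tax_id].append(gene_id)
    return gene_to_group, group_members


def translate(gene_ids, src_tax, dst_tax, gene_to_group, group_members):
    """Map source-organism Entrez IDs to homologous destination Entrez IDs."""
    result = {}
    for gene_id in gene_ids:
        group_id = gene_to_group.get((src_tax, gene_id))
        if group_id is None:
            result[gene_id] = []  # gene is not in any homology group
        else:
            result[gene_id] = list(group_members[group_id].get(dst_tax, []))
    return result
```

For example, with the hypothetical rows `["1\t9606\t1001\tGENEA", "1\t10090\t2001\tGenea", "2\t9606\t1002\tGENEB"]`, translating human IDs `[1001, 1002, 9999]` to mouse gives `{1001: [2001], 1002: [], 9999: []}`: gene 1002 has a group but no mouse member, and 9999 has no group at all.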
Technology and Biological Genome (cont.)
Translation (cont.)
GeneSpring provides a way to explicitly define
an annotation column for the source
technology and an annotation column for the
destination technology for translation,
through the menu Tools > Options >
Miscellaneous > Translation Mapping.
This feature is useful in translating data
between a custom technology and a standard
technology.
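The idea behind Translation Mapping, pairing a source annotation column with a destination annotation column, amounts to a join on shared annotation values. The sketch below assumes each technology's annotations are available as a dictionary of entity ID to column values; the function name, the column names, and the entity IDs in the example are hypothetical and do not reflect GeneSpring's file formats.

```python
# Hypothetical sketch: translate entity IDs from a source technology to a
# destination technology by matching values in user-chosen annotation columns.
from collections import defaultdict


def build_translation(src_annotations, src_column, dst_annotations, dst_column):
    """Return {source entity ID: sorted list of destination entity IDs}.

    Each annotation table is a dict: entity ID -> {column name: value}.
    Source entities with a missing or empty value in the chosen column
    are left out of the result.
    """
    # Index the destination technology by the value in its chosen column.
    dst_index = defaultdict(list)
    for dst_id, row in dst_annotations.items():
        value = row.get(dst_column)
        if value:
            dst_index[value].append(dst_id)

    # Look up each source entity's value in that index.
    mapping = {}
    for src_id, row in src_annotations.items():
        value = row.get(src_column)
        if value:
            mapping[src_id] = sorted(dst_index.get(value, []))
    return mapping
```

Matching a custom technology on a `GeneSymbol` column against a standard technology's `Gene Symbol` column, for instance, maps a custom probe annotated `TP53` to every standard probe annotated `TP53`. A value with no counterpart maps to an empty list rather than being dropped, so untranslated entities remain visible.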
Technology and Biological Genome (cont.)
• Biological Genome (new): refers to the collective
set of all major annotations (Entrez Gene IDs, GO IDs,
etc.) for a particular organism. It is created using
the information available at NCBI and can be
stored in GeneSpring. It is independent of any
chip technology and, once created, can be used
across multiple chip types and technologies.
The NCBI site used for Biological Genome creation
can be accessed from Tools > Options >
Miscellaneous > NCBI ftp URL.
Biological Genome (cont.)
Biological Genome is essential for performing
biological analyses in Generic experiments that lack
annotations.
The Biological Genome can be created from
Annotations > Create Biological Genome.
To use the Biological Genome created for an
organism in an experiment, the user has
to update the annotations for that particular
technology from Tools > Update Technology
Annotations > Update from Biological Genome.
Running Expression Workflow
1. Create a new project or open an existing project.
2. Create a new experiment or open an existing experiment.
Upon creating an experiment of a specific chip type
for the first time, GeneSpring prompts the user to
download the technology from the update server…
3. Select the analysis type from the drop-down menu:
Association, Copy Number, Exon, Expression, …
4. Select the workflow type:
• Analysis Biological Significance: for a new user
• Data Import Wizard: for an advanced user
Running Expression Workflow (cont.)
Analysis Biological Significance
Analyzing Affymetrix Exon Splicing Data
• Affymetrix Exon chips are used for studying the
alternative splicing of genes
• When creating a new experiment, specify:
Analysis Type = Exon
Experiment Type = Affymetrix Exon Splicing
• The Data Import Wizard is the only workflow option for
the Affymetrix Exon Splicing experiment.
• The user must select the confidence level (Core,
Extended, or Full) under Baseline Options when setting up the experiment.
Analyzing Affymetrix Exon Splicing Data (cont.)
• Filter transcripts on DABG (Detection Above Background):
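The slides do not spell out the DABG filtering rule, so the following Python sketch assumes a common form: keep a probe set if its DABG p-value is below a cutoff in at least a given fraction of the samples of at least one condition. The cutoff and fraction defaults and all names are illustrative assumptions, not GeneSpring's documented settings.

```python
# Sketch of a DABG (Detection Above Background) filter under an assumed rule:
# keep a probe set if its DABG p-value is below `p_cutoff` in at least
# `min_fraction` of the samples of at least one condition.


def passes_dabg(pvalues_by_condition, p_cutoff=0.05, min_fraction=0.5):
    """pvalues_by_condition: {condition: [DABG p-value per sample]}."""
    for pvals in pvalues_by_condition.values():
        if not pvals:
            continue  # ignore conditions without samples
        detected = sum(1 for p in pvals if p < p_cutoff)
        if detected / len(pvals) >= min_fraction:
            return True
    return False


def filter_dabg(probesets, p_cutoff=0.05, min_fraction=0.5):
    """probesets: {probe set ID: {condition: [p-values]}} -> kept IDs."""
    return [ps for ps, by_cond in probesets.items()
            if passes_dabg(by_cond, p_cutoff, min_fraction)]
```

Requiring detection within a single condition, rather than across all samples, keeps probe sets for exons that are expressed only in one condition. Those exons are often the interesting ones in a splicing study.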
Analyzing Affymetrix Exon Splicing Data (cont.)
Splicing ANOVA: first calculates the gene-level normalized
intensities for each probe set, then runs an n-way ANOVA, where n is the number of parameters.
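The two steps of the splicing ANOVA can be sketched in Python for the simplest case of one probe set and a single experimental parameter (a one-way ANOVA, n = 1). The sketch assumes log2-scale intensities, so the gene-level normalized intensity becomes the probe set value minus the gene-level value per sample. The function names are hypothetical, and GeneSpring's exact model may differ.

```python
# Minimal sketch of the splicing ANOVA idea for one probe set and one
# parameter (one-way ANOVA). Assumes intensities are already log2-scale.


def gene_normalized(probeset_log2, gene_log2):
    """Gene-level normalized intensity per sample: probe set minus gene level."""
    return [p - g for p, g in zip(probeset_log2, gene_log2)]


def one_way_anova_f(groups):
    """Return (F statistic, df_between, df_within) for a list of sample groups."""
    means = [sum(g) / len(g) for g in groups]
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    k = len(groups)
    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of samples around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n_total - k
    if df_between < 1 or df_within < 1 or ss_within == 0:
        raise ValueError("need >= 2 groups, replicates, and within-group variance")
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

With treated probe set values `[8.0, 8.2, 8.1]` over gene levels `[7.0, 7.1, 7.0]`, and control values `[7.0, 7.1, 6.9]` over `[7.0, 7.0, 7.0]`, the normalized intensities are about `[1.0, 1.1, 1.1]` and `[0.0, 0.1, -0.1]`, and F is approximately 256 with (1, 4) degrees of freedom. A p-value would come from the F distribution, for example `scipy.stats.f.sf(f_stat, df_between, df_within)` if SciPy is available.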
Analyzing Affymetrix Exon Splicing Data (cont.)
• Filter on Splicing Index:
• Splicing Index is defined as the difference
between the gene-normalized intensities
of a probe set in the two sample
conditions.
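The Splicing Index definition above can be written as a small Python filter. This is a sketch under stated assumptions: gene-normalized intensities are log2-scale values, they are averaged within each condition before taking the difference, and the filter keeps probe sets whose absolute index meets a cutoff. The names and the default cutoff of 1.0 are hypothetical.

```python
# Sketch of a splicing-index filter. Assumptions: gene-normalized intensities
# are log2-scale, averaged within each condition, and the cutoff is on |SI|.
from statistics import mean


def splicing_index(ni_condition1, ni_condition2):
    """SI = mean gene-normalized intensity in condition 1 minus condition 2."""
    return mean(ni_condition1) - mean(ni_condition2)


def filter_on_splicing_index(probesets, min_abs_si=1.0):
    """probesets: {ID: (NI values for condition 1, NI values for condition 2)}.

    Returns {ID: SI} for probe sets whose |SI| meets the cutoff.
    """
    kept = {}
    for ps_id, (ni1, ni2) in probesets.items():
        si = splicing_index(ni1, ni2)
        if abs(si) >= min_abs_si:
            kept[ps_id] = si
    return kept
```

Because the intensities are already normalized to the gene level, a large |SI| indicates a change in the exon's share of the transcript rather than a change in overall gene expression. A positive SI suggests relative inclusion in condition 1, and a negative SI suggests relative skipping.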
Copy Number Analysis
• The analysis for Copy Number Variation (CNV),
Allele Specific Copy Number (ASCN), Parent
Specific Copy Number (PSCN), Loss of
Heterozygosity (LOH), Log Ratio and Common
Genomic Variant Regions are new features in
GeneSpring.
• All of the above computations follow either the
Against Reference method (comparing specified
arrays against a reference file created from a set
of reference individuals) or the Paired Normal
method (comparing each specified array against a
corresponding array obtained from a normal
tissue sample of the same individual).
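Both methods reduce, at their core, to comparing each marker's test intensity with a baseline. The Python sketch below shows only that log-ratio step and the conversion to an approximate copy number. It assumes positive, linear-scale, already normalized intensities and a diploid genome, so a log2 ratio of 0 corresponds to two copies. Real pipelines add normalization, segmentation, and allele-specific modeling, none of which is shown here, and the function names are hypothetical.

```python
# Sketch of the log-ratio step shared by both methods. Assumptions: positive,
# normalized linear-scale intensities for the same markers, and a diploid
# genome (log2 ratio 0 == 2 copies).
import math


def log2_ratios(test, reference):
    """Per-marker log2(test / reference).

    Against Reference: `reference` is the reference profile built from
    reference individuals. Paired Normal: `reference` is the matched normal
    sample from the same individual.
    """
    return [math.log2(t / r) for t, r in zip(test, reference)]


def estimated_copy_number(log2_ratio, ploidy=2):
    """Convert a log2 ratio to an approximate copy number."""
    return ploidy * 2 ** log2_ratio
```

For test intensities `[200, 100, 50]` against a reference of `[100, 100, 100]`, the log2 ratios are `[1.0, 0.0, -1.0]`, corresponding to roughly 4, 2, and 1 copies: a gain, no change, and a single-copy loss.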
Copy Number Analysis (cont.)
• Reference: GeneSpring comes prepackaged with a
reference created using the 270 HapMap Phase II
samples, the second-generation human haplotype map
from The International HapMap Consortium.
According to the consortium, this Phase II
HapMap characterizes over 3.1 million human
single nucleotide polymorphisms (SNPs) genotyped in
270 individuals from four geographically
diverse populations and includes 25 to 35 percent of
common SNP variation in the populations surveyed.
Association Analysis
• GeneSpring provides a toolkit for Genome-Wide Association
Analysis (GWAS), which relies on the analysis of SNP and
phenotype data from studies that can involve thousands of
unrelated samples genotyped at hundreds of
thousands of SNPs.
Association Analysis (cont.)
• Features:
• QC methods to identify anomalous samples that would
confound the analysis.
• Filters to remove undesirable SNPs from the analysis.
• Statistical tests to identify SNPs that are associated with
disease incidence, and methods to eliminate false-positive
results.
• Haplotype inference and analysis methods, which can
detect more subtle disease-causing groups of SNPs.
• A wide range of publication-quality intuitive visualization
options, which include LD Plot, Haplo block view, etc.
• Full integration into other features of GeneSpring GX,
such as Pathway Analysis and Genome Browser.
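To illustrate the "statistical tests" and "false-positive" items above, here is a Python sketch of a standard allelic chi-square test (2 × 2 table of allele counts in cases vs. controls, 1 degree of freedom) combined with a Bonferroni correction across SNPs. These are textbook methods chosen for illustration. The slides do not say which tests GeneSpring uses, and the function names are hypothetical.

```python
# Sketch of a per-SNP allelic association test plus Bonferroni correction.
# Textbook methods, not necessarily GeneSpring's. Genotype counts are
# (homozygous allele 1, heterozygous, homozygous allele 2).
import math


def allele_counts(genotypes):
    """Genotype counts -> (allele 1 count, allele 2 count)."""
    hom1, het, hom2 = genotypes
    return 2 * hom1 + het, 2 * hom2 + het


def allelic_chi_square(case_genotypes, control_genotypes):
    """Return (chi-square statistic, p-value) for the 2x2 allele table, 1 df."""
    table = [allele_counts(case_genotypes), allele_counts(control_genotypes)]
    total = sum(sum(row) for row in table)
    col_sums = [table[0][j] + table[1][j] for j in range(2)]
    if 0 in col_sums:
        raise ValueError("monomorphic SNP: one allele is never observed")
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = sum(table[i]) * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def bonferroni_significant(p_values, alpha=0.05):
    """Indices of tests still significant after Bonferroni correction."""
    threshold = alpha / len(p_values)
    return [i for i, p in enumerate(p_values) if p < threshold]
```

For example, cases with genotype counts (50, 40, 10) against controls with (20, 50, 30) give allele tables of 140/60 vs. 90/110, a chi-square of about 25.6, and a p-value well below 1e-5. With hundreds of thousands of SNPs, though, the Bonferroni threshold `alpha / m` becomes very strict, which is why less conservative corrections are also common.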
Functional Grouping
1. Gene Ontology Analysis: The Gene Ontology (GO)
Consortium maintains a database of controlled
vocabularies for the description of molecular
functions, biological processes and cellular
components of gene products. These GO terms are
represented as a Directed Acyclic Graph (DAG)
structure.
GeneSpring is packaged with the GO terms and
their DAG relationships as provided by the GO
Consortium. GeneSpring has a fully featured gene
ontology analysis module for exploring the GO
terms associated with the entities of
interest.
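Two ideas underlie this kind of GO analysis: annotations propagate up the DAG, and each term is tested for over-representation in the entity list. The Python sketch below illustrates both under assumptions. It uses the GO "true path" rule (a gene annotated to a term is also annotated to all of that term's ancestors) and a hypergeometric enrichment p-value. The hypergeometric test is a common choice assumed here, not one the slides name. The function names and GO IDs in the example are placeholders.

```python
# Sketch of GO annotation propagation over the DAG and a hypergeometric
# enrichment p-value (a common choice; assumed, not confirmed for GeneSpring).
from math import comb


def propagate_annotations(direct, parents):
    """Apply the true-path rule: a gene annotated to a term is also
    annotated to every ancestor of that term in the DAG.

    direct: {gene: set of GO IDs}; parents: {GO ID: set of parent GO IDs}.
    """
    result = {}
    for gene, terms in direct.items():
        seen = set()
        stack = list(terms)
        while stack:  # depth-first walk toward the root terms
            term = stack.pop()
            if term not in seen:
                seen.add(term)
                stack.extend(parents.get(term, ()))
        result[gene] = seen
    return result


def hypergeometric_pvalue(k, n, K, N):
    """P(X >= k): probability of >= k term-annotated genes in a list of n,
    drawn from N background genes of which K carry the term."""
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)
```

As a small check, if 3 of 10 background genes carry a term and all 3 appear in a 5-gene list, the p-value is C(7, 2) / C(10, 5) = 21/252 ≈ 0.083. Propagation matters because a gene annotated only to a specific child term still counts toward the broader parent terms.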
Functional Grouping (cont.)
2. GSEA (Gene Set Enrichment Analysis) and GSA (Gene
Set Analysis): computational methods that
determine whether an a priori defined set of genes
shows statistically significant differences between two
phenotypes.
In many cases, few genes pass the statistical significance
criterion. When a larger number of genes qualify, there
is often no unifying biological theme, which
makes biological interpretation difficult. GSEA and
GSA overcome these difficulties by focusing
on gene sets rather than individual genes. They use the
ranked gene list to identify the gene sets that are
significantly differentially expressed between the two
phenotypes.
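The ranked-list idea behind GSEA can be made concrete with its running-sum enrichment score. The Python sketch follows the formulation of Subramanian et al. (2005). Walking down the ranked list, the sum increases at gene-set members (weighted by |score|^weight) and decreases at non-members, and the enrichment score is the maximum deviation from zero. It assumes `ranked` is sorted from most up-regulated to most down-regulated. Permutation-based significance testing is omitted, and this is not GeneSpring's code.

```python
# Sketch of the GSEA running-sum enrichment score. `ranked` is a list of
# (gene, score) pairs sorted from most up- to most down-regulated.
# weight=1 gives the weighted score; weight=0 gives the classic
# Kolmogorov-Smirnov-like statistic. Permutation testing is omitted.


def enrichment_score(ranked, gene_set, weight=1.0):
    hit_weights = [abs(score) ** weight for gene, score in ranked if gene in gene_set]
    hit_total = sum(hit_weights)
    n_miss = len(ranked) - len(hit_weights)
    if not hit_weights or n_miss == 0 or hit_total == 0:
        raise ValueError("gene set must partially overlap the ranked list")
    running, best = 0.0, 0.0
    for gene, score in ranked:
        if gene in gene_set:
            running += abs(score) ** weight / hit_total  # step up at a hit
        else:
            running -= 1.0 / n_miss  # step down at a miss
        if abs(running) > abs(best):
            best = running  # keep the largest deviation, with its sign
    return best
```

For a ranked list `[("a", 3.0), ("b", 2.0), ("c", 1.0), ("d", -1.0)]`, the set `{"a", "b"}` at the top of the list scores +1.0 with `weight=0`, and the set `{"c", "d"}` at the bottom scores -1.0. The sign indicates which phenotype the set is enriched toward.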
Functional Grouping (cont.)
3. Pathway Analysis:
provides the necessary biological
context for the functional analysis.
GeneSpring 12.0 now
allows you to overlay data from two
different experiments on the same
pathway simultaneously, enabling
an integrated analysis of data
from different experiment types.
Pathway Analysis (cont.)
• Import curated pathways directly from the
WikiPathways portal
(http://www.wikipathways.org), or pathways
from other sources in the BioPAX (Level 2),
GPML, or Text format, or create your own
interaction networks from a database of
biological and chemical entities, …
• View and investigate these pathways and
interaction networks in an interactive pathway
viewer and overlay your experimental data on
these pathways.
Pathway Analysis (cont.)
• Access other popular pathway analysis tools
like IPA, Metacore, and Cytoscape from within
GeneSpring, allowing you to export data to
these tools.
• Create networks based on information in
PubMed abstracts and identify interactions
associated with Medical Subject Headings
(MeSH) terms using the built-in NLP feature.