Introduction to GeneSpring GX 12
Yun Lian, 04-27-2012

GeneSpring Layout

Organization of Elements
• Project: a container for a collection of experiments.
• Experiment: a collection of samples for which arrays have been run in order to answer a specific scientific question.
• Sample: either a data file or a sample created within GeneSpring.
• Experiment Grouping, Parameter: e.g., treated vs. untreated, age, gender, ...
• Interpretation: defines a particular way of grouping samples into experimental conditions for both data visualization and analysis.
• Entity List: comprises a subset of entities (e.g., genes, exons, genomic regions) associated with a particular technology.

Technology and Biological Genome
• Technology: the package of data describing the array design, together with biological and other information about the entities. A technology is available for each individual array type. An experiment comprises samples that all belong to the same technology. A technology must first be installed for each new array type to be analyzed.

Create a technology

Technology and Biological Genome (cont.)
• Annotations Manager (new): the central interface for updating, deleting, or adding custom annotations for various experiment types. It allows the user to organize and view multiple builds of different organisms and their respective annotations for genes, transcripts, SNPs, reference, and other annotation resources.

Technology and Biological Genome (cont.)
• Translation (new): a feature that allows comparison of entity lists between experiments of different technologies. Users can compare the same organism across different technologies, or identify homologues across different organisms. This cross-organism translation is done via HomoloGene tables that map Entrez identifiers in one organism to Entrez identifiers in the other.
ftp://ftp.ncbi.nih.gov/pub/

HomoloGene Table
Serial No.   Organism
1            Mus musculus
2            Rattus norvegicus
3            Magnaporthe grisea
4            Kluyveromyces lactis
5            Eremothecium gossypii
6            Arabidopsis thaliana
7            Oryza sativa
8            Schizosaccharomyces pombe
9            Saccharomyces cerevisiae
10           Neurospora crassa
11           Plasmodium falciparum
12           Caenorhabditis elegans
13           Anopheles gambiae
14           Drosophila melanogaster
15           Danio rerio
16           Pan troglodytes
17           Gallus gallus
18           Homo sapiens
19           Canis lupus familiaris
20           Bos taurus

Technology and Biological Genome (cont.)
Translation (cont.)
GeneSpring lets you explicitly define an annotation column for the source technology and an annotation column for the destination technology for translation, through the menu Tools > Options > Miscellaneous > Translation Mapping. This feature is useful for translating data between a custom technology and a standard technology.

Technology and Biological Genome (cont.)
• Biological Genome (new): the collective set of all major annotations (Entrez ID, GO IDs, etc.) for a particular organism. It is created using the information available at NCBI and can be stored in GeneSpring. It is independent of any chip technology and, once created, can be used across multiple chip types and technologies. The NCBI site used for Biological Genome creation can be accessed from Tools > Options > Miscellaneous > NCBI ftp URL.

Biological Genome (cont.)
The Biological Genome is essential for performing biological analyses in Generic experiments that lack annotations. It can be created from Annotations > Create Biological Genome.
To use the Biological Genome created for an organism in an experiment, the user has to update the annotations for that technology from Tools > Update Technology Annotations > Update from Biological Genome.

Running Expression Workflow
1. Create a new project or open an existing project.
2. Create a new experiment or open an existing experiment.
   Upon creating an experiment of a specific chip type for the first time, GeneSpring prompts the user to download the technology from the update server…
3. Select the analysis type from the drop-down menu: Association, Copy Number, Exon, Expression, ...
4. Select the workflow type:
   Analysis Biological Significance: for a new user
   Data Import Wizard: for an advanced user

Running Expression Workflow (cont.)
Analysis Biological Significance

Analyzing Affymetrix Exon Splicing Data
• Affymetrix Exon chips are used for studying the alternative splicing of genes.
• When creating a new experiment, specify:
   Analysis Type = Exon
   Experiment Type = Affymetrix Exon Splicing
• The Data Import Wizard is the only workflow option for the Affymetrix Exon Splicing experiment.
• When setting up the experiment, the user must select the confidence level (Core, Extended, or Full) in the Baseline Options step.

Analyzing Affymetrix Exon Splicing Data (cont.)
• Filter transcripts on DABG (Detected Above Background):

Analyzing Affymetrix Exon Splicing Data (cont.)

Analyzing Affymetrix Exon Splicing Data (cont.)
Splicing ANOVA: first calculates the gene-level normalized intensities for each of the probesets, and then runs an n-way ANOVA, where n denotes the number of parameters.

Analyzing Affymetrix Exon Splicing Data (cont.)
• Filter on Splicing Index:
• The Splicing Index is defined as the difference between the gene-normalized intensities of a probeset in the two sample conditions.

Copy Number Analysis
• The analyses for Copy Number Variation (CNV), Allele Specific Copy Number (ASCN), Parent Specific Copy Number (PSCN), Loss of Heterozygosity (LOH), Log Ratio, and Common Genomic Variant Regions are new features in GeneSpring.
• All of the above computations follow either the Against Reference method (comparing specified arrays against a reference file created from a set of reference individuals) or the Paired Normal method (comparing each specified array against a corresponding array obtained from a normal tissue sample of the same individual).

Copy Number Analysis (cont.)
• Reference: GeneSpring comes prepackaged with a reference created from the 270 Phase II HapMap samples, part of the second-generation human haplotype map from The International HapMap Consortium. According to the Consortium, the Phase II HapMap characterizes over 3.1 million human single nucleotide polymorphisms (SNPs) genotyped in 270 individuals from four geographically diverse populations and includes 25 to 35 percent of common SNP variation in the populations surveyed.

Association Analysis
• GeneSpring provides a toolkit for Genome-Wide Association Analysis (GWAS), based on the analysis of SNP and phenotype data for studies that can involve thousands of unrelated samples genotyped at hundreds of thousands of SNPs.

Association Analysis (cont.)
• Features:
  • QC methods to identify anomalous samples that would confound the analysis.
  • Filters to remove undesirable SNPs from the analysis.
  • Statistical tests to identify SNPs that are associated with disease incidence, and methods to eliminate false-positive results.
  • Haplotype inference and analysis methods, which can detect more subtle disease-causing groups of SNPs.
  • A wide range of intuitive, publication-quality visualization options, including LD Plot, Haplo block view, etc.
  • Full integration with other features of GeneSpring GX, such as Pathway Analysis and Genome Browser.

Functional Grouping
1. Gene Ontology Analysis: The Gene Ontology (GO) Consortium maintains a database of controlled vocabularies for describing the molecular functions, biological processes, and cellular components of gene products.
   These GO terms are represented as a Directed Acyclic Graph (DAG). GeneSpring is packaged with the GO terms and their DAG relationships as provided by the GO Consortium, and it has a fully featured gene ontology analysis module for exploring the GO terms associated with the entities of interest.

Functional Grouping (cont.)
2. GSEA (Gene Set Enrichment Analysis) and GSA (Gene Set Analysis): computational methods that determine whether an a priori defined set of genes shows statistically significant differences between two phenotypes. In many cases, few individual genes pass the statistical significance criterion. When a larger number of genes qualify, there is often no unifying biological theme, which makes biological interpretation difficult. GSEA and GSA overcome these difficulties by focusing on gene sets rather than individual genes: they use the ranked gene list to identify the gene sets that are significantly differentially expressed between the two phenotypes.

Functional Grouping (cont.)
3. Pathway Analysis: provides the necessary biological context for functional analysis. Version 12.0 of GeneSpring allows you to overlay data from two different experiments on the same pathway simultaneously, enabling an integrated analysis of data from different experiment types.

Pathway Analysis (cont.)
• Import curated pathways directly from the WikiPathways portal (http://www.wikipathways.org), import pathways from other sources in BioPAX (Level 2), GPML, or text format, or create your own interaction networks from a database of biological and chemical entities, ...
• View and investigate these pathways and interaction networks in an interactive pathway viewer, and overlay your experimental data on them.

Pathway Analysis (cont.)
• Access other popular pathway analysis tools such as IPA, MetaCore, and Cytoscape from within GeneSpring, and export data to these tools.
• Create networks based on information in PubMed abstracts, and identify interactions associated with Medical Subject Headings (MeSH) terms, using the built-in natural language processing (NLP) feature.
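The HomoloGene-based Translation described under Technology and Biological Genome can be illustrated with a short Python sketch. It assumes the tab-separated column layout of NCBI's homologene.data file (HID, Taxonomy ID, Gene ID, Gene Symbol, ...); every group ID, gene ID, and symbol below is made up, and `load_groups` and `translate` are hypothetical helpers, not GeneSpring functions. Two genes in different organisms are treated as homologues when they share a HomoloGene group ID (HID).

```python
from collections import defaultdict

# Hypothetical rows using the first four tab-separated columns of NCBI's
# homologene.data layout: HID, Taxonomy ID, Gene ID, Gene Symbol.
# All HIDs, gene IDs and symbols here are made up for illustration.
SAMPLE_ROWS = "\n".join([
    "1\t9606\t1001\tGENEA",
    "1\t10090\t2001\tGenea",
    "2\t9606\t1002\tGENEB",
    "2\t10090\t2002\tGeneb",
    "3\t9606\t1003\tGENEC",
])

HUMAN, MOUSE = 9606, 10090  # NCBI Taxonomy IDs for Homo sapiens and Mus musculus


def load_groups(text):
    """Map each HomoloGene group ID (HID) to {taxonomy_id: [entrez_gene_ids]}."""
    groups = defaultdict(lambda: defaultdict(list))
    for line in text.splitlines():
        if not line.strip():
            continue
        hid, taxon, gene_id = line.split("\t")[:3]
        groups[int(hid)][int(taxon)].append(int(gene_id))
    return groups


def translate(entrez_ids, groups, src_taxon, dst_taxon):
    """Translate Entrez Gene IDs between organisms via shared HomoloGene groups.

    IDs with no homologue in the destination organism map to an empty list.
    """
    gene_to_hid = {
        gene_id: hid
        for hid, members in groups.items()
        for gene_id in members.get(src_taxon, [])
    }
    return {
        gene_id: (list(groups[gene_to_hid[gene_id]].get(dst_taxon, []))
                  if gene_id in gene_to_hid else [])
        for gene_id in entrez_ids
    }


groups = load_groups(SAMPLE_ROWS)
print(translate([1001, 1002, 1003, 9999], groups, HUMAN, MOUSE))
# {1001: [2001], 1002: [2002], 1003: [], 9999: []}
```

Going through the group ID rather than a direct gene-to-gene table means one set of groups serves any pair of the organisms listed in the HomoloGene Table.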
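The Splicing Index filter from the exon-splicing slides can be sketched in Python as follows. This assumes log2-scale probeset and gene-level intensities and averages the gene-normalized values within each condition, which is one common convention; GeneSpring's exact computation may differ. The intensities, the probeset names, and the 0.5 cutoff are hypothetical.

```python
from statistics import mean


def gene_normalized(probeset_log2, gene_log2):
    """Per-sample gene-normalized intensity: probeset level minus gene level (log2)."""
    return [p - g for p, g in zip(probeset_log2, gene_log2)]


def splicing_index(cond_a, cond_b):
    """Mean gene-normalized intensity in condition A minus that in condition B.

    Each condition is a (probeset_log2, gene_log2) pair of per-sample lists.
    """
    return mean(gene_normalized(*cond_a)) - mean(gene_normalized(*cond_b))


def filter_on_splicing_index(scores, cutoff):
    """Keep probesets whose absolute Splicing Index reaches the (hypothetical) cutoff."""
    return [ps for ps, si in scores.items() if abs(si) >= cutoff]


# Hypothetical log2 intensities for three samples per condition.
data = {
    "ps_1": (([8.0, 8.5, 7.5], [10.0, 10.5, 9.5]), ([9.0, 9.5, 8.5], [10.0, 10.5, 9.5])),
    "ps_2": (([9.0, 9.5, 8.5], [10.0, 10.5, 9.5]), ([9.0, 9.5, 8.5], [10.0, 10.5, 9.5])),
}
scores = {ps: splicing_index(a, b) for ps, (a, b) in data.items()}
print(scores)                                 # {'ps_1': -1.0, 'ps_2': 0.0}
print(filter_on_splicing_index(scores, 0.5))  # ['ps_1']
```

Subtracting the gene-level intensity first removes overall expression changes, so a Splicing Index of -1.0 for ps_1 says that, relative to its gene, this probeset's signal is about half as strong in condition A as in condition B.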
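The Association Analysis features mention statistical tests for SNP-disease association and methods to eliminate false positives. As an illustration only, and not necessarily the tests GeneSpring applies, the Python sketch below runs a Pearson chi-square test on a 2x2 table of allele counts (cases vs. controls) and then applies a Bonferroni correction across SNPs. The SNP names, allele counts, and alpha of 0.05 are hypothetical.

```python
import math


def allelic_chi_square(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square (1 df, no continuity correction) on a 2x2 table of allele counts.

    Returns (statistic, p_value). Assumes every row and column total is non-zero.
    """
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    total = case_alt + case_ref + ctrl_alt + ctrl_ref
    row_totals = [case_alt + case_ref, ctrl_alt + ctrl_ref]
    col_totals = [case_alt + ctrl_alt, case_ref + ctrl_ref]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            stat += (table[i][j] - expected) ** 2 / expected
    # Upper tail of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


def bonferroni_hits(p_values, alpha=0.05):
    """SNPs whose p-value stays below alpha divided by the number of tests."""
    threshold = alpha / len(p_values)
    return [snp for snp, p in p_values.items() if p < threshold]


# Hypothetical allele counts: (case alt, case ref, control alt, control ref).
counts = {"snp_1": (60, 40, 40, 60), "snp_2": (50, 50, 50, 50), "snp_3": (70, 30, 40, 60)}
p_values = {snp: allelic_chi_square(*c)[1] for snp, c in counts.items()}
print(bonferroni_hits(p_values))  # ['snp_1', 'snp_3']
```

Bonferroni is the simplest way to control false alarms across many SNPs; with hundreds of thousands of SNPs the per-test threshold becomes very strict, which is why less conservative corrections are also common.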
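Because GO terms form a DAG, an annotation to a specific term is conventionally taken to imply annotation to all of its ancestors (the GO "true path rule"), and a term may have more than one parent. The Python sketch below shows that propagation on a hypothetical four-term DAG instead of real GO identifiers, modeling only generic parent links; it is not a description of GeneSpring's internal implementation.

```python
def ancestors(term, parents):
    """All terms reachable from `term` by following parent links upward."""
    found, stack = set(), [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def propagate(annotations, parents):
    """Extend each gene's direct GO annotations with all ancestor terms."""
    return {
        gene: set(terms).union(*(ancestors(t, parents) for t in terms))
        for gene, terms in annotations.items()
    }


# Hypothetical DAG: term "C" has two parents, which a tree could not express.
parents = {"A": ["root"], "B": ["root"], "C": ["A", "B"]}
annotations = {"gene_1": {"C"}, "gene_2": {"B"}}
result = propagate(annotations, parents)
print(sorted(result["gene_1"]))  # ['A', 'B', 'C', 'root']
```

The `found` set keeps shared ancestors such as "root" from being visited twice when two paths lead to them.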
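The GSEA idea of scoring a gene set against a ranked gene list can be sketched with the unweighted (Kolmogorov-Smirnov-style) running-sum enrichment score. This Python sketch is a simplification under stated assumptions: published GSEA weights each step by the gene's ranking statistic and assesses significance by permutation, and GSA uses a different statistic; neither is shown. The gene names and ranking are hypothetical.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score for one gene set.

    Walk down the ranked list, stepping up by 1/N_hit at each gene in the set
    and down by 1/N_miss otherwise; return the signed maximum deviation from 0.
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(hits) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must partially overlap the ranked list")
    running = best = 0.0
    for is_hit in hits:
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


# Hypothetical ranking, from most up-regulated in phenotype 1 to most up-regulated in phenotype 2.
ranked = ["g1", "g2", "g3", "g4", "g5", "g6"]
print(enrichment_score(ranked, {"g1", "g2"}))  # 1.0  (set concentrated at the top)
print(enrichment_score(ranked, {"g5", "g6"}))  # -1.0 (set concentrated at the bottom)
```

A set whose members are scattered through the list keeps the running sum near zero, which is how the score rewards coordinated shifts of a whole set even when no single gene passes a significance cutoff.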