Functional Annotation Clustering

Gene Functional Annotation Tool: DAVID
Database for Annotation, Visualization and Integrated Discovery (DAVID)
 Functional Annotation Tool





Gene Ontology
Protein interaction
Protein domain
Pathway
Disease
 Gene ID Conversion
 Gene Functional Classification
DAVID Workflow
Upload the gene list to the website
Gene Name Batch Viewer
Gene Functional Classification
Functional Annotation Tool
Select categories for analysis
Get the results
Upload the Gene List
AFFYMETRIX_3PRIME_IVT_ID
AFFYMETRIX_EXON_GENE_ID
AFFYMETRIX_SNP_ID
AGILENT_CHIP_ID
AGILENT_ID
AGILENT_OLIGO_ID
ENSEMBL_GENE_ID
ENSEMBL_TRANSCRIPT_ID
ENTREZ_GENE_ID
FLYBASE_GENE_ID
FLYBASE_TRANSCRIPT_ID
GENBANK_ACCESSION
GENOMIC_GI_ACCESSION
GENPEPT_ACCESSION
ILLUMINA_ID
IPI_ID
MGI_ID
OFFICIAL_GENE_SYMBOL
PFAM_ID
PIR_ID
PROTEIN_GI_ACCESSION
REFSEQ_GENOMIC
REFSEQ_MRNA
REFSEQ_PROTEIN
REFSEQ_RNA
RGD_ID
SGD_ID
TAIR_ID
UCSC_GENE_ID
UNIGENE
UNIPROT_ACCESSION
UNIPROT_ID
UNIREF100_ID
WORMBASE_GENE_ID
WORMPEP_ID
ZFIN_ID
Not Sure
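Before uploading, it can help to tidy the list so that each identifier appears once, one per line. The following is a minimal sketch, assuming the identifiers are plain strings of a single type (here made-up OFFICIAL_GENE_SYMBOL values). The function name `clean_gene_list` is illustrative and not part of DAVID.

```python
def clean_gene_list(raw_text):
    """Return unique, non-empty identifiers in their original order."""
    seen = set()
    cleaned = []
    for line in raw_text.splitlines():
        gene_id = line.strip()
        # Skip blank lines and identifiers that were already seen
        if gene_id and gene_id not in seen:
            seen.add(gene_id)
            cleaned.append(gene_id)
    return cleaned


# Hypothetical pasted list containing a blank line and a duplicate
raw = "TP53\nBRCA1\n\nTP53 \nEGFR\n"
gene_list = clean_gene_list(raw)
print("\n".join(gene_list))
```

The printed text can be pasted into the upload box with the matching identifier type selected.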
1. Confirm the species
2. Once selected, use the Functional Annotation Tool
3.
DAVID Gene ID:
An internal ID generated from the "DAVID Gene Concept" in the DAVID system. One DAVID gene ID represents
one unique gene cluster belonging to a single gene entry.
1. Input gene list: 817
Mapped to the DAVID database: 754
DAVID IDs: 734
2. Genes from your list involved in this annotation category
3. 99 / 734
4. Single chart report for this annotation category only.
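The counts above can be turned into rates with simple arithmetic. This sketch assumes the three numbers mean identifiers submitted, identifiers found in DAVID, and unique DAVID gene IDs; that reading is an interpretation of the slide labels, and the variable names are illustrative.

```python
input_ids = 817      # identifiers submitted
mapped_ids = 754     # identifiers found in the DAVID database
david_ids = 734      # unique DAVID gene IDs after removing redundancy
category_hits = 99   # genes from the list annotated in the selected category

mapping_rate = mapped_ids / input_ids
category_coverage = category_hits / david_ids

print(f"Mapped to DAVID: {mapping_rate:.1%}")
print(f"Covered by the category: {category_coverage:.1%}")
```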
Functional Annotation Chart
Chart Report is an annotation-term-focused view that lists annotation terms and their associated genes under
study. To avoid overcounting duplicated genes, the Fisher Exact statistic is calculated on the corresponding
DAVID gene IDs, from which all redundancies in the original IDs have been removed. Every result in the Chart
Report has to pass the thresholds in the Chart Option section (by default, Max.Prob. <= 0.1 and Min.Count >= 2)
so that only statistically significant terms are displayed.
List Total(LT) - number of genes in the gene list mapping to the category of which the term is a member
Population Hits(PH) - number of genes in the background gene list mapping to a specific term
Population Total(PT) - number of genes in the background gene list mapping to the category
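The EASE Score can be sketched from these totals as a one-sided Fisher exact test in which one gene is removed from the list hits. This is a minimal illustration, not DAVID's own code: the way the 2x2 table is built from LT, PH and PT (background counted without the list genes) is an assumption, and the numbers in the example are made up.

```python
from math import comb


def fisher_greater(a, b, c, d):
    """One-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Returns P(X >= a) for a hypergeometric X with the table margins fixed.
    """
    total = a + b + c + d
    row1 = a + b   # genes in the list
    col1 = a + c   # genes annotated with the term
    denom = comb(total, row1)
    upper = min(row1, col1)
    return sum(
        comb(col1, x) * comb(total - col1, row1 - x) / denom
        for x in range(a, upper + 1)
    )


def ease_score(list_hits, list_total, pop_hits, pop_total):
    """Modified Fisher exact p-value: one list hit is removed before testing."""
    b = list_total - list_hits
    c = pop_hits - list_hits          # assumption: background hits outside the list
    d = pop_total - list_total - c
    return fisher_greater(list_hits - 1, b, c, d)


# Hypothetical numbers: 3 of 300 list genes hit a term that 40 of 30000
# background genes map to.
fisher_p = fisher_greater(3, 297, 37, 29663)
ease_p = ease_score(3, 300, 40, 30000)
print(f"Fisher exact: {fisher_p:.4f}, EASE: {ease_p:.4f}")
```

Removing one hit makes the test more conservative, so terms supported by only a few genes are penalised.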
Number of results displayed per page
RT (Related Term): Related Term Search can identify other similar terms
EASE Score: a modified Fisher Exact P-Value
RT (Related Term)
Any given gene is associated with a set of annotation terms. If genes share a similar set of those terms, they
are most likely involved in similar biological mechanisms. The algorithm adopts kappa statistics to
quantitatively measure the degree of agreement in how genes share similar annotation terms. The kappa
result ranges from 0 to 1. The higher the kappa value, the stronger the agreement.
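The kappa measure can be illustrated with Cohen's kappa on two binary annotation profiles, where 1 means the gene carries the term. This is a sketch under that assumption, and the profiles are made up. Note that Cohen's kappa computed this way can fall below 0 when two profiles agree less often than expected by chance.

```python
def kappa(profile_a, profile_b):
    """Cohen's kappa for two equal-length binary (0/1) annotation profiles."""
    if len(profile_a) != len(profile_b) or not profile_a:
        raise ValueError("profiles must be non-empty and of equal length")
    n = len(profile_a)
    # Fraction of terms on which the two genes agree (both 1 or both 0)
    observed = sum(x == y for x, y in zip(profile_a, profile_b)) / n
    p_a = sum(profile_a) / n
    p_b = sum(profile_b) / n
    # Agreement expected by chance given each gene's annotation rate
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    if expected == 1.0:
        return 1.0  # both profiles are constant and identical
    return (observed - expected) / (1 - expected)


# Hypothetical profiles over four annotation terms
gene_1 = [1, 1, 0, 0]
gene_2 = [1, 0, 0, 0]
gene_3 = [1, 1, 0, 0]

print(kappa(gene_1, gene_2))
print(kappa(gene_1, gene_3))
```

The same function applies to two annotation terms if each profile lists which genes carry the term.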
Any biological process/term from any of the functional categories listed in DAVID.
Annotation Category - Functional Categories
COG_ONTOLOGY refers to an ontology from NCBI's COG database (the database of Clusters of Orthologous Groups
of proteins (COGs): a tool for genome-scale analysis of protein functions and evolution).
SP_PIR_KEYWORDS are keywords defined by SwissProt/UniProt and PIR (Protein Information Resource).
UP_SEQ_FEATURE refers to the annotation category UniProt Sequence Feature, found in the reports on the
UniProt site.
Annotation Category – Protein domain & Protein Interaction
Protein structure
Annotation Category - Gene Ontology
GOTerms are categorized into 3 groups:
BP - Biological Process
MF - Molecular Function
CC - Cellular Component
GOTERM_BP_1 -> GO terms under Biological Process (BP) at Level 1.
GOTERM_BP_ALL -> GO terms under Biological Process (BP) at all possible levels.
GOTERM_BP_FAT -> GO terms under Biological Process (BP) after the GO FAT filter. Basically, this test
examines the significance of enriched annotations. GO FAT filters out very broad GO terms
based on a measured specificity of each term (not level-specificity).
Annotation Category-Pathways
KEGG
Biocarta
Combined View Annotation
11 categories in total
Select the 11 categories
Functional Annotation Cluster
Functional Annotation Clustering
Due to the redundant nature of annotations, the Functional Annotation Chart presents similar/relevant annotations repeatedly, which
dilutes the biological focus of the report. To reduce this redundancy, the newly developed Functional Annotation Clustering report
groups and displays similar annotations together, which makes the biology clearer and more focused than in the traditional chart report.
• Functional Annotation Clustering uses the same kappa statistics to measure the degree of common genes between two
annotations, and fuzzy heuristic clustering to classify groups of similar annotations according to their kappa values.
Adjust the Kappa statistics parameters
Adjust the fuzzy heuristic clustering parameters
P_value
All genes involved in this annotation cluster
Heat map
EASE score (modified Fisher exact test)
Initial Group Members (any value >= 2; default = 4): the minimum number of genes in a seeding group, which affects the
minimum size of each functional group in the final result. In general, a lower value attempts to include more genes in
functional groups and, in particular, generates many small groups.
Final Group Members (any value >= 2; default = 4): the minimum number of genes in one final group after the “cleanup” procedure.
In general, a lower value attempts to include more genes in functional groups and, in particular, generates many small groups.
It works together with the previous parameter to control the minimum size of functional groups. In the final cluster, this is the
number of terms that a cluster must have in order to be presented in the output.
Multi-linkage Threshold (any value from 0% to 100%; default = 50%): controls how seeding groups merge with each other,
i.e. two groups sharing more than this percentage of their gene members become one group. In general, a higher percentage
gives sharper separation, i.e. it generates more final functional groups with more tightly associated genes in each
group. In addition, changing the parameter does not contribute extra genes to the unclustered group.
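The effect of the Multi-linkage Threshold can be sketched with a simple merge loop over seeding groups. This is a simplified illustration, not DAVID's actual procedure: measuring the shared fraction against the smaller of the two groups is an assumption, and the function name and gene labels are hypothetical.

```python
def merge_seed_groups(seed_groups, threshold=0.5):
    """Repeatedly merge groups whose shared-member fraction exceeds threshold."""
    groups = [set(g) for g in seed_groups if g]
    merged = True
    while merged:
        merged = False
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                shared = len(groups[i] & groups[j])
                # Assumption: overlap is measured against the smaller group
                smaller = min(len(groups[i]), len(groups[j]))
                if shared / smaller > threshold:
                    groups[i] |= groups[j]
                    del groups[j]
                    merged = True
                    break
            if merged:
                break  # restart the scan after every merge
    return groups


# Hypothetical seeding groups of four members each
seeds = [
    {"g1", "g2", "g3", "g4"},
    {"g2", "g3", "g4", "g5"},
    {"g6", "g7", "g8", "g9"},
]
loose = merge_seed_groups(seeds, threshold=0.5)
strict = merge_seed_groups(seeds, threshold=0.9)
print(len(loose), len(strict))
```

In this toy case the first two groups share three of four members, so they merge at 50% but stay separate at 90%, which matches the statement that a higher percentage yields more final groups.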
Enrichment Score = [ -log(P_value_1) + -log(P_value_2) + ... + -log(P_value_n) ] / n
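The enrichment score formula can be computed directly from the p-values of the terms in a cluster. A minimal sketch, assuming the logarithm is base 10 (the formula only says "log") and using made-up p-values:

```python
from math import log10


def enrichment_score(p_values):
    """Mean of -log10(p) over the member terms of one annotation cluster."""
    if not p_values:
        raise ValueError("a cluster needs at least one p-value")
    return sum(-log10(p) for p in p_values) / len(p_values)


# Hypothetical EASE scores of three terms in one cluster
cluster_p_values = [0.01, 0.001, 0.0001]
score = enrichment_score(cluster_p_values)
print(score)
```

Averaging on the -log scale is the same as taking -log of the geometric mean of the p-values, so a higher score means the cluster's terms are more strongly enriched overall.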
Chart vs Cluster
• If you run both functions with default settings, the results will not overlap completely. In general, the
clustering result may contain more terms than the chart. In clustering, some 'non-significant' terms can be
included because of the link to their 'significant' neighbors (co-members in one cluster).
• If you want to completely cross-link the two reports, run the chart report with the p-value cutoff set
to "1" (ground level). You will then have all possible terms, with significant or insignificant p-values.
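The effect of the p-value cutoff can be sketched as a filter over chart rows. The thresholds mirror the chart defaults (Max.Prob. <= 0.1, Min.Count >= 2); the rows and the function name are made up for illustration.

```python
def chart_report(rows, max_prob=0.1, min_count=2):
    """Keep only the terms that pass both chart thresholds."""
    return [
        row for row in rows
        if row["ease"] <= max_prob and row["count"] >= min_count
    ]


# Hypothetical chart rows: term name, gene count, EASE score
rows = [
    {"term": "term_A", "count": 12, "ease": 0.0004},
    {"term": "term_B", "count": 5, "ease": 0.03},
    {"term": "term_C", "count": 3, "ease": 0.4},
    {"term": "term_D", "count": 1, "ease": 0.02},
]

default_chart = chart_report(rows)
ground_level_chart = chart_report(rows, max_prob=1.0)
print([row["term"] for row in default_chart])
print([row["term"] for row in ground_level_chart])
```

With the cutoff at 1, term_C appears even though it is not significant, which is the kind of term a cluster can still contain through its significant neighbors. The minimum count still applies in this sketch.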
Upload the gene list to the website
Gene Name Batch Viewer
Gene Functional Classification
Functional Annotation Tool
Select categories for analysis
Get the results
Other Tools in DAVID
Gene Name Batch Viewer
Gene Functional Classification Tool
Term report
Gene Functional Classification Tool - Create sublist
Gene ID Conversion Tool
Thank you for your attention