Knowledge Extraction for Semantic Web Using Web Mining
Presenter: 陳宜樺
Date: 2015/10/30

Outline
Introduction
Background
Methodology
Implementation
Evaluation
Introduction(1/2)
• Ontologies are the backbone of the Semantic Web.
• This paper investigates the problem of extracting knowledge from a large number of web documents in order to develop ontologies.
Introduction(2/2)
• Based on their structure, web documents can be categorized into three categories: unstructured, semi-structured and fully structured.
• This research combines web content mining with web usage mining to semi-automate the process of ontology learning from semi-structured web documents.
Background(1/2)
Web content mining
Web content mining can take advantage of the HTML tags in semi-structured web pages.
Web structure mining
Web structure mining is mostly concerned with the hyperlinks between web pages.
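The slides do not say how the paper uses hyperlinks, so the following is only a generic illustration of the raw data web structure mining starts from. It is a minimal sketch, using only the Python standard library, that turns a set of HTML pages into a link graph: an adjacency list from each page URL to the absolute URLs it links to. The names `LinkCollector` and `build_link_graph`, and the example URLs, are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href targets of <a> tags in one HTML page (hypothetical helper)."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page's own URL.
                    self.links.append(urljoin(self.base_url, value))


def build_link_graph(pages: dict[str, str]) -> dict[str, list[str]]:
    """Map each page URL to the absolute URLs it links to (an adjacency list)."""
    graph = {}
    for url, html in pages.items():
        collector = LinkCollector(url)
        collector.feed(html)
        collector.close()
        graph[url] = collector.links
    return graph


if __name__ == "__main__":
    # Invented example pages.
    pages = {
        "http://example.org/animals/":
            '<a href="mammals.html">Mammals</a> <a href="birds.html">Birds</a>',
        "http://example.org/animals/mammals.html": '<a href="/animals/">Back</a>',
    }
    for page, targets in build_link_graph(pages).items():
        print(page, "->", targets)
```

Resolving every `href` to an absolute URL means the same target page is recognised whether it was linked relatively or absolutely, which keeps the graph's nodes consistent.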
Background(2/2)
Web usage mining
Web usage mining is the process of extracting knowledge from the behaviour of web users.
Methodology(1/3)
Concept and conceptual relationship extraction through web content mining
Concept Extraction:
The weighted frequency of a word depends on how often it occurs and on the type of HTML tags that enclose it.
Tools: part-of-speech (POS) tagging / natural language processing (NLP)
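A minimal sketch of tag-weighted term frequency, assuming a word's weight is that of the most important HTML tag enclosing it. The values in `TAG_WEIGHTS` are invented for illustration (the slides do not give the paper's weights), and the sketch skips the POS tagging step that could be used to keep only noun candidates. `WeightedTermCounter` and `weighted_frequencies` are hypothetical names.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Hypothetical weights: the slides only say the weight depends on the enclosing tag.
TAG_WEIGHTS = {"title": 5, "h1": 4, "h2": 3, "h3": 2, "b": 2, "strong": 2}
DEFAULT_WEIGHT = 1
# Elements that never get a closing tag, so they are not pushed on the stack.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class WeightedTermCounter(HTMLParser):
    """Accumulates a tag-weighted frequency score for every word in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.open_tags: list[str] = []
        self.scores: Counter = Counter()

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the most recent matching open tag; ignore stray end tags.
        if tag in self.open_tags:
            while self.open_tags.pop() != tag:
                pass

    def handle_data(self, data):
        # Each word adds the weight of the most important tag enclosing it.
        weight = max(
            (TAG_WEIGHTS.get(t, DEFAULT_WEIGHT) for t in self.open_tags),
            default=DEFAULT_WEIGHT,
        )
        for word in re.findall(r"[a-z]+", data.lower()):
            self.scores[word] += weight


def weighted_frequencies(html: str) -> Counter:
    parser = WeightedTermCounter()
    parser.feed(html)
    parser.close()
    return parser.scores


if __name__ == "__main__":
    page = "<title>Lion</title><h1>Lion</h1><p>The lion is a big cat.</p>"
    print(weighted_frequencies(page).most_common(3))
```

For the example page, "lion" scores 5 + 4 + 1 = 10 while "cat" scores 1, so words the author placed in titles and headings rise above words that merely occur in body text.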
Methodology(2/3)
Conceptual Relationship Extraction:
It produces a hierarchical tree structure that mirrors the way web authors present their information.
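One way to read "the way web authors present their information" is to follow the page's heading levels. The sketch below assumes exactly that, which is not necessarily the paper's procedure: each `<h1>` to `<h6>` heading becomes a node whose parent is the nearest preceding heading of a shallower level, and every parent/child pair is treated as a candidate conceptual relationship. `Node`, `heading_tree` and `pairs` are hypothetical names.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser

HEADINGS = {"h1": 1, "h2": 2, "h3": 3, "h4": 4, "h5": 5, "h6": 6}


@dataclass
class Node:
    label: str
    level: int
    children: list["Node"] = field(default_factory=list)


class HeadingTreeBuilder(HTMLParser):
    """Builds a tree of headings; the root is a synthetic level-0 node."""

    def __init__(self) -> None:
        super().__init__()
        self.root = Node("ROOT", 0)
        self.stack = [self.root]   # path from the root to the latest heading
        self.current_level = None  # level of the heading being read, if any
        self.text: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in HEADINGS:
            self.current_level = HEADINGS[tag]
            self.text = []

    def handle_data(self, data):
        if self.current_level is not None:
            self.text.append(data)

    def handle_endtag(self, tag):
        if tag in HEADINGS and self.current_level is not None:
            label = " ".join("".join(self.text).split())
            node = Node(label, self.current_level)
            # Climb until the top of the stack is a shallower heading: the parent.
            while self.stack[-1].level >= node.level:
                self.stack.pop()
            self.stack[-1].children.append(node)
            self.stack.append(node)
            self.current_level = None


def heading_tree(html: str) -> Node:
    builder = HeadingTreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def pairs(node: Node) -> list[tuple[str, str]]:
    """Flatten the tree into (parent, child) candidate relationships."""
    out = []
    for child in node.children:
        out.append((node.label, child.label))
        out.extend(pairs(child))
    return out
```

For `<h1>Animals</h1><h2>Mammals</h2><h3>Lion</h3><h2>Birds</h2>`, `pairs` yields the synthetic `("ROOT", "Animals")` edge followed by `("Animals", "Mammals")`, `("Mammals", "Lion")` and `("Animals", "Birds")`: "Birds" attaches to "Animals" because it is the nearest preceding heading that is shallower than an `<h2>`.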
Methodology(3/3)
Conceptual relationship identification through web usage mining
• Log file fields: IP address, date, time, method (GET or POST), file name, result
• Access logs
Limitations:
1. One user may have several IP addresses, even within the same session.
2. Several users may share the same IP address because of network address translation (NAT).
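A minimal sketch of the log preprocessing, assuming the access log is in the Common Log Format (the slides list the fields but not the format). It keeps successful GET/POST requests, treats each IP address as one user, and splits that user's requests into sessions at gaps longer than an assumed 30-minute timeout. Grouping by IP is exactly where the two limitations apply: NAT merges several users into one session, and a changing IP splits one user's session. `co_visits`, which counts page pairs visited in the same session, is a hypothetical relationship signal, not the paper's published method, and the sketch does not reproduce the WUMprep tool listed under Implementation.

```python
import re
from collections import Counter, defaultdict
from datetime import datetime, timedelta
from itertools import combinations

# Common Log Format (assumed), e.g.
# 10.0.0.1 - - [30/Oct/2015:10:00:00 +0800] "GET /lion.html HTTP/1.1" 200 512
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>GET|POST) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+'
)
SESSION_GAP = timedelta(minutes=30)  # assumed session timeout


def parse(lines):
    """Yield (ip, timestamp, path) for successful GET/POST requests."""
    for line in lines:
        m = LOG_LINE.match(line)
        if m and m["status"] == "200":
            # %b expects English month abbreviations (default C locale).
            ts = datetime.strptime(m["ts"], "%d/%b/%Y:%H:%M:%S %z")
            yield m["ip"], ts, m["path"]


def sessions(lines):
    """Split each IP's requests into sessions at gaps longer than SESSION_GAP.

    One IP is treated as one user, which is what NAT and changing IPs
    make unreliable.
    """
    by_ip = defaultdict(list)
    for ip, ts, path in parse(lines):
        by_ip[ip].append((ts, path))
    result = []
    for hits in by_ip.values():
        hits.sort()
        current = [hits[0][1]]
        for (prev_ts, _), (ts, path) in zip(hits, hits[1:]):
            if ts - prev_ts > SESSION_GAP:
                result.append(current)
                current = []
            current.append(path)
        result.append(current)
    return result


def co_visits(session_list):
    """Count unordered page pairs visited in the same session (hypothetical signal)."""
    counts = Counter()
    for pages in session_list:
        for a, b in combinations(sorted(set(pages)), 2):
            counts[(a, b)] += 1
    return counts
```

Pages that are repeatedly requested together within a session are plausible candidates for related concepts; the counts only suggest relationships and would still need to be checked against the content-mining results.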
Implementation
Web Ontology Language (OWL)
Piccolo2D graphics framework
Versions: Piccolo2D.Java, Piccolo2D.NET, PocketPiccolo2D.NET (on the .NET Compact Framework)
OpenNLP (natural language processing toolkit)
WUMprep
Evaluation(1/2)
Precision and recall (used in information retrieval) or accuracy (used in machine learning)
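With a set E of extracted concepts and a set G of gold-standard concepts, precision is |E ∩ G| / |E| and recall is |E ∩ G| / |G|. A minimal sketch, assuming concepts match only when their lower-cased, trimmed labels are identical (ontology evaluations may use softer lexical or structural matching). Accuracy is left out because it also needs true negatives, which a plain concept list does not define. `precision_recall` is a hypothetical name.

```python
def _normalise(labels: set[str]) -> set[str]:
    """Lower-case and trim labels so 'Lion ' and 'lion' count as the same concept."""
    return {label.strip().lower() for label in labels}


def precision_recall(extracted: set[str], gold: set[str]) -> tuple[float, float]:
    """Precision and recall of extracted concept labels against a gold standard."""
    found, reference = _normalise(extracted), _normalise(gold)
    hits = len(found & reference)
    precision = hits / len(found) if found else 0.0
    recall = hits / len(reference) if reference else 0.0
    return precision, recall


if __name__ == "__main__":
    extracted = {"Lion", "tiger", "cat", "car"}
    gold = {"lion", "tiger", "cat", "eagle", "mammal"}
    print(precision_recall(extracted, gold))
```

In the example, three labels match, so precision is 3/4 = 0.75 and recall is 3/5 = 0.6: the spurious "car" lowers precision, and the missed "eagle" and "mammal" lower recall.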
User evaluation
Users grouped as Average or Novice according to their level of expertise in the Semantic Web
• Likert scale: a psychometric response scale
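A minimal sketch of summarising Likert responses per expertise group, assuming a 5-point scale (1 = strongly disagree, 5 = strongly agree) and invented example scores; the slides give neither the scale length nor the questionnaire items. `likert_summary` is a hypothetical name.

```python
from statistics import mean, median


def likert_summary(
    responses: dict[str, list[int]], points: int = 5
) -> dict[str, tuple[float, float]]:
    """Return {group: (mean, median)} for Likert scores on a 1..points scale."""
    summary = {}
    for group, scores in responses.items():
        # Reject empty groups and answers outside the scale.
        if not scores or any(not 1 <= s <= points for s in scores):
            raise ValueError(f"empty or out-of-range responses for {group!r}")
        summary[group] = (float(mean(scores)), float(median(scores)))
    return summary


if __name__ == "__main__":
    # Invented answers to one questionnaire item.
    answers = {"Average": [4, 5, 4, 3], "Novice": [3, 3, 4, 2]}
    print(likert_summary(answers))
```

Both the mean and the median are returned because Likert responses are ordinal, and the median is a common alternative to the mean for such data.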
Evaluation(2/2)
Evaluation of the ontology against a 'gold standard' ontology
BBC Wildlife Ontology
Thank You!