Knowledge Extraction for Semantic Web Using Web Mining
Presenter: 陳宜樺
Presentation date: 2015/10/30

Outline
  Introduction
  Background
  Methodology
  Implementation
  Evaluation

Introduction(1/2)
  The backbone of the semantic web is the ontology.
  This paper investigates the problem of extracting knowledge from a large number of web documents in order to develop ontologies.

Introduction(2/2)
  Web documents can be grouped into three categories based on their structure: unstructured, semi-structured and fully structured.
  This research combines web content mining with web usage mining to semi-automate ontology learning from semi-structured web documents.

Background(1/2)
  Web content mining
    Web content mining can take advantage of the HTML tags in semi-structured web pages.
  Web structure mining
    Web structure mining is mostly concerned with the hyperlinks between web pages.

Background(2/2)
  Web usage mining
    Web usage mining is the process of extracting knowledge from web user behaviour.

Methodology(1/3)
  Concept and conceptual relationship extraction through web content mining
  Concept extraction: the weighted frequency of a word depends on how often it occurs and on the type of HTML tags that enclose it.
  Tools: Part-of-Speech (POS) tagging, Natural Language Processing (NLP)

Methodology(2/3)
  Conceptual relationship extraction: produces a hierarchical tree structure that mirrors the way web authors present their information.

Methodology(3/3)
  Conceptual relationship identification through web usage mining
  Access log file fields: IP address, date, time, method (GET or POST), file name, result
  Limitations:
    1. One user can have several IP addresses, even within the same session.
    2. Several users can share the same IP address because of network address translation.

Implementation
  Web Ontology Language (OWL)
  Piccolo2D graphics framework
    Versions: Piccolo2D.Java, Piccolo2D.NET, PocketPiccolo2D.NET (for the .NET Compact Framework)
  OpenNLP tool (natural language processing tool)
  WUMprep

Evaluation(1/2)
  Precision and recall (used in information retrieval) or an accuracy measure (used in machine learning)
  User evaluation: users are grouped as Average or Novice based on their level of expertise in the semantic web.
  Likert scale: a psychometric response scale

Evaluation(2/2)
  Evaluation of the ontology against a 'gold standard' ontology
  BBC Wildlife Ontology

Thank You!
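
The concept-extraction idea from Methodology(1/3), where a word's score depends on both how often it occurs and which HTML tags enclose it, can be illustrated with a minimal sketch. The tag weights, the class name WeightedTermCounter and the function weighted_frequencies are assumptions made for illustration; the slides do not give the paper's actual weights. The POS-tagging step (keeping only nouns, for example) is left out.

```python
from collections import Counter
from html.parser import HTMLParser
import re

# Hypothetical tag weights; the slides do not state the actual values.
TAG_WEIGHTS = {"title": 5.0, "h1": 4.0, "h2": 3.0, "b": 2.0, "strong": 2.0}
DEFAULT_WEIGHT = 1.0


class WeightedTermCounter(HTMLParser):
    """Accumulates a tag-weighted frequency for every word in an HTML page."""

    def __init__(self):
        super().__init__()
        self.open_tags = []      # stack of currently open tags
        self.scores = Counter()  # word -> accumulated weight

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating sloppy markup.
        if tag in self.open_tags:
            while self.open_tags.pop() != tag:
                pass

    def handle_data(self, data):
        # Each occurrence counts with the largest weight among its enclosing tags.
        weight = max(
            (TAG_WEIGHTS.get(t, DEFAULT_WEIGHT) for t in self.open_tags),
            default=DEFAULT_WEIGHT,
        )
        for word in re.findall(r"[a-z]+", data.lower()):
            self.scores[word] += weight


def weighted_frequencies(html):
    """Return a Counter mapping each word to its tag-weighted frequency."""
    parser = WeightedTermCounter()
    parser.feed(html)
    parser.close()
    return parser.scores


page = (
    "<html><head><title>Lion</title></head>"
    "<body><h1>Lion habitat</h1><p>The lion lives in Africa.</p></body></html>"
)
scores = weighted_frequencies(page)
```

With these assumed weights, a word that appears in the title and a heading outranks words that appear only in body text, which is the effect the slide describes.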
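
The precision and recall measures from the Evaluation slides can be made concrete by comparing the set of extracted concepts with the concepts of a gold-standard ontology. This is a minimal sketch: the function name precision_recall and the concept names are made up for illustration and are not taken from the paper or from the BBC ontology.

```python
def precision_recall(extracted, gold):
    """Compare extracted concepts with a gold-standard set of concepts.

    precision = matched / extracted, recall = matched / gold.
    """
    extracted, gold = set(extracted), set(gold)
    matched = extracted & gold
    precision = len(matched) / len(extracted) if extracted else 0.0
    recall = len(matched) / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical concept sets, for illustration only.
extracted_concepts = {"lion", "habitat", "africa", "page"}
gold_concepts = {"lion", "habitat", "species"}

precision, recall = precision_recall(extracted_concepts, gold_concepts)
```

Here two of the four extracted concepts appear in the gold set, so precision is 0.5, and two of the three gold concepts were found, so recall is 2/3.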