Web Intelligence (WI) - Department of Software and Information

Web Intelligence (WI)
Definition, Research Challenges
and Major Tools
Yang Chen
UNC Charlotte
Outline
• A brief history of Web Intelligence
• Motivations for WI
• Definition and Perspectives of WI
• Research Agenda
• Major Web Intelligence Tools
• Conclusion
A Brief History of WI
• 1999: Collaborative research initiatives
– Ning Zhong, Data Mining and Knowledge Systems
– Jiming Liu, Intelligent agents and multi-agents
– Yiyu Yao, Information retrieval and intelligent
information systems
• Combined research efforts with a common
goal: to create a new sub-discipline covering
theories and techniques related to Web
information.
• 2000: Publication of a two-page position
paper on WI (Zhong, Liu, Yao, Ohsuga,
COMPSAC 2000)
• 2001: First Asia-Pacific Conference on Web
Intelligence
• 2002: Publication of first special issue on WI in
IEEE Computer
• 2002: Web Intelligence Consortium
• 2003: First edited book on WI
• 2005: The international WIC Institute
Motivation
• The sheer size of the Web
– Difficulties in storage, management, and
efficient and effective retrieval
• Complexity of the Web
– Heterogeneous collection of structured,
unstructured, semi-structured, interrelated,
and distributed Web documents
– Consists of text, images, and sound
Web Intelligence on the Web
Industrial Interests in WI
• Web Intelligence
– kis-lab.com/wi01/
• Web-Intelligence Home Page
– www.web-intelligence.com/
• Intelligence on the Web
– www.fas.org/irp/intelwww.html
• WIN: Web Intelligence Network
– smarter.net/
• CatchTheWeb: Web Research, Web Intelligence Collaboration
– www.catchtheweb.com/
• Infonoia: Web Intelligence In Your Hands
– www.infonoia.com/myagent/en/baseframe.html
Motivations
• Data production on the Web is growing
exponentially.
• Industrial interest in WI is growing fast.
• Only a few academic papers address WI.
• We need to narrow the gap between
industry needs and academic research.
What is Web Intelligence?
• Web Intelligence (WI) exploits the fundamental
and practical impact that advanced Information
Technology (IT) and innovative Artificial
Intelligence (AI) will have on the Web:
– Integration of IT with AI
– Applications of AI on the Web
Web Intelligence System
(Based on Zhong's AWIC03 keynote talk)
An Example
Advanced Questions
• How does the customer enter the VIP portal,
so that products can be targeted and
promotions and marketing campaigns managed?
• What is the semantic association between
the pages the customer visited?
• Is the visitor familiar with the Web
structure? Or is he or she a new user or a
random one?
• Is the visitor a Web robot or some other kind of user?
• …
Advanced WI System
• Making a dynamic recommendation to a
Web user based on the user profile and
usage behavior;
• Automatic modification of a website’s
contents and organization;
• Combining Web usage data with
marketing data to give information about
how visitors used a website.
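A minimal sketch of the first point (recommendation from usage behavior), assuming usage data is available as a list of page-visit sessions. The session data, page names, and the recommend function are hypothetical illustrations, and simple co-occurrence counting stands in for whatever method a real WI system would use.

```python
from collections import Counter

# Hypothetical usage log: each session is the list of pages one visitor viewed.
sessions = [
    ["home", "laptops", "laptop-bags"],
    ["home", "laptops", "mice"],
    ["laptops", "laptop-bags"],
    ["home", "phones"],
]

def recommend(sessions, visited, k=2):
    """Rank unvisited pages by how often they share a session with visited pages."""
    visited = set(visited)
    scores = Counter()
    for session in sessions:
        pages = set(session)
        overlap = len(pages & visited)
        if overlap:
            # Every unvisited page in this session earns one point per shared page.
            for page in pages - visited:
                scores[page] += overlap
    # Highest score first; ties are broken alphabetically for a stable result.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [page for page, _ in ranked[:k]]

suggestions = recommend(sessions, ["laptops"])
```

A user profile could be folded in by weighting sessions from similar users more heavily; that extension is left out to keep the sketch short.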
Perspectives of WI
• WI can be classified into four categories
(based on Russell & Norvig's scheme)
Research Agenda of WI
• Semantic Web mining and automatic
construction of ontologies
• Social network intelligence
The Semantic Web
• The Semantic Web is based on languages
that make more of the semantic content of
the page available in machine-readable
formats for agent-based computing.
• A "semantic" language ties the
information on a page to machine-readable
semantics (an ontology).
Components of Semantic Web
• A unifying data model such as RDF.
• Languages with defined semantics, built on
RDF, such as OWL (DAML+OIL).
• Ontologies of standardized terminology for
marking up Web resources.
• Tools that assist the generation and processing
of semantic markup.
Ontologies provide the semantic backbone for
Semantic Web applications.
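To make the RDF data model concrete: RDF describes resources as (subject, predicate, object) triples. The sketch below stores a few triples in plain Python and queries them by pattern. The ex: names and the match helper are hypothetical; a real application would use an RDF library and proper URIs.

```python
# Each statement is a (subject, predicate, object) triple, as in the RDF data model.
# All "ex:" identifiers are made-up placeholders for illustration.
triples = {
    ("ex:page1", "ex:author", "ex:alice"),
    ("ex:page1", "ex:topic", "ex:WebIntelligence"),
    ("ex:page2", "ex:topic", "ex:WebIntelligence"),
    ("ex:alice", "rdf:type", "ex:Person"),
}

def match(store, s=None, p=None, o=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return sorted(
        t for t in store
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )

# Which resources are marked up with the topic ex:WebIntelligence?
pages_on_wi = [s for s, _, _ in match(triples, p="ex:topic", o="ex:WebIntelligence")]
```

An ontology would add shared definitions for terms such as ex:topic and ex:Person, so that different sites mark up their pages consistently.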
Ontologies offer
• Communication
– Normative models, Networks of relationships
• Sharing & Reuse
– Specifications, Reliability
• Control
– Classification; finding, sharing, and
discovering relationships
Categories of Ontologies
• A domain-specific ontology describes a well-defined technical or business domain.
• A task ontology might be either domain-specific
or reconstructed from a set of domain-specific
ontologies to meet the requirements of a task.
• A universal ontology describes knowledge at
higher levels.
The Web as a Graph
• We can view the Web as a directed social
network that connects people
(organizations or social entities).
• Research Questions:
– How big is the graph? (outdegree and indegree)
– Can we browse from any page to any other? (clicks)
– Can we exploit the structure of the Web? (searching and mining)
– How can we discover and manage Web communities?
– What does the Web graph reveal about social dynamics?
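The first two research questions can be illustrated on a toy graph. This sketch assumes the Web is given as an adjacency list (page to linked pages); the four-page graph and the function names are hypothetical. It counts indegree and outdegree, and uses breadth-first search to find the minimum number of clicks between two pages.

```python
from collections import deque

# Hypothetical toy Web graph: each page maps to the pages it links to.
links = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}

def out_degrees(graph):
    """Number of outgoing links per page."""
    return {page: len(targets) for page, targets in graph.items()}

def in_degrees(graph):
    """Number of incoming links per page."""
    counts = {page: 0 for page in graph}
    for targets in graph.values():
        for target in targets:
            counts[target] = counts.get(target, 0) + 1
    return counts

def clicks_between(graph, start, goal):
    """Minimum number of link clicks from start to goal, or None if unreachable."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        page, dist = queue.popleft()
        if page == goal:
            return dist
        for nxt in graph.get(page, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None
```

In this graph no page links to D, so D can reach every other page but cannot itself be reached by browsing.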
Social Network Intelligence
Social Network
Major Web Intelligence Tools
• I. Collection
– Offline Explorer
– SpidersRUs (AI Lab)
– Google Scholar
• II. Analysis (Data and Text Mining)
– Google APIs
– Google Translation
– GATE
– Arizona Noun Phraser (AI Lab)
– Self-Organizing Map, SOM (AI Lab)
– Weka
• III. Visualization
– NetDraw
– JUNG
– Analyst's Notebook and Starlight
Collection: Offline Explorer
(Screenshot: project list and project properties setup window, with callouts for download URLs, download level, file modification check, and file filters, URL filters, and other advanced properties.)
Analysis: Google APIs
• Google provides many APIs to help you quickly develop your own applications.
http://code.google.com/more/
• Examples of Google APIs:
– Google API for Inlink: Discovers what pages link to your website.
– Google Data APIs: Provide a simple, standard protocol for reading and writing
data on the Web. Several Google services provide a Google Data API, including
Google Base, Blogger, Google Calendar, Google Spreadsheets and Picasa Web
Albums.
– Google AJAX Search API: Uses JavaScript to embed a simple, dynamic Google
search box and display search results in your own Web pages.
– Google Analytics: Allows users to gather, view, and analyze data about their
Website traffic. Users can see which content gets the most visits, along with
average page views and time on site per visit.
– Google Safe Browsing APIs: Allow client applications to check URLs against
Google's constantly-updated blacklists of suspected phishing and malware
pages.
– YouTube Data API: Integrates online videos from YouTube into your
applications.
GATE
• Information Extraction tasks:
– Named Entity Recognition (NE)
• Finds names, places, dates, etc.
– Co-reference Resolution (CO)
• Identifies identity relations between entities in texts.
– Template Element Construction (TE)
• Adds descriptive information to NE results (using CO).
– Template Relation Construction (TR)
• Finds relations between TE entities.
– Scenario Template Production (ST)
• Fits TE and TR results into specified event scenarios.
• GATE also includes:
– Parsers, stemmers, and Information Retrieval tools;
– Tools for visualizing and manipulating ontology; and
– Evaluation and benchmarking tools.
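As an illustration of the first task, Named Entity Recognition, here is a toy recognizer that combines a small name list (a gazetteer) with a regular expression for years. This is not GATE's API or its actual pipeline; the GAZETTEER contents and the find_entities function are assumptions made for the sketch.

```python
import re

# Hypothetical gazetteer: known names grouped by entity type.
GAZETTEER = {
    "Person": ["Ning Zhong", "Jiming Liu", "Yiyu Yao"],
    "Organization": ["UNC Charlotte", "University of Arizona"],
}

# Four-digit years from 1900 to 2099 count as dates in this sketch.
YEAR = re.compile(r"\b(?:19|20)\d{2}\b")

def find_entities(text):
    """Return (entity text, entity type) pairs in order of appearance."""
    found = []
    for label, names in GAZETTEER.items():
        for name in names:
            for m in re.finditer(re.escape(name), text):
                found.append((m.start(), m.group(), label))
    for m in YEAR.finditer(text):
        found.append((m.start(), m.group(), "Date"))
    # Sort by position in the text, then drop the position.
    return [(entity, label) for _, entity, label in sorted(found)]

entities = find_entities("In 1999 Ning Zhong and Yiyu Yao began joint work.")
```

Co-reference resolution and template construction would then operate on these annotations, for example by linking a later "he" to one of the recognized persons.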
GATE
(Screenshot callouts: attributes; results display.)
SOM
• The multi-level self-organizing map neural network
algorithm was developed by the Artificial Intelligence Lab at
the University of Arizona.
– Using a 2D map display, similar topics are positioned
closer together according to their co-occurrence patterns;
more important topics occupy larger regions.
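A minimal sketch of a basic single-level self-organizing map, assuming each document is already represented as a numeric vector. It is not the AI Lab's multi-level algorithm; the function names, parameters, and the four sample vectors are hypothetical. Each input pulls its best-matching map cell, and to a lesser extent that cell's neighbors, toward itself, which is what places similar inputs near each other on the 2D map.

```python
import math
import random

def best_matching_unit(weights, x):
    """Return the map cell whose weight vector is closest to x."""
    return min(
        weights,
        key=lambda cell: sum((w - v) ** 2 for w, v in zip(weights[cell], x)),
    )

def train_som(data, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a rows x cols map; returns {(row, col): weight vector}."""
    rng = random.Random(seed)
    dim = len(data[0])
    radius0 = max(rows, cols) / 2
    # One randomly initialised weight vector per map cell.
    weights = {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(rows)
        for c in range(cols)
    }
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1 - frac)                    # learning rate decays linearly
        radius = max(radius0 * (1 - frac), 0.5)  # neighborhood shrinks over time
        for x in data:
            br, bc = best_matching_unit(weights, x)
            for (r, c), w in weights.items():
                dist2 = (r - br) ** 2 + (c - bc) ** 2
                h = math.exp(-dist2 / (2 * radius ** 2))  # Gaussian neighborhood
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
    return weights

# Hypothetical term-occurrence vectors for four documents on two topics.
docs = [[1, 1, 0, 0], [1, 0.8, 0, 0], [0, 0, 1, 1], [0, 0, 0.9, 1]]
som = train_som(docs, rows=3, cols=3)
cells = [best_matching_unit(som, d) for d in docs]
```

After training, cells holds the map position assigned to each document; counting documents per cell gives the kind of per-topic document counts a map display can show.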
SOM
(Screenshot callouts: topic; topic region; different topics; number of documents belonging to each topic; warm colors represent new topics.)
Visualization: JUNG
• The Java Universal Network/Graph Framework (JUNG) is a
software library for the modeling, analysis, and visualization of data
that can be represented as a graph or network. It was developed by
the School of Information and Computer Science at the University of
California, Irvine.
http://jung.sourceforge.net/index.html
• The current distribution of JUNG includes implementations of a
number of algorithms from graph theory, data mining, and social
network analysis:
– Clustering
– Decomposition
– Optimization
– Random Graph Generation
– Statistical Analysis
– Calculation of Network Distances and Flows and Importance
Measures (Centrality, PageRank, HITS, etc.).
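To show what one of these importance measures computes, here is a plain-Python sketch of PageRank by power iteration. It does not use JUNG's API; the pagerank function, its parameters, and the four-page graph are assumptions for illustration, with the usual damping factor of 0.85.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Approximate PageRank scores for an adjacency-list graph."""
    pages = set(graph) | {t for targets in graph.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with the "random jump" share.
        new = {p: (1 - damping) / n for p in pages}
        for p in pages:
            targets = graph.get(p, [])
            if targets:
                # A page splits its damped rank evenly among its outgoing links.
                share = damping * rank[p] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                for q in pages:
                    new[q] += damping * rank[p] / n
        rank = new
    return rank

# Hypothetical toy Web graph: each page maps to the pages it links to.
web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
scores = pagerank(web)
```

Page C collects links from three pages and ends up with the highest score, while D, which nothing links to, keeps only the random-jump share.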
JUNG
Examples of visualization types
Conclusion
• The marriage of hypertext and the Internet
led to a revolution: the Web.
• The marriage of Artificial Intelligence and
advanced Information Technology, on the
platform of the Web, will lead to another
paradigm shift: the Intelligent and Wisdom
Web.
Thank You
Any questions?