Web Intelligence (WI)
Definition, Research Challenges and Major Tools
Yang Chen
UNC Charlotte

Outline
• A brief history of Web Intelligence
• Motivations for WI
• Definition and Perspectives of WI
• Research Agenda
• Major Web Intelligence Tools
• Conclusion

A Brief History of WI
• 1999: Collaborative research initiatives
  – Ning Zhong: data mining and knowledge systems
  – Jiming Liu: intelligent agents and multi-agent systems
  – Yiyu Yao: information retrieval and intelligent information systems
• Combined research efforts with a common goal: to create a new sub-discipline covering theories and techniques related to Web information.
• 2000: Publication of a two-page position paper on WI (Zhong, Liu, Yao, Ohsuga, COMPSAC 2000)
• 2001: First Asia-Pacific Conference on Web Intelligence
• 2002: Publication of the first special issue on WI in IEEE Computer
• 2002: Web Intelligence Consortium
• 2003: First edited book on WI
• 2005: The international WIC Institute

Motivation
• The sheer size of the Web
  – Difficulties in storage, management, and efficient and effective retrieval
• Complexity of the Web
  – A heterogeneous collection of structured, unstructured, semi-structured, interrelated, and distributed Web documents
  – Consists of text, images, and sound

Web Intelligence on the Web
Industrial Interests in WI
• Web Intelligence – kis-lab.com/wi01/
• Web-Intelligence Home Page – www.web-intelligence.com/
• Intelligence on the Web – www.fas.org/irp/intelwww.html
• WIN: home WEB INTELLIGENCE NETWORK – smarter.net/
• CatchTheWeb - Web Research, Web Intelligence Collaboration – www.catchtheweb.com/
• Infonoia: Web Intelligence In Your Hands – www.infonoia.com/myagent/en/baseframe.html

Motivations
• Data production on the Web is growing exponentially.
• Fast-growing industrial interest in WI
• Only a few academic papers
• We need to narrow the gap between industry needs and academic research.

What is Web Intelligence
• Web Intelligence (WI) exploits the fundamental and practical impact that advanced Information Technology (IT) and innovative Artificial Intelligence (AI) will have on the Web:
  – Integration of IT with AI
  – Applications of AI on the Web

Web Intelligence System
Based on Zhong's AWIC03 keynote talk

An Example

Advanced Questions
• How does the customer enter the VIP portal, so that products can be targeted and promotions and marketing campaigns managed?
• What is the semantic association between the pages the customer visited?
• Is the visitor familiar with the Web structure, or is he or she a new or random user?
• Is the visitor a Web robot or another kind of user?
• …

Advanced WI System
• Making dynamic recommendations to a Web user based on the user profile and usage behavior;
• Automatically modifying a website’s contents and organization;
• Combining Web usage data with marketing data to show how visitors used a website.

Perspectives of WI
• WI can be classified into four categories (based on Russell & Norvig's scheme)

Research Agenda of WI
• Semantic Web mining and automatic construction of ontologies
• Social network intelligence

The Semantic Web
• The Semantic Web is based on languages that make more of the semantic content of a page available in machine-readable formats for agent-based computing: a “semantic” language ties the information on a page to machine-readable semantics (an ontology).
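As a rough illustration of tying page information to machine-readable semantics, the sketch below stores statements as subject-predicate-object triples and lets an agent answer a question that needs the ontology's class hierarchy. It is a minimal sketch in plain Python, not an RDF library; the page and `ex:` identifiers and the helper names `match`, `superclasses`, and `pages_about` are hypothetical, chosen only for this example.

```python
from typing import Iterator, List, Optional, Set, Tuple

# A statement is a (subject, predicate, object) triple.
# All identifiers below are hypothetical examples.
Triple = Tuple[str, str, str]

TRIPLES: List[Triple] = [
    ("page:home", "ex:describes", "ex:Laptop"),
    ("page:blog", "ex:describes", "ex:Recipe"),
    ("ex:Laptop", "rdfs:subClassOf", "ex:Computer"),
    ("ex:Computer", "rdfs:subClassOf", "ex:Product"),
]


def match(s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> Iterator[Triple]:
    """Yield every triple that agrees with the given pattern (None = wildcard)."""
    for triple in TRIPLES:
        if all(want is None or want == got
               for want, got in zip((s, p, o), triple)):
            yield triple


def superclasses(cls: str) -> Set[str]:
    """Follow rdfs:subClassOf links transitively, starting from cls."""
    seen: Set[str] = set()
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for _, _, parent in match(s=current, p="rdfs:subClassOf"):
            if parent not in seen:
                seen.add(parent)
                frontier.append(parent)
    return seen


def pages_about(cls: str) -> Set[str]:
    """Pages describing cls itself or any subclass of cls."""
    return {page
            for page, _, topic in match(p="ex:describes")
            if topic == cls or cls in superclasses(topic)}


if __name__ == "__main__":
    print(pages_about("ex:Product"))
```

The home page never mentions "product" directly; the agent finds it only because the ontology states that a laptop is a kind of computer and a computer is a kind of product.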
Components of Semantic Web
• A unifying data model such as RDF.
• Languages with defined semantics, built on RDF, such as OWL (DAML+OIL).
• Ontologies of standardized terminology for marking up Web resources.
• Tools that assist the generation and processing of semantic markup.

Ontologies provide the semantic backbone for Semantic Web applications.
Ontologies offer:
• Communication
  – Normative models, networks of relationships
• Sharing & Reuse
  – Specifications, reliability
• Control
  – Classification; finding, sharing, and discovering relationships

Categories of Ontologies
• A domain-specific ontology describes a well-defined technical or business domain.
• A task ontology may be either domain-specific or reconstructed from a set of domain-specific ontologies to meet the requirements of a task.
• A universal ontology describes knowledge at higher levels.

The Web as a Graph
• We can view the Web as a directed social network that connects people (organizations or social entities).
• Research Questions:
  – How big is the graph? (outdegree and indegree)
  – Can we browse from any page to any other? (clicks)
  – Can we exploit the structure of the Web? (searching and mining)
  – How can we discover and manage Web communities?
  – What does the Web graph reveal about social dynamics?

Social Network Intelligence

Social Network

Major Web Intelligence Tools
• I. Collection
  – Offline Explorer
  – SpidersRUs (AI Lab)
  – Google Scholar
• II. Analysis (Data and Text Mining)
  – Google APIs
  – Google Translation
  – GATE
  – Arizona Noun Phraser (AI Lab)
  – Self-Organizing Map, SOM (AI Lab)
  – Weka
• III. Visualization
  – NetDraw
  – JUNG
  – Analyst’s Notebook and Starlight

Collection: Offline Explorer
[Figure: Offline Explorer screenshots. Labels: project list; project properties setup window; download URLs; file filters, URL filters, and other advanced properties; download level; file modification check.]

Analysis: Google APIs
• Google provides many APIs to help you quickly develop your own applications. http://code.google.com/more/
• Examples of Google APIs:
  – Google API for Inlink: Discovers which pages link to your website.
  – Google Data APIs: Provide a simple, standard protocol for reading and writing data on the Web. Several Google services provide a Google Data API, including Google Base, Blogger, Google Calendar, Google Spreadsheets and Picasa Web Albums.
  – Google AJAX Search API: Uses JavaScript to embed a simple, dynamic Google search box and display search results in your own Web pages.
  – Google Analytics: Allows users to gather, view, and analyze data about their website traffic. Users can see which content gets the most visits, as well as average page views and time on site per visit.
  – Google Safe Browsing APIs: Allow client applications to check URLs against Google's constantly updated blacklists of suspected phishing and malware pages.
  – YouTube Data API: Integrates online videos from YouTube into your applications.

GATE
• Information Extraction tasks:
  – Named Entity Recognition (NE): finds names, places, dates, etc.
  – Co-reference Resolution (CO): identifies identity relations between entities in texts.
  – Template Element Construction (TE): adds descriptive information to NE results (using CO).
  – Template Relation Construction (TR): finds relations between TE entities.
  – Scenario Template Production (ST): fits TE and TR results into specified event scenarios.
• GATE also includes:
  – Parsers, stemmers, and Information Retrieval tools;
  – Tools for visualizing and manipulating ontologies; and
  – Evaluation and benchmarking tools.
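To make the first of these tasks concrete, the sketch below shows a toy form of Named Entity Recognition: it looks up known names in a small list and uses a regular expression for four-digit years. This is plain Python and does not use or imitate GATE's API; the `GAZETTEER` entries, the `YEAR` pattern, and the function name `find_entities` are assumptions made up for this illustration, and a real system would need far richer rules.

```python
import re
from typing import List, Tuple

# Hypothetical lookup list mapping known names to entity types.
GAZETTEER = {
    "Ning Zhong": "Person",
    "University of Arizona": "Organization",
    "Irvine": "Location",
}

# Four-digit years (1900-2099) stand in for date recognition.
YEAR = re.compile(r"\b(?:19|20)\d{2}\b")


def find_entities(text: str) -> List[Tuple[int, str, str]]:
    """Return (offset, surface text, entity type) tuples sorted by offset."""
    found: List[Tuple[int, str, str]] = []
    # Names: exact string matches against the lookup list.
    for name, kind in GAZETTEER.items():
        for m in re.finditer(re.escape(name), text):
            found.append((m.start(), m.group(), kind))
    # Dates: anything that looks like a year.
    for m in YEAR.finditer(text):
        found.append((m.start(), m.group(), "Date"))
    return sorted(found)


if __name__ == "__main__":
    sentence = "Ning Zhong co-authored a position paper in 2000."
    for offset, surface, kind in find_entities(sentence):
        print(offset, surface, kind)
```

The later tasks (CO, TE, TR, ST) would build on annotations like these, for example by linking "he" back to a recognized person or by filling an event template from several entities.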
GATE
[Figure: GATE screenshot. Labels: Attributes; …oject information; Results display.]

SOM
• The multi-level self-organizing map neural network algorithm was developed by the Artificial Intelligence Lab at the University of Arizona.
  – On a 2D map display, similar topics are positioned closer together according to their co-occurrence patterns; more important topics occupy larger regions.

SOM
[Figure: SOM topic map. Labels: topic; topic region; different topics; number of documents belonging to each topic. Warm colors represent new topics.]

Visualization: JUNG
• The Java Universal Network/Graph Framework (JUNG) is a software library for the modeling, analysis, and visualization of data that can be represented as a graph or network. It was developed by the School of Information and Computer Science at the University of California, Irvine. http://jung.sourceforge.net/index.html
• The current distribution of JUNG includes implementations of a number of algorithms from graph theory, data mining, and social network analysis:
  – Clustering
  – Decomposition
  – Optimization
  – Random Graph Generation
  – Statistical Analysis
  – Calculation of Network Distances and Flows and Importance Measures (Centrality, PageRank, HITS, etc.)

JUNG
[Figure: Examples of visualization types.]

Conclusion
• The marriage of hypertext and the Internet led to a revolution: the Web.
• The marriage of Artificial Intelligence and advanced Information Technology, on the platform of the Web, will lead to another paradigm shift: the Intelligent and Wisdom Web.

Thank You
Any questions?