Knowledge Management Systems
• Knowledge Discovery in Databases
• Information Retrieval
• Formal methods to discover information & possibly knowledge
  - Data collection
    • Documents
    • Usage
  - Data analysis
    • Relationships
    • IR measures

KDD Process
• Goal: extracting actionable knowledge from data
  - Understandable patterns
  - Rules
• Updated methods to extend beyond statistical analysis
  - Volumes of data collection
  - Increased computation power
    • Real-time
    • Continuous data
  - Advances in visualization

KDD in Use
• Data Mining is only one step
  - Preprocessing
  - Data Transformation
  - Pattern Detection
  - Interpretation
  - Use
• Most development work is in the preprocessing
• Most intellectual work should be in forming hypotheses

KDD Practices
• Classification
• Regression
• Clustering
• Summarization
• Dependency Modeling
• Link analysis
• Sequence analysis

IR & the Semantic Web
• Rich description of documents enables additional functionality
  - DARPA Agent Markup Language (DAML)
  - Ontology Inference Layer (OIL)
• Is this “semantic markup” derived from tacit or explicit knowledge?
  - How can it be generated?
  - How can it be used?
    • Information Retrieval
    • Question answering (simple & complex)
• Faith in XML

Semantic IR
• How systems should work
• Events ontology
• Coordination among individuals
  - Groups?
  - Interdependencies?
• Processing for Hybrid IR?
  - Trust in ML
  - Trust in System

Navigating Social Cyberspaces
• Understanding Usenet use
  - Postings
    • Why
    • How
    • Information
  - Distribution
    • Cross postings
    • Specific groups & cultures
  - Free-riders vs. Contributors
  - Usenet readers

Social Cyberspace Dimensions
• Netscan – social accounting metrics
  - Size of group
  - Culture
  - Social cues
• Messaging protocols
  - Asynchronous
  - Real time (IM)
• Discussion Engagement
  - Frequency, Replies
  - Date, Time
• Thread and Author Tracker
  - Thread Visualization
  - New Threads vs. Replying to Old

Blogs & Social Dimensions
• Are blogs taking the place of newsgroups?
• RSS Readers
• Topic discovery methods
  - Blog rolls
  - Search engines
  - Links
• Issues of Awareness
• Posting technologies vs. Usenet

Answer Garden
• A shared organizational memory system
• Storing, retrieving and viewing information
• What methods worked best?
• What about user participation?
• What’s an optimal size?

PeopleGarden
• Another view of participation
• How does the community work?
  - Welcoming
  - Volumes of discussion
• Groups found and formed
  - Paired relationships
  - Arguments and issue development
• Visualizing interaction
  - Personal history
  - Groups and Threads

Problems in Data Warehousing
• How about problems in understanding users?
• Technical issues are easier than social issues
  - Privacy
  - Accuracy
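
Example: counting discussion engagement
• The engagement measures named under Social Cyberspace Dimensions (group size, replies, new threads vs. replies to old ones, cross postings) can be sketched as simple counts over a list of posts.
• This is a minimal illustration only, not Netscan's implementation. The Post record, its field names, the metric names, and the sample data are all assumptions made for this sketch.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Post:
    """Hypothetical record for one newsgroup posting."""
    post_id: str
    author: str
    groups: Tuple[str, ...]          # newsgroups the post was sent to
    reply_to: Optional[str] = None   # parent post_id; None starts a new thread


def engagement_metrics(posts):
    """Return simple participation counts for a list of Post records."""
    per_author = Counter(p.author for p in posts)
    replies = sum(1 for p in posts if p.reply_to is not None)
    return {
        "posts": len(posts),
        "authors": len(per_author),                    # size of the group
        "new_threads": len(posts) - replies,           # posts with no parent
        "replies": replies,
        "cross_posted": sum(1 for p in posts if len(p.groups) > 1),
        "one_time_posters": sum(1 for n in per_author.values() if n == 1),
    }


# Invented sample data, for illustration only.
SAMPLE = [
    Post("1", "ann", ("comp.lang.python",)),
    Post("2", "bob", ("comp.lang.python",), reply_to="1"),
    Post("3", "ann", ("comp.lang.python", "comp.programming"), reply_to="2"),
    Post("4", "cat", ("comp.programming",)),
]

if __name__ == "__main__":
    for name, value in engagement_metrics(SAMPLE).items():
        print(f"{name}: {value}")
```

• Counts like these describe only what was posted. Readers who never post (the free-riders and Usenet readers mentioned earlier) leave no trace in such data.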