Large-Scale Case-Based Reasoning: Opportunity and Questions
David Leake
School of Informatics and Computing
Indiana University

Overview
• Intro to case-based reasoning
• Appeal of CBR for large-scale data
• Some challenges
• Questions for the audience

What is CBR?
• Reasoning by remembering (and analogizing and adapting…)
• Common in human planning, programming, problem-solving, diagnosis, and decision-making

The CBR Cycle
[Figure: the CBR cycle, from Leake, Maguitman, and Reichherzer, 2005]

Motivations for Using CBR (Kolodner 1993; Aamodt & Plaza 1994; Leake 1996)
• Easing knowledge acquisition, especially when cases are already available
• Reasoning when causal connections are complex or poorly understood
• Speedup from reuse
• Explainability

CBR as AI Technology
• Classic applications include force deployment planning, diagnosis, design support, help desks,…
• IU eScience example: The Phala system (Leake & Kendall-Morwick, 2008, 2009) supports workflow construction with case-based reuse of lessons from provenance traces collected by the Karma provenance collection tool (http://d2i.indiana.edu/provenance_karma; project directed by Beth Plale).

Large-Scale Challenge for Phala
• Phala’s case retrieval depends on fast structure mapping
• A structure-mapping toolkit has been developed and publicly released (Structure Access Interface, Kendall-Morwick & Leake, 2011)
• Fast structure mapping remains a key issue, especially for process-oriented case-based reasoning
• Taking a step back: how does CBR fit domains with large collections of data?

The Core of CBR: Reasoning Directly from the Data (First Approximation)
• Cases are specific episodes
• Lazy learning: Learning is storage
• Don’t extract rules: Reason from similar cases
• Don’t generalize cases
• Each problem-solving episode adds a case

Large-Scale CBR
• Most CBR systems are comparatively small-scale
• Questions for today:
– What are the large-scale applications that might most benefit from CBR?
– What issues would need to be addressed to apply it?

Reasoning Directly from the Data (Second Approximation, Fleshing Out Core Issues)
• Cases are specific episodes (not necessarily predelineated; could be very large)
• Lazy learning: Learning is storage (+ indexing)
• Don’t extract rules: Reason from similar cases (How to find them? How to extract indices/similarity criteria? How to integrate reasoning?)
• Don’t generalize cases (adaptation)
• Each problem-solving episode adds a case (raises scale issues; maintenance and case-base sharing may be needed)

Scale-Up as Opportunity: Example of Potential for Big Data to Ease Case Adaptation (Jalali & Leake, 2013)
• Problem: How to gather or generate the knowledge to adapt prior cases to new needs
• For numerical prediction, adaptations can be generated by comparing case differences

Case Difference Heuristic (Hanney & Keane, 1997)
• A knowledge-light method for acquiring adaptation knowledge
• Adaptations are generated by pairwise case comparison

Extending Case Adaptation with Automatically-Generated Ensembles of Adaptation Rules
Vahid Jalali and David Leake

Approaches to Instance-Based Adaptation Generation and Application
• Generation: Selecting the cases from which to generate adaptations
• Application: Selecting the source cases to adapt

Questions to Discuss
• For what large-scale tasks could CBR provide an edge?
• What are the opportunities for facilitating the computations underlying large-scale CBR?
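The case difference heuristic, together with the two selection questions (which cases to generate adaptations from, and which source cases to adapt), can be made concrete with a small sketch. This is a minimal Python illustration under stated assumptions: each case is a numeric feature vector paired with a numeric solution, distance is Euclidean, rules come from every ordered pair of cases, and ensemble outputs are simply averaged. The function names (generate_rules, knn, predict) and parameters (k_sources, k_rules) are hypothetical, and this is not presented as the exact method of Jalali & Leake (2013).

```python
import math
from typing import List, Tuple

# Hypothetical sketch. Assumed representation: (problem features, numeric solution).
Case = Tuple[Tuple[float, ...], float]
Rule = Tuple[Tuple[float, ...], float]  # (problem difference, solution difference)


def distance(a: Tuple[float, ...], b: Tuple[float, ...]) -> float:
    """Euclidean distance between two feature vectors (an assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def diff(a: Tuple[float, ...], b: Tuple[float, ...]) -> Tuple[float, ...]:
    """Component-wise difference a - b."""
    return tuple(x - y for x, y in zip(a, b))


def generate_rules(case_base: List[Case]) -> List[Rule]:
    """Case difference heuristic: compare every ordered pair of distinct cases.
    Each rule records how the solution changed when the problem changed."""
    rules = []
    for i, (p1, s1) in enumerate(case_base):
        for j, (p2, s2) in enumerate(case_base):
            if i != j:
                rules.append((diff(p1, p2), s1 - s2))
    return rules


def knn(query: Tuple[float, ...], case_base: List[Case], k: int) -> List[Case]:
    """Retrieve the k cases whose problems are closest to the query."""
    return sorted(case_base, key=lambda c: distance(c[0], query))[:k]


def predict(query: Tuple[float, ...], case_base: List[Case], rules: List[Rule],
            k_sources: int = 3, k_rules: int = 2) -> float:
    """Adapt each of the k_sources nearest cases using the k_rules rules whose
    problem difference best matches (query - source problem), then average
    all adapted solutions into one ensemble estimate."""
    estimates = []
    for problem, solution in knn(query, case_base, k_sources):
        target = diff(query, problem)
        best = sorted(rules, key=lambda r: distance(r[0], target))[:k_rules]
        adjustment = sum(r[1] for r in best) / len(best)
        estimates.append(solution + adjustment)
    return sum(estimates) / len(estimates)
```

For example, with cases whose solution is 2x for x = 0 to 5, predict((7.0,), cb, generate_rules(cb)) returns 14.0, whereas copying the nearest case's solution would return 10.0: the rules learned from case differences let the system extrapolate. Generating rules from every ordered pair grows quadratically with the case base, which is one reason the Generation question above becomes pressing at scale.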