From: AAAI Technical Report WS-93-02. Compilation copyright © 1993, AAAI (www.aaai.org). All rights reserved. Opportunity Explorer: Navigating Large Databases Using Knowledge Discovery Templates Tej Anand& Gary Kahn A. C. Nielsen 2201 Waukegan Road, Suite S-200 Bannockburn1L 60015 tanand@nis.naitc.comgkahn@nis.naitc.com Introduction In July 1991A. C. Nielsenreleased NielsenSpotlightTMthe first product with an embedded knowledge-based systemfor explainingthe data in large point-of-saledatabases.Spotlight[1, 2] is a softwaresystemthat extracts fromvery large databasesmeaningfulinformationabout exceptionalevents, explanationsfor these eventsand correlationsamongevents. It then presentsthe user with a series of formattedhard-copyreports. To generatethese reports the systemuses a pre-definedknowledge base, a contextset up by the user anda predefinedformatfor the reports. Spotlight reports highlight productsandmarketswith exceptionalchangein share, reasonsfor the share change,and trends across various productsegments.Thesereports providethe user with a comprehensive answerto the question "Whatis happeningin the market=placeand why?" Spotlightfilled a needin the marketplace for an efficient, batch, top-line report generationsystem.Earlier attemptsto addressthis needby systemssuchas CoverStory[3] werenot as widelyaccepted.Spotlight was innovativein its use of an expert systemshell whichallowedfor deployment of knowledge bases on end-user computers,a distributed approachto knowledge discoveryandthe notion of an abstract markuplanguage[2]. Spotlight has beensuccessfully used by sales representatives and brandmanagersof majorconsumerpackaged goodsmanufacturers to obtain a top-level look at the state of their business. Alarge number of organizations haverolled the productout to their entire sales force whouse the systemto preparethemselvesbeforecalling on buyersin retail organizations.Smallerorganizationstend to use Spotlightas a headquarterst0ol. Sales analysts preparereports usingSpotlightandthensendthemout to sales representativesin the field. Therewerethree technical limitations with the Spotlightproduct: discoveryanddid not put in place tools or an 1. It representedan instanceof an applicationfor knowledge architecture that couldbe usedto developother knowledge discoveryapplications. 2. Thereports created for the user werenot interactive. Thusthe user hadto manuallynavigatethroughthe informationcreated, andthe systemdid not lead the user towardsseekingeither related or moregranular information. 3. All the parametersthat couldbe usedto customizethe reports hadto be known a priori. For example,it wasnot possiblefor the user to create a completelynewtable. ~, Oneof the reasonsfor the successof Spotlight wasthat it usedsimpletracking measuresuchas "Volume "PriceN, and"Distribution"to explainin a languageeasily understoodby the sales representativesthe reasons for changein share. However, the use of simpletracking measuresalonewasalso a limitation becauseit preventedthe user frommoresophisticatedanalysessuchas identifyingselling strategies. TM product that wasreleased to the market-placein In this paper wedescribe the NielsenOpportunityExplorer March,1993.OpportunityExplorergenerates interactive reports using knowledge discoverytemplates, convertinga large space of data into concise, inter-linked informationframes.Eachinformationframe addressesspecific businessissues, and leads the user to seek related informationby meansof dynamically created hypedinks. AAAI-93 KnowledgeDiscovery in Databases Workshop1993 ~,, Page 45 In terms of domainknowledge OpportunityExplorerin addition to the tracking measuresused in Spotlight, also makesuse of several modeledmeasuresthat help the user quantify the impactof promotionsand thus identify selling strategies. OpportunityExplorerinformationframesprovidethe user with a comprehensive answerto the question "Whatis happeningin the market-place,why,and howthe share of a product can be changed?" Opportunity Explorer hasledtothearticulation ofa coherent architecture fordeveloping knowledge discovery applications with a large number ofreusable libraries with standardized functionalities andsimple wellunderstood application programming interfaces (APIs). Opportunity Explorer doesnotallow theusertocreate completely newreports butithasmade substantial progress towards thatgoal bycreating high-level output objects andabstract queries tospecify information frames. A developer defines a knowledge discovery template byspecifying instances ofthese output objects andabstract queries. Thishasmade itrelatively easy fordevelopers withprogramming experience todefine newinstances ofknowledge discovery applications. Inthispaper wefirst describe thenotion ofinformation frames andcomponents ofthearchitecture for delivering information frames totheuser. Wethendescribe themechanics ofgenerating information frames fromknowledge discovery templates, Thepaper endswitha discussion ofworkinprogress. Information Frames Withthe availability of graphicaltools andclient-server databasesit has become possiblefor users to browse and querydata fromlarge databasesirrespective of wherethey are located, either on remotedata servers, LAHs or local workstations.However,the user needsto havein-depth knowledge of the semanticcontent of the databaseto relate th~ data to the task that is to be completed.Aninformationframe,on the other hand, providesa user withdirect accessto the semanticcontentsof the databaserelating the data to the users’ task. Theinformationframeis a concisepackagingof analyzeddata that directly providesan answerto a specific businessissue. In this sense, a knowledge discoveryapplicationhas the followingproperties: it is a series of informationframeseachaddressinga specific businessissue it consists of navigationaltools that link the informationframesin the mannerthat representsa model or workflowof the task the user is involved in T it allowsthe user to performstandardoperationson an informationframewhichare independentof the informationcontent Aninformationframeis an abstract conceptconsisting of a numberof textual, tabular, andgraphicalviews. Thestandard operationson an informationframe include the following: 1. Displayan alternate view,i.e. displayeither the tabular or graphicalviewsfroman active textual view 2. Print the current active view 3. Createor viewrelated informationframesusing hyperlinks TheOpportunityExplorerapplicationconsists of informationframesthat help the sales representativeof a consumerpackagedgoodscompany sell moreproduct, moreprofitably to the retailer. This is accomplished by preparinga presentationthat highlightsthe advantages for the retailer ffthe retailer stocksadditionalproducts or participates in additionalpromotions. Asales representativesells to the retailer productsbelongingto variousproductcategoriessuchas Coffee, CarbonatedBeverages,Detergents,etc. Asa first step, he needsto determinehowhis productsare performing in each product category. The"Cross CategoryAnalysis"informationframein OpportunityExplorerhighlights for the sales representativethe contributioneachcategoryis makingtowardsthe total businessin a particular retailer. Page 46 KnowledgeDiscovery in Databases Workshop1993 AAAI-93 Nielsen Advisor OpportunityExplorer - [CategoryRoadMap:text1] Standart Operafi( ca~ge~ ~,d ~ Cembimd Or~’q~eJuice - Florida Morni~- Schnuck’s- St. Louls Dollars 13 Weeks ~ September 5,1992 ¯ ~O.XJ~.~ share inSchnuck’s - St.Louis was3.7,down2.6points vs.lastyear. Florida Morning share inSt.Louis $2MMalso decreased, down2.3points to5.I. Florida Morningvolumein Selmuck’s- St. LOUIswas$76,084, down44.6%vs. last year. Combined" OrangeJuice declined also, down6.5%to $2,039.7M. ¯ ~ ~ Vo/wNOtaggd EzFlaaah’oa: Incremmtal Volume was down $56,971 to $15,181 in the sdected period. TI~ decrease contributed 93.0%to the total Florida Morningvolumeloss. H~erli lacr~ Volur~ ~pim~’lon: [ Total M~r~mdismg Supportfor Florida Morningdecreased(-3.2 weeksto 5.0). 6 In addition, "quality" (nan-TPR)support also decreased(-0.3 weeksto 0.O). Florida Momiag D~JPrice was$1.78 per unit. This correspondsto a 36.4%promotionalprice discount fromits everydayprice. This price discount wassmaller than a year ago whenFlorida Morningreceived a 45.2%promotionalprice discount ($1.28 deal price). ¯ Sunshine V¢li~ ~ained themostshare, up5.5points to33.2. Itsvolume increased 11.8% to$677. I MVolm~,t~taugJ F.xplautloa: SunshineVaUeyvolumeincrease wasprimarily due to an increase in incremental volumeof $88,225to $322.0 M .q v~- ¯.. Figure 1: The Category RoadmapInformation Frame l After examiningthis informationframethe sales representativecan decideto pursuefurther analysis in a particular category, for examplea categoryin whichthe marketshare of his productis declining.Thesales representativewouldlike to knowwhatare the factors that are leadingto this decreasein share. This informationis presentedto the user in the "CategoryRoadmap" informationframe. This informationframe providesthe sales representativewith a detailed look at the performance of his productsandhis competitors includingexplanationsin termsof merchandising, pricing &distribution factors. Eachfactor that constitutes an explanation,contains a hyperlinkto an informationframethat providesfurther details aboutthat factor. The CategoryRoadmap also contains hyperlinks wherebythe user can "drill down"to seek moregranular informationabouta product. For example,a brandmightconsist of three sizes, instead of lookingat aggregated data for a brandthe user mightwantto look at data for the individualsizes that constitute that brand.Taken together these frameshelp the user in understandingwhatis happeningin the marketplace. This understandinghelps the sales representativeseek framesthat recommend specific actions the user could suggestto the retailer in terms of alternative merchandising promotionsor in terms of distributing new products. For example,the systemmightdeterminethat if the retailer whereto take a weekof promotionaway fromproduct"A"and instead promoteproduct"B" (whichis the productthe sales representativeis trying to sell) additionalprofits couldaccrueto the retailer andthe sales representativewouldincreasethe shareof his product. Byprinting selected viewsthe sales representativecan create a presentationfor the retailer. In this mannerthe informationframeswithin OpportunityExplorermapthe data in large point of sale databasesto the task a sales representativehas. AAAI-98 KnowledgeDiscovery in Databases Workshop1998 Page 47 FigureI depicts the text viewof a CategoryRoadmap informationframefor a fictitious product&its competitorsin the Combined OrangeJuice product category. Generating Information Frames Figure 2 providesa conceptualframework for the developmentof knowledgediscoveryapplications such as OpportunityExplorer. Thereare three majorcomponents of this architecture: This component provides the application access to data in a 1. DatabaseAccessMethodology: databaseindependentmanner,i.e. it hides fromthe applicationthe location of the database(remote server, LAN or local workstation)as well as the type andstructure of the database.Theapplication shoulduse the data access methodology to take advantageof anyinherent structure in the database, for exampleproductandmarkethierarchies, queryoptimization,and built-in functionsfor ranking, sorting, aggregating, and performingmathematicalcomputations. TM[4], a product developedat Nielsen for ad OpportunityExplorerused the Nielsen Workstation hocdata retrieval. TheNielsenWorkstationis used as the databaseaccess methodology across all current andplannedknowledge discoveryapplicationsat Nielsen. In addition to providingthe features of a databaseaccess methodology mentionedabove,the Nielsen Workstationautomatically distributes the computation with the remoteserver, if the remoteserver supportslocal computations, it also allowsfor the applicationto querydata frommultipledatabases. Knowledge-Based Module AbstractQueryProcessor ~ Expert System Analyses Nielsen Workstation Knowledge Discovery T Template Information FrameGenerator ~___ User Interface Selection Database Access Methodology P--~~ntation l Navigation Figure2: Architecture for generating information frames 2. Page 48 TheUser Interface: This component consists of 3 modules.The Selection modulehelps the user in defining a context for the analysis. Theuser defines domaindependentterms that will guide the generation of informationframes. For example,in order to generate the CategoryRoadmap shown in Figure1, the user needsto define domaindependentterms suchas the target product(the productthat the sales representativewantsto analyze),the target market(the retailer in whichthe KnowledgeDiscoveey in Databases Workshop1993 AAAI-93 sales representativeis interested), the share-basts(definition of the productcategory),competitors (a methodfor selecting the productsthat are consideredcompetitorsof the target product),etc. Thepresentationmoduleconsists of a set of viewersfor text, tables andgraphs. Theseviewers supportvariablefonts, specializedscrolling, andall the standardoperations[see Figure1]. Thenavigationmoduleprovidesa user the capability to viewor create informationframesthat are related to the active informationframe.This can be doneeither by meansof hyperlinkson the active informationframeor by meansof a list maintainedby the systemof the informationframes that havebeencreated, Thenavigationmoduleinterprets the hyperlinksthat are embedded in the contentsof the frame. Theuser-interfacehas beendevelopedas a set of genericclass libraries that can be re-usedby other applications such as OpportunityExplorer. . Knowledge-Based Module:This componenthas the functionality for creating informationframes, basedon the context suppliedby the user (fromthe selection module),and the informationframe definition providedby the knowledgediscoverytemplate. Theknowledgediscoverytemplate consists of three sections: abstract data queries, analysesto be performed, andspecificationsof outputobjects to be displayedin the informationframe.Theabstract data queriesin conjunction with the user selections drive the AbstractQueryprocessorto extract data neededfor generatingthe informationframe. This moduletypically has applicationspecific functionality encodedin it and has to be developedfor everyapplication. Theinformationframe generator has generic knowledgeabout the processingof conditional objects, suchas bullets, paragraphs,tables andgraphs.Werefer to these objects as conditional objects, becausethe systemchecksfor pre.-conditionssuchas the existenceof certain typesof data, or the existenceof certain types of analytic results beforegeneratingthe object. Conditionscan be attached to output objects in the knowledge discoverytemplate. This moduleincludes a library of output objects, conditions, and a vocabularywhichcan be extendedfor other applications. Knowledge Discovery Templates Aknowledge discoverytemplateSpecifies the informationcontent that will be present in the informationframe. Togenerate this informationthe systemneedsto knowwhatdata is available, whatanalyses can be performed or needsto be performed,and the format for presentingthe information.Theknowledge discoverytemplateis a text file createdby the developer.It is loadedat runtimeandinterpreted by the rule sets withinthe knowledgebasedmodule.There are three sections within a knowledge discoverytemplate. Figure 3 showshowa knowledge discoverytemplatecan be used to generate informationframes. Rulesinterpret keywordsin queries to extract data fromthe database.Theseare asserted as facts, the analysis rules forward-chain on these facts to create analytic objects whichare applicationspecific. Akeywordbasedvocabularyis used within the knowledge discoverytemplatethat specifies howthe data objects andanalytic objects are to be used within bullets, paragraphs,tables andgraphs. Abstract Queries Theseare instances of queries that use domainconceptsto extract data. For examplefor the Category Roadmap informationframeshownin Figure 1, weneedto extract causal data relating to changein share for all competitors.This constitutes an abstract query.Thesystemwill use the definition of competitorsandcausal as specifiedby the user to extract the data. For example,competitorscouldbe definedas productsthat makeup 80%of the total productcategoryvolumein the market.Similarly, causal couldbe definedas in-store trade promotion activities, distribution, and pricing. Thusan abstract querysuchas "fetch causal data for all competitorsin the target market"will be containedin the knowledgediscovery templatethat specifies the CategoryRoadmap informationframe. Whenthe user requeststhis informationframethe data will be automaticallyretrieved by the system. AAAI-93 KnowledgeDiscovery in Databases Workshop1993" Page 49 Types of Analyses Theknowledge discoverytemplategives the user the ability to specifytypes of analysesto be performed.For example,OpportunityExplorer includes a causal modelfor changein volumebased on trade promotions,pricing anddistribution factors. This modelis interpreted usingrules, but neednot be used in all informationframes.Theknowledge discoverytemplatespecifies whetherthis analysis should be used or not. For the CategoryRoadmap shownin Figure I the knowledge discoverytemplate will specify a causal modelin termsof cause& effect relationships, for examplewhendistribution increasesthe volumeof the productincreases. After the data has beenretrieved by the systemthe rules interpret the causal relationships and determineexplanationsfor the changein volume.Opportunity Exploreralso includes rules for determiningopportunitiesfor a productin a market,and determinethe leading and decliningproductsfor various measures. Database Access Queries Analysis Rules ueries /. Knowledge Discovery Template Generation Rules Information Frames Figure 3: Using KnowledgeDiscovery Templates OutputObjects Withina knowledge discoverytemplateoutput objects suchas bullets, tables andgraphsare specified depending on the nature of the informationthe frameis trying to present. Abullet is composed of sentences,a sentenceis composed of phrasesand phrasesare madeup individual words.Similarly, a table is madeup of rows, rows are madeup of columns,and columnsare madeup of individual words. Graphsconsist of data groups,data groupsconsist of elementsand elementsconsist of words.Columns haveadditional attributes suchas width, justification, etc. Graphshavegraphtypes suchas bar, pie, etc. associatedwith them.Depending Page 50 KnowledgeDiscovery in Databases Workshop1993 AAAI-93 on the graphtype the viewerinterprets the data groups.Outputobjects are conditional,i.e. depending on the result of a specificanalysisa different graphtypecouldbe selected. For examplethe knowledgediscoverytemplatefor the CategoryRoadmap shownin Figure I will contain the specificationof a bullet for the target productvolumechangeexplanations.This bullet will only be generatedif the systemhas beenable to find an explanationfor the changein volumeof the target product.This bullet will be specifiedin termsof sentencesthat describevariousfactors that mightconstitute an explanation. Discussion This paper represents a statementof workin progress. WhileSpotlightwasthe first instance of ideas concerningknowledge discovery,OpportunityExploreris the first realization of ideas concerningan architecture for knowledge discovery.This is an evolvingarchitecture in that it will growto incorporatenew developments in the areas of object-oriented design, databasemanagement systems,and multimedia.It also needsto incorporateanalysis modelsthat are basedon statistical regression,andothers that reasonfromcases. Theanalysis modelsthat are includeddependexclusivelyon the task an applicationis assisting with. OpportunityExplorerwasreleased to the marketplace in March,1993andearly indications are that it will meet with tremendousacceptancefromthe customer-base. Acknowledgments Aproject with a scopeas wideas OpportunityExplorercould never havebeencompletedwithoutthe creative participation of all members of the joint marketingand development team. Wewouldlike to specifically acknowledge the tremendouscontributions of MarkAhrens, GlennHicks, TonyRochon,Paul Terry, Patrick A~Arelloand Joe Cannonin the conceptionand development of all the ideas embodiedin this paper. References A Data-ExplanationSystem. In Proceedingsof the Eighth 1. Anand,T. and Kahn,GI 1992. SPOTLIGHT: IEEEConferenceon AI for Applications, 2-8. Washington,D.C.: IEEEComputerSociety. 2. Anand,T. and Kalm,G. 1992. MakingSense ofGigabytes: A Systemfor Knowledge-Based Market Analysis.In InnovativeApplicationsof Artificial Intelligence4, 57-69.MenloPark, Calif." AAAI Press. 3. Schmitz,J., Armstrong,G. and Little, J. D. C. 1990. CoverStory- Automated NewsFindingin Marketing. In DSSTransactions,LindaVolino(Ed.), 46-54. Providence,R+I." TheInstitute of Management Sciences 4. NielsenInf~act WorkstationReferenceGuide.1993.A. C. Nielsen, Northbrook,Illinois. AAAI-93 KnowledgeDiscovery in Databases Workshop1993 Page 51