From: AAAI Technical Report WS-94-03. Compilation copyright © 1994, AAAI (www.aaai.org). All rights reserved.

Homogeneous discoveries contain no surprises: Inferring risk-profiles from large databases

Arno Siebes (arno@cwi.nl)
CWI, P.O. Box 94079, 1090 GB Amsterdam, The Netherlands

Abstract

Many models of reality are probabilistic. For example, not everyone orders crisps with their beer, but a certain percentage does. Inferring such probabilistic knowledge from databases is one of the major challenges for data mining. Recently Agrawal et al. [1] investigated a class of such problems. In this paper a new class of such problems is investigated, viz., inferring risk-profiles. The prototypical example of this class is: "what is the probability that a given policy-holder will file a claim with the insurance company in the next year?". A risk-profile is then a description of a group of insurants that have the same probability of filing a claim. It is shown in this paper that homogeneous descriptions are the most plausible risk-profiles. Moreover, under modest assumptions it is shown that covers of such homogeneous descriptions are essentially unique. A direct consequence of this result is that it suffices to search for the homogeneous description with the highest associated probability. The main result of this paper is thus that the inference problem for risk-profiles reduces to the well-studied problem of maximising a quality function.

Keywords & Phrases: Data Mining, Probabilistic Knowledge, Probabilistic Search, Probability Theory

1. INTRODUCTION

Many models of reality are probabilistic rather than deterministic, either by lack of current understanding or by nature. For example, it is unrealistic to expect a model that will predict accurately whether or not a customer will order crisps with his beer. It is, however, very well possible to have a model that yields the probability that a customer orders crisps with his beer. In fact, Agrawal et al. have recently shown that such a model can be inferred efficiently from a database [1].

In this paper we introduce a new class of such models, called risk-profiles, and show how to infer them from a database. The prototypical example of this class of problems is derived from the insurance business. Take, e.g., the insurance company Save or Sort (SOS), which handles car insurance. To stay in business, SOS has to satisfy two almost contradictory requirements. First, the total of premiums received in a given year should be at least as high as the total of costs caused by claims, i.e., the higher the premiums the better. Secondly, the premiums it charges should not be higher than those of the competitors, i.e., the lower the premiums the better.

Hence, SOS should be able to predict the total claims of an insurant in a given year as accurately as possible. Clearly, predicting this amount with 100% accuracy for each and every client is impossible. What insurance companies do is identify groups of clients with the same expected claim amount per year. Such groups are described by so-called risk-profiles, in other words, by what is called a (set-)description in data mining. Clearly, the sharper these descriptions are, the more competitive SOS can be. Since insurance companies have large databases with insurance and client data, inferring risk-profiles offers a major and profitable challenge to data mining.

Another example of the use of risk-profiles is in the analysis of medical data. Both doctors and patients would like to know the chance that a patient dies given the symptoms she has.
It is reassuring to know that, say, only 0.1% of the patients with flu will die, but perhaps less so if one knows that the mortality rate for those with both the flu and a heart-condition is much higher.

In this paper we study a simple risk-inference problem, viz., we only try to infer the probability that a client will file a claim in a given year. More complicated cases are simple extensions of the solution proposed in this paper.

This solution is reached as follows. After recapitulating some basic probability theory, Section 2 presents the formal statement of this problem in terms of (set-)descriptions. This formal problem is analysed in the third section, where we indicate that good descriptions are homogeneous. Roughly, a description is called homogeneous if all extensions of this description yield nothing surprising. The notions of homogeneity and surprise are formalised in Section 4. In Section 5, it is shown that certain maximal homogeneous descriptions are in a certain sense unique. This uniqueness result indicates that these maximal descriptions are plausible solutions for our problem. The next section shows that finding these maximal descriptions is an instantiation of the well-studied problem: maximise this function. It is then argued that, due to the nature of our problem, probabilistic maximisation heuristics are the most adequate. In the final section, the conclusions are stated together with a brief description of current research. Moreover, a brief comparison with related research is given.

Note, this paper is only an extended abstract. All definitions, results and proof-sketches are given in the running text.

2. PRELIMINARIES

In the first subsection we recall some basic facts from probability theory. For a more detailed treatment, the reader is referred to textbooks such as [3]. The second subsection presents the formal statement of the problem. In the third and final subsection some notational conventions are introduced.

2.1 Bernoulli experiments

To determine the risk-profiles for SOS, it is assumed that its clients undertake a Bernoulli or 0/1 experiment each year. Recall that a Bernoulli experiment is an experiment with two possible outcomes, denoted by 0 and 1. The outcome 1 is often called a success. In our example, a client succeeds in a trial of the experiment if he files a claim in that year.

For a Bernoulli experiment, it is assumed that there is a constant probability, say p, of success for each trial. The probability that we get exactly k successes in n trials is then given by the binomial distribution:

  P(\#S = k \mid n, p) = \binom{n}{k} p^k (1-p)^{n-k}

If n trials result in k successes, the best estimate of p is given by k/n. However, it is conceivable that p differs from k/n. The (100 - δ)% confidence interval for p, given n and k, is the smallest interval CI(n, k, δ) ⊆ [0, 1] such that P(p ∉ CI(n, k, δ)) ≤ δ%, i.e., the probability that p ∉ CI(n, k, δ) is smaller than δ%. CI(n, k, δ) can be computed using the binomial distribution. In fact, using the Chernoff bounds:

  P(|k/n - p| > \varepsilon) = \sum_{k' : |k'/n - p| > \varepsilon} \binom{n}{k'} p^{k'} (1-p)^{n-k'} \le 2 e^{-\varepsilon^2 n / (4 p (1-p))}

one sees that, with ε chosen so that this bound is at most δ, the interval [k/n - ε, k/n + ε] is an easily computed conservative approximation of CI(n, k, δ).

2.2 Descriptions: the problem

If a client is insured for a long time with SOS, and if it is reasonable to assume that the client's probability of success has not changed over the years, the confidence intervals of the previous subsection can be used to determine the chance of success for each client individually. Alas, these assumptions are not very often met, if only because, e.g., changes in traffic density ensure that the probability of success for each client will vary over the years. With only one or two trials per client, the method sketched above will give [0, 1] as the 95% confidence interval for our individual clients, which is pretty useless.
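For concreteness, the interval computation of Subsection 2.1 could be sketched as follows. This Python sketch is an illustration and not part of the original paper; the grid search in binom_ci and the conservative choice ε = √(ln(2/δ)/n), obtained from the Chernoff bound by taking p(1 − p) ≤ 1/4, are assumptions of the sketch, and δ is taken as a fraction (0.05 for a 95% interval).

```python
from math import comb, log, sqrt

def upper_tail(n: int, k: int, p: float) -> float:
    """P(#S >= k | n, p) under the binomial distribution."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def binom_ci(n: int, k: int, delta: float, grid: int = 1000):
    """CI(n, k, delta) approximated on a grid: all p for which observing k
    successes out of n trials is not too unlikely (two-sided, level delta)."""
    accepted = [p / grid for p in range(grid + 1)
                if upper_tail(n, k, p / grid) > delta / 2            # p not too small
                and 1 - upper_tail(n, k + 1, p / grid) > delta / 2]  # p not too large
    return (min(accepted), max(accepted)) if accepted else (0.0, 1.0)

def chernoff_ci(n: int, k: int, delta: float):
    """Conservative approximation [k/n - eps, k/n + eps]; eps is derived from
    the Chernoff bound using p(1 - p) <= 1/4, an assumption of this sketch."""
    eps = sqrt(log(2 / delta) / n)
    return (max(0.0, k / n - eps), min(1.0, k / n + eps))

if __name__ == "__main__":
    # 12 claims in 100 client-years, delta = 0.05 (a 95% interval):
    print(binom_ci(100, 12, 0.05))     # roughly (0.06, 0.20)
    print(chernoff_ci(100, 12, 0.05))  # wider, but trivial to compute
```

With only one or two trials, both intervals indeed degenerate to essentially [0, 1], which is the problem addressed next.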
The assumption we make is that there are a few groups of clients, such that clients in the same group have the same probability of success. That is, we assume that there is a cover {G_1, ..., G_l} of disjoint subsets of SOS's clients, such that:

  \forall cl_1, cl_2 : [p(cl_1) = p(cl_2)] \leftrightarrow [\exists! i \in \{1, \ldots, l\} : cl_1 \in G_i \wedge cl_2 \in G_i]

where p(cl_i) denotes the probability of success of client cl_i. The assumption that l is small is made to ensure that each group has enough clients to allow for an accurate estimation of the associated probability of success.

The second assumption is that membership of one of the G_i depends on only a few properties of the client. That is, we assume:

1. a set of attributes A = {A_1, ..., A_n} with domains D_1, ..., D_n, where dom(A_i) = D_i;
2. a function p : D_1 × ... × D_n → [0, 1];

such that if A_i(t, cl) denotes the A_i-value client cl has at time t, then p(A_1(t, cl), ..., A_n(t, cl)) denotes cl's probability of success at time t. For SOS, we can think of the attributes age, gender, and miles per year, with their obvious domains.

Using the second assumption, we can rephrase the first as saying that we assume that there exists a cover C = {C_1, ..., C_l} of disjoint subsets of D_1 × ... × D_n such that:

  \forall v_1, v_2 \in D_1 \times \cdots \times D_n : [p(v_1) = p(v_2)] \leftrightarrow [\forall i \in \{1, \ldots, l\} : v_1 \in C_i \leftrightarrow v_2 \in C_i]

The chance of success that all the v ∈ C_i share is called the associated probability of C_i and is denoted by p_{C_i}. The problem discussed in this paper can now be stated as: "find the C_i and their associated probabilities p_{C_i}".

In data-mining terminology this means that we try to find a (set-)description for each of the C_i. That is, find logical formulae φ_i such that a value v ∈ D_1 × ... × D_n satisfies φ_i, denoted by φ_i(v), iff v ∈ C_i. The formulae φ_i are expressions in some description language Φ over A. Which description language Φ is chosen is unimportant for this paper, as long as one can expect that the C_i can be described in Φ and the following four assumptions are met:

1. Φ is closed under negation, i.e., φ ∈ Φ → ¬φ ∈ Φ;
2. Φ is closed under conjunction, i.e., φ_1, φ_2 ∈ Φ → φ_1 ∧ φ_2 ∈ Φ;
3. implication is decidable for Φ;
4. Φ is sparse with regard to P(D_1 × ... × D_n), that is, the number of subsets described by Φ is small compared to the number of all possible subsets.

The necessity of these requirements will become clear in the next sections. For SOS, an example of such a sparse description language is given by the disjunctive and conjunctive closure of the following set of elementary descriptions:

1. gender = female, gender = male;
2. age = young, age = middle aged, age = old;
3. miles per year = low, miles per year = average, miles per year = high.

Of course, the most important assumption on Φ is that the C_i can be described in Φ. Hence, choosing an appropriate Φ is an important task in deriving the risk-profiles.

To state our problem in terms of descriptions, define a set {φ_1, ..., φ_k} of descriptions to be a disjunctive cover, abbreviated to discovery, if:

1. ∀ i, j ∈ {1, ..., k} : i ≠ j → [φ_i ∧ φ_j ↔ ⊥];
2. ∀ i ∈ {1, ..., k} : φ_i ∈ Φ, and [φ_1 ∨ ... ∨ φ_k] ↔ T.

Note that if Φ contains ⊥, we assume that discoveries do not contain ⊥. The set {gender = female, gender = male} is a discovery for SOS.

The problem can then be restated as: find a discovery {φ_1, ..., φ_k} such that

  \forall v_1, v_2 \in D_1 \times \cdots \times D_n : [p(v_1) = p(v_2)] \leftrightarrow [\forall i \in \{1, \ldots, k\} : \varphi_i(v_1) \leftrightarrow \varphi_i(v_2)]
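To illustrate these definitions, the following sketch (not part of the original paper) encodes the elementary descriptions of the SOS example as predicates and checks the two discovery conditions, pairwise disjointness and joint coverage of D_1 × ... × D_n, by enumerating the finite domain; the dictionary representation of values and the helper names are assumptions of the sketch.

```python
from itertools import product

# The SOS attribute domains of the running example.
DOMAINS = {
    "gender": ["female", "male"],
    "age": ["young", "middle aged", "old"],
    "miles per year": ["low", "average", "high"],
}

def elem(attribute: str, value: str):
    """Elementary description 'attribute = value' as a predicate on values."""
    return lambda v: v[attribute] == value

def is_discovery(descriptions, domains=DOMAINS) -> bool:
    """A set of descriptions is a discovery iff the described sets are
    pairwise disjoint and together cover D1 x ... x Dn."""
    names = list(domains)
    for values in product(*(domains[a] for a in names)):
        v = dict(zip(names, values))
        matches = sum(1 for phi in descriptions if phi(v))
        if matches != 1:          # 0 = not a cover, >1 = not disjoint
            return False
    return True

if __name__ == "__main__":
    # {gender = female, gender = male} is a discovery for SOS.
    print(is_discovery([elem("gender", "female"), elem("gender", "male")]))  # True
    # {age = young, age = old} is not: middle aged clients are not covered.
    print(is_discovery([elem("age", "young"), elem("age", "old")]))          # False
```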
2.3 Notational conventions

The database is seen as a recording of trials of our experiment. Hence, it is a table R with schema A ∪ {S}, where S has domain {0, 1}. Each time t an object o takes experiment E, we insert, regardless of duplicates, the tuple (A_1(t, o), ..., A_n(t, o), S(t, o)) in the table, where:

1. A_i(t, o) denotes the value v_i ∈ D_i that o has for A_i at time t;
2. S(t, o) = 1 if o succeeded in this experiment and S(t, o) = 0 otherwise.

Three notations used for the elements of Φ with respect to R are: (φ) denotes the subtable of R of all tuples that satisfy φ, (φ)_S denotes the elements of (φ) that are successes, and ||φ|| denotes all v ∈ D_1 × ... × D_n that satisfy φ.

3. PROBLEM ANALYSIS

The problem as stated in the previous section consists of two parts. We have to find a discovery {φ_1, ..., φ_k}, and we have to prove that for this discovery

  \forall v_1, v_2 \in D_1 \times \cdots \times D_n : [p(v_1) = p(v_2)] \leftrightarrow [\forall i \in \{1, \ldots, k\} : \varphi_i(v_1) \leftrightarrow \varphi_i(v_2)]

Finding discoveries is easy: each sequence φ_1, ..., φ_n ∈ Φ of descriptions generates the potential discovery Φ' = {φ_1, ¬φ_1 ∧ φ_2, ¬φ_1 ∧ ¬φ_2 ∧ φ_3, ..., (¬φ_1 ∧ ... ∧ ¬φ_n)}. After removal of duplicates (φ is a duplicate of ψ iff φ ↔ ψ) and of ⊥, Φ' is a discovery. So, we may expect that the discoveries are abundant.

However, they are not all equally good, in that some will not satisfy the second requirement. To discuss this requirement, we should first determine what probability should be associated with a description φ. If we assume that the database is a random selection of the potential clients, this is simple, since for each φ ∈ Φ, (φ) can be seen as the record of some Bernoulli experiment E_φ. Hence, the probability p_φ of success associated with φ is simply the fraction of successes in (φ), and φ has the (100 - δ)% confidence interval CI_φ = CI(|(φ)|, |(φ)_S|, δ). (In this paper we assume that all descriptions we consider are sufficiently large, such that reasonably accurate estimates of the p_φ can be found; see also the remarks later in this paper.)

If the database is not a random selection of the set of all potential clients, say the number of young clients is far less than could be expected, the above observation is only true for the "real" φ_i and the logical combinations thereof. This is, however, unimportant for the purposes of this paper.

The second requirement in turn consists of two sub-requirements. The first is that two values that satisfy different elements of the discovery have different chances of success; that is, the different elements are distinct. The second states that two values that satisfy the same element of the discovery have the same chance of success; that is, the elements are homogeneous.

Using the (100 - δ)% confidence intervals, we can test distinction. For if CI_{φ_1} ∩ CI_{φ_2} = ∅, we know that

  \forall v, w \in D_1 \times \cdots \times D_n : \varphi_1(v) \wedge \varphi_2(w) \rightarrow P(p(v) = p(w)) \le \delta,

if both φ_i are homogeneous. So, provided we can test homogeneity of descriptions, we can for some fixed δ discard all those discoveries that contain at least two elements that cannot be distinguished with a confidence of at least (100 - δ)%.

Can we test homogeneity of descriptions? If a description φ is not homogeneous, (φ) contains elements v and w such that p(v) ≠ p(w). More precisely, if φ is not homogeneous, there exists a cover C = {C_1, ..., C_l} of disjoint subsets of ||φ|| such that values in different C_i have different probabilities of success. But this is exactly the question we are trying to answer!
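To make the associated probability and the distinction test concrete, here is a small sketch (not part of the paper). The table R is represented as a list of dictionaries with a 0/1 column S, the Chernoff-style approximation of Subsection 2.1 stands in for the exact interval CI(|(φ)|, |(φ)_S|, δ), and the toy numbers in the example are invented purely for illustration.

```python
from math import log, sqrt

def select(R, phi):
    """(phi): the subtable of R of all tuples that satisfy phi."""
    return [t for t in R if phi(t)]

def associated_probability(R, phi):
    """p_phi: the fraction of successes in (phi)."""
    sub = select(R, phi)
    return sum(t["S"] for t in sub) / len(sub)

def confidence_interval(R, phi, delta):
    """CI_phi, via the conservative Chernoff-style approximation of 2.1."""
    sub = select(R, phi)
    n, k = len(sub), sum(t["S"] for t in sub)
    eps = sqrt(log(2 / delta) / n)
    return (max(0.0, k / n - eps), min(1.0, k / n + eps))

def distinct(R, phi1, phi2, delta):
    """phi1 and phi2 are distinct at confidence (100 - delta)% if their
    confidence intervals are disjoint."""
    lo1, hi1 = confidence_interval(R, phi1, delta)
    lo2, hi2 = confidence_interval(R, phi2, delta)
    return hi1 < lo2 or hi2 < lo1

if __name__ == "__main__":
    # A toy table R: one row per yearly trial, S = 1 iff a claim was filed.
    R = ([{"gender": "male", "S": 1}] * 80 + [{"gender": "male", "S": 0}] * 120 +
         [{"gender": "female", "S": 1}] * 10 + [{"gender": "female", "S": 0}] * 190)
    male = lambda t: t["gender"] == "male"
    female = lambda t: t["gender"] == "female"
    print(associated_probability(R, male), associated_probability(R, female))  # 0.4 0.05
    print(distinct(R, male, female, delta=0.05))  # True: the intervals are disjoint
```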
So, let us first try to answer the question of when the empty description, i.e., T, is homogeneous.

Clearly, if every subset of the database has the same associated probability of success, then the database is homogeneous. However, this is obviously impossible. In fact, the database is the record of a weighted Bernoulli experiment. Let g_i denote the relative size of group G_i in the universe of potential clients. Under the assumption that the actual clients are a random subset of the potential clients, the database is the recording of a Bernoulli experiment E with its probability of success p_E given by:

  p_E = \sum_{i=1}^{l} g_i \, p_{G_i}

Note that if |R| is large, then, as one would expect:

  p_E = \sum_{i=1}^{l} g_i \, p_{G_i} \approx \sum_{i=1}^{l} \frac{|G_i \cap R|}{|R|} \, p_{G_i} \approx \frac{\text{the number of successes in } R}{|R|}

Hence, if we consider all possible subsets of the database, the number of successes in the subsets of a given size follows a binomial distribution. In other words, if we inspect all possible subsets we can conclude nothing.

However, we are not interested in all possible subsets, but only in those subsets we can describe. Since Φ is sparse, one would expect that all descriptions have more or less the same associated probability. If we think that T is homogeneous and it turns out that two descriptions have distinct associated probabilities, then we are surprised: we would not expect this to happen. If female clients appear to drive much safer than male clients, we are no longer sure that all clients have the same associated probability. That is, we are no longer convinced that T is homogeneous. In other words, it is plausible that a description is homogeneous if all its covers have the same associated probability. This can be stated more succinctly in the slogan:

  Homogeneous descriptions contain no surprises.

4. HOMOGENEOUS DISCOVERIES

To simplify the discussion, define a set {φ_1, ..., φ_k} of descriptions to be a discovery for a description φ, if:

1. ∀ i, j ∈ {1, ..., k} : i ≠ j → [φ_i ∧ φ_j → ⊥];
2. ∀ i ∈ {1, ..., k} : φ_i ∈ Φ, and [φ_1 ∨ ... ∨ φ_k] ↔ φ.

If we assume that T ∈ Φ, the discoveries of the previous sections are discoveries for T; we will continue to use this abbreviation.

If φ is a homogeneous description, so is φ ∧ ψ. In fact, p_φ and p_{φ∧ψ} should then be almost identical. In the previous section, we have seen that p_φ and p_{φ∧ψ} are identical with a probability of at most δ if CI_φ and CI_{φ∧ψ} are disjoint. Therefore, we define a description ψ to be a (100 - δ)% surprising description for a description φ if CI_φ and CI_{φ∧ψ} are disjoint. For example, if CI_T = [0.19, 0.21] and CI_{gender = male} = [0.25, 0.27], then gender = male is a 95% surprise for T.

Since discoveries can contain many elements, it is not impossible that a discovery for a homogeneous description φ contains some surprising descriptions for φ. To quantify the number of surprising descriptions in a discovery that is surprising, note that a description has a chance δ of being surprising for φ. That is, being a surprise is a Bernoulli experiment with probability δ. We would be surprised if the number of successes in this experiment is high.

For a Bernoulli experiment with probability p of success, define Lwb(n, p, δ) to be the lowest integer k such that P(#S ≥ k | n, p) ≤ δ. For a description φ, define Lwb(φ, δ) = Lwb(|(φ)|, p_φ, δ). Hence, we define a discovery {φ_1, ..., φ_l} for a rule φ to be a (100 - δ)% surprising discovery for φ if:

  |{ i ∈ {1, ..., l} | φ_i is a (100 - δ)% surprising description for φ }| ≥ Lwb(φ, δ)

A description for which there are no (100 - δ)% surprising discoveries is called a (100 - δ)% homogeneous description. (A description φ for which |(φ)| is already so small that, for almost all non-contradictory ψ, |(φ ∧ ψ)| is too small for an accurate estimation of p_{φ∧ψ}, is not considered to be homogeneous.)
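The threshold Lwb and the resulting test can be sketched as follows (an illustration, not part of the paper). The confidence intervals CI_{φ ∧ φ_i} are taken as precomputed inputs, for instance with the interval routine of Section 3, and the threshold is passed in explicitly where the paper uses Lwb(φ, δ); the demo numbers are invented.

```python
from math import comb

def upper_tail(n: int, k: int, p: float) -> float:
    """P(#S >= k | n, p) for a Bernoulli experiment with success probability p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def lwb(n: int, p: float, delta: float) -> int:
    """Lwb(n, p, delta): the lowest integer k with P(#S >= k | n, p) <= delta."""
    for k in range(n + 2):
        if upper_tail(n, k, p) <= delta:
            return k
    return n + 1

def disjoint(ci1, ci2) -> bool:
    return ci1[1] < ci2[0] or ci2[1] < ci1[0]

def surprising_discovery(ci_phi, refinement_cis, threshold: int) -> bool:
    """A discovery {phi_1, ..., phi_l} for phi is surprising if at least
    `threshold` of the intervals CI_{phi & phi_i} are disjoint from CI_phi."""
    surprises = sum(1 for ci in refinement_cis if disjoint(ci_phi, ci))
    return surprises >= threshold

if __name__ == "__main__":
    print(lwb(10, 0.05, 0.05))  # 3: three surprises among ten would be unexpected
    ci_T = (0.19, 0.21)
    refinements = [(0.25, 0.27), (0.18, 0.22), (0.20, 0.23)]
    # Only the first refinement is a surprise, so this discovery is not surprising:
    print(surprising_discovery(ci_T, refinements, threshold=2))  # False
```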
One might wonder why we bother with Bernoulli experiments for covers, since Φ is assumed to be sparse. However, even if Φ is sparse, it can still happen that Φ distinguishes all values of an attribute with a large domain, say age. For such a dense cover, one can expect some surprises, and only if such surprises are considerable, say all clients younger than 25, does it count as a real surprise. Note that if Φ is never dense, such as in our example, one surprise in a cover is enough to make the cover surprising.

Let φ be a description; a discovery {φ_1, ..., φ_l} for φ is a (100 - δ)% homogeneous discovery for φ iff all of the φ_i are (100 - δ)% homogeneous descriptions.

5. SPLIT (100 - δ)% HOMOGENEOUS DISCOVERIES ARE UNIQUE

Clearly, the real discovery we are searching for will be (100 - δ)% homogeneous. But are all (100 - δ)% homogeneous discoveries plausible candidate solutions? If there are many (100 - δ)% homogeneous discoveries, one might wonder which of these discoveries is the most plausible answer. In this section, we show that all split discoveries are more or less the same.

A (100 - δ)% homogeneous discovery {φ_1, ..., φ_l} is split iff

  \forall \psi \in \Phi \; \forall i, j \in \{1, \ldots, l\} : i \neq j \wedge |(\varphi_i \wedge \psi)| \gg 0 \wedge |(\varphi_j \wedge \psi)| \gg 0 \rightarrow CI_{\varphi_i \wedge \psi} \cap CI_{\varphi_j \wedge \psi} = \emptyset

In other words, a discovery is split if its descriptions are distinct on all but trivial subsets.

Let {φ_1, ..., φ_k} and {ψ_1, ..., ψ_l} be two split (100 - δ)% homogeneous discoveries. Then, since both k and l are assumed to be small (since the number of groups is assumed to be small), there is for each φ_i at least one ψ_j such that |(φ_i ∧ ψ_j)| >> 0. But since both discoveries are homogeneous and split, there can be at most one such ψ_j. So, (φ_i) ≈ (ψ_j) and CI_{φ_i} ≈ CI_{ψ_j}.

Define a pre-order on the (100 - δ)% homogeneous discoveries by {φ_1, ..., φ_k} ≼ {ψ_1, ..., ψ_l} iff there exists an injective mapping f : {1, ..., k} → {1, ..., l} such that CI_{φ_i} ⊆ CI_{ψ_{f(i)}}. Then, by the observation above, we can identify all split (100 - δ)% homogeneous discoveries and turn this relation into a partial order.

Clearly, the split (100 - δ)% homogeneous discoveries are minimal with regard to this order. Note, however, that they need not exist. Hence, we cannot identify the split (100 - δ)% homogeneous discoveries with the minimal discoveries. Of all the (100 - δ)% homogeneous discoveries, the minimal discoveries are preferable, e.g., by the minimal description length principle. If a split discovery exists, it is the most plausible of all minimal discoveries, since it makes the sharpest distinction between its descriptions. In other words, it is the discovery with the highest information gain.

The fact that the split discoveries are not unique is simply caused by the fact that a set of tuples can have more than one description. For example, it could happen that almost all young clients are male and vice versa. In that case the descriptions age = young and gender = male are equally good from a theoretical point of view, though not necessarily from a practical point of view. For, it is very well possible that the description age = young makes sense to a domain expert while gender = male does not. Hence, both options should be presented to the domain expert.
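As an illustration (not from the paper), the split property can be checked directly against the table R once the candidate descriptions can be enumerated. The helpers repeat the Chernoff-style interval of Subsection 2.1; the size threshold min_size, standing in for the informal "≫ 0", and the finite list candidates, standing in for Φ, are assumptions of this sketch.

```python
from itertools import combinations
from math import log, sqrt

def confidence_interval(rows, delta):
    """Chernoff-style interval for the success fraction of a set of rows."""
    n, k = len(rows), sum(t["S"] for t in rows)
    eps = sqrt(log(2 / delta) / n)
    return (max(0.0, k / n - eps), min(1.0, k / n + eps))

def disjoint(ci1, ci2):
    return ci1[1] < ci2[0] or ci2[1] < ci1[0]

def is_split(discovery, candidates, R, delta, min_size=30):
    """A homogeneous discovery is split iff any two of its descriptions are
    distinct (disjoint intervals) on every non-trivial common refinement.
    `candidates` plays the role of the description language Phi; `min_size`
    decides which overlaps count as non-trivial (an assumption of the sketch)."""
    for psi in candidates:
        for phi_i, phi_j in combinations(discovery, 2):
            sub_i = [t for t in R if phi_i(t) and psi(t)]
            sub_j = [t for t in R if phi_j(t) and psi(t)]
            if len(sub_i) >= min_size and len(sub_j) >= min_size:
                if not disjoint(confidence_interval(sub_i, delta),
                                confidence_interval(sub_j, delta)):
                    return False
    return True
```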
6. FINDING MINIMAL HOMOGENEOUS DISCOVERIES

In the first subsection we show that finding minimal homogeneous discoveries reduces to the maximisation of a function. In the second we briefly discuss various maximisation algorithms.

6.1 Maximisation

Define the partial order ≺_δ on the set of (100 - δ)% homogeneous descriptions as follows:

- φ ≺_δ ψ if ψ has a higher associated probability than φ with confidence at least (100 - δ)%, i.e., if CI_φ and CI_ψ are disjoint and p_φ < p_ψ;
- if neither φ ≺_δ ψ nor ψ ≺_δ φ holds, then we say that φ ∼_δ ψ.

From the definition of (100 - δ)% homogeneous discoveries it is immediately clear that such a discovery necessarily contains a description φ such that for all (100 - δ)% homogeneous descriptions ψ, either ψ ≺_δ φ or φ ∼_δ ψ.

Minimal (100 - δ)% homogeneous discoveries differ from the ordinary ones in that their "maximal" element φ also has a maximal |(φ)|. Therefore, define the following pre-order ⊑_δ on the (100 - δ)% homogeneous descriptions: φ ⊑_δ ψ if:

1. either φ ≺_δ ψ;
2. or φ ∼_δ ψ and |(φ)| ≤ |(ψ)|.

Then minimal (100 - δ)% homogeneous discoveries contain an element φ such that for all (100 - δ)% homogeneous descriptions ψ, ψ ⊑_δ φ.

From this observation, we see that the following pseudo-algorithm yields a minimal (100 - δ)% homogeneous cover:

1. Make a list of (100 - δ)% homogeneous descriptions as follows:
   (a) find a φ that is maximal with regard to ⊑_δ for R;
   (b) remove (φ) from R and add φ to the list;
   (c) continue with this process until either T is homogeneous on the remainder of R, or R is too small for further accurate estimations of probabilities; in the latter case make the list empty.
2. If the list is non-empty, turn it into a potential discovery as indicated at the beginning of Section 3.
3. If the list is empty, return nothing.

If we are returned a potential discovery, it is a minimal (100 - δ)% homogeneous discovery by virtue of our remarks above. If the algorithm fails, no solution exists in Φ. Given such a minimal discovery, it is straightforward to check whether it is split. As an aside, note that the list returned by the pseudo-algorithm is similar to a decision list [10].

6.2 Finding maximal descriptions

To turn the pseudo-algorithm of the previous subsection into an algorithm, it is sufficient to give a procedure that returns a maximal (100 - δ)% homogeneous description φ. For, one can then check whether T is homogeneous by checking whether φ and T have the same associated chance.

Finding such a maximal (100 - δ)% homogeneous description φ is a simple application of maximising a function. In the first phase of the search this function is the associated probability of a rule, and in the second phase it is the cover of the rule (while retaining the maximal probability found). This problem is well documented in the machine learning literature. Older solutions are ID3, AQ15 and CN2 [9, 6, 2]. More modern, non-deterministic approaches use genetic algorithms or simulated annealing. See [4] for an overview of some of these systems.

For our problem, a non-deterministic approach is best suited, for there can be many different (100 - δ)% homogeneous discoveries that describe the same sets but with different descriptions. Some of these descriptions will be pure coincidence, while others have a sensible interpretation. If we use a deterministic algorithm, the chance exists that we miss the sensible descriptions completely; this would render the result virtually useless.

Of the two predominant non-deterministic search methods, the genetic approach seems to offer the best possibilities for our problem, since, as described in [5], its genetic operators can adapt the proven heuristics of the older deterministic algorithms. In this particular case, we think in fact of two sets of operators: one for the first phase and another for the second. In the first phase we search for a maximal probability, so one can think of a hill-climber type of approach. In the second phase, generalising a description is far more important. Hence, we need a different set of operators. Note that it is not our intention to turn genetic programming into something deterministic; rather, it is an attempt to lift genetic programming to the level of symbolic inference.
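Putting the pieces of this section together, the cover construction of Subsection 6.1 could be sketched as follows (not part of the original paper). The maximisation procedure itself, whichever search method of Subsection 6.2 is chosen, is abstracted into the parameter find_maximal; is_homogeneous and the minimum table size min_size are likewise placeholders for the homogeneity test and the accuracy threshold.

```python
def minimal_homogeneous_cover(R, delta, find_maximal, is_homogeneous, min_size=100):
    """Pseudo-algorithm of Subsection 6.1: repeatedly find a description phi
    that is maximal for the order of Subsection 6.1 (delegated to
    `find_maximal`), remove (phi) from the table, and collect the phis."""
    remainder, phis = list(R), []
    while True:
        if is_homogeneous(remainder, delta):
            # Turn the list into a potential discovery as at the start of
            # Section 3: {phi_1, not phi_1 & phi_2, ..., not phi_1 & ... & not phi_k}.
            return phis
        if len(remainder) < min_size:
            return None            # no accurate probability estimates possible
        phi = find_maximal(remainder, delta)   # phases 1 and 2 of the search
        phis.append(phi)
        remainder = [t for t in remainder if not phi(t)]
```

Any of the search procedures discussed above, from a hill-climber to a genetic algorithm, can be plugged in as find_maximal.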
7. CONCLUSIONS AND FUTURE RESEARCH

The main result of this paper is that we have shown that a new class of problems, viz., finding risk-profiles, can be reduced to a well-known class of problems, viz., maximisation problems. These results have been achieved by introducing the notions of a surprise and of homogeneity. In particular the slogan "homogeneous discoveries contain no surprises" has proven to be fruitful.

The important implication of these results is that we can build efficient algorithms to solve this new class of real-world problems. In fact, at the moment we are building and testing a system based on the ideas presented in this paper for an insurance company that wants to find such risk-profiles, for obvious commercial reasons.

Next to the choice of an adequate maximisation algorithm as discussed in Section 6, the other main problem is the choice of a good description language. As we have seen in this paper, this language should not be too rich. Obviously, it should not be too poor either, if the profiles are to be expressible in this language. The design of a language that meets these two conflicting requirements is currently one of our main topics.

7.1 Related work

As already indicated in the introduction, as far as the author is aware, inferring risk-profiles is a new problem area in data mining research. It is probably most connected to diagnostic problem solving as reported in [7, 8]. A diagnostic problem is a problem in which one is given a set of symptoms and must explain why they are present. The authors of [7, 8] introduce a solution that integrates symbolic causal inference with numeric probabilistic inference. A crucial point in this integration is the use of a-priori probabilities. The authors imply that these probabilities should be supplied by an expert. However, using the technique presented in this paper, these a-priori probabilities can be derived from a database.

ACKNOWLEDGEMENTS

Marcel Holsheimer has been the sparring partner in many stimulating discussions. Moreover, he pointed out an error in a previous version of this paper. My heartfelt thanks. Also thanks to the anonymous referee who pointed out many improvements, and brought references [7] and [8] to my attention.

REFERENCES

1. Rakesh Agrawal, Tomasz Imielinski, and Arun Swami. Database mining: a performance perspective. IEEE Transactions on Knowledge and Data Engineering, 5(6):914-925, December 1993.
2. Peter Clark and Tim Niblett. The CN2 induction algorithm. Machine Learning, 3:261-283, 1989.
3. William Feller. An Introduction to Probability Theory and Its Applications, Vol. 1. Wiley, 1950.
4. Marcel Holsheimer and Arno P.J.M. Siebes. Data mining: the search for knowledge in databases. Technical Report CS-R9406, CWI, January 1994.
5. Zbigniew Michalewicz. Genetic Algorithms + Data Structures = Evolution Programs. Springer-Verlag, 1993.
6. Ryszard S. Michalski, Igor Mozetic, Jiarong Hong, and Nada Lavrac. The multi-purpose incremental learning system AQ15 and its testing application to three medical domains. In Proceedings of the 5th National Conference on Artificial Intelligence, pages 1041-1045, Philadelphia, 1986.
7. Yun Peng and James A. Reggia. A probabilistic causal model for diagnostic problem solving - part I: Integrating symbolic causal inference with numeric probabilistic inference. IEEE Transactions on Systems, Man, and Cybernetics, 17(2):146-162, March/April 1987.
8. Yun Peng and James A. Reggia. A probabilistic causal model for diagnostic problem solving - part II: Diagnostic strategy. IEEE Transactions on Systems, Man, and Cybernetics, 17(3):395-406, May/June 1987.
9. J. Ross Quinlan. Induction of decision trees. Machine Learning, 1:81-106, 1986.
10. Ronald L. Rivest. Learning decision lists. Machine Learning, 2:229-246, 1987.