Journal of Automated 0 1986 by D. Reidel Reasoning Publishing 2 (1986) Company. 191 191-216. Problem Corner: Seventy-Five Problems for Testing Automatic Theorem Provers FRANCIS Department JEFFRY PELLETIER of Philosophy. University of Alberta, Edmonton, Alberta, Canada T6G 2E5. (Received: 30 August 1985) The purpose of this note is to provide a graduated selection of problems for use in testing an automatic theorem proving (ATP) system. Included in these problems were the ones I have used in testing my own ATP system (Pelletier 1982, and more recent updates to the system). Some of the problems, especially some of the more difficult ones described at the end of this note, are due to discussions with Len Schubert (Computing, Univ. Alberta), Alasdair Urquhart (Philosophy, Univ. Toronto), and Charles Morgan (Philosophy, Univ. Victoria). People who have tried to compile lists of problems for ATPs in the past (for example, the Association for Automated Reasoning) have discovered that the production of such a list is difficult because (a) what is ‘easy’ for one system might not be for another, (b) researchers are understandably shy about saying what problems their ATP might have, unless they know that it is so difficult that any ATP will have trouble with it, (c) especially with problems at the ‘easy end’ of the scale, researchers are prone to think that any ATP system can prove it and so there is no call to write them up, (d) the goals of ‘natural’ system ATPs and resolution-based ATPs are different: the former tries to produce a ‘natural’ proof usually without prior conversion to clause form. This means that some problems, especially at the ‘easy end’ of the scale, will be trivial for resolution systems but difficult for ‘natural’ systems. On the other hand, proponents of ‘natural’ systems think that at the ‘difficult end’ of the scale, there will be problems within the grasp of the ‘natural’ systems which are beyond the reach of resolution systems. It is therefore difficult to even give a graduated scale of problems. All this has led to publication of dificuft problems, but not to publication of lists of problems suitable for developing an ATP. The locus clussicus of problems is McCharen et al. (1976) which contains a wide range of problems all in clause form. More recently, the Journal of Automated Reasoning has instituted its ‘Problem Corner’. A notable item herein is Lusk and Overbeek (1985). All of us involved in ATP wish to encourage others - graduate students, for example - to look into the field. But where is such a person to start? It is with such neophyte 192 FRANCIS JEFFRY PELLETIER ATPers in mind that the following list is offered. None of these problems will be the sort whose solution is, of itself, of any mathematical or logical interest. Such ‘open problems’ are regularly published in the Newsletter of the Association for Automated Reasoning. Most (but not all) of my problems can be found in elementary logic textbooks - but they have been chosen either because logic students find them difficult, or because previous ATP systems have reported difficulties in establishing them, or because they have some interesting connection with other areas of mathematics (such as set theory). The judgements of difficulty are my own, and are based primarily on my years of teaching elementary logic (so the judgements reflect how difficult beginning students find the problems). The scales are from 1 (easiest) to 10, and are relativized to the kind of problem involved. Thus; a ‘9’ in the propositional logic section might in fact be easier than a ‘4’ in the monadic logic section. I give problems in both ‘natural’ and clause form. The negated-conclusion clause form has eliminated tautologous clauses (but see the remark below in the ‘Acknowledgement’ section). As mentioned, many of the ‘easier’ problems - especially in propositional calculus - are trivial in resolution systems. Still, ‘natural’ systems might find them diverting. In either case, however, I would think that any ATP ought to produce an explicit proof, detailing what preceding formulas/clauses were invoked and (in the case of ‘natural’ systems) what rules of inference were used. When comparing two ATPs for efficiency (say by measuring CPU time on identical machines) it is of course mandatory that the systems should include the cost of conversion to clause form since it is only very rarely that problems ‘in the real world’ are presented in clause form. A Word on Notation +: if-then 7: not +: or &: and cr: if and only if A: for all E: there exists p, 4, r, s . . . (perhaps with subscripts): propositional (sentence) letters; that is, O-place predicate letters. F, 9, . . . P, Q, , . . (perhaps with subscripts): predicate letters; context is used to determine the adicity. x, y, z, w, . . . . (perhaps with subscripts): (individual) variables. a, b, c, . . . . (perhaps with subscripts): individual constants; that is, O-place function symbols. f,g,h.. . (perhaps with subscripts): function symbols; context is used to determine the adicity. SEVENTY-FIVE PROBLEMS FOR TESTING 193 ATP Precedence 1, A, E highest &, + medium -+, - lowest Associativity & and + are allowed to take arbitrarily many conjuncts/disjuncts. For the most part, the notation is in common use and should provide no difficulties. Propositional Logic What follows here are some propositional logic theorems and arguments. They are given both in ‘natural’ form and in negated-conclusion clause form. Some are of historical interest (mostly having to do with the logical theorist), while others illustrate various ‘tricks’. ‘Naturaf’ form 1. Negated-conclusion clause form (2pts) A biconditional version of the ‘most difficult’ theorem proved by the original Logic Theorist (Newell et al. 1957) tP-+d-t14+lP) 1P + 4 -Jq+p 14 P 2. .(2pts) A biconditional version of the ‘most difficult’ theorem proved by the new logic theorist (Newell and Simon 1972) llP*P 3. P 1P (lpt) The ‘hardest’ theorem proved by a breadth-first logic theorist (Siklossy et al. 1973) l(P 4. -+ 4) + (4 -+ PI P 14 4 1P (2pts) Judged by Siklossy et al. (1973) to be ‘hardest’ of first 52 theorems of Whitehead and Russell (19 10) (1P + 9) 4-i (14 -+ P> P+4 14 + TP 19 1P FRANCIS JEFFRY PELLETIER 194 5. (4pts) Judged by Siklossy et al. to be ‘hardest’ of first 67 theorems of Whitehead and Russell (1910) ((P + 4) + (P + 4) -+ lq+p+r (P + (4 - r)) 1P 9 7r 6. (2pts) The Law of Excluded Middle: can be quite difficult for ‘natural’ systems (P + 1P) 7. (3pts) Expanded Law of Excluded Middle. The strategies of the original Logic Theorist cannot prove this (P + 1llP) 8. (5pts) Pierce’s Law. Unprovable systems. NP + cd --) PI + P 9. 1P P 1P P by Logic Theorist, and tricky for ‘natural’ P 1P (6pts) A problem not solvable by unit resolution nor by ‘pure’ input resolution. I(P + 67)& mP + 4) & (P + 1411-+ TOP + 14) P+4 1P + 4 p+14 1P + 14 10. (4pts) A reasonably simple problem with premises, designed to see whether ‘natural’ systems correctly manipulate premises. q-+r 11. r + (P & 4) iq + r ir+p lr + q P -+ (4 + 4 p-4 lp+q+r P+4 (Ipt) A simple problem designed to see whether ‘natural’ systems can do it efficiently (or whether they incorrectly try to prove the + each way) (P d-bP) P 1P SEVENTY-FIVE 12. PROBLEMS FOR TESTING 195 ATP (7pts) The ‘hardest’ propositional problem found in Kalish and Montague (1?64), according to Pelletier (1982) Kp - 9) * 4 - [I-J++(4 - dl P+q+r iq+ip+r ir+ip+q ir+iq+p p+iq+r p+ir+q q+r+ip ir + iq + ip Distribution Laws can be very tricky for ‘natural’ systems. (They are assumed by resolution systems in the conversion to clause form). The next few problems list some 13. (5pts) b + (4 & dl t* [(p + 4) 8~ (P + r>l P+9 p+r 14. 1P iq + ir 1P + 4 (6pts) (P +-+4) * ((cl + 1P) & (14 + PII 14+P lq+lP P+4 15. (5pts) (P + 4) ++ (14 16. + 4) (4pts) A surprising theorem of propositional (P + 4) + (4 --, P) 17. 1P P 14 + 4 logic P 14 4 1P (6pts) A problem which appears not to be provable by Bledsoe et al. (1972). (For details of why not, see Pelletier (1982), p. 135f). ((P & (4 -+ 4) + 4 c* OP + 4 + 4 & (lp + ir + s)) 1p+q+s 196 FRANCIS JEFFRY PELLETIER 1p+1r+s P iq + r 1S Monadic Predicate Logic Problems in the monadic predicate logic are not much more difficult than those in the propositional logic. All that’s required is a method of handling quantifiers correctly (by finding appropriate instances or substitutions). The problems in this section are designed to test whether this happens. 18. (lpt) (EY)(W(FY Fx + Fx) 1 E’fC4 19. (3pts) (W (AY) (AZ) W’Y - Q4 + (Px + ex>> 1 W(x) + Q&d Px 1Qx 20. (4pts) [(Ax) (Ar) (Ed (A4 ((Px & Qv>+ (Rz & SW)) + (GWW (Px & 1Py 1Py + -IQZ + 1Qz + RfCy, z) + Sx Qv>--) W Rz)l ; TRW 21. (5pts) A moderately tricky problem, especially for ‘natural’ systems with ‘strong’ restrictions on variables generated from existential quantifiers. lp + Fa lFb+p p + Fx lFx+lp @N(P -, Fx) UW(Fx + P) (E-4 ( P - Fx) Some problems having to do with ‘confinement’ of quantifiers. These are often trivial in reslution systems because they are assumed in conversion to clause form. 22. (3pts) (W(P - Fx) --) (P - (Ax) Fx) p+lFx Fx + 1~ SEVENTY-FIVE PROBLEMS FOR TESTING 197 ATP FY + P lp + 1Fa Fy + 1Fa 23. (4pts) (A-4( P + Fx) ++ (P + W) Fx) p + Fx + Fy p + Fx + Fb 1P 1Fa + p + Fy TFa + iFb The following are some more tedious monadic logic problems from Kalish and Montague (1964). 24. (6pts) Q4 (AxW’x -+ <Qx+ R-4) 1 (Ex)(Sx & l(Ex)Px + (Ex)Qx (Ax) (Qx + Rx --) Sx) (Ex)(Px & Rx) 25. 1Sx 1 Px Pa + 1Qx -IRX -IPX + 1Qx + Qx + Rx Qb + Sx + Sx + TRX Pa -IFX --iPx 1Px 1 Px + + + + (7pts) (Ex) P.u (Ax) (Fx + (1 Gx & Rx)) (Ax)(Px + (Gx & Fx)) [(Ax)(Px + Qx) + -IGX + -IRX Fx Gx Qx + Pb (Ex)(Px & Rx)] (W(Qx 26. 1 Px + Qx + Rb 1Qx + TPX & Px) (7pts) (Ex)Px ++ (Ex) Qx & QY -+ (fi -+ Rx) +-. GW(AY)V’X [(Ax)(Px - SY)) TPX 1Qx -IPX + Qa + Pb + lQy 1Px -IPX 1 Px 1Q Qd+ Qd + + 1Qy + Rx + + Rx + + Rx + 1Qx + PC + 1Rx + Sy (Ax)<Qx+ WI + 1Sy + Rx 1Qx + Sx PC -IRC Sx 198 FRANCIS JEFFRY PELLETIER Qd iTSd TSd TSd 27. (Ax)(Zx + 1 Fa 1Ga -IFX + Hx -IJX + -IZX + Fx Hx)l (Ax) (Jx + -I Ix) 29. + Sx (6pts) (Ex)(Fx & -I Gx) (Ax)(Fx -, Hx) (Ax) (Jx & Ix -+ Fx) [(Ex)(Hx & 1 Gx) + 28. -tRc + -IQX + PC -I- -IRC 1Hx Jb Zb + Gx + -1Zy1Hy KAx)Px-+ (Ax)Qxl KW<Qx + Rx) -+ (W(Qx & WI 1Pa 1Qb + Qx + Qc [(Ex)Sx + (Ax)(Fx + Gx)] (Ax) (Px 8z Fx + Gx) 1Qb TRb TRb -ISX Pd Fd TGd + SC + Qc -I- SC + 1Fy (8pts) + Gy (7pts) (Ex)Fx & (Ex)Gx [(Ax)(Fx + Hx) & (Ax)(Gx + Jx)] KW(AY) (Fx & GY -, Hx & JY)I Fa Gb 1Fx + Hx + 1Fy + 1Gz + Hy -IFX + Hx + 1Fy + 1Gz + Jz -I Fx + Hx + Fe + Gf -IFX + Hx + Fe + 1Jf 1Fx + Hx + 1He + Gf -IFX + Hx + 1He + 1Jf -IGX + Jx + 1Fy + lGz+ Hy 1Gx + Jx + 1Fy + 1Gz + Jz -I Gx + Jx + Fe + Gj -IGX + Jx + Fe + 1Jf -IGX + Jx + iHe + Gf 1Gx + Jx + 1He + iJf Fc + 1Fx + -IGY + Hx SEVENTY-FIVE PROBLEMS FOR TESTING 199 ATP Fc + TFX + -ICY -I- Jy Fc + Fe + Gf Fe + Fe + 1 Jf Fc + 1 He + Gf Fe + THe + 1Jf Gd + 1Fx + 1Gy + Hx Gd + 1Fx + 1Gy + Jy Gd + Fe + Gf Gd + Fe + TJf Gd+ 1He + Gf Gd + 1He + TJf -IHC + TJd + 1Fx + 1Gy + Hx -IHC + 1Jd + 1Fx + 1Gy + Jy -IHC + TJd + Fe + Gf 1Hc + 1Jd + Fe + -IJ~ 1Hc + Jd + 1He + Gf 1Hc + -tJd + -tHe + iJf 30. (6pts) (Ax) (Fx + Gx + 1 Hx) (Ax)((Gx + 1Zx) + Fx & Hx) (Ax)Zx 31. (5pts) 1 (Ex)(Fx & (Gx + Hx)) (Ex)(Zx & Fx) (Ax)(l Hx + Jx) (Ex)(Zx & Jx) 32. --IFx + 1Hx Gx + Fx 1Gx + 1Hx Gx -I- Hx Ix + Fx Ix + Hx 7 la TFX + -IGX TFX + 1Hx la Fa Hx + Jx 1Zx + 1Jx (6pts) (Ax)(Fx & (Gx + Hx) -+ Ix) (Ax)(Zx & Hx + Jx) (Ax)(Kx + Hx) (Ax)(Fx & Kx -+ Jx) 1Fx + 1Fx + 1Zx + TKX + Fa Ka iJa -IGX + Ix -tHx + Ix 1Hx + Jx Hx 200 33. FRANCIS JEFFRY PELLETIER (4pts) This is a monadic predicate logic formulation (Ax)(Pa & (Px + Pb) + PC) * (Ax)((l Pa + (Px + PC)) 8~ of problem (17) above. 7 Pa + Px + PC + Py -lPC (1 Pa + (-I Pb + PC))) 1Pa TPa Pa TPe TPe 34. + Px + PC + 1Pd + TPb + PC + Pb + TPa + Pb $ -rPd + Pb + Px + PC (IOpts) Andrew’s Challenge (cf. de Champeaux (1979)). The problem is logically simple, but its size makes it difficult. (The conversion to clause form is left as an exercise for the reader, about 1600 clauses.) KW(AYW’X +-+PY> - ((WQx ++ WY) f’y)l c* * QY) ++ ++ (AY)PY)I KW(Av)(Qx @Wx Full Predicate Logic (without Identity and Functions) Once again we start with some problems to determine whether the quantifiers are being handled properly. 35. (2pts) W 36. (EY) WY --) (Ax) WY) PXY) PXY 1 Pf(x, Y) (3pts) (Ax) (EY)FXY (A-4 (EY) GXY @f(x) G&d (Ax)(A~)@‘xy + GXY -, 1 Fxy + -I Fyz + Hxz (Az)(Fyz + Gyz + Hxz)) (A-4 (EY) HXY 37. YMX? 1 Fxy + 1 Gyz + Hxz 1Gxy + 1 Fyz + Hxz -tGxy + 1Gyz + Hxz 1 Hax (3pts) (AZ)(Ew)(Ax) (EY) Wz -+ PYW) Pyz & (Pyw + (Eu)Quw)] (Ax) (AZ) [1 Pxz + (EY) QYZI (E-9(EY) QXY --) (Ax) Rxx (Ax) @Y) RXY & + Pf(x, YkW Pfk Y), x lPf(x, Y>> g(x) + Q&G Y), &> f'x, Y + Qh Y>, x lQx,y + &z 1PYX 7Ra.x SEVENTY-FIVE 38. PROBLEMS FOR TESTING 201 ATP (4pts) Here is a full predicate logic version of problems (17) and (33). Conversion to clause form left as an exercise. {(Ax)[Pa & (Px + (Ey)(Py & Rxy)) + (Ez)(Ew)(Pz 8~ Rxw & Rwz)] H (Ax)[(-I Pa + Px + (Ez) (Ew) Pz & Rxw & Rwz)) & (1 Pa + I (Ey)(Py & Rxy) + (Ez)(Ew)(Pz & Rxw & Rwz))]] Some problems in set theory can be represented in first order logic using the predicate ‘F’ to stand for ‘is and element of’. Here are some reasonably simple ones. 39. (3pts) Russell’s paradox: there is no ‘Russell set’ (a set which contains exactly those sets which are not members of themselves) Fxa + I Fxx Fix + Fxa -I 0-3 WY) (FYX t* 1 FYY> 40. I (5pts) If there were an ‘anti-Russell set’ (a set which contains exactly those sets which are members of themselves), then not every set has a complement. (Ey)(Ax)(Fxy c-t Fxx) -+ 1 (Ax)(Ey)(Az)(Fxy ++ 1 Fzx) 41. (6pts) The ‘restricted comprehension axiom’ says: given a set z, there is a set all of whose members are drawn from z and which satisfy some property. If there were a universal set then the Russell set could be formed, using this axiom. So given the appropriate instance of this axiom, there is no universal set. (AZ) (EY) (A-4 WY c* (Fxz & I 42. Fxa + Fxx 1 Fxx + Fxa 1 F.-u,f(x) + I Fyx Fyx + Fx, f(x) I 1 Fk f( Fxx)) (Ez)(Ax) Fxz + FXY ~Fx,f(y) + IFXX 1 Fxy + Fxx + Fx, f(y) Fxa (6pts) A set is ‘circular’ if it is a member of another set which in turn is a member of the original. Intuitively all sets are non-circular. Prove there is no set of noncircular sets. 1 Fxa + Fxy + -I Fyx 1 (EY) (Ax) WY t* 1 (Ez) (FXZ & Fzx)) Fxf(x) Ff(x)x 43. Y) I + Fxa + Fxa (5pts) de Champeaux (1979). Define set equality (‘Q’) as having exactly the same members. Prove set equality is symmetric. VWAY)(QXY W)@Y)(QXY - (W(Fzx ++ QYX) * Fzy)) ~Qxy + IFZX + Fzy Qxy + 1Fzy + Fzx t-l-(x, Y), x + t’f(x, Y), Y + QXY 1 ET@, Y), Y + -I F/-(x, Y), x + QXY I 202 FRANCIS JEFFRY PELLETIER Qab + Qba 1Qba + 1Qab Here are some problems taken from Kalish & Montague 44. (3pts) (AW-x + (EY)(GY & HXY) & @Y)(GY& 1 fWI (WJx 8~(AYWY+ fW1 (Ex)(Jx & -I Fx) 45. 1 Fx + Gf(x) 1 Fx 1 Fx 1 Fx Ja 1Gx -IJX + Hx, j-(x) + Gg(x) + 1 Hx, g(x) + Hax + Fx (5pts) (Ax)(Fx & (Ay)[Gy & Hxy + JXYI + (Ay)(Gy & HXY --* KY)) -I (EYWY & KY) (W W & (AYWXY -, LY) & (AY)(GY & HXY --) JxY)~ (W W & 1 (EY)(GY & Hxy)) 46. (1964) -I Fx + 1Hxy 1 Fx + 1Hxy 1Fx + 1 Hxy 1Lx+-lKx Gf(x) + -I Gy + + Ky Hx,f(x) + 1Gy + + Ky lJx,f(x) + 1Gy + + Ky Fa 1 Hax + Lx -IGX + 1Hax + Jax 1 Fx + Gg(x) -I Fx + Hx, g(x) (6pts) (Ax) (Fx dc (Ay) [Fy & Hyx -+ GYI -+ Gx) {(Ex)(Fx & 1 Gx) -, (Ex)(Fx & 1 Gx & (Ay)(Fy & 1 Fx + Ff(x) + Gx 1 Fx + Hf(x) + Gx -I Fx + lGf(x) + Gx -I GY --, Jxy)) (Ax)(Ay)(Fx & Fy & Hxy + 1 Fx + Gx + Fa 1 Jyx) (Ax)(Fx + Gx) 1Fx + Gx + iGa -I Fx + Gx + 1 Fy + Gy + Jay -IFX + 1Fy + -~Hxy + 1Jyx Fb 1Gb SEVENTY-FIVE PROBLEMS FOR TESTING 203 ATP A problem which has gotten some considerable play in the literature is ‘Schubert’s Steamroller’, after Len Schubert. (See Pelletier (1982), Walther (1985), McCune (1985), Stickel (1986)). The problem presented in English is this: 47. (10pts) Wolves, foxes, birds, caterpillars, and snails are animals, and there are some of each of them. Also there are some grains, and grains are plants. Every animal either likes to eat all plants or all animals much smaller than itself that like to eat some plants. Caterpillars and snails are much smaller than birds, which are much smaller than foxes, which in turn are much smaller than wolves. Wolves do not like to eat foxes or grains, while birds like to eat caterpillars but not snails. Caterpillars and snails like to eat some plants. Therefore there is an animal that likes to eat a grain-eating animal. The previously mentioned authors symbolize the problem (especially the conclusion) differently (see Stickel(l986) for details). In its original form it is as follows: Let PO: a is P,: a is Pz: a is P3: a is P4: a is an animal a wolf a fox a bird a caterpillar (Ax)(P,x -+ Pox) + Pox) + Pox) + Pox) -+ Pox) (Ax)(P,x (Ax)(P,x (Ax)(P,x (Ax)(P,x & & & & & P,: a is a snail QO: a is a plant Q,: a is a grain S: a is much smaller than b R: a likes to eat b (Ex)P,x (Ex) P2x (Ex) P3x (Ex)P,x (Ex) P,x (JWQ,x & (Ax)<Q,x+ Qo4 (AW’ox -+ KAY)(Q,Y + RXY) + (AYW’OY & SYX & fWtQi,z & RYZ)) 4 fiy)l) (Ax)(AYN(P,Y 8~ U’,x + PA) -. SXY) & PZY) + SXY) (W(AYW’,X & P,Y) -+ sxy) (W(AYW,X 8~ (PZY + Q,Y>> -, 1 RXYI (Ax) CAY)((P+ & P,Y) -+ RXY) (Ax) CAY)W’j x & P,Y) + 1 RXY) (A4 W’G + PA + (EY)(QoY & fiYN (W(EYW~X & Pov & W(Q, z & RYZ & RxY)) WHAYNP,X 204 FRANCIS JEFFRY PELLETIER In negated conclusion clause form, the problem becomes: lP,X + -IQ* 1Szx +lQw P,a Rxy 7P,x 1 P,x 7P,x lP,X -tP,x P,b p3c f’,d P5e QLflP,X P,x 1 P,x -lP,x 1 P,x 1 lQ,x -tP, + + + + + + lP,Z + +lRzw + + Rxz 1P,y + sxy 1 P,y + sxy 1P,y + sxy 1P,y + sxy -tPdy + Rxy 1 P4 + Qoi(x) 1 P4x + Rx, i(x) + Pox + Pox + Pox + Pox + P,x + Qox + 7P,y + ~Rxy 1 P, + Q&4 1 P,x + Rx, j(x) lP,x + lP,y + 1Rxy lP,x + lQ,y + 1Rxy lP,X + 1Poy + lP,Z + 1 Ryz -I- Rxy Full Predicate Logic with Identity (without Functions) 48. (3pts) ‘A problem to test identity ATPs’ - Piotr Rudnicki (Computing, Alberta). a=b+c=d a=c+b=d a=d+b=c Univ. a=b+c=d a=c+b=d a#d b#c Here are some problems which straightforwardly test the identity components ATPs without putting much strain on the rest of the system. 49. (5pts) (Ex)(Ey)(Az)(z = x + z = y) Pa & Pb afb (Ax) Px 50. x=c+x=d Pa Pb a#b -7Pe (4pts) (Ax)[Fax + (Ay) Fxy] + Fax + Fxy (JW (AYV’XY 1 W-W 51. (5pts) (W (Ew)(Ax)CAY) (x = z & y = BY w)] ++ 1Fxy + x = n of SEVENTY-FIVE (W PROBLEMS (A4 y=w)-x=z] FOR NW(AY) WY TESTING 205 ATP c-) -~Fxy + y = b x#a+y#b+Fxy 1 EO-4, Y + Y = g(x) + f(x) = x -I W(x), Y + Y = g(x) + Ff(x), h(x, z) + h(x, z) = z 1 v-(43 Y + Y = g(x) + h(x, z) Z z + 1 Ff(x), h, (x, z) Y f ‘d4 + W(x), Y + Ff(x), h(x, z) + h(x, z) = z Y Z g(x) + W(x), Y + f(x) = x Y z g(x) + Ff(x), y + h 6, z> z z + 1 U-W, W, 4 f(x) z x + m4 h(x, z) + h(x, z) = z f(x) # x + h(x, z) # z + 1 U-(4, h (x, 4 52. (5pts) (W (Ew)(A-4WY) (x = z & y t EN WY) WW4 x=z)++y=w] = WY ++ ~Fxy + x = y (-, -~Fxy + y = b w)] (FXY x#y+y#b+Fxy 1 Fy,fW + Y f g(x) + S(x) = x -I Fy,f(x) + Y = g(x) + Wx, z>fW + 1 Fh(x, 4, Y f d-4 + Fy,fW + f(x) Y f g(x) + Fy,fW + Fh(x, z), f(x) + h(x,‘.z) = Y f g(x) + F~,f(x) + h(x, z + 1 Fh(x, z), f(x) f(x) # x + Fh(x, 4, f(x) + h(x,z)=z f(x) # x + h(x, z) # z -t 1 f’h 6, zlS(4 53. (7pts) [Test your converter-to-clause-form: (Ex)(Ey)[x z = Y)l # y & (Az)(z = x + about 146 clauses.] f(x) = x z 4 Z 206 FRANCIS JEFFRY PELLETIER {W (A-4 KJ+) WY)(FXYw y=w)crx=z]tt (W WY)KW (A4 (FXYw x = 2) 4-by = w]} 54. (9pts) Montague’s (1955) paradox of grounded classes. (Ay)(Ez)(Ax)(Fxz - x = y) 1 (Ew) (Ax) [Fxw - (Au) (Fxu + lf%f(.Y) + x = Y x z Y + &f(Y) 1 Fxa + 1 1 Fxa + -I Fx, h(x) + -I Fx, h(y) 55. Fxy + Fg(x, y), y Fxy + 1 Fz, g(x, y) Fxa + Fi( y, x), x + Fya (8pts) The following problem was given by Len Schubert. For ATPs with an ‘answer extraction’ mechanism, the conclusion might replaced with the query ‘who killed Aunt Agatha?’ English: Someone who lives in Dreadsbury Mansion killed Aunt Agatha. Agatha, the butler, and Charles live in Dreadsbury Mansion, and are the only people who live therein. A killer always hates his victim, and is never richer than his victim. Charles hates no one that Aunt Agatha hates. Agatha hates everyone except the butler. The butler hates everyone not richer than Aunt Agatha. The butler hates everyone Agatha hates. No one hates everyone. Agatha is not the butler. Therefore: Agatha killed herself. (Ex)(Lx & Kxa) La & Lb & Lc (Ax)(Lx + x = a + x = b + x = c) (AYNWK~~ (Ax) WY) WY (Ax)(Hax + (Ax)(x # b (Ax)(l Rxa (Ax)(Hax + --f Hxy) + -I RXY) -I Hex) --f Hux) + Hbx) Hbz) (A4 (EY) 1 HXY a#b Kaa The Full Predicate 56. (4pts) Logic with Identity Ld Kda La Lb LC lLx+x=a+x=b+x=c 1 Kxy + Hxy 1Kxy + ~Rxy -IHUX + 1Hcx x = b + Hax Rxa + Hbx 1 Hax + Hbx 1 Hx, f(x) a#b 1 Kaa and Arbitrary Functions SEVENTY-FIVE PROBLEMS FOR TESTING 1Fx + y #f(y) + Fy + 1 Fz + Ff(z) 1Fx + y #f(y) + Fy + Fu 1Fx + Y Z f(r) + Fy + b = f(b) 1F.x + Y #f(~) +f(~) + 11% Fc + 1 Fx -t- Ff(x) Fc + Fa Fc + b = f(b) Fc + TFb 1 Ff(c) + 1 Fx + Ff(x) 1 If(c) + Fu 1 Ff(c) + b = f(b) 1 Ff(c) + 1 Fb tAx)V+ + VW1 57. 207 ATP (2pts) Ffta, b),f@, 4 Wtb, 4, Aa, 4 (Ax)(Ay) (Az)[Fxy & Fyz + Fxz] t’f@, b),fta, 58. + 1 Fyz + Fxz b),f(a, 4 (3pts) tWtA~lfC4 (A4 59. 4 1Fxy lFf(a, (A~lftftx)) = d f(x) = g(y) A = fkt f(ft4) YN # fk(b)) (3pts) I Fx + 1 Ff (x) EK4 + F(x) 1 F(x) + t?-(x) 60. (4pts) (A4 W> f(x) c-, (EY) [(AZ) WY -+ f&f(a) + lf’by + Fy,fW k f(x)) & FXYI Fa, f (a) + Fa, b Fg(x), x + 1 Fax + 1 Fyb + FY, f (4 Fg(x), x + 1 Fax + Fab F&x), x + 1 Fax + 1 Fu, f(a) 1 Fg(x), f(a) + 1 Fax + 1 Fb + FY, J-64 1 Fg(x), f(a) + 1 Fax + Fab 1 Fg(x), f(a) + 1 Fax + 1 Fa, ft4 FRANCIS JEFFRY PELLETIER 208 Having warmed up with some straightforward some more difficult ones. 61. (6pts) (Ax) WY) (Azlf(x~ f( Y, 4) = f(f(x, YIP4 (Ax) CAY)(A4 (A4 fk f( YYfk w))) = f(fuh YL 4 w 62. examples, let’s turn our attention to f&T f( Y, z)> = f(f(x, Y) 4 f(a, f(h fk 0) Z f(f(f(G 8, 4 4 (Spts) Here is the original formulation in Bledsoe et al. (1972), p. 59 of the problem mentioned in (17), (33), and (38). [Fu & (Ax)(Fx (W[[b b J’a+ -+ F’(x))] +-+ f’a + F.x + Ff (f lFf(x) Cd 8~ + Ff(fCMl 1 Fu + Fb + Ff(f(x)) Ff(f(y)) 1Fu + Fb + Ff(f(x>) lFf(y) + Ff(f(~)) 1 Fu + Fb + Ff(f(x)) Ff (Y) -I Fa + Fb + Ff (f(x)) + Fy + + + 1Fz + + 1Ff (f (4) 1Fa + lFf(b) + Ff(f(x)) FY + Ff(f(y)) -IFQ + lFf(b) + Ff(f(x)) lFf(y) + Ff(f(y)) 1Fu + -off + Ff(f(x)) 1 Fz + Ff (4 1Fa + lFf(b) + Ff(f(x)) 1Ff (f (4) Fa -IFC + Ff(c) + IFU + Fy Ff(f(y)) TFC + Ff(c) + 1Fa + + + + + + lFf(y) + Ff(.Ry)) 1 Fc + Ff (c) + 1 Fz + Ff (z) 1Fc + Ff(c) 1 Ff (f (c)) + Ff(f(y)) lFf(f(c)) + Ff(f(y)) lFf(f(c)) + 1 Ff (f (c)) + + lFf(f(4) 1 Fu + Fy + 1Fa + lFf(y) + 1Fz + Ff(z) -I W(f (4) Here are three group theory problems, published I think, by Larry Wos (but I cannot locate where). Consider SEVENTY-FIVE (4 (b) (4 63. ;;I PROBLEMS ty) FOR GWf(ftx, YL TESTING 4 = 209 ATP f(x, f(f(x, Y)? 4 = fk (A&a, x) = x (W(E~lft Y, -4 = a (6pts) Show that (a), (b), (c) entail f@, 4 = x f(gtx), 4 = a (A-4 (AY) (A4 V-(x, Y) = ftz, Y) + x = z] f@, cl = .m f-t Y, 4) 4 b#d 64. (6pts) Show that (a), (b), (c) entail (A4 tA.14 VI Y, 4 = a + ftc, 4 = a f-(x, Y) = 4 ftb, c) # a 65. (8pts) Show that (a) and (b) entail [(A-X)/(X, x) = a --* f(X, x) = a (Ax)tWf(x, Y) = f( Y,41 S(h 4 z ftc, b) Charles Morgan (see AAR Newsletfer #3) has suggested a method of constructing difficult function-problems out of easy propositional logic ones. To do this, take a propositional logic theorem, eg (11 P -+ P), treat the propositional letters as objects which can be quantified over, encode the sentence operators (1 and -+, here) as functions, and treat ‘is a theorem’ as a (monadic) predicate ‘T’. Thus, this theorem would become (Ax) E(n(n(x)), x). Do this for every axiom of the logic under consideration, and treat the results as premises for every argument. The rules of inference of the logic are treated in this manner by saying that if the premise of the rule has ‘7” applied to it then the conclusion of the argument has ‘7” applied to it. So for the rule modus ponens: if P and (P -+ Q) are derivable, then so is Q, we would convert this to (Ax)(Ay)(Ti(x, y) & TX -+ Ty), and treat it as a premise of every argument. That this can generate quite difficult problems out of very simple ones is demonstrated by the following examples of Morgan’s. The initial propositional logic has three axioms plus modus ponens: (4 tAxK%9 TW, 4 Y, -4)) (b) (Ax) (+)(Az) T(iG(x, i( Y, z)), Wx, Y), 0, (4 (Ax) MY) Tt WW 4 Y)), it Y, 4)) (4 (Ax)ObWW, Y)) & T(x) -+ Tt Y)) 66. (7pts) From (a)-(d) prove 67. (Ax) Wtx, n(W)) (7pts) From (a)-(d) prove (Ax) WtntW)~ 4 z)))) 210 FRANCIS 68. (8pts) Replace (c) with (Ax)(Ay)T(i(i( (Ax) Wtx, ~Mx)>N JEFFRY PELLETIERE y, x), i(n(x), n(y)))) and prove Morgan’s method is completely generalizable. The examples above used as an ‘object language’ only + and 1, but we could have added on other connectives together with appropriate definitions or axioms. For example, we might add on t, together with one or more of the definitions oft, in terms of + and 1 and perhaps with new axioms for c, and perhaps with new rules of inference for CI. For instance, we might add on WWWt and/or and/or and/or and/or and/or and/or Wtx, Y)) & TM Y, 4) t* T@(x, Y))) (A-4 (AY) Wtetx, Y), Nitx, Y), n(i( Y, 4))))) GWtAMx, Y) = Wi(x, Y), NY, 4))) (Ax)(AY)UM~~ YN & T(x) + T( YN WW!Mx, Y) = 444, nt Y)) tAx)tA~Mx, Y) = ety, 4 (Ax) GWMx, Y)) = 4x, 4 Y)> and the like. With sufficient of these, you can try to prove (the translations of) arguments using +, 1, t+. They are very difficult. But the method can be generalized even further and extended to other logics which include propositional logic as a part. Let us suppose we have a characterization of the propositional logic using Morgan’s method. That is, suppose we have translated a complete set of propositional axioms and rules of inference, such as the ones mentioned before Problem 66, into the function-notation just mentioned. Now suppose we wish to consider modal propositional logics. These logics introduce one new propositional operator, L (logical necessity). The modal system T can be described as (a) propositional logic axioms (b) Modus Ponens (4 UP + 4) + (LP + Ld (4 LP + P (e) if p is a theorem, then Lp is a theorem. Morgan’s method as described shows how to get the translation of (a) and (b). Thus the function-notation version of system T would be arrived at by introducing a new function symbols for L, say ‘b,‘, and an axiomatization of system T would be (a) (b) tc) (4 te) 69. (the (the (Ax) (Ax) (Ax) translations of the propositional logic axioms) translation of Modus Ponens) WY) TW, (4x, Y)>, it& (4, h t YNN TG(b (4 4) t T(x) -, T(h (4)) (9pts) Using the (a)-(e) just given as premises, translate simple theorems of the modal system T and prove them. For example: (Ax) Wth t-4, 4 (WNN SEVENTY-FIVE 70. PROBLEMS FOR TESTING ATP 211 (open problem) The example in (69) used the modal system T, and ‘6, ’ was the function corresponding to the L of system T. Now let’s consider the modal system K, and let’s represent its L by ‘bz’. The axioms and rules of inference for K are translated as (a) (translations of the propositional logic axioms) (b) (translation of Modus ‘Ponens) (f-1 (A.4 (AA TO@, (i(x, .Y)), @, (4, bd A))> W (A-4 VW -+ T(b, (4)) In Pelletier (1985) I posed the question of whether there might not be a way to ‘define’ the b, function in terms of the other functions (including b,), and a way to ‘define’ the b, function in terms of the other functions (including b,), so that if X was a theorem which mentioned ‘b, ’ but not ‘b2’, then the result of replacing ‘b,’ (and its argument) by its ‘definition’ would result in a theorem. [And conversely for a theorem Y which mentioned ‘b2’ but not ‘b,‘.] These ‘definitions’ were also to have the following feature: letfi be the ‘definition’ of ‘b,’ in terms of ‘b,’ and f2 be the ‘definition’ of ‘bi in terms of ‘b,‘. Then (h) (A-4 Wx, 0) (A-4 T(& h (fi (4))) h (fi (4))) were also to be true. That is, the result of ‘translating’ any sentence purely of one of the sublanguages into the other sublanguage can be ‘translated’ back into the first sublanguage and the result will be provably equivalent to the original sentence. Therefore, given (a)(i) the problem is to first determine whether there are suchf’s and second find out what they are. Some Problems for Studying the Computational Complexity of ATP’s The difficulty in constructing problems for studying the complexity of the proof system of an ATP is to describe a set of problems whose complexity can independently be characterized in terms of some metric which can be varied and which does not introduce any ‘side effects’ into the resulting proofs. Various attempts to state such a set of problems have usually focussed on (a) number of clauses, (b) number of symbols, (c) number of distinct symbols. It is extremely difficult to guarantee in advance that the increase in proof size which is observed when (say) the number of clauses is increased is due solely to the increase in number of clauses, as opposed to being also influenced by some hidden increase in ‘number of tricks required’ (say). The following problem-types are designed to give a measure of complexity which does not involve any other types of difficulty as the problems get more complex. 71. (U-problems, after Alasdair Urquhart). Consider the following problems (the sub-problem of conversion to clause form is left to the reader). 212 FRANCIS JEFFRY PELLETIER The number of distinct sentence letters in U,, is n. the number of occurrences of sentence letters is 2n. The number of embedded W’S is (2n-1). The number of clauses goes up dramatically as U,, increases, but I don’t think it shows that the problems are dramatically more difficult as we go from U, to U,, say. Rather, it’s that the awkward clause form representation comes to the fore most dramatically with embedded biconditionals. On all other measure of increase of complexity from U, to U,, one should say that the problems increase linearly in difficulty. So, given that the U-series of problems increases linearly in difficulty, compare the increase of your ATP along the dimensions of CPU time, size of proof, number of program statements executed, etc. [Alasdair Urquhart informs me that the proof size of any resolution system increases exponentially with increase in n]. 72. (Pigeonhole problems - cf. Cook and Reckhow 1979). Suppose there are n holes and (n + 1) objects to put in the holes. Every object is in a hole and no hole contains more than one object. Pictorially, we can represent this (for the 4 object, 3 hole problem) as: holes A II I I B I I I I C I I I I I I I I 21 I I I 1 I II I I I I I I I I I I I I I I I I I I I I objects 31 LI -1 Each cell (i,i) says that the ith object is in thejth hole. For each cell use a different sentence letter (P, , P,, . . . ). for the 4-object, 3 hole problem we might assign the letters in this fashion: SEVENTY-FIVE PROBLEMS 213 FOR TESTING ATP holes B A I 1 I Pl I I 21 objects I I 31 PZ I I PL I p7 I I ps I I I 4 I I I pa P,, I '6 I I I P 10 I I I I I I I I 1 I I I c -I I I I I ps I P12 I I I I Ps for example says that object 3 is in hole B. Let us now state the problem: ‘Each object is in a hole’ becomes (for this example) G-4 4 + (b) P4 + (c) p, + (4 P,o + PI + p, Ps + f’6 p, + p9 P,, + f’,, ‘No hole has more than one object in it’ becomes (e) 7 P, + 7 P4 (f) 1 P, + 1 P, (g) lP, + lPl0 for hole A (h) lPc, + lP, (i) 7 P4 + i PI0 (3 1 P7 + 1 PI0 (k) -IP> + 1Ps (1) 1 P* + 1 Pg 04 lP2 + 1Pll for hole B (n) 7 P, + 7 P, (0) lP5 + lpi, (PI (4) lPt7 + lP,, 1 p3 + lP6 I (r) i P3 + 7 P9 ) (9 lP3 f lP,* (t) 1 Ps + 1 Py for hole C (4 lP6 + lP12 (VI 1 p9 + 1 PI, J The set of clauses (a)-(v) are inconsistent, as will any set be which is generated this way when the number of holes is less than the number of objects. Picking the number of objects to be one greater than the number of holes will yield the 214 FRANCIS JEFFRY PELLETIER ‘hardest’ problem (for that number of holes), so we will just consider the ‘n-hole problems’, assuming that the number of objects is (n + 1). It will be noticed that, for an n-holes problem, the number of distinct sentence letters is (n2 + n) and the number of clauses is n3 + n* + (n + 1) 2 Thus the number of sentence letters increases quadradically and the number of clauses increases cubically. In any case, it seems that the ‘difficulty’ of the n-hole problems increases polynomially with n. See if your ATP can emulate that. 73. (Predicate logic pigeon hole problems). Problem (72) represented the n-hole problems by distinct sentence letters. This problem can also be represented by selecting a predicate ‘0’ meaning that x is an object, and a predicate ‘H meaning that x is a hole. Let Zxy be a 2-place relation saying that (object) x is in hole y. The 3-hole problem then becomes “(a) (Ex)(Ey)(Ez)(Ew)[Ox & Oy & Oz & Ow & x # y & x # z & x # w & y # z&y # wc4z # WI (b) (Ex)(Ey)(Ez)[Hx&ZZy&ZZz&x # y&x # z&y # z& (Aw)(Hw --) w = x + w = y + w = z)] (4 (Ax)(Ox + @Y) WY & Zxy)) (d) (Ax(ZZy -, (Ay)(Az)(Oy & Oz & Zyx & Zzx + y = z)) These four formulas (a)-(d) are inconsistent, as are any generated in this manner. Test how your ATP increases in effort spent as the number of holes increases. 74. (Arbitrary graph problems. Due to Tseitin (1968), see Galil (1977) for an expository version. See also Urquhart (unpublished a, b). My thanks to Urquhart for explaining them to me.) Consider a graph with the edges labelled. For example e A B c 0 E Assign 0 or 1 arbitrarily to nodes of the graph. For each node of the graph, we associate a set of clauses as follows: (1) every label of an edge emanating from that node will occur in each clause (of the set of clauses generated from that node) (2) if the node is assigned 0, then the number of negated literals in each of the generated clauses is to be odd. Generate all such clauses for that node. SEVENTY-FIVE PROBLEMS FOR TESTING ATP 215 (3) if the node is assigned 1, then the number of negated literals in each of the generated clauses is to be even. Generate all such clauses for that node. Tseitin’s result is this: the sum (mod 2) of the O’s and l’s assigned to the nodes of the graph equals 1 if and only if the set of all generated clauses is inconsistent. For example, if we assign the node at the top of the above graph a 1 and all others 0, then the set of all generated clauses will be inconsistent. The clauses generated are: from top node, which was assigned 1, so the number of negated literals is even in each clause. (a) A + B (b) 7A + -TB (c)A+C+1D - (d) A+lC+D (e)iA+C+D (f)lA+lC+lD (8) B+C+lE (h) B+lC+E (i)iB+C++ i ) from left node, which was assigned 0, so the number of negated literals is odd in each clause (generate all possible such clauses) (which was assigned 0) (j)iB+lC++E (k) D + 1 E (1) 1 D + E I generated from bottom node Clauses (a)-(l) are inconsistent: prove that with your ATP. Now, we can increase the complexity of the graph in a number of different ways. Pick a ‘natural’ way and see how your ATP’s proof increases CPU or a number of resolvants generated, etc. 75. According to Alasdair Urquhart, the U-problems of problem (71) can be graphically represented by the method of problem (74). The relevant graph is where the double-circled node is assigned 1 and all others are assigned 0. Have your ATP prove this. (Number of vertical lines is number of distinct sentence letters). Acknowledgements I am grateful to a number of people. The system described in Pelletier (1982) that was used to verify which problems in this paper were harder than which others, was jointly developed by myself and Dan Wilson (now of Mirias Research of Edmonton). Further assistance on this system (1983-85) was aided by Gilles Chartrand and Randy Kopach. Len Schubert has a constant source of helpful critique, as well as the author of problems (47) and (55). David Sharp (Philosophy, University of Alberta) also 216 FRANCIS JEFFRY PELLETIER suggested which problems from Thomason (1972) students found difficult. Some of the these problems occur intermingled in the ones given under ‘identity’. Charles Morgan explained to me his method (which was described (66)), and pointed out that it could be used to solve the problems of Pelletier (1985) - see problem (70) above. Finally, and especially, I offer my thanks to Alasdair Urquhart for numerous discussions about ATP - both its theory and specific problems. It was he who interested me in ways to test the proof complexity of various ATP systems. The problems (71)475) are directly due to him. I am also grateful to him for showing me Urquart (unpublished a, b), and telling me about Tseitin (1968). Thanks also to Sven Hurum (Computing, University of Alberta) for allowing me use of his ‘convert to clause form’ program. He warns that it is not perfect in its elimination of subsumed clauses. (But it sure beats doing it by hand!) I gratefully acknowledge the assistance of the Canadian National Science and Engineering Research Council grant A5525 in the research involved in my ATP studies. References Bledsoe, W., Boyer, R., and Henneman, W. (1972) ‘Computer proofs of limit theorems’, Arfifciul Infelligence 3, 2140. Cook, S. and Reckhow, R. (1979) ‘The relative efficiency of propositional proof systems’, J. Symbolic Logic 44, 36-50. De Champeaux, D. (1979) ‘Sub-problem finder and instance checker: two cooperating preprocessors for theorem provers’ IJCAI 6, 191-196. Galil, Z. (1977) ‘On resolution with clauses of bounded size’, SIAM J. Computing 6, 444459. Kalish, D. & Montague, R. (1964) Logic: Techniques of Formal Reasoning, World, Harcourt & Brace, Lusk, E. and Overbeek, R. (1985) ‘Reasoning about equality’, J. Aufomafed Reasoning 1, 209-228. McCharen, R., Overbeek, R., & Wos, L. (1976) ‘Problems and experiments for and with automated theorem-proving programs’, IEEE Trans. Compufers C-25(8), 773-782. McCune, W. (1985) ‘Schubert’s Steamroller Problem with linked UR-resolution’ Assoc. Aufomafed Reasoning Newsleffer 4, 4-6. Montague, R. (1955) ‘On the paradox of grounded classes’ J. Symbolic Logic 20, 140. Newell, A., Shaw, J., and Simon, H. (1957) ‘Empirical explorations with the logic theory machine: a case study in heuristics’. Reprinted in E. Feigenbaum and J. Feldman (eds.) Computers and Thoughr (McGraw-Hill, N.Y.) pp. 279-293 (1963). Newell, A. and Simon, H. (1972) Human Problem Solving, (Prentice-Hall, Englewood Cliffs). Pelletier, F. J. (1982) ‘Completely nonclausal, completely heuristically driven automatic theorem proving’, Technical Report TR82-7, (Dept. of Computing Science, Edmonton, Alberta). Pelletier, F,. J. (1985) ‘Six problems in translational equivalence’, Logique et Analyse 108, 423434. Siklossy, L., Rich, A., and Marinov, V. (1973) ‘Breadth first search: some surprising results’, Artificial Intelligence 4, l-27. Stickel, M. (1986) ‘Schubert’s steamroller problem: formulations and solutions’, J. Automated Reasoning 2, 89-104. Thomason, R. (1972) Symbolic Logic (Prentice-Hall, Englewood Cliffs). Tseitin, G. S. (1968) ‘On the complexity of derivation in propositional calculus’, reprinted in J. Siekmann and G. Wrightson (eds.) Aufomafion of Reasoning (Springer-Verlag, Berlin). Urquhart, A. (unpublished a) ‘The complexity of Genzen systems for propositional logic‘. Urquhart, A. (unpublished b) ‘Hard examples for resolution’. Walther, C. (1985) ‘A mechanical solution of Schubert’s steamroller by many-sorted resolution’, Artificial Intelligence 26, 217-224. Whitehead, A. and Russell, B. (1910) Principia Mafhemafica, Vol. I. (Cambridge UP, Cambridge).