Functional Type Theory

CHAPTER TWO: FUNCTIONAL TYPE THEORY
2.1. Compositionality
We have given a compositional semantics for predicate logic, a semantics in which
the meaning of any complex expression is a function of the meanings of its parts and
the way these meanings are put together. In predicate logic, complex expressions are
formulas, and their parts are formulas, terms (constants or variables), n-place
predicates, and the logical constants.
As is well known to anybody who has gone through lists of exercises of the form
'translate into predicate logic', predicate logic does a much better job of representing
the meanings of various types of natural language sentences than it does of
representing the meanings of their parts.
Take for example the following sentence:
(1) Some old man and every boy kissed and hugged Mary.
With the one-place predicate constants MAN, OLD, BOY and the two-place predicate
constants KISSED and HUGGED, we can quite adequately represent (1) in predicate
logic as (1'):
(1') ∃x[MAN(x) ∧ OLD(x) ∧ KISSED(x,m) ∧ HUGGED(x,m)] ∧
∀y[BOY(y) → [KISSED(y,m) ∧ HUGGED(y,m)]]
The problem is that without an explicit syntax and a compositional semantics,
representing (1) as (1') is a completely creative exercise, and it shouldn't be.
What we really need is a semantics that gives us the right semantic representation for
sentence (1) (and hence the right meaning), and that tells us at the same time how this
semantic representation is related to the meanings of the parts of (1).
Our representation (1') has various parts (like ∧ and →) that don't correspond to any part of
the sentence; some parts of the representation occur twice in different forms, while in
the sentence there is only one part (kissed and hugged Mary); the representation
contains sentence conjunction ∧, whereas the sentence contains NP conjunction and
verb conjunction; OLD is separated from MAN in the representation, which is
motivated by what the meaning should be, but nothing explains how it got separated,
etc. What we really need is a logical language in which we can represent the meaning
of sentence (1) and the meanings of its parts:
OLD/ MAN/ SOME/ OLD MAN/ SOME OLD MAN/ EVERY/ BOY/ EVERY BOY/
AND/ SOME OLD MAN AND EVERY BOY/ KISSED/ HUGGED/ AND/ KISSED
AND HUGGED/ MARY/ KISSED AND HUGGED MARY/ SOME OLD MAN
AND EVERY BOY KISSED AND HUGGED MARY
Similar arguments apply to the structure of noun phrases. We represent sentences (2)
and (3) in predicate logic as (2') and (3'):
(2) Every boy walks.
(2') ∀x[BOY(x) → WALK(x)]
(3) Some boy walks.
(3') ∃x[BOY(x) ∧ WALK(x)]
In both these representations, the connective → or ∧ is put in to get the meaning of
the sentence right, but there isn't a theory of where this comes from. And this shows
up when we start to think about other determiners:
(4) Most boys walk.
(4') Mx[BOY(x) ? WALK(x)]
Even if we were to add to predicate logic a new quantifier Mx and try to get its
semantics right, we couldn't get to the right meaning, because we can prove that not
only are both the truth functions → and ∧ inadequate (whatever definition of Mx we
manage to provide), but much stronger, there is provably no truth function (and hence
no connective) that could provide the right meaning here. The meaning of most falls
outside the range of a logical theory of quantification with the power of predicate
logic.
This means that we have to study the meanings of determiners - of which every and
some are only two instances - much more systematically. And when we do that, we
see that there are all sorts of semantic inference patterns that we find across classes of
different determiners: certain determiners have, by their meaning, properties in
common.
For instance, we find the following entailment patterns:
In the examples in (6) we find an entailment from (6a) with walk to (6c) with move,
while in the examples in (7) we find the inverse entailment from (7a) with move to
(7c) with walk.
(6a) Every/some/at least three/more than half of the boys walk.
(6b) Whoever walks, moves.
(6c) Hence, every/some/at least three/more than half of the boys move.

(7a) No/not every/at most three/less than half of the boys move.
(7b) Whoever walks, moves.
(7c) Hence, no/not every/at most three/less than half of the boys walk.
These inferences are obviously related to the meanings of the determiners involved,
and in order to explain them, we need a theory of determiner meanings.
We are thus faced with the general problem of formulating a semantic theory in which
we can assign meanings to expressions of all sorts of linguistic categories, and in
which we can formulate semantic operations on such meanings, which will produce
the correct sentence meanings, predicting the correct inference patterns.
In order to get the general idea behind such a theory, let me digress once more a bit on
predicate logic. I claimed that we had successfully given a compositional semantics
for predicate logic. Yet, when we look at the semantics more carefully, we see that
there are certain parts of expressions of predicate logic that I have not specified an
interpretation for. In particular, I have not specified an interpretation for the logical
constants. That is, I have not specified ⟦¬⟧M,g, ⟦∧⟧M,g, ⟦∃x⟧M,g, etc.
While I have indeed not specified such interpretations, the semantics is nevertheless a
fully specified compositional semantics, because for each of the logical constants, I
have completely and explicitly specified its semantic contribution.
For instance, the clause in the truth definition for ¬ is the following:
⟦¬φ⟧M,g = 1 iff ⟦φ⟧M,g = 0; 0 otherwise.
This tells us that the semantic contribution of adding a negation to a sentence φ - any
sentence φ - is to reverse the truth value, i.e. it takes the truth value of φ in M relative
to g, and it specifies the opposite one as the truth value for ¬φ in M relative to g.
This doesn't explicitly tell us what the interpretation of ¬ should be, but it does tell us
this implicitly:
The interpretation of ¬ in M relative to g is that entity that does the
above, i.e. that entity that has that semantic effect.
This entity takes a truth value (the truth value of the complement) as input, and
produces a truth value as output (the truth value of the whole sentence). Given that
there are assumed to be only two truth values 0 and 1, and that the semantic clauses
are formulated in such a way that ¬ has a definite semantic effect for every possible
input, there is only one entity that the interpretation of ¬ can be:
For every model M and assignment g: ⟦¬⟧M,g = {<0,1>,<1,0>}
Negation is interpreted as the one-place truth function which maps 0 onto 1 and 1 onto
0. A similar argument shows that ∧ is interpreted as the two-place truth function
{<<1,1>,1>,<<1,0>,0>,<<0,1>,0>,<<0,0>,0>}.
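As a quick illustration (a toy sketch in Python, not part of the formal apparatus of the chapter), these two truth functions can be written out as finite maps; the dictionary entries below are exactly the ordered pairs just listed.

# The interpretation of negation: the one-place truth function {<0,1>,<1,0>}.
NEG = {0: 1, 1: 0}

# The interpretation of conjunction: the two-place truth function
# {<<1,1>,1>, <<1,0>,0>, <<0,1>,0>, <<0,0>,0>}.
CONJ = {(1, 1): 1, (1, 0): 0, (0, 1): 0, (0, 0): 0}

assert NEG[CONJ[(1, 0)]] == 1   # the negation of (1 and 0) is 1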
When we look at the semantics of the quantifiers, we see that in a fully explicit
compositional semantics for predicate logic, where indeed every part is assigned an
interpretation, we have to make all our interpretations a bit more complicated. I will
not go into that here (see, e.g. Landman 1991).
The point of the comparison is the following: we have natural interpretations for
certain of the parts of our natural language expressions. We need to find
interpretations for the other ones. Our strategy will typically be to first look and
determine the semantic contribution of these expressions (as we did for negation).
And then we look for the entity which has that semantic contribution.
For instance, suppose we have decided that the noun MAN is, as before, a one-place
predicate constant, interpreted as a one-place property. Now we are looking for the
interpretation of the adjective old. We observe that there is good reason to assume that
old man is, like man, a noun (but a complex noun), and that it is a one-place predicate
just like man. Let us introduce an expression OLDn into the logical language, which
will be the representation of the prenominal adjective old. Let us add a rule to the
logical language which says:
If P is a one-place predicate then OLDn(P) is a one-place predicate.
Now let us assume that we have another one-place predicate OLDp, which we do not
treat as the representation of the pre-nominal adjective old, but rather as the
representation of the predicative adjective. Then it is quite clear what the semantic
contribution of the prenominal adjective old, and hence of the expression OLDn, is.
We can formulate the following semantic rule:
⟦OLDn(P)⟧M,g = ⟦OLDp⟧M,g ∩ ⟦P⟧M,g
This semantic rule completely specifies the semantics of the expression OLDn in a
functional way: it tells you what output it produces (⟦OLDp⟧M,g ∩ ⟦P⟧M,g), when you
specify any possible input (⟦P⟧M,g). The task of finding an interpretation for OLDn,
i.e. of defining ⟦OLDn⟧M,g, then comes down to finding the entity that does exactly
that. The entity that does this is the function which maps any set onto the intersection
of that set and ⟦OLDp⟧M,g. This function is henceforth the interpretation of OLDn,
and hence we need to have in our models domains that contain such functions.
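To make the 'find the entity that does exactly that' step concrete, here is a small sketch (Python, with toy sets of my own choosing, not from the text): predicates are modelled as sets of individuals, and the interpretation of OLDn is the function mapping any set onto its intersection with the interpretation of OLDp.

# Toy one-place predicates, modelled as sets of individuals (illustrative only).
OLD_P = {"john", "bill"}          # interpretation of the predicative adjective OLDp
MAN   = {"john", "bill", "fred"}  # interpretation of the noun MAN

# The interpretation of the prenominal adjective OLDn: the function that maps
# any set P onto the intersection of P with the interpretation of OLDp.
def OLD_N(P):
    return OLD_P & P

print(OLD_N(MAN))   # {'john', 'bill'}: the old men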
To give another example, let us look at (2):
(2) Every boy walks.
Let us assume, as before, that BOY and WALK are one-place predicates, and let us
assume that the meaning of the sentence is adequately represented by the predicate
logical formula (2').
(2') ∀x[BOY(x) → WALK(x)]
We are interested in the meaning of every and the meaning of the noun phrase every
boy. Let us for that reason add an expression EVERY to the language and a rule that
forms a term out of this expression and predicates that are used to represent nouns:
If Q is a one-place predicate then EVERY(Q) is a term.
The terms that we had before (constants and variables) were interpreted as
individuals. We cannot possibly do this for terms of the form EVERY(BOY),
because it is intuitively clear that whatever such a term stands for, it isn't an
individual. Yet, EVERY(BOY) has to combine with the verbal predicate WALK to
give a sentence.
Here we apply the basic trick of the functional theory. If EVERY(BOY) cannot be
the argument of WALK (because that would make its interpretation an individual), we
can assume that WALK is in fact the argument of EVERY(BOY). We add a rule for
predicates that represent verbs:
If P is a one-place predicate then EVERY(Q) (P) is a formula.
If we assume that the predicates P and Q are interpreted just as usual, then we know
what the semantics of the sentence EVERY(Q) (P) should be like:
⟦EVERY(Q)(P)⟧M,g = 1 iff for every d ∈ ⟦Q⟧M,g: d ∈ ⟦P⟧M,g; 0 otherwise
This tells us again, just as before, implicitly what the semantic contribution of the term
EVERY(Q) is: the interpretation of EVERY(Q) takes a set as argument (⟦P⟧M,g) and
gives as result truth value 1 iff every d ∈ ⟦Q⟧M,g is in that set (⟦P⟧M,g).
This means that ⟦EVERY(Q)⟧M,g is the function that does exactly that. And there is
indeed a unique function f that does precisely that for any set. Once we know what
the interpretation of EVERY(Q) is, we can apply the same procedure to find the
interpretation of EVERY.
The semantic effect of EVERY is completely determined by the semantics for
EVERY(Q)(P). We know that EVERY combines with Q to form the term
EVERY(Q). We know the interpretation of Q: ⟦Q⟧M,g; we know the interpretation of
EVERY(Q): ⟦EVERY(Q)⟧M,g = f, the function that we have just determined.
This means that the semantic effect of the interpretation of EVERY itself is to take a
set X (X = ⟦Q⟧M,g) as input, and give as output the corresponding function fX (the
function f determined by that X). Hence, ⟦EVERY⟧M,g is the unique function h that
takes any set X as input and gives the corresponding function fX as output.
We see here that we need rather complicated domains in our models to find the right
entities for the interpretation of our expressions: a determiner like EVERY is
interpreted as a function which maps sets onto functions from sets into truth values.
While the function h which we have assigned as the interpretation of every adequately
captures the meaning of every, this is not visually perspicuous; in fact, it is hard to see what it
does at all. This is where the logical language comes in. We will in the next chapter
add to our logical language the lambda-operator, which is an operator which allows us
to represent functions in our logical language. With that operator, we will represent
the meaning of every perspicuously as λQλP.∀x[Q(x) → P(x)]. We will show that
this expression is indeed interpreted precisely as the function that I did my best to
describe above, and we will discuss some properties of the logical language which
will make it very easy to see that this is indeed an adequate representation of the
meaning of every. Moreover, this semantics will be seen to extend effortlessly to
other determiners, which are not part of predicate logic.
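As a foretaste of how this pays off for determiners beyond every and some, here is a toy sketch (Python, with predicates modelled as sets of my own choosing; the 'more than half' analysis of most is a standard generalized-quantifier assumption, not something established in this chapter): both determiners are functions from sets to functions from sets to truth values.

# Determiner meanings as functions from sets to functions from sets to truth values.
def EVERY(Q):
    return lambda P: 1 if all(d in P for d in Q) else 0

def MOST(Q):
    # 'more than half of the Q are P' - one standard generalized-quantifier analysis.
    return lambda P: 1 if 2 * len(Q & P) > len(Q) else 0

BOY  = {"john", "bill", "fred"}
WALK = {"john", "bill"}

print(EVERY(BOY)(WALK), MOST(BOY)(WALK))   # 0 1: not every boy walks, but most boys do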
It turns out that we can fruitfully apply this procedure of finding functions from the
semantic clauses to all parts of natural language that we are interested in. That is,
along the above method we can find adequate representations of the meanings of all
parts of natural language expressions we may want to interpret, in a theory which has
enough domains of different kinds of functions, and which is based on the two basic
principles of function-argument application (functional application) and function
definition (lambda abstraction). Such a theory we will now formulate.
2.2. The syntax of functional type logic.
We will start by specifying the language of functional type logic and its semantics.
For the sake of compactness, I will give the whole language, including the lambda-operator
and its semantics, but we will ignore that part till later.
The language of functional type logic gives us a rich theory of functions and
arguments. Our theory of functions is typed. This means that for each function in the
theory it is specified what its domain and range is, and functions are restricted to their
domain and range. This means that the theory of functions is a theory of extensional
functions, in the set-theoretic sense of the word:
a function f from set A into set B, f: A → B, is a set of ordered pairs in A × B
such that: 1. for every a ∈ A: there is a b ∈ B such that <a,b> ∈ f
2. for every a ∈ A, b,b' ∈ B: if <a,b> ∈ f and <a,b'> ∈ f then b = b'
The domain of f, dom(f) = {a ∈ A: ∃b ∈ B: <a,b> ∈ f}
The range of f, ran(f) = {b ∈ B: ∃a ∈ A: <a,b> ∈ f}
When we talk about, say, the function + of addition on the natural numbers as being
the same function as addition on the real numbers, we are using an intensional notion
of function: extensionally +: N × N → N and +: R × R → R are different functions,
because their domain and range are different and because they are different sets of
ordered pairs. They are the same function, intensionally, in that these functions have
the same relevant mathematical properties. Our theory of functions will be a theory
of extensional functions, and that means that each function is extensionally specified
as a set of ordered pairs with a fixed domain and fixed range.
This means that functions can only apply to arguments which are in their domain.
Partly as a means to avoid the set-theoretic paradoxes, we carry this over into our
logical language, which contains expressions denoting these functions. We assume
that each functional expression comes with an indication of its type, which in essence
is an indication of what the domain and range of its interpretation will be. The syntax
of our language will only allow a functional expression f of a certain type a to apply
to an argument of the type of expressions that denote entities in the domain of the
interpretation of f.
This is the basic idea of the functional language:
-each expression is of a certain type and will be interpreted into a certain semantic
domain corresponding to that type.
-a functional expression of a certain type can only apply to expressions which denote
entities in the domain of the function which is the interpretation of that functional
expression, and hence are of the type of the input for that function.
-The result of applying a functional expression to an expression of the correct input
type is an expression which is of the output type, and hence is interpreted as an entity
within the range of the interpretation of that functional expression.
We make this precise by defining the language of functional type logic TL.
We first define all the possible types that we will have expressions of in the language:
Let e and t be two symbols.
TYPETL is the smallest set such that:
1. e,t  TYPETL
2. if a,b  TYPETL then <a,b>  TYPETL
e and t are our basic types. They are the only types in the theory that will not be types
of functions.
-e (entity) is the type of expressions that denote individuals.
Thus e is the type of terms in predicate logic.
-t (truth value) is the type of expressions that denote truth values.
Thus t is the type of formulas in predicate logic.
-For any types a and b, <a,b> is the type of expressions that denote functions from
entities denoted by expressions of type a into entities denoted by expressions of type
b.
In short, <a,b> is the type of expressions that denote functions mapping a-entities onto
b-entities.
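The recursive definition of TYPETL can be mirrored directly in a few lines of code (a sketch; the string encoding of types is my own choice, not the text's):

# Types are encoded as strings: 'e', 't', or '<a,b>' for types a and b.
def pair(a, b):
    """Form the functional type <a,b> from types a and b."""
    return f"<{a},{b}>"

ET  = pair("e", "t")      # '<e,t>'    : one-place predicates
NP  = pair(ET, "t")       # '<<e,t>,t>': noun phrase interpretations
DET = pair(ET, NP)        # '<<e,t>,<<e,t>,t>>': determiners
print(DET)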
This definition of types gives us a rich set of types of expressions:
Since e and t are types, <e,t> is a type: the type of functional expressions denoting
functions that map individuals (the denotations of expressions of type e) onto truth
values (the denotations of expressions of type t, formulas). As we will see, this is the
type of properties (or sets) of individuals.
Assuming <e,t> to be the type of property-denoting expressions, then <<e,t>,<e,t>> is
the type of expressions mapping properties onto properties: the corresponding
functions take a function from individuals to truth values as input and deliver
similarly a function from individuals into truth values as output. As we will see, this
is just the type for expressions like prenominal adjectives.
Since <e,t> is a type and t is a type, <<e,t>,t> is a type as well. This is the type of
expressions that denote functions that themselves take a function from individuals
into truth values as input and deliver a truth value as output. As we will see, this is
the right type for interpreting noun phrases like every boy.
Since <e,t> is a type and <<e,t>,t> is a type, also <<e,t>,<<e,t>,t>> is a type: the
type of expressions that denote functions which take functions from individuals into
truth values as input, and give noun phrase interpretations as output. This is the type
of determiners.
We really have a rich theory of types. Many types are included in the theory that we
may not have any use for, like the type <<t,e>,<t,e>>: the type of expressions that
denote functions which map functions from truth values into individuals onto
functions from truth values into individuals. It is unlikely that we can argue plausibly
that there are natural language expressions that will be interpreted as functions of this
type. To have a general and simple definition of the types and of our language, we will
nevertheless include these in our logical language (they just won't be used in giving
interpretations to natural language expressions).
We will very soon give some ways of thinking about the types described above in a
more intuitive or simple way. The above examples were only meant to show how the
definition of types works.
Based on this definition, we now define the logical language TL. The language TL
has the following logical constants: ¬, ∧, ∨, →, ∀, ∃, =, (, ), λ
As we can see, for ease we take over all the logical constants of predicate logic and
we add another symbol, λ, the λ-operator.
I will from now on in the definitions leave out subscript TL, indicating that a set is
particular to the language. I take that to be understood.
Constants and variables:
For each type a  TYPE, we specify a set CONa of non-logical constants of type a
and a set VARa of variables of type a:
As before, we can think about the non-logical constants of type a as the lexical items
whose semantics we do not further specify.
For every a  TYPE: CONa = {ca1,ca2,...}
a set of constants of type a (at most countably many)
For every a e TYPE: VARa = {xa1,xa2,...}
a set of variables of type a (countably many)
We will never write these type-indices explicitly in the formulas, rather we will make
typographical conventions about the types of our expressions.
For instance, I will typically use the following conventions:
c,j,m are constants of type e
x,y,z are variables of type e
φ,ψ are expressions of type t
P,Q are variables of type <e,t>
But generally I will indicate the typographical conventions where needed.
So we have as many constants as we want for each type (that is up to our choice), and
we have countably many variables of each type.
With this given we will specify the syntax of the language. We do that by specifying
for each type a  TYPE the set EXPa, the set of well formed expressions of type a.
For each a  TYPE, EXPa is the smallest set such that:
1. Constants and variables:
CONa  VARa  EXPa
Constants and variables of type a are expressions of type a.
2. Functional application:
If α  EXP<a,b> and β  EXPa then (α(β))  EXPb
3. Functional abstraction:
If x  VARa and β  EXPb then λxβ  EXP<a,b>
4. Connectives:
If φ,ψ  EXPt then ¬φ, (φ  ψ), (φ  ψ), (φ  ψ)  EXPt
5. Quantifiers:
If x  VARa and φ  EXPt then xφ, xφ  EXPt
6. Identity:
If α  EXPa and β  EXPa then (α=β)  EXPt
As said before, we will ignore clause 3 of this definition for the moment. Focusing
on clauses 4-6, we see that type logic contains predicate logic as a part. Clause 4,
concerning the connectives, is just the same clause that we had in predicate logic
(because FORM = EXPt).
Clause 5 for quantifiers looks just the same as the corresponding clause in predicate
logic, but note that in predicate logic we could only quantify over individual variables
(that is, variables of type e), while in type logic, we allow quantification over variables
of any type: in type logic we have higher-order quantification.
Similarly, while in predicate logic we can form identity statements between terms
(where TERM = EXPe), in type logic we can form identity statements between any
two expressions of any type, as long as they have the same type. The resulting
identity statement is a formula, an expression of type t.
As an aside: in predicate logic, we only need the logical constants ¬ and ∧, predication
(P(t1,...,tn)), one quantifier ∃, and identity, and we can define all the other
logical constants. In type logic we only need application (α(β)), abstraction λ, and
identity =, and we can define in type logic all the connectives and quantifiers we
want. This means that clauses 4 and 5 are really only there for convenience.
Let us now focus on clause 2, the clause for functional application.
To see how this works let us specify some constants. Let us assume that we have the
following constants:
MARY  CONe
e is the type of individuals.
MAN, BOY  CON<e,t>
<e,t> is the type of one-place predicates of individuals.
KISSED, HUGGED  CON<e,<e,t>>
<e,<e,t>> is the type of two-place relations between individuals.
SOME, EVERY  CON<<e,t>,<<e,t>,t>>
<<e,t>,<<e,t>,t>> is the type of determiners.
OLD  CON<<e,t>,<e,t>>
<<e,t>,<e,t>> is the type of one-place predicate modifiers: these are functions that
take a property and yield a property.
AND1  CON<<<e,t>,t>,<<<e,t>,t>,<<e,t>,t>>>
<<e,t>,t> is the type of noun phrase meanings.
AND1 takes a noun phrase meaning, and another noun phrase meaning and produces a
noun phrase meaning.
AND2  CON<<e,<e,t>>,< <e,<e,t>>,<e,<e,t>>>>
Here AND is conjunction of two-place relations, it takes two two place relations, and
produces a two place relation.
Let us now do some functional applications:
HUGGED ∈ CON<e,<e,t>>, a two-place predicate, hence (by clause 1):
HUGGED ∈ EXP<e,<e,t>>
Similarly AND2 ∈ EXP<<e,<e,t>>,<<e,<e,t>>,<e,<e,t>>>>
We see that AND2 is of type <a,b> with a = <e,<e,t>> and b = <<e,<e,t>>,<e,<e,t>>>,
and HUGGED ∈ EXP<e,<e,t>> is of type a.
Hence (AND2(HUGGED)) ∈ EXP<<e,<e,t>>,<e,<e,t>>>, an expression of type b.
Next: (AND2(HUGGED)) is of type <a,b> with a = <e,<e,t>> and b = <e,<e,t>>,
and KISSED ∈ EXP<e,<e,t>> is of type a.
Hence ((AND2(HUGGED))(KISSED)) ∈ EXP<e,<e,t>>, an expression of type b.
We see that ((AND2(HUGGED))(KISSED)) is itself a two-place predicate. This
makes it very suitable to serve as a representation for kissed and hugged (of course,
we still would need to give AND2 the correct interpretation).
((AND2(HUGGED))(KISSED)) ∈ EXP<e,<e,t>> is of type <a,b> with a = e and b = <e,t>,
and MARY ∈ EXPe is of type a.
Hence (((AND2(HUGGED))(KISSED))(MARY)) ∈ EXP<e,t>, an expression of type b.
(((AND2(HUGGED))(KISSED))(MARY)) is a one-place predicate; again, that is
what we would expect kissed and hugged Mary to be.
OLD  EXP<<e,t>,<e,t>>
MAN  EXP<e,t>
Hence (OLD(MAN))  EXP<e,t>
SOME  EXP<<e,t>,<<e,t>,t>>
(OLD(MAN))  EXP<e,t>
Hence (SOME((OLD(MAN))))  EXP<<e,t>,t>
EVERY  EXP<<e,t>,<<e,t>,t>>>
BOY  EXP<e,t>
Hence (EVERY(BOY))  EXP<<e,t>,t>
AND1  EXP<<<e,t>,t>,<<<e,t>,t>,<<e,t>,t>>>
(EVERY(BOY))  EXP<<e,t>,t>
Hence (AND1((EVERY(BOY))))  EXP<<<e,t>,t>,<<e,t>,t>>
(AND1((EVERY(BOY))))  EXP<<<e,t>,t>,<<e,t>,t>>
(SOME((OLD(MAN))))  EXP<<e,t>,t>
Hence ((AND1((EVERY(BOY))))((SOME((OLD(MAN))))))  EXP<<e,t>,t>
Assuming that <<e,t>,t> is the correct type for noun phrase interpretations, we have
see that all of (SOME((OLD(MAN)))) (some old man), (EVERY(BOY)) (every boy)
and
((AND1((EVERY(BOY))))((SOME((OLD(MAN)))))) (some old man and every boy)
are of that type.
Finally:
((AND1((EVERY(BOY))))((SOME((OLD(MAN))))))  EXP<<e,t>,t>
(((AND2(HUGGED))(KISSED))(MARY))  EXP<e,t>
Hence
(((AND1((EVERY(BOY))))((SOME((OLD(MAN))))))
((((AND2(HUGGED))(KISSED))(MARY))))  EXPt
We can make this more perspicuous by introducing a notation convention:
Notation convention: Infix notation:
Let R be an expression of type <a,<a,c>>, and α and β expressions of type a. Then
((R(α))(β)) is an expression of type c.
We define: (β R α) := ((R(α))(β))
:= means 'is by definition'. Thus (β R α) is notation that we add to type theory: it is
not defined as part of the syntax of the language, but is just a way in which we make
our expressions more readable.
With infix notation, we can write the above expression with infix notation applied to
both AND1 and to AND2 as:
(((SOME((OLD(MAN)))) AND1 (EVERY(BOY))) ((KISSED AND2 HUGGED)(MARY)))
This is a well-formed expression of type logic of type t, a sentence, and it can be used
as the representation of sentence (1). Of course, we haven't yet interpreted the
expressions occurring in this expression, but we can note that, unlike the predicate
logical representation (1'), in the above expression we can find a well-formed sub-expression
corresponding to each of the parts of sentence (1).
(1) Some old man and every boy kissed and hugged Mary.
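The hand derivations in this section can also be reproduced mechanically. Here is a minimal sketch (Python; the encoding of types as nested pairs and the function names are my own) that assigns the types listed above to the constants and computes the type of any functional application, rejecting type mismatches:

# Types encoded as nested pairs: 'e', 't', or (a, b) for <a,b>.
E, T = "e", "t"
ET = (E, T)
TV = (E, ET)                      # <e,<e,t>>: transitive verbs
NP = (ET, T)                      # <<e,t>,t>: noun phrases
DET = (ET, NP)                    # <<e,t>,<<e,t>,t>>: determiners

TYPE = {
    "MARY": E, "MAN": ET, "BOY": ET,
    "KISSED": TV, "HUGGED": TV,
    "SOME": DET, "EVERY": DET,
    "OLD": (ET, ET),
    "AND1": (NP, (NP, NP)),
    "AND2": (TV, (TV, TV)),
}

def apply_type(fun, arg):
    """Type of (fun(arg)): defined only if fun is of a type <a,b> and arg is of type a."""
    if not isinstance(fun, tuple):
        raise TypeError(f"{fun} is not a functional type")
    a, b = fun
    if arg != a:
        raise TypeError(f"cannot apply an expression of type {fun} to one of type {arg}")
    return b

# ((AND2(HUGGED))(KISSED))(MARY) : <e,t>
vp = apply_type(apply_type(apply_type(TYPE["AND2"], TYPE["HUGGED"]),
                           TYPE["KISSED"]), TYPE["MARY"])
# (AND1(EVERY(BOY)))(SOME(OLD(MAN))) : <<e,t>,t>
np = apply_type(apply_type(TYPE["AND1"], apply_type(TYPE["EVERY"], TYPE["BOY"])),
                apply_type(TYPE["SOME"], apply_type(TYPE["OLD"], TYPE["MAN"])))
print(apply_type(np, vp))   # 't': the whole sentence is of type t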
2.3. The semantics of functional type logic.
Let us now move to the semantics of functional type logic.
As before, we start with specifying the models for our type logical language TL. I
will follow the practice here to include in the definition of the models only those
aspects that vary from model to model.
A model for functional type logic is a pair:
M = <D,F>, where:
1. D, the domain of individuals, is a non-empty set.
2. F is the interpretation function for the non-logical constants.
This looks just like predicate logic. However, since we possibly have many more
types of constants, it is in the specification of the interpretation function F that
differences will come in.
In the model we only specify the domain D, the domain of individuals. This is obviously
not the only domain in which expressions are interpreted. We don't put the other
domains in the definition of the model, because, once D is specified, all other
domains are predictable and are given in a domain definition:
Let M = <D,F> be a model for TL. For any type a ∈ TYPETL, we define Da, the
domain of type a:
1. De = D
2. Dt = {0,1}
3. D<a,b> = (Da → Db)
So, the domain of type e, De is the domain of individuals given in the model M.
The domain of type t, Dt, is the set of truth values {0,1}.
For any sets A,B: (A → B) = {f: f is a function from A into B}
Thus (A → B) is the set of all functions with domain A and range included in B.
Hence D<a,b> is the set of all functions with domain Da - the set of entities in the
domain of a - and with their range included in Db - the set of entities in the domain of
b. In other words, D<a,b> is the set of all functions from a-entities into b-entities.
Thus, D<e,t> = (De → Dt) = (D → {0,1})
The set of all functions from individuals into truth values.
D<e,<e,t>> = (De → (De → Dt)) = (D → (D → {0,1}))
The set of all functions that map each individual into a function from individuals into
truth values.
D<<e,t>,t> = ((De → Dt) → Dt) = ((D → {0,1}) → {0,1})
The set of all functions that map functions from individuals into truth values onto
truth values.
D<<e,t>,<<e,t>,t>> = ((De → Dt) → ((De → Dt) → Dt)) =
((D → {0,1}) → ((D → {0,1}) → {0,1}))
The set of all functions that map functions from individuals into truth values onto
functions that map functions from individuals into truth values onto truth values.
D<<e,t>,<e,t>> = ((De → Dt) → (De → Dt)) =
((D → {0,1}) → (D → {0,1}))
The set of all functions that map each function from individuals into truth values onto
a function from individuals into truth values.
D<t,t> = (Dt → Dt) = ({0,1} → {0,1})
The set of all functions from truth values into truth values (= the set of all one-place
truth functions).
D<t,<t,t>> = (Dt → (Dt → Dt)) = ({0,1} → ({0,1} → {0,1}))
The set of all functions from truth values into one-place truth functions (=, as we will
see, the set of all two-place truth functions).
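For a finite domain of individuals these domains can be computed; the following small sketch (Python, with my own encoding of types as nested pairs) just computes their sizes, which already shows how quickly they grow:

def dom_size(ty, n):
    """Size of D_a for a finite domain of individuals of size n.
    Types are 'e', 't', or a pair (a, b) for <a,b>."""
    if ty == "e":
        return n
    if ty == "t":
        return 2
    a, b = ty
    return dom_size(b, n) ** dom_size(a, n)   # |D_<a,b>| = |D_b| ** |D_a|

# With two individuals: |D_<e,t>| = 4, |D_<<e,t>,t>| = 16, |D_<<e,t>,<<e,t>,t>>| = 65536.
print(dom_size(("e", "t"), 2), dom_size((("e", "t"), "t"), 2),
      dom_size((("e", "t"), (("e", "t"), "t")), 2))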
The domain definition specifies a semantic domain for every possible type in TYPE.
With that, we can specify the interpretation function F for the non-logical constants of
the model:
A model for functional type logic TL is a pair M = <D,F>,
where D is a non-empty set and:
for every a  TYPETL: F:CONa  Da
For every type a, F maps any constant c  CONa onto an entity F(c)  Da. So F
interprets a non-logical constant of type a as an entity in the domain of a.
Let M = <D,F> be a model for TL.
An assignment function (on M) is any function g such that:
for every a  TYPETL for every x  VARa: g(x)  Da
Thus an assignment function maps every variable of every type into an entity of the
domain corresponding to that type.
Hence, g maps variable x ∈ VARe onto an individual g(x) in D, it maps predicate
variable P ∈ VAR<e,t> onto some function g(P) in (D → {0,1}), etc.
We can now give the semantics for our functional type logical language, by
specifying, as before, ⟦α⟧M,g, the interpretation of expression α in model M, relative to
assignment function g:
The semantic interpretation: ⟦α⟧M,g
For any a,b ∈ TYPETL:
1. Constants and variables.
If c ∈ CONa, then ⟦c⟧M,g = F(c)
If x ∈ VARa, then ⟦x⟧M,g = g(x)
2. Functional application.
If α ∈ EXP<a,b> and β ∈ EXPa then:
⟦(α(β))⟧M,g = ⟦α⟧M,g(⟦β⟧M,g)
3. Functional abstraction.
If x ∈ VARa and β ∈ EXPb then:
⟦λxβ⟧M,g = h,
where h is the unique function in (Da → Db) such that:
for every d ∈ Da: h(d) = ⟦β⟧M,g[x/d]
(here g[x/d] is the assignment that is exactly like g, except that it assigns d to x)
4. Connectives.
If φ,ψ ∈ EXPt, then:
⟦¬φ⟧M,g = ¬(⟦φ⟧M,g)
⟦(φ ∧ ψ)⟧M,g = ∧(<⟦φ⟧M,g,⟦ψ⟧M,g>)
⟦(φ ∨ ψ)⟧M,g = ∨(<⟦φ⟧M,g,⟦ψ⟧M,g>)
⟦(φ → ψ)⟧M,g = →(<⟦φ⟧M,g,⟦ψ⟧M,g>)
Here on the right hand side we find the following truth functions:
¬: {0,1} → {0,1}; ∧, ∨, → are functions from {0,1} × {0,1} into {0,1}, given by:
¬ = {<0,1>,<1,0>}
∧ = {<<1,1>,1>,<<1,0>,0>,<<0,1>,0>,<<0,0>,0>}
∨ = {<<1,1>,1>,<<1,0>,1>,<<0,1>,1>,<<0,0>,0>}
→ = {<<1,1>,1>,<<1,0>,0>,<<0,1>,1>,<<0,0>,1>}
5. Quantifiers.
If x ∈ VARa and φ ∈ EXPt then:
⟦∀xφ⟧M,g = 1 iff for every d ∈ Da: ⟦φ⟧M,g[x/d] = 1; 0 otherwise
⟦∃xφ⟧M,g = 1 iff for some d ∈ Da: ⟦φ⟧M,g[x/d] = 1; 0 otherwise
6. Identity.
If α, β ∈ EXPa then:
⟦(α=β)⟧M,g = 1 iff ⟦α⟧M,g = ⟦β⟧M,g
A sentence is a formula without free variables.
Let X be a set of sentences and φ a sentence.
⟦φ⟧M = 1, φ is true in M, iff for every g: ⟦φ⟧M,g = 1
⟦φ⟧M = 0, φ is false in M, iff for every g: ⟦φ⟧M,g = 0
X ╞ φ, X entails φ, iff for every model M:
if for every δ ∈ X: ⟦δ⟧M = 1 then ⟦φ⟧M = 1
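To see clauses 1 and 2 at work, here is a toy model sketch (Python; the particular model, and the encoding of one-place predicates as characteristic functions, are illustrative assumptions, not the text's):

D = {"john", "bill", "mary"}                       # the domain of individuals

# F interprets constants; one-place predicates are characteristic functions in (D -> {0,1}).
F = {
    "BOY":  lambda d: 1 if d in {"john", "bill"} else 0,
    "WALK": lambda d: 1 if d in {"john", "bill", "mary"} else 0,
    # EVERY: a function in (D_<e,t> -> (D_<e,t> -> {0,1})).
    "EVERY": lambda q: lambda p: 1 if all(p(d) == 1 for d in D if q(d) == 1) else 0,
}

def interpret(expr):
    """Clause 1 (constants) and clause 2 (functional application), nothing more."""
    if isinstance(expr, str):
        return F[expr]
    fun, arg = expr
    return interpret(fun)(interpret(arg))

# The interpretation of (EVERY(BOY))(WALK) in this model: 1
print(interpret((("EVERY", "BOY"), "WALK")))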
2.4. Boolean algebras.
Take the set {a,b,c}. Its powerset pow({a,b,c}) is
{ Ø, {a}, {b}, {c}, {a,b}, {a,c}, {b,c}, {a,b,c} }
The following picture shows how these sets are related to each other by the subset
relation ⊆:
[Hasse diagram of pow({a,b,c}): {a,b,c} at the top; {a,b}, {a,c}, {b,c} in the middle;
{a}, {b}, {c} below them; Ø at the bottom]
The line going up from {a} to {a,b} in the picture means that {a} is a subset of {a,b}.
The lines are understood as transitive: there is also a line going up from {a} to
{a,b,c}, and, though we don't draw it, we take it to be understood that the order is
reflexive: every element has an (invisible) line going to itself.
It can easily be checked that the set pow({a,b,c}) is closed under the standard set-theoretic
operations of complementation −, union ∪, and intersection ∩,
meaning: for every X,Y ∈ pow({a,b,c}): −X, X∪Y, X∩Y ∈ pow({a,b,c}).
The structure <pow({a,b,c}), ⊆, −, ∩, ∪, Ø, {a,b,c}> is a Boolean algebra.
A powerset Boolean algebra is a structure <pow(A), ⊆, −, ∩, ∪, Ø, A> where:
A is a set, ⊆ is the subset relation on pow(A), and −, ∩, ∪ are the standard set-theoretic
operations on pow(A).
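A quick sketch (Python; frozensets stand in for the subsets, an encoding of my own) that builds pow({a,b,c}) and checks the closure claim just made:

from itertools import combinations

A = frozenset({"a", "b", "c"})
POW = {frozenset(s) for r in range(len(A) + 1) for s in combinations(A, r)}

# Closure under complementation, union and intersection.
for X in POW:
    assert A - X in POW
    for Y in POW:
        assert X | Y in POW and X & Y in POW

print(len(POW))   # 8 elements, ordered by the subset relation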
In general, Boolean algebras are structures that model what could be called
naturalistic typed extensional part-of structures.
Think of me and my parts. Different things can be part of me in different senses. For
instance, I can see myself as the whole of my body parts, but also as the whole of my
achievements, and in many other ways.
-Typing restricts the notion of part-of to one of those senses. Thus, the domain in
which I am the whole of my body parts will not include my achievements as parts of
me, even though, in another sense, these are part of me.
-Extensionality means that in the domain chosen I am not more and not less than the
whole of my parts. This means that if I choose the domain in which I am the whole of
my body parts, then I cannot express in that domain that really and truly I am more
than the sum of my parts. If I want to express the latter, I must do that by relating
the whole of my body parts in the body part domain to me as an entity in a different
domain in which I cannot be divided into body parts. This is how intensionality
comes in: I can be at the same time indivisible me and the sum of my body parts,
because I occur in different domains (or, alternatively, there is a cross-domain identity
relation which identifies objects in different domains as me).
-Naturalistic means that each typed extensional part-of notion is naturalistic: in that
structure parts work the way parts naturalistically work.
The claim is, then, that Boolean algebras formalize this notion.
I define a successive series of notions:
A partial order is a pair B = <B,⊑>, where B is a non-empty set and ⊑, the part-of
relation, is a reflexive, transitive, antisymmetric relation on B.
Reflexive: for every b ∈ B: b ⊑ b
Transitive: for every a,b,c ∈ B: if a ⊑ b and b ⊑ c then a ⊑ c
Antisymmetric: for every a,b ∈ B: if a ⊑ b and b ⊑ a then a = b
In a partial order we define the notions of join, ⊔, and meet, ⊓. In a structure of parts,
we think of the join as the sum of the parts, and of the meet as their overlap.
Let B = <B,⊑> be a partial order and a,b ∈ B:
The join of a and b, (a ⊔ b), is the unique element of B such that:
1. a ⊑ (a ⊔ b) and b ⊑ (a ⊔ b).
2. For every c ∈ B: if a ⊑ c and b ⊑ c then (a ⊔ b) ⊑ c.
The meet of a and b, (a ⊓ b), is the unique element of B such that:
1. (a ⊓ b) ⊑ a and (a ⊓ b) ⊑ b.
2. For every c ∈ B: if c ⊑ a and c ⊑ b then c ⊑ (a ⊓ b)
The idea is:
-To find the join of a and b, you look up in the part-of structure at the set of all
elements of which both a and b are part: {c ∈ B: a ⊑ c and b ⊑ c}. The join of a and
b is the minimal element in that set.
-To find the meet of a and b, you look down in the part-of structure at the set of all
elements which are part both of a and of b: {c ∈ B: c ⊑ a and c ⊑ b}. The meet of a
and b is the maximal element in that set.
Elements need not have a join or a meet in a partial order. A partial order in which
any two elements do have both a join and a meet is called a lattice:
A lattice is a partial order B = <B,⊑> such that:
for every a,b ∈ B: (a ⊔ b), (a ⊓ b) ∈ B.
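The 'look up / look down' recipe can be turned directly into a small procedure (a sketch in Python; the poset is handed in as an explicit part-of test, an encoding of my own): the join of a and b is the minimum of their common upper bounds, if there is one, and the meet is the maximum of their common lower bounds.

def join(B, leq, a, b):
    """Least upper bound of a and b in the poset (B, leq), or None if there is none."""
    uppers = [c for c in B if leq(a, c) and leq(b, c)]
    for c in uppers:
        if all(leq(c, d) for d in uppers):
            return c
    return None

def meet(B, leq, a, b):
    """Greatest lower bound of a and b, or None if there is none."""
    lowers = [c for c in B if leq(c, a) and leq(c, b)]
    for c in lowers:
        if all(leq(d, c) for d in lowers):
            return c
    return None

# The powerset of {1,2} ordered by inclusion: join is union, meet is intersection.
B = [frozenset(), frozenset({1}), frozenset({2}), frozenset({1, 2})]
print(join(B, frozenset.issubset, frozenset({1}), frozenset({2})))   # frozenset({1, 2})
print(meet(B, frozenset.issubset, frozenset({1}), frozenset({2})))   # frozenset()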
Not lattices:
[Diagrams of three partial orders I, II and III that are not lattices]
In I, a and b do not have a join. In II, a and b do not have a meet.
In III, the set {c ∈ B: a ⊑ c and b ⊑ c} = {c,d,1}. This set doesn't have a minimum,
hence a and b don't have a join. Similarly, c and d don't have a meet.
Lattices:
[Diagrams of four lattices IV, V, VI and VII]
A partial order B = <B,⊑> is bounded iff B has a maximum 1 and a minimum 0 in
B.
1 is a maximum in B iff 1 ∈ B and for every b ∈ B: b ⊑ 1
0 is a minimum in B iff 0 ∈ B and for every b ∈ B: 0 ⊑ b
Obviously, then, a bounded lattice is a lattice which is bounded.
Fact: Every finite lattice is bounded.
Proof sketch: If B = {b1,...,bn} then 1 = (b1 ⊔ (b2 ⊔ (... ⊔ (bn-1 ⊔ bn)...))) and
0 = (b1 ⊓ (b2 ⊓ (... ⊓ (bn-1 ⊓ bn)...)))
Not every lattice is bounded, though. If we take Z, the integers, ..., -1, 0, 1, ..., with the normal partial
order ≤ and we define: (n ⊓ m) = the smallest one of n and m
(n ⊔ m) = the biggest one of n and m
then we get a lattice, but not one that has a minimum or a maximum.
0 is part of everything. Conceptually this means that 0 doesn't count as a naturalistic
part. Nevertheless, it is very convenient to admit 0 to the structure, because it
simplifies proofs of structural insights immensely. But indeed we do not count 0 as a
naturalistic part, because we define:
a and b have no part in common iff (a ⊓ b) = 0
We assume as minimal constraints on naturalistic extensional typed part-of structures
that they be lattices with a minimum 0. And this means that we assume that in a part-of
structure we can always carve out the sum of two parts; of overlapping parts, we
can carve out the part in which they overlap; and for non-overlapping parts we
conveniently set their overlap to 0.
Now we come to the naturalistic constraints. They concern what happens to the parts
of an object a when we cut it. When I say cut it, I mean cut it. In a naturalistic part-of
structure, cutting should be real cutting, the way my daughter does it. It turns out
that real cutting imposes requirements on part-of structures that not all part-of
structures allowed so far can live up to. If in those structures cutting cannot mean real
cutting, they cannot be regarded as naturalistic structures.
Naturalistic constraint on cutting 1:
Suppose we have an object d and a part c of d:
[Diagram: object d with part c inside it]
Now suppose we cut through d:
[Diagram: d cut into two halves a and b]
This means that d = (a ⊔ b) and (a ⊓ b) = 0.
If we cut in this way, where is c going to sit?
And the intuitive answer is obvious: either c is part of a, and c goes to the a-side, or
c is part of b, and c goes to the b-side, or c overlaps both a and b and hence c itself
gets cut into a part a1 of a and a part b1 of b, and this means that c = (a1 ⊔ b1) and
(a1 ⊓ b1) = 0.
Structures that don't satisfy this naturalistic cutting intuition miss something crucial
about what parts are and how you cut. We define:
Lattice B = <B,⊑> is distributive iff for every a,b,c ∈ B:
If c ⊑ (a ⊔ b) then either c ⊑ a or c ⊑ b
or for some a1 ⊑ a and some b1 ⊑ b: c = (a1 ⊔ b1)
Lattices IV and V are not distributive. IV is called the diamond, V is called the
pentagon.
The diamond is not distributive:
[Diagram of the diamond: 0 at the bottom; a, b, c incomparable in the middle; 1 at the top]
1 = (a ⊔ b), but c is not part of a, nor of b, nor do a and b have parts a1 and b1
respectively, such that c is the sum of a1 and b1.
The pentagon is not distributive:
[Diagram of the pentagon: 0 at the bottom, 1 at the top, the chain b ⊑ c on one side and a on the other]
1 = (a ⊔ b), but c is not part of a, nor of b, and not the sum of any part of a and any
part of b.
There is a very insightful theorem about distributivity, which says that in essence, the
pentagon and the diamond are the only possible sources of non-distributivity:
Theorem: A lattice is distributive iff you cannot find either the pentagon or the
diamond as a substructure.
Standardly, distributivity is defined by one of the following two alternative
definitions:
(1) Lattice B = <B,⊑> is distributive iff
for all a,b,c ∈ B: (a ⊓ (b ⊔ c)) = ((a ⊓ b) ⊔ (a ⊓ c))
(2) Lattice B = <B,⊑> is distributive iff
for all a,b,c ∈ B: (a ⊔ (b ⊓ c)) = ((a ⊔ b) ⊓ (a ⊔ c))
I gave the above definition because its conceptual naturalness is clearer. But it is not
hard to prove that either definition (1) or (2) is equivalent to the one I gave.
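Definitions (1) and (2) are easy to test mechanically on small lattices. A sketch (Python; the diamond is encoded by listing its part-of pairs, following the labelling used above):

def leq_from_pairs(pairs):
    return lambda x, y: (x, y) in pairs or x == y

def lub(B, leq, a, b):
    uppers = [c for c in B if leq(a, c) and leq(b, c)]
    return next(c for c in uppers if all(leq(c, d) for d in uppers))

def glb(B, leq, a, b):
    lowers = [c for c in B if leq(c, a) and leq(c, b)]
    return next(c for c in lowers if all(leq(d, c) for d in lowers))

def distributive(B, leq):
    # Definition (1): a meet (b join c) = (a meet b) join (a meet c), for all triples.
    return all(glb(B, leq, a, lub(B, leq, b, c)) ==
               lub(B, leq, glb(B, leq, a, b), glb(B, leq, a, c))
               for a in B for b in B for c in B)

# The diamond: 0 below a, b, c, which are incomparable and all below 1.
DIAMOND = ["0", "a", "b", "c", "1"]
leq_d = leq_from_pairs({("0", x) for x in DIAMOND} | {(x, "1") for x in DIAMOND})
print(distributive(DIAMOND, leq_d))   # False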
We come to the second constraint on cutting:
Naturalistic constraint on parts and cutting 2:
We go back to d and c:
[Diagram: object d with part c inside it]
The second naturalistic intuition is very simple. It is that c can indeed be cut out of d,
and that cutting it out means what it intuitively does: when you cut c out of d, you
remove from the partial order all parts of d that have a part in common with c. This
means that you will be left only with parts of d that didn't have a part in common with
c. The naturalistic constraint is that what you are left with is a part e of d such that
(c ⊔ e) = d and (c ⊓ e) = 0, and its parts.
What this says is that the way my daughter cuts is like this:
[Diagram: d cut exactly into c and e]
The constraint, thus, regulates in cutting the relation between what you cut out and
what is left. What is left, when we cut out c, we call the complement of c.
We define:
Bounded lattice B = <B,⊑> is complemented iff
for every b ∈ B: there is a c ∈ B such that (b ⊔ c) = 1 and (b ⊓ c) = 0.
In non-distributive bounded lattices elements can have more than one complement.
For instance, in the diamond, both a and b are complements of c.
In distributive bounded lattices each element has at most one complement. Thus
each element has 0 or 1 complement. This means that in a complemented distributive
lattice each element has a unique complement:
B = <B,⊑> is a Boolean lattice iff B is a complemented distributive lattice.
There are two ways in which a bounded distributive lattice can fail to be a Boolean
lattice.
1. The structure, so to speak, doesn't have enough elements.
Example: Let A be an infinite set and let FIN(A) = {X ⊆ A: X is finite}, the set of all
finite subsets of A. Let FIN+(A) = FIN(A) ∪ {A}. Then <FIN+(A),⊆> is a bounded
distributive lattice. It is a lattice because the intersection and union of finite sets is
again a finite set, it has minimum Ø and maximum A, and it satisfies distributivity
because ∩ and ∪ are distributive. But the lattice is not complemented: for instance, if
a ∈ A, then {a} ∈ FIN+(A), and the elements of FIN+(A) that don't overlap {a} are
the sets X with a ∉ X. But the only candidate complement for {a}, the set A−{a}, is an
infinite proper subset of A, and hence not in FIN+(A).
2. The structure, so to speak, has too many elements.
This is more serious.
Another example of a bounded distributive lattice which is not complemented is
structure VI:
[Diagram of structure VI]
Look at c. If we cut out all the elements that have a part in common with c, we are
left with e and its parts {0, f, e}, the set of elements that have no part in common with
c. But something got mysteriously lost here, because (c ⊔ e) ≠ d.
We can formulate the problem with this structure in a different way.
Let B = <B,⊑> be a bounded distributive lattice.
B is witnessed iff the proper part relation is witnessed in B.
The proper part relation is witnessed in B = <B,⊑> iff
for every a,b ∈ B: if a ≠ 0 and a ⊑ b and a ≠ b (a is a proper part of b, but not
0) then there is a c ∈ B such that c ≠ 0 and c ⊑ b (and c ≠ b) and (a ⊓ c) = 0.
This means that if a is a proper part of b, and you cut a out naturalistically, with its
parts, there should be something left that didn't overlap with a.
Intuitively this means that you can only be a proper part of b if there exists another
proper part of b to witness your properness.
This is such a natural constraint on cutting that structures that don't satisfy it seem
indeed quite unnatural. Like structure VI: element c is a proper part of element a, but
a has no other proper parts (except the irrelevant 0). In VI, the only complemented
elements are the four corner pieces: 0 and 1 are each other's complements, and d and e
are too. None of the other elements has a complement.
The naturalistic constraints on cutting tell us that we want to restrict naturalistic
structures minimally to witnessed distributive lattices.
This means that we exclude structures like VI. This still leaves in unnaturalistic
structures like FIN+(A).
To clarify the relation between witnessed distributive lattices and Boolean lattices we
generalize the binary notions of join and meet:
Let B = <B,⊑> be a partial order and X ⊆ B.
The join of X in B is the unique element ⊔X of B such that:
1. for every x ∈ X: x ⊑ ⊔X
2. for every c ∈ B: if for every x ∈ X: x ⊑ c, then ⊔X ⊑ c
The meet of X in B is the unique element ⊓X of B such that:
1. for every x ∈ X: ⊓X ⊑ x
2. for every c ∈ B: if for every x ∈ X: c ⊑ x, then c ⊑ ⊓X
These notions generalize binary (and hence finite) join and meet, because we can
define the binary notions (a ⊔ b) as ⊔{a,b} and (a ⊓ b) as ⊓{a,b}.
With the generalized notion of join, we can formulate the following fact:
Fact: Let B = <B,⊑> be a witnessed distributive lattice.
B is a Boolean lattice iff for every c ∈ B: ⊔{b ∈ B: (b ⊓ c) = 0} ∈ B
Thus, in a Boolean lattice, ⊔{b ∈ B: (b ⊓ c) = 0} is the complement of c; we call it
¬c: the unique element of B such that (c ⊔ ¬c) = 1 and (c ⊓ ¬c) = 0.
We see that the notion of complementation can require certain infinite joins (and
certain infinite meets) to exist in a structure. This does not require all infinite joins
and all infinite meets to exist. A structure with the latter requirement we call
complete:
Lattice B = <B,⊑> is complete iff for every X ⊆ B: ⊔X, ⊓X ∈ B.
In fact, you get complete lattices with much weaker requirements:
Theorem: Let B = <B,⊑> be a partial order in which every subset X has a join ⊔X. Then B is a
complete bounded lattice.
Proof:
-Let B be such a partial order. Take any set X ⊆ B. Take {c ∈ B: for every x ∈ X: c ⊑ x}.
This is itself a subset of B, hence it has a join in B: ⊔{c ∈ B: for every x ∈ X: c ⊑ x} ∈ B.
But, of course, that is exactly ⊓X, hence ⊓X ∈ B. So B is a complete lattice.
-B ⊆ B, hence ⊔B ∈ B. Obviously, ⊔B is a maximum in B.
-Ø ⊆ B, hence ⊔Ø ∈ B. With a bit of precision you can show that ⊔Ø is a minimum in B.
Facts: Every finite lattice is complete.
Every powerset lattice is complete.
We have already indicated above why the first fact holds: for finite lattices ⊔X and
⊓X can be defined in terms of the two-place join and meet.
A powerset lattice is complete because its domain is the set of all subsets of some set
A, and for any X ⊆ pow(A), ∪X and ∩X are subsets of A, hence in pow(A).
Not every lattice is complete. FIN+(A), for one thing, is not. Not even every Boolean lattice is complete.
Let A be an infinite set.
FIN(A) is the set of all finite subsets of A.
COFIN(A) is the set of all subsets whose set-theoretic complement in A is finite:
COFIN(A) = {X ⊆ A: A−X is finite}
(So if A = N, the set of natural numbers, N−{1,2,3} is cofinite, but E, the set of even numbers, is not.)
Now look at <FIN(A) ∪ COFIN(A), ⊆>. This is a bounded distributive lattice with minimum Ø and
maximum A, and it is complemented as well:
for every X ∈ FIN(A) ∪ COFIN(A): ¬X = A−X.
Now, if X ⊆ A is finite, then A−X is cofinite, and if X ⊆ A is cofinite, A−X is finite. This means that
FIN(A) ∪ COFIN(A) is a Boolean lattice.
But it is not complete, for the same reason as before: if A = N, then
{{n}: n is even} ⊆ FIN(A) ∪ COFIN(A). But ∪{{n}: n is even} (= E) is neither finite nor cofinite,
so this subset has no join in the lattice.
While the theory of Boolean lattices doesn't make completeness a defining
characteristic of the structures, it seems to me that inasmuch as we want as
naturalistic structures complemented distributive lattices rather than merely witnessed
distributive lattices, we want as naturalistic structures complete Boolean lattices,
rather than just Boolean lattices. It is easy to see that:
Fact: Let B = <B,⊑> be a complete lattice.
B is a Boolean lattice iff B is a witnessed distributive lattice.
This means that if we take our part-of structures to be partial orders closed under
generalized ⊔, we only need to add the two naturalistic constraints of distributivity
and witnessing to get naturalistic part-of structures.
For the purposes of semantics we always assume our structures to be complete
structures.
I introduce one more important notion:
Let B = <B,⊑> be a partial order with minimum 0 and let a ∈ B.
a is an atom in B iff a ≠ 0 and for every b ∈ B:
if b ⊑ a then either b = 0 or b = a
a is an atom in B if there is nothing in between 0 and a.
Let B = <B,⊑> be a partial order with minimum 0.
B is atomic iff for every b ∈ B−{0}: there is an atom a ∈ B such that a ⊑ b.
Thus, in an atomic partial order, every element has an atomic part. De facto this
means that when you divide something into smaller and smaller parts, you always end
up with atoms.
Facts: Every finite lattice is atomic.
Every powerset lattice is atomic.
That every finite lattice is atomic is obvious when you think about it: in order for an
element not to have any atomic parts, you need to find smaller and smaller parts all
the way down. That implies infinity.
In the powerset lattice <pow(A),> the singleton sets are the atoms.
Not every infinite lattice is atomic. In fact, infinite lattices, and for that matter,
infinite Boolean lattices exist that are atomless, that have no atoms at all. Even
stronger, complete Boolean lattices exist that are atomless.
We have defined Boolean lattices. We now define Boolean algebras:
A Boolean algebra is a structure B = <B, ⊑, ¬, ⊓, ⊔, 0, 1> such that:
1. <B,⊑> is a Boolean lattice.
2. ¬: B → B is the operation which maps every b ∈ B onto its complement ¬b.
3. ⊓: (B×B) → B is the operation which maps every a,b ∈ B onto their meet (a ⊓ b).
4. ⊔: (B×B) → B is the operation which maps every a,b ∈ B onto their join (a ⊔ b).
5. 0 ∈ B is the minimum of B and 1 ∈ B is the maximum of B.
Let A = <A, ⊑A, ¬A, ⊓A, ⊔A, 0A, 1A> and B = <B, ⊑B, ¬B, ⊓B, ⊔B, 0B, 1B> be Boolean
algebras.
A and B are isomorphic iff there is an isomorphism from A into B.
An isomorphism from A into B is a function h: A → B such that:
1. h is a bijection, a one-one function from A onto B.
2. h preserves the structure from A onto B:
a. for every a ∈ A: h(¬Aa) = ¬Bh(a)
b. for every a1,a2 ∈ A: h((a1 ⊓A a2)) = (h(a1) ⊓B h(a2))
c. for every a1,a2 ∈ A: h((a1 ⊔A a2)) = (h(a1) ⊔B h(a2))
d. h(1A) = 1B
e. h(0A) = 0B
Fact: If h is an isomorphism from A into B then the inverse function h-1 is an
isomorphism from B into A.
What does the notion of isomorphism tell us? Suppose that there is an isomorphism
h between A and an unknown structure B:
[Diagram: the Boolean algebra A, with elements 0, a, b, c, ¬a, ¬b, ¬c, 1, and an unknown
structure B, marked '?', related by h]
The fact that h is a bijection tells us that B has as many elements as A:
[Diagram: A and B side by side; B is now shown with eight elements, as yet unordered]
The fact that h maps 1 onto 1 and 0 onto 0 tells us that B has a maximum 1 (say j)
and a minimum 0 (say k):
[Diagram: A and B side by side; j and k in B are identified as B's maximum and minimum]
The fact that for each element x ∈ A: h(¬x) = ¬h(x) tells us that also in B every
element has a complement.
[Diagram: A and B side by side; B's six remaining elements form three complement pairs
d/¬d, e/¬e, f/¬f]
Now we start to construct. Map a and b and c onto three of these elements.
-They can't map onto 0 and 1, because we have a bijection.
-Also, we cannot map, say, a onto d and b onto ¬d. The reason is the following:
a ⊔ b = ¬c. If h(a) = d and h(b) = ¬d, then h(a) ⊔ h(b) = d ⊔ ¬d = 1.
But h(a) ⊔ h(b) = h(a ⊔ b) = h(¬c). So, then h(¬c) = 1. But then h is not a bijection.
So if, say, we set h(a) = d, we cannot choose ¬d as the value for either b or c. So we
set, say, h(b) = e. Then we cannot set the value for c to ¬e. So let's say we set h(c) = f.
Since a ⊓ b = 0, we know that h(a) ⊓ h(b) = h(a ⊓ b) = h(0) = 0. Hence d ⊓ e = 0.
The same argument shows that d ⊓ f = 0 and that e ⊓ f = 0. So we get:
[Diagram: A and B side by side; h maps a to d, b to e and c to f, and d, e, f have pairwise meet 0]
Now h(a) = h(a) = d
h(b) = h(b) = e
h(c) = h(c) = f
Since a t a = 1, we get:
h(a) t h(b) = h(a t b) = h(1) = 1.
Hence d t e = 1:
The same argument shows that d t f = 1 and e t f = 1
So we get:
[Diagram: A and B side by side; the complements ¬d, ¬e, ¬f are now placed above d, e, f in B]
Now a t b = c. Hence h(a tb) = h(c). Hence h(a) t h(b) = h(c).
Hence d t e = f. We determine in the same way that d t f = e and e t f = d and
we get:
[Diagram: the completed structure B, isomorphic to A via h]
We see that isomorphic Boolean algebras have the same Boolean structure. They at
most differ in the naming of the elements, but not in the Boolean relations between
the elements. The set theory that the theory of Boolean algebras is formulated in is
extensional. This means that A and B are not literally the same Boolean algebra.
But mathematicians (and we) think of Boolean algebras as more intensional entities.
With the mathematicians we will regard A and B as the same Boolean algebra.
In general, we do not distinguish between isomorphic structures: if A happens to be
an algebra of sets (like a powerset) and B an algebra of, say, functions, they are the
same structure if they are isomorphic. And that means that we are free to think of
them as either sets or functions, whichever is more convenient. As we will see, this
double nature is one of the basic tenets of the functional type theory we have
introduced.
Secondly, I have related A and B through isomorphism h such that h(a)=d and h(b)=e
and h(c)=f. But h is not the only isomorphism between A and B. If I set, as for h,
h'(a)=d, but I set h'(b)=f and h'(c)=e I get another isomorphism between A and B
(by working out the other values):
[Diagram: A and B again, now related by the isomorphism h', with e and f exchanged in the picture of B]
I use the pictures to indicate here which element from A maps onto which element of
B. While this B-picture differs from the previous B-picture (e in the middle versus f
in the middle), the difference lies only in the pictures, not in structure: in pictures of
lattices only the vertical ordering is real (since it corresponds to the order ⊑); the left-right
order is only artistic license. If I check which elements are part of which
elements in both B-pictures, I get twice the same literal structure, B.
The fact that there is usually more than one isomorphism between two structures
means that we can let convenience determine which isomorphism we use when we
think about these structures as the same thing. This is a second important tenet of the
functional type theory, as we will see.
I will now discuss some theorems and facts that will give us a lot of insight in the
structure of Boolean algebras.
Representation Theorem:
Every complete atomic Boolean algebra is isomorphic to a powerset Boolean
algebra.
Proof sketch:
Let B be a complete atomic Boolean algebra and let ATOMB be the set of atoms in B.
For every b ∈ B let ATb = {a ∈ ATOMB: a ⊑ b}
So ATb is the set of atoms below b, the set of b's atomic parts.
Define a function h: B → pow(ATOMB) by:
for every b ∈ B: h(b) = ATb.
What you can prove is that h is an isomorphism between B and
<pow(ATOMB), ⊆, −, ∩, ∪, Ø, ATOMB>.
So every complete atomic Boolean algebra is isomorphic to the powerset Boolean algebra of its
set of atoms.
This is not very difficult to prove, if you have first proved the following basic fact about complete
atomic Boolean algebras:
Fact: In a complete atomic Boolean algebra every element is the join of its atomic parts:
for every b ∈ B: b = ⊔ATb.
The representation theorem tells us that the complete atomic Boolean algebras are just
the powerset Boolean algebras. Since we have already seen that every finite lattice is
complete and atomic, and hence so is every finite Boolean algebra, it follows that:
Corollaries: The finite Boolean algebras are exactly the finite powerset Boolean
algebras.
More generally:
For each cardinality n there is (up to isomorphism) exactly one complete
atomic Boolean algebra of cardinality 2^n. There are no complete atomic
Boolean algebras of other cardinalities.
This follows from the Representation Theorem, Cantor's Theorem, which says that if A has cardinality
n, then pow(A) has cardinality 2^n, and an easy theorem which says
that if A and B have the same cardinality, <pow(A),⊆> and <pow(B),⊆> are isomorphic.
I will next discuss a theorem that gives a very deep insight into the structure of Boolean
algebras.
The decomposition theorem:
Every Boolean algebra which has an atom can be decomposed into two non-overlapping
isomorphic Boolean algebras.
Proof sketch:
Let B be a Boolean algebra and a an atom in B. Look at the following sets:
Ba = {b ∈ B: a ⊑ b}, the set of all elements that a is part of.
B¬a = {b ∈ B: b ⊑ ¬a}, the set of all parts of ¬a.
It is not difficult to prove that: Ba ∪ B¬a = B and Ba ∩ B¬a = Ø. This means that Ba and B¬a partition
B.
Now, you can restrict the operations of B to Ba and the result is a Boolean algebra Ba with a as 0.
Similarly, B¬a forms a Boolean algebra B¬a with ¬a as 1.
And you can prove that Ba and B¬a are isomorphic.
Example: pick atom a in A. Look at Ba and Ba:
[Diagram: the lattice A, with the elements that a is part of (Ba) and the parts of ¬a (B¬a) marked.]
You see the two Boolean algebras sitting in the structure (each isomorphic to
pow({a,b})).
It doesn't matter which atom you use here. In fact, there is a further fact:
Fact: If B is a Boolean algebra and a and b are atoms in B then Ba and Bb are
isomorphic.
That means that in a given Boolean algebra, if you stand at any of its atoms and you
look up, the sky looks the same: you see the same structure (up to isomorphism).
So, we also get:
[Diagram: the lattice A twice over, once with the elements above atom b marked and once with the elements above atom c marked; the two marked substructures are isomorphic.]
And it doesn't stop here. If Ba has an atom, then by the same theorem, Ba and
B¬a themselves each decompose into two non-overlapping isomorphic Boolean
algebras, so, all in all, we have decomposed A into four isomorphic Boolean algebras:
[Diagram: the lattice A decomposed into four isomorphic two-element Boolean algebras.]
The inverse of the decomposition theorem is the composition theorem:
First a notion. If R is a relation then [R]TR is the transitive closure of R.
[R]TR = ∩{R': R ⊆ R' and R' is transitive}
So [R]TR is the intersection of all transitive relations that R is part of. This
intersection is, of course, itself a relation that R is part of.
Fact: [R]TR is transitive
Proof:
Assume <x,y> ∈ [R]TR and <y,z> ∈ [R]TR.
Then for every R' ∈ {R': R ⊆ R' and R' is transitive}: <x,y> ∈ R' and <y,z> ∈ R' (by definition of ∩).
Then <x,z> ∈ R', because R' is transitive. But that means that <x,z> is in every R' ∈ {R': R ⊆ R' and
R' is transitive}, and that means it is in their intersection. Hence, <x,z> ∈ [R]TR, and hence [R]TR is
transitive.
Intuitively [R]TR is the minimal way of turning R into a transitive relation:
for every x,y,z such that R contains <x,y> and <y,z>, add to R the pair <x,z>.
This gives you a new relation R1.
Do the same for R1; this gives you a new relation R2. Keep doing this until you end
up with a relation that is transitive. What you end up with is [R]TR, the result of
closing R under transitivity.
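The iterative description above is easy to turn into code. Here is a minimal sketch in Haskell (my own illustration, not part of the text): keep adding the missing pairs until nothing changes.

import Data.List (nub)

type Rel a = [(a, a)]

-- one round of adding <x,z> whenever <x,y> and <y,z> are present
step :: Eq a => Rel a -> Rel a
step r = nub (r ++ [ (x, z) | (x, y) <- r, (y', z) <- r, y == y' ])

-- keep going until a fixed point is reached: that fixed point is [R]TR
transitiveClosure :: Eq a => Rel a -> Rel a
transitiveClosure r
  | step r == r = r
  | otherwise   = transitiveClosure (step r)

-- example: closing {<1,2>,<2,3>,<3,4>} adds <1,3>, <2,4> and <1,4>
main :: IO ()
main = print (transitiveClosure [(1,2),(2,3),(3,4)])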
The Composition Theorem:
Let B1 and B2 be non-overlapping isomorphic Boolean algebras and h an
isomorphism between B1 and B2.
Define ⊑ on B1 ∪ B2 by: ⊑ = [ ⊑B1 ∪ ⊑B2 ∪ {<a,h(a)>: a ∈ B1} ]TR
Then <B1 ∪ B2, ⊑> is a Boolean lattice.
So you take the order on B1, unite it with the order on B2, and add the pairs of the
isomorphism. This puts a directly below h(a) in the new structure. Now you only
need to add transitivity arrows (for every a ∈ B1 and every a1 ∈ B1 and every b1 ∈ B2
such that a1 ⊑B1 a and h(a) ⊑B2 b1, you need to add <a1,b1> to ⊑).
The theorem says that the result will be a Boolean algebra.
It is easy to see that its cardinality will be |B1| + |B2|.
This gives you a second procedure to generate all finite Boolean algebras (the first is
as powersets), but this time one that actually gives you a plotting algorithm.
Start with two one-element Boolean lattices: <{0},{<0,0>}> and <{1},{<1,1>}>
(These are really degenerate Boolean algebras, since 0 = 1. We really start counting at
2.) And the isomorphism h = {<0,1>}. Then the composition theorem tells us how to
form the two-element Boolean algebra:
[Diagram] A 1-element Boolean algebra: a point.
[Diagram] Two 1-element Boolean algebras give a 2-element Boolean algebra: a line.
Take two disjoint two-element Boolean algebras and an isomorphism: turn the arrow
into a line (the transitive closure adds <0,1>), and you get the 4-element Boolean
algebra:
[Diagram] Two 2-element Boolean algebras give a 4-element Boolean algebra: a square.
Take two disjoint four-element Boolean algebras and an isomorphism, and form the
eight-element Boolean algebra:
[Diagram] Two 4-element Boolean algebras give an 8-element Boolean algebra: a cube.
Take two disjoint eight-element Boolean algebras and an isomorphism, and form the
sixteen-element Boolean algebra:
[Diagram] Two 8-element Boolean algebras give a 16-element Boolean algebra: a four-dimensional cube.
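As a sketch of this plotting procedure, here is some Haskell of my own (not from the text), under the simplifying observation that, for these cube-shaped algebras, the transitive closure in the Composition Theorem comes down to the pointwise order on sequences of 0's and 1's: tag one copy with an extra 0, the other with an extra 1, and order coordinatewise.

type Point = [Bool]                 -- an element of the n-th algebra: n coordinates

-- carrier of the 2^n-element Boolean algebra built by n compositions
cube :: Int -> [Point]
cube 0 = [[]]
cube n = [ b : p | b <- [False, True], p <- cube (n - 1) ]

-- the order that the composition construction induces: pointwise
leq :: Point -> Point -> Bool
leq xs ys = and (zipWith (<=) xs ys)

main :: IO ()
main = do
  let n = 3
  putStrLn ("Elements of the " ++ show (2 ^ n) ++ "-element algebra (the cube):")
  mapM_ print (cube n)
  putStrLn "Covering pairs (the edges one would draw in the picture):"
  mapM_ print [ (x, y) | x <- cube n, y <- cube n, x /= y, leq x y
                       , length (filter id y) == length (filter id x) + 1 ]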
The key to the relation between the theory of Boolean algebras and the functional type
theory lies in the following theorem:
The Lifting Theorem:
Let B be a Boolean algebra and A a set.
Define on the function space (A → B) the following relation ⊑:
f ⊑ g iff for every a ∈ A: f(a) ⊑B g(a)
Then:
1. <(A → B), ⊑> is a Boolean lattice.
2. <(A → B), ⊑> is complete iff B is complete.
3. <(A → B), ⊑> is atomic iff B is atomic.
Thus, if B is a Boolean algebra, you can lift the Boolean structure from B onto the set
of all functions from A into B.
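A small Haskell sketch of the lifted structure, for a finite index set A and B = {0,1} (all names are mine; association lists stand in for functions):

type A = String
type B = Bool
type Func = [(A, B)]            -- a function from A into B, as an association list

domainA :: [A]
domainA = ["a1", "a2", "a3"]

val :: Func -> A -> B           -- look up the value of the function at a point
val f a = maybe False id (lookup a f)

-- the lifted order: f below g iff for every a in A, f(a) is below g(a) in B
liftedLeq :: Func -> Func -> Bool
liftedLeq f g = and [ val f a <= val g a | a <- domainA ]

-- join and meet are also computed pointwise
liftedJoin, liftedMeet :: Func -> Func -> Func
liftedJoin f g = [ (a, val f a || val g a) | a <- domainA ]
liftedMeet f g = [ (a, val f a && val g a) | a <- domainA ]

main :: IO ()
main = do
  let f = [("a1", True), ("a2", False), ("a3", True)]
      g = [("a1", True), ("a2", True),  ("a3", True)]
  print (liftedLeq f g)         -- True: f is pointwise below g
  print (liftedJoin f g)        -- the pointwise join of f and g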
Now we observe the following.
BOOL is the smallest set of types such that:
1. t ∈ BOOL
2. if a ∈ TYPE and b ∈ BOOL then <a,b> ∈ BOOL
The lifting theorem tells us that the domain of every Boolean type has the structure of
a Boolean algebra. It tells us more, actually: t is not just a Boolean type, but Dt is
finite, hence the domain Dt forms a complete atomic Boolean algebra. And this
means that the domain of every type in BOOL forms a complete atomic Boolean
algebra.
With that we have proved:
Corollary: The domain of every Boolean type is isomorphic to a powerset Boolean algebra.
To apply this to the type theory we now mention Cantor's cardinality theorems (the
first two of the three facts mentioned, since the third is trivial):
Cantor's Theorems:
1. |pow(A)| = 2^|A|
2. |(A → B)| = |B|^|A|
3. |(A × B)| = |A| × |B|
We are now ready to harvest the fruits of the discussion.
D<e,t> is isomorphic to pow(D)
D<e,t> = (D → {0,1})
|(D → {0,1})| = |{0,1}|^|D| = 2^|D| = |pow(D)|
From this, with what we now know, the above statement follows.
The domain of functions from individuals to truth values is, up to isomorphism, the
domain of sets of individuals.
Thus functions from individuals to truth values are, up to isomorphism, just sets of
individuals. Which sets depends on the choice of the isomorphism.
The isomorphism we choose is the one that is maximally useful for conceptual and
linguistic purposes:
CHAR: pow(D) → D<e,t>
CHAR⁻¹: D<e,t> → pow(D)
(CHAR⁻¹ = {<f,X>: f ∈ D<e,t> and X ∈ pow(D) and CHAR(X)=f})
CHAR is the function that maps every set onto its characteristic function.
CHAR⁻¹ is the isomorphism such that for every function f ∈ D<e,t>:
CHAR⁻¹(f) = {d ∈ D: f(d)=1}
Thus, the function which maps every function onto the set characterized by it is an
isomorphism (and so is the function that maps every set onto its characteristic
function).
Example. Let D = {a,b,c}.
[In the original picture both domains are drawn as cubes; here the correspondence is listed pairwise.]
The elements of D<e,t> (given by their values on a, b and c) and the corresponding sets in pow(D):
a↦1 b↦1 c↦1  ↔  {a,b,c}
a↦1 b↦1 c↦0  ↔  {a,b}
a↦1 b↦0 c↦1  ↔  {a,c}
a↦0 b↦1 c↦1  ↔  {b,c}
a↦1 b↦0 c↦0  ↔  {a}
a↦0 b↦1 c↦0  ↔  {b}
a↦0 b↦0 c↦1  ↔  {c}
a↦0 b↦0 c↦0  ↔  Ø
The isomorphisms CHAR and CHAR1 map elements in the same position the cubes
onto each other.
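Here is the same isomorphism as a small Haskell sketch (my own code): CHAR turns a set into its characteristic function, CHAR⁻¹ turns a function back into the set it characterizes.

type D = Char

domainD :: [D]
domainD = "abc"

-- CHAR maps a set onto its characteristic function
char :: [D] -> (D -> Bool)
char set = \d -> d `elem` set

-- CHAR inverse maps a function onto the set it characterizes
charInv :: (D -> Bool) -> [D]
charInv f = [ d | d <- domainD, f d ]

main :: IO ()
main = do
  let f = char "ac"             -- the characteristic function of {a,c}
  print (map f domainD)         -- [True,False,True]
  print (charInv f)             -- "ac": we get the set back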
We could have taken another isomorphism, say, the function that maps every f onto
{d ∈ D: f(d)=0}. While technically sound, that would only have been confusing for
us. Why? Well, mainly because we are creatures of habit. We are so used to
thinking of a predicate logical formula WALK(j) as expressing the truth conditions of
the natural language sentence John walks, that a logic in which the formula WALK(j)
would express the truth conditions of the natural language sentence John doesn't walk
would bother us. Yet, technically, there wouldn't be anything wrong with such a
logic, as long as the formula ¬WALK(j) expressed the truth conditions of the
natural language sentence John walks.
Technically, there is nothing wrong, but practically such a logic would be dreadful.
Our brain wants its calculating systems to be mnemonic and gets dreadfully upset
when the mnemonics is upside down (that is why, in a theory of individuals and
events, I change the name of type e to d, so that I can use e for events). Practically
useful logics are mnemonic, and that is the reason (and, given the existence of
different isomorphisms, the only reason) that we make sure that the logical formula
WALK(j) expresses that FM(j) ∈ FM(WALK) and not that FM(j) ∉ FM(WALK).
So the isomorphism we use is CHAR, and this means that we identify a function
into truth values with the set of arguments mapped onto 1.
D<e,<e,t>> is isomorphic to pow(D×D)
D<e,<e,t>> = (D → (D → {0,1}))
|D<e,<e,t>>| = |(D → {0,1})|^|D| = (2^|D|)^|D|
|pow(D×D)| = 2^|D×D| = 2^(|D|×|D|)
Now (2n)m = 2nm.
You don't remember?
Well, show first that 2n2m = 2n+m.
That is just a question of counting 2's:
2n2m = (2...2)(2...2) = (2....................2)
n
m
n+m
Given this:
(2n)m = (2n)...(2n) = 2(n + ... + n) = 2nm
m
m
So indeed, D<e,<e,t>> is isomorphic to pow(D×D). That means that the domain of
functions from individuals to functions from individuals to truth values is, up to
isomorphism, just the domain of all two-place relations, where relations are sets of
ordered pairs.
The difference between the two perspectives perhaps becomes clearer if we look at a
third set: ((D×D) → {0,1}).
|((D×D) → {0,1})| = 2^(|D|×|D|), so this domain is also isomorphic to pow(D×D).
That's not a surprise, because we have just above studied the relation between
(A → {0,1}) and pow(A). So ((D×D) → {0,1}) is just the set of all the characteristic
functions of the sets in pow(D×D). So, obviously, we can switch between a set of
ordered pairs and the function that maps each ordered pair in that set onto 1 and each
ordered pair not in that set onto 0.
Now look at ((DD){0,1}) and (D  (D  {0,1})).
A function in ((DD){0,1}) takes an ordered pair in DD and maps it onto a truth
value.
A function in (D  (D  {0,1})) takes an individual d in D and maps it onto a
function fd in (D  {0,1}. That function fd takes an individual d' in D and maps it
onto a truth value fd(d'). What the isomorphism says is that the total effect of this is
that in two steps an ordered pair of d' and d is mapped onto a truth value.
In other words:
-a function in ((D×D) → {0,1}) is a relation which maps two arguments
simultaneously onto a truth value.
An n-place relation takes n arguments.
-We can think of it as a set of n-tuples.
-We can also think of it as a function that takes all the n arguments simultaneously,
and maps them as an n-tuple onto a truth value.
An n-place relation interpreted as a function that takes all n arguments simultaneously
we call an uncurried function.
-We can also think of it as a function that takes the n arguments one at a time,
mapping the first argument in onto an n−1 place function, then mapping the second
argument in onto an n−2 place function, ..., mapping the n−1th argument onto a 1-place function, and mapping the last argument onto a truth value.
An n-place relation interpreted as a function that takes the n arguments one by one we
call a curried function.
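Functional programming languages have both perspectives built in. In Haskell, for instance, the Prelude functions curry and uncurry convert between the two; the kiss example below is mine, purely for illustration.

-- a "relation" in the uncurried view: ordered pairs to truth values
kissUncurried :: (String, String) -> Bool
kissUncurried (kisser, kissee) = (kisser, kissee) == ("Mary", "John")

-- the same function in the curried view: one argument at a time
kissCurried :: String -> String -> Bool
kissCurried = curry kissUncurried

main :: IO ()
main = do
  print (kissUncurried ("Mary", "John"))        -- True
  print (kissCurried "Mary" "John")             -- True: same function, curried
  print (uncurry kissCurried ("Mary", "John"))  -- and back again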
Though the topic is spicy, the terminology curried has nothing to do with Indian food, but
with the mathematician Curry, who in the nineteen-thirties developed Combinatory
Logic (which is one of the characterizations of the notion of algorithmic function,
computable function). Curry based Combinatory Logic on the technique of currying
functions (turning n-place functions into functions that take their arguments one at a
time). The actual technique of currying functions goes back to a paper by
Schönfinkel in the nineteen-twenties.
The importance of currying (and uncurrying) for semantics can hardly be
overestimated. You see it directly when you consider the compositionality problem,
the problem of finding interpretations for the constituents:
Go back to predicate logic. Take a formula like KISS(MARY, JOHN). The
construction tree in the logical language of this formula is:
[FORM [PRED2 KISS] [TERM MARY] [TERM JOHN]]
Currying predicate logic means that we can take a function CURRY(KISS) which
also takes two arguments, but one at a time. This could give us a construction tree of
the following form:
[FORM [PRED1 [PRED2 CURRY(KISS)] [TERM JOHN]] [TERM MARY]]
And given our considerations about semantic domains, the semantics of
CURRY(KISS) could give you the same interpretation for the sentence.
In fact, to get the same semantics as you have assigned to KISS in predicate logic, the
only thing you need to make sure of is that CURRY maps FM(KISS) onto the right
function.
But, of course, this is eminently useful if we are dealing with the interpretation of
natural language expressions that combine with their arguments one at a time, like
verbs:
[S [NP Mary] [VP [V kiss] [NP John]]]
The domain D<e,<e,t>> and pow(DD) are isomorphic. As before, there is more than
one isomorphism between these domains. Which one we choose is determined by
what we want with the theory and again the mnemonics of the logic.
The mnemonics of the theory of relations is, once again, determined by how we are
used to writing expressions with relations in predicate logic.
The mnemonics is given by the fact that we are used to regarding the truth conditions of
the formula KISS(j,m) as adequate for those of the natural language expression John
kisses Mary. This means that when we require that <j,m> ∈ FM(KISS), we interpret
the first element of the ordered pair <j,m> as the kisser and the second element as
the kissee.
This mnemonics is a given fact that we are not going to tinker with.
(The mnemonics is, by the way, much less strong for three-place predicates. If I write
a formula GIVE(a,b,c) it is clear to me (and that is the mnemonics) that a is the giver.
But for the other two arguments I would have to tell you who's on second, i.e. fix
the convention.)
One third of the connection between D<e,<e,t>> and pow(D×D) is going to be fixed by
the mnemonic fact that when we say <j,m> ∈ FM(KISS) we fix j as the kisser.
The second third is fixed by the mnemonics of type theory.
The type <e,<e,t>> is interpreted as the domain (D → (D → {0,1})).
We see two e's in this type, and the left-right order of the e's in the type encodes the
order in which the arguments go into a function of this type:
<e,<e,t>>  (the first e marks argument 1, the second e marks argument 2)
This is the fundamental mnemonics of functional type theory: a function of type
<e,<e,t>> combines with an argument of type e (1 = first argument in) to give a
function of type <e,t>. That one combines with an argument of type e (2 = second
argument in) to give a truth value.
This mnemonics was fixed by the definition of Da.
As far as the functions of this type themselves go, we read the arguments from the
inside out.
A curried function f in domain (D → (D → {0,1})) takes two elements p, q ∈ D in
order to get a truth value:
The first argument in gives you a function in (D → {0,1}):
f(p) ∈ (D → {0,1})
The second argument in gives you a truth value in {0,1}:
[f(p)](q)
This means indeed that in curried functions we count from the inside out. This is
not a mnemonics, this is the way the theory works:
Type: <e, <e,t>> with argument slots 1 and 2
Argument combination: [f(1)](2)
And this generalizes:
The type of three-place relations between individuals is:
<e,<e,<e,t>>> with argument slots 1, 2 and 3
Again, the numbers indicate the order in which the arguments go into the function.
If f is a function in domain D<e,<e,<e,t>>>, which is (D → (D → (D → {0,1}))), then the
order the arguments go in is, again, inside out:
Type: <e, <e, <e,t>>>
Argument combination: [[f(1)](2)](3)
In general, for n-place functions:
Type: <e, <e, <... <e, <e,t>>...>> with argument slots 1, 2, ..., n−1, n
Argument combination: [[...[[f(1)](2)]...](n−1)](n)
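A tiny Haskell illustration (mine, not from the text) of this inside-out order for a three-place curried function: each argument fed in peels off one arrow of the type, exactly as the bracketing [[f(1)](2)](3) indicates.

-- type <e,<e,<e,t>>>, with D = Char; the particular relation is arbitrary
give :: Char -> Char -> Char -> Bool
give x y z = (x, y, z) == ('a', 'b', 'c')

partial1 :: Char -> Char -> Bool   -- f(1): a 2-place function remains
partial1 = give 'a'

partial2 :: Char -> Bool           -- [f(1)](2): a 1-place function remains
partial2 = partial1 'b'

main :: IO ()
main = print (partial2 'c')        -- [[f(1)](2)](3): a truth value, True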
The final third of the connection between D<e,<e,t>> and pow(DD) is not fixed by
mnemonics but by what we are using the theory for. And this final third is precisely
the choice of the isomorphism between the two domains.
For grammatical reasons we semanticists first want to combine the interpretation of
kiss with the interpretation of the object John and then with the interpretation of the
subject Mary.
If we write KISS for the kissing relation in pow(D×D), we have already, following the
standard mnemonics, interpreted <j,m> ∈ KISS as: j is the kisser and m is the kissee.
If we call the isomorphism from pow(D×D) into (D → (D → {0,1})) CURRY, then
these two concerns fix the isomorphism CURRY:
[[CURRY(KISS)](j)](m) = 1 iff <m,j> ∈ KISS
CURRY is the function from pow(D×D) into D<e,<e,t>> such that:
for all R ⊆ D×D: for all d1,d2 ∈ D: <d1,d2> ∈ R iff [[CURRY(R)](d2)](d1)=1
Equivalently,
CURRY⁻¹ is the function from D<e,<e,t>> into pow(D×D) such that:
for every f ∈ D<e,<e,t>>: CURRY⁻¹(f) = {<d1,d2> ∈ D×D: [f(d2)](d1)=1}
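As a sketch (my own code, on a toy two-element domain), here is CURRY and its inverse in Haskell; note how the curried function takes the arguments in the inverse of the order in which they sit in the ordered pair:

type D = String
type Rel = [(D, D)]                        -- sets of ordered pairs <kisser,kissee>

kiss :: Rel
kiss = [("Mary", "John")]                  -- Mary kisses John

-- CURRY(R): <d1,d2> in R iff [[CURRY(R)](d2)](d1) = 1
curryRel :: Rel -> (D -> D -> Bool)
curryRel r = \d2 d1 -> (d1, d2) `elem` r

domainD :: [D]
domainD = ["Mary", "John"]

-- the inverse: collect the pairs <d1,d2> with [f(d2)](d1) = 1
curryRelInv :: (D -> D -> Bool) -> Rel
curryRelInv f = [ (d1, d2) | d1 <- domainD, d2 <- domainD, f d2 d1 ]

main :: IO ()
main = do
  print (curryRel kiss "John" "Mary")      -- ((KISS(JOHN))(MARY)) = 1: Mary kisses John
  print (curryRelInv (curryRel kiss))      -- [("Mary","John")]: we recover the relation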
For the type logical language this means that if we see the following expression:
((KISS(JOHN))(MARY))
we should not be misled by the left-right order of the arguments in this expression to
think that it means that John kisses Mary. It means that Mary kisses John.
The same holds for the type <e,<e,t>> with its argument slots 1 and 2: the numbers I
have used here indicate the order in which the arguments go in, and not the order in
which they will be sitting in the ordered pair.
Given CURRY, the order in which the arguments go in is precisely the inverse order
in which they sit in the ordered pair. The above formula corresponds to the predicate
logical formula:
KISS( MARY, JOHN )
This may be confusing in the beginning, but there is nothing to be done about it, you
need to get used to it.
There is nothing to be done about it, because:
1. We don't want to reinterpret KISS( MARY, JOHN) to mean that John kisses Mary.
That would again be dreadfully confusing.
2. We don't want to change the type-domain relation because, as you will find out, the
power of type theory as a tool lies precisely in the fact that function-argument
structure can be read off directly from the types.
3. We don't want to use another isomorphism than CURRY between these two
domains for linguistic reasons.
What we do do is introduce relational, uncurried notation as a notation convention in
the language of type logic, because relational notation has one great advantage: it
reduces the number of brackets.
Given all this we define:
Notation convention: Relational notation for <e,<e,t>>
Let α  EXP<e,<e,t>> and β1,β2  EXPe.
We define:
α(β1,β2) := ((α(β2))(β1))
The notation convention extends to all domains we assume to be linked by
isomorphism CURRY (that is, the isomorphism that works like CURRY works for
<e,<e,t>>).
This means that we understand, for α ∈ EXP<e,<e,<e,t>>>:
α(β1,β2,β3) := (((α(β3))(β2))(β1))
In fact, α can be an expression of type <b,<a,t>> and β2 ∈ EXPb and β1 ∈ EXPa.
Then we can also write, if we so want, ((α(β2))(β1)) as α(β1,β2).
This applies for instance to verbs like believe. We will later introduce a type of
propositions, which I will here call p. This will be the type of an expression like that
the earth is flat. We will have a constant BELIEVE of type <p,<e,t>> and will form
expressions like ((BELIEVE(β))(JOHN)), which we interpret as John believes β.
And, with CURRY, we will unproblematically write that in relational notation as
BELIEVE(JOHN,β).
This notation convention is meant to be useful. That means that we apply it when it is
useful to do so. If it is not useful, we don't apply it, or we apply something else.
For instance, we have assumed a constant OLD of type <<e,t>,<e,t>>. And we were
able to form a formula like ((OLD(MAN))(JOHN)).
Now D<<e,t>,<e,t>> is isomorphic to pow(D×pow(D)).
This means that, with CURRY, we could write the above expression in relational
notation: OLD(JOHN,MAN).
There is nothing wrong with that, except that it is conceptually easier to think of
OLD as a function from sets into sets, than as a relation between individuals and sets.
We switch from the functional perspective to the set theoretic perspective when it
helps us interpret the types: so <<e,t>,<e,t>> is the type of functions from sets
into sets. That's easier than functions from functions into functions. But it isn't
really easier than relations between individuals and sets. So we choose the middle
perspective.
There are a few domains where the isomorphism CURRY is not the most natural one.
For those we will blatantly choose another isomorphism and another notation
convention.
D<<e,t>,t> is isomorphic to pow(pow(D))
D<<e,t>,t> is the type of generalized quantifiers. Since <e,t> is (up to isomorphism) the
type of sets of individuals, <<e,t>,t> is the type of functions from sets of individuals
to {0,1}: (pow(D) → {0,1}). Thus a generalized quantifier is a function that takes a
set of individuals and maps it onto a truth value. But, of course, any such function f
characterizes a set: {X ∈ pow(D): f(X)=1}. And by now you should see that this
means that D<<e,t>,t> is isomorphic to pow(pow(D)).
Thus, generalized quantifiers are sets of sets of individuals.
D<<e,t>,<<e,t>,t>> is isomorphic to pow(pow(D)×pow(D))
Next, we look at the type <<e,t>,<<e,t>,t>> of determiners. A determiner like
EVERY is a function which maps a set of individuals BOY onto a set of sets of
individuals: (EVERY(BOY)). The latter is a function that maps a set of individuals
WALK onto a truth value: ((EVERY(BOY))(WALK)).
This means that EVERY is a function which takes a set of individuals (BOY), then
takes another set of individuals (WALK), and gives a truth value. But that means that
EVERY is a two-place relation between sets of individuals.
In Generalized Quantifier Theory, determiners are interpreted directly as relations
between sets of individuals. We have formulas like
EVERY[BOY,WALK]
and EVERY is interpreted as the subset relation between sets of individuals.
This is, of course, natural, but now we should note that the isomorphism this uses is
not CURRY, but an isomorphism that, for lack of a better term I will call
CORIANDER:
On <e,<e,t>>, CORIANDER is the isomorphism from pow(D×D) into D<e,<e,t>> such
that:
For every f ∈ D<e,<e,t>>: CORIANDER⁻¹(f) = {<d1,d2> ∈ D×D: [f(d1)](d2) = 1}
We generalize this to other types in the same way as CURRY.
So CORIANDER is the obvious isomorphism that we didn't want for type <e,<e,t>>,
but it is the one we want for type <<e,t>,<<e,t>,t>>, and that is because determiners
as two-place relations first combine with the nominal argument, and then with the
verbal argument.
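A small Haskell sketch (mine, not from the text) of EVERY in this CORIANDER-style argument order: the nominal set goes in first, the verbal set second, and the determiner checks the subset relation.

type D = String

boy, walk :: [D]                    -- sets of individuals: the set view of type <e,t>
boy  = ["Bill", "Fred"]
walk = ["Bill", "Fred", "Mary"]

subsetOf :: [D] -> [D] -> Bool
subsetOf xs ys = all (`elem` ys) xs

-- EVERY: first the noun set, then the VP set, i.e. ((EVERY(BOY))(WALK))
every :: [D] -> [D] -> Bool
every n vp = n `subsetOf` vp

main :: IO ()
main = print (every boy walk)       -- True: every boy walks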
If we use CURRY, we should interpret EVERY as the superset relation, and write
((EVERY(BOY))(WALK)) as EVERY(WALK,BOY).
But that is pathetic.
Instead, what we do is not use the relational notation introduced above, but a different
relational notation (with square brackets):
EVERY[BOY,WALK] := ((EVERY(BOY))(WALK))
While both notation conventions are defined for any relational type, we see that their
use is not completely trivial, because they indicate a deeper conceptual difference
between the semantics of determiners and the semantics of transitive verbs (indicated
by the difference in mnemonics).
In general.
Type <a,t> is the type of functions from a-entities into truth values and this is
the type of sets of a-entities.
For some types the set perspective helps, for others it doesn't. Thus:
-<e,t> is usually the type of sets of individuals.
-<<e,t>,t> is sometimes the type of functions from sets of individuals to truth values,
sometimes the type of sets of sets of individuals.
-<t,t> is the type of functions from truth values to truth values, the type of one-place
truth functions. We don't often think of negation as the set {0}. The functional
perspective is dominant here.
Type <a,<a,t>> is the type of functions from a-entities into functions from
a-entities into truth values and this is the type of relations between a-entities.
-<e,<e,t>> is usually the type of relations between individuals.
-<<e,t>,<<e,t>,t>> is either the type of functions from sets of individuals to sets of
sets of individuals, or the type of relations between sets of individuals.
-<t,<t,t>> is the type of functions from pairs of truth values to truth values, two-place truth functions, rarely that of relations between truth values.
Type <b,<a,t>> is the type of functions from b-entities into functions from
a-entities into truth values. These can be functions from b-entities into sets of
a-entities, or relations between a-entities and b-entities (under CURRY).
Thus <p,<e,t>> is the type of relations between individuals and propositions.
But <<e,t>,<e,t>> is the type of functions from sets of individuals to sets of
individuals.
In the course of using type theory you will get used to reading types in their intended
interpretation quite automatically. Again, technically, the above choices are arbitrary,
and you can feel free to use a different isomorphic interpretation if it helps you better.
Conceptually, the fact that we naturally think of, say, negation as a function and not at
all as a set may well point at something much more fundamental about how our brains
work.
2.5. Interpreting type-logic.
Let us come back to our sentence (1):
(1)
Some old man and every boy kissed and hugged Mary.
We gave the following type logical representation for this sentence:
(((AND1((EVERY(BOY))))((SOME((OLD(MAN))))))
((((AND2(HUGGED))(KISSED))(MARY)))) ∈ EXPt
Of this, we specified the following parts:
((AND1((EVERY(BOY))))((SOME((OLD(MAN)))))) ∈ EXP<<e,t>,t>
applies to:
(((AND2(HUGGED))(KISSED))(MARY)) ∈ EXP<e,t>
AND1 ∈ CON<<<e,t>,t>,<<<e,t>,t>,<<e,t>,t>>>
It applies first to:
(EVERY(BOY)) ∈ EXP<<e,t>,t>
and then to:
(SOME((OLD(MAN)))) ∈ EXP<<e,t>,t>
EVERY, SOME ∈ CON<<e,t>,<<e,t>,t>>
BOY, (OLD(MAN)) ∈ EXP<e,t>
OLD ∈ CON<<e,t>,<e,t>> and MAN ∈ CON<e,t>
AND2 ∈ CON<<e,<e,t>>,<<e,<e,t>>,<e,<e,t>>>>
It applies first to HUGGED and then to KISSED, both constants of type <e,<e,t>>
The result is:
((AND2(HUGGED))(KISSED)) ∈ EXP<e,<e,t>>
which applies to MARY ∈ CONe
The resulting expression is a well-formed expression of type t.
Let M = <D,F> be a model and g an assignment function. Let us calculate the
semantic interpretation of this expression:
⟦(((AND1((EVERY(BOY))))((SOME((OLD(MAN))))))
((((AND2(HUGGED))(KISSED))(MARY))))⟧M,g =
(⟦((AND1((EVERY(BOY))))((SOME((OLD(MAN))))))⟧M,g
(⟦(((AND2(HUGGED))(KISSED))(MARY)))⟧M,g) =
(⟦((AND1((EVERY(BOY))))((SOME((OLD(MAN))))))⟧M,g
((⟦((AND2(HUGGED))(KISSED))⟧M,g(⟦MARY⟧M,g)))) =
(⟦((AND1((EVERY(BOY))))((SOME((OLD(MAN))))))⟧M,g
((⟦((AND2(HUGGED))(KISSED))⟧M,g(F(MARY))))) =
(⟦((AND1((EVERY(BOY))))((SOME((OLD(MAN))))))⟧M,g
(((⟦(AND2(HUGGED))⟧M,g(⟦KISSED⟧M,g))(F(MARY))))) =
(⟦((AND1((EVERY(BOY))))((SOME((OLD(MAN))))))⟧M,g
(((⟦(AND2(HUGGED))⟧M,g(F(KISSED)))(F(MARY))))) =
(⟦((AND1((EVERY(BOY))))((SOME((OLD(MAN))))))⟧M,g
((((⟦AND2⟧M,g(⟦HUGGED⟧M,g))(F(KISSED)))(F(MARY))))) =
(⟦((AND1((EVERY(BOY))))((SOME((OLD(MAN))))))⟧M,g
((((F(AND2)(F(HUGGED)))(F(KISSED)))(F(MARY))))) =
((⟦(AND1((EVERY(BOY))))⟧M,g(⟦(SOME((OLD(MAN))))⟧M,g))
((((F(AND2)(F(HUGGED)))(F(KISSED)))(F(MARY))))) =
((⟦(AND1((EVERY(BOY))))⟧M,g((⟦SOME⟧M,g(⟦(OLD(MAN))⟧M,g))))
((((F(AND2)(F(HUGGED)))(F(KISSED)))(F(MARY))))) =
((⟦(AND1((EVERY(BOY))))⟧M,g((F(SOME)(⟦(OLD(MAN))⟧M,g))))
((((F(AND2)(F(HUGGED)))(F(KISSED)))(F(MARY))))) =
((⟦(AND1((EVERY(BOY))))⟧M,g((F(SOME)((⟦OLD⟧M,g(⟦MAN⟧M,g))))))
((((F(AND2)(F(HUGGED)))(F(KISSED)))(F(MARY))))) =
((⟦(AND1((EVERY(BOY))))⟧M,g((F(SOME)((F(OLD)(F(MAN)))))))
((((F(AND2)(F(HUGGED)))(F(KISSED)))(F(MARY))))) =
(((⟦AND1⟧M,g(⟦(EVERY(BOY))⟧M,g))((F(SOME)((F(OLD)(F(MAN)))))))
((((F(AND2)(F(HUGGED)))(F(KISSED)))(F(MARY))))) =
(((F(AND1)(⟦(EVERY(BOY))⟧M,g))((F(SOME)((F(OLD)(F(MAN)))))))
((((F(AND2)(F(HUGGED)))(F(KISSED)))(F(MARY))))) =
(((F(AND1)((⟦EVERY⟧M,g(⟦BOY⟧M,g))))((F(SOME)((F(OLD)(F(MAN)))))))
((((F(AND2)(F(HUGGED)))(F(KISSED)))(F(MARY))))) =
(((F(AND1)((F(EVERY)(F(BOY)))))((F(SOME)((F(OLD)(F(MAN)))))))
((((F(AND2)(F(HUGGED)))(F(KISSED)))(F(MARY)))))
Believe it or not, this is a truth value. Why?
Because F(MAN) is a function in D<e,t> and F(OLD) a function in D<<e,t>,<e,t>>, which
means that F(OLD)(F(MAN)) is a function in D<e,t> again. Since F(SOME) is a
function in D<<e,t>,<<e,t>,t>>, F(SOME)(F(OLD)(F(MAN))) is a function in D<<e,t>,t>.
F(AND1) takes that function and the function F(EVERY)(F(BOY)) and gives you
once again a function f ∈ D<<e,t>,t>.
By a similar argument, F(AND2) takes the functions F(HUGGED) and F(KISSED)
and gives a function in D<e,<e,t>>. This function takes F(MARY) and gives a function
g ∈ D<e,t>.
And indeed f(g) ∈ Dt.
This means that we have provided a solution to the compositionality problem. But we
cannot stop here. We have interpreted every expression as a non-logical constant and
the semantics for our type logical language puts no constraints on the interpretation of the
constants (except their typing). But that means that we still do worse than predicate
logic, because predicate logic got the truth conditions of our sentence right, and so far
we don't.
The problem is that at the moment we are saying that every boy denotes some set of
sets of individuals, but we are not saying which set. Similarly for some boy. At the
moment, we are saying that and1 and and2 are some functions that respectively map
pairs of sets of sets of individuals onto a set of sets of individuals, and pairs of
relations between individuals onto a relation between individuals, but we are not
saying which functions they denote. And if we want these expressions to have the
semantic properties that they have, we have to constrain their interpretations more.
This we can do by putting constraints on the interpretation function for the models.
That is, instead of assuming that the interpretation of AND1, AND2, EVERY, SOME
can vary from model to model, we fix for every model what their interpretation is.
Similarly, instead of assuming that the interpretation of OLD can vary arbitrarily across
models, we assume some constraint on what the interpretation of OLD can be in
different models.
In this way, what we are doing is restricting the class of models for our type logical
language TL to the class of models that satisfy these additional constraints. These
constraints capture aspects of the lexical semantics of the English lexical items that
these constants are used to represent. Hence we can see this restriction as a restriction
from the class of all models for type logic, to the class of all models for English
(where the relevant constants satisfy postulates that are appropriate for the lexical
semantics of the lexical items they represent). Such constraints on the interpretations
of constants used to represent lexical items of English are called meaning postulates.
I will give one example here. Take EVERY ∈ CON<<e,t>,<<e,t>,t>>.
This denotes (up to isomorphism) a relation between sets of individuals. And from
Generalized Quantifier Theory we know which relation, namely the subset relation
⊆.
Which function of D<<e,t>,<<e,t>,t>> is that?
Well, we use the isomorphisms for that:
The function of D<<e,t>,<<e,t>,t>> that corresponds to the subset relation in
pow(pow(D)×pow(D)) is the unique function G in D<<e,t>,<<e,t>,t>> such that:
CORIANDER⁻¹(G) = {<f1,f2>: f1,f2 ∈ D<e,t> and CHAR⁻¹(f1) ⊆ CHAR⁻¹(f2)}
where CORIANDER⁻¹(G) = {<f1,f2>: [G(f1)](f2) = 1}
The meaning postulate now says:
Let TLE, type logical English, be a type logical language where the constants we
choose are the ones we find useful for English.
Meaning Postulate on EVERY:
A model M = <D,F> for TL is only a model for TLE if F(EVERY) = G.
With this meaning postulate, we start predicting correct inference patterns, i.e. we
start getting the truth conditions right. We can now show:
⟦((EVERY(BOY))(WALK))⟧M,g = 1 iff [by the semantics of TL]
[F(EVERY)(F(BOY))](F(WALK)) = 1 iff [by the meaning postulate]
[G(F(BOY))](F(WALK)) = 1 iff [by the definition of G]
<F(BOY), F(WALK)> ∈ CORIANDER⁻¹(G) iff [by the def. of CORIANDER⁻¹]
CHAR⁻¹(F(BOY)) ⊆ CHAR⁻¹(F(WALK)) iff [by the definition of CHAR⁻¹]
{d ∈ D: F(BOY)(d)=1} ⊆ {d ∈ D: F(WALK)(d)=1} iff
for every d ∈ D: if F(BOY)(d)=1 then F(WALK)(d)=1.
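The same computation can be carried out concretely. The sketch below (my own code, for a toy model with D = {a,b,c}) interprets BOY and WALK as characteristic functions and defines G from the subset relation via CHAR⁻¹, exactly along the lines of the definition above:

type D = Char

domainD :: [D]
domainD = "abc"

-- F(BOY) and F(WALK): functions in D<e,t>, given as characteristic functions
boyF, walkF :: D -> Bool
boyF  d = d `elem` "ab"
walkF d = d `elem` "abc"

-- CHAR inverse: from a function to the set it characterizes
charInv :: (D -> Bool) -> [D]
charInv f = [ d | d <- domainD, f d ]

-- G: [G(f1)](f2) = 1 iff the set characterized by f1 is a subset of the one characterized by f2
g :: (D -> Bool) -> (D -> Bool) -> Bool
g f1 f2 = all (`elem` charInv f2) (charInv f1)

main :: IO ()
main = print (g boyF walkF)       -- True: every boy walks in this toy model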
We can similarly define the interpretations of F(SOME), F(AND1), F(AND2), and
define the interpretation of F(OLD) for OLD ∈ CON<<e,t>,<e,t>> in terms of the
interpretation of F(OLDp) for OLDp ∈ CON<e,t>. Our success in this will depend on
our capacity to define these interpretations correctly. This is not so much a question
of whether such definitions are possible at all: the type theory is rich enough, and we
are optimistic enough about our skills as semanticists, to be confident that such
definitions indeed can be given. The problem is practical: learning to do it. As you
can already see from the interpretation given and the definition of function G, it can
be quite hairy to keep track of which function is which, and which isomorphism we
need to apply at which point.
And, in fact, you don't have to learn this, if you don't want to. And that is because we
equipped the type logical language with a function definition operation, the lambda
operator, which, as we will see, will actually do the hard work for you.
To that we now turn.