Taxonomies for School and College Mathematics

The American Mathematics Metadata Task Force

PRE-DRAFT DOCUMENT

Introduction. Professional mathematicians are at home with the Mathematics Subject Classification (MSC) used by Mathematical Reviews, Zentralblatt, and other sources. This classification has a simple structure: it is a tree of depth two augmented by many cross-references. It works well for researchers and advanced students who have a clear picture of the logical structure of the discipline of mathematics, its fields and sub-fields, and how they interrelate. However, it is inappropriate for students and teachers who are not operating at the level of professional mathematicians, and it does not address the vast body of mathematics learned in primary school, secondary school, and college.

This paper describes work on a set of taxonomies covering these areas of mathematics. The impetus for this work came from the more general problem of defining metadata for electronically available mathematical resources and the more specific problem of cataloging and enabling useful searching of digital libraries in the mathematical sciences.

We are grateful for the financial support supplied by the Eisenhower National Clearinghouse for Science and Mathematics and the other forms of support given by the National Engineering Education Delivery System, the Math Forum, the American Mathematical Society, and the Mathematical Association of America.

Three Levels of Mathematics. The first major decision made was to divide mathematics into three levels. Some division is necessary because the meaning and usage of the same terms are different at different levels. Semantics is dictated not only by internal logic but also by cultural and natural language considerations, and these factors appear in different forms and to different degrees at different educational levels.
Three levels of mathematics appear sufficient to reflect the major changes in usage and attitude that occur along the standard path from kindergarten to graduate school and there are satisfactory signs of semantic homogeneity within the levels chosen. Further subdivision might have some advantages, but the disadvantages of defining and maintaining more individual taxonomies argue strongly for a simple trichotomy. The levels are defined as follows.

Level III. Level III is the professional and pre-professional level, represented by advanced mathematics majors, graduate students, researchers, and college teachers. At this level mathematical terms have precisely defined meanings within the axiomatic structure of mathematics. The only culture at play is that of mathematics itself, and there are clear notions of which terms are special cases of other terms. This level can be adequately handled by the MSC.

Level II. Level II encompasses everything from high school algebra and geometry through non-math major courses in linear algebra, probability and statistics, ordinary differential equations, and the like. Level II covers the mathematics that comes after the introduction of variables and before a reliance on formal proof. This might seem quite broad from the perspective of the American educational system, but many “college subjects”, including calculus, linear algebra, and probability, are either taught in high schools or are mentioned in high school classes with the same basic usage of terminology. Basic mathematical objects, such as real numbers, vectors, and polynomials, also have the same meaning across this entire level of mathematics. Students at this level are expected to learn the names of the areas they are studying, so resources can be described for both students and teachers using the same vocabulary. The MSC cannot be used for Level II.
The topics covered at this level are too elementary to appear in the MSC, and there are many places where terms that do appear in the MSC, like algebra, carry meanings that are neither as general nor as precise as in professional mathematics.

Level I. The first level of mathematics roughly covers grades K-8 in America. It can be viewed as pre-variable mathematics. At this level purely mathematical topics coexist with the techniques and objects used to learn them. Thus a mathematics teacher preparing a geometry lesson might search for tangrams, which would not be recognized as a topic at Level II or above. Whereas we would expect a Level II student to know enough to look for multiplication, the term times table is synonymous with multiplication table for a Level I student or teacher.

Taxonomies and Subsets of Resource Space

The guiding metaphor for the taxonomies developed here is that of searching. Both taxonomies and searching can be defined in terms of a resource space whose points represent all of the documents, applets, lesson plans, online courses, software, etc. that theoretically could be found and used. Each term in a taxonomy defines a subset of resource space, which will be called a basic taxonomic set. Boolean combinations of terms can be used to specify further sets, called taxonomically definable sets. A search is an attempt to navigate to a particular set of resources, and the hope is that this set is either taxonomically definable or nearly so.

In the classic view, a taxonomy is a tree based on the logical structure of disciplinary knowledge. Reflecting this logic, the taxonomic notions of narrowing and broadening are strict containment relations among corresponding sets in resource space. This is equivalent to saying that given any two basic taxonomic sets, either they are disjoint or one is contained in the other. In fact, this is too restrictive in practice, and the mechanism of “see also” is used to describe overlaps among basic taxonomic sets.
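
As a minimal sketch of these definitions, basic taxonomic sets can be modeled as Python sets, Boolean combinations as set operations, and the classical tree condition as a pairwise check. The resource identifiers and the assignment of resources to terms below are invented purely for illustration and are not taken from the proposed taxonomies.

```python
from itertools import combinations

# Hypothetical toy resource space; identifiers and memberships are illustrative only.
basic_sets = {
    "geometry": {"r1", "r2", "r3", "r4"},
    "circle": {"r1", "r2"},
    "algebra": {"r5", "r6"},
    "proof": {"r3", "r6"},
}

def is_tree_like(sets):
    """True if every pair of sets is disjoint or nested (the classical view)."""
    for a, b in combinations(sets.values(), 2):
        if not (a.isdisjoint(b) or a <= b or b <= a):
            return False
    return True

# Taxonomically definable sets: Boolean combinations of basic taxonomic sets.
geometry_proofs = basic_sets["geometry"] & basic_sets["proof"]
non_circle_geometry = basic_sets["geometry"] - basic_sets["circle"]

# "proof" overlaps "geometry" without being nested in it, so the tree condition fails.
print(is_tree_like(basic_sets))
```

In this toy data the overlap between proof and geometry is exactly the kind of relation that a classical thesaurus would have to record with a “see also” entry.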
But even if taxonomies based on logical structures were trees, they would not be appropriate for searching in a space where several different semantic interpretations are at work at the same time. Moreover, approaching taxonomies by starting with definitions is backwards from the point of view of the searcher. It would be far better to observe which subsets of resource space result from satisfactory searches and to then construct an optimal taxonomy that makes these sets definable. Although some excellent theoretical work has been done by Hazewinkel and others, there is no existing technology for constructing optimal taxonomies from observed search patterns. It was therefore necessary to construct taxonomies “by hand.”

The taxonomies proposed for Level I and Level II represent a compromise between an approach based on definitions and an approach based on searching. They were constructed by starting with lists of vocabulary used in digital libraries and by the National Council of Teachers of Mathematics. The digital library lists had been developed by catalog librarians and subsequently modified to reflect user behavior. The choices of terms from these lists and the internal relationships among them were developed using professional judgment about their logical relationships, their relationships in the minds of the target audience of teachers and students, and the frequency of resources that they represent. The developers were also conscious that taxonomies have the ability to affect as well as reflect usage. They endeavored to build in a judicious amount of guidance coming from a more advanced overview of mathematics.

Taxonomic Trees for Level I and Level II

The taxonomic structures for Levels I and II start with sets of keywords that are used to form a collection of small taxonomic trees of depth three whose roots have a single branch but which can have any number of leaves.

FIGURE I. Parts of two taxonomic trees. Node labels recovered from the figure: Geometry, Pi, with leaves History and Notation; Geometry, Circle, with leaves Area, Pi, Chord, and Diameter.

An example of parts of two taxonomic trees is given in Figure I. Nodes of taxonomic trees are called terms and can be represented as strings of keywords concatenated using dots. Examples include geometry, geometry.pi, geometry.pi.history, and geometry.pi.notation. Notice that it does not suffice simply to name a keyword. In the example in Figure I the keyword pi appears in two different places, and indeed in many other places in the actual proposed taxonomies. The taxonomies presented here are therefore structured data rather than flat lists of keywords.

In order to make it easier to represent the taxonomies as tables, it is convenient to distinguish nodes at different depths of a taxonomic tree. The root of a taxonomic tree is called a root term, the children of root terms are called main terms, and the leaves are called leaf terms. Each term corresponds to a set of resources. Root terms define general contexts, such as geometry or algebra. Main terms must be interpreted in the context of their parent root terms. Thus geometry.pi refers to resources about pi in the context of geometry (meaning pi as the ratio of the circumference to the diameter of a circle) whereas number.pi refers to resources about pi in the context of number (meaning pi as a real number). Similarly, the leaves of a taxonomic tree are interpreted in the context of their parent main terms. The term geometry.pi.history refers to the history of the geometric object pi and would not, for example, normally include resources discussing the irrationality or transcendence of pi.

Broadening, Narrowing, and Non-transitivity

Classical thesauri contain notions of broadening and narrowing corresponding to strict containment among basic taxonomic subsets of resource space. The taxonomies presented here use a weaker, less precise, and somewhat subjective notion that reflects the fact that many of the relations among terms stem from educational or popular usage as opposed to the logical structure of mathematics.
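
The dotted notation for terms introduced above, with its root, main, and leaf positions, can be sketched as a small data structure. This is only an illustration: the class name `Term`, its methods, and the choice to lowercase keywords are assumptions made here, not part of the proposed taxonomies.

```python
from typing import NamedTuple, Optional

class Term(NamedTuple):
    """A hypothetical representation of a term: a root, main, or leaf term."""
    root: str
    main: Optional[str] = None
    leaf: Optional[str] = None

    @classmethod
    def parse(cls, dotted: str) -> "Term":
        # Lowercasing is an assumption made for this sketch.
        parts = dotted.lower().split(".")
        if not 1 <= len(parts) <= 3 or not all(parts):
            raise ValueError(f"expected 1 to 3 non-empty keywords: {dotted!r}")
        return cls(*parts)

    def keyword(self) -> str:
        """The last keyword, which is ambiguous without its context."""
        return self.leaf or self.main or self.root

    def parent(self) -> Optional["Term"]:
        """The term one level up the tree, or None for a root term."""
        if self.leaf is not None:
            return Term(self.root, self.main)
        if self.main is not None:
            return Term(self.root)
        return None

    def dotted(self) -> str:
        return ".".join(p for p in self if p is not None)

geometry_pi = Term.parse("geometry.pi")
number_pi = Term.parse("number.pi")
history = Term.parse("geometry.pi.history")
```

Here `geometry_pi` and `number_pi` share the keyword pi but are distinct terms, and `history.parent()` recovers `geometry.pi`, the context in which the leaf is interpreted.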
In Level I and Level II, a term B is a narrowing of a term A if the set of resources defined by term B is largely contained in the set of resources defined by term A. Broadening remains the opposite of narrowing. The parent and child relationships in a taxonomic tree are supposed to be broadening and narrowing. Main terms narrow root terms and leaf terms narrow main terms. But since the underlying notions are defined loosely, there is no reason to assume that they are transitive. This is one reason that the basic structure of Level I and Level II consists of many very simple trees instead of a more complex larger tree. The other reason is that this structure is designed so that the fundamental information needed at any point in a search is how to broaden and narrow the context, and nothing more.

In classical thesauri narrowing and broadening apply to keywords. In fact, terms and keywords are usually the same thing, since the data in a classical thesaurus is not viewed as structured data in the way it is here. It is therefore tempting to use the taxonomic trees to define broadening and narrowing of keywords alone, forgetting about the added information given by the context of the keyword. Thus one might say that pi is a narrowing of geometry. This is a useful (if imprecise) point of view and one that is likely to be taken in practice, but it leads to an even more serious type of intransitivity. For example, in the context of geometry, pi is a narrowing of circle, and in the context of number, calculation is a narrowing of pi, but it would be hard to argue that calculation is a narrowing of circle. The problem is that pi has been used in different contexts, and replacing terms by keywords loses this information.

Alternates and Alternates With Possible Broadening

Broadening and narrowing cannot handle synonyms among keywords.
In a professional taxonomy, this may not be a problem since a choice of notation or terminology can be made, but in taxonomies for more general users it is necessary to include keywords that mean the same or almost the same thing. At the elementary school level, for example, it is necessary to include both multiplication table and times table to accommodate the universe of students, teachers, and parents. This gives rise to the notion of an alternate, which expresses that two terms are often synonymous or at least describe roughly the same subsets of resource space. This is a nice symmetric relation, but more is needed. Consider, for example, the pair of terms mathematical_reasoning.argument and mathematical_reasoning.proof. In pre-rigorous mathematics these can be used more or less interchangeably when describing resources, but at some point proofs become more formalized and specialized than mere arguments. Another example is given by the words magnification and dilation in the context of geometric transformations. Dilations are sometimes used to mean only magnifications and sometimes include both magnifications and contractions. This asymmetry is reflected by a new relationship called an alternate with possible broadening. Recalling that terms are defined within contexts, A is an alternate with possible broadening of B if sometimes A and B define sets of resources with a small symmetric difference and sometimes A is a broadening of B. In cases where both equivalence and broadening occur, it may be possible to resolve the ambiguity by breaking the common context of A and B into smaller pieces, e.g., as in the case of proof and argument, but it may not be desirable to introduce more granularity into the taxonomy. In other cases, where there are simply alternate common usages, the ambiguity is a function of language and cannot be resolved in any case. 
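
The loose notions used in this paper, narrowing as "largely contained" and alternates as sets with a small symmetric difference, can be made concrete in a short sketch. The numeric thresholds below are assumptions chosen only for illustration, since the paper deliberately leaves these notions subjective, and the integer resource identifiers are invented.

```python
def containment(b: set, a: set) -> float:
    """Fraction of b's resources that also lie in a."""
    return len(b & a) / len(b) if b else 1.0

def is_narrowing(b: set, a: set, threshold: float = 0.8) -> bool:
    """b narrows a if b is largely, not necessarily wholly, contained in a."""
    return containment(b, a) >= threshold

def is_alternate(a: set, b: set, tolerance: float = 0.2) -> bool:
    """a and b are alternates if their symmetric difference is small
    relative to their union."""
    union = a | b
    return not union or len(a ^ b) / len(union) <= tolerance

# Hypothetical resource sets showing that loose narrowing need not be transitive.
set_a = set(range(0, 100))
set_b = set(range(0, 45)) | set(range(100, 105))  # 90% of set_b lies in set_a
set_c = set(range(100, 105)) | {0}                # all of set_c lies in set_b

# Hypothetical near-synonyms at Level I.
times_table = set(range(1, 11))
multiplication_table = set(range(1, 10)) | {11}

print(is_narrowing(set_c, set_b), is_narrowing(set_b, set_a), is_narrowing(set_c, set_a))
print(is_alternate(times_table, multiplication_table))
```

With these sets, `set_c` narrows `set_b` and `set_b` narrows `set_a`, yet `set_c` does not narrow `set_a`, which is the intransitivity that motivates keeping the taxonomic trees small.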
To review, two terms are simply alternates if they describe roughly the same sets of resources, and one term is an alternate with possible broadening of another if the first is sometimes an alternate and sometimes a broadening and the two situations both occur in the common contexts of the two terms.

Granularity and Decision Procedures

When constructing taxonomies, the granularity of the basic taxonomic sets should be taken into account when deciding whether to include or exclude keywords and terms. The goal in constructing Level I and Level II was to achieve a fairly uniform level of granularity, since that makes searching more efficient, and to keep the sets of resources defined fairly large. For example, the Pythagorean theorem was included as a keyword because it was felt that a significant number of online educational resources are devoted to the Pythagorean theorem in one form or another, but other named theorems such as Sperner’s lemma were excluded because the associated sets of resources were viewed as too small.

Another important principle was that of minimality: narrowings of existing terms would not be included if the basic taxonomic set they would add was very close to a set that was already definable as a Boolean combination of existing basic taxonomic sets. To see the difference between a genuinely new term and a Boolean combination of existing ones, consider the keywords proof and geometry and suppose that the terms geometry (as a root term) and mathematical_reasoning.proof (as a main term) already existed. Should we add the term geometry.proof as a main term? The answer is “yes” because “proofs in the context of geometry” defines the set of resources that contain specific kinds of proofs, e.g., two-column proofs and proofs that use geometric axiom systems, whereas “mathematical_reasoning.proof and geometry” applies to any resource that involves a proof and geometry.
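
The minimality principle can be illustrated with a rough sketch: a candidate term is treated as redundant when its set of resources is very close to a set that is already definable. Everything here is an assumption made for illustration, including the resource names, the use of Jaccard similarity as the measure of closeness, the cutoff value, and the simplification of checking only basic sets and pairwise intersections rather than all Boolean combinations.

```python
from itertools import combinations

def jaccard(x: set, y: set) -> float:
    """Size of the intersection divided by the size of the union."""
    union = x | y
    return len(x & y) / len(union) if union else 1.0

def definable_candidates(basic: dict) -> list:
    """Basic sets plus pairwise intersections: a small, assumed subset
    of all Boolean combinations."""
    sets = list(basic.values())
    sets += [a & b for a, b in combinations(basic.values(), 2)]
    return sets

def is_redundant(candidate: set, definable: list, closeness: float = 0.9) -> bool:
    """True if the candidate is very close to an already definable set."""
    return any(jaccard(candidate, d) >= closeness for d in definable)

# Hypothetical resources assigned to the two existing terms.
basic = {
    "geometry": {"two_column_proof_lesson", "axiom_system_notes",
                 "geometry_essay_with_proof", "area_worksheet", "tangram_activity"},
    "mathematical_reasoning.proof": {"two_column_proof_lesson", "axiom_system_notes",
                                     "geometry_essay_with_proof", "induction_tutorial"},
}

# Resources a searcher would expect under the candidate term geometry.proof.
geometry_proof = {"two_column_proof_lesson", "axiom_system_notes"}

print(is_redundant(geometry_proof, definable_candidates(basic)))
```

In this toy data the candidate set is noticeably smaller than the intersection of the two existing terms, so it is not flagged as redundant, which matches the decision described above to add geometry.proof as a main term.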

Status and Applications of the Work

The subject classifications defined here are still in draft form and are undergoing public review by the mathematical community. It is anticipated that a usable version will be released in the summer or fall of 2000. Several digital library projects mentioned in the introduction are waiting to incorporate the work of the American Mathematics Metadata Task Force into their collections. You are invited to learn more about this at the Web site http://mathmetadata.org/ammtf.