Taxonomic Structures for School Mathematics

Taxonomies for School and College Mathematics
The American Mathematics Metadata Task Force
PRE-DRAFT DOCUMENT
Introduction.
Professional mathematicians are at home with the Mathematics Subject Classification
(MSC) used by Math Reviews, Zentralblatt, and other sources. This classification has a
simple structure – it is a tree of depth two augmented by many cross-references. It works
well for researchers and advanced students who have a clear picture of the logical
structure of the discipline of mathematics, its fields and sub-fields, and how they
interrelate. However, it is inappropriate for students and teachers who are not operating
at the level of professional mathematicians and it does not address the vast body of
mathematics learned in primary school, secondary school, and college.
This paper describes work on a set of taxonomies covering these areas of mathematics.
The impetus for this work came from the more general problem of defining metadata for
electronically available mathematical resources and the more specific problem of
cataloging and enabling useful searching of digital libraries in the mathematical sciences.
We are grateful for the financial support supplied by the Eisenhower National
Clearinghouse for Science and Mathematics and the other forms of support given by the
National Engineering Education Delivery System, the Math Forum, the American
Mathematical Society, and the Mathematical Association of America.
Three Levels of Mathematics.
The first major decision made was to divide mathematics into three levels. Some
division is necessary because the meaning and usage of the same terms are different at
different levels. Semantics is dictated not only by internal logic but also by cultural and
natural language considerations, and these factors appear in different forms and to
different degrees at different educational levels.
Three levels of mathematics appear sufficient to reflect the major changes in usage and
attitude that occur along the standard path from kindergarten to graduate school and there
are satisfactory signs of semantic homogeneity within the levels chosen. Further
subdivision might have some advantages, but the disadvantages of defining and
maintaining more individual taxonomies argue strongly for a simple trichotomy. The
levels are defined as follows.
Level III. Level III is the professional and pre-professional level, represented by
advanced mathematics majors, graduate students, researchers, and college teachers. At
this level mathematical terms have precisely defined meanings within the axiomatic
structure of mathematics. The only culture at play is that of mathematics itself, and there
are clear notions of which terms are special cases of other terms. This level can be
adequately handled by the MSC.
Level II. Level II encompasses everything from high school algebra and geometry
through non-math major courses in linear algebra, probability and statistics, ordinary
differential equations, and the like. Level II covers the mathematics that comes after the
introduction of variables and before a reliance on formal proof.
This might seem quite broad from the perspective of the American educational system,
but many “college subjects”, including calculus, linear algebra, and probability, are either
taught in high schools or are mentioned in high school classes with the same basic usage
of terminology. Basic mathematical objects, such as real numbers, vectors, and
polynomials, also have the same meaning across this entire level of mathematics.
Students at this level are expected to learn the names of the areas they are studying, so
resources can be described for both students and teachers using the same vocabulary.
The MSC cannot be used for Level II. The topics covered at this level are too
elementary to appear in MSC and there are many places where terms that do appear in
MSC, like algebra, are neither as general nor as precise as the same terms in
professional mathematics.
Level I. The first level of mathematics roughly covers K – 8 in America. It can be
viewed as pre-variable mathematics. At this level purely mathematical topics coexist
with the techniques and objects used to learn them. Thus a mathematics teacher preparing
a geometry lesson might search for tangrams, which would not be recognized as a topic
at Level II or above. Whereas we would expect a Level II student to know enough to
look for multiplication, the term times table is synonymous with multiplication table for a
Level I student or teacher.
Taxonomies and Subsets of Resource Space
The guiding metaphor for the taxonomies developed here is that of searching. Both
taxonomies and searching can be defined in terms of a resource space whose points
represent all of the documents, applets, lesson plans, online courses, software, etc. that
theoretically could be found and used. Each term in a taxonomy defines a subset of
resource space which will be called a basic taxonomic set. Boolean combinations of
terms can be used to specify more sets called taxonomically definable sets. A search is
an attempt to navigate to a particular set of resources, and the hope is that this set is either
taxonomically definable or nearly so.
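The set-theoretic picture above can be sketched with ordinary Python sets. This is
only an illustration: the resource identifiers and the membership of each set are
invented for the example and do not come from the proposed taxonomies.

```python
# Hypothetical resource identifiers; a real resource space would be far larger.
resource_space = {"r1", "r2", "r3", "r4", "r5"}

# Each term names a basic taxonomic set, i.e. a subset of resource space.
# The memberships below are made up purely for illustration.
basic_sets = {
    "geometry": {"r1", "r2", "r3"},
    "mathematical_reasoning.proof": {"r3", "r4"},
}

geometry = basic_sets["geometry"]
proof = basic_sets["mathematical_reasoning.proof"]

# Boolean combinations of terms give taxonomically definable sets.
both = geometry & proof                    # geometry AND proof
either = geometry | proof                  # geometry OR proof
not_geometry = resource_space - geometry   # NOT geometry, relative to resource space
```

In this picture a search succeeds when the set the searcher has in mind equals, or
is close to, one of the sets that can be built this way.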
In the classic view, a taxonomy is a tree based on the logical structure of disciplinary
knowledge. Reflecting this logic, the taxonomic notions of narrowing and broadening are
strict containment relations among corresponding sets in resource space. This is
equivalent to saying that given any two basic taxonomic sets, either they are disjoint or
one is contained in the other.
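The "disjoint or nested" condition can be checked mechanically. The following is a
minimal sketch, assuming each term is mapped to a Python set of resource
identifiers; the function name and the sample data are hypothetical.

```python
from itertools import combinations


def is_tree_like(basic_sets):
    """Return True if every pair of basic sets is either disjoint or nested."""
    for a, b in combinations(basic_sets.values(), 2):
        if not (a.isdisjoint(b) or a <= b or b <= a):
            return False
    return True


# Illustrative data only: a strictly nested family and an overlapping one.
strict = {
    "geometry": {1, 2, 3},
    "geometry.circle": {1, 2},
    "algebra": {4, 5},
}
overlapping = {
    "geometry": {1, 2, 3},
    "number": {3, 4},  # shares resource 3 with geometry without being nested
}

strict_ok = is_tree_like(strict)
overlapping_ok = is_tree_like(overlapping)
```

The second family fails the check because the two sets overlap without either
containing the other, which is the situation that "see also" links paper over.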
In fact, this is too restrictive in practice and the mechanism of “see also” is used to
describe overlaps among basic taxonomic sets. But even if taxonomies based on logical
structures were trees, they would not be appropriate for searching in a space where
several different semantic interpretations are at work at the same time. Moreover,
approaching taxonomies by starting with definitions is backwards from the point of view
of the searcher. It would be far better to observe which subsets of resource space result
from satisfactory searches and to then construct an optimal taxonomy that makes these
sets definable.
Although some excellent theoretical work has been done by Hazewinkel and others, there
is no existing technology for constructing optimal taxonomies from observed search
patterns. It was therefore necessary to construct taxonomies “by hand.” The taxonomies
proposed for Level I and Level II represent a compromise between an approach based on
definitions and an approach based on searching. They were constructed by starting with
lists of vocabulary used in digital libraries and by the National Council of Teachers of
Mathematics. The digital library lists had been developed by catalog librarians and
subsequently modified to reflect user behavior. The choices of terms from these lists and
the internal relationships among them were developed using professional judgment about
their logical relationships, their relationships in the minds of the target audience of
teachers and students, and the frequency of resources that they represent. The developers
were also conscious that taxonomies have the ability to affect as well as reflect usage.
They endeavored to build in a judicious amount of guidance coming from a more
advanced overview of mathematics.
Taxonomic Trees for Level I and Level II
The taxonomic structures for Levels I and II start with sets of keywords that are used to
form a collection of small taxonomic trees of depth three whose roots have a single
branch but which can have any number of leaves. An example of parts of two taxonomic
trees is given in Figure I. Nodes of taxonomic trees are called terms and can be
represented as strings of keywords concatenated using dots. Examples include geometry,
geometry.pi, geometry.pi.history, and geometry.pi.notation. Notice that it does not
suffice simply to name a keyword. In the example in Figure I the keyword pi appears in
two different places, and indeed in many other places in the actual proposed taxonomies.
As a data structure, the taxonomies presented here are a form of structured data.
FIGURE I. Parts of two taxonomic trees. First tree: Geometry, Pi, with leaves History
and Notation. Second tree: Geometry, Circle, with leaves Area, Pi, Chord, and Diameter.
In order to make it easier to represent the taxonomies as tables, it is convenient to
distinguish nodes at different depths of a taxonomic tree. The root of a taxonomic tree is
called a root term, the children of root terms are called main terms, and the leaves are
called leaf terms.
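The dotted notation and the three kinds of term can be sketched as follows. The
helper names are hypothetical, and the term list is a small sample built from the
examples in the text and Figure I, not the actual proposed taxonomy.

```python
# Names for nodes at each depth of a taxonomic tree, as defined in the text.
KINDS = {1: "root term", 2: "main term", 3: "leaf term"}


def keywords(term):
    """Split a dotted term such as 'geometry.pi.history' into its keywords."""
    return term.split(".")


def kind(term):
    """Classify a term by its depth; trees here never exceed depth three."""
    depth = len(keywords(term))
    if depth not in KINDS:
        raise ValueError(f"term has unsupported depth {depth}: {term!r}")
    return KINDS[depth]


def parent(term):
    """Return the parent term, or None for a root term."""
    parts = keywords(term)
    return ".".join(parts[:-1]) if len(parts) > 1 else None


def terms_ending_in(keyword, terms):
    """List every term whose final keyword is the given keyword."""
    return [t for t in terms if keywords(t)[-1] == keyword]


sample_terms = [
    "geometry",
    "geometry.pi",
    "geometry.pi.history",
    "geometry.pi.notation",
    "geometry.circle",
    "geometry.circle.pi",
]

pi_terms = terms_ending_in("pi", sample_terms)
```

Here pi_terms contains both geometry.pi and geometry.circle.pi, which is why a
keyword alone does not identify a term.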
Each term corresponds to a set of resources. Root terms define general contexts, such as
geometry or algebra. Main terms must be interpreted in the context of their parent root
terms. Thus geometry.pi refers to resources about pi in the context of geometry (meaning
pi as the ratio of the circumference to the diameter of a circle) whereas number.pi refers to
resources about pi in the context of number (meaning pi as a real number). Similarly, the
leaves of a taxonomic tree are interpreted in the context of their parent main terms.
Geometry.pi.history refers to the history of the geometric object pi and would not, for
example, normally include resources discussing the irrationality or transcendence of pi.
Broadening, Narrowing, and Non-transitivity
Classical thesauri contain notions of broadening and narrowing corresponding to strict
containment among basic taxonomic subsets of resource space. The taxonomies
presented here use a weaker, less precise, and somewhat subjective notion that reflects
the fact that many of the relations among terms stem from educational or popular usage
as opposed to the logical structure of mathematics. In Level I and Level II, a term B is a
narrowing of a term A if the set of resources defined by term B is largely contained in the
set of resources defined by term A. Broadening remains the opposite of narrowing.
The parent and child relationships in a taxonomic tree are supposed to be broadening and
narrowing. Main terms narrow root terms and leaf terms narrow main terms. But since
the underlying notions are defined loosely, there is no reason to assume that they are
transitive. This is one reason that the basic structure of Level I and Level II consists
of many very simple trees instead of a more complex larger tree. The other reason is that
this structure is designed so that the fundamental information needed at any point in a
search is how to broaden and narrow the context, and nothing more.
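One way to make "largely contained" concrete, and to see why it need not be
transitive, is sketched below. The text deliberately leaves the notion loose and
subjective, so the numeric threshold of 0.8 is purely an assumption for
illustration, as are the function names and the sample sets.

```python
def containment_ratio(b, a):
    """Fraction of the resources in b that also lie in a."""
    if not b:
        raise ValueError("containment is undefined for an empty set")
    return len(b & a) / len(b)


def is_narrowing(b, a, threshold=0.8):
    """True if b is 'largely contained' in a (threshold is an assumed cutoff)."""
    return containment_ratio(b, a) >= threshold


# Illustrative sets of resource identifiers.
a = set(range(1, 10))   # resources 1..9
b = a | {10, 11}        # 9 of its 11 resources lie in a
c = {1, 10, 11}         # entirely inside b, but mostly outside a

c_narrows_b = is_narrowing(c, b)
b_narrows_a = is_narrowing(b, a)
c_narrows_a = is_narrowing(c, a)
```

With these sets c narrows b and b narrows a, yet c does not narrow a, because the
part of b that falls outside a is exactly where most of c lives.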
In classical thesauri narrowing and broadening apply to keywords. In fact, terms and
keywords are usually the same thing since the data in a classical thesaurus is not viewed
as structured data in the way done here. It is therefore tempting to use the taxonomic
trees to define broadening and narrowing of keywords alone, forgetting about the added
information given by the context of the keyword. Thus one might say that pi is a
narrowing of geometry. This is a useful (if imprecise) point of view and one that is likely
to be taken in practice, but it leads to an even more serious type of intransitivity. For
example, in the context of geometry, pi is a narrowing of circle and in the context of
number, calculation is a narrowing of pi, but it would be hard to argue that calculation is
a narrowing of circle. The problem is that pi has been used in different contexts and
replacing terms by keywords loses this information.
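The loss of context can be shown in a few lines. This sketch assumes the two
narrowings in the example correspond to the dotted terms geometry.circle.pi and
number.pi.calculation; those spellings are inferred from the prose rather than
quoted from the taxonomy.

```python
# Term-level narrowings from the example, written as (child, parent) pairs.
term_narrowings = {
    ("geometry.circle.pi", "geometry.circle"),
    ("number.pi.calculation", "number.pi"),
}


def last_keyword(term):
    """Drop the context of a term and keep only its final keyword."""
    return term.split(".")[-1]


# Replacing terms by keywords forgets which tree each relation came from.
keyword_narrowings = {
    (last_keyword(child), last_keyword(par)) for child, par in term_narrowings
}


def chained(narrowings):
    """Pairs (x, z) obtained by chaining x -> y and y -> z through a shared y."""
    return {
        (x, z)
        for x, y in narrowings
        for y2, z in narrowings
        if y == y2
    }


spurious = chained(keyword_narrowings)
term_level = chained(term_narrowings)
```

At the keyword level the chain yields the pair (calculation, circle), which nobody
intended. At the term level no chain forms, because geometry.circle.pi and
number.pi are different terms.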
Alternates and Alternates With Possible Broadening
Broadening and narrowing cannot handle synonyms among keywords. In a professional
taxonomy, this may not be a problem since a choice of notation or terminology can be
made, but in taxonomies for more general users it is necessary to include keywords that
mean the same or almost the same thing. At the elementary school level, for example, it
is necessary to include both multiplication table and times table to accommodate the
universe of students, teachers, and parents.
This gives rise to the notion of an alternate, which expresses that two terms are often
synonymous or at least describe roughly the same subsets of resource space. This is a
nice symmetric relation, but more is needed. Consider, for example, the pair of terms
mathematical_reasoning.argument and mathematical_reasoning.proof. In pre-rigorous
mathematics these can be used more or less interchangeably when describing resources,
but at some point proofs become more formalized and specialized than mere arguments.
Another example is given by the words magnification and dilation in the context of
geometric transformations. Dilations are sometimes used to mean only magnifications
and sometimes include both magnifications and contractions.
This asymmetry is reflected by a new relationship called an alternate with possible
broadening. Recalling that terms are defined within contexts, A is an alternate with
possible broadening of B if sometimes A and B define sets of resources with a small
symmetric difference and sometimes A is a broadening of B. In cases where both
equivalence and broadening occur, it may be possible to resolve the ambiguity by
breaking the common context of A and B into smaller pieces, e.g., as in the case of proof
and argument, but it may not be desirable to introduce more granularity into the
taxonomy. In other cases, where there are simply alternate common usages, the
ambiguity is a function of language and cannot be resolved in any case.
To review, two terms are simply alternates if they describe roughly the same sets of
resources and one term is an alternate with possible broadening of another if the first is
sometimes an alternate and sometimes a broadening and the two situations both occur in
the common contexts of the two terms.
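The two relations can be contrasted in a short sketch. Everything numeric here is
an assumption: the text speaks only of a "small" symmetric difference and of sets
being "largely" contained, so the tolerance and threshold values, the function
names, and the sample data are all illustrative.

```python
def sym_diff_ratio(a, b):
    """Size of the symmetric difference relative to the union (0.0 means equal)."""
    union = a | b
    return len(a ^ b) / len(union) if union else 0.0


def is_alternate(a, b, tolerance=0.2):
    """True if a and b describe roughly the same resources (assumed tolerance)."""
    return sym_diff_ratio(a, b) <= tolerance


def is_broadening(a, b, threshold=0.8):
    """True if b is largely contained in a, so that a broadens b."""
    return bool(b) and len(a & b) / len(b) >= threshold


def is_alternate_with_possible_broadening(usages):
    """usages holds one (A, B) pair of resource sets per observed usage.

    A is an alternate with possible broadening of B if some usage makes them
    alternates and some other usage makes A a genuine broadening of B.
    """
    saw_alternate = any(is_alternate(a, b) for a, b in usages)
    saw_broadening = any(
        is_broadening(a, b) and not is_alternate(a, b) for a, b in usages
    )
    return saw_alternate and saw_broadening


# Usage 1: dilation means only magnification. Usage 2: it includes contractions.
narrow_usage = ({"m1", "m2", "m3"}, {"m1", "m2", "m3"})
wide_usage = ({"m1", "m2", "m3", "c1", "c2", "c3"}, {"m1", "m2", "m3"})

dilation_vs_magnification = [narrow_usage, wide_usage]
magnification_vs_dilation = [(b, a) for a, b in dilation_vs_magnification]

forward = is_alternate_with_possible_broadening(dilation_vs_magnification)
backward = is_alternate_with_possible_broadening(magnification_vs_dilation)
```

The relation holds from dilation to magnification but not in the reverse
direction, which is the asymmetry that a plain alternate cannot express.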
Granularity and Decision Procedures
When constructing taxonomies, the granularity of the basic taxonomic sets should be
taken into account when deciding whether to include or exclude keywords and terms.
The goal in constructing Level I and Level II was to achieve a fairly uniform level of
granularity, since that makes searching more efficient, and to keep the sets of resources
defined fairly large. For example, the Pythagorean theorem was included as a keyword
because it was felt that a significant number of online educational resources are devoted
to the Pythagorean theorem in one form or another, but other named theorems such as
Sperner’s lemma were excluded because the associated sets of resources were viewed as
too small.
Another important principle was that of minimality: narrowings of existing terms would
not be included if the basic taxonomic set they would add was very close to a set that was
already definable as a Boolean combination of existing basic taxonomic sets. To
understand the difference, consider the keywords proof and geometry and suppose that
the terms geometry (as a root term) and mathematical_reasoning.proof (as a main term)
already existed. Should we add the term geometry.proof as a main term? The answer is
“yes” because “proofs in the context of geometry” define the set of resources that contain
specific kinds of proofs, e.g., two-column proofs and proofs that use geometric axiom
systems, whereas “mathematical_reasoning.proof and geometry” applies to any resource
that involves a proof and geometry.
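The minimality test can be sketched as a comparison between a candidate set and
the sets that are already definable. The similarity measure (Jaccard), the 0.8
cutoff, and the resource names are assumptions chosen for illustration; the text
says only that a new set should not be "very close" to an existing definable one.

```python
def jaccard(a, b):
    """Overlap of two sets as |a & b| / |a | b| (1.0 means identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def is_redundant(candidate, definable_sets, threshold=0.8):
    """True if the candidate is very close to some already definable set."""
    return any(jaccard(candidate, s) >= threshold for s in definable_sets)


# Hypothetical resources tagged with the two existing terms.
geometry = {"two_column_1", "axioms_1", "area_proof", "circle_intro"}
proof = {"two_column_1", "axioms_1", "area_proof", "induction_1"}

# Candidate term geometry.proof: only proofs of a specifically geometric kind.
geometry_proof = {"two_column_1", "axioms_1"}

# A few Boolean combinations of the existing basic taxonomic sets.
definable = [geometry, proof, geometry & proof, geometry | proof]

keep_candidate = not is_redundant(geometry_proof, definable)
```

In this toy data the candidate is noticeably smaller than the conjunction of
geometry and proof, so it is not redundant and would be added as a main term.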
Status and Applications of the Work
The subject classifications defined here are still in draft form and are undergoing public
review by the mathematical community. It is anticipated that a usable version will be
released in the Summer or Fall of 2000. Several digital library projects mentioned in the
introduction are waiting to incorporate the work of the American Mathematics Metadata
Task Force into their collections. You are invited to learn more about this at the Web site
http://mathmetadata.org/ammtf.