The ARC Project: Creating Logical Models of Gothic Cathedrals Using Natural Language Processing

Charles Hollingsworth (cholling@gmail.com)
Stefaan Van Liefferinge (svlieffe@uga.edu)
Rebecca A. Smith (rsmith17@uga.edu)
Michael A. Covington (mc@uga.edu)
Walter D. Potter (potter@uga.edu)

This research benefited from the generous support of a Digital Humanities Start-Up Level 1 Grant from the National Endowment for the Humanities (Grant Number HD5110110), a University of Georgia Research Foundation Grant, and from The University of Georgia President's Venture Fund.

About ARC
• ARC (Architecture Represented Computationally) is a collaborative project between architectural historians and artificial intelligence researchers
• Our goal is to assist architectural historians (and others) with the task of gathering and using information from architectural descriptions
• Specifically, we aim to create a logical representation for Gothic cathedrals, closely tied to the semantics of natural language, that reflects the mental model historians have of the "typical" Gothic cathedral.
• This model can then be used to create representations of specific cathedrals based on verbal descriptions

Why Gothic?
• Gothic cathedrals are major monuments of cultural heritage
• Gothic is particularly suited for logical analysis
  • Structure follows a logical form
  • Many typical features, such as pointed arches and cruciform floor plan
  • Much repetition of elements, such as columns and vaulting units

Basic Outline of ARC

Superuser Mode
• A small set of superusers create and edit the generic model of a Gothic cathedral
• Consists of features all or most Gothic cathedrals have in common

Administrator Mode
• Administrators input information about specific buildings
• Need only describe how they differ from the generic model

User Mode
• Cannot add new information, but can submit queries about the model
• Can test models for completeness and consistency

ARC English: An Architectural Description Language
• At the superuser level, ARC is an exercise in natural language programming
• Rather than enter information using Prolog or other programming language syntax, the superuser will enter information in "ARC English"
• This is a true subset of English that is expressive enough to describe the necessary architectural entities, their properties, and their relationships (spatial and functional) to each other.
• It should allow for multiple ways of expressing the same idea, rather than enforcing a strict syntax in the manner of programming languages

Example of ARC English
A column is a type of support.
Every column has a base, a shaft, and a capital.
Most columns have a plinth.
The base is above the plinth, the shaft is above the base, and the capital is above the shaft.
Some columns have a necking.
The necking is between the shaft and the capital.

Some challenges
• Referring to unnamed entities: Skolem functions are used in place of proper nouns, allowing us to describe properties of hypothetical or nonspecific entities such as "each column's base"
• Context sensitivity: When we say "the nave" or "the capital", which one are we referring to? This depends on what was said in previous sentences. Analysis takes place at the level of discourse, not at the sentence level.
• Defeasible reasoning: "Most columns have a necking" makes no definite universal claim; it allows for the possibility that a particular column has no necking
• Partial ordering: If we're just told that the capital is above the shaft, we don't know that it's immediately above

From ARC English to real-world descriptions
• No matter how carefully we design ARC English, it will never capture the full range of English as used in scholarly articles about architecture
• Real-world descriptions frequently contain information irrelevant to ARC, for example historical background
• The task of the Administrator Mode software is more information extraction than natural language programming
• The generic model tells us what hasn't been specified, and the software can search real-world descriptions to fill in the gaps (e.g. how many vaulting units are in the nave, whether the columns have a necking, how many stories in the elevation)

Querying ARC
• User mode interaction with ARC recalls natural-language database querying
• Sample queries might include "How many vaulting units are in the nave at Saint-Denis?" or "Show me all cathedrals with a four-story elevation."
• Whereas web searches only look for strings of characters, the ARC software will be able to process queries on a semantic level, resulting in more relevant information
• ARC queries can also tell us whether a given description is underspecified (does not tell us all relevant information) or contradictory (contains incompatible information)
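
The completeness and consistency checks mentioned under Querying ARC can be illustrated with a small sketch. This is not the ARC implementation, which the slides do not show. It is a minimal Python illustration that assumes a description is stored as attribute/value assertions. The attribute names (`nave_vaulting_units`, `elevation_stories`, `columns_have_necking`), the `Description` class, and the buildings "Cathedral A" and "Cathedral B" with their values are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for the generic model: the attributes a complete
# description is expected to specify. Names are illustrative only.
REQUIRED_ATTRIBUTES = (
    "nave_vaulting_units",
    "elevation_stories",
    "columns_have_necking",
)


@dataclass
class Description:
    """Facts asserted about one building, possibly across several sentences."""
    name: str
    # Each attribute maps to the set of distinct values asserted for it.
    assertions: dict = field(default_factory=dict)

    def assert_fact(self, attribute, value):
        self.assertions.setdefault(attribute, set()).add(value)


def underspecified(desc):
    """Attributes the generic model asks for that the description never gives."""
    return [a for a in REQUIRED_ATTRIBUTES if a not in desc.assertions]


def contradictions(desc):
    """Attributes for which more than one incompatible value was asserted."""
    return {a: sorted(v) for a, v in desc.assertions.items() if len(v) > 1}


def query(descriptions, attribute, value):
    """Names of buildings whose description gives exactly this one value."""
    return [d.name for d in descriptions if d.assertions.get(attribute) == {value}]


# Made-up buildings and values, for illustration only.
a = Description("Cathedral A")
a.assert_fact("nave_vaulting_units", 7)
a.assert_fact("elevation_stories", 4)
a.assert_fact("columns_have_necking", True)

b = Description("Cathedral B")
b.assert_fact("elevation_stories", 3)
b.assert_fact("elevation_stories", 4)  # conflicts with the previous statement

print(underspecified(b))   # attributes Cathedral B leaves unspecified
print(contradictions(b))   # attributes with conflicting values
print(query([a, b], "elevation_stories", 4))  # "four-story elevation" query
```

In this sketch a building with conflicting values for an attribute is left out of query results for that attribute, which is one possible design choice. A real system might instead report the conflict alongside the answer.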