Foreword

David L. Cohn

In its early days, information technology focused on the capture, processing, storage and transfer of data. For each step, structures and standards were established and served as the foundation for subsequent phases. IBM’s universal punched cards captured data in volume, preparing it for processing. Electronics and programming languages established mechanisms and disciplines for that processing. Databases and query languages formalized data storage, and communication protocols led to widely accepted means of data communication.

Classical information technology has focused on processing data. Indeed, when I was young (which some are not sure was ever the case), the field was called Data Processing. It has primarily dealt with the applications that did the processing (defined by Glushko and McGrath as “software artifacts that present, collect, and manipulate information”). We have a vast literature on modeling, creating, defining, testing and describing these processes. They are important because, without them, nothing would happen.

However, as we move comfortably into the 21st century, information technology is evolving into Business Informatics. This term recalls the dramatic transformation information technology brought to biology through bioinformatics. We’ll likely see a similar impact on business. With Business Informatics, we deal directly with the very concepts of data: what it means, how it is represented and which elements are related. These meanings, representations and relationships are present when data is structured into documents. Documents have long been important, but HTML and the World Wide Web have dramatically increased their value. They’ve accelerated document exchange and emphasized the need for structures and discipline. These structures and disciplines are what Document Engineering is about, and the document-centric view is where this book is leading us.
Applications are to information technology as verbs (the action words) are to human language. But human language would be useless without nouns (the actor words). In fact, nouns play a larger role in language than verbs. According to Princeton University’s Cognitive Science Laboratory, the English language has 114,648 distinct nouns but only 11,306 verbs (see wordnet.princeton.edu for a neat online lexical reference system). However, language depends on both and on their close relationship.

Glushko and McGrath understand the dualism of information technology’s nouns and verbs. They note, “it is undeniable that documents and processes have an inseparable and complementary relationship.” However, the evolution of information technology has not supported this duality. If it had, we would have the tools to model, create, define, test and describe documents, just as we do for processes. Where are they? They are in Document Engineering.

Unfortunately, the problem of creating these tools is hard. Just as there are ten times as many nouns in English as verbs, we seem to have ten times as many ways of representing information as of processing it. Glushko and McGrath have laid down an organized approach to identify the key documents, canonize their representations and leverage these to solve the larger problem. They have begun to develop the structure that will lead us to the needed tool set.

And there is good news along the way. The document view of Business Informatics may be more natural than the process view. Documents are concrete entities, and people are comfortable agreeing on their description and meaning; processes are abstract, and consensus is difficult. In the work described in this book, and in related efforts covered elsewhere, document-based analysis is proving to be a powerful technique for designing, building and managing information systems. The journey is, indeed, the proverbial thousand miles; this book has begun it with well more than the usual single step.
Fortunately, we don’t have to reach the final destination to reap substantial rewards.

David L. Cohn
Director, Business Informatics
IBM Research
Yorktown Heights, New York