ENVIRONMENTAL INFORMATICS (1)
Draft outline of a discipline devoted to the study of environmental information: creation, storage, access, organization, dissemination, integration, presentation and usage
• Rudolf B. Husar
• Center for Air Pollution Impact and Trend Analysis (CAPITA)
• Washington University, St. Louis, MO 63130
• September 1992

ENVIRONMENTAL INFORMATICS: Application of Information Science, Engineering and Technology to Environmental Problems
Rudolf B. Husar, Director, Center for Air Pollution Impact and Trend Analysis, Washington University, St. Louis, MO
Environmental information is becoming unmanageable by traditional methods. There is a need to develop effective methods to store, organize, access, disseminate, filter, combine and deliver this peculiar resource. Information science seeks to explain information as a resource and the manner in which it is created, transformed and used. Information engineering deals with the design of information systems, while information technology deals with the actual processes of storage, transformation and delivery. Presented topics will include: information as a resource; the user-driven data model; value-added processes; the application of database, geographic information system, hypertext, multimedia and expert system technologies; and the integration of these technologies into information systems. The principles of Environmental Informatics will be discussed in the context of Global Change databases organized by ORNL-CDIAC, NASA and Washington University. The talk will be augmented by a live demonstration of the Voyager 1 Data Delivery System, which combines database, GIS, hypertext, direct manipulation and multimedia technologies.

THE PROBLEM:
"The researcher cannot get access to data; if he can, he cannot read them; if he can read them, he does not know how good they are; and if he finds them good, he cannot merge them with other data."
From: Information Technology and the Conduct of Research: The Users View. National Academy Press, Washington, D.C., 1989

DATA PATHWAY
[Figure: the data pathway from Monitoring Site to Principal Investigator to Information Centers]

INFORMATICS - THE SCIENCE
Systems exist that organize, store, manipulate, retrieve, analyze, evaluate, and provide information in various chunks to a variety of people. The practice of informatics has evolved from professional know-how and technology, not as a product of 'basic' research. Informatics is in a pre-scientific stage of naming, taxonomy, descriptions and definitions.
First, we need to understand how existing information systems work. Next, we need to formulate a model of these practices: the components, activities, values added, clients served and problems solved by the information system. Finally, we have to apply the newly gained insights (the science) to the design of better information systems.
Note: The steam engine was used in practice well before the Carnot cycle theory was invented.

SCIENCE
The field is in a pre-scientific stage, mostly taxonomy of working systems.
Goals:
Understanding the forms of environmental knowledge
Usages of environmental knowledge
Processes of new knowledge creation

INFORMATICS - THE ENGINEERING
Information systems exist that organize, store, manipulate, retrieve, analyze, evaluate, and provide information in various chunks to a variety of people. Design of information storage and flow systems.
Emphasis is on user-driven design to complement technology- and content-driven information flows.
Goals:
Augment human decision and learning processes
Unite data and metadata
Reduce resistances to information flow
The activities of information engineering include:
Matching the information need of the user to the information sources, using available technology.
Developing methodologies for the organization, transformation and delivery of environmental data/information/knowledge.
Identifying the key information values and the processes that will enhance those values.
Seeking out a set of universal values that can be added to information and are independent of the user environment (i.e. accessibility, common coding, and documentation).
Developing new tools that will enhance and augment the human mind in dealing with environmental information, e.g. to minimize the 'info-glut'.

INFORMATICS - THE TECHNOLOGY
The information revolution is driven by the confluence of computer hardware, software and communications technologies. Information systems are implemented using suitable technologies.
Hardware: Computers, communications, microelectronics.
Software: Database, hypertext, geographic information systems (GIS), multimedia, object orientation.
Communications: Wide area (Internet) and local networks; bulletin boards, CD-ROM.
Intellectual Technologies: Indexing, classification/organization, searching, presenting.
These technologies provide the hope of overcoming the information/data glut. Develop knowledge and data storage, delivery and processing systems.
Goals:
Merge database, hypertext and numerical modeling technologies
User-programmable, socially well-behaved information systems
Ultimately interoperable with the universe

USER-DRIVEN INFORMATION PROCESSING
[Figure: the value-added ladder, read from bottom to top]
Data -> (ORGANIZING PROCESSES: grouping, classifying, formatting, displaying) -> Information
Information -> (ANALYZING PROCESSES: separating, evaluating, validating, interpreting, synthesizing) -> Informing Knowledge
Informing Knowledge -> (JUDGMENTAL PROCESSES: presenting options, advantages, disadvantages) -> Productive Knowledge
Productive Knowledge -> (DECISION PROCESSES: matching goals, compromising, bargaining, choosing) -> Action

VALUE ADDED PROCESSES
Metaphors are useful in describing new, unfamiliar topics. Environmental information systems can be viewed as refineries that transform low-value data into information and knowledge through a series of value-adding processes. Data constitute the raw input from which productive knowledge, used for decision making, is derived. Data refers to numbers, files and the associated labeling that describes them. Data are turned into information when one establishes relationships among data, e.g. in a relational database. Informing knowledge educates, while productive knowledge is used for decision making. In fact, one of the practical definitions of knowledge is 'whatever is used for decision-making'.
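As an illustration of the first, organizing rung of this ladder (added here, not part of the original outline), a minimal Python sketch in which raw monitoring records are grouped and classified into a small relational-style summary table; the sites, pollutants and concentration values are invented.

```python
# Hypothetical sketch: turning raw monitoring "data" into "information"
# by establishing relationships (grouping, classifying, formatting).
from collections import defaultdict
from statistics import mean

# Raw data: (site, pollutant, concentration in ug/m3) -- invented values
records = [
    ("St. Louis", "SO2", 18.0),
    ("St. Louis", "SO2", 22.5),
    ("St. Louis", "PM10", 41.0),
    ("Chicago",   "SO2", 12.3),
    ("Chicago",   "PM10", 35.8),
]

# Organizing process: group by (site, pollutant)
groups = defaultdict(list)
for site, pollutant, value in records:
    groups[(site, pollutant)].append(value)

# Formatting/displaying: a small "information" table derived from the data
print(f"{'Site':<10} {'Pollutant':<9} {'N':>2} {'Mean (ug/m3)':>13}")
for (site, pollutant), values in sorted(groups.items()):
    print(f"{site:<10} {pollutant:<9} {len(values):>2} {mean(values):>13.1f}")
```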
USES OF ENVIRONMENTAL DATA
Environmental data/information is used to:
• Provide Historical Record
• Identify Deviation from Expected Trend
• Anticipate Future Environmental Problems
• Provide Legal/Regulatory Record
• Support Research
• Support Education
• Support Communication
The main uses are in science, education and in support of regulations.

CONTENT, TECHNOLOGY AND USER DRIVEN DATA FLOWS
Most agencies disseminate information relevant to their own domain of activity. Such data flow is content-driven. New technologies such as papyrus, the printed book, CD-ROM and computer networks provide bursts of information flow, resulting in technology-driven information flows. However, in the scientific, educational, and regulatory use of environmental data, there is a need for compatible information from various domains, requiring data merging and synthesis. Such data flow is user-driven, since the user dictates the form, content and flow of the data. Content- and technology-driven data flows are fine, but they are inadequate to handle modern information needs. The challenge is to develop the user-driven model and to reconcile and integrate it with the other models.

ENVIRONMENTAL INFORMATICS
The study of environmental information and its use in environmental management, science and education. More than the study of computers in environmental information: its focus is on the environmental field, rather than on computers and the technology. Precedent: Medical Informatics, a mature field with a goal, domain, textbooks, college courses, research groups and funding agencies.

ENVIRONMENTAL INFORMATICS, EI
A tentative definition of EI is: the study of environmental information and its use in decision making, education and science. EI focuses on the environmental field, rather than on computers and the technology. Its approach is to study environmental information systematically, as branches of science, engineering and technology. Much of the presentation below is a synthesis of ideas 'borrowed' and adapted to the environmental field. There is precedent: Medical Informatics, a mature field with a goal, domain, textbooks, college courses, research groups and funding agencies. Other relevant fields include library science, management science, and information engineering.

INFORMATION AS A RESOURCE
Environmental information, and information in general, has several unique characteristics. In the post-industrial era, material goods were replaced by information as the commodity of transactions. It became a resource in itself. Like other resources, information needs to be acquired, organized and distributed, i.e. managed. However, it is a remarkable resource:
It cannot be depleted by use; in fact, it expands and gets better with use.
Information is not scarce; it is in chronic surplus. The scarcity is in the time to process it into knowledge. The processing costs are borne by the information user.
Information can be owned by many at the same time. It is shared, not exchanged, in transactions.
Therefore, one must develop different tools from those that proved useful for natural, capital, human and technological resource management.

DATA FLOW IMPEDIMENTS
[Figure: data flow impediments]

ASSUMPTIONS AND RATIONALE
For the foreseeable future, environmental information will grow in quantity and quality. Individual agencies are collecting, organizing, and disseminating information relevant to their own domain of activity.
There is not enough manpower and time to digest, analyze, integrate, and ultimately make use of the accumulated environmental information. Therefore, there is a need for a systematic effort to develop suitable data organization, manipulation, integration, and delivery systems. A possible mechanism for accomplishing these tasks is to form a consortium of informatics-minded institutions - the EI Group.
For the foreseeable future, environmental information will grow in quantity. There is not enough manpower and time to analyze, integrate, and use all the data. The problem is not so much the quantity of data, but rather the form in which it is delivered; e.g. the automobile windshield delivers lots of data, but we can still process it with ease. What is needed is a faster way to metabolize the expanding environmental data sets. Therefore, there is a need for a systematic effort to better understand environmental information: its characteristics, use and management.

USER DRIVEN FLOW OF ENVIRONMENTAL INFORMATION
[Figure: user-driven flow of environmental information, linking DATA on the external world (pollutant releases, ambient levels, effects) through OUTPUT systems (data & model, EduWare, decision support system) to USAGE (science, education, regulation)]
In the scientific, educational, and regulatory use of environmental data, there is a need for a multiplicity of compatible data sets and knowledge from various domains. There is a set of universal values that can be added to the data, such as accessibility, common coding, and documentation. These values, in conjunction with a set of software tools, could minimize the "info-glut". Use of data for science, education, regulation and policy requires:
Specification of the information need by the user
An information system (model, educational software or a decision support system)

POSSIBLE ACTIVITIES OF EI GROUP
• EI Science: Define the domain of EI; environmental information as a resource; seek general laws of EI; information uses; driving forces.
• EI Engineering: Study the components of EI systems; creation; value-added processes; data/information/knowledge structures for storage and transmission; design of EI systems.
• Education: Develop educational materials on EI; conduct workshops and training sessions.
• Work closely with others on:
• Data Integration: Collect, reconcile, integrate, and document data/information/knowledge bases.
• Data Exchange: Foster exchange through depositories, data catalogs, transfer mechanisms, and nomenclature standards.
• Tools Development: Evaluate and develop software tools for the access, manipulation, and presentation of environmental information.

REQUIREMENTS FOR THE EI GROUP
The EI GROUP has to have a solid understanding of environmental data needs for science, education, policy development, regulations, and other uses. It must know how to translate the information needs into information systems and to design data flow and transformation systems (information engineering). The EI GROUP has to be well versed in modern information science and technology as applicable to environmental informatics. Where necessary, the Group has to develop new concepts and technologies. It has to interface with the users of the environmental information, to assure the usefulness of the effort. It must interface with, and utilize, the existing governmental and private data sources, building on and enhancing, not competing with, those efforts.
OUTPUT OF THE EI GROUP
• Technology: Adopt and apply evolving technologies for DBMS, GIS, hypertext, expert systems, user interfaces, multimedia, and object orientation.
• Public Databases: Prepare relevant, high quality, well documented, compatible, integrated, raw, and aggregated environmental databases usable for science, education, enforcement, and other purposes. Make such high quality, high value environmental information available to many users.
• Software Tools: Provide "smart" data display/manipulation tools that will help turn data into knowledge, e.g. GIS, Voyager, Movie, Hypertext, Video/Sound.
Federal agencies have recognized these needs and formed the Interagency Working Group on Data Management for Global Change (IWGDMGC). The federal effort could be augmented by companion academic efforts, possibly through a consortium of informatics-minded institutions - the EI Group.

POSSIBLE ACTIVITIES OF THE EI GROUP
• Data Integration: Collect, reconcile, integrate, and document information bases.
• Data Exchange: Foster exchange of environmental data through depositories, data catalogs, transfer mechanisms, and nomenclature standards.
• Tools Development: Evaluate and develop software tools for the access, manipulation, and presentation of environmental information.
• End-Use Projects: Conduct specific research and development projects for science, education, and regulations.
• Education: Conduct workshops and training sessions, and prepare educational material for environmental informatics.

REQUIREMENTS FOR THE EI GROUP
• The EI GROUP has to have a solid understanding of environmental data needs for science, education, policy development, regulations, and other uses.
• It must know how to translate the information needs into information systems and to design data flow and transformation systems (information engineering).
• The EI GROUP has to be well versed in modern information science and technology as applicable to environmental informatics. Where necessary, the Group has to develop new concepts and technologies.
• It must interface with, and utilize, the existing governmental and private data sources, building on and enhancing, not competing with, those efforts.
• It has to interface with the users of the environmental information, to assure the usefulness of the effort.

EI GROUP OUTPUT
New Developments: Environmental Informatics
• Science: Define the domain of EI. Develop new methods to classify, organize, and create environmental knowledge.
• Engineering: Create an infrastructure and methodology for the organization, transformation, and delivery of environmental information.
• Technology: Examine the evolving technologies for Database Management Systems (DBMS), Geographic Information Systems (GIS), hypertext, expert systems, user interfaces, multimedia, and object orientation. Apply and adapt these technologies to environmental information.
Provide High Grade Environmental Databases for Public Use
• Prepare relevant, high quality, well documented, compatible, integrated, raw, and aggregated environmental databases usable for science, education, enforcement, and other purposes. Make such high quality, high value environmental information available to many users.
Provide Software Tools
• Provide "smart" data manipulation tools that will help turn data into knowledge. Provide tools for data access, manipulation, and presentation (e.g. GIS, Voyager, Movie, Hypertext, Video/Sound).
FUNDING OPTIONS
[Figure: funding options, linking agencies (NASA, NSF, Forest Service, NOAA, EPA) through proposals (Global Change, Education, Regional Global Change, Regulatory) to research groups (U. of New Hampshire, U. of Vermont, Washington U.)]

Information and Decision Making (1)
Arno Penzias: Ideas and Information
• An instrument operator, traffic controller, economist ... all process information. A common thread among these activities is that of decision making. A decision may be simple, such as selecting ... replacing a ..., or as complex as developing new clean air legislation.
• Decisions are followed by actions, and actions generally result in new information. This rather circular behavior keeps the decision process going until some goal is met, the task is finished, or the project is set aside for a time.
• A healthy flow of information separates winning organizations from losers. (More on the flow concept here.)
• For quality information, today's consistently successful decision makers rely on a combination of man and machine. Getting the best combination requires understanding how the two fit together and the roles each may play. It also requires having an information strategy that is suitable for both the decision-maker's preferences and the problem at hand.
• Knowledge is whatever information is used to make a decision. "Deciding" is acting on information. Managers are transformers of information. (p. 125)

Information and Decision Making (2)
Arno Penzias: Ideas and Information
Information Flow and Decision Making
• An instrument operator, traffic controller, economist ... all process information. A common thread among these activities is that of decision making. A decision may be simple, such as selecting ... replacing a ..., or as complex as developing new clean air legislation.
• Decisions are followed by actions, and actions generally result in new information. This rather circular behavior keeps the decision process going until some goal is met, the task is finished, or the project is set aside for a time.
• Barring blind luck, the quality of a decision can not be any better than the quality of the information behind it. A healthy flow of information separates winning organizations from losers. (More on the flow concept here.)
• Knowledge is whatever information is used to make a decision. "Deciding" is acting on information. Managers are transformers of information. (p. 125)
• Despite the explosive growth in computing, we have yet to feel the full impact of the information-processing resource that microprocessors offer. The computing power will intensify the challenge of developing ever more powerful methods of telling machines to do what we wish them to do. This requires the solution of "the software problem".
• Solving "the software problem" includes producing software more quickly, with fewer bugs, at lower cost - software that is easier to understand, modify and reuse in different applications. Give users the ability to customize a system by modifying it.

Information and Decision Making (3)
Arno Penzias: Ideas and Information
UNIX - Social Behavior
• Most applications use different formats to move information between them. UNIX programs communicate with each other in a specific way. This arrangement allows the programmer to plug programs together like Lego sets, without worrying about the details of interfacing.
• UNIX's modularity permits users to build customized application programs out of modular parts from libraries and programs borrowed from friends.
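To make the "plug programs together" idea concrete, here is an added sketch (not from Penzias) of a few single-purpose filters chained in Python, in the spirit of UNIX pipes; the filters and the sample log text are hypothetical.

```python
# Illustrative sketch: small, single-purpose filters chained together,
# echoing the UNIX pipe idea. Names and data are invented.
def read_lines(text):
    for line in text.splitlines():
        yield line

def grep(lines, word):
    return (line for line in lines if word in line)

def to_upper(lines):
    return (line.upper() for line in lines)

log = "ozone high\nsulfate low\nozone moderate"
# "Plugging programs together": read | grep ozone | upper
for line in to_upper(grep(read_lines(log), "ozone")):
    print(line)
```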
Convenient "User programmability" has the potential to unleash the creative powers of many users instead of relying on the program creator for all the insights needed to create well suited applications What next? Search of nonprocedural programming that frees users from worrying about how a given task is to be accomplished and allow them to merely state what they want. R. Husar 1992 Information and Decision Making (4) Arno Penzias: Ideas and Information • • • • • • Networking To benefit from information created for different purposes under different conditions and at different location, users need convenient interfaces to the systems providing the data. Ultimately, the intervening networking technology that provides the interface should be flexible enough to accept information in whatever format the data source provides it and translate it to the needed format most suitable for human perception. Human pattern recognition skills, tactile sensitivities and similar interfaces to the external world attest to the massive processing power that the brain dedicates to such functions. Evidently, the experience of evolution has demonstrated the need for a variety of sensitive interfaces, . The greatest subtlety of our own human interfaces appears to be in the way we effortlessly integrate disparate sensory inputs. It is the single good feeling you get in a theater or sports arena from words, music, spectacle, and someone sitting next to you- all at the same time. In contrast, most of our present technology tends to deal with each input the words the visual input etc. as a separate entity. User preferences and productivity needs are the driving forces behind the call for better interface between people and mashines. Much of the additional computer processing power will be devoted to providing better interfaces between people and mashine. R. Husar 1992 Information and Decision Making (5) Arno Penzias: Ideas and Information • • • • • • • • • • • • Computers and human information processing While computers afford humans much valuable help in processing massive amounts of data. However, mashines are best at manipulate numbers or symbols; people connect them to meaning. Machines offer little serious competiion in areas of creativity, integration of disparate information, and flexible adaptation to unforeseen circumstances. Here the human mind functions best. Computing systems lack a key attribute of human intelligence: the ability to move from one context to another. Just-in-time Information processing – symbiotinc co-evolution Computers and communiation systems can speed up the Connectivity can spped Today, access to on-line data reduction schemes enables us to think of the results as we get them. These better tools can profoundly change the way we work. Today, we can ask questions in time to get answers, make decisions and create more powerful ideas. Generate knowledge faster… While ideas flow from human minds, computers can help shaping much of the information that leads to those ideas. By providing needed information in timely way and in digestable form, electronic data processing and delivery system can someone make informed decisions, Tools of the mind , mind ampliing. Same way as steam enfine amplies humans physical power, the computer/communication technologies can amlify its mental powers. In this sence, the goal of the information techloogy promoted here is not so much to intruduce ‘artificail intelligence, but tho amplify the actual intelligence of humans to perfom increasingly complex taks. 
Information and Decision Making (6)
Arno Penzias: Ideas and Information
The Software Problem
• Despite the explosive growth in computing, we have yet to feel the full impact of the information-processing resource that microprocessors offer. The computing power will intensify the challenge of developing ever more powerful methods of telling machines to do what we wish them to do. This requires the solution of "the software problem".
• Solving "the software problem" includes producing software more quickly, with fewer bugs, at lower cost - software that is easier to understand, modify and reuse in different applications. Give users the ability to customize a system by modifying it.
• Most applications use different formats to move information between them. UNIX programs communicate with each other in a specific way. This arrangement allows the programmer to plug programs together like Lego sets, without worrying about the details of interfacing. UNIX's modularity permits users to build customized application programs out of modular parts from libraries and programs borrowed from friends.
• Convenient "user programmability" has the potential to unleash the creative powers of many users, instead of relying on the program creator for all the insights needed to create well-suited applications.
• What next? The search for nonprocedural programming that frees users from worrying about how a given task is to be accomplished and allows them to merely state what they want.

Information and Decision Making (7)
Arno Penzias: Ideas and Information
Data Access
• To benefit from information created for different purposes, under different conditions and at different locations, users need convenient interfaces to the systems providing the data. Ultimately, the intervening networking technology that provides the interface should be flexible enough to accept information in whatever format the data source provides it and translate it to the format most suitable for human perception.
• Human pattern recognition skills, tactile sensitivities and similar interfaces to the external world attest to the massive processing power that the brain dedicates to such functions. Evidently, the experience of evolution has demonstrated the need for a variety of sensitive interfaces.
• The greatest subtlety of our own human interfaces appears to be in the way we effortlessly integrate disparate sensory inputs. It is the single good feeling you get in a theater or sports arena from words, music, spectacle, and someone sitting next to you - all at the same time. In contrast, most of our present technology tends to deal with each input - the words, the visual input, etc. - as a separate entity.
• User preferences and productivity needs are the driving forces behind the call for better interfaces between people and machines. Much of the additional computer processing power will be devoted to providing better interfaces between people and machines.

Spatial Time Series: Analysis - Forecasting - Control
Bennett, R.J., Pion Limited, London, 1979
Description (Characterization)
• In order to understand the functioning of organisms, one has to understand:
1. the individual holons (downward face)
2. the relationship between the holons (upward)
(Koestler's holarchy)
• Description involves summarizing the response characteristics of the system by purely descriptive measures.
• Description is accomplished by monitoring, followed by descriptive statistics.
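A minimal sketch of what "monitoring, followed by descriptive statistics" could look like in code (added for illustration; the series and variable names are invented, not from Bennett):

```python
# Hypothetical example: descriptive characterization of a monitored time series.
from statistics import mean, stdev

# Invented daily SO2 concentrations (ug/m3) at one monitoring site
series = [18.2, 21.5, 19.8, 25.1, 23.4, 20.0, 17.6, 22.9]

n = len(series)
print(f"n = {n}")
print(f"mean    = {mean(series):.1f} ug/m3")
print(f"std dev = {stdev(series):.1f} ug/m3")
print(f"min/max = {min(series):.1f} / {max(series):.1f} ug/m3")

# A crude trend indicator: difference between second-half and first-half means
trend = mean(series[n // 2:]) - mean(series[: n // 2])
print(f"half-to-half change = {trend:+.1f} ug/m3")
```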
Explanation
• Associate and explain events that occur in space-time. Build associative, causal relationships; build a model.
• Analysis stages (p. 20):
Stage 1. Prior hypothesis of system structure
Stage 2. System identification and specification
Stage 3. Parameter estimation
Stage 4. Check of model fit
[Stage 5. System explanation, forecasting, control]

Moore's Law
The single most important thing to know about the evolution of technology is Moore's Law. Most readers will already be familiar with this "law." However, it is still true today that the best of industry executives, engineers, and scientists fail to account for the enormous implications of this central concept.
Gordon Moore, a founder of Intel Corporation, observed in 1965 that the trend in the fabrication of solid state devices was for the dimensions of transistors to shrink by a factor of two every 18 months. Put simply, electronics doubles its power for a given cost every year and a half. In the three decades since Moore made his observation the industry has followed his prediction almost exactly. Many learned papers have been written during that period predicting the forthcoming end of this trend, but it continues unabated today. Papers projecting the end are still being written, accompanied by impressive physical, mathematical, and economic reasons why this rate of progress cannot continue. Yet it does.
Moore's Law is not a "law" of the physical world. It is merely an observation of industry behavior. It says that things in electronics get better, that they get better exponentially, and that this happens very fast. Some, even Gordon Moore himself, have conjectured that this is simply a self-fulfilling prophecy. Since every corporation knows that progress must happen at a certain rate, they maintain that rate for fear of being left behind.
It is also possible that Moore's Law is much broader than it appears. Possibly it applies to all of technology, and has applied for centuries while we were unaware of its consequences or mechanisms. Perhaps it was only possible to be explicit about technological change in 1965 because the size of transistors gave us for the first time a quantitative measure of progress. If this is so, then we are embedded in an expanding universe of technology, where the dimensions of the world about us are forever changing in an exponential fashion.
The notion of exponential change is deceptively hard to understand intuitively. All of us are accustomed to linear projection. We seem to view the world through linear glasses -- if something grows by a certain amount this year, it will grow an equal amount the next year. But according to Moore's Law, electronics that is twice as effective in a year and a half will be sixteen times as effective in 6 years and over a thousand times as effective in 15 years. This implies periodic overthrows of everything we know. An executive in the telecommunications industry recently said that the problem he confronted was that the "mean time between decisions exceeded the mean time between surprises." Moore's Law guarantees the frequency of surprises.

Metcalfe's Law -- Network Externalities
There is another "law" that affects the introduction of new technology -- this time in an inhibiting fashion. Metcalfe's Law, also known to economists generally as the principle of network externalities, applies when the value of a new communications service depends on how many other users have adopted this service.
If this is the case, then the early adopters of a given service or product are disincented, since the value they would obtain is very small in the absence of other users. In this situation innovation is often throttled. Metcalfe's law often applies to communications services. A classic example, of course, is the videotelephone. There is no value in having the first videotelephone, and it only acquires value slowly as the population of users increases. If there are n users at a given time, then there are n(n-1) possible one-way connections. Thus the value grows as the square of the number of users. The value starts slowly, then reaches some point where it begins to rise rapidly. It seems as if there needs to be a critical mass for takeoff, and that there is no way to achieve that critical mass, given the burden on initial subscribers.
Metcalfe's Law has defeated many technological possibilities, left stillborn at the starting gate of market penetration. Nonetheless, there are important examples of breakthroughs. For example, facsimile became a market success, but only after decades of technological viability. Even so, facsimile is a complex story, involving the evolution of standards, the inevitable progress of electronics, the equally-inevitable progress in the efficiency of signal-processing algorithms, and the rise of the business need for messaging services.
Moore's and Metcalfe's laws make an interesting pair. In the communications field Moore's law guarantees the rise of capabilities, while Metcalfe's law inhibits them from happening. Devices that appear to have little intrinsic value without the existence of a large networked community continue to diminish in cost until they reach the point where the value and cost are commensurate. Thus Moore's Law in time can overcome Metcalfe's Law.

Metcalfe's Law -- Network Externalities (2)
• Economists know it as the law of increasing returns, of network externalities, but the idea is that the more people are connected to a network, the more valuable it is. Specifically, the value of a network grows as the square of the number of users. The value is measured by how many people I can communicate with out there, so the total value of the network grows as the square of the number of users. Now, what this means is that a small network has almost no value, and a large network has a huge value. What it gives you is the lock-in phenomenon of winner takes all. You want to have the same thing as everybody else. The idea is that you don't want to be the first person on your block to get the plague. But when all your friends get it, you think about getting it. The more people have it, the more you're likely to get it, and suddenly there is this capture effect where everybody has it. This law of network externality governs so much of the business and is at the heart of the Microsoft trial. Why does Microsoft have a monopoly? Is this a natural phenomenon that has to do with networks?
• David Reed coined another law -- Reed's Law -- that says there's something beyond Metcalfe's Law. There are three kinds of networks.
- First, there's broadcast like radio and TV, which we'll call a Sarnoff network. The value of that network is proportional to the number of people receiving the broadcast. Amazon would be this type of network, because people shop there but don't interact with each other.
- Then there's the Metcalfe's Law-type network where people talk to each other, for example, classified ads.
- Reed said that the important thing about the Internet is neither of those. The Internet exhibits a third kind of law -- where communities with special interests can form. The thing about communities is that there are 2^n of them, so in a large network the value of having so many possible communities and subnetworks is the dominant factor. He predicts a scaling of networks, starting with small networks having only the Sarnoff linear factor, larger networks dominated by the square factor, and giant networks dominated by the 2^n factor of the formation of communities.
• Napster is another example of what's going on in information technology. First, it's an example of the kind of network where winner takes all. Napster is where all the songs are, so that's where everybody else is. If Napster goes under, when they go under, then all the little sites won't be able to replace it because people won't find what they want there. Napster also brings up one of the other properties of information, which is troublesome and is going to shape our society in the coming years -- the idea that information can be copied perfectly at zero cost. That flies in the face of so much of what we believe about commerce. As my friend Douglas Adams said to me, we protect our intellectual property by the fact that it's stuck onto atoms, but when it's no longer stuck onto atoms, there is really no way to protect it. He would like to sell his books at half a cent a page, the idea being that for every page you read, you pay him half a cent. If you get into the book 20 pages and you say, "This book is really bad," you don't pay anymore. That would eliminate the "copying of information at zero cost" issue that he experiences as an author. He says people come up to him in the street and say, "I've read your book 10 times," and he says, "Yes, but you didn't pay 10 times."
• So these are some of the things that trouble me about the future of information technology. What are its limits? Will the laws of network effects doom us all to a shared mediocrity? What will happen to intellectual property and its effect on creativity? Is it like the railroads, or is this something fundamentally different that will last through the next century?

The Evolution of the World Wide Web
The most important case study in communications technology is the emergence of the World Wide Web. This revolutionary concept seemed to spring from nothingness into global ubiquity within the span of only two years. Yet its development was completely unforeseen in the industry -- an industry that had pursued successive long and fruitless visions of videotelephony, home information systems, and video-on-demand, and had spent decades in the development of ISDN with no apparent application. It now seems incredible that no one had foreseen the emergence of the Web, but except for intimations in William Gibson's science fiction novel Neuromancer, there is no mention in either the scientific literature or in popular fiction of this idea prior to its meteoric rise to popularity.
There is a popular notion that all technologies take 25 years from ideation to ubiquity. This has been true of radio, television, telephony, and many other technologies prevalent in everyday life. How, then, did the Web achieve such ubiquity in only a few years? Well, the historians argue, the Web relied on the Internet, which in turn was enabled by the widespread adoption of personal computers. Surely this took 25 years. We might even carry this further.
The personal computer would not have been possible without the microprocessor, which depended on the integrated circuit evolution, which itself evolved from the invention of the transistor, and so forth. By such arguments nearly every development, it seems, could be traced back to antiquity. Although the argument about the origin and length of gestation seems an exercise in futility, the important point is that many revolutions are enabled by a confluence of events. The seed of the revolution may not seem to lie in any individual trend, but in the timely meeting of two or more seemingly-unrelated trends. In the case of the World Wide Web the prevalence of PCs and the growing ubiquity of the Internet formed an explosive mixture ready to ignite. Perhaps no invention was really even required. The world was ready -- it was time for the Web.
While this physical infrastructure was forming in the world's networks and on the desktops of users, there was a parallel evolution of standards for the display and transmission of graphical information. HTML, the hypertext markup language, and HTTP, the hypertext transfer protocol, were unknown acronyms to the majority of technical people, let alone the lay public. But the definition of these standards that would enable the computers and networks to exchange rich mixtures of text and pictures was taking shape in Switzerland at the physics laboratory CERN, where Tim Berners-Lee was the principal champion.
The role of standards in today's information environment is critical, but often unpredictable. What is really important is that many users agree on doing something exactly the same way, so that everyone achieves the benefits of interoperability with everyone else. It is exactly the same concept of network externalities that is at work in Metcalfe's law. An international standard can stimulate the market adoption of a particular approach, but it can also be ignored by the market. Unless users adopt a standard, it is like the proverbial tree falling in the forest without a sound. Standards are, for the most part, advisory. User coalitions or powerful corporations can force their own standards in a fascinating and ever-changing multi-player game. Moreover, de facto standards often emerge from the marketplace itself.
So in the early 1990s there was a prevalent physical infrastructure with latent capabilities and an abstract agreement on standards for graphics. One more development and two brilliant marketing ideas were required to jumpstart the Web. The development was that of Mosaic at the National Center for Supercomputing Applications at the University of Illinois. Mosaic was the first browser, a type of program now known throughout the world for providing a simple point-and-click user interface to distributed information. Following the initial versions of Mosaic from NCSA, commercial browsers were popularized by Netscape and Microsoft.
The revolutionary marketing ideas needed for the Web now seem obvious and ordinary. A decade ago, however, they were not at all obvious. One idea was to enable individual users to provide the content for the Web. The other idea was to give browsers away free to everyone. Between these ideas, Metcalfe's Law was overcome. Even though browsers initially had almost no value, since there were no pages to browse, they could be obtained electronically at no cost. The price was directly related to the value. Thus browsers spread rapidly, just as their value began to build with the accumulation of web pages.
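Before returning to the Web story, a compact restatement of the growth laws quoted in the preceding pages (added here for clarity; the formulas follow directly from the statements above, with t in years and n the number of users):

```latex
% Added summary (not in the original notes) of the growth laws cited above.
\begin{align*}
\text{Moore's Law:}          &\quad P(t) \approx P_0 \cdot 2^{t/1.5}
  \quad\Rightarrow\quad 2^{6/1.5} = 16 \text{ in 6 years},\; 2^{15/1.5} = 1024 \text{ in 15 years} \\
\text{Sarnoff (broadcast):}  &\quad V \propto n \\
\text{Metcalfe:}             &\quad V \propto n(n-1) \approx n^2 \\
\text{Reed (communities):}   &\quad V \propto 2^{n}
\end{align*}
```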
Allowing the users to provide content was counter to every idea that had been held by industry. The telecommunications and computer industries had tried for a decade to develop and market remote access to information and entertainment held in centralized databases. This was the cornerstone of what were called "home information systems" that were given trials in many cities during the 1970s and 1980s. Later, the vision pursued by the industry was that of video-on-demand -- the dream of providing access to every movie and television show ever made, like a giant video rental store, over a cable or telephone line. Virtually every large telecommunications company had trials and plans for video-on-demand, and the central multi-media servers required for content storage were being developed by Microsoft, Oracle, and others.
The Web exemplifies some powerful current trends -- the empowerment of users, geographically-distributed content, distributed intelligence, and control at the periphery of the network. Another principle is that of open, standard interfaces that allow users and third parties to build new applications and capabilities upon a standardized infrastructure. It is hard to criticize industry for pursuing the centralized approach. Imagine proposing the Web to a corporate board in 1985, and describing how ...

Information Technology and the Conduct of Research: The Users View
National Academy Press, Washington, D.C., 1989
• Committee rationale: There are serious impediments to the wider and more effective use of information technology. Committee members were active researchers from outside the field of "information technology". In the absence of considerable knowledge about the field, the panel approached it by asking the researchers about their experiences.
• p. 1: Information technology - the set of computer and communications technologies - has changed the conduct of scientific, engineering and clinical research. New technologies offer the prospect of new ways of finding, understanding, storing, and communicating information and should increase the capabilities and productivity of researchers. Among these new technologies are simulations, new methods of presenting observational and computational results as visual images, the use of knowledge-based systems as "intelligent assistants", and more flexible and intuitive ways for people to interact with and control computers.
• The conduct of research: The everyday work of the researcher involves writing proposals, developing theoretical models, designing experiments, collecting data, analyzing data, communicating with colleagues, studying research literature, reviewing colleagues' work, and writing articles. They look at three particular aspects of research: data collection and analysis, communication and collaboration, and information storage and retrieval.

Information Technology and the Conduct of Research: The Users View
National Academy Press, Washington, D.C., 1989
DATA COLLECTION AND ANALYSIS
This is one of the most widespread uses of information technology in research.
Trends:
Increased use of computers
Dramatic increase of data storage and processing capacity
Creation of new computer-controlled instruments that produce more data
Increased communication among researchers using networks
Availability of software packages for standard research tasks (e.g. statistical)
Difficulties:
1. Uneven distribution of computing resources, the haves and have-nots.
2. Finding the right software.
Commercial software is often unsuitable for specialized needs. Most researchers, although they are not skilled software creators, develop their own software with the help of graduate students. Such software is designed for one purpose, and it is difficult to understand, to maintain, or to transport to other computing environments.
3. Transmitting data over networks at high speed.
COMMUNICATION AND COLLABORATION
Routine word processing and electronic mail are the most pervasive forms of computer use. Electronic publishing and data communication-coordination are becoming increasingly used.
Trends:
Information can be shared more quickly
New collaborative arrangements
Difficulties:
Incompatibility of technologies
Networks are anarchic

Information Technology and the Conduct of Research: The Users View
National Academy Press, Washington, D.C., 1989
INFORMATION STORAGE AND RETRIEVAL
How information is stored determines how accessible it is. Scientific text is stored in print (hard copy) and is accessible through the indices and catalogs of a library. Data and databases are stored mostly on computer disks. A database, along with the procedures for indexing, cataloging and searching, makes up an information management system.
Difficulties: The researcher cannot get access to data; if he can, he cannot read them; if he can read them, he does not know how good they are; and if he finds them good, he cannot merge them with other data.
Difficulty accessing data stored by other researchers. Such access permits reanalysis and replication, both essential elements of the scientific process. At present, data storage is largely an individual researcher's concern, in line with the tradition that researchers have first right to their data. The result has been a proliferation of idiosyncratic methods for storing, organizing, and indexing data, with the researcher's data essentially inaccessible to all other researchers. Formats in data files vary from researcher to researcher, even within a discipline. These problems prohibit a researcher from merging someone else's data into his own database. Hence, considerable effort must be dedicated to converting data formats. [Not enough metadata.] Finally, when a researcher reads another database, he has no notion as to the quality of the data it contains. The data sets do not have enough QC information and descriptive metadata. There is a need for evaluated, high quality databases.
Given a high quality, well described database, a major difficulty exists in conducting searches. Most information searches are incomplete, cumbersome, inefficient, expensive, and executable only by specialists. Searches are incomplete because the databases themselves are incomplete. Updating is expensive because data are stored in more than one database. Searches are cumbersome and inefficient because different databases are organized according to different principles (data models).
Another difficulty in storing data and information is private ownership. By tradition, researchers hold their data privately. In general, they neither submit their data to a central archive nor make their data available via computer. Increasingly, however, in disciplines such as meteorology and the biomedical sciences, submission of primary data into databanks has become accepted as a duty. In some fields, the supporting agencies require that the data be archived in machine readable format and that any professional article be accompanied by a disk describing the underlying data.
Also, a comprehensive reference service for computer-readable data should be developed. [Master directory]
In addition, peer review of articles and proposals has been constrained by the difficulty of gaining access to the data used for the analysis. If writers were required to make their primary data available, reviewers could repeat at least part of the reported analysis. Such a review would be more stringent, would demand more effort from reviewers, and raises some operational questions that need to be resolved; but it would arguably lead to more careful checking of published results.
Underlying the difficulties in information storage and retrieval are problems in the institutional management of resources. Who is to manage, maintain, and update information services? Who is to create and enforce standards? At present, the research community has three alternatives: the federal government, which manages resources such as MEDLINE and GenBank; professional societies such as the American Chemical Society, which manages the Chemical Abstracts Service; and non-profit organizations such as the Institute for Scientific Information.

Information Technology and the Conduct of Research: The Users View
National Academy Press, Washington, D.C., 1989
Recommendations
• Institutions supporting researchers must develop support policies, services and standards for better use of information technology. The institutions are universities, university departments, funding agencies, scientific associations, network administrators, information service providers, software vendors and professional groups.
• The Federal Government should support software development for scientific research. The software should meet standards of compatibility, reliability and documentation, and should be made available to other researchers.
• Data collected with government support rightfully belong in the public domain, although a reasonable time for first publication should be respected.
• There is a pressing need for a more compact form of storage.
• Tool building for non-defense software should be encouraged.
• The Federal Government should fund pilot projects on information storage and dissemination concepts in selected disciplines and implement software markets, with emphasis on the development of generic tools useful for multiple disciplines.
• The institutions, led by the federal government, should develop an information technology network for use by all qualified researchers.

Measuring for Environmental Results
by William K. Reilly, EPA Journal, May-June 1989
• A key element in any effort to measure environmental success is information--information on where we've been with respect to environmental quality, where we are now, and where we want to go. Since its beginning, EPA has devoted a great deal of time, attention, and money to gathering data. We are spending more than half a billion dollars a year on collecting, processing, and storing environmental data. Vast amounts of data are sitting in computers at EPA Headquarters, at Research Triangle Park, North Carolina, and at other EPA facilities across the country. But having all this information--about air and water quality, about production levels and health effects of various chemicals, about test results and pollution discharges and wildlife habitats--doesn't necessarily mean that we do anything with it.
The unhappy truth is that we have been much better at gathering raw data than at analyzing and using data to identify or anticipate environmental problems and make decisions on how to prevent or solve them. As John Naisbitt put it in his book Megatrends: "We are drowning in information but starved for knowledge." Our various data systems, and we have hundreds of them, are mostly separate and distinct, each with its own language, structure, and purpose. Information in one system is rarely transferable to another system. I suspect that few EPA employees have even the faintest idea of how much data are available within this Agency, let alone how to gain access to it. And if that is true of our own employees, how must the public feel when they ponder the wealth of information lurking, just out of reach, in EPA's huge and seemingly impenetrable data bases?
The strategic information effort I have described, however, will require a new attitude on the part of every EPA program manager--a willingness to break out of the traditional constraints of media-specific and category-specific thinking. Just as important, we must find ways to share our data more effectively with the people who paid for it in the first place: the American public. Eventually, as EPA makes progress in standardizing and integrating its information systems, the information in those systems--apart from trade secrets--should be as accessible as possible. Such information could be made available through on-line computer telecommunications, through powerful new compact disc (CD-ROM) technologies, and perhaps a comprehensive annual report on environmental trends.
Sharing information with the public is an important step toward establishing a common base of understanding with the American people on questions of environmental risk. As the recent furor over residues of the chemical Alar on apple products shows, there can be a wide gap between public perceptions of risk and the degree of risk indicated by the best available scientific data. EPA must share and explain our information about the hazards of life in our complex industrial society with others--with other nations, with state and local governments, with academia, with industry, with public-interest groups, and with citizens. We need to raise the level of debate on environmental issues and to insure the informed participation of all segments of our society in achieving our common goal: a cleaner, healthier environment.
Environmental data, collected and used within the strategic framework I have described, can and will make a significant contribution to accomplishing our major environmental objectives over the next few years. Strategic data will help us:
Create incentives and track our progress in finding ways to prevent pollution before it is generated.
Improve our understanding of the complex environmental interactions that contribute to international problems like acid rain, stratospheric ozone depletion and global warming.
Identify threats to our nation's ecology and natural systems--our wetlands, our marine and wildlife resources--and find ways to reduce those threats.
Manage our programs and target our enforcement efforts to achieve the greatest environmental results.

USES OF ENVIRONMENTAL DATA
Environmental data are used for many purposes.
They may support environmental management or serve the broader good of society by deriving more general environmental knowledge:
• Provide Historical Record
• Identify Deviation from Expected Trend
• Anticipate Future Environmental Problems
• Provide Legal Record
• Support Environmental Research
• Support Environmental Education
• Support Communication
• Record Monitoring and Control Procedures

Taylor Model
One of the specific tools employed by the staff of the University Library was the Taylor Model.[5] Taylor's model is a theoretical model and is not predictive. The University Library adapted it as a working tool and, in turn, adopted the concepts of "value-adding" and the importance of Information Use Environments as critical guiding principles in the construction of the Library of the Future.
In Taylor's model, individuals work in information environments, and part of those information environments is the problem-solving, or wrestling with the problems or questions, that naturally occurs. Taylor's model allows that these "problem dimensions" have certain characteristics that exist along a continuum. The model also allows that information has traits that exist along a continuum. The combination of the user's problem dimensions and the traits of the information involved creates a picture of the "information worlds" within which groups of users work. How effectively a given information system (in the largest sense of the word "system") meshes with the individual's or group's Information Use Environment is the measure of the degree of success of that system. It is inherent to Taylor's model that the degree of "value-added" by any component or service within the system is judged wholly from the user's point of view. If it isn't valuable to the user within the user's information environment, then the service isn't valuable, period.
In order to begin to create these pictures of how our campus clientele gather and use information, the staff conducted some 1400 interviews with representative percentages of faculty, staff, and students. The interviewees were asked open-ended questions, not about how they used existing library services, but about how they gathered and used information. The results of the analysis of the interviews showed that campus users do indeed have very different information gathering and use patterns, and that these patterns (described in Taylor's terminology) do differ along the lines of both discipline and scholarly level, i.e., the types of information required by those studying in the humanities are markedly different from those required by engineers. In turn, while the nature of the material is consistent, there are differences even within a discipline among the levels in a user group, i.e., what a humanities faculty member needs is significantly different from what a freshman in the same area needs. There are even noticeable differences between subject areas in the same discipline, such as the visual arts as compared to the literary arts. The differences are not just present in the types of information required, but also in how the information is gathered and used. This means that what each group values and requires differs widely. Often what the library considered important was not what the user considered important. Findings related to major user groups, especially faculty user groups, were taken back to those groups for discussion and confirmation.
The conclusion was that developing profiles of the information-gathering and use patterns of each precise user group would be a powerful tool for prioritizing and planning. We also concluded that "cookie cutter" services that offered essentially the same services to all users were no longer useful or advisable. These assumptions are in the process of being applied to other areas of library responsibility such as resource allocation (including collection management budgets), personnel deployment, training programs, etc. The need to bring library resources to bear on individualizing library services has become a priority. In keeping with some of the findings of the LSBC, a means to shift librarians' investment of their resources into activities associated with problem solving, time saving, and cost saving is also a priority. These and other findings form the basis for another of our specific projects: the development of a suite of discipline-specific virtual libraries.

The term "virtual library" refers to an environment, one in which the "client services" aspect is the most commonly referred to and the most immediately relevant to the user. In our discussion, a virtual library environment is not access to some local or remote OPAC, nor is it access to the Internet or to some specific listserver on the Internet. The client-server component of a virtual library environment may offer all of the latter as part of client services, but as a concept a virtual library environment goes far beyond those notions. A virtual library environment is one in which component parts combine to provide intellectual and real access to information, the value of which is framed entirely from the users' point of view, meeting individuals' unique information needs. Virtual libraries are not a single entity, but a host of component parts brought together in a dynamic environment. Frequently, virtual libraries are also defined in terms of remote access to the contents and services of libraries and other information resources, combining an on-site collection of current and heavily used materials in print, microformats, and electronic form with an electronic network that provides access to, and delivery from, external library and commercial information and knowledge sources worldwide. In essence, the faculty member and student are provided the "effect" of a library, a synergy created by bringing together technologically the resources of many libraries, information services, and knowledge stores.[6] In addition, librarians will be working collaboratively with their faculty to develop the tools to build, maintain, manipulate, and distribute these collections of data resources.

[5] Robert S. Taylor, Value-Added Processes in Information Systems (Norwood: Ablex Publishing Corporation, 1986). Taylor refers to his model as "rather an early presentation of a way of thinking about the field of information science" and also as "a frame of reference for ordering what we know about information use environments..." It is a very complex, powerful and sophisticated model. To mention some of its principal components only briefly, as we do in this paper, is to do the model and the book an injustice. Interested readers should examine the book for a true idea of the range of Taylor's thinking.

R. Husar 1992

Value of information

• The creation of a library's catalog is a process that adds value.
Taylor (1986) has suggested that there are three major processes that add value to information: organizing, analyzing, and judgmental. Within each of these three major processes there are additional specific activities, as shown in Figure 1. (A small illustrative sketch of this value-adding sequence, applied to environmental data, appears at the end of this section.)

Figure 1
o Organizing processes
  • Grouping
  • Classifying
  • Relating
  • Formatting
  • Displaying
o Analyzing processes
  • Separating
  • Evaluating
  • Validating
  • Comparing
  • Interpreting
  • Synthesizing
o Judgmental processes
  • Presenting options
  • Presenting advantages/disadvantages

Typically, catalogers perform a number of these "value-adding" processes when they are creating or editing bibliographic records for the library's catalog. Figure 2 shows how a MARC cataloging record can add different kinds of value to an information package (Taylor, 1986).

Taylor points out that the term "value" is user-based and that the value-added approach is connected to the users and the user environment. In consequence, an information system/service should be responsive to the use environment in order to help users make choices, or to assist them in clarifying problems. The purpose is to develop information services with "provision of analysed, evaluated, and interpreted information for use in a particular situation."

Information Management

There are a number of approaches to adding value to information already in use, but there is room for further development. In organisations, information experts might discuss with managers their media preferences, information use strategies, and the barriers they have encountered in using and applying knowledge. Information experts can then begin tailoring information products and services to enable managers to make decisions, solve problems, think strategically, scan the environment, and carry out other aspects of their work roles. One approach adds value to information to help information users match the information provided by a system with their needs (Taylor, 1985). The added values include ease of use, noise reduction, quality, adaptability, time saving and cost saving. Another approach is directed toward reducing information overload for managers (34) by increasing the quality of information. Some of the values are related to the scarcity of information and the degree of confidence a manager places in information.

R. Husar 1992
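The value-adding sequence of Figure 1 can be illustrated with a short, purely hypothetical sketch in Python. Nothing in it comes from Taylor (1986) or from an actual monitoring system: the function names, the sample ozone readings, and the 70 ppb threshold are invented for illustration. The sketch only shows the general shape of the pipeline, in which each stage takes the product of the previous one and adds value to it, from raw data through information and informing knowledge to options that support a decision.

# Hypothetical sketch of Taylor's value-adding process categories applied to
# environmental data. Site names, readings and the threshold are invented.

raw_data = [                      # raw input: (monitoring site, ozone in ppb)
    ("site_A", 82), ("site_B", 41), ("site_A", 95), ("site_B", 38),
]

def organize(records):
    """Organizing processes: grouping, classifying, formatting (data -> information)."""
    by_site = {}
    for site, value in records:
        by_site.setdefault(site, []).append(value)
    return by_site

def analyze(by_site, standard=70):
    """Analyzing processes: separating, evaluating, comparing, synthesizing
    (information -> informing knowledge)."""
    return {site: {"mean": sum(v) / len(v),
                   "exceedances": sum(1 for x in v if x > standard)}
            for site, v in by_site.items()}

def judge(summaries):
    """Judgmental processes: presenting options with advantages and disadvantages
    (informing knowledge -> productive knowledge)."""
    options = []
    for site, s in summaries.items():
        if s["exceedances"]:
            options.append(f"{site}: {s['exceedances']} exceedance(s), mean "
                           f"{s['mean']:.0f} ppb; review control options")
        else:
            options.append(f"{site}: within the assumed standard; no action needed")
    return options

for option in judge(analyze(organize(raw_data))):
    print(option)

In a real system each stage would of course be far richer (quality assurance, documentation, spatial analysis, presentation), but the division of labor among organizing, analyzing, and judgmental processes is the same.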