Environmental Data Integration (Fusion Synthesis)

ENVIRONMENTAL INFORMATICS (1)
Draft outline of a discipline devoted to the study of environmental information: creation, storage, access, organization, dissemination, integration, presentation and usage
Rudolf B. Husar
Center for Air Pollution Impact and Trend Analysis (CAPITA)
Washington University, St. Louis, MO 63130
September 1992
R. Husar 1992
ENVIRONMENTAL INFORMATICS:
Application of Information Science, Engineering and
Technology to Environmental Problems
Rudolf B. Husar
Director, Center for Air Pollution Impact and Trend Analysis
Washington University, St. Louis, MO
Environmental information is becoming unmanageable by traditional methods.
There is a need to develop effective methods to store, organize, access, disseminate, filter, combine and deliver this peculiar resource.
Information science seeks to explain information as a resource and the manner in which it is created, transformed and used.
Information engineering deals with the design of information systems, while information technology deals with the actual processes of storage, transformation and delivery.
Presented topics will include: information as a resource; the user-driven data model; value-added processes; application of database, geographic information system, hypertext, multimedia and expert system technologies; and the integration of these technologies into information systems.
The principles of Environmental Informatics will be discussed in the context of Global Change databases organized by ORNL-CDIAS, NASA and by Washington University.
The talk will be augmented by a live demonstration of the Voyager 1 Data Delivery System that combines database, GIS, hypertext, direct manipulation and multimedia technologies.
R. Husar 1992
THE PROBLEM:
The researcher cannot get access to data;
if he can, he can not read them;
if he can read them,
he does not know how good they are;
and if he finds them good
he can not merge them with other data
From:
Information Technology and the Conduct of Research:
The Users View.
National Academy Press, Washington, D.C. 1989
R. Husar 1992
DATA PATHWAY
Diagram: data flow from the Monitoring Site through the Principal Investigator to Information Centers.
R. Husar 1992
INFORMATICS - THE SCIENCE
Systems exist that organize, store, manipulate, retrieve, analyze, evaluate, and
provide information in various chunks to a variety of people.
The practice of informatics has evolved from professional know-how and
technology, not as a product of 'basic' research.
Informatics is in a prescientific stage of naming, taxonomy, descriptions and
definitions.
First we need to understand how existing information systems work.
Next we need to formulate a model of these practices: components, activities,
values added, clients served and the problems solved by the IS.
Finally, we have to apply the newly gained insights (science) to the design of better
IS.
Note: The steam engine was used in practice well before the Carnot cycle theory was
invented.
SCIENCE
The field is in a pre-scientific stage, consisting mostly of taxonomy of working systems.
Goals:
Understanding the forms of environmental knowledge
Usages of environmental knowledge
Processes of new knowledge creation
R. Husar 1992
INFORMATICS - THE ENGINEERING
Information systems exist that organize, store, manipulate, retrieve, analyze, evaluate, and provide information in various chunks to a variety of people.
Information engineering is the design of information storage and flow systems, with an emphasis on user-driven design to complement technology- and content-driven information flows.
Goals:
Augment human decision and learning processes.
Unite data and metadata.
Reduce resistance to information flow.
The activities of information engineering include:
Matching the information need of the user to the information sources, using
available technology.
Develop methodologies for the organization, transformation and delivery of
environmental data/information/knowledge.
Identify the key information values and the processes that will enhance those
values.
Seek out a set of universal values that can be added to information and that are
independent of the user environment (e.g. accessibility, common coding, and
documentation).
Develop new tools that will enhance and augment the human mind in dealing with
environmental information, e.g. to minimize the 'info-glut'.
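To make the "unite data and metadata" goal concrete, here is a minimal illustrative sketch (my own, not from the original talk): a measurement series travels together with the descriptive metadata (units, method, source, quality flag) that later users need in order to interpret and merge it. All names and field values below are hypothetical.

```python
# Hypothetical sketch: a dataset that always travels with its metadata.
from dataclasses import dataclass, field

@dataclass
class DataSet:
    values: list                                   # the raw numbers
    metadata: dict = field(default_factory=dict)   # units, method, source, quality

so2_1990 = DataSet(
    values=[9.1, 8.4, 7.9],
    metadata={
        "parameter": "SO2",
        "units": "ppb",
        "method": "pulsed fluorescence",
        "source": "hypothetical monitoring site, St. Louis",
        "quality_flag": "preliminary",
    },
)

# A user (or another program) can check the documentation before merging.
assert so2_1990.metadata["units"] == "ppb"
print(len(so2_1990.values), "values of", so2_1990.metadata["parameter"])
```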
R. Husar 1992
INFORMATICS - THE TECHNOLOGY
Information systems are implemented using suitable technologies. The information revolution is driven by the confluence of computer hardware, software and communications technologies.
Hardware: computers, communications, microelectronics.
Software: database, hypertext, geographic information systems (GIS), multimedia, object orientation.
Communications: wide area (Internet) and local networks; bulletin boards, CD-ROM.
Intellectual Technologies: indexing, classification/organization, searching, presenting.
These technologies provide the hope of overcoming the information/data glut.
Goals:
Develop knowledge and data storage, delivery and processing systems.
Merge database, hypertext and numerical modeling technologies.
User-programmable, socially well-behaved information systems.
Ultimately interoperable with the universe.
R. Husar 1992
USER-DRIVEN INFORMATION PROCESSING
Diagram: a value-adding hierarchy leading from Data to Action.
• Data are turned into Information by ORGANIZING PROCESSES: grouping, classifying, formatting, displaying.
• Information is turned into Informing Knowledge by ANALYZING PROCESSES: separating, evaluating, validating, interpreting, synthesizing.
• Informing Knowledge is turned into Productive Knowledge by JUDGMENTAL PROCESSES: presenting options, advantages, disadvantages.
• Productive Knowledge leads to Action through DECISION PROCESSES: matching goals, compromising, bargaining, choosing.
R. Husar 1992
VALUE-ADDED PROCESSES
Metaphors are useful in describing new, unfamiliar topics.
Environmental information systems can be viewed as refineries that transform low-value data into information and knowledge through a series of value-adding processes.
Data constitute the raw input from which productive knowledge, used for decision making, is derived.
Data refers to numbers, files and the associated labeling that describes them. Data are turned into information when one establishes relationships among the data, e.g. in a relational database. Informing knowledge educates, while productive knowledge is used for decision making.
In fact, one of the practical definitions of knowledge is 'whatever is used for decision-making'.
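As a concrete, purely illustrative sketch of the data-to-information step described above (not part of the original talk; the tables and numbers are hypothetical), a relational database turns isolated numbers into information by establishing relationships among them:

```python
# Illustrative sketch only: raw "data" become "information" once
# relationships are established, as in a relational database.
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Raw data: isolated numbers with labels.
cur.execute("CREATE TABLE sites (site_id INTEGER, name TEXT, region TEXT)")
cur.execute("CREATE TABLE so2 (site_id INTEGER, year INTEGER, ppb REAL)")
cur.executemany("INSERT INTO sites VALUES (?, ?, ?)",
                [(1, "St. Louis", "Midwest"), (2, "Boston", "Northeast")])
cur.executemany("INSERT INTO so2 VALUES (?, ?, ?)",
                [(1, 1990, 9.1), (1, 1991, 8.4), (2, 1990, 6.2)])

# Information: a relationship (join plus aggregation) across the data.
for row in cur.execute("""
        SELECT sites.region, AVG(so2.ppb)
        FROM so2 JOIN sites ON so2.site_id = sites.site_id
        GROUP BY sites.region"""):
    print(row)   # e.g. ('Midwest', 8.75), ('Northeast', 6.2)
con.close()
```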
R. Husar 1992
USES OF ENVIRONMENTAL DATA
Environmental data/information is used to:
• Provide Historical Record
• Identify Deviation from Expected Trend
• Anticipate Future Environmental Problems
• Provide Legal/Regulatory Record
• Support Research
• Support Education
• Support Communication
The main uses are in science, education and to support regulations.
R. Husar 1992
CONTENT, TECHNOLOGY AND USER DRIVEN DATA FLOWS
Most agencies are disseminating information relevant to their own domain of activity.
Such data flow is content driven.
New technologies such as papyrus, the printed book, CD-ROM and computer networks
provide bursts of information flow resulting in technology-driven information flows.
However, in scientific, educational, and regulatory use of environmental data, there is a
need for compatible information from various domains, requiring data merging and
synthesis. Such data flow is user driven since the user dictates the form, content and
flow of the data.
Content and technology-driven data flows are fine but they are inadequate to handle
modern information needs. The challenge is to develop the user-driven model and to
reconcile and integrate it with the other models.
R. Husar 1992
ENVIRONMENTAL INFORMATICS (EI)
A tentative definition of EI:
The study of environmental information and its use in environmental management, decision making, science and education.
EI is more than the study of computers in environmental information: its focus is on the environmental field rather than on computers and technology. Its approach is to study environmental information systematically, as branches of science, engineering and technology.
Much of the presentation below is a synthesis of ideas 'borrowed' and adapted to the environmental field.
There is a precedent: Medical Informatics, a mature field with a goal, domain, textbooks, college courses, research groups and funding agencies.
Other relevant fields include library science, management science and information engineering.
R. Husar 1992
INFORMATION AS A RESOURCE
Environmental information and information in general has several unique
characteristics. In the post-industrial era, material goods were replaced by
information as the commodity of transactions. It became a resource in
itself.
Like other resources, information needs to be acquired, organized and distributed, i.e. managed. However, it is a remarkable resource:
It cannot be depleted by use.
In fact, it expands and gets better with use.
Information is not scarce; it is in chronic surplus.
The scarcity is in the time to process it into knowledge.
The processing costs are borne by the information user.
Information can be owned by many at the same time.
It is shared, not exchanged, in transactions.
Therefore, one must develop different tools from those that proved useful for
natural, capital, human and technological resource management.
R. Husar 1992
DATA FLOW IMPEDIMENTS
R. Husar 1992
ASSUMPTIONS AND RATIONALE
For the foreseeable future, environmental information will grow in quantity and quality.
Individual agencies are collecting, organizing, and disseminating information relevant to their own domain of activity.
There is not enough manpower and time to digest, analyze, integrate, and ultimately make use of the accumulated environmental information.
The problem is not so much the quantity of data, but rather the form in which it is delivered; the automobile windshield delivers lots of data, yet we can still process it with ease.
What is needed is a faster way to metabolize the expanding environmental data sets.
Therefore, there is a need for a systematic effort to better understand environmental information: its characteristics, use and management, and to develop suitable data organization, manipulation, integration, and delivery systems.
A possible mechanism for accomplishing these tasks is to form a consortium of informatics-minded institutions: the EI Group.
R. Husar 1992
USER DRIVEN FLOW OF ENVIRONMENTAL INFORMATION
Diagram: data on pollutant releases, ambient levels and effects in the external world flow to three usage domains, each with its own output: science (data & models), education (educational software) and regulation (decision support systems).
In scientific, educational, and regulatory use of environmental data, there is a need for a multiplicity of compatible data sets and knowledge from various domains.
There is a set of universal values that can be added to the data, such as accessibility, common coding, and documentation. These values, in conjunction with a set of software tools, could minimize the "info-glut".
Use of data for science, education, regulation and policy requires:
• specification of the information need by the user, and
• an information system (a model, educational software or a decision support system).
R. Husar 1992
POSSIBLE ACTIVITIES OF THE EI GROUP
• EI Science: Define the domain of EI; environmental information as a resource; seek general laws of EI; information uses; driving forces.
• EI Engineering: Study the components of EI systems; creation; value-added processes; data/information/knowledge structures for storage and transmission; design of EI systems.
• Education: Develop educational materials on EI; conduct workshops and training sessions.
Work closely with others on:
• Data Integration: Collect, reconcile, integrate and document data/information/knowledge bases.
• Data Exchange: Foster exchange through depositories, data catalogs, transfer mechanisms, and nomenclature standards.
• Tools Development: Evaluate and develop software tools for the access, manipulation, and presentation of environmental information.
R. Husar 1992
REQUIREMENTS FOR THE EI GROUP
• The EI Group has to have a solid understanding of environmental data needs for science, education, policy development, regulations, and other uses.
• It must know how to translate information needs into information systems and to design data flow and transformation systems (information engineering).
• The EI Group has to be well versed in modern information science and technology as applicable to environmental informatics. Where necessary, the Group has to develop new concepts and technologies.
• It has to interface with the users of environmental information, to assure the usefulness of the effort.
• It should interface with, and utilize, existing governmental and private data sources, building on and enhancing, not competing with, those efforts.
R. Husar 1992
OUTPUT OF THE EI GROUP
• Technology: Adopt and apply evolving technologies for DBMS, GIS, hypertext, expert systems, user interfaces, multimedia and object orientation.
• Public Databases: Prepare relevant, high quality, well documented, compatible, integrated, raw, and aggregated environmental databases usable for science, education, enforcement, and other purposes. Make such high-quality, high-value environmental information available to many users.
• Software Tools: Provide "smart" data display/manipulation tools that help turn data into knowledge, e.g. GIS, Voyager, Movie, Hypertext, Video/Sound.
Federal agencies have recognized these needs and formed the Interagency Working Group on Data Management for Global Change (IWGDMGC).
The federal effort could be augmented by companion academic efforts, possibly through a consortium of informatics-minded institutions: the EI Group.
R. Husar 1992
POSSIBLE ACTIVITIES OF THE EI GROUP
• Data Integration: Collect, reconcile, integrate and document information bases.
• Data Exchange: Foster exchange of environmental data through depositories, data catalogs, transfer mechanisms, and nomenclature standards.
• Tools Development: Evaluate and develop software tools for the access, manipulation, and presentation of environmental information.
• End-Use Projects: Conduct specific research and development projects for science, education, and regulations.
• Education: Conduct workshops and training sessions, and prepare educational material for environmental informatics.
R. Husar 1992
EI GROUP OUTPUT
New Developments: Environmental Informatics
• Science: Define the domain of EI. Develop new methods to classify, organize, and create environmental knowledge.
• Engineering: Create an infrastructure and methodology for the organization, transformation, and delivery of environmental information.
• Technology: Examine the evolving technologies for Database Management Systems (DBMS), Geographic Information Systems (GIS), hypertext, expert systems, user interfaces, multimedia and object orientation. Apply and adapt these technologies to environmental information.
Provide High Grade Environmental Databases for Public Use
• Prepare relevant, high quality, well documented, compatible, integrated, raw, and aggregated environmental databases usable for science, education, enforcement, and other purposes. Make such high-quality, high-value environmental information available to many users.
Provide Software Tools
• Provide "smart" data manipulation tools that help turn data into knowledge. Provide tools for data access, manipulation, and presentation (e.g. GIS, Voyager, Movie, Hypertext, Video/Sound).
R. Husar 1992
FUNDING OPTIONS
Diagram: funding agencies (NASA, NSF, Forest Service, NOAA, EPA) connect through proposals (Global Change, Education, Regional Global Change, Regulatory) to research groups (University of New Hampshire, University of Vermont, Washington University).
R. Husar 1992
Information and Decision Making (1)
Arno Penzias: Ideas and Information
An instrument operator, traffic controller, economist ... all process information. The common thread among these activities is decision making. A decision may be simple, such as selecting ... or replacing a ..., or as complex as developing new clean air legislation. Decisions are followed by actions, and actions generally result in new information. This rather circular behavior keeps the decision process going until some goal is met, the task is finished, or the project is set aside for a time. A healthy flow of information separates winning organizations from losers. (More on the flow concept here.)
For quality information, today's consistently successful decision makers rely on a combination of man and machine. Getting the best combination requires understanding how the two fit together and the roles each may play. It also requires having an information strategy that is suitable for both the decision-maker's preferences and the problem at hand.
Knowledge is whatever information is used to make decisions.
"Deciding" is acting on information.
Managers are transformers of information. (p. 125)
R. Husar 1992
Information and Decision Making (2)
Arno Penzias: Ideas and Information
Information Flow and Decision Making
An instrument operator, traffic controller, economist ... all process information; the common thread among these activities is decision making. Decisions are followed by actions, and actions generally result in new information. This rather circular behavior keeps the decision process going until some goal is met, the task is finished, or the project is set aside for a time.
Barring blind luck, the quality of a decision cannot be any better than the quality of the information behind it.
A healthy flow of information separates winning organizations from losers. (More on the flow concept here.)
Knowledge is whatever information is used to make decisions.
"Deciding" is acting on information.
Managers are transformers of information. (p. 125)
R. Husar 1992
Information and Decision Making (3)
Arno Penzias: Ideas and Information
UNIX - Social behavior
Most applications use different formats to move information between them. UNIX programs, by contrast, communicate with each other in one consistent way. This arrangement allows the programmer to plug programs together like Lego sets, without worrying about the details of interfacing. UNIX's modularity permits users to build customized application programs out of modular parts from libraries and programs borrowed from friends. Convenient "user programmability" has the potential to unleash the creative powers of many users, instead of relying on the program creator for all the insights needed to create well-suited applications.
What next? The search for nonprocedural programming that frees users from worrying about how a given task is to be accomplished and allows them to merely state what they want.
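As an illustration of that "plug programs together" idea (a sketch of my own, not from Penzias), small text-filter functions that all consume and produce plain lines of text can be composed the way UNIX tools are piped together; the sample data are made up:

```python
# Minimal sketch: Unix-style composition in Python. Each "tool" consumes
# and produces lines of text, so any of them can be plugged together.

def grep(pattern):
    return lambda lines: (l for l in lines if pattern in l)

def sort(lines):
    return iter(sorted(lines))

def uniq(lines):
    seen = None
    for l in lines:
        if l != seen:
            yield l
        seen = l

def pipeline(lines, *stages):
    for stage in stages:
        lines = stage(lines)
    return list(lines)

log = ["ozone high", "so2 low", "ozone high", "pm10 low"]
print(pipeline(log, grep("ozone"), sort, uniq))   # ['ozone high']
```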
R. Husar 1992
Information and Decision Making (4)
Arno Penzias: Ideas and Information
Networking
To benefit from information created for different purposes, under different conditions and at different locations, users need convenient interfaces to the systems providing the data. Ultimately, the intervening networking technology that provides the interface should be flexible enough to accept information in whatever format the data source provides it and translate it into the format most suitable for human perception.
Human pattern-recognition skills, tactile sensitivities and similar interfaces to the external world attest to the massive processing power that the brain dedicates to such functions.
Evidently, the experience of evolution has demonstrated the need for a variety of sensitive interfaces. The greatest subtlety of our own human interfaces appears to be in the way we effortlessly integrate disparate sensory inputs. It is the single good feeling you get in a theater or sports arena from words, music, spectacle, and someone sitting next to you, all at the same time. In contrast, most of our present technology tends to deal with each input (the words, the visual input, etc.) as a separate entity.
User preferences and productivity needs are the driving forces behind the call for better interfaces between people and machines.
Much of the additional computer processing power will be devoted to providing better interfaces between people and machines.
R. Husar 1992
Information and Decision Making (5)
Arno Penzias: Ideas and Information
Computers and human information processing
Computers afford humans much valuable help in processing massive amounts of data. However, machines are best at manipulating numbers or symbols; people connect them to meaning.
Machines offer little serious competition in areas of creativity, integration of disparate information, and flexible adaptation to unforeseen circumstances. Here the human mind functions best. Computing systems lack a key attribute of human intelligence: the ability to move from one context to another.
Just-in-time information processing – symbiotic co-evolution
Computers and communication systems can speed up connectivity. Today, access to on-line data reduction schemes enables us to think about the results as we get them. These better tools can profoundly change the way we work. Today, we can ask questions in time to get answers, make decisions, create more powerful ideas, and generate knowledge faster.
While ideas flow from human minds, computers can help shape much of the information that leads to those ideas. By providing needed information in a timely way and in digestible form, electronic data processing and delivery systems can help someone make informed decisions.
Tools of the mind, mind amplifying: in the same way as the steam engine amplified humans' physical power, computer/communication technologies can amplify their mental powers.
In this sense, the goal of the information technology promoted here is not so much to introduce 'artificial intelligence' but to amplify the actual intelligence of humans to perform increasingly complex tasks.
R. Husar 1992
Information and Decision Making (6)
Arno Penzias: Ideas and Information
The Software Problem
Despite the explosive growth in computing, we have yet to feel the full impact of the information-processing resource that microprocessors offer. The computing power will intensify the challenge of developing ever more powerful methods of telling machines to do what we wish them to do. This requires the solution of "the software problem".
Solving "the software problem" includes producing software more quickly, with fewer bugs, at lower cost: software that is easier to understand, modify and reuse in different applications, and that gives users the ability to customize a system by modifying it.
R. Husar 1992
Spatial Time Series: Analysis-Forecasting - Control
Bennett, R.J. Pion Limited, London 1979
Description (Characterization)
• In order to understand the functioning of organisms, one has to understand:
1. the individual holons (downward face)
2. the relationship between the holons (upward) - Koestler's holarchy
• Description involves summarizing the response characteristics of the system by purely descriptive measures.
• Description is accomplished by monitoring, followed by descriptive statistics.
Explanation
• Associate and explain events that occur in space-time; build associative, causal relationships; build a model. Analysis stages (p. 20):
– Stage 1. Prior hypothesis of system structure
– Stage 2. System identification and specification
– Stage 3. Parameter estimation
– Stage 4. Check of model fit
– [Stage 5. System explanation, forecasting, control]
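To make Stages 3 and 4 concrete, here is a minimal sketch of my own (not from Bennett's book) for a hypothesized first-order autoregressive model x[t] = a*x[t-1] + e[t]: the parameter is estimated by least squares and the fit is checked through the residual variance; the series itself is synthetic.

```python
# Minimal sketch (not from Bennett 1979): Stages 3-4 for a hypothesized
# AR(1) model x[t] = a * x[t-1] + e[t], using ordinary least squares.
import random

random.seed(1)
true_a = 0.7
x = [0.0]
for _ in range(500):                       # synthetic series for illustration
    x.append(true_a * x[-1] + random.gauss(0, 1))

# Stage 3: parameter estimation (least-squares slope through the origin)
num = sum(x[t - 1] * x[t] for t in range(1, len(x)))
den = sum(x[t - 1] ** 2 for t in range(1, len(x)))
a_hat = num / den

# Stage 4: check of model fit via the residual variance
residuals = [x[t] - a_hat * x[t - 1] for t in range(1, len(x))]
rss = sum(r * r for r in residuals) / len(residuals)
print(f"estimated a = {a_hat:.3f}, residual variance = {rss:.3f}")
```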
R. Husar 1992
Moore's Law
The single most important thing to know about the evolution of technology is Moore's Law. Most readers will
already be familiar with this "law." However, it is still true today that the best of industry executives,
engineers, and scientists fail to account for the enormous implications of this central concept.
Gordon Moore, a founder of Intel Corporation, observed in 1965 that the trend in the fabrication of solid state
devices was for the dimensions of transistors to shrink by a factor of two every 18 months. Put simply,
electronics doubles its power for a given cost every year and a half.
In the three decades since Moore made his observation the industry has followed his prediction almost
exactly. Many learned papers have been written during that period predicting the forthcoming end of this
trend, but it continues unabated today. Papers projecting the end are still being written, accompanied
with impressive physical, mathematical, and economic reasons why this rate of progress cannot continue.
Yet it does.
Moore's Law is not a "law" of the physical world. It is merely an observation of industry behavior. It says that
things in electronics get better, that they get better exponentially, and that this happens very fast. Some,
even Gordon Moore himself, have conjectured that this is simply a self-fulfilling prophecy. Since every
corporation knows that progress must happen at a certain rate, they maintain that rate for fear of being
left behind.
It is also possible that Moore's Law is much broader than it appears. Possibly it applies to all of technology,
and has applied for centuries while we were unaware of its consequences or mechanisms. Perhaps it was
only possible to be explicit about technological change in 1965 because the size of transistors gave us for
the first time a quantitative measure of progress. If this is so, then we are embedded in an expanding
universe of technology, where the dimensions of the world about us are forever changing in an
exponential fashion.
The notion of exponential change is deceptively hard to understand intuitively. All of us are accustomed to
linear projection. We seem to view the world through linear glasses -- if something grows by a certain
amount this year, it will grow an equal amount the next year. But according to Moore's Law, electronics
that is twice as effective in a year and a half will be sixteen times as effective in 6 years and over a
thousand times as effective in 15 years. This implies periodic overthrows of everything we know. An
executive in the telecommunications industry recently said that the problem he confronted was that the
"mean time between decisions exceeded the mean time between surprises." Moore's Law guarantees the
frequency of surprises.
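A quick arithmetic check (my own illustration, not part of the original text) of the doubling claim quoted above:

```python
# Doubling every 18 months gives 16x in 6 years and over 1000x in 15 years.
def improvement(years, doubling_months=18):
    return 2 ** (years * 12 / doubling_months)

for years in (1.5, 6, 15):
    print(years, "years ->", round(improvement(years)))
# 1.5 years -> 2, 6 years -> 16, 15 years -> 1024
```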
R. Husar 1992
Metcalfe's Law -- Network Externalities
There is another "law" that affects the introduction of new technology -- this time in an inhibiting
fashion. Metcalfe's Law, also known to economists generally as the principle of network
externalities, applies when the value of a new communications service depends on how many
other users have adopted this service. If this is the case, then the early adopters of a given
service or product are disincented, since the value they would obtain is very small in the absence
of other users. In this situation innovation is often throttled.
Metcalfe's law often applies to communications services. A classic example, of course, is the
videotelephone. There is no value in having the first videotelephone, and it only acquires value
slowly as the population of users increases. If there are n users at a given time, then there are
n(n-1) possible one-way connections. Thus the value grows as the square of the number of users.
The value starts slowly, then reaches some point where it begins to rise rapidly. It seems as if there
needs to be a critical mass for takeoff, and that there is no way to achieve that critical mass, given the burden on
initial subscribers.
Metcalfe's Law has defeated many technological possibilities, left stillborn at the starting gate of market
penetration. Nonetheless, there are important examples of breakthroughs. For example, facsimile became a
market success, but only after decades of technological viability. Even so, facsimile is a complex story, involving
the evolution of standards, the inevitable progress of electronics, the equally-inevitable progress in the efficiency
of signal-processing algorithms, and the rise of the business need for messaging services.
Moore's and Metcalfe's laws make an interesting pair. In the communications field Moore's law guarantees the
rise of capabilities, while Metcalfe's law inhibits them from happening. Devices that appear to have little intrinsic
value without the existence of a large networked community continue to diminish in cost themselves until they
reach the point where the value and cost are commensurate. Thus Moore's Law in time can overcome Metcalfe's
Law.
R. Husar 1992
Metcalfe's Law -- Network Externalities(2)
•
Economists know it as the law of increasing returns, of network externalities, but the idea is that the more people that are
connected to a network the more valuable it is. Specifically, the value of a network grows by the square of the number of
users. The value is measured by how many people I can communicate with out there, so the total value of the network grows as
the square of the number of users. Now, what this means is that a small network has almost no value, and a large network has a
huge value. What it gives you is the lock-in phenomenon of winner takes all. You want to have the same thing as everybody
else. The idea is that you don’t want to be the first person on your block to get the plague. But when all your friends get it, you
think about getting it. The more people have it, the more you’re likely to get it and suddenly there is this capture effect where
everybody has it. This law of network externality governs so much of the business and is at the heart of the Microsoft trial. Why
does Microsoft have a monopoly? Is this a natural phenomenon that has to do with networks?
•
David Reed coined another law—Reed’s Law—that says there’s something beyond Metcalfe’s Law. There are three kinds of
networks.
– First, there's broadcast, like radio and TV, which we'll call a Sarnoff network. The value of that network is proportional to the number of people receiving the broadcast. Amazon would be this type of network, because people shop there but don't interact with each other.
– Then there's the Metcalfe's Law-type network where people talk to each other, for example, classified ads. Reed said that the important thing about the Internet is neither of those.
– The Internet exhibits a third kind of law, where communities with special interests can form. The thing about communities is that there are 2^n of them, so in a large network the value of having so many possible communities and subnetworks is the dominant factor. He predicts a scaling of networks, starting with small networks having only the Sarnoff linear factor, larger networks dominated by the square factor, and giant networks dominated by the 2^n factor of the formation of communities.
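For illustration only (my sketch, not from the text): how the three scaling rules named above, Sarnoff (n), Metcalfe (roughly n squared) and Reed (2^n), compare numerically:

```python
# Sarnoff ~ n, Metcalfe ~ n(n-1), Reed ~ 2^n (possible sub-communities).
def sarnoff(n):  return n
def metcalfe(n): return n * (n - 1)
def reed(n):     return 2 ** n

for n in (2, 10, 30):
    print(f"n={n:>2}  Sarnoff={sarnoff(n):>3}  Metcalfe={metcalfe(n):>4}  Reed={reed(n)}")
# For small n the linear term dominates; by n=30 the 2^n community term dwarfs the others.
```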
•
Napster is another example of what’s going on in information technology. First, it’s an example of the kind of network where
winner takes all. Napster is where all the songs are, so that’s where everybody else is. If Napster goes under, when they go
under, then all the little sites won’t be able to replace it because people won’t find what they want there. Napster also brings up
one of the other properties of information, which is troublesome and is going to shape our society in the coming years—the idea
that information can be copied perfectly at zero cost. That flies in the face of so much of what we believe about commerce. As
my friend Douglas Adams said to me, we protect our intellectual property by the fact that it’s stuck onto atoms, but when it’s no
longer stuck onto atoms, there is really no way to protect it. He would like to sell his books at half a cent a page, the idea being
that for every page you read, you pay him half a cent. If you get into the book 20 pages and you say, “This book is really bad,”
you don’t pay anymore. That would eliminate the “copying of information at zero cost” issue that he experiences as an
author. He says people come up to him in the street and say, “I’ve read your book 10 times,” and he says, “Yes, but you didn’t
pay 10 times.”
•
So these are some of the things that trouble me about the future of information technology. What are its limits? Will the laws of
network effects doom us all to a shared mediocrity? What will happen to intellectual property and its effect on creativity? Is it
like the railroads, or is this something fundamentally different that will last through the next century?
R. Husar 1992
The Evolution of the World Wide Web
The most important case study in communications technology is the emergence of the World Wide Web. This revolutionary concept seemed to spring
from nothingness into global ubiquity within the span of only two years. Yet its development was completely unforeseen in the industry – an industry that
had pursued successive long and fruitless visions of videotelephony, home information systems, and video-on-demand, and had spent decades in the
development of ISDN with no apparent application. It now seems incredible that no one had foreseen the emergence of the Web, but except for
intimations in William Gibson’s science fiction novel Neuromancer, there is no mention in either scientific literature or in popular fiction of this idea prior to
its meteoric rise to popularity.
There is a popular notion that all technologies take 25 years from ideation to ubiquity. This has been true of radio, television, telephony, and many other
technologies prevalent in everyday life. How, then, did the Web achieve such ubiquity in only a few years? Well, the historians argue, the Web relied on
the Internet, which in turn was enabled by the widespread adoption of personal computers. Surely this took 25 years. We might even carry this further.
The personal computer would not have been possible without the microprocessor, which depended on the integrated circuit evolution, which itself
evolved from the invention of the transistor, and so forth. By such arguments nearly every development, it seems, could be traced back to antiquity.
Although the argument about the origin and length of gestation seems an exercise in futility, the important point is that many revolutions are enabled by a
confluence of events. The seed of the revolution may not seem to lie in any individual trend, but in the timely meeting of two or more seemingly-unrelated
trends. In the case of the World Wide Web the prevalence of PCs and the growing ubiquity of the Internet formed an explosive mixture ready to ignite.
Perhaps no invention was really even required. The world was ready -- it was time for the Web. While this physical infrastructure was forming in the
world’s networks and on the desktops of users, there was a parallel evolution of standards for the display and transmission of graphical information.
HTML, the hypertext markup language, and HTTP, hypertext transmission protocol, were unknown acronyms to the majority of technical people, let
alone the lay public. But the definition of these standards that would enable the computers and networks to exchange rich mixtures of text and pictures
was taking shape in Switzerland at the physics laboratory CERN, where Tim Berners-Lee was the principal champion.
The role of standards in today’s information environment is critical, but often unpredictable. What is really important is that many users agree on doing
something exactly the same way, so that everyone achieves the benefits of interoperability with everyone else. It is exactly the same concept of network
externalities that is at work in Metcalfe’s law. An international standard can stimulate the market adoption of a particular approach, but it can also be
ignored by the market. Unless users adopt a standard it is like the proverbial tree falling in the forest without a sound. Standards are, for the most part,
advisory. User coalitions or powerful corporations can force their own standards in a fascinating and ever-changing multi-player game. Moreover, de
facto standards often emerge from the marketplace itself.
So in the middle 1980s there was a prevalent physical infrastructure with latent capabilities and an abstract agreement on standards for graphics. One
more development and two brilliant marketing ideas were required to jumpstart the Web. The development was that of Mosaic at the National Center for
Supercomputing Applications at the University of Illinois. Mosaic was the first browser, a type of program now known throughout the world for providing a
simple point-and-click user interface to distributed information. Following the initial versions of Mosaic from NCSA, commercial browsers were
popularized by Netscape and Microsoft.
The revolutionary marketing ideas needed for the Web now seem obvious and ordinary. A decade ago, however, they were not at all obvious. One idea
was to enable individual users to provide the content for the Web. The other idea was to give browsers free to everyone. Between these ideas, Metcalfe's
Law was overcome. Even though browsers initially had almost no value, since there were no pages to browse, they could be obtained electronically at no
cost. The price was directly related to the value. Thus browsers spread rapidly, just as their value began to build with the accumulation of web pages.
Allowing the users to provide content was counter to every idea that had been held by industry. The telecommunications and computer industries had
tried for a decade to develop and market remote access to information and entertainment held in centralized databases. This was the cornerstone of
what were called "home information systems" that were given trials in many cities during the 1970s and 1980s. Later, the vision pursued by the industry
was that of video-on-demand -- the dream of providing access to every movie and television show ever made, like a giant video rental store, over a cable or telephone line.
Virtually every large telecommunications company had trials and plans for video-on-demand, and the central multi-media servers required for content storage were being developed by Microsoft, Oracle, and others. The Web exemplifies some powerful current trends -- the empowerment of users, geographically-distributed content, distributed intelligence, and intelligence and control at the periphery of the network. Another principle is that of open, standard interfaces that allow users and third parties to build new applications and capabilities upon a standardized infrastructure.
It is hard to criticize industry for pursuing the centralized approach. Imagine proposing the Web to a corporate board in 1985, and describing how
R. Husar 1992
Information Technology and the Conduct of Research: The Users View
National Academy Press, Washington, D.C. 1989
•
Committee rationale: There are serious impediments to the wider and more effective use of information technology. Committee members were active researchers from outside the field of "information technology". In the absence of considerable knowledge about the field, the panel approached it by asking researchers about their experiences.
•
p 1. Information technology - the set of computer and communications
technologies - has changed the conduct of scientific, engineering and clinical
research. New technologies offer the prospect of new ways of finding,
understanding, storing, and communicating information and should increase
the capabilities and productivity of researchers. Among these new
technologies are simulations, new methods of presenting observational and
computational results as visual images, the use of knowledge-based systems
as "intelligent assistants" and more flexible and intuitive ways for people to
interact with and control computers.
•
The conduct of research: The everyday work of a researcher involves writing proposals, developing theoretical models, designing experiments, collecting data, analyzing data, communicating with colleagues, studying the research literature, reviewing colleagues' work, and writing articles. The committee looked at three particular aspects of research: data collection and analysis, communication and collaboration, and information storage and retrieval.
R. Husar 1992
Information Technology and the Conduct of Research: The Users View
National Academy Press, Washington, D.C. 1989
DATA COLLECTION AND ANALYSIS
This is one of the most widespread uses of information technology in research. Trends:
• Increased use of computers
• Dramatic increase of data storage and processing capacity
• Creation of new computer-controlled instruments that produce more data
• Increased communication among researchers using networks
• Availability of software packages for standard research tasks (e.g. statistical)
Difficulties:
1. Uneven distribution of computing resources: the haves and have-nots.
2. Finding the right software. Commercial software is often unsuitable for specialized needs. Most researchers, although they are not skilled software creators, develop their own software with the help of graduate students. Such software is designed for one purpose and is difficult to understand, maintain or transport to other computing environments.
3. Transmitting data over networks at high speed.
COMMUNICATION AND COLLABORATION
Routine word processing and electronic mail are the most pervasive form of computer use. Electronic
publishing and data communication-coordination is becoming increasingly used. Trends:
Information can be shared more quickly
New collaborative arrangements
Difficulties:
Incompatibility of technologies
Networks are anarchic.
R. Husar 1992
Information Technology and the Conduct of Research: The Users View
National Academy Press, Washington, D.C. 1989
INFORMATION STORAGE AND RETRIEVAL
How information is stored determines how accessible it is. Scientific text is stored in print (hard copy) and is accessible through the indices and catalogs of a library. Data and databases are stored mostly on computer disks.
A database along with the procedures for indexing, cataloging and searching makes up an information management system.
Difficulties:
The researcher cannot get access to data; if he can, he cannot read them; if he can read them, he does not know how good they are; and if he finds them good, he cannot merge them with other data.
Difficulty accessing data stored by other researchers. Such access permits reanalysis and replication, both essential elements of the scientific process. At present, data storage is largely an individual researcher's concern, in line with the tradition that researchers have first right to their data. The result has been a proliferation of idiosyncratic methods for storing, organizing, and indexing data, with the researcher's data essentially inaccessible to all other researchers.
Formats in data files vary from researcher to researcher, even within a discipline. These problems prohibit a researcher from merging someone else's data into his own database. Hence, considerable effort must be dedicated to converting data formats. [Not enough metadata.]
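As a small, purely hypothetical sketch of that format-conversion burden (the file layouts, site codes and numbers are invented), two researchers record the same quantity with different delimiters, date formats and units, and a conversion step is needed before the records can be merged:

```python
# Hypothetical sketch of the format-merging problem described above.
import csv, io

file_a = "site,date,so2_ppb\nSTL,1990-07-01,9.1\n"   # researcher A: CSV, ppb
file_b = "1990/07/01;BOS;0.0062\n"                    # researcher B: semicolons, ppm

records = []
for row in csv.DictReader(io.StringIO(file_a)):
    records.append({"site": row["site"], "date": row["date"],
                    "so2_ppb": float(row["so2_ppb"])})
for date, site, ppm in csv.reader(io.StringIO(file_b), delimiter=";"):
    records.append({"site": site, "date": date.replace("/", "-"),
                    "so2_ppb": float(ppm) * 1000.0})  # convert ppm -> ppb

print(records)
```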
Finally, when a researcher reads another database, he has no notion as to the quality of the data it contains. The data sets do not have enough QC information and descriptive metadata. There is a need for evaluated, high-quality databases.
Given a high-quality, well-described database, a major difficulty still exists in conducting searches. Most information searches are incomplete, cumbersome, inefficient, expensive, and executable only by specialists. Searches are incomplete because the databases themselves are incomplete. Updating is expensive because data are stored in more than one database. Searches are cumbersome and inefficient because different databases are organized according to different principles (data models).
Another difficulty in storing data and information is private ownership. By tradition, researchers hold their data privately. In general, they neither submit their data to a central archive nor make their data available via computer. Increasingly, however, in disciplines such as meteorology and the biomedical sciences, submission of primary data into databanks has become accepted as a duty. In some fields, the supporting agencies require that the data be archived in machine-readable format and that any professional article be accompanied by a disk describing the underlying data. Also, a comprehensive reference service for computer-readable data should be developed. [Master directory]
In addition, peer review of articles and proposals has been constrained by the difficulty of gaining access to the data used for the analysis. If writers were required to make their primary data available, reviewers could repeat at least part of the reported analysis. Such a review would be more stringent, would demand more effort from reviewers and raises some operational questions that need to be resolved, but it would arguably lead to more careful checking of published results.
Underlying the difficulties in information storage and retrieval are problems in the institutional management of resources. Who is to manage, maintain, and update information services? Who is to create and enforce standards? At present, the research community has three alternatives: the federal government, which manages resources such as MEDLINE and GenBank; professional societies such as the American Chemical Society, which manages the Chemical Abstracts Service; and non-profit organizations such as the Institute for Scientific Information.
R. Husar 1992
Information Technology and the Conduct of Research: The Users View
National Academy Press, Washington, D.C. 1989
Recommendations
•
Institutions supporting researchers must develop support policies, services and standards for better use of information technology. The institutions include universities, university departments, funding agencies, scientific associations, network administrators, information service providers, software vendors and professional groups.
•
The Federal Government should support software development for scientific research. The software should
meet standards of compatibility, reliability, documentation and should be made available to other researchers.
•
Data collected with government support rightfully belong in the public domain, although a reasonable time for first publication should be respected.
•
There is a pressing need for more compact form of storage
•
Tool building for non-defense software should be encouraged
•
The Federal Government should fund pilot projects on information storage and dissemination concepts in selected disciplines and implement software markets, with emphasis on the development of generic tools useful for multiple disciplines.
•
The institutions, led by the federal government, should develop an information technology network for use by
all qualified researchers.
R. Husar 1992
Measuring for Environmental Results
by William K. Reilly, EPA Journal, May-June 1989
A key element in any effort to measure environmental success is information--information on where we've been with respect to environmental
quality, where we are now, and where we want to go. Since its beginning, EPA has devoted a great deal of time, attention, and money to gathering
data. We are spending more than half a billion dollars a year on collecting, processing, and storing environmental data. Vast amounts of data are
sitting in computers at EPA Headquarters, at Research Triangle Park, North Carolina, and at other EPA facilities across the country.
But having all this information--about air and water quality, about production levels and health effects of various chemicals, about test results and
pollution discharges and wildlife habitats--doesn't necessarily mean that we do anything with it. The unhappy truth is that we have been much
better at gathering raw data than at analyzing and using data to identify or anticipate environmental problems and make decisions on how to
prevent or solve them. As John Naisbitt put it in his book Megatrends: "We are drowning in information but starved for knowledge."
Our various data systems, and we have hundreds of them, are mostly separate and distinct, each with its own language, structure, and purpose.
Information in one system is rarely transferable to another system. I suspect that few EPA employees have even the faintest idea of how much data
are available within this Agency, let alone how to gain access to it. And if that is true of our own employees, how must the public feel when they
ponder the wealth of information lurking, just out of reach, in EPA's huge and seemingly impenetrable data bases?
The strategic information effort I have described, however, will require a new attitude on the part of every EPA program manager--a willingness to
break out of the traditional constraints of media-specific and category-specific thinking.
Just as important, we must find ways to share our data more effectively with the people who paid for it in the first place: the American public.
Eventually, as EPA makes progress in standardizing and integrating its information systems, the information in those systems--apart from trade
secrets--should be as accessible as possible. Such information could be made available through on-line computer telecommunications, through
powerful new compact disc (CD-ROM) technologies, and perhaps a comprehensive annual report on environmental trends.
Sharing information with the public is an important step toward establishing a common base of understanding with the American people on
questions of environmental risk. As the recent furor over residues of the chemical Alar on apple products shows, there can be a wide gap between
public perceptions of risk and the degree of risk indicated by the best available scientific data.
EPA must share and explain our information about the hazards of life in our complex industrial society with others--with other nations, with state
and local governments, with academia, with industry, with public-interest groups, and with citizens. We need to raise the level of debate on
environmental issues and to ensure the informed participation of all segments of our society in achieving our common goal: a cleaner, healthier
environment.
Environmental data, collected and used within the strategic framework I have described, can and will make a significant contribution to
accomplishing our major environmental objectives over the next few years. Strategic data will help us:
• Create incentives and track our progress in finding ways to prevent pollution before it is generated.
• Improve our understanding of the complex environmental interactions that contribute to international problems like acid rain, stratospheric ozone depletion and global warming.
• Identify threats to our nation's ecology and natural systems--our wetlands, our marine and wildlife resources--and find ways to reduce those threats.
• Manage our programs and target our enforcement efforts to achieve the greatest environmental results.
R. Husar 1992
USES OF ENVIRONMENTAL DATA
Environmental data are used for many purposes. They may support environmental management or serve the good of society by helping to derive more general environmental knowledge.
• Provide Historical Record
• Identify Deviation from Expected Trend
• Anticipate Future Environmental Problems
• Provide Legal Record
• Support Environmental Research
• Support Environmental Education
• Support Communication
• Record Monitoring and Control Procedures
R. Husar 1992
Taylor Model
One of the specific tools employed by the staff of University Library was the Taylor Model.[5] Taylor's model is a theoretical model and is not predictive. The University Library adapted
it as a working tool and, in turn, adopted the concepts of "value-adding" and the importance of Information Use Environments as critical guiding principles in the construction of the
Library of the Future.
In Taylor's model, individuals work in information environments, and part of those environments is the problem-solving, the wrestling with the problems or questions that naturally
occur in their work. The model holds that these "problem dimensions" have characteristics that lie along a continuum, and that information itself has traits that lie along a continuum.
The combination of a user's problem dimensions and the traits of the information involved creates a picture of the "information worlds" within which groups of users work. How
effectively a given information system (in the largest sense of the word "system") meshes with the individual's or group's Information Use Environment is the measure of the degree of
success of that system.
It is inherent to Taylor's model that the degree of "value-added" by any component or service within the system is judged wholly from the user's point of view. If it isn't valuable to the user
within the user's information environment, then the service isn't valuable, period.
In order to begin to create these pictures of how our campus clientele gather and use information, the staff conducted some 1400 interviews with representative percentages of faculty, staff,
and students. The interviewees were asked open-ended questions not about how they used existing library services, but about how they gathered and used information.
The results of the analysis of the interviews showed that campus users do indeed have very different information gathering and use patterns, and that these patterns (described in Taylor's
terminology) differ along the lines of both discipline and scholarly level; i.e., the types of information required by those studying in the humanities are markedly different from those
required by engineers. In turn, while the nature of the material is consistent, there are differences even within a discipline among the levels in a user group; i.e., what a humanities faculty
member needs is significantly different from what a freshman in the same area needs. There are even noticeable differences between subject areas in the same discipline, such as the visual
arts as compared to the literary arts. The differences are not just in the types of information required, but also in how the information is gathered and used. This means that what
each group values and requires differs widely. Often what the library considered important was not what the user considered important. Findings related to major user groups, especially
faculty user groups, were taken back to those groups for discussion and confirmation.
The conclusion was that developing profiles of the information gathering and use patterns of precise user groups would be a powerful tool for prioritizing and planning. We also concluded
that "cookie cutter" services that offered essentially the same thing to all users were no longer useful or advisable. These assumptions are in the process of being applied to other areas of
library responsibility such as resource allocation, including collection management budgets, personnel deployment, training programs, etc. The need to bring library resources to bear on
individualizing library services has become a priority. In keeping with some of the findings of the LSBC, finding a means to shift the librarians' investment of their resources toward
activities associated with problem-solving, time-saving, and cost-saving is also a priority.
These and other findings form the basis for another of our specific projects, the development of a suite of virtual libraries that are discipline-specific. The term "virtual library" refers to an
environment, one in which the "client services" aspect is the most commonly referred to and the most immediately relevant to the user. In our discussion, a virtual library
environment is not access to some local or remote OPAC, nor is it access to the Internet or some specific listserver on the Internet. The client-server component of a virtual library
environment may offer all of the latter as part of client services, but as a concept a virtual library environment goes far beyond those notions. A virtual library environment is one in which
component parts combine to provide intellectual and real access to information, the value of which is framed entirely from the user's point of view, meeting the individual's unique
information needs.
Virtual libraries are not a single entity, but a host of component parts brought together in a dynamic environment. Frequently, virtual libraries are also defined as the act of remote access to
the contents and services of libraries and other information resources, combining an onsite collection of current and heavily used materials in print, microformats, and electronic form, with
an electronic network which provides access to, and delivery from, external library and commercial information and knowledge sources worldwide. In essence, the faculty member and
student are provided the "effect" of a library which is a synergy created by bringing together technologically the resources of many libraries, information services, and knowledge stores.[6]
In addition, librarians will be working collaboratively with their faculty to develop the tools to build, maintain, manipulate, and distribute these collections of data resources.
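Read as an architecture, a virtual library environment of this kind is essentially a federation: a thin client-services layer dispatches a user's request to an onsite collection, to remote library catalogs, and to commercial sources, then merges the results so the user experiences a single library. The sketch below is only a minimal illustration of that idea; the resource names and the search interface are hypothetical assumptions, not part of any system described here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Resource:
    """One component of a virtual library environment."""
    name: str
    search: Callable[[str], List[str]]  # returns matching item descriptions

# Hypothetical component resources: an onsite collection, a remote library
# catalog, and a commercial information service.
onsite = Resource("Onsite collection", lambda q: [f"print holding matching '{q}'"])
remote_catalog = Resource("Remote library catalog", lambda q: [f"catalog record for '{q}'"])
commercial = Resource("Commercial service", lambda q: [f"full-text article on '{q}'"])

def virtual_library_search(query: str, resources: List[Resource]) -> Dict[str, List[str]]:
    """Dispatch one query to every component and merge the results by source,
    giving the user the 'effect' of a single library."""
    return {r.name: r.search(query) for r in resources}

if __name__ == "__main__":
    hits = virtual_library_search("acid rain deposition", [onsite, remote_catalog, commercial])
    for source, items in hits.items():
        print(source, "->", items)
```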
[5] Robert S. Taylor, Value-added Processes in Information Systems (Norwood: Ablex Publishing Corporation, 1986). Taylor refers to his model as "rather an early presentation of a way of
thinking about the field of information science" and also as "a frame of reference for ordering what we know about information use environments..." It is a very complex, powerful and
sophisticated model. To mention some of its principal components only briefly, as we do in this paper, is to do the model and the book an injustice. Interested readers should examine the
book for a true idea of the range of Taylor's thinking.
R. Husar 1992
Value of information
• The creation of a library's catalog is a process that adds value. Taylor (1986) has suggested that there are three major processes that add value
to information: organizing, analyzing, and judgmental. Within each of these three major processes there are additional specific activities, as
shown in Figure 1.
Figure 1
o Organizing processes
  • Grouping
  • Classifying
  • Relating
  • Formatting
  • Displaying
o Analyzing processes
  • Separating
  • Evaluating
  • Validating
  • Comparing
  • Interpreting
  • Synthesizing
o Judgmental processes
  • Presenting options
  • Presenting advantages/disadvantages
Typically, catalogers perform a number of these "value-adding" processes when they are creating or editing bibliographic records for the
library's catalog. Figure 2 shows how a MARC cataloging record can add different kinds of value to an information package (Taylor, 1986).
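To make the taxonomy of Figure 1 concrete in a data-processing setting, the sketch below represents Taylor's three process categories and tags the fields of a simplified, MARC-like catalog record with the kind of value each field chiefly adds. The field tags used (245 title statement, 082 classification number, 650 subject heading, 520 summary note) are standard MARC usage, but the sample record, the ValueAdded enum, and the field-to-process mapping are illustrative assumptions, not a reconstruction of Taylor's Figure 2.

```python
from enum import Enum

# Taylor's three value-adding process categories (Figure 1), as an enum.
class ValueAdded(Enum):
    ORGANIZING = "organizing"    # grouping, classifying, relating, formatting, displaying
    ANALYZING = "analyzing"      # separating, evaluating, validating, comparing, interpreting, synthesizing
    JUDGMENTAL = "judgmental"    # presenting options, advantages/disadvantages

# A simplified, MARC-like bibliographic record (illustrative content).
record = {
    "245": "Value-added Processes in Information Systems / Robert S. Taylor",       # title statement
    "082": "025.5",                                                                  # classification number
    "650": "Information storage and retrieval systems",                             # subject heading
    "520": "Framework for analyzing information use environments.",                 # summary note
}

# Hypothetical mapping: which value-adding process each field chiefly supports.
field_value = {
    "245": ValueAdded.ORGANIZING,   # formatting/displaying the title for retrieval
    "082": ValueAdded.ORGANIZING,   # classifying and grouping with related works
    "650": ValueAdded.ORGANIZING,   # relating the item to subject groupings
    "520": ValueAdded.ANALYZING,    # synthesizing content into an evaluative summary
}

for tag, content in record.items():
    print(f"{tag}: {field_value[tag].value:11s} {content}")
```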
Taylor points to the fact that the term "value" is user-based and that the value-added approach is connected to the users and the user
environment. In consequence, an information system/service should be responsive to the use environment in order to help users make
choices, or to assist them in clarifying problems. The purpose is to develop information services with "provision of analysed, evaluated, and
interpreted information for use in a particular situation."
InfoManagement
There are a number of approaches to adding value to information already in use, but there is room for further development. In
organisations, information experts might discuss with managers their media preferences, their information use strategies and the barriers they have
encountered in using and applying knowledge. Information experts can then begin tailoring information products and services to enable
managers to make decisions, solve problems, think strategically, scan the environment and carry out other aspects of their work roles. One
approach adds value to information to help information users match the information provided by a system with their needs (Taylor, 1985). The
added values include ease of use, noise reduction, quality, adaptability, time-saving and cost-saving. Another approach is directed toward
reducing information overload for managers (34) by increasing the quality of information. Some of the values are related to the scarcity of
information and the degree of confidence a manager places in information.
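One way to picture the matching step is to score candidate information products against the added-value dimensions a particular user group cares about. In the sketch below, the product profiles, the weights for a manager-style user group, and the weighted-sum scoring rule are purely illustrative assumptions; Taylor's framework names the value dimensions but does not prescribe any particular arithmetic.

```python
# Taylor's added-value dimensions (Taylor, 1985).
VALUE_DIMENSIONS = ["ease_of_use", "noise_reduction", "quality",
                    "adaptability", "time_saving", "cost_saving"]

# Hypothetical profiles: how strongly (0-1) each information product delivers
# on each dimension.
daily_summary = {"ease_of_use": 0.9, "noise_reduction": 0.8, "quality": 0.6,
                 "adaptability": 0.3, "time_saving": 0.9, "cost_saving": 0.7}
raw_data_feed = {"ease_of_use": 0.3, "noise_reduction": 0.2, "quality": 0.9,
                 "adaptability": 0.9, "time_saving": 0.2, "cost_saving": 0.5}

# Hypothetical weights for one user group (managers scanning the environment
# under time pressure value noise reduction and time saving most).
manager_weights = {"ease_of_use": 0.25, "noise_reduction": 0.25, "quality": 0.15,
                   "adaptability": 0.05, "time_saving": 0.25, "cost_saving": 0.05}

def match_score(product: dict, weights: dict) -> float:
    """Weighted sum over the value dimensions: higher means a better fit
    between the product and this user group's information use environment."""
    return sum(weights[d] * product.get(d, 0.0) for d in VALUE_DIMENSIONS)

print("daily summary:", round(match_score(daily_summary, manager_weights), 2))
print("raw data feed:", round(match_score(raw_data_feed, manager_weights), 2))
```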
R. Husar 1992