Extrapolating Trends for Information Technology
Gio Wiederhold, Stanford University, September 1999
Based on “Trends for Information Technology”, 1999: www-db.stanford.edu/pub/gio/1999/miti.htm
Oct 1999 Gio XIT 1

Trends 1998 : 1999
• Users of the Internet: 40% (1998), 52% (1999) of the U.S. population
• Growth of Net sites (now 2.2M public sites with 288M pages)
• Expected growth in e-commerce by Internet users (new services) [BW, 6 Sep. 1999]

Segment | 1998 | 1999
books | 7.2% | 16.0%
music & video | 6.3% | 16.4%
toys | 3.1% | 10.3% (Centroid, in 1999: ~1% of total market)
travel | 2.6% | 4.0%
tickets | 1.4% | 4.2%
Overall | 8.0% | 33.0% (= $9.5 billion)

An unsustainable trend cannot be sustained [Herbert Stein]

[Chart: E-penetration of toys, % by year (axis 0 to 90), roughly tripling each year: 0.3 (1998), 1 (1999), 3 (2000), 9 (2001), 27 (2002), 81 (2003), ** (2004)]

Interactions
[Diagram: Information Technology between a technology push (Research & Innovation, Tool building, General Technology) and a pull (Consumer, Product building & marketing, Business needs, Government responsibilities)]

Assumptions
• Hardware technology will continue to lead and encourage broader usage
• Communication technology will continue to lead and become more economical
• User interfaces will improve and not be a barrier to the acceptance of technology
• Government policies will not hinder open interaction (or will not be able to)

The Problem of Information Growth:
"We are drowning in information but starved for knowledge. This level of information is clearly impossible to be handled by present means. Uncontrolled and unorganized information is no longer a resource in an information society, instead it becomes the enemy."
-- John Naisbitt, author of the 1982 bestseller Megatrends
. . .
and it’s not getting better.
Dealing with this issue requires precision:
• Helpful for casual users
• Essential for business

Precision in:
• Search for information – recall versus precision
• Relevance of information for the customer – modeling the customer
• Meaning of the information – resolving semantic mismatch
• Timeliness of the information – resolving temporal mismatch
Service model to achieve these objectives: services add value by increasing precision

Search techniques to add value
• Yahoo catalogues and organizes useful web sites.
• Junglee integrates diverse sources.
• AltaVista automatically surfs and indexes the web.
• Excite also tracks queries and classifies customers.
• Firefly provides customer control over their profiles.
• Cookies track users’ activities between sessions.
• Alexa collects webpages and their usage.
• Google ranks the reference importance of web pages.
• ...

Problems for search engines and progress
• Unsuitable source representations
  – part classification: HTML → XML
  – print formats: PostScript, Adobe PDF
  – non-text: images, sound, video
  – hidden in databases behind CGI scripts
  Being improved. Rate?
• Inconsistent semantics
  – context distinct / scope / view
• Naïve modeling of customers
  – roles & growth
Search engines cannot solve all problems

The world-wide information network and its participants
[Diagram: a network of nodes exchanging data, meta-data, and knowledge. External nodes: sources and/or sinks. Internal nodes: transformers and memory.]
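The "recall versus precision" trade-off named above can be made concrete with the standard set-based definitions: precision is the fraction of retrieved items that are relevant, and recall is the fraction of relevant items that were retrieved. This is an illustrative sketch only; the function name `precision_recall` and the page identifiers are hypothetical and are not taken from any of the search services listed.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = |retrieved & relevant| / |retrieved|
    recall    = |retrieved & relevant| / |relevant|
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query with four relevant pages.
relevant = {"p1", "p2", "p3", "p4"}
# A broad engine returns all four relevant pages plus 16 irrelevant ones.
broad = ["p1", "p2", "p3", "p4"] + [f"x{i}" for i in range(16)]
# A selective engine returns three pages, two of them relevant.
selective = ["p1", "p2", "x0"]

print(precision_recall(broad, relevant))      # (0.2, 1.0)
print(precision_recall(selective, relevant))  # precision 2/3, recall 0.5
```

The broad engine finds every relevant page but buries it in noise; the selective one is more precise but misses half. A service that "adds value by increasing precision" is moving along this trade-off on the customer's behalf.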
Understand the Architecture for Information Technology: Component Classification
[Diagram: three layers of components: Customers (top), Services (middle), Sources (bottom)]

Specifications for the components
[Diagram: the Customers / Services / Sources layers, annotated with Customer models, Catalogs, Content & Methods, and Metadata]

Functional Service Layers
• Human-computer interaction
  – user interface
• Client: application-specific code
  – service interface
• MEDIATION services: domain-specific code
  – resource access interface
• Available sources: source-specific code
  – real-world interface

Modeling: sources
• Models provide abstractions
  – abstractions represent a point of view
• Models of databases are schemas and E-R models
  – well established
  – constraints: references, uniqueness
  – scopes remain implicit
• Information systems have meta-data
  – XML has DTDs
  – under discussion, still limited
Focus on resources: metadata

Customer models
Customer is a person → one specific task
• arranging a vacation trip
  – activity, location, town, hotel by grade, flight, public transport, rented car
• arranging a business trip
  – location, hotel by plan, flight, taxi or rented car
• getting a computer for Joe Cheap
  – search CPU by price, modem, display
• getting a computer for Peter Fast
  – search CPU by speed, storage, display, network
Hierarchical alternatives at each level (evaluate, commit, rollback)

Personal vs. Customer Model
An actual person has multiple roles
• how to switch: explicitly or implicitly
• keep past contexts
Switching rate will differ
• work versus fun
• adequacy of models

Service layer
[Diagram: Customer → Service (MEDIATION) → Resource access]
• Multiple domains!
• Shared software, standards?
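The functional layers and customer models above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the author's design: the two supplier formats, the `Offer` record, and the functions `wrap_a`, `wrap_b`, `mediate`, and `recommend` are all hypothetical. Source-specific wrappers convert each supplier's format into one shared representation, the mediation layer integrates the wrapped sources, and application-specific customer models rank the result differently for Joe Cheap (by price) and Peter Fast (by CPU speed).

```python
from dataclasses import dataclass


@dataclass
class Offer:
    """Shared (mediated) representation of one computer offer."""
    vendor: str
    model: str
    price_usd: float
    cpu_mhz: int


# Source-specific data: two hypothetical suppliers using different formats.
SOURCE_A = [  # dicts; price in cents, CPU speed in MHz
    {"name": "A-100", "cents": 89900, "mhz": 400},
    {"name": "A-200", "cents": 149900, "mhz": 500},
]
SOURCE_B = [  # tuples: (model, price in dollars, CPU speed in GHz)
    ("B-1", 999.0, 0.45),
    ("B-2", 1799.0, 0.6),
]


# Source-specific code: each wrapper converts its supplier's format and units.
def wrap_a(rows):
    return [Offer("A", r["name"], r["cents"] / 100, r["mhz"]) for r in rows]


def wrap_b(rows):
    return [Offer("B", model, price, round(ghz * 1000)) for model, price, ghz in rows]


# Domain-specific mediation: combine the wrapped sources into one list.
# (This sketch only concatenates; it does not detect overlaps or conflicts.)
def mediate(wrapped_sources):
    offers = []
    for source_offers in wrapped_sources:
        offers.extend(source_offers)
    return offers


# Application-specific customer models: each one is a ranking key.
CUSTOMER_MODELS = {
    "Joe Cheap": lambda offer: offer.price_usd,   # cheapest first
    "Peter Fast": lambda offer: -offer.cpu_mhz,   # fastest first
}


def recommend(customer, offers):
    """Return the best offer according to the customer's model."""
    return min(offers, key=CUSTOMER_MODELS[customer])


offers = mediate([wrap_a(SOURCE_A), wrap_b(SOURCE_B)])
for customer in CUSTOMER_MODELS:
    best = recommend(customer, offers)
    print(customer, "->", best.vendor, best.model)
# Joe Cheap -> A A-100
# Peter Fast -> B B-2
```

Keeping the unit conversions in the wrappers means the mediator and the customer models never see supplier-specific formats; in this sketch, adding a supplier only requires a new wrapper, and adding a customer only requires a new ranking key.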
Value-added intermediate services (1)
Needs – technologies, extant and new:
• Describe customer model – build an interpretable workflow model with meta-specifications for selection
• Discover new resources – monitor and index public metadata; describe resource capabilities, contents & methods
• Select relevant resources – match available metadata and indices of resource contents to leaf nodes in the customer model
• Easy access to resources – wrap resources to make them compatible, exploit wrapper templates, skip unavailable sources
• Filter out excessive data – filters attached to the customer model; balance relevant volume and precision

Value-added intermediate services (2)
Needs – technologies, extant and new:
• Identify articulations* – match related concepts; use articulation rules to match nodes
• Match level of detail* – automatic abstraction to match sources at articulation points within the customer model
• Integrate information – attach data instances to articulation points, combine elements, link to customer model
• Omit redundant data, documents – match data for content, omit overlap, report inconsistencies in overlapping sources
• Reduce customer overload – summarize according to customer model, rank information at each level
• Inform customer – present information according to model hierarchy, consider bandwidth

Abstraction layers differ: Example in medical research
• Individual patient records
• Family-based genetic traces
• Disease-based summaries
• Genetically-linked disease data
• Ligand-based genomic segments
• Aggregated gene sequences
• 3-D configurations of segments
• Drug-gene interactions
All have their own hierarchies, roots

* Combining the models
Identify articulations
• Match customer and resource terms
  – semantic mismatches
  – thesauri, matching rules
Match level of detail
• Match customer and resource values, summarize numbers, result ranks
  – completeness, unit mismatches, text
  – indicate constraints in models
  – textual abstraction
  – input for visualization

Mediator Service Design Principle
Transform data into information:
match the user model (hierarchical) to the resource model (general network), and maintain the models

Result modes for ranking
• Databases: completeness – all the answers
• Prolog: correctness – the first answer
• Optimization: the best one – assumes all factors are known, no human decision
• Customer: wants choices, explanation, background

Ranking: qualitatively significant differences, in terms of the customer model
Plan 1. UA59 dep. Wash. Dulles 17:10, arr. LAX 19:49
Plan 2. AA75 dep. Wash. Dulles 18:00, arr. LAX 20:10
Plan 3. UA119 dep. Wash. Dulles 9:25, arr. LAX 12:00
• Busy Joe: P1 = P2, P3
• Speedy Mike: P2, P1 = P3
• Greedy Pete: P1 = P3, P2

Mediation for Quality
User model: BEST = f(S, C, T): low cost, rapid response, reliable delivery, trustworthiness
S = source reliability; C = confidence; T =
Assessments: S1 = .8, S2 = .9, S3 = 8
Estimates:
• source S1: C1 = 5 ± 1, T1 = 100 ± 160
• source S2: C2 = 8 ± 1, T2 = 70 ± 30
• source S3: C3 = 10 ± 1, T3 = 50 ± 80

Computing Projections
For decision-making: not just past data, but next-period alternatives and subsequent periods
[Diagram: a tree of alternatives branching from "now" into the future along a past → now → future time axis, with branch probabilities such as 0.25, 0.5, 0.6, 0.3, 0.05, 0.07, 0.1, 0.4, 0.2]
Integrate simulation results into information systems: SQL → SimQL

Extending the support into the future
Must manage multiple projected futures; novel tools are needed to help the decision maker:
1. Assess the likelihood of a branch being taken (if not controllable)
2. Compute probabilities into the future, up to desired/final endpoints
3. Compute results at each node, by backtracking from the endpoints and considering the probabilities
4. Compare the associated costs and benefits for the alternatives at any future time
5. Recalculate to get new, better values, with less uncertainty
• Trim or summarize unlikely branches to reduce the complexity
• Prune to the current state and delete all but one actual path

Architecture instances
[Diagram: Applications . . . over Mediators . . . over Resources . . . ; resources include computational resources]

Assigning maintenance responsibility
Sources:
a. Source data quality – supplier database, files, or web pages
b. Interface to the source – wrapper, by supplier or vendor for supplier
Services:
c. Source selection – expert specialist in mediator
d. Source quality assessment – customer input to mediator
e. Semantic interoperation – specialist group providing input to the mediator
f. Consistency and metadata information – mediator service operation or warehouse
Customers:
g. Informal, pragmatic integration – client services with customer input
h. User presentation formats – client services with customer input

Summary
To sustain the trend:
1. The value of the results has to keep increasing: precision and relevance, not volume
2. Value is provided by experts, encoded as models of diverse resources and customers
Problems to be addressed: mismatches, quality, clear models, temporal extensions, maintenance

Technology Transition
• Economic drivers have to be considered.
• Three-party model
  – Industry: need-based invention
  – Academia: formalization
  – Innovators: new technology
• New service models provide new opportunities
  – supply innovative tools to industry
  – supply specialized information to industry

Understanding the other parties
The motivation of each party is profit and loss avoidance:
• Industry: investment – payoff to stockholders / retain value / stable
• Academia: prestige (leads to continuing funding) – visibility, not stability or reliability
• Innovative businesses: leverage (not sustainable) – low downside cost, high upside risk; change expected and needed
• Government research: technology dissemination & shelving service?

Research economy transfer paths
[Diagram: flows of people and results among Government, Research, Teaching, Product suppliers (PS), Tool suppliers (TS), and Customers; labels include taxes, products, "high volume" versus "high-value, modest volume"]

Operating Systems
• Microsoft Windows, for personal computers and workstations: proprietary product, no obligations to hardware, rapidly adapted to new requirements
• UNIX, an open system: consensus takes time
• SUN servers
• LINUX clients and servers: free, low entry cost
• ….
• Mainframe operating systems: little growth expected
• VMS (COMPAQ): reliable 24 hours / 7 days

1 Pre-competitive development
2 Integration and marketing
3 Problem: asynchrony
  3.1 Industry-driven research
  3.2 Curiosity-driven research
  3.3 Fundamental research
  3.4 Transition windows
4 Transition agents
  4.1 Link academic researchers to industry
  4.2 Link academic and industrial research
  4.3 Startup companies
  4.4 Incubator services
  4.5 Research stores
    – Commercial Technology Transfer Company
    – Governmental Technology Transfer Institute
    – Other candidate organization models for research stores
5 Research venues and technology transfer
6 Summary

Alternative solutions
• A super database
  – unwieldy
  – obsolete before it is established
• Distributed, free-standing databases (today)
  – awkward for sharing information (much knowledge derives from the intersections)
  – hyperlinks and shared references allow navigation
• Distributed databases with a single standard allowing interoperation
  – standards follow progress, they cannot lead it
• Distributed databases with published formats, with mediators to isolate projects from resources
  – requires rapid adaptation to keep up with resources (but the number of resources per project will be limited)

Paying
• Free goods (such as information), supported by advertisers
• The referred service pays for references made
• After contact and selection, directly by credit card, at some processing overhead and delay
  – Customer trust for tolerable losses
  – Audited by mediator; violators are only blacklisted
  – Escrow for substantial value: more delay
• Very small transactions use wallets
• Subscriptions for long-term interactions
Risk allocation: a. risk is assumed by the vendor; b. risk is assumed by the customer