COMS E6125 Web-enHanced Information Management (WHIM)

Prof. Gail Kaiser

Spring 2009

20 January 2009 Kaiser: COMS E6125 1

What is this course about?

• Information Management on/for the Web

• General hypertext and markup

• Web protocols and mechanics

• Structuring Web content

• Developing Web applications

→ Lectures will survey the basics

What is this course NOT about?

• Networking

• Internet services

• Network security

• User interfaces

• Multimedia

• Mobile computing

• The latest greatest technology from Facebook, Google, Yahoo, <fill in here>

→ Students choose their own advanced topics

Website

• Course website: http://bank.cs.columbia.edu/classes/cs6125

– Syllabus, lecture slides (ppt and pdf), assignments, everything else you need to know about the course

• We will also use CourseWorks

– Assignment submission, optional discussion board (e.g., to find team members for project)

Teaching Staff

• Instructor: Prof. Gail Kaiser, kaiser+6125@cs.columbia.edu (note the +6125, it's important!)

• TAs: Mr. Swapneel Sheth, sks2142@columbia.edu, and Mr. Suman Srinivasan, sumans@cs.columbia.edu

• Check website for office hours

Textbook

• None - some sources will be referenced, but generally you should find your own technical materials

Course Organization

• First half of class sessions will consist of overview/survey lectures (breadth)

• Second half of class sessions will consist of student presentations

• Students should choose one or more relevant areas of interest for their paper and project (depth)

Course Grading/Workload

• 45% individual research paper

• 45% individual or team project

• 10% individual presentation

First Assignment: Paper Proposal

• Sketch the topic you have in mind

• Include tentative reference list (specific background reading to learn about the topic)

• Long list of suggested topics at http://bank.cs.columbia.edu/classes/cs6125/topics.htm, or invent your own

First Assignment: “Goal” of Paper

• Do not simply survey some topic

• Compare this to that, argue a position in favor or against something, evaluate something according to some meaningful criteria, etc.

• Explain why your topic is relevant to this course (this may be obvious to you but may not be to me)

First Assignment: Background Reading

• List some specific materials you intend to read to learn about the topic

– Scholarly papers from conferences or journals

– White papers

– Third-party reviews or commentaries (blogs ok)

– System documentation

– Specifications of "standards" (or proposed standards)

– Not just advertising or publicity brochures

– Not Wikipedia

• Should include materials from at least two different points of view (e.g., do not take all your references from the same website)

First Assignment: Logistics

• Due Monday, February 2nd, by 5pm

• Maximum two pages (not including optional figures and required reference list)

• Submit by posting in Paper Proposal folder on CourseWorks

• Must be in a format I can read, which means PDF, Word, HTML, or plain ASCII text (with all figures embedded or viewable in a browser without special “plugins”)

Upcoming Assignments: Paper

• Paper outline due Monday, February 16th

• Full paper due Friday, March 13th

Heads Up on Project

• Project Proposal due Monday, March 23rd

• Optionally work in teams (see http://bank.cs.columbia.edu/classes/cs6125/team_advice)

• Build a new system or extend an existing system – submit code, demo system

• OR evaluate/compare one or more existing system(s) – submit procedures and findings, demo evaluation harness

• You may "continue" your paper topic towards the project, or do something entirely different

Heads Up on Presentation

• Individual ~10 minute talk in class during one of last few class sessions

• No formal proposal; just let me know the topic when you schedule the presentation

• May be based on paper, project, or some other topic (in the case of team members all presenting on the same project, please coordinate to avoid redundancy and discuss your plans with me in advance)

Today’s Topic

• History of Hypertext and the Web

→ Warning: The upcoming content is not terribly technical; it is intended just to introduce the historical context

In The Beginning…

• “As We May Think”, by Vannevar Bush, in The Atlantic Monthly, July 1945

• Recommended that scientists work on inventing machines for storing, organizing, retrieving and sharing the increasingly vast amounts of human knowledge

• He targeted physicists and electrical engineers - there were no computer scientists in 1945

“Memex” Proposal

• MEMEX = MEMory EXtension

• Create and follow “associative trails” (links) and annotations between microfilm documents

• Technically based on the “rapid selectors” that Bush built in the 1930s to search microfilm

• Conceptually based on human associative memory rather than indexing

“Memex” Design

Who was Vannevar Bush?

• MIT Prof, advisor to US President Roosevelt

• Developed devices intended to detect submarines during World War I

• Organized government support of academic research for military applications in World War II

• Called for continued federal support of academic research after the war, e.g., National Science Foundation (NSF)

• Memex (never implemented) was one possible suggestion for what scientists should “do” for the government now that the war was over

And then came “Hypertext”

• Ted Nelson coined the term ~1965

• The prefix hyper- ("over" or "beyond") signifies the overcoming of the linear constraints of written text

NLS Too Early

• Doug Engelbart at SRI starting ~1962

• Developed oN Line System (NLS) to cross-reference research papers for sharing among geographically distributed researchers

• Invented the computer mouse

• Invented WYSIWYG word processing

• Invented windows-based desktop, including on-line help system

• Invented online teleconferencing

• All publicly demonstrated in 1968

"I don't know why we call it a mouse. It started that way and we never changed it."

Xanadu Too Vaporware

• Ted Nelson defined Xanadu, a sophisticated all-encompassing hypermedia publishing system

• Bi-directional links, versioning (no deletion, no broken links)

• Excruciatingly concerned with copyrights, permissions and micropayments for linking and access (“transcopyright”)

• Rails against Web that “trivializes” Nelson’s hypertext model and states “We fight on.”

• Version 1.0 finally released in June 2007 (http://xanarama.net/)

Many Others Followed

• Numerous academic and a few commercial hypertext systems from ~1967 - 1980s

– Brown University Hypertext Editing System

– CMU ZOG

– Xerox PARC NoteCards

– Apple HyperCard

• Used for manuals/handbooks, museum exhibits, education, collaborative work

• First ACM Hypertext Conference in 1987

Problematic Issues

• Some systems “closed”, with links directly embedded in documents (markup)

• Others “open”, with separate linkbase (database of anchors and resource locators)

• Search (information retrieval)

• Cognitive overhead for authors

• Disorientation for users (“lost in hyperspace”)

• Scaling to large numbers of documents and links, and/or to large numbers of users
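
To make the “closed” versus “open” distinction concrete, here is a minimal Python sketch of a separate linkbase. The names (`Link`, `Linkbase`, and the document ids) are hypothetical illustrations, not taken from any system named on these slides. A closed system would write each target into the document text itself; here the links live in their own table, so the documents can stay read-only and a link can be looked up from either end.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    source: str  # id of the document holding the anchor
    anchor: str  # text span the link is attached to
    target: str  # resource locator of the destination


class Linkbase:
    """Links stored apart from the documents they connect (the "open" style)."""

    def __init__(self):
        self._links = []

    def add(self, source, anchor, target):
        self._links.append(Link(source, anchor, target))

    def links_from(self, doc_id):
        return [link for link in self._links if link.source == doc_id]

    def links_to(self, doc_id):
        # Backlinks are a simple lookup here; with embedded markup they
        # would require scanning every document for references.
        return [link for link in self._links if link.target == doc_id]


# Hypothetical documents and anchors, for illustration only.
linkbase = Linkbase()
linkbase.add("manual", "rapid selector", "memex-notes")
linkbase.add("handbook", "Memex", "memex-notes")

backlink_sources = sorted(link.source for link in linkbase.links_to("memex-notes"))
print(backlink_sources)
```

The cost of this design is the one the slide names: every anchor and link must be maintained in the shared linkbase, which is part of why scaling was hard.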

Proposed “Dexter” Standard

• Series of academic workshops 1988-1990

• For comparison and interchange

• 3 layers: run-time, storage and within-component (anchors)

• Computed links, multi-headed links, links to links, typed links, links as components

• Extended to support multimedia synchronization, link context

→ Still didn’t scale

Open Hypermedia Systems

• Early to Mid-1990s

• Standard “open hypermedia protocol” (link service) supporting client interoperability

• Anchors and links maintained in a linkbase separate from the [read-only] documents

• Typically “wrap” document editors and viewers to define anchors and follow links

• No distinction between authors and users

→ Still didn’t scale

And finally … the World Wide Web

• A big step backwards?

– Embedded links (markup), unidirectional, untyped, not application-independent, etc.

– Readers cannot easily be authors, no private or group annotations and links over read-only (to readers) documents

• But it scaled, perhaps because it indeed allowed dangling links (a hypertext no-no)

• And attracted authors and users like no other hypertext system before or since

Information Management: A Proposal

• By Tim Berners-Lee, then at CERN (the European particle physics laboratory near Geneva, Switzerland)

• TBL had earlier (~1980) developed another hypertext system for CERN, Enquire, which was little-used and eventually “lost” (manual still available)

• Proposal written March 1989, more widely circulated May 1990

• Originally called “Mesh” or “Information Mesh”

Information Management: A Proposal

• Persuaded CERN management to fund development of a “global” hypertext system

• Goal to manage information about accelerators and physics experiments as projects evolved and staff turned over

• Development started October 1990, by TBL, Robert Cailliau and a couple of visiting students, now called “World Wide Web”

Problem: Information Loss

• CERN involved several thousand people, with very high turnover, organized into a multiply connected "web" whose interconnections evolve

• Information about what physics experiment facilities (including software) existed and how to find out about them traveled informally

• Much information never recorded, or too hard or time-consuming to find

Examples

• Where is this module used?

• Who wrote this code? Where does he/she work?

• What documents exist about that concept?

• Which laboratories are included in that project?

• Which systems depend on this device?

• What documents refer to this one?

Predictions

“CERN is a model in miniature of the rest of world in a few years time.

CERN meets now some problems which the rest of the world will have to face soon. In 10 years, there may be many commercial solutions to the problems above, while today we need something to allow us to continue.”

Solution: Linked Information

• Pool of information that could grow and evolve with the organization and the projects it describes

• A "web" of nodes with links between them is far more useful than a fixed hierarchical system

Example Nodes

• People

• Groups of people

• Software modules

• Projects

• Concepts

• Documents

• Types of hardware

• Specific hardware objects

Example Links

• A depends on B

• A is part of B

• A made B

• A refers to B

• A uses B

• A is an example of B
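
The node and link types listed above can be pictured as a small typed graph. The following Python sketch is only an illustration under stated assumptions: the node names and the `sources_of` helper are invented here, and the proposal does not prescribe any particular data structure. It shows how typed links answer questions such as "Which systems depend on this device?" or "What documents refer to this one?".

```python
from collections import defaultdict

# (source, link type, target) triples; every node name is made up.
links = [
    ("tracker-sw", "depends on", "magnet-controller"),
    ("cooling-sw", "depends on", "magnet-controller"),
    ("tracker-sw", "is part of", "detector-project"),
    ("alice", "made", "tracker-sw"),
    ("design-note", "refers to", "tracker-sw"),
]

# Index links by (target, type) so that "who points here?" is one lookup.
incoming = defaultdict(list)
for source, link_type, target in links:
    incoming[(target, link_type)].append(source)


def sources_of(target, link_type):
    """Return the nodes that point at `target` with a link of `link_type`."""
    return sorted(incoming.get((target, link_type), []))


dependents = sources_of("magnet-controller", "depends on")
authors = sources_of("tracker-sw", "made")
referrers = sources_of("tracker-sw", "refers to")
print(dependents, authors, referrers)
```

Because the link type is part of the index key, the same structure answers "who wrote this code?" and "what refers to this?" without mixing the two kinds of relationship.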

General Requirements

• System must allow any sort of information to be entered

• Another user must be able to find the information, sometimes without knowing what he/she is looking for

• System should be aware of the generic types of the links between items (e.g., dependencies), and the types of nodes (people, things, documents…) without imposing any limitations

System Requirements

• Remote access across networks

• Platform heterogeneity

• Non-Centralization - allow existing systems to be linked together without requiring any central control or coordination

• Access to existing data and databases in hypertext form

• Private links - one must be able to add one's own private links to and from public information, and also annotate links as well as nodes privately

Bells and Whistles: Graphics

• “Storage of ASCII text, and display on 24x80 screens, is in the short term sufficient, and essential. Addition of graphics would be an optional extra with very much less penetration for the moment.”

Bells and Whistles: Automatic Data Analysis

• Search for anomalies such as undocumented software or divisions which contain no people

• Generate lists of people or devices for other purposes, such as mailing lists of people to be informed of changes

• Look at the topology of an organization or a project, and draw conclusions about how it should be managed, and how it could evolve
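
As a rough illustration of the anomaly search described above, the Python sketch below flags software that no document refers to and divisions that no person belongs to. All names and the triple format are assumptions made for this example; the proposal describes the idea only in prose.

```python
# Hypothetical typed links as (source, link type, target) triples.
links = [
    ("alice", "is part of", "controls-division"),
    ("user-guide", "refers to", "tracker-sw"),
]

# Hypothetical node sets, grouped by node type.
software = {"tracker-sw", "cooling-sw"}
divisions = {"controls-division", "archive-division"}

# Simplification: any "refers to" link counts as documentation, and any
# "is part of" link into a division counts as a member.
documented = {target for _, kind, target in links if kind == "refers to"}
staffed = {target for _, kind, target in links if kind == "is part of"}

undocumented_software = sorted(software - documented)
empty_divisions = sorted(divisions - staffed)
print(undocumented_software, empty_divisions)
```

Both checks are plain set differences, which is possible only because the system knows the types of nodes and links rather than treating every link alike.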

Bells and Whistles: Visualization

• “Imagine making a large three-dimensional model, with people represented by little spheres, and strings between people who have something in common at work. Now imagine picking up the structure and shaking it, until you make some sense of the tangle: perhaps, you see tightly knit groups in some places, and in some places weak areas of communication spanned by only a few people. Perhaps a linked information system will allow us to see the real structure of the organisation in which we work.”

Bells and Whistles: Live Links

• Allow documents to be linked into "live" data so that every time the link is followed, the information is retrieved

• The data to which a link (or a hot spot) refers may be very static, or it may be temporary

• [If one sacrifices portability], make following a link fire up a special application, so that diagnostic programs, for example, could be linked directly into the maintenance guide

Non-Requirements

• “Discussions on Hypertext have sometimes tackled the problem of copyright enforcement and data security. These are of secondary importance at CERN, where information exchange is still more important than secrecy. Authorisation and accounting systems for hypertext could conceivably be designed which are very sophisticated, but they are not proposed here. In cases where reference must be made to data which is in fact protected, existing file protection systems should be sufficient.”

Specific Applications

• Development Project Documentation

• Document Retrieval

• Personal Skills Inventory

Original Vision (CERN ’89)

Client/Server Architecture

Gateways to Existing Data

Implementation

• Originally combination browser and editor only on NeXT cubes

• Later line-mode browser, GUI browsers for X and Mac

• First web server was nxoc01.cern.ch, later called info.cern.ch

Deployment

• Line-mode browser released for use outside CERN in August 1991

• Submission to 1991 ACM Hypertext conference rejected

• Various GUI browsers released in 1992

• Mosaic released by NCSA in September 1993, developed by undergraduate Marc Andreessen (who later co-founded Netscape)

Load on info.cern.ch

But info.cern.ch preexisted the Web

• Internet != Web

• 1962 military packet switching network invented (on paper)

• 1969 ARPANET comes on line with 4 nodes

• 1976-1983 UUCP, BITNET, CSNET, etc.

• 1985 Merged Internet with 2k nodes

• 1988 56k nodes, 1992 1.1M nodes, 1996 15M nodes, …

Internet Information Access

• Lots of anonymous FTP resources available by the mid-1970s, but you had to know where to look

• 1989 McGill’s Archie (ARCHIvE) finds files by name using regular expressions

• 1990 Thinking Machines’ WAIS (Wide Area Information Servers) adds content indexing

• 1991 U. Minn.’s Gopher (“go for” and school mascot) adds friendly menu-based UI, augmented by U. Nevada’s VERONICA spider indexing
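
Searching a file index by name with regular expressions, as attributed to Archie above, can be sketched in a few lines of Python. This is not Archie's actual implementation: the host names, paths and the `find_files` helper are invented for illustration, and a real index would have been gathered from many anonymous FTP servers.

```python
import re

# Invented (host, path) listing standing in for a gathered index.
listing = [
    ("ftp.example.org", "/pub/gnu/emacs-18.59.tar.Z"),
    ("ftp.example.org", "/pub/docs/README"),
    ("ftp.example.net", "/mirrors/emacs/emacs-19.22.tar.gz"),
]


def find_files(pattern, entries):
    """Return entries whose file name (last path component) matches `pattern`."""
    regex = re.compile(pattern)
    return [
        (host, path)
        for host, path in entries
        if regex.search(path.rsplit("/", 1)[-1])
    ]


matches = find_files(r"^emacs-\d+\.\d+\.tar\.(Z|gz)$", listing)
for host, path in matches:
    print(host, path)
```

Note that this matches names only. Looking inside files is the content indexing that the slide credits to WAIS.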

Money and Politics

• University of Minnesota announced that they would begin to charge licensing fees for Gopher's use in February 1993

• US government’s Acceptable Use Policy previously prohibiting commercial use of the Internet “re-interpreted” in March 1993

• CERN's directors announce in April 1993 that WWW technology would be freely usable by anyone, with no fees payable to CERN

What happened to Tim Berners-Lee?

• Left CERN in 1994 for MIT to become the Director of the new World Wide Web Consortium (W3C)

• Technically a research staff member, not an MIT professor (only has a BA, in Physics)

• Knighted by Queen Elizabeth II in 2004, numerous other honors and awards

• Never got rich…

Summary

• Many people trace the Web’s origins to Vannevar Bush, although there were other early attempts to introduce something hypertext-like over microfiche and/or paper documents

• TBL’s World Wide Web “succeeded” whereas numerous earlier and contemporary hypertext systems “failed” because it was simple and scalable without trying to be perfect

• Many fancy ideas from other hypertext work are being re-introduced on top of the Web (Web 2.0)

Reminders

• Paper proposal due February 2nd

• Project proposal due March 23rd

• Paper must be individual, projects may optionally be done in teams
