Prof. Gail Kaiser
Spring 2009
20 January 2009 Kaiser: COMS E6125 1
• Information Management on/for the Web
• General hypertext and markup
• Web protocols and mechanics
• Structuring Web content
• Developing Web applications
Lectures will survey basics
• Networking
• Internet services
• Network security
• User interfaces
• Multimedia
• Mobile computing
• The latest, greatest technology from Facebook, Google, Yahoo, <fill in here>
Students choose their own advanced topics
• Course website: http://bank.cs.columbia.edu/classes/cs6125
– Syllabus, lecture slides (ppt and pdf), assignments, everything else you need to know about the course
• We will also use CourseWorks
– Assignment submission, optional discussion board (e.g., to find team members for project)
• Instructor: Prof. Gail Kaiser, kaiser+6125@cs.columbia.edu (note the +6125, it's important!)
• TAs: Mr. Swapneel Sheth, sks2142@columbia.edu, and Mr. Suman Srinivasan, sumans@cs.columbia.edu
• Check website for office hours
• None - some sources will be referenced, but generally you should find your own technical materials
• First half of class sessions will consist of overview/survey lectures (breadth)
• Second half of class sessions will consist of student presentations
• Students should choose one or more relevant areas of interest for their paper and project (depth)
• 45% individual research paper
• 45% individual or team project
• 10% individual presentation
• Sketch the topic you have in mind
• Include a tentative reference list (specific background reading to learn about the topic)
• Long list of suggested topics at http://bank.cs.columbia.edu/classes/cs6125/topics.htm, or invent your own
• Do not simply survey some topic
• Compare this to that, argue a position in favor or against something, evaluate something according to some meaningful criteria, etc.
• Explain why your topic is relevant to this course (this may be obvious to you but may not be to me)
• List some specific materials you intend to read to learn about the topic
– Scholarly papers from conferences or journals
– White papers
– Third-party reviews or commentaries (blogs ok)
– System documentation
– Specifications of "standards" (or proposed standards)
– Not just advertising or publicity brochures
– Not wikipedia
• Should include materials from at least two different points of view (e.g., do not take all your references from the same website)
• Due Monday, February 2nd, by 5pm
• Maximum two pages (not including optional figures and required reference list)
• Submit by posting in Paper Proposal folder on CourseWorks
• Must be in a format I can read, which means PDF, Word, HTML, or plain ASCII text (with all figures embedded or viewable in a browser without special “plugins”)
• Paper outline due Monday, February 16th
• Full paper due Friday, March 13th
• Project proposal due Monday, March 23rd
• Optionally work in teams (see http://bank.cs.columbia.edu/classes/cs6125/team_advice)
• Build a new system or extend an existing system – submit code, demo system
• OR evaluate/compare one or more existing system(s) – submit procedures and findings, demo evaluation harness
• You may "continue" your paper topic towards the project, or do something entirely different
• Individual ~10 minute talk in class during one of last few class sessions
• No formal proposal; just tell me the topic when you schedule the presentation
• May be based on paper, project, or some other topic (in the case of team members all presenting on the same project, please coordinate to avoid redundancy and discuss your plans with me in advance)
• History of Hypertext and the Web
Warning: The upcoming content is not terribly technical, intended just to introduce the historical context
• “As We May Think”, by Vannevar Bush, in The Atlantic Monthly, July 1945
• Recommended that scientists work on inventing machines for storing, organizing, retrieving and sharing the increasingly vast amounts of human knowledge
• He targeted physicists and electrical engineers - there were no computer scientists in 1945
• MEMEX = MEMory EXtension
• Create and follow “associative trails” (links) and annotations between microfilm documents
• Technically based on the “rapid selectors” that Bush built in the 1930s to search microfilm
• Conceptually based on human associative memory rather than indexing
• MIT Prof, advisor to US President Roosevelt
• Developed devices intended to detect submarines during World War I
• Organized government support of academic research for military applications in World War II
• Called for continued federal support of academic research after the war, e.g., National Science Foundation (NSF)
• Memex (never implemented) was one possible suggestion for what scientists should “do” for the government now that the war was over
• Ted Nelson coined term ~1965
• The prefix hyper- ("over" or "beyond") signifies the overcoming of the linear constraints of written text
• Doug Engelbart at SRI starting ~1962
• Developed oN Line System (NLS) to cross-reference research papers for sharing among geographically distributed researchers
• Invented the computer mouse
• Invented WYSIWYG word processing
• Invented windows-based desktop, including on-line help system
• Invented online teleconferencing
• All publicly demonstrated in 1968
"I don't know why we call it a mouse. It started that way and we never changed it."
• Ted Nelson defined Xanadu, a sophisticated, all-encompassing hypermedia publishing system
• Bi-directional links, versioning (no deletion, no broken links)
• Excruciatingly concerned with copyrights, permissions and micropayments for linking and access (“transcopyright”)
• Rails against the Web, which “trivializes” Nelson’s hypertext model, and states “We fight on.”
• Version 1.0 finally released in June 2007 (http://xanarama.net/)
• Numerous academic and a few commercial hypertext systems from ~1967 - 1980s
– Brown University Hypertext Editing System
– CMU ZOG
– Xerox PARC NoteCards
– Apple Hypercard
• Used for manuals/handbooks, museum exhibits, education, collaborative work
• First ACM Hypertext Conference in 1987
• Some systems “closed”, with links directly embedded in documents (markup)
• Others “open”, with separate linkbase (database of anchors and resource locators)
• Search (information retrieval)
• Cognitive overhead for authors
• Disorientation for users (“lost in hyperspace”)
• Scaling to large numbers of documents and links, and/or to large numbers of users
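The “open” approach above can be illustrated with a rough sketch: anchors and links live in a separate linkbase rather than inside the documents. This is only a minimal Python sketch; the names `Anchor`, `Link`, and `Linkbase`, the character-offset anchoring, and the file names are assumptions made for illustration and are not taken from any particular hypertext system.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Anchor:
    """A span inside a document (hypothetical scheme: character offsets)."""
    doc: str    # resource locator of the document
    start: int
    end: int


@dataclass(frozen=True)
class Link:
    source: Anchor
    target: Anchor


@dataclass
class Linkbase:
    """Links stored apart from the documents, which can stay read-only."""
    links: list = field(default_factory=list)

    def add(self, source, target):
        self.links.append(Link(source, target))

    def links_from(self, doc):
        return [l for l in self.links if l.source.doc == doc]

    def links_to(self, doc):
        # Because links are stored separately, "what points here?" is a plain query.
        return [l for l in self.links if l.target.doc == doc]


linkbase = Linkbase()
linkbase.add(Anchor("manual.txt", 10, 25), Anchor("spec.txt", 0, 40))
linkbase.add(Anchor("notes.txt", 5, 12), Anchor("spec.txt", 100, 120))
```

In a “closed” system the same links would instead be written into `manual.txt` and `notes.txt` as markup, so finding everything that points at `spec.txt` would require scanning every document.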
• Series of academic workshops 1988-1990
• For comparison and interchange
• 3 layers: run-time, storage and within-component (anchors)
• Computed links, multi-headed links, links to links, typed links, links as components
• Extended to support multimedia synchronization, link context
Still didn’t scale
• Early to Mid-1990s
• Standard “open hypermedia protocol” (link service) supporting client interoperability
• Anchors and links maintained in a linkbase separate from the [read-only] documents
• Typically “wrap” document editors and viewers to define anchors and follow links
• No distinction between authors and users
Still didn’t scale
• A big step backwards?
– Embedded links (markup), unidirectional, untyped, not application-independent, etc.
– Readers cannot easily be authors, no private or group annotations and links over read-only (to readers) documents
• But it scaled, perhaps because it indeed allowed dangling links (a hypertext no-no)
• And attracted authors and users like no other hypertext system before or since
• By Tim Berners-Lee, then a software engineer (trained as a physicist) at CERN (the European particle physics laboratory near Geneva, Switzerland)
• TBL had earlier (~1980) developed another hypertext system for CERN, Enquire, which was little-used and eventually “lost” (manual still available)
• Proposal written March 1989, more widely circulated May 1990
• Originally called “Mesh” or “Information Mesh”
• Persuaded CERN management to fund development of a “global” hypertext system
• Goal to manage information about accelerators and physics experiments as projects evolved and staff turned over
• Development started October 1990, by TBL, Robert Cailliau and a couple of visiting students, now called “World Wide Web”
• CERN involved several thousand people, with very high turnover, organized into a multiply connected "web" whose interconnections evolve
• Information about what physics experiment facilities (including software) existed and how to find out about them traveled informally
• Much information never recorded, or too hard or time-consuming to find
• Where is this module used?
• Who wrote this code? Where does he/she work?
• What documents exist about that concept?
• Which laboratories are included in that project?
• Which systems depend on this device?
• What documents refer to this one?
“CERN is a model in miniature of the rest of world in a few years time.
CERN meets now some problems which the rest of the world will have to face soon. In 10 years, there may be many commercial solutions to the problems above, while today we need something to allow us to continue.”
• Pool of information that could grow and evolve with the organization and the projects it describes
• "web" of nodes with links between them is far more useful than a fixed hierarchical system
• People
• Groups of people
• Software modules
• Projects
• Concepts
• Documents
• Types of hardware
• Specific hardware objects
• A depends on B
• A is part of B
• A made B
• A refers to B
• A uses B
• A is an example of B
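Node types and link types like those listed in the proposal can be sketched as a small typed graph. The following is a minimal Python sketch, not the proposal's design: the class `Web`, its methods, and the sample names (`alice`, `tracker`, `detector`, `design-note`) are all hypothetical.

```python
class Web:
    """A tiny typed graph: every node has a type, every link has a type."""

    def __init__(self):
        self.node_types = {}   # node name -> node type
        self.links = []        # (source, link_type, target) triples

    def add_node(self, name, node_type):
        self.node_types[name] = node_type

    def add_link(self, source, link_type, target):
        self.links.append((source, link_type, target))

    def targets(self, source, link_type):
        """Follow links of one type forward from a node."""
        return [t for s, lt, t in self.links if s == source and lt == link_type]

    def sources(self, link_type, target):
        """Follow links of one type backward to a node."""
        return [s for s, lt, t in self.links if lt == link_type and t == target]


web = Web()
web.add_node("alice", "person")
web.add_node("tracker", "software module")
web.add_node("detector", "specific hardware object")
web.add_node("design-note", "document")

web.add_link("alice", "made", "tracker")
web.add_link("tracker", "depends on", "detector")
web.add_link("design-note", "refers to", "tracker")

who_wrote_it = web.sources("made", "tracker")
dependents = web.sources("depends on", "detector")
referring_docs = web.sources("refers to", "tracker")
```

Questions such as "Who wrote this code?" or "Which systems depend on this device?" then become backward traversals over a single link type.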
• System must allow any sort of information to be entered
• Another user must be able to find the information, sometimes without knowing what he/she is looking for
• System should be aware of the generic types of the links between items (e.g., dependencies), and the types of nodes (people, things, documents…) without imposing any limitations
• Remote access across networks
• Platform heterogeneity
• Non-centralization - allow existing systems to be linked together without requiring any central control or coordination
• Access to existing data and databases in hypertext form
• Private links - one must be able to add one's own private links to and from public information, and also annotate links as well as nodes privately
“Storage of ASCII text, and display on 24x80 screens, is in the short term sufficient, and essential.
Addition of graphics would be an optional extra with very much less penetration for the moment.”
• Search for anomalies such as undocumented software or divisions which contain no people
• Generate lists of people or devices for other purposes, such as mailing lists of people to be informed of changes
• Look at the topology of an organization or a project, and draw conclusions about how it should be managed, and how it could evolve
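The anomaly search mentioned above can be sketched as two queries over typed nodes and links. This is a minimal Python sketch under stated assumptions: the sample data, the type names (`"software"`, `"division"`, and so on), and the functions `undocumented_software` and `empty_divisions` are hypothetical and only illustrate the idea.

```python
# Hypothetical sample data: node name -> node type.
nodes = {
    "alice": "person",
    "tracker": "software",
    "logger": "software",
    "tracker-guide": "document",
    "controls": "division",
    "archive": "division",
}

# Hypothetical typed links as (source, link_type, target) triples.
links = [
    ("alice", "is part of", "controls"),
    ("tracker-guide", "refers to", "tracker"),
]


def of_type(node_type):
    """All node names of a given type, in sorted order."""
    return sorted(n for n, t in nodes.items() if t == node_type)


def undocumented_software():
    """Software that no document refers to."""
    documented = {t for s, lt, t in links
                  if lt == "refers to" and nodes.get(s) == "document"}
    return [n for n in of_type("software") if n not in documented]


def empty_divisions():
    """Divisions that no person is part of."""
    staffed = {t for s, lt, t in links
               if lt == "is part of" and nodes.get(s) == "person"}
    return [n for n in of_type("division") if n not in staffed]
```

With this sample data, `logger` has no documentation and `archive` has no people, so each query reports exactly one anomaly.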
“Imagine making a large three-dimensional model, with people represented by little spheres, and strings between people who have something in common at work. Now imagine picking up the structure and shaking it, until you make some sense of the tangle: perhaps, you see tightly knit groups in some places, and in some places weak areas of communication spanned by only a few people. Perhaps a linked information system will allow us to see the real structure of the organisation in which we work.”
• Allow documents to be linked into "live" data so that every time the link is followed, the information is retrieved
• The data to which a link (or a hot spot) refers may be very static, or it may be temporary
• [If one sacrifices portability], make following a link fire up a special application, so that diagnostic programs, for example, could be linked directly into the maintenance guide
“Discussions on Hypertext have sometimes tackled the problem of copyright enforcement and data security. These are of secondary importance at CERN, where information exchange is still more important than secrecy.
Authorisation and accounting systems for hypertext could conceivably be designed which are very sophisticated, but they are not proposed here. In cases where reference must be made to data which is in fact protected, existing file protection systems should be sufficient.”
• Development Project Documentation
• Document Retrieval
• Personal Skills Inventory
• Originally combination browser and editor only on NeXT cubes
• Later line-mode browser, GUI browsers for X and Mac
• First web server was nxoc01.cern.ch, later called info.cern.ch
• Line-mode browser released for use outside CERN in August 1991
• Submission to 1991 ACM Hypertext conference rejected
• Various GUI browsers released in 1992
• Mosaic released by NCSA in September 1993, developed by undergraduate Marc Andreessen (who later co-founded Netscape)
• Internet != Web
• 1962 military packet switching network invented (on paper)
• 1969 ARPANET comes on line with 4 nodes
• 1976-1983 UUCP, BITNET, CSNET, etc.
• 1985 Merged Internet with 2k nodes
• 1988 56k nodes, 1992 1.1M nodes, 1996 15M nodes, …
• Lots of anonymous FTP resources available by the mid-1970s, but you had to know where to look
• 1989 McGill’s Archie (ARCHIvE) finds files by name using regular expressions
• 1990 Thinking Machines’ WAIS (Wide Area Information Servers) adds content indexing
• 1991 U. Minn.’s Gopher (“go for” and school mascot) adds friendly menu-based UI, augmented by U. Nevada’s VERONICA spider indexing
• University of Minnesota announced that they would begin to charge licensing fees for Gopher's use in February 1993
• US government’s Acceptable Use Policy previously prohibiting commercial use of the Internet “re-interpreted” in March 1993
• CERN's directors announce in April 1993 that WWW technology would be freely usable by anyone, with no fees payable to CERN
• Left CERN in 1994 for MIT to become the Director of the new World Wide Web Consortium (W3C)
• Technically a research staff member, not an MIT professor (only has a BA, in Physics)
• Knighted by Queen Elizabeth II in 2004, numerous other honors and awards
• Never got rich…
• Many people trace the Web’s origins to Vannevar Bush, although there were other early attempts to introduce something hypertext-like over microfiche and/or paper documents
• TBL’s World Wide Web “succeeded” whereas numerous earlier and contemporary hypertext systems “failed” because it was simple and scalable without trying to be perfect
• Many fancy ideas from other hypertext work are being re-introduced on top of Web (Web 2.0)
• Paper proposal due February 2nd
• Project proposal due March 23rd
• Paper must be individual, projects may optionally be done in teams