MyLifeBits: Realizing the Memex Vision
Santa Clara University, 13 May 2004
Gordon Bell, Jim Gemmell & Roger Lueder
www.MyLifeBits.com
www.research.microsoft.com/~gbell

MyLifeBits collage

Outline …
• MyLifeBits Background…fulfilling the Memex vision
• Cyberizing everything
• File to database transition
• Use…beyond search
• Working with Media Center for home use
• Long-term agenda and outlook
• Archiving persons and things.

Memex
As We May Think, Vannevar Bush, 1945
“A memex is a device in which an individual stores all his books, records, and communications, and which is mechanized so that it may be consulted with exceeding speed and flexibility”
• Full-text search, text & audio annotations, and hyperlinks

Capturing what you see

I am data

The guinea pig
• Gordon Bell is digitizing his life
• Has now scanned virtually all:
– Books written (and read when possible)
– Personal documents (correspondence including memos and email, bills, legal documents, papers written, …)
– Photos
– Posters, paintings, photos of things (artifacts, …medals, plaques)
– Home movies and videos
– CD collection
– And, of course, all PC files
• Now recording: phone, radio, TV (movies), web pages… conversations and meetings to come
• Paperless throughout 2002. 12” scanned, 12’ discarded.
• Only 30 GB!!!

Capture and encoding

Quindi conference capture

I mean everything

Wearable & interactive jewellery
• LEDs flash according to sensor type triggered

Potentially useful trivia – but not normally photographed

GPS: tells where and when

Kentaro Toyama wwmx.org

gbell wag: 67 yr, 25Kday life
[Chart: lifetime storage (GB), log scale from 1 to 1,000,000, by item type: Msgs (5KB), pages, Tifs, Books, jpegs, sound, songs, Videos. Size labels in source order: 150KB, 100KB, 1MB, 400KB, 1KBps, 100MB, 10GB. The per-day counts are garbled in the source.]

MyLifeBits organization: time and space
• Timeline/ Context (space)
• Archival (time)
• Working
• Personal (some $s)
• GB Co. (angel, etc.)
• Professional ACM, etc., …
• @Microsoft.com, New co’s.
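As a rough sanity check on the “gbell wag” slide above (67 years, about 25K days), the following sketch converts years to days and estimates the storage needed for one continuous stream at the 1KBps rate shown on the chart. Several things here are assumptions that the slides do not state: that the 1KBps label refers to audio, that recording runs 24 hours a day, and that units are decimal (1 KB = 1,000 bytes).

```python
# Rough lifetime-storage arithmetic for one continuous stream.
# Assumptions (not from the slides): 1 KBps is the audio rate, recording is
# continuous, and units are decimal (1 KB = 1,000 bytes).

DAYS_PER_YEAR = 365.25
SECONDS_PER_DAY = 24 * 60 * 60


def lifetime_days(years: float) -> float:
    """Number of days in a life of the given length."""
    return years * DAYS_PER_YEAR


def stream_terabytes(bytes_per_second: float, days: float) -> float:
    """Storage in decimal terabytes for a stream recorded for `days` days."""
    return bytes_per_second * SECONDS_PER_DAY * days / 1e12


days = lifetime_days(67)                  # about 24.5K days, roughly 25K
audio_tb = stream_terabytes(1_000, days)  # about 2.1 TB under these assumptions
print(f"{days:,.0f} days, {audio_tb:.2f} TB of 1 KBps audio")
```

Under these assumptions a lifetime of continuous low-rate audio comes to a couple of terabytes, which is the scale the later “Tbyte(s), Lifetime, PC” agenda refers to.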

MyLifeBits: Some Lives(t)
• Personal
– Parents, children, grandkids
– CGB himself
– GKB
– Close friends
• GB $s
– Personal incl. several legal structures
– Properties: autos, real estate
– Investments & contracts
• Past prof. companies/organiz’ns
– DEC
– Carnegie-Mellon U.
– DEC, NSF, Encore, Ardent, Me Inc.
• CGB@ Microsoft
– MLB
– Clusters
– Telepresence
– WWW presence
• Computer History Museum
– BOD member
– Fund-raising
– CyberMuseum
• Startups & boards
– Bell-Mason Director
– Diamond & Vanguard Brds.

Bell Lives timeline
[Figure: timeline from 1900 to 2010 with rows for Where, Education, Work, Books, and Computers. The row contents are garbled in the source.]

Personal LifeLog Applications
[Diagram: applications arranged by who uses them (Self, Others) and who controls them (Self, Personal Proxy, Others)]
• Diary/Journal, Tutor, Mentor, Advisor
• Babysitter, Financial Manager, Medical Manager, Companion, Caretaker, Parole Officer, Assistant for Elderly
• Pers Flight Recorder, Meeting Prep, Personal Assistant, Photo Album, Autobiography, Captain’s Log
• Conservator, Biography, Baby Book, Trustee, Obituary, Executor

MyLifeBits Software
[Architecture diagram; component labels:]
• Radio capture tool
• TV capture tool
• Telephone capture tool
• MyLifeBits store
• Internet
• TV EPG download tool
• database
• Browser tool
• MyLifeBits Shell
• PocketPC transfer tool
• PocketRadio player
• Radio EPG tool
• MAPI interface
• Legacy email client
• files
• Legacy applications
• IM capture
• Voice annotation tool
• Text annotation tool
• Import files

MyLifeBits is:
• Memex and more (audio and video)
• Universal store for all personal stuff
Guiding principles for the system:
1. Full text search & collections (> than hierarchy)
2. Visualizations for search, display, insight
3. Annotations and links add value and are essential
4. Increase search ability and value of information. So make many kinds and make them easy to create!
• Stories are the ultimate annotation
• Keep the links when you author: “transclusion”

MLB database: size and content?
• Database features are essential: Consistency, Indexing, Pivoting, Queries, Speed/scalability, Backup, replication.
• Folders & files were the starting point >> database into sets aka “collections” that are identical to the folder structure
• Outlook (msgs, attachments, calendar, contacts)
• Web trails including voice message annotation
• Journal (Outlook), trails: every document use & transaction
• What about?
– Money (transactions, payees, etc.)…is their lifelog/trail
– Streets and trips to cross-index to all docs
– Attributes for photos for retrieval? Location, time, settings
– Presentations as a report or trail. Each slide an object!

Why bother? An existence proof.
The following exist in abundance:
• Shoeboxes full of photos
• Photo albums & framed photos
• Creative Memories is a thriving business selling resources for creating high-end photo albums that are well laid out and highly annotated, using long-lasting materials.
• Home videos
• Bookshelves and filing cabinets
• Old bundles of letters
• Professional video/photo companies do capture at kids’ sports events and sell content like hotcakes
• Probably not accessed very often but TREASURED (what’s the one thing you would save in a fire?)

Why bother?
…more reasons
• To eliminate physical storage (paper, CDs…)
• It costs more (in time) to delete than the cost of the storage
• You may only want to retrieve one of many items in the future, but cannot predict which one (which is why you file many things now)
• For posterity and nostalgia
• For memory enhancement & faster search (search your LifeBits rather than the web … a single source to look for anything you have ever seen)
• Let content analysis and data mining discover trends and correlations in your life

[Diagram labels: Extensible XML schemas; Logical views; Programmatic relationships; Synchronization service; Information agents; people, user, infrastructure, and system schemas, each with application specific data]

Annotation like this… Voice Annotation

Pivot to look at all of MLB(t)

Call, contact, pivot by time to find web page

Find brig, image, and look for 80

Here are the photos

Timeline view tells a story

Interface to xls

Statistics of use

Value of media depends on annotations
“It’s just bits until it is annotated”

Getting the user to tell a story is the ultimate in media value
• A story is a “layout” in time and space
• Most valuable content (by selection, and by being well annotated)
• Stories must include links to any media they use (for future navigation/search – “transclusion”).
• Cf: MovieMaker; Creative Memories PhotoAlbums
Example story annotation:
• Dapeng was an intern at BARC for the summer of 2000
• We took him to lunch at our favorite Dim Sum place to say farewell
• At table L-R: Dapeng, Gordon, Tom, Jim, Don, Vicky, Patrick, Jim

Value of media depends on annotations
“It’s just bits until it is annotated”
[Chart: value of media by annotation level: user-story, user-basic, auto-usage, auto, none]
• Auto-annotate whenever possible e.g. GPS cameras
• Make manual annotation as easy as possible.
– XP photo capture, voice, photos with voice, etc.
• Support gang annotation
• Make stories easy
[Chart axis label: Annotations]

Future work: Visualizations
“Don't give me a little card image and say, "That's all you've got, because that's what I thought you should want for your virtual shoebox." There have got to be multiple modalities and the designers have to be able to deal with that. … don't metaphor me in, don't give me only one way of looking at things.”
- Andy van Dam, Hypertext '87 Keynote Address
[Images: Web Scout; U. Maryland; IN-SPIRE; Next Media]

LifeLines (Plaisant et al.)
www.cs.umd.edu/hcil/lifelines
University of Maryland

Rethinking collections & files
• Date collections (“summer 99”)
• By Person (“Photos of Bill”)
– Better as links of type “photo of” to person “Bill”
• By Event (“Trip to UCLA”)
– Much better as a query
– Better as links to event in calendar
• Working set
– Better as query that figures it out for me so I don’t need to maintain it

Facets and people
• Time (& stage of life). Events…
• Location (lat/long vs home, vacation)
• Institution (relations including family, work, clubs,…)
• Role (student, professional, parent, owner, etc.)
• Content type
– Audio, graphics, photo, video aka moving picture
– Document type, O(200) plus profession specific: ad, bill…will, cards (calling, credit, grade, greeting), certificate (birth…death), correspondence, diary, essay, forms, legal (6), instructions, lists, resume, reservation, scrapbook, transcript,
• Dissemination
– Book, electronic, serial, unpublished,
• Special collections (e.g. geology, stamps, species, places)

Facet Lists

Certificate facets

“By region” and “by time” should be facets!
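
The idea in “Rethinking collections & files” above, that a collection such as “Photos of Bill” or “Trip to UCLA” is better expressed as typed links plus a query than as a hand-maintained folder, can be sketched with an in-memory SQLite database. The table names, column names, the “taken at” link type, and the sample rows are all hypothetical illustrations; this is not the MyLifeBits schema.

```python
import sqlite3

# Hypothetical schema: items plus typed links between items.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item (id INTEGER PRIMARY KEY, kind TEXT, name TEXT, taken TEXT);
    CREATE TABLE link (src INTEGER, dst INTEGER, type TEXT);
""")
conn.executemany("INSERT INTO item VALUES (?, ?, ?, ?)", [
    (1, "person", "Bill", None),
    (2, "event", "Trip to UCLA", None),
    (3, "photo", "img_001.jpg", "1999-07-04"),
    (4, "photo", "img_002.jpg", "1999-12-25"),
    (5, "photo", "img_003.jpg", "1999-08-15"),
])
conn.executemany("INSERT INTO link VALUES (?, ?, ?)", [
    (3, 1, "photo of"),   # img_001 shows Bill
    (5, 1, "photo of"),   # img_003 shows Bill
    (5, 2, "taken at"),   # img_003 was taken on the UCLA trip
])


def linked_photos(target_name, link_type):
    """Photos linked to the named item by a link of the given type."""
    rows = conn.execute(
        """SELECT p.name FROM item p
           JOIN link l ON l.src = p.id
           JOIN item t ON t.id = l.dst
           WHERE p.kind = 'photo' AND l.type = ? AND t.name = ?
           ORDER BY p.name""",
        (link_type, target_name),
    )
    return [name for (name,) in rows]


def photos_between(start, end):
    """A date collection such as "summer 99" expressed as a query."""
    rows = conn.execute(
        "SELECT name FROM item WHERE kind = 'photo' AND taken BETWEEN ? AND ? "
        "ORDER BY name",
        (start, end),
    )
    return [name for (name,) in rows]


photos_of_bill = linked_photos("Bill", "photo of")
ucla_photos = linked_photos("Trip to UCLA", "taken at")
summer_99 = photos_between("1999-06-01", "1999-08-31")
print(photos_of_bill, ucla_photos, summer_99)
```

Because each collection is a query, one photo can appear in several collections at once, and nothing has to be refiled when a new photo or link is added.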

Telephone, Television, and Radio in the Home of the Future

Evolution of media in the home
Yesterday:
• Analog storage and transmission on separate networks
• Physical space limitations
• Tedious management and manual search
Today:
• Digital storage (CDs, DVDs, PVRs, MPEG & WMA/V)
• Digital cable, internet radio, but phone is mostly analog
• Still limitations on what we can store
• Different stores for different stuff
Tomorrow:
• All digital
• Everything connected
• Unlimited storage
• Everything in a database (SQL)

[Diagram: home media wiring among 5 speakers and woofer, CD, DVD, legacy VCR, legacy cassette, 5.1 digital receiver, set top (cable/satellite), Ethernet, camera, mic, keyboard/mouse, Media Center computer, and plasma panel]
Cables/links
Speaker              5+1
Plasma               2 or 3
Cable/Enet           2
IR                   8
Stereo               4
5.1 digital          2
Comp./S-video        3
Plasma panel         1
Power                10
Kbd/mse              2
Monitor II (opt.)    4
Camera               2
Total                42 – 46
Things               18+remote
*Video = composite or S-video

The Agenda for the Tbyte(s), Lifetime, PC: The killer app after office and mail.
1. Guarantee that data will live forever! “dear appy” problem
2. Cheap, easy, and data-rich (e.g. time, place) capture:
– GPS and time everywhere
– Paper capture has to be as easy as discarding (scanner/shredder)
– Personal meeting capture...
– E-book…e-magazines & journals need to have critical mass!
– Telephony and audio capture with indexing
– Media Center compatible for entertainment (photos, video, TV, radio)
3. Content analysis (critical for photo & video!)
4. Information control: privacy, security, expunge/deniability,…
– Having to be schizophrenic or have a lobotomy when leaving a “life”
5. One dbase for everything (articles, books, conversations, ... financial transactions) …vs. long-term use of hierarchical files. Is dbase intuitive?
6. Annotations/meta-information add ever-increasing value
7. Easy annotation for aiding search and it becomes the content
8. The “killer apps”: Alzheimer, immortality, surrogate memory?
9. GUI’s to improve use (e.g. time to learn, use, retention)

The “dear appy” problem
Dear Appy,
How committed are you?
Please come back to me,
Lost and forgotten data
Who’s responsible?
• media platform, file, and databases
• evolving standards and formats
• evolving and/or disappearing apps

Problems: “Amnesia” control & deleting corporate “life” bits
• Full sharing of bits that are mine
– I created them, OK to copy and distribute
• DRM: purchased for my own use
– “OK to look at, but I only own half the bits”
• Controlling forgetfulness
– Private, do not “demo”
– Expunge forever... “this never happened”
• The bits “belong” to a corporation or org.

The Content Analysis Problem
1. “Cliplets”: Automatic segmentation of a pile of documents and video into individual documents and scenes.
2. Item typing: Would like a minimal Dublin Core for each item: date, creator, title, source, abstract, and type
3. “Type” classification: articles, letters, memos, etc.
4. Ontology creation for collections

The End

Archiving persons and things…
• www.oac.cdlib.org for O(1K) corporations, people, places, things.
– List of finders, usually -> paper boxes!
– E.g. Apple collection at Stanford points to 600’ or say $1K/ft.
• www.AlbertEinstein.org Einstein’s papers, etc.
• diva.library.cmu.edu/Newell/ for Allen Newell
• profiles.nlm.nih.gov/ Nobel Prize winners, Lederberg
• www.ComputerHistory.org computing artifacts
• www.MyLifeBits.com project to capture entire life

List of finding aids

Apple at Stanford

www.alberteinstein.info

Allen Newell page

Lederberg

Computer History Museum
• 1401 Shoreline, Mountain View

Archiving computing artifacts
• Charles Babbage Institute …Smithsonian is similar
– 135 collections 8K cu.ft.
(20 M pages; 2 TB)
– 160 oral histories (30MB/hr = 6000 MB)
– 150 K photos (@1MB, 150 GB)
• Computer History Museum
– 6 K physical objects: world’s best artifact collection
– 10 K photos
– 2 K videos (<1 TB); including recent DV taped interviews
– 12 M pages books, manuals, brochures, papers, (1.2 TB)
– ?? of executable source & object codes
– 200 volunteers & many more world-wide
Amateurs versus professionals.

Computer History Museum Artifact Collecting… the world is bits
• Artifact (“the machine”)
– Dormant or operating
– Hardware or software
• Project, people, plan
– Timeline of project
– Plan, schedule
– Specification, manuals
– Design
– Organization
– Communication
– Articles, books
– Interviews, talks, etc.
• Business aspects
– Plan, sales, marketing
– Ads, brochures, etc.
– Competitors
• Use
– User experience
– Video about its use
• Accessibility
– Raw bits, finding aid
– Interpreted story
– Exhibit

CHM Software Acquisition
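
The sizes quoted on the “Archiving computing artifacts” slide can be cross-checked with simple arithmetic. The sketch below uses only the figures from that slide and assumes decimal units (1 TB = 10^12 bytes), which the slide does not state; the variable names are illustrative.

```python
# Cross-check of the archive sizes quoted on the slide, in decimal units
# (an assumption: 1 KB = 1e3 bytes, 1 MB = 1e6, 1 GB = 1e9, 1 TB = 1e12).

KB, MB, GB, TB = 1e3, 1e6, 1e9, 1e12

# Charles Babbage Institute: 20 M pages in 2 TB
cbi_kb_per_page = (2 * TB) / 20e6 / KB

# Computer History Museum: 12 M pages in 1.2 TB
chm_kb_per_page = (1.2 * TB) / 12e6 / KB

# 150 K photos at 1 MB each, expressed in GB
photos_gb = 150e3 * (1 * MB) / GB

# 6000 MB of oral histories at 30 MB per hour, spread over 160 histories
oral_hours = 6000 / 30
hours_per_history = oral_hours / 160

print(cbi_kb_per_page, chm_kb_per_page, photos_gb, oral_hours, hours_per_history)
```

Both page archives work out to about 100 KB per scanned page, the photo figure matches the quoted 150 GB, and the oral-history total implies about 200 recorded hours, or 1.25 hours per interview.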