Systems to Capture Everything: Beyond cameras and desktops
www.MyLifeBits.com
Gordon Bell, Jim Gemmell, Roger Lueder

Outline
MyLifeBits aka Memex
How has the project evolved?
How do we use MyLifeBits?
How is it built? Shape of the database?
CARPE: Continuous archiving and recording of personal experience
What is the vision?
Relevance for devices and software?

I am data

History: Telepresence
Tele-presentations
Tele-meetings
Ambience and Presence: Being there while being here
Dining at home on the “Orient Express”

History: The remote worker rediscovers the PERSONAL computer

Oct 1998
“Can we scan your books and put them online?” (Raj Reddy)
Sure! Don’t worry about copyright stuff. Microsoft has lots of lawyers.

1999 – Scanning starts in earnest
“We” start to scan and put content into folders & files
[Figure: “My docs and archive” folder tree; folder names include: Library/file cab; Project; Employer; Active Employer; X-Employer; Self; Biographical; Business; Invests, family $s, & Legal; Personal, including Medical; <1980s]

Now that it’s in Cyberspace
How do you remember the 20,000+ file names? Or in which of 1,500 folders they live?
What about a tool for finding stuff?

Jan 2001 CACM “A Personal Digital Store”
16 GB; +2/yr
A good place to stop
Began search for search engines, especially for email.
Jim suggests that we build a system that would be easier to use and have many more capabilities.

2001 Capture goes beyond paper
“Gordon, you should be using a database.”
“Jim, I don’t need no stinkin’ database!”
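Finding one of 20,000+ files spread over 1,500 folders is what a content index is for: look files up by the words they contain rather than by their names or locations. A minimal sketch, assuming the documents are already available as plain text (for scans, after OCR); the `build_index` and `search` helpers and the sample file paths are hypothetical illustrations, not MyLifeBits code.

```python
import re
from collections import defaultdict

def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Inverted index: map each word to the names of the documents containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for name, text in docs.items():
        for word in tokenize(text):
            index[word].add(name)
    return index

def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the documents that contain every word of the query (AND semantics)."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

# Hypothetical documents, named by the folder paths they were filed under.
docs = {
    "Employer/Project/memo_1987.txt": "Memo on the project schedule",
    "Self/Medical/checkup_2003.txt": "Annual checkup results and follow-up schedule",
    "Business/Legal/lease.txt": "Office lease agreement",
}
index = build_index(docs)
print(sorted(search(index, "project schedule")))  # ['Employer/Project/memo_1987.txt']
```

The index is built once, so a query no longer means rescanning every file or remembering where it was filed; a real system would also rank results, which this sketch does not do.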
Re-discovery of Memex
As We May Think, Vannevar Bush, 1945
“A memex is a device in which an individual stores all his books, records, and communications, and which is mechanized so that it may be consulted with exceeding speed and flexibility.”
Full-text search, text & audio annotations, and hyperlinks

Even more capture
Telephone calls, more video, all web pages visited, keyboard and mouse usage logging, radio, TV…
2003: SenseCam

Feb 2005 Epiphany!
Memex is a database & personal TP system

Demo Clips & Screens
747 Screen…
Vue de jour
Timeline
Pivoting: contact > call > t > web page
GPS Photo location
Reports
The Stew family tree (Copyright Mark Stewart, 2004)
Vibe report
Quindi Meeting Capture
SenseCam
SenseCam around Cambridge

MyLifeBits Software
Everything goes in a database
MyLifeBits needs all the features of a database (Consistency, Indexing, Pivoting, Queries, Speed/scalability, Backup, Replication)
If we didn’t use one, we would eventually create one!
Files as blobs; sync with file system for legacy apps
We are part of Jim Gray’s Bay Area Research Lab

MyLifeBits Software
[Figure: software components. Store: MyLifeBits store, SQL database, files. Capture and import: Room Capture, GPS import & Map display, SenseCam, Import files, VIBE logging, Radio capture & EPG, IM capture, Telephone capture tool, TV capture tool, TV EPG download tool, PocketPC transfer tool, Internet Browser tool. Viewing and annotation: MyLifeBits Shell, Text annotation tool, Voice annotation tool, Screen saver, PocketRadio player. Legacy access: Legacy applications, Legacy email client, MAPI interface, Outlook interface]

Common ground with WinFS: Items, Links & Meta-data
[Figure: example schema, three item tables joined by typed links]
Outlook_CalendarItems2: item_id (PK, FK1), Subject, Start, End, Description, Location, Creation Time, Modified
TAPI_PhoneCalls2: item_id (PK, FK1, I1), Phone Call Type, CID, CID Name, CID #, Begin, End, Seconds, Connected, Ended, Roaming, Trimmed, Recorded, Transcript
IMG_Images2: item_id (PK, FK1, I1, I2, I3), Width, Height, Date Taken, Camera Make, Camera Model, Latitude, Longitude, Elevation
Link types: Photo of Event, Caller in Phone Call, Annotates

PhotoFinder (Shneiderman and Kang)
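The items/links/meta-data pattern can be sketched with SQLite standing in for the SQL database: a generic item table carries the time stamp and the file as a blob, per-type tables such as an image table carry type-specific meta-data, and a link table records typed relationships like “Photo of Event”. All table and column names, and the sample rows, are hypothetical simplifications, not the actual MyLifeBits or WinFS schema.

```python
import sqlite3

# SQLite stands in for the SQL database; every name below is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (                  -- one row per item, whatever its type
    item_id   INTEGER PRIMARY KEY,
    item_type TEXT NOT NULL,          -- e.g. 'calendar', 'photo', 'phone_call'
    title     TEXT,
    start_ts  TEXT NOT NULL,          -- ISO-8601 time stamp
    content   BLOB                    -- the file itself, stored as a blob
);
CREATE TABLE images (                 -- type-specific meta-data, cf. IMG_Images2
    item_id   INTEGER PRIMARY KEY REFERENCES items(item_id),
    width     INTEGER,
    height    INTEGER,
    latitude  REAL,
    longitude REAL
);
CREATE TABLE links (                  -- typed, directed links between items
    src_id    INTEGER NOT NULL REFERENCES items(item_id),
    dst_id    INTEGER NOT NULL REFERENCES items(item_id),
    link_type TEXT NOT NULL           -- e.g. 'Photo of Event', 'Annotates'
);
CREATE INDEX items_by_time ON items(start_ts);
""")

conn.executemany(
    "INSERT INTO items (item_id, item_type, title, start_ts) VALUES (?, ?, ?, ?)",
    [
        (1, "calendar", "Project review", "2005-02-10T09:00"),
        (2, "photo", "Whiteboard", "2005-02-10T09:40"),
        (3, "phone_call", "Call with Jim", "2005-02-10T14:05"),
    ],
)
conn.execute("INSERT INTO images VALUES (2, 640, 480, 47.64, -122.13)")
conn.execute("INSERT INTO links VALUES (2, 1, 'Photo of Event')")

# Pivot: from a calendar item to the photos linked to it.
photos = conn.execute(
    """SELECT i.title, img.width, img.height
       FROM links AS l
       JOIN items AS i ON i.item_id = l.src_id
       JOIN images AS img ON img.item_id = i.item_id
       WHERE l.dst_id = ? AND l.link_type = 'Photo of Event'""",
    (1,),
).fetchall()

# Timeline: every item from one day, in time order.
timeline = conn.execute(
    "SELECT item_type, title FROM items WHERE start_ts LIKE ? ORDER BY start_ts",
    ("2005-02-10%",),
).fetchall()
print(photos)
print(timeline)
```

Keeping all links in one table means any pivot (contact > call > web page, photo > event) is the same join with a different `link_type`, and the index on `start_ts` is what keeps a timeline view cheap.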
Shape & Size of Gordon’s LifeBits
MyLifeBits store on 10/31/2005: 242K items, 110 GB
By number of items: eMail 97,271; Web pages 70,918; Pictures 43,812; Doc&Rtf 13,764; Audio 5,083; .pdf 3,527; Tiff 2,832; .PPT 1,815; .xls 1,455; Video 1,303
By size (MB): Video 62,735; Audio 12,502; Pictures 8,998; Tiff 8,078; Web pages 5,791; PDF 5,027; PPT 4,637; Doc&Rtf 1,198; eMail 343; mny 134; NULL 127

Bell Growth: 1 GB/month = 1.1 TB/lifetime
[Figure: log-scale chart (1 to 10,000) over the years 1895–2005, annotated “15,000 photos”]
Year: 1997, 1999, 2001, 2002, 2003, 2005
Mpix: 0.25, 1, 2, 3, 4, 5
Manufacturer: Ricoh, Kodak, Canon, Sony, Sony, Panasonic

Monthly & Lifetime Storage Use
Item sizes: Books|reports 1 MB; Emails 5 KB; Image scans 100 KB; Photos 0.4 MB; Web pages|docs 75 KB; Music 100 MB; Daily photos 50 KB; Listened audio, speech 1 KB/s; TV 2 GB/hr
Columns: Item, Daily number, Total* MB|GB (Month|Life)
Values: 0.1, 100, 3, 13, 5, 10, 100, 0.1, 13, 100, 188, 250, 40,000, 1,000, 1,000, 4, 1,250, 200,000

Observations about use(rs)
1. The cell phone sized device (CPSD) will be the platform! On applications… think about the CPSD as the platform and context.
2. Search is the “killer app”, pretty much as Bush described. Screen savers (“memory refreshers”) also provide ambience: Where did my day go?
3. Users are unwilling to spend time managing their computers or data. Meta-data, classification, etc. must be automatic. User-input meta-data, e.g. Dublin Core, is naïve: a librarian’s dream. We have a nice scheme for classification using facets, but it requires work.
4. Time is the most important meta-data. Photos: place (GPS), subject.
5. Folders are a good and bad idea. Most users don’t know what they are or how they work. If used, over time they become useless: too many, mis-filed, etc.
6. Users should put “every” information fragment into the system, e.g. to-dos, call backs, business card numbers, attention events. It pays.
7. The same information kept in multiple places always becomes obsolete.
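The growth figure and the size breakdown can be checked with a few lines of arithmetic. A minimal sketch, assuming decimal units (1 TB = 1,000 GB) and a constant capture rate; the helper names are hypothetical, and the one-hour-a-day TV figure is a hypothetical used only to show how quickly video would dominate.

```python
GB_PER_TB = 1000  # assumption: decimal units, 1 TB = 1,000 GB

def lifetime_tb(gb_per_month: float, years: float) -> float:
    """Total storage (TB) for a constant monthly capture rate over a number of years."""
    return gb_per_month * 12 * years / GB_PER_TB

def years_to_fill(tb: float, gb_per_month: float) -> float:
    """Years of capture at a constant monthly rate needed to accumulate `tb` terabytes."""
    return tb * GB_PER_TB / (gb_per_month * 12)

# Size in MB by type, from the 10/31/2005 breakdown of Gordon's store.
size_mb = {
    "Video": 62735, "Audio": 12502, "Pictures": 8998, "Tiff": 8078,
    "Web pages": 5791, "PDF": 5027, "PPT": 4637, "Doc&Rtf": 1198,
    "eMail": 343, "mny": 134, "NULL": 127,
}
total_mb = sum(size_mb.values())            # 109,570 MB, i.e. about 110 GB
video_share = size_mb["Video"] / total_mb   # fraction of all bytes that is video

# Hypothetical: recording one hour of TV per day at 2 GB/hr for a 30-day month.
tv_gb_per_month = 2.0 * 1.0 * 30

print(f"1.1 TB at 1 GB/month takes {years_to_fill(1.1, 1.0):.1f} years")
print(f"80 years at 1 GB/month is {lifetime_tb(1.0, 80):.2f} TB")
print(f"store total: {total_mb} MB, video share {video_share:.0%}")
print(f"1 hr/day of TV: {tv_gb_per_month:.0f} GB/month")
```

So “1.1 TB/lifetime” corresponds to a little over 90 years at 1 GB/month, the per-type sizes do add up to the 110 GB total, and video is already over half of it. A single hour of TV per day at 2 GB/hr would be about 60 times the whole 1 GB/month budget.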
Capturing Everything:
Phone calls, in the context of the cell phone as a platform for communication and capture
Formal Meetings
Rooms
Everything in daily life
Personal health and medical monitoring
Memex for scientists and engineers

BodyMedia Output
Polysomnogram for sleep apnea
Real time health monitoring

Microsoft Research SenseCam II
Sensors:
VGA camera w/ wide-angle lens
light level in R, G, B and white
ambient temperature
passive infrared for person detection
accelerometers
three programmable buttons, LEDs, sounder
audio level & audio recording
USB 2 and SD memory
1-2 K photos/day
Not GPS

SenseCam University Grant Program
MSFT supplies money, software, SenseCams
Memex vision: Notebook for engineers & scientists
Medical & health: observations & memory recall, including diet and exercise
Education: How do people learn? Help me learn/remember!
Tourist, e.g. museum experience
Plumbing
Security
Filtering many images, voice & location annotation
More real time experience capture
Real time medical & health monitoring

MIT. Deb Roy: home capture to understand how his children learn
U. of Tokyo. Ubiquitous home
Columbia U. Voice & sound record & profile
MIT. iDat. Electronic lab that records everything into your notebook

Experience Retrieval in a Ubiquitous Home
(chamds, byon, yamasaki, aizawa)@hal.k.u-tokyo.ac.jp

MIT iDAT Project aka notebook

Samsung challenge
Going beyond plain old photography and videography
Print, view, and file in scrapbook or shoebox
Digitized bits offer easy, worldwide sharing
Screensaver is useful, but is it a killer app?

The cell phone sized device (CPSD)… one device
Next generation platform
Phones and messaging, e.g. SMS, mail, web, IM, blogging
Audio, photo, video record and viewing (incl. broadcast)
Within 5 years, and with supplemental devices, it will take on the PC

Capture, storage, retrieval, and display
Challenge: putting them together
Capture … Storage
Cell phone sized devices (CPSD)
The “killer app”!!
Consumer… photo, video, audio… experience
Professional
Capture
Archival
Retrieval = f(use). Archive… ambience
Display
Personal: Cell phone, PC, Wall
www.MyLifeBits.com

BONUS SLIDES

Challenges
Data-types
Quantity expanding, i.e. info explosion
New capabilities, e.g. real time, create new data-types
Meta-data to increase value & provide pivots
Going beyond a PC to a distributed environment
Network environment, including media center
Into the cloud. Especially important for social aspects
Periphery… smart buildings, objects
Backup, migration, and caching for beyond a Terabyte
Expanding network: PC > LANs > web > p2p(eer)
Schema sharing among disparate systems
CARPE (real time data capture): Rooms, phone calls, SenseCam, Health transducers, etc.
Security, privacy, forgetfulness, deniability, etc.

More challenges
Dear Appy: Monitoring and automatic migration of files that are unlikely to be understood on future platforms, as well as platform migration.
Get What I Need (GWIN): Endless, but evolutionary, improvements in search: misspellings, stemming, synonyms
Endless frontier of schema and extensions to them for new applications, e.g. making org charts, family relationships.
CARPE… a whole new game!
Versioning is essential
Scaling… We don’t know what happens at a Terabyte
What can, should be, or will be in the cloud? Books… videos
Will we be allowed to use such systems? Copyright laws vary: e.g. ripping CDs, copies of anything, photos, conversations

The “dear appy” problem
Dear Appy,
How committed are you? Please come back to me.
Forever yours truly,
Lost and forgotten data

Who’s responsible?
Media, e.g. 8 track cassette, 8” floppy
Evolving platform, file, and database
Evolving, incompatible standards & formats for legacy data that disregard ancestors
Evolving and/or disappearing apps

Is Cyberspace a safe store?
Don’t your physical records, e.g. paper, last forever?
What about information on your CDs, tapes, hard drives, solid state devices?
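The GWIN list (misspellings, stemming, synonyms) amounts to expanding each query term before the index lookup. A minimal sketch, assuming a small in-memory vocabulary taken from the store’s index; the vocabulary, synonym table and suffix rules are hypothetical, the suffix stripping is a crude stand-in for a real stemmer such as Porter’s, and spelling correction uses Python’s standard `difflib.get_close_matches`.

```python
import difflib

# Hypothetical vocabulary (words present in the store's index) and synonym table.
VOCABULARY = ["photo", "picture", "meeting", "telephone", "call", "memex", "sensecam"]
SYNONYMS = {
    "photo": {"picture"}, "picture": {"photo"},
    "telephone": {"call"}, "call": {"telephone"},
}
# Crude suffix rules, tried in order; not a real stemming algorithm.
SUFFIXES = ("ing", "es", "s")

def stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def expand_term(term: str) -> set[str]:
    """Map one query term to the set of index words it should match."""
    word = term.lower()
    if word not in VOCABULARY:
        word = stem(word)
    if word not in VOCABULARY:
        # Treat an unknown word as a possible misspelling of a known one.
        close = difflib.get_close_matches(word, VOCABULARY, n=1, cutoff=0.75)
        if not close:
            return {word}  # nothing similar: search for the word literally
        word = close[0]
    return {word} | SYNONYMS.get(word, set())

def expand_query(query: str) -> list[set[str]]:
    """Expand every term; a document matches if it hits one word from each set."""
    return [expand_term(term) for term in query.split()]

print(expand_query("Photos sencecam calls"))
```

Because the expansion happens on the query side, the stored items and their index are untouched; improving spelling, stemming or synonym handling later does not require re-indexing the archive.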
Automatic classification problem
XML on bills and imported content… transactions
We need to download classifications rather than build them
Definitions & synonyms should help find what I want
Today it is too expensive to manually classify scanned paper. E.g. “right time” meta-data is critical!
We hope “the system” can classify papers and other documents, e.g. bills. Ideally, it would build the Dublin Core meta-data.
In 10 years we need all documents to appear electronically & classified, with a little help from me
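One way to read “download classifications rather than build them” together with “right time” meta-data: apply a classification table obtained from elsewhere to the text of each scanned document, and pull out the date the document refers to. A minimal sketch, assuming OCR text and ISO-style dates; the facet names, keyword lists and sample bill are hypothetical, and simple keyword scoring is a swapped-in technique, not the classifier the slides call for.

```python
import re
from datetime import date
from typing import Optional

# Hypothetical "downloaded" classification: facet name -> indicative keywords.
CLASSIFICATION = {
    "Bills/Utilities": {"electricity", "gas", "water", "kwh"},
    "Bills/Phone": {"wireless", "minutes", "roaming"},
    "Medical": {"patient", "diagnosis", "prescription"},
}
ISO_DATE = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")

def classify(text: str) -> list[str]:
    """Return matching facets, most keyword hits first (ties broken by name)."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    scores = {facet: len(words & keywords) for facet, keywords in CLASSIFICATION.items()}
    hits = [facet for facet, score in scores.items() if score > 0]
    return sorted(hits, key=lambda facet: (-scores[facet], facet))

def document_date(text: str) -> Optional[date]:
    """Return the first ISO-8601 date in the text, used as the document's time."""
    match = ISO_DATE.search(text)
    if match is None:
        return None
    year, month, day = (int(group) for group in match.groups())
    return date(year, month, day)

# Hypothetical OCR text of a scanned bill.
bill = "Statement date 2005-10-03. Electricity used: 612 kWh. Gas: 40 therms."
print(classify(bill), document_date(bill))
```

A document that scores on several facets keeps all of them, which suits a facet-based scheme better than filing each paper in a single folder.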