MyLifeBits
Jim Gemmell
February, 2005

Conclusion
• We have entered an era of virtually unlimited storage, enabling the lifetime store
• To make the store useful we need annotation, typed links, and database features
• More capture, more correlation – less work by the user

Collaborators
• Chief inspiration & guinea pig: Gordon Bell
• Software development lead: Roger Lueder
• MSR Collaborators: Lyndsay Williams, Ken Wood, Kentaro Toyama, Ron Logan, Steve Drucker, Curtis Wong, Mary Czerwinski, Brian Meyers
• Interns: Josh Blumenstock, Evan Salomon, Aleks Aris

Outline
• What is MyLifeBits
• History/Motivation
• MyLifeBits system outline
• Demo
• Future work

MyLifeBits is:
• An experiment in lifetime storage
  • Digitizing Gordon Bell’s past
  • Capturing more of his future
• A software system
  • Capture
  • Storage & retrieval
  • Organization & annotation
• Minimum requirement: fulfill Vannevar Bush’s 1945 “Memex” vision

Memex
As We May Think, Vannevar Bush, 1945
“A memex is a device in which an individual stores all his books, records, and communications, and which is mechanized so that it may be consulted with exceeding speed and flexibility”
• Full-text search, text & audio annotations, and hyperlinks

I am data

The guinea pig
• Has now scanned virtually all:
  • Books written (and read when possible)
  • Personal documents (correspondence including memos and email, bills, legal documents, papers written, …)
  • Photos
  • Posters, paintings, photos of things (artifacts, …medals, plaques)
  • Home movies and videos
  • CD collection
  • And, of course, all PC files
• Now recording: phone, radio, TV (movies), web pages… conversations and meetings to come
• Paperless throughout 2002. 12” scanned, 12’ discarded.
• Only 44 GB, incl. 10 wma, 14 SQL!!!
• Video: o(100) + 500 mov

The 1 TB Life
1 TB gives you 65+ years of:
• 100 email messages a day (5 KB each)
• 100 web pages a day (50 KB each)
• 5 scanned pages a day (100 KB each)
• 1 book every 10 days (1 MB each)
• 10 photos per day (400 KB JPEG each)
• 8 hours per day of sound - e.g. telephone, voice annotations, and meeting recordings (8 Kb/s)
• 1 new music CD every 10 days (45 min each at 128 Kb/s)
It will take you 5 years to fill up your 80 GB drive
Want video? Buy more cheap drives (1 TB/year lets you record 4 hours/day of 1.5 Mb/s video)

Trying to fill a terabyte in a year
Gordon’s lifetime collection < 30 GB (12 GB is music CDs)

Item                   Per TB         Per day
Photo (400 KB JPEG)    2.7M photos    7.3K photos
1 MB document          1.0M docs      2.9K docs
128 kb/s audio         18.6K hours    51 hours
256 kb/s video         9.3K hours     26 hours
1.5 Mb/s video         1.6K hours     4 hours

“yet if the user inserted 5000 pages of material a day it would take him hundreds of years to fill the repository, so that he can be profligate and enter material freely”
- Vannevar Bush, 1945

So you’ve got it – now what do you do with it?
• Can you find anything?
• Can you organize that many objects?
• Once you find it will you know what it is?
• Once you’ve found it once, could you find it again?
“A record if it is to be useful … must be continuously extended, it must be stored, and above all it must be consulted”
“The difficulty seems to be, not so much that we publish unduly … but rather that publication has been extended far beyond our present ability to make real use of the record”
- Vannevar Bush

MyLifeBits Software
[Diagram labels: Import files; GPS import & Map display; SenseCam; VIBE logging; MyLifeBits Shell; Text annotation tool; Voice annotation tool; Screen saver; MyLifeBits store; Radio capture & EPG; Internet Browser tool; Legacy applications; database; IM capture; MAPI interface; files; PocketPC transfer tool; Outlook interface; TV capture tool; PocketRadio player; Telephone capture tool; TV EPG download tool; Legacy email client]

Entities & Links
[Diagram labels: Photo of Event; Caller in Phone Call; Annotates; Transcludes]

MyLifeBits Schema (simplified)
[Diagram labels: Relationship types; Images; Music; Event types; Event log; Phone calls; Relationships; Events; Resources; Tasks; People; Notes; Email Messages; Saved searches; Resource entities; Entity types]

DEMO

Future work: new capture modes/devices
• Deja View
• SenseCam
• Quindi
• Body Media

Future work: Visualizations
Don't give me a little card image and say, "That's all you've got, because that's what I thought you should want for your virtual shoebox." There have got to be multiple modalities and the designers have to be able to deal with that. … don't metaphor me in, don't give me only one way of looking at things.
- Andy van Dam, Hypertext '87 Keynote Address
[Image labels: Web Scout; U. Maryland; IN-SPIRE; Next Media]

Future work: UI
• UI Improvements
• User studies

Future work: Content analysis & Data Mining
“Creative thought and essentially repetitive thought are very different things. For the latter there are, and may be, powerful mechanical aids” – Vannevar Bush
• Is MyLifeBits just enough rope to hang yourself with?
• MyLifeBits must become MyPersonalAssistant
• Content analysis and data mining
• Doc similarity & “clean living”
• Document meta-data extraction

Future work: scaling
• Just starting to hit performance problems
• Stress tests & design modifications

www.MyLifeBits.com
http://research.microsoft.com/CARPE2004

BONUS SLIDES

Everything goes in a database
• You need all the features of a database (Consistency, Indexing, Pivoting, Queries, Speed/scalability, Backup, replication)
• If you don’t use one, you will find yourself creating one!
• Files as blobs, also sync with file system for legacy apps
[Image label: SQL]

CARPE ’04
The First ACM Workshop on Continuous Archival & Retrieval of Personal Experiences
October 15th 2004
Columbia University, New York, NY, USA

Dear Appy, How committed are you?
Signed, Lost and Forgotten Data
By Gordon Bell
http://research.microsoft.com/~gbell

Dear Appy,
I'm having trouble with long-term commitment -- not on my end, heaven knows, but from the apps that created me and with whom I like to associate. Over time, these pesky apps evolve and they simply don't recognize the data that they once helped create! But, we data progeny -- and there are lots of us -- feel that as our creators, these apps should be responsible for eternal support. But the little problem with recognition isn't the worst of it – sometimes the apps even disappear altogether.
I ask you, is it expecting too much for 20-something year old data like me to be interpretable by my app (e.g. Acrobat, DB2, Draw, Eudora, Office, Quicken, or RealNetworks), or am I just associating with irresponsible apps? If things continue on their current path, it seems I will be completely un-interpretable within 20 to 50 years! My apps will move to other platforms, or evolve to be more Internet- or Next-Big-Thing-centric...

A Storocratic Oath
1. Do no harm to dates (File creation, Photo taken)
2. Do no harm to device created & other meta-data.
   • Camera data & location data are sacred.
3. Support & aid the creation of critical metadata.
   • When/how the user feels like it
   • Auto-magically!
4. Maintain user confidentiality

Classification wish list
• Download classifications rather than build them
• Definitions & synonyms should help find what I want
• Today it is too expensive to manually classify my scanned paper.
  • E.g. “right time” meta-data is critical!
• Next year I hope “the system” can classify papers and other documents e.g. bills
• In 10 years I expect all documents to appear electronically & classified with a little help from me

Personal Search is not Professional or Web search
• System sees every entry & access
• Everything, not just a professional life
• Limited to SIS, not an infinite amount, covers a profession & personal life
[Chart labels: MyLifeBits; Professional user; Web as seen by search engines. Axes: Depth, e.g. information item types & coverage; Knowledge breadth, e.g. Dewey classification]
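
Worked check of the storage arithmetic
The figures on the "The 1 TB Life" and "Trying to fill a terabyte in a year" slides can be roughly reproduced with a few lines of arithmetic. The sketch below is not part of the original deck. It assumes binary units (1 TB = 2**40 bytes, 1 KB = 1024 bytes, 1 kb/s = 1024 bits/s) and a 365-day year, which the slides do not state; the function and variable names are hypothetical. Under those assumptions the table values match after rounding, and the daily capture budget fills 1 TB in a little under 70 years.

```python
# Sketch only: unit conventions below are assumptions, not stated in the slides.
KB = 1024            # bytes (assumed binary kilobyte)
MB = 2**20           # bytes
GB = 2**30           # bytes
TB = 2**40           # bytes
KBIT = 1024          # bits (assumed binary kilobit)
DAYS_PER_YEAR = 365


def items_per_tb(item_bytes):
    """How many fixed-size items fit in one terabyte."""
    return TB / item_bytes


def hours_per_tb(bits_per_second):
    """How many hours of a constant-bit-rate stream fit in one terabyte."""
    bytes_per_hour = bits_per_second / 8 * 3600
    return TB / bytes_per_hour


def capacity_table():
    """Return {item: (per TB, per day)}, where per day = per TB / 365."""
    per_tb = {
        "Photo (400 KB JPEG)": items_per_tb(400 * KB),
        "1 MB document": items_per_tb(1 * MB),
        "128 kb/s audio": hours_per_tb(128 * KBIT),
        "256 kb/s video": hours_per_tb(256 * KBIT),
        "1.5 Mb/s video": hours_per_tb(1.5 * 1024 * KBIT),
    }
    return {name: (n, n / DAYS_PER_YEAR) for name, n in per_tb.items()}


def daily_capture_bytes():
    """Bytes captured per day under the 'The 1 TB Life' budget."""
    email = 100 * 5 * KB
    web = 100 * 50 * KB
    scans = 5 * 100 * KB
    books = 1 * MB / 10                          # one 1 MB book every 10 days
    photos = 10 * 400 * KB
    sound = 8 * 3600 * (8 * KBIT / 8)            # 8 hours at 8 Kb/s
    music = 45 * 60 * (128 * KBIT / 8) / 10      # one 45 min CD every 10 days
    return email + web + scans + books + photos + sound + music


def years_to_fill(capacity_bytes):
    """Years needed to fill a drive at the daily capture rate."""
    return capacity_bytes / daily_capture_bytes() / DAYS_PER_YEAR


if __name__ == "__main__":
    for name, (per_tb, per_day) in capacity_table().items():
        print(f"{name:22s} {per_tb:14,.0f} per TB {per_day:10,.1f} per day")
    print(f"Years to fill 1 TB:  {years_to_fill(TB):.1f}")
    print(f"Years to fill 80 GB: {years_to_fill(80 * GB):.1f}")
```

With decimal units instead (1 TB = 10**12 bytes) the lifetime estimate drops to roughly 63 years, so the slide's "65+ years" is best read as an order-of-magnitude figure rather than an exact one.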