PowerPoint - Microsoft Research

MyLifeBits
Jim Gemmell
February, 2005
Conclusion
We have entered an era of virtually unlimited storage, enabling the lifetime store
To make the store useful we need annotation, typed links, and database features
More capture, more correlation – less work by the user

Collaborators
Chief inspiration & guinea pig: Gordon Bell
Software development lead: Roger Lueder
MSR Collaborators: Lyndsay Williams, Ken Wood, Kentaro Toyama, Ron Logan, Steve Drucker, Curtis Wong, Mary Czerwinski, Brian Meyers
Interns: Josh Blumenstock, Evan Salomon, Aleks Aris
Outline
What is MyLifeBits
History/Motivation
MyLifeBits system outline
Demo
Future work
MyLifeBits is:
An experiment in lifetime storage
  Digitizing Gordon Bell’s past
  Capturing more of his future
A software system
  Capture
  Storage & retrieval
  Organization & annotation
Minimum requirement: fulfill Vannevar Bush’s 1945 “Memex” vision
Memex
As We May Think, Vannevar Bush, 1945
“A memex is a device in which an individual stores all his books, records, and communications, and which is mechanized so that it may be consulted with exceeding speed and flexibility”
Full-text search, text & audio annotations, and hyperlinks
I am data
The guinea pig

Has now scanned virtually all:
  Books written (and read when possible)
  Personal documents (correspondence including memos and email, bills, legal documents, papers written, …)
  Photos
  Posters, paintings, photos of things (artifacts, … medals, plaques)
  Home movies and videos
  CD collection
  And, of course, all PC files
Now recording: phone, radio, TV (movies), web pages… conversations and meetings to come
Paperless throughout 2002. 12” scanned, 12’ discarded.
Only 44 GB, incl. 10 wma, 14 SQL!!! Video: o(100) + 500 mov
The 1 TB Life

1 TB gives you 65+ years of:
  100 email messages a day (5 KB each)
  100 web pages a day (50 KB each)
  5 scanned pages a day (100 KB each)
  1 book every 10 days (1 MB each)
  10 photos per day (400 KB JPEG each)
  8 hours per day of sound, e.g. telephone, voice annotations, and meeting recordings (8 Kb/s)
  1 new music CD every 10 days (45 min each at 128 Kb/s)
It will take you 5 years to fill up your 80 GB drive
Want video? Buy more cheap drives (1 TB/year lets you record 4 hours/day of 1.5 Mb/s video)
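The budget above can be checked with a few lines of arithmetic. This is a minimal sketch, not the author's calculation: it assumes decimal units (1 KB = 1000 bytes, 1 TB = 10**12 bytes) and reads "Kb/s" as kilobits per second, neither of which the slide states. Under those assumptions the total comes to about 43 MB a day, so 1 TB lasts a little over 63 years and an 80 GB drive about 5 years, in the same range as the slide's rounded figures.

```python
# Daily capture budget from the slide, in bytes per day.
# Assumptions (not stated on the slide): decimal units and
# "Kb/s" meaning kilobits per second.
KB, MB, GB, TB = 10**3, 10**6, 10**9, 10**12


def stream_bytes(kilobits_per_second, seconds):
    """Bytes produced by a constant-bit-rate stream."""
    return kilobits_per_second * 1000 / 8 * seconds


daily_bytes = {
    "email": 100 * 5 * KB,
    "web pages": 100 * 50 * KB,
    "scanned pages": 5 * 100 * KB,
    "books": 1 * MB / 10,                          # one 1 MB book every 10 days
    "photos": 10 * 400 * KB,
    "sound": stream_bytes(8, 8 * 3600),            # 8 hours at 8 Kb/s
    "music CDs": stream_bytes(128, 45 * 60) / 10,  # one 45-minute CD every 10 days
}

total_per_day = sum(daily_bytes.values())
total_per_year = total_per_day * 365

years_per_tb = TB / total_per_year
years_per_80gb = 80 * GB / total_per_year

# The video claim: 4 hours a day of 1.5 Mb/s video, for a year.
video_per_year = stream_bytes(1500, 4 * 3600) * 365

print(f"{total_per_day / MB:.2f} MB/day")
print(f"1 TB lasts {years_per_tb:.1f} years; 80 GB lasts {years_per_80gb:.1f} years")
print(f"4 h/day of 1.5 Mb/s video: {video_per_year / TB:.2f} TB/year")
```

Sound dominates the non-video budget (roughly two thirds of the daily total), which is why the bit rate assumed for it matters more than any of the document counts.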
Trying to fill a terabyte in a year

Gordon’s lifetime collection < 30 GB (12 GB is music CDs)

Item                   Per TB         Per day
Photo (400 KB JPEG)    2.7M photos    7.3K photos
1 MB document          1.0M docs      2.9K docs
128 kb/s audio         18.6K hours    51 hours
256 kb/s video         9.3K hours     26 hours
1.5 Mb/s video         1.6K hours     4 hours
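The table entries follow from dividing one terabyte by the item size or stream rate, then by 365 for the daily rate needed to fill it in a year. The sketch below is an illustration rather than the author's spreadsheet; it assumes binary units (1 KB = 1024 bytes, 1 TB = 2**40 bytes, 1 kb = 1024 bits), chosen only because they come closest to the slide's figures.

```python
# How much of each item fits in one terabyte, and the daily rate
# needed to fill that terabyte in a year.
# Assumption: binary units throughout (not stated on the slide).
KB, MB, TB = 2**10, 2**20, 2**40
DAYS_PER_YEAR = 365


def items_per_tb(item_bytes):
    """Number of fixed-size items that fit in one terabyte."""
    return TB / item_bytes


def hours_per_tb(bits_per_second):
    """Hours of a constant-bit-rate stream that fit in one terabyte."""
    bytes_per_second = bits_per_second / 8
    return TB / bytes_per_second / 3600


per_tb = {
    "Photo (400 KB JPEG)": items_per_tb(400 * KB),
    "1 MB document": items_per_tb(1 * MB),
    "128 kb/s audio": hours_per_tb(128 * KB),
    "256 kb/s video": hours_per_tb(256 * KB),
    "1.5 Mb/s video": hours_per_tb(1.5 * MB),
}
per_day = {name: amount / DAYS_PER_YEAR for name, amount in per_tb.items()}

for name in per_tb:
    print(f"{name:22} {per_tb[name]:12,.0f} per TB {per_day[name]:10,.1f} per day")
```

The last row shows the point of the slide: even at 1.5 Mb/s, filling a terabyte in a year takes about four hours of video every day.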
“yet if the user inserted 5000 pages of material a day it would take him hundreds of years to fill the repository, so that he can be profligate and enter material freely”
-Vannevar Bush, 1945
So you’ve got it – now what do you do with it?
Can you find anything?
Can you organize that many objects?
Once you find it will you know what it is?
Once you’ve found it once, could you find it again?

“A record if it is to be useful … must be continuously extended, it must be stored, and above all it must be consulted”
“The difficulty seems to be, not so much that we publish unduly … but rather that publication has been extended far beyond our present ability to make real use of the record”
- Vannevar Bush
MyLifeBits Software
[Architecture diagram. Labels: MyLifeBits store; database; files; MyLifeBits Shell; Import files; GPS import & Map display; SenseCam; VIBE logging; Text annotation tool; Voice annotation tool; Screen saver; Radio capture & EPG; Internet Browser tool; Legacy applications; IM capture; MAPI interface; Outlook interface; Legacy email client; PocketPC transfer tool; PocketRadio player; TV capture tool; TV EPG download tool; Telephone capture tool]
Entities & Links
Link examples: Photo of Event, Caller in Phone Call, Annotates, Transcludes
MyLifeBits Schema (simplified)
[Schema diagram. Labels: Entity types; Resource entities; Relationship types; Relationships; Event types; Event log; Events; Resources; Images; Music; Phone calls; Tasks; People; Notes; Email Messages; Saved searches]
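One way to picture entities joined by typed links is a pair of tables in a relational store. The sketch below uses SQLite from the Python standard library; the table names, column names, and sample rows are all hypothetical, made up for illustration, and are not taken from the actual MyLifeBits schema.

```python
import sqlite3

# Hypothetical, heavily simplified schema: every table and column name
# here is illustrative and not taken from the real MyLifeBits database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE entity_types (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
    CREATE TABLE link_types   (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
    CREATE TABLE entities (
        id      INTEGER PRIMARY KEY,
        type_id INTEGER NOT NULL REFERENCES entity_types(id),
        title   TEXT NOT NULL
    );
    CREATE TABLE links (
        source_id INTEGER NOT NULL REFERENCES entities(id),
        target_id INTEGER NOT NULL REFERENCES entities(id),
        type_id   INTEGER NOT NULL REFERENCES link_types(id)
    );
""")


def add_entity_type(name):
    return conn.execute("INSERT INTO entity_types (name) VALUES (?)", (name,)).lastrowid


def add_link_type(name):
    return conn.execute("INSERT INTO link_types (name) VALUES (?)", (name,)).lastrowid


def add_entity(type_id, title):
    cur = conn.execute(
        "INSERT INTO entities (type_id, title) VALUES (?, ?)", (type_id, title)
    )
    return cur.lastrowid


def add_link(source_id, target_id, type_id):
    conn.execute(
        "INSERT INTO links (source_id, target_id, type_id) VALUES (?, ?, ?)",
        (source_id, target_id, type_id),
    )


def linked_to(target_id, link_type_name):
    """Titles of entities that point at target_id through the named link type."""
    rows = conn.execute(
        """
        SELECT e.title
        FROM links l
        JOIN entities e   ON e.id = l.source_id
        JOIN link_types t ON t.id = l.type_id
        WHERE l.target_id = ? AND t.name = ?
        ORDER BY e.id
        """,
        (target_id, link_type_name),
    )
    return [title for (title,) in rows]


# Made-up sample data.
photo_type = add_entity_type("Photo")
event_type = add_entity_type("Event")
note_type = add_entity_type("Note")
photo_of_event = add_link_type("Photo of Event")
annotates = add_link_type("Annotates")

party = add_entity(event_type, "Birthday party")
cake_photo = add_entity(photo_type, "cake.jpg")
cake_note = add_entity(note_type, "The cake was chocolate")
add_link(cake_photo, party, photo_of_event)
add_link(cake_note, cake_photo, annotates)

print(linked_to(party, "Photo of Event"))
print(linked_to(cake_photo, "Annotates"))
```

Keeping link types in their own table means a new relationship such as "Caller in Phone Call" is a new row rather than a schema change, and an annotation is simply another entity linked to the thing it describes.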
DEMO
Future work: new capture modes/devices
Deja View
SenseCam
Quindi
Body Media
Future work: Visualizations
Don't give me a little card image and say, "That's all you've got, because that's what I thought you should want for your virtual shoebox." There have got to be multiple modalities and the designers have to be able to deal with that. … don't metaphor me in, don't give me only one way of looking at things.
-Andy van Dam, Hypertext '87 Keynote Address
Web Scout
U. Maryland
IN-SPIRE
Next Media
Future work: UI
UI Improvements
User studies
Future work: Content analysis & Data Mining
“Creative thought and essentially repetitive thought are very different things. For the latter there are, and may be, powerful mechanical aids” – Vannevar Bush
Is MyLifeBits just enough rope to hang yourself with?
MyLifeBits must become MyPersonalAssistant
Content analysis and data mining
Doc similarity & “clean living”
Document meta-data extraction
Future work: scaling
Just starting to hit performance problems
 Stress tests & design modifications

www.MyLifeBits.com
http://research.microsoft.com/CARPE2004
BONUS SLIDES
Everything goes in a database
You need all the features of a database (Consistency, Indexing, Pivoting, Queries, Speed/scalability, Backup, replication)
If you don’t use one, you will find yourself creating one!
Files as blobs, also sync with file system for legacy apps
SQL
CARPE ’04
The First ACM Workshop on Continuous Archival & Retrieval of Personal Experiences
October 15th 2004
Columbia University, New York, NY, USA
Dear Appy,
How committed are you?
Signed,
Lost and Forgotten Data
By Gordon Bell
http://research.microsoft.com/~gbell
Dear Appy,
I'm having trouble with long-term commitment -- not on my end, heaven knows, but from the apps that created me and with whom I like to associate. Over time, these pesky apps evolve and they simply don't recognize the data that they once helped create! But, we data progeny -- and there are lots of us -- feel that as our creators, these apps should be responsible for eternal support.

But the little problem with recognition isn't the worst of it – sometimes the apps even disappear altogether. I ask you, is it expecting too much for 20-something year old data like me to be interpretable by my app (e.g. Acrobat, DB2, Draw, Eudora, Office, Quicken, or RealNetworks), or am I just associating with irresponsible apps?

If things continue on their current path, it seems I will be completely un-interpretable within 20 to 50 years! My apps will move to other platforms, or evolve to be more Internet- or Next-Big-Thing-centric...
A Storocratic Oath
1. Do no harm to dates (File creation, Photo taken)
2. Do no harm to device created & other meta-data.
   • Camera data & location data are sacred.
3. Support & aid the creation of critical metadata.
   • When/how the user feels like it
   • Auto-magically!
4. Maintain user confidentiality
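As a small illustration of "do no harm to dates", the sketch below copies a file with Python's standard `shutil.copy2`, which attempts to carry the file's timestamps across, where a plain content copy would stamp the copy with the current time. The helper name and file names are hypothetical. File creation time is not preserved by this call on every platform, so it covers only part of the oath.

```python
import os
import shutil
import tempfile


def copy_preserving_times(src, dst):
    """Copy src to dst and keep its modification time (hypothetical helper).

    shutil.copy2 copies the file contents and then attempts to copy
    metadata such as the access and modification times.
    """
    shutil.copy2(src, dst)
    return dst


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "photo.jpg")
    dst = os.path.join(tmp, "photo_copy.jpg")
    with open(src, "wb") as f:
        f.write(b"placeholder bytes, not a real JPEG")

    # Pretend the photo was taken long ago: set both access and
    # modification time to a fixed point in the past (seconds since epoch).
    taken = 1_000_000_000
    os.utime(src, (taken, taken))

    copy_preserving_times(src, dst)
    preserved = int(os.stat(dst).st_mtime) == taken

print("modification time preserved:", preserved)
```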
Classification wish list
Download classifications rather than build them
Definitions & synonyms should help find what I want
Today it is too expensive to manually classify my scanned paper. E.g. “right time” meta-data is critical!
Next year I hope “the system” can classify papers and other documents e.g. bills
In 10 years I expect all documents to appear electronically & classified with a little help from me
Personal Search is not Professional or Web search
System sees every entry & access
Everything, not just a professional life
Limited to SIS, not an infinite amount, covers a profession & personal life
[Figure: MyLifeBits, a professional user, and the web as seen by search engines, plotted by depth (e.g. information item types & coverage) against knowledge breadth (e.g. Dewey classification)]