Seven Years of Online Video: The Internet Archive Experience
Rick Prelinger
Cardiff, 20 July 2007

A brief timeline
1983-2002: Prelinger Archives engages in licensing
1999: the challenge; research into encoding; big, top-heavy project; no scoping
2000: telecine; MPEG-2 encoding; uploading; hammer-and-chisel website; 260 objects
2001: Slashdotted; ≈500,000 downloads; 1,001 objects
2002: slump in footage sales; first million downloads; other moving image collections created; first large CC collection
2003: 1,965 objects; switch to digitizing contractor; ≈2 million PA downloads; two-tier system successful
2004: 4 million PA downloads; many other collections up
2005: ≈200,000 video objects; advent of YouTube
2007: staff of 4; ≈270,000 video objects; 7 million PA downloads; PA sales increase ≈110%; Flash 9 introduced

Major collections
Moving images: 77,887 items
Live Music Archive: 41,384 concerts
Audio: 160,391 recordings
Texts: 229,703 texts
Ourmedia: 191,386 items (mostly moving images)
Web: 85 billion pages
TV: approx. 120 channel/years recorded 24x7x365 by Television Archive, an independent nonprofit organization
Figures as of 17 July 2007

Subcollections
Animation & Cartoons (951 items)
Arts & Music (635 items)
Computers & Technology (1,353 items)
Education (1,269 items)
Ephemeral Films (282 items)
Movies (2,140 items)
News & Public Affairs (5,528 items)
Non-English Videos (159 items)
Open Source Movies (49,865 items)
Prelinger Archives (1,987 items)
Sports Videos (362 items)
Video Games (3,498 items)
Vlogs (1,840 items)
Youth Media (446 items)

Formats
MPEG-2 (typically 720x480 or 360x480, 2.5-3.5 Mbps)
MPEG-1
MPEG-4 (64 kb)
MPEG-4 (256 kb)
MPEG-4 (“hi-res”)
Flash
RealMedia (256 kb)
RealMedia (64 kb)
QuickTime (various bitrates)
Windows Media (various bitrates)
DV
HD?

Workflow (grossly simplified)
User upload (or IA upload)
Derivation
Curation
Publication
Moderation
Annotation
Reuse

Pluses
Long-term nonprofit presence in highly commercialized field
Non-revenue (ad-free) model
Massive downloadable offerings
Large Creative Commons-licensed collection
Uncountable derivative works
Supportive of deep linking
Promise of storage forever
Frugal operation
Stimulated other access projects
Loyal user base and growing body of annotation
Self-formed, relatively small social networks
Many have found ways to use IA for their own purposes
Very DIY
Designed by geeks

Minuses
People deserve downloads, but get confused
Growth has slowed because of YouTube and others
QoS difficult and expensive to maintain
Sustainability untested
Long-term digital preservation strategy only just emerging
Reactive more than proactive
Understaffed
No affordable solution for digitizing direct from film
No editing tools (Swiss Army Knife)
Segmentation unsupported, though working on it
The portal problem (one size fits all)
Embryonic community features
Recent growth in downloads of uncertain provenance
Very DIY
Designed by geeks

In less than one year, YouTube built an easy-to-access online collection (≈12 million videos) that I'd argue has become the world's default media archive. Everything anyone does to bring archives online will now be measured against YouTube's ambiguous legacy.

It offers a sense of completeness: a massive collection of old and new video, from Malcolm X's complete speeches to clips of the moose I saw wandering in people's front yards in Anchorage. It sticks to preview mode, presenting visually degraded Flash video, so users feel no transgression. It’s being sued right and left, but most rightsholders will rightfully regard what it does as promotion. Best of all, it allows users to upload almost anything, annotate with relative freedom, and network with one another.

rick@archive.org