Paper authors: Thomas Drugeon, Valentine Frey, Jérôme Thièvre, Matteo Treleani

Context Sensitive Archiving of Videos on the Web
JTS 2010, 3 May 2010

Ina collections

Current collections
- 60 years of TV programmes and 70 years of radio programmes
- Legal deposit since 1992
- 4,500,000 hours of TV and radio
- + 1,000,000 hours captured live from 102 TV and radio channels each year
→ Preserve, promote, transmit

Extension to the Web
- Web legal deposit law (2006), shared between BnF and Ina, as an extension to their current collections
- Ina is developing specialized tools and methods to collect, archive, preserve, and give access to this archived web collection

Context sensitive archiving on the web | 2 May 2010

Web Legal Deposit

Archiving French audiovisual information on the web
→ Focus on audiovisual content

Why not archive only the video and audio content from the web?
- The web is not just a way to access content, it is a medium
→ Archiving websites related to French audiovisual media

Operational since February 2009. As of April 2010:
- 6000 websites (3000 at start)
- 2,500,000,000 “objects”, 260 TB
- 10,000,000 video objects, 100 TB
- 19,000,000 audio objects, 100 TB
→ 260 TB compressed to only 21 TB of storage (DAFF)

Methods

The web is not a broadcast medium: there is no stream to capture, no explicit path to follow
- The web responds to interactions
- We have to discover and recreate these interactions to archive it
→ crawling

Websites grow and change in heterogeneous ways
- We have to visit a page to know it was updated
→ sampling

Accessing the archive means browsing it
- We have to recreate the interactions to make the archive browsable
→ simulating

Limits

Crawling
- Some interactions cannot be crawled, so some content will be missing or altered in the archive (pages or parts of pages)

Sampling
- Some updates will be missing
- Linked pages are crawled at a different date from the original page
Simulating
- Dead web (train reservation, Google search, etc.)
- Some interactions are lost (crawling issues)
- Temporal inconsistencies between pages (sampling issues)

Web Archaeology

The consequence of technical problems: non-integrity of web documents
- Integrity: the document hasn’t been altered (Lynch, 1994)

DlWeb archives traces

How to preserve authenticity and reliability without depending on material integrity?
- Authenticity: the document is what it purports to be (Duranti, 2001)
- Reliability: we can trust the document and its content (Bachimont, 2009)

Reconstructing the meaning of the document through traces (a sort of archaeological practice)

Web Archiving: pre-eminence of the meaning

Meaning precedes the material form. We thus have to find the elements that influence the meaning.

Example
- Preserving the meaning of a video posted on the web means preserving the significant elements of its context
- Context influences the meaning of a video posted on the web
- But not all items of the context have the same impact on interpretation

Example: The relocation of The Eiffel Tower

Ina.fr posted a news programme from 1964: the Eiffel Tower was to be relocated. The video caused a buzz on the Web.

Example: The relocation of The Eiffel Tower

How do we find which elements of the context to preserve in order to safeguard the archival value of the video (its correct interpretation)?

A methodological approach: the commutation test (from linguistics)
- Substituting an item of the expression can modify the meaning
- Ex. changing a phoneme of a word (peer – beer)

How to reconstruct the meaning in complex documents?
Web documents are often complex and refer to a broad spectrum of cultural elements. Where is the document and where is the context?

Hypothesis
- We can reconstruct the meaning through narrativization
- Narrativization can be based on the search for clues
- It is the critical historical approach that Ginzburg calls the “evidential paradigm” (the clues are in this case the significant elements found through the commutation test)
- A Sherlock Holmes approach…

Example: narrativization based on clues

The Dailymotion channel of Gameblog.fr posts a France 2 news report from the 21st of November 2004, and explains that the content was an amalgam of fake news. The report announces a collective suicide in Japan: 147 people committed suicide because of a delay in the release of a videogame (Dead or Alive). They swallowed some sachets of silicon…

Example: narrativization based on clues

A link in a comment allows us to better understand what happened. France 2 cited an article that appeared in the newspaper Libération, reporting a collective suicide in March 2004. The source of the article was a blog post.

Example: narrativization based on clues

The post was satirical: it appeared on the webzine Xbox Mag to mock videogamers' excessive interest in the release of this product.

Example: narrativization based on clues

The editors of Xbox Mag notified France 2 and Libération of the error. On the 25th of November Libération published a correction.
On the 26th of November France 2 announced the error, blaming the “Anglo-Japanese press” (their only source was Libération).

The complexity of a web document

The example reveals:

The intrinsic value of a web document
- Web archiving is the most complete way to reconstruct these events (TV and press are not sufficient)

The problem of the completeness of traces
To understand the facts we need no fewer than 3 web pages, often not linked to one another:
- The video posted on Dailymotion
- The original post on Xbox Mag
- The post on Xbox Mag explaining the errors

The Web always refers to (and remediates) other media:
- The archival video of France 2 (conserved at Inathèque)
- The press: Libération

How to help reconstruct the narration?

Improve completeness

DlWeb archives traces

Give the researcher access to all available technical and methodological information (i.e. the archiving context)
→ clues

Develop tools to help the researcher organise and exploit these clues
→ Methodological DlWeb workshops with audiovisual researchers, archivists and documentalists
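
One kind of archiving context that could be surfaced as a clue is the temporal inconsistency between a page and the pages it links to (the sampling limit described earlier). The following is only a minimal sketch in Python, not the DlWeb tooling: the `Capture` record, its fields, the `temporal_gaps` function and the example URLs are all hypothetical, chosen for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Dict, List, Optional


@dataclass
class Capture:
    """One archived page (hypothetical record, not an actual DlWeb structure)."""
    url: str
    captured_at: datetime
    links: List[str] = field(default_factory=list)


def temporal_gaps(page: Capture,
                  archive: Dict[str, Capture]) -> Dict[str, Optional[timedelta]]:
    """Map each linked URL to (linked capture time - page capture time).

    None marks a link whose target is missing from the archive.
    """
    gaps: Dict[str, Optional[timedelta]] = {}
    for url in page.links:
        target = archive.get(url)
        gaps[url] = None if target is None else target.captured_at - page.captured_at
    return gaps


# Made-up example data: a video page linking to one archived post and one missing page.
video = Capture(
    url="http://example.org/video",
    captured_at=datetime(2010, 4, 1, 12, 0),
    links=["http://example.org/post", "http://example.org/missing"],
)
archive = {
    "http://example.org/post": Capture(
        url="http://example.org/post",
        captured_at=datetime(2010, 4, 3, 12, 0),
    ),
}

gaps = temporal_gaps(video, archive)
for url, gap in gaps.items():
    print(url, "not archived" if gap is None else f"captured {gap} later")
```

A researcher reading the video page would then see that the linked post was captured two days after it, and that one link has no capture at all; both facts are clues about how far the archived context can be trusted.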