ppt - JTS 2010

Paper authors:
Thomas Drugeon
Valentine Frey
Jérôme Thièvre
Matteo Treleani
Context Sensitive Archiving of Videos on the Web
JTS 2010, 3 May 2010
Ina collections
Current collections
60 years of TV programmes and 70 years of radio programmes
Legal deposit since 1992
4,500,000 hours of TV and radio
+ 1,000,000 hours captured live from 102 TV and radio channels each year
→ Preserve, promote, transmit
Extension to the Web
Web legal deposit law (2006), shared between BnF and Ina, as an extension to
their current collections
Ina is developing specialized tools and methods to collect, archive, preserve,
and give access to this archived web collection
Context sensitive archiving on the web | 2 May 2010
Web Legal Deposit
Archiving French audiovisual information on the web
→ Focus on audiovisual contents
Why not only archive video and audio contents from the web?
The web is not just a way to access contents, it is a medium
→ Archiving websites related to French audiovisual media
Operational since February 2009
As of April 2010:
6000 websites (3000 at start)
2,500,000,000 “objects”, 260 TB
10,000,000 video objects, 100 TB
19,000,000 audio objects, 100 TB
→ 260 TB compressed to only 21 TB of storage (DAFF)
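The figures above imply a reduction of roughly twelve to one. A quick check of that arithmetic, using only the numbers on the slide (the variable names are illustrative, decimal units are assumed, and the slide does not say how DAFF achieves the reduction):

```python
# Figures as stated on the slide (April 2010).
raw_tb = 260      # total size of the collected objects, in TB
stored_tb = 21    # size actually stored in DAFF, in TB
video_objects, video_tb = 10_000_000, 100
audio_objects, audio_tb = 19_000_000, 100

ratio = raw_tb / stored_tb
saving = 1 - stored_tb / raw_tb

# Average object size in MB, assuming 1 TB = 1,000,000 MB.
avg_video_mb = video_tb * 1_000_000 / video_objects
avg_audio_mb = audio_tb * 1_000_000 / audio_objects

print(f"reduction ratio: {ratio:.1f}:1")
print(f"space saved: {saving:.0%}")
print(f"average video object: {avg_video_mb:.1f} MB")
print(f"average audio object: {avg_audio_mb:.1f} MB")
```

The ratio comes out at about 12.4 to 1, that is, roughly 92% of the raw volume does not need to be stored.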
Methods
The web is not a broadcast medium:
no stream to capture, no explicit path to follow
The web responds to interactions
We have to discover and recreate these interactions to archive it
→ crawling
Websites grow and change in heterogeneous ways
We have to visit a page to know it was updated
→ sampling
Accessing the archive means browsing it
We have to recreate the interactions to make the archive browsable
→ simulating
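The crawling and sampling steps above can be sketched as follows. This is a minimal illustration, not Ina's crawler: the `fetch` callback, the toy `site` dictionary and the use of SHA-256 digests to detect updates are all assumptions made for the example, and URLs are treated as opaque keys with no relative-link resolution.

```python
import hashlib
from collections import deque
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(fetch, seed, previous_digests=None):
    """Breadth-first crawl starting from `seed`.

    `fetch(url)` is a hypothetical callback returning the page HTML or None.
    Returns (digests, changed): a content digest per archived URL, and the
    set of URLs whose content differs from `previous_digests`.
    """
    previous_digests = previous_digests or {}
    digests, changed = {}, set()
    queue, seen = deque([seed]), {seed}
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # unreachable page: missing from this capture
        digest = hashlib.sha256(html.encode("utf-8")).hexdigest()
        digests[url] = digest
        if previous_digests.get(url) != digest:
            changed.add(url)
        parser = LinkExtractor()
        parser.feed(html)
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return digests, changed


# A toy in-memory "site" standing in for the live web.
site = {
    "/": '<a href="/news">News</a> <a href="/video">Video</a>',
    "/news": '<a href="/">Home</a> first edition',
    "/video": '<a href="/missing">Gone</a>',
}
first, changed_first = crawl(site.get, "/")
site["/news"] = '<a href="/">Home</a> second edition'
second, changed_second = crawl(site.get, "/", first)
print(sorted(changed_first))
print(sorted(changed_second))
```

Only the second crawl reveals that `/news` changed, which mirrors the point that a page has to be visited to know it was updated. The link to `/missing` is discovered but yields nothing to archive.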
Limits
Crawling
Some interactions cannot be crawled, and thus some contents will be
missing or altered in the archive (pages or parts of pages)
Sampling
Some updates will be missing
Linked pages are crawled at a different date from the original page
Simulating
Dead web (train reservation, Google search, etc.)
Some interactions are lost (crawling issues)
Temporal inconsistencies between pages (sampling issues)
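The temporal inconsistency caused by sampling can be made visible by comparing capture dates. A minimal sketch, assuming a hypothetical capture log and link table (neither format comes from the slides):

```python
from datetime import datetime

# Hypothetical capture log: URL -> time at which the crawler archived it.
captures = {
    "/article": datetime(2010, 4, 1, 9, 0),
    "/article/video": datetime(2010, 4, 1, 9, 5),
    "/related": datetime(2010, 4, 3, 18, 30),
}
# Hypothetical outgoing links recorded for each archived page.
links = {"/article": ["/article/video", "/related", "/not-archived"]}


def capture_skew(page, captures, links):
    """Return {linked_url: timedelta or None} relative to `page`'s capture.

    None marks a linked page that is absent from the archive.
    """
    origin = captures[page]
    skew = {}
    for target in links.get(page, []):
        captured = captures.get(target)
        skew[target] = None if captured is None else captured - origin
    return skew


skew = capture_skew("/article", captures, links)
for target, delta in skew.items():
    print(target, "missing" if delta is None else delta)
```

A researcher browsing the archived `/article` would see a linked page captured more than two days later, and one link that leads nowhere. Exposing these offsets is one way of documenting the archiving context.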
Web Archaeology
The consequence of technical problems:
Non-Integrity of web documents
Integrity: the document hasn’t been altered (Lynch, 1994)
DlWeb archives traces
How to preserve authenticity and reliability
without depending on material integrity?
Authenticity: the document is what it purports to be (Duranti, 2001)
Reliability: we can trust the document and its content (Bachimont, 2009)
Reconstructing the meaning of the document through traces
(a sort of archaeological practice)
Web Archiving: pre-eminence of the meaning
Meaning precedes the material form.
We thus have to find the elements influencing the meaning.
Example
Preserving the meaning of a video posted on the web
means to preserve the significant elements of the context
Context influences the meaning of a video posted on the web
But not all the items of the context have the same impact on interpretation.
Example: The relocation of The Eiffel Tower
Ina.fr posted a news programme from 1964: the
Eiffel Tower was to be relocated.
The video provoked a buzz on the Web.
Example: The relocation of The Eiffel Tower
How to find which elements of the context to preserve in order to safeguard the archival value of the video (its correct interpretation)?
A methodological approach: the commutation test (from linguistics):
Substituting one element of the expression may change the meaning.
Ex. changing a phoneme of a word (peer – beer).
How to reconstruct the meaning in complex documents?
Web documents are often complex and refer to a broad spectrum of cultural elements.
Where does the document end and where does the context begin?
Hypothesis
We can reconstruct the meaning through a narrativization.
Narrativization can be based on the search for clues.
This is the critical historical approach that Ginzburg calls the “evidential paradigm”
(the clues are in this case the significant elements found through the commutation test).
A Sherlock Holmes approach…
Example: narrativization based on clues
The Dailymotion channel of Gameblog.fr posts a France 2 news report from 21 November 2004 and explains that its content was an amalgam of fake news.
The report announces a collective suicide in Japan: 147 people committed suicide because of a delay in the release of a videogame (Dead or Alive).
They swallowed some sachets of silicon…
Example: narrativization based on clues
A link in a comment allows us to better understand what happened.
France 2 cited an article that appeared in the newspaper Libération, reporting a collective suicide in March 2004.
The source of the article was a blog post.
Example: narrativization based on clues
The post was satirical: it appeared on the webzine Xbox Mag to mock videogamers' excessive interest in the release of this product.
Example: narrativization based on clues
The editors of Xbox Mag notified France 2 and Libération of the error.
On 25 November, Libération published a correction.
On 26 November, France 2 acknowledged the error, blaming the “Anglo-Japanese press” (their only source was Libération).
The complexity of a web document
The example reveals:
The Intrinsic Value of a Web Document
Web Archiving is the most complete way to reconstruct these events
(TV and press are not sufficient)
The problem of the completeness of traces
To understand the facts we need no fewer than three web pages, which are often not linked to one another:
-The video posted on Dailymotion
-The original post on Xbox Mag
-The post on Xbox Mag explaining the errors
The Web always refers to (and remediates) other media:
-The archival video of France 2 (conserved at Inathèque)
-The press: Libération
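The chain of sources in this example can be written down as a small data structure that a researcher's tool might walk. A minimal sketch only: the labels and the `relied_on` mapping are illustrative names built from the story told on the slides, not identifiers from the DlWeb archive.

```python
# Each entry maps a document to the source it relied on, as told on the
# slides. The keys are illustrative labels, not archive identifiers.
relied_on = {
    "Dailymotion video (Gameblog.fr)": "France 2 news report",
    "France 2 news report": "Libération article",
    "Libération article": "Xbox Mag satirical post",
}


def trace_to_origin(document, relied_on):
    """Follow source links from `document` back to the earliest known one."""
    chain = [document]
    while chain[-1] in relied_on:
        source = relied_on[chain[-1]]
        if source in chain:  # guard against circular citations
            break
        chain.append(source)
    return chain


chain = trace_to_origin("Dailymotion video (Gameblog.fr)", relied_on)
print(" -> ".join(chain))
```

If any intermediate document is missing from the archive, the walk stops early, which is the completeness problem in concrete form.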
How to help reconstruct the narration?
Improve completeness
DlWeb archives traces
Give the researcher access to all available technical and
methodological information (i.e. the archiving context)
→ clues
Develop tools to help the researcher to organise and exploit
these clues
→ Methodological DlWeb workshops with audiovisual
researchers, archivists and documentalists