Making Digital Artifacts on the Web
Verifiable and Reliable
Abstract:
The current Web has no general mechanisms to make digital artifacts such
as datasets, code, texts, and images verifiable and permanent. For digital artifacts
that are supposed to be immutable, there is moreover no commonly accepted
method to enforce this immutability. These shortcomings have a serious negative
impact on the ability to reproduce the results of processes that rely on Web
resources, which in turn heavily impacts areas such as science where
reproducibility is important. To solve this problem, we propose trusty URIs
containing cryptographic hash values. We show how trusty URIs can be used for
the verification of digital artifacts, in a manner that is independent of the
serialization format in the case of structured data files such as nanopublications. We
demonstrate how the contents of these files become immutable, including
dependencies to external digital artifacts and thereby extending the range of
verifiability to the entire reference tree. Our approach sticks to the core principles
of the Web, namely openness and decentralized architecture, and is fully
compatible with existing standards and protocols. Evaluation of our reference
implementations shows that these design goals are indeed accomplished by our
approach, and that it remains practical even for very large files.
EXISTING SYSTEMS:
• Our approach sticks to the core principles of the Web, namely openness and
decentralized architecture, and is fully compatible with existing standards
and protocols.
• There are a number of existing approaches that include hash values in URIs
for verifiability purposes, e.g. for legal documents.
• This reversibility is needed when an existing trusty URI resource containing
self-references has to be verified.
• We transformed these nanopublications into the formats N-Quads and TriX
using existing off-the-shelf converters.
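The conversions to N-Quads and TriX matter because the hash of a structured data file is meant to depend on its content, not on how that content happens to be serialized. The following is a minimal sketch of that idea, assuming the file has already been parsed into subject-predicate-object triples: the statements are put into a canonical order before hashing. The function name content_hash and the example triples are hypothetical, and the sketch ignores blank nodes and self-references, which a real implementation has to handle.

```python
import hashlib

def content_hash(triples: list[tuple[str, str, str]]) -> str:
    """Hash a set of triples independently of the order they were read in.

    Simplified illustration only: assumes terms contain no newlines and
    that there are no blank nodes or self-references.
    """
    # Deduplicate and sort so that statement order in the file is irrelevant.
    canonical = sorted(set(triples))
    h = hashlib.sha256()
    for s, p, o in canonical:
        h.update(f"{s}\n{p}\n{o}\n".encode("utf-8"))
    return h.hexdigest()

# The same two statements, as they might come out of two different serializations.
from_format_a = [
    ("http://example.org/np1", "http://example.org/cites", "http://example.org/np2"),
    ("http://example.org/np1", "http://example.org/author", "Alice"),
]
from_format_b = list(reversed(from_format_a))

print(content_hash(from_format_a) == content_hash(from_format_b))
```

Because both lists are reduced to the same canonical sequence before hashing, the comparison holds regardless of which format the statements were read from.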
DISADVANTAGE:
• The same input always leads to exactly the same hash value, whereas even a
minimally modified input returns a completely different value.
• The downside of such custom-made solutions is that custom-made software
is required to generate, resolve, and check the hash references.
• Here, we propose an approach that could replace such specific ones, thereby establishing
interoperability of systems and a standard infrastructure for creating,
resolving, and checking hash references.
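The hash property described in the first point above can be observed directly with a standard cryptographic hash function. This is a small sketch using SHA-256 from Python's standard library; the choice of SHA-256 and the example strings are assumptions made for illustration.

```python
import hashlib

def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of a UTF-8 encoded string as hex digits."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

original = "Published nanopublications are supposed to be immutable."
modified = "Published nanopublications are supposed to be immutable!"

h_first = sha256_hex(original)
h_again = sha256_hex(original)
h_modified = sha256_hex(modified)

# Same input, same hash value.
print(h_first == h_again)

# One changed character, different hash value.
print(h_first == h_modified)

# Count how many of the 64 hex digits differ between the two digests.
differing = sum(a != b for a, b in zip(h_first, h_modified))
print(differing, "of", len(h_first), "hex digits differ")
```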
PROPOSED SYSTEMS:
• We propose trusty URIs containing cryptographic hash values. We show how
trusty URIs can be used for the verification of digital artifacts, in a manner
that is independent of the serialization format in the case of structured data
files such as nanopublications.
• We propose an approach to make items on the (Semantic) Web verifiable,
immutable, and permanent.
• This approach includes cryptographic hash values in Uniform Resource
Identifiers (URIs) and adheres to the core principles of the Web, namely
openness and decentralized architecture.
• Nanopublications have been proposed as a new way of scientific publishing.
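The idea of placing a cryptographic hash value inside a URI, so that anyone holding the content can check it against the identifier, can be sketched as follows. This is a simplified illustration and not the actual trusty URI encoding: the use of SHA-256, the URL-safe Base64 suffix, the "." separator, and the function names make_hash_uri and verify_hash_uri are all assumptions made for this example.

```python
import base64
import hashlib

def make_hash_uri(base_uri: str, content: bytes) -> str:
    """Append a URL-safe Base64 SHA-256 digest of the content to a base URI."""
    digest = hashlib.sha256(content).digest()
    code = base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")
    return f"{base_uri}.{code}"

def verify_hash_uri(uri: str, content: bytes) -> bool:
    """Check that the hash suffix of the URI matches the given content."""
    # The Base64 alphabet used here contains no ".", so the last "."
    # always separates the base URI from the hash suffix.
    base_uri, _, _ = uri.rpartition(".")
    return make_hash_uri(base_uri, content) == uri

artifact = b"temperature,12.5\npressure,101.3\n"
uri = make_hash_uri("http://example.org/dataset1", artifact)

print(uri)
print(verify_hash_uri(uri, artifact))               # unchanged content
print(verify_hash_uri(uri, artifact + b"extra\n"))  # modified content
```

No central authority is consulted during verification: the URI and the retrieved bytes are sufficient, which fits the decentralized architecture the approach aims for.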
ADVANTAGE:
• Nanopublications can cite other nanopublications via their URIs, thereby
creating complex citation networks.
• Published nanopublications are supposed to be immutable, but there is
currently no mechanism to enforce this.
• It is well known that even artifacts that are supposed to be immutable tend to
change over time, while often keeping the same URI reference.
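When artifacts cite each other through identifiers that contain hash values, verifying one artifact can extend to everything it references. The sketch below illustrates this under simplifying assumptions: artifacts are plain strings, the identifier is just the SHA-256 hex digest, references are written as "ref:" followed by that digest, and the in-memory dictionary stands in for retrieval from the Web. The names artifact_id and verify_tree are hypothetical.

```python
import hashlib
import re

# Hypothetical reference syntax: "ref:" followed by a 64-digit hex hash.
REF = re.compile(r"ref:([0-9a-f]{64})")

def artifact_id(content: str) -> str:
    """Identifier of an artifact: the SHA-256 hex digest of its content."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()

def verify_tree(aid: str, store: dict[str, str]) -> bool:
    """Verify an artifact and, recursively, every artifact it references."""
    content = store.get(aid)
    if content is None or artifact_id(content) != aid:
        return False
    return all(verify_tree(ref, store) for ref in REF.findall(content))

dataset = "measurement: 42"
dataset_id = artifact_id(dataset)
publication = f"This result is based on ref:{dataset_id}"
publication_id = artifact_id(publication)

store = {dataset_id: dataset, publication_id: publication}
print(verify_tree(publication_id, store))  # whole tree intact

# Silently changing the cited dataset breaks verification of the citing artifact.
store[dataset_id] = "measurement: 43"
print(verify_tree(publication_id, store))
```

Since the citing artifact's own hash covers the identifiers of what it cites, a change anywhere in the reference tree is detectable from the root.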
CONCLUSION:
We have presented a proposal for unambiguous URI references to make digital
artifacts on the (Semantic) Web verifiable, immutable, and permanent. If adopted,
it could have a considerable impact on the structure and functioning of the Web,
could improve the efficiency and reliability of tools using Web resources, and
could become an important technical pillar for the Semantic Web, in particular for
scientific data, where provenance and verifiability are important. Scientific data
analyses, for example, might be conducted in the future in a fully reproducible
manner within “data projects” analogous to today’s software projects. The
dependencies in the form of datasets could be automatically fetched from the Web,
similar to what Apache Maven does for software projects, but decentralized and
verifiable.