understand the meanings and biases hidden in our professional tools, practices and
theories. "Recognizing the presence of an underlying paradigm and understanding
the values it conveys is not difficult when we deal with concepts, principles and
categories, while it may be tricky when we deal with technical, apparently neutral
standards. In fact, different technologies may rely on different philosophies."
(Michetti, 2015, p. 155) So far, archivists and records managers have focused on the
documentary object as a whole. RDF and Linked Data are almost a Copernican
revolution, because they rely on information atoms that - in theory - can be
aggregated and manipulated at will. This is the perfect solution for those like Greg
Bak who advocate item-level thinking (Bak, 2012). However, the adoption of
XML, RDF, Linked Open Data and other technologies is more than a technical
option: it is rather the choice of a specific knowledge paradigm, and by no means a neutral one.
In the case of Linked Data, the graph is not only the symbolic representation of the
network of relationships among the entities that make up the archival description.
It is also the form taken by data, the structure that houses the descriptions, the
container that gives shape to our vision of the world. To paraphrase Bowker and Star,
there is nothing wrong with that. However, we need to understand the profound
significance of this approach.
The graph offers many advantages, but its strength - that is, the potential to create a
network of connections that can be expanded indefinitely - can prove to be a limit.
For example, if we consider EAD, it is evident that its limit resides in its design, that
is, in conceiving and modelling an archival description as a document. As a matter of
fact, EAD provides a digital replica of the paper object. However, this approach
still has some justification, once we recognize that archival description is
an autonomous work. In fact, in addition to practical and operational purposes,
archival description also performs a fundamental function of mediation between sources
and users, and supports the authenticity of the sources. In a graph, it can be difficult
to recognize the boundaries of a given archival description. With Linked Data,
Anyone can say Anything about Anything:25 once we accept this so-called triple-A
principle, links explode - that is the beauty of Linked Data - boundaries
disappear, and users can enter the description directly from anywhere in the graph. In a sense, this
is a profound form of disintermediation that is destined to grow as visualization
techniques and strategies occupy the archival space, dominated so far by the written
word, narrative and hierarchical diagrams. The complex network of relationships
underlying - rather, making up - an archive can now be represented in a myriad
of ways. This is not a criticism of Linked Data: the graph paradigm is indeed a
promising data architecture. This is rather an exploration of the possible limits and
dangers of this paradigm. In short, archivists should investigate this transformation
process, which is slowly moving archival description in the direction of
bibliographic description: high fragmentation of information, and reduction of the
narrative dimension.
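The triple-A principle can be illustrated with a minimal sketch, here in plain Python rather than an RDF library, and with hypothetical URIs and values: two independent sources publish triples about the same resource, and merging the graphs is a simple union of statements in which conflicts coexist and the boundaries between descriptions dissolve.

```python
# A triple is modelled as a (subject, predicate, object) tuple.
# All URIs and values below are hypothetical.
archive_a = {
    ("ex:fonds42", "dc:creator", "Smith, J."),
    ("ex:fonds42", "ex:extent", "12 boxes"),
}

# Anyone can say Anything about Anything: a second, unrelated source
# asserts a conflicting creator for the very same resource.
community_b = {
    ("ex:fonds42", "dc:creator", "Jones, M."),
    ("ex:fonds42", "ex:subject", "labour history"),
}

# Merging RDF graphs is just set union: nothing prevents or even flags
# the conflict, and the boundary between the two descriptions is lost.
merged = archive_a | community_b

creators = {o for (s, p, o) in merged if p == "dc:creator"}
print(sorted(creators))  # two irreconcilable statements coexist
```

The sketch makes the point concrete: the merged graph is perfectly well-formed RDF, yet it contains two incompatible assertions about the creator, and nothing in the data model itself tells us which one to trust.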
Finally, it should be noted that the effects of the triple-A principle are multiplied
when combined with the Open World Assumption (OWA). Roughly speaking, this
assumption states that the absence of a statement does not imply a statement
of absence (for example, the absence of a date of birth does not mean that the
person was never born).26 Under these conditions, what value should be attributed to
the statements (i.e., the triples)? The question is not trivial and indeed takes us back
to issues such as source of authority and technical expertise, which have a deep
connection with provenance and thus should be taken into account when designing
new models for archival description. Strategies are needed to assess users' trust in
relation to the quality of information on provenance. After all, this brings us back to
the trust issue that Tim Berners-Lee already identified at the top of the Semantic Web
stack (Berners-Lee, 2000).
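The difference between closed-world and open-world readings of missing data can be sketched as follows, again in plain Python with hypothetical data: under the closed-world reading an absent statement is treated as false, while under the OWA it is merely unknown.

```python
# Hypothetical triples about a person; note that there is no
# date-of-birth statement at all.
triples = {
    ("ex:person1", "foaf:name", "Ada"),
    ("ex:person1", "ex:occupation", "archivist"),
}

def closed_world(subject, predicate):
    """Closed World Assumption: what is not stated is taken to be false."""
    return any(s == subject and p == predicate for (s, p, o) in triples)

def open_world(subject, predicate):
    """Open World Assumption: what is not stated is simply unknown."""
    if any(s == subject and p == predicate for (s, p, o) in triples):
        return True
    return "unknown"  # absence of a statement is not a statement of absence

print(closed_world("ex:person1", "ex:dateOfBirth"))  # False
print(open_world("ex:person1", "ex:dateOfBirth"))    # unknown
```

Under the OWA, a query about the missing date of birth cannot conclude anything, which is precisely why the value and trustworthiness of the statements that *are* present become a central question.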
Conclusions
As already stated and discussed, the Principle of Provenance is a pillar of Archival
Science, originally intended to prevent the intermingling of documents from
different origins, in order to maintain the identity of a body of records. Peter Scott
challenged such a view. As a consequence, provenance in the archival domain moved
from a simplistic one-to-one relationship to a multi-dimensional approach, and
started being understood as a network of relationships between objects, agents and
functions. Conceptual debate pushed the boundaries of provenance further: the
established orthodoxies cracked under the weight of societal, parallel and
community provenance. The digital environment and new technologies have
presented unpredictable challenges to the concept of provenance: not only are
digital objects often the result of an aggregation of several different pieces, but it is
also extremely easy to mix and re-use them, to a point where it may be very difficult to
trace their provenance. Cloud Computing has complicated the picture further, due
to the limited control that can be exercised over Cloud service providers and
their procedures. As a result, the archival functions are compromised, since objects
get their meaning from their context, and provenance plays a major role in
identifying and determining such context: whenever provenance is flawed, so is
context, hence the overall meaning of an object. Moreover, any lack of control over
provenance determines uncertainty, which in turn affects trust in digital objects,
thus hindering the implementation of the top level of the Semantic Web stack
designed by Tim Berners-Lee.
However, new technologies also provide means to cope with such complexity.
Resource Description Framework (RDF) and ontologies can be used to represent
provenance through new standards and models in a granular and articulated way
that was not conceivable before the advent of computers. Provenance is slowly
taking the form of a network of triples, that is, a complex set of interrelated
statements that is apparently distant from the original Principle of Provenance, yet
archives in liquid times
25 "To facilitate operation at Internet scale, RDF is an open-world framework that allows anyone to say
anything about anything. In general, it is not assumed that all information about any topic is available.
A consequence of this is that RDF cannot prevent anyone from making nonsensical or inconsistent
assertions, and applications that build upon RDF must find ways to deal with conflicting sources of
information." World Wide Web Consortium, Resource Description Framework (RDF): Concepts and Abstract
Data Model, W3C Working Draft 29 August 2002, eds. Graham Klyne and Jeremy Carroll, accessed October
6, 2017, https://www.w3.org/TR/2002/WD-rdf-concepts-20020829/#xtocid48014.
giovanni michetti provenance in the archives: the challenge of the digital
26 The Open World Assumption codifies the informal notion that in general no single agent or observer has
complete knowledge. Not surprisingly, the Semantic Web makes the Open World Assumption.