between source and user. However, search is not the only strategy for exploring
archives: users may also discover archival materials by moving around in the digital
environment. Search is great if we know what we are looking for, but discovery
reveals what we did not know existed and generates new relationships. Linked Data
support discovery thanks to their intrinsic nature: the underlying graph is not only a
data architecture, but also a network of nodes that can be used as a path to freely
explore the vast expanse of online resources.
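The idea of discovery as movement along the graph can be made concrete with a minimal sketch in Python. It assumes a toy set of triples: every identifier and predicate name (`fonds:A`, `createdBy`, and so on) is invented for illustration and belongs to no real vocabulary or dataset.

```python
from collections import deque

# Hypothetical triples (subject, predicate, object); all names are invented.
TRIPLES = [
    ("fonds:A", "createdBy", "agent:Rossi"),
    ("agent:Rossi", "memberOf", "org:Ministry"),
    ("fonds:B", "createdBy", "org:Ministry"),
    ("fonds:B", "documents", "function:Taxation"),
]

def neighbours(node):
    """Yield (predicate, other_node) pairs, following links in both directions."""
    for s, p, o in TRIPLES:
        if s == node:
            yield p, o
        elif o == node:
            yield p, s

def discover(start, max_hops):
    """Breadth-first walk: map each reachable node to its distance from start."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for _, other in neighbours(node):
            if other not in seen:
                seen[other] = seen[node] + 1
                queue.append(other)
    return seen

print(discover("fonds:A", 3))
```

Starting from one fonds, the walk reaches a second fonds that shares no explicit link with the first: the connection emerges only through the agent and the organization in between, which is the kind of relationship a keyword search would not surface.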
Linked Data: risks
Unfortunately, the granularity of Linked Data runs counter to current descriptive
practices, characterized by the abundant use of free text in archival descriptions, a
condition that severely limits the possibilities for interoperability and perpetuates
the isolation of archival data, preventing integration with other types of data. This is
an inherent limitation of the most prevalent forms of archival representation
(inventories and guides in particular), which makes the adoption (or rather, the
exploitation) of the RDF model difficult. As a matter of fact, all archival descriptive
models, including EAD, favor the narrative character of the finding aid. As noted
many years ago by Elizabeth Yakel, "the concentration on the finding aid as
document rather than as one of many potential representations of discrete data
elements has led to problems of reusing archival data across the archival
continuum and problems in the development of true collection management
systems for archives." (Yakel, 2003, p. 18).
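The contrast between the finding aid as a document and the same information as discrete data elements can be illustrated with a small Python sketch. The narrative text, the identifiers and the predicate names below are all invented for the purpose of the example; they are assumptions, not drawn from EAD or from any actual finding aid.

```python
# A narrative description, as typically found in a finding aid (invented text).
FREE_TEXT = (
    "The fonds was created by the Ministry of Finance between 1920 and 1935 "
    "and later transferred to the State Archive."
)

# The same information as discrete statements (subject, predicate, object).
# Identifiers and predicate names are hypothetical, not a real vocabulary.
STATEMENTS = [
    ("fonds:42", "createdBy", "org:MinistryOfFinance"),
    ("fonds:42", "startYear", 1920),
    ("fonds:42", "endYear", 1935),
    ("fonds:42", "heldBy", "org:StateArchive"),
]

def objects_of(subject, predicate):
    """Return every object asserted for the given subject and predicate."""
    return [o for s, p, o in STATEMENTS if s == subject and p == predicate]

# Structured data answers a precise question directly...
print(objects_of("fonds:42", "createdBy"))

# ...whereas free text only supports string matching, which cannot tell
# the creator from the current holder.
print("State Archive" in FREE_TEXT)
```

The free text is perfectly readable by a person, but a machine can reuse it only after someone has extracted the individual statements, which is the step that narrative-oriented descriptive practice leaves undone.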
Trying to move a step further, the International Council on Archives initiated years
ago a process of revision of its standards for archival description. This initiative has
led to the publication of a new conceptual model in September 2016, clearly and
explicitly driven by the RDF data architecture (ICA, 2016). Therefore, this model
takes into account the technological developments of recent years, and builds on the
idea of the graph as the ideal architecture for conveying information on context:
"Modelling description as a graph accommodates the single, fonds-based, multilevel
description modelled in ISAD(G), but also enables addressing the more expansive
understanding of provenance described above." (ICA, 2016). ICA intends to move
archival description from a multi-level to a multi-dimensional approach: "The
multidimensional model sees the fonds existing in a broader context, in relation
to other fonds. In a multidimensional approach to description, the Records and Sets
of Records, their interrelations with one another, their interrelations with Agents,
Functions, Activities, Mandates, etc., and each of these with one another, are
represented as a network within which individual fonds are situated." (ICA, 2016)
This initiative has adopted the key words in current information architecture: graph,
multi-dimensionality, networks of interrelations. However, the document raised
significant objections within the archival community on several fronts.24 In
particular, InterPARES Trust, a large community of hundreds of
researchers from all over the world, laid down a set of critical comments about the
fairness and transparency of the process, the methodology adopted for developing
the model, and the model itself. The concluding statements of the document
submitted by InterPARES Trust are quite explicit:
In short, we find that RiC-CM is weak as a model, in that it neither defines
the structures it uses (entity, property, relation) nor provides a rationale for
their use. A conceptual model should identify and define the fundamental
bricks used to build the model. Ultimately, the document fails to
adequately address a model for discovery of archival resources, a model that
accommodates multiple users and uses. EGAD and ICA should re-start
the development process on a new, transparent and fair basis
(InterPARES Trust, 2016)
It will be interesting to see whether and how these concerns will be addressed in the
future and, if so, where this will lead the concept of provenance. As noted
before, in the past twenty years the International Council on Archives has changed
its approach to provenance a few times, interpreting it first as an agent, then as a
single relationship, later as a set of relationships, and now as a multi-dimensional
concept. Therefore, there is some reason to believe this is neither the perfect solution
nor the final step.
Another issue to consider when dealing with Linked Data is expressed outright by
Hay Kranen in his blog: "Linked data is all nice and dandy, but if your SPARQL
endpoint is only up 50% of the time and it takes a minute to do a query, how do you
suppose a developer builds a stable app on top of it?" (Kranen, 2014). The post dates
back to 2014, but it still holds true: keeping an endpoint up can be challenging. In a
comment to the same post, Marcus Smith noted: "It's almost become an in-joke that
six simultaneous users of a SPARQL endpoint constitutes a DDOS attack." In
fairness, it should be recognized that endpoints and triple-store technologies are
young, so it is likely that the situation will improve in the course of time.
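From a developer's point of view, one common way of coping with an unreliable endpoint is to set a timeout and retry with increasing pauses. The following Python sketch shows that pattern under stated assumptions: the function names are hypothetical, the endpoint is assumed to accept a query over HTTP GET through a `query` parameter and to return JSON results, and the retry policy is only one of many possible choices.

```python
import time
import urllib.parse
import urllib.request

def fetch_sparql(endpoint, query, timeout):
    """Send a SPARQL query over HTTP GET and return the raw JSON response body."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    request = urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")

def query_with_retry(endpoint, query, attempts=3, timeout=10.0,
                     pause=1.0, fetch=fetch_sparql, sleep=time.sleep):
    """Try the query up to `attempts` times, doubling the pause after each failure."""
    last_error = None
    for attempt in range(attempts):
        try:
            return fetch(endpoint, query, timeout)
        except OSError as error:  # URLError and socket timeouts are OSError subclasses
            last_error = error
            if attempt < attempts - 1:
                sleep(pause * 2 ** attempt)
    raise RuntimeError(f"endpoint unavailable after {attempts} attempts") from last_error
```

A call such as `query_with_retry("http://example.org/sparql", "ASK {}")` would use a placeholder address, not a real service. Retrying softens occasional failures, but it does not help with an endpoint that is down half of the time, which is precisely Kranen's point.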
Semantic Web technologies are also rather difficult to implement and demand
advanced skills. However, this too is a problem of technologies that are not yet fully
mature: it will probably take some time before both technologies and skills become
less esoteric.
Above all, the fundamental problem of Linked Data lies in their very structure. The
critical problem is the graph. As Bowker and Star note, "[e]ach standard and each
category valorizes some point of view and silences another. This is not inherently a
bad thing - indeed it is inescapable. But it is an ethical choice, and as such it is
dangerous - not bad, but dangerous." (Bowker & Star, 2000, pp. 5-6). We need to
archives in liquid times
24 Some critical comments have been posted to both the ICA mailing list devoted to this initiative
(ICA-EGAD-RiC Mailing List, http://lists.village.virginia.edu/mailman/options/ica-egad-ric) and the ICA
mailing list (ICA Mailing List, http://www.ica.org/en/ica-list-serv). Chris Hurley has published on his blog
a dense critique of RiC, opening his post with a short yet effective consideration: "RiC is a conceptual
model in search of a concept." See Chris Hurley, "RiC at Riga," Chris Hurley's Stuff, August 2017,
http://www.descriptionguy.com/images/WEBSITE/ric_at_riga.pdf. William Maher, in his role of Chair of the ICA
Section on University and Research Institution Archives, has raised some reasonable and thoughtful
giovanni michetti provenance in the archives: the challenge of the digital
doubts about RiC in relation to archivists' missions and mandates. See William J. Maher, ICA-SUV 2017
Conference Summary, accessed October 6, 2017,
https://icasuvblog.wordpress.com/2017/09/13/ica-suv-2017-conference-summary/. RiC describes as many as
seventy-three "potential record-to-record relations".
Instead of "seeking an exhaustive list of every relation that might exist between two records," Ross Spencer
has taken a different approach and has outlined in his essay eight relations only. See Ross Spencer, "Binary
trees? Automatically identifying the links between born-digital records," Archives and Manuscripts 45 (2017):
77-99.