which may lead to an ever-expanding set of statements taking the form of a graph. Therefore, it is not surprising that Linked Data are disseminated on the Web more and more, and both archivists and records managers are slowly following this trend, creating and distributing information in the form of Linked Data, thus changing system designs and descriptive practices. However, the archival community has not yet developed an ontology modelling and representing provenance, whereas the data science community has already created its own ontology for representing entities and relationships with respect to the origin and provenance: the PROV ontology defines provenance as "information about entities, activities, and people involved in producing a piece of data or thing, which can be used to form assessments about its quality, reliability or trustworthiness." (World Wide Web Consortium, 2013). The basic model of the PROV ontology is simple and recursive, allowing for great complexity and expressiveness. Its core concepts are Entities, Agents, and Activities (Figure 1).19 Call them Objects, Agents and Functions, and the picture in Figure 1 will make perfect sense in the archival domain.20 The PROV ontology focuses on lineage, that is, on data's origins and history, in order to provide a tool that may help tracing the objects back to their creation, which is a critical issue for the verification of the process used to generate the data. Provenance, particularly in the archival domain, is a broader concept that serves to identify and possibly document any input, entity, system and process that has affected data.21 Therefore, while the PROV ontology may be a good starting point, there remains much space for discussion on how to best model archival provenance. What is quite clear though is that we are very likely to adopt the Linked Data paradigm. Therefore, it is worth highlighting some pros and cons of Linked Data, in order to understand the effects on provenance. Linked Data: benefits The benefits of Linked Data are quite evident: due to their characteristics (i.e., the compliance with the requirements established by Tim Berners-Lee) they are inherently shareble, extensible and re-usable.22 Resources - be they physical or conceptual - are identified by language-agnostic URIs and connected through properties that can be identified by a URI as well. This leads to the creation of a gigantic network: this mechanism allows anyone not only to add further descriptions (i.e., statements) that increase the size and the density of the original network of relationships; but also, to establish new connections among different, separated networks. Once a connection is made, the two networks become a whole, that is, an enlarged and enriched network of relationships, which eventually leads to a Giant Global Graph, as Tim Berners-Lee called it (Berners-Lee, 2007). In addition, semantic enrichment - that is, the process of adding layers of metadata to content - allows computers to process data and make sense of it, so that they can find, filter, and connect information. Linked Data in the archival domain allow for the creation of new pathways not only for exploring archival descriptions, but also for accessing the resources themselves. In fact, on the one hand Linked Data makes it possible to explore the complex web of relationships that make up and define the context in which objects are placed, and that has a fundamental value in Archival Science. In this sense, we could say that Linked Data increase the possibilities for exploring the metalevel (that is, the descriptive dimension). On the other hand, the fragmentation of information operated by Linked Data creates new and numerous points of direct access to resources. In theory, Linked Data are an unlimited source of access points.23 This is perfectly in line with the non-specialist research practices suggested by search engines: more and more users get to the archival sources through Google and other search engines, with an approach that favors the process of disintermediation archives in liquid times wasDerivedFrom Entity wasAttributedTo Agent used actedOnBehalfOf wasAssociatedWith startedAtTime endedAtTime wasInformedBy xsd:dateTime xsd:dateTime Activity Figure 1. The basic classes and properties of the PROV ontology. 19 The PROV ontology defines the classes Entity, Activity and Agent, and link such classes through properties, as shown in Figure 1. An Entity is a "physical, digital, conceptual, or other kind of thing with some fixed aspects; [it] may be real or imaginary." An Activity is "something that occurs over a period of time and acts upon or with entities; it may include consuming, processing, transforming, modifying, relocating, using, or generating entities." An Agent is "something that bears some form of responsibility for an activity taking place, for the existence of an entity, or for another agent's activity." See World Wide Web Consortium, PROV-O: The PROV Ontology, W3C Recommendation 30 April 2013, eds. Timothy Lebo, Satya Sahoo and Deborah McGuinness, accessed October 6, 2017, https://www.w3.org/TR/2013/REC-prov-o-20130430/. 20 The diagram included in ISDF is not much different. See International Council on Archives, ISDF: International Standard for Describing Functions (Paris: ICA, 2007), 36. 238 giovanni michetti provenance in the archives: the challenge of the digital 21 Some authors distinguish between different types of Provenance, such as Why-Provenance, How- Provenance, Where-Provenance, and Workflow-Provenance. See for example James Cheney, Laura Chiticariu and Wang-Chiew Tan, "Provenance in Databases: Why, How, and Where," Foundations and Trends in Databases I (2009): 379-474; Peter Buneman, Sanjeev Khanna and Wang Chiew Tan, "Why and Where: A Characterization of Data Provenance," in Proceedings of the 8th International Conference on Database Theory, London, 2001, eds. Jan Van den Bussche and VictorVianu (London, UK: Springer, 2001), 316-330; Susan B. Davidson and Juliana Freire, "Provenance and Scientific Workflows: Challenges and Opportunities," in Proceedings of the SIGMOD International Conference on Management of Data, Vancouver, 2008, ed. Jason Tsong-Li Wang (New York, NY: ACM, 2008), 1345-13 50. According to this perspective, lineage is a type of Provenance. 22 This is especially true for Linked Open Data (LOD), that is, "Linked Data which is released under an open licence, which does not impede its reuse for free." Berners-Lee, Linked Data. 23 Even with an advanced descriptive model like EAD we must go through a tag like <controlaccess> to identify the access points. 239

Periodiekviewer Koninklijke Vereniging van Archivarissen

Jaarboeken Stichting Archiefpublicaties | 2017 | | pagina 121