preserving, and as such the organisation needs a set of policies to determine which validation errors can be ignored and which must be corrected prior to storage. At the end of the validation process, the metadata that accompanies the digital thing will be updated to carry details of the validation, the policy choices made and the results of the checking. Positive and negative metadata may be included; once again, this is a choice of policy.

Characterisation involves taking information about the thing and the nature of its content to build a richer context beyond the purely technical, structural details. Many formats hold a wealth of information concerning the content, even before looking at the content itself. For example, a Word document usually records the author, the time of creation, the editing time, the word count and more; sometimes there is a specific summary of the document as well. A JPEG image file may carry information about the camera used to take it, the date and time, and often even geolocation information giving its precise location.

However, the librarian or archivist needs to make a series of decisions about such data. The instinctive response might be to extract everything, going to the limit: in a JPEG, for example, finding the colour profile or raw image sensor information. Such information, it could be argued, is invaluable for checking quality should the file format require migration. The right level of detail depends on the intention and the processes in place in the organisation. Once sufficient information has been taken to ensure that the original file can be adequately indexed and examined in future, any further descriptive data required could be extracted at a later date. However, the presence of such data in the file, and the policy for which data is extracted and duplicated outside the file, is itself a key piece of metadata to be stored alongside the digital thing.

Package and cross-check is a process not yet explicitly covered by OPF tools, but usually carried out through workflows, processes and report formatting in each organisation. This is where the metadata gathered is prepared for taking forward to storage in the database: in effect, one of the final stages of forming and wrapping the thing to be stored into an appropriate archival information package. There is a range of metadata formats and standards, such as PREMIS [9] and METS [10], that prescribe a structure for metadata. Librarians and archivists need to determine the most suitable format for their purposes and which of its elements to populate. Having chosen the format and the required elements, the metadata gathered so far must be placed into that format in readiness to archive the digital object.

However, there is one other aspect to consider here. There will be occasions where the found metadata conflicts with the provided information: for example, the stated author may differ from the author's name as discovered through characterisation of the digital thing. Cross-checking prevents future ambiguity, but such a cross-check implies yet further decisions: which data is to be trusted? Or should both be stored? Once again, the librarian or archivist has to make a choice depending on their purposes. Without choosing the right level of metadata to store, the danger is that the volume of metadata can easily exceed that of the original item.
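To make the characterisation step concrete, the following is a minimal sketch in Python using the Pillow imaging library. The set of tags kept (POLICY_TAGS) is a hypothetical policy choice introduced for illustration, not part of any OPF tool; it shows extracting only a policy-selected subset of a JPEG's embedded metadata rather than everything.

```python
# A minimal sketch of embedded-metadata extraction from a JPEG using
# Pillow; which tags to keep is the archivist's policy decision.
from PIL import Image
from PIL.ExifTags import TAGS

# Hypothetical policy: keep a small descriptive subset; everything else
# stays inside the file itself, and that choice is itself recorded.
POLICY_TAGS = {"Make", "Model", "DateTime", "Artist"}

def characterise_jpeg(path: str) -> dict:
    """Return the policy-selected subset of a JPEG's EXIF metadata."""
    extracted = {}
    with Image.open(path) as img:
        exif = img.getexif()
        for tag_id, value in exif.items():
            name = TAGS.get(tag_id, str(tag_id))
            if name in POLICY_TAGS:
                extracted[name] = value
    return extracted

# Example call (the file name is hypothetical):
# print(characterise_jpeg("accession-0042.jpg"))
```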
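The cross-check itself can be as simple as a field-by-field comparison that records disagreements rather than silently resolving them, leaving the trust decision explicit. The sketch below is illustrative only; the field names and the keep-both-and-flag policy are assumptions, not a prescribed OPF workflow.

```python
# A sketch of the cross-check step: provided (catalogue) metadata is
# compared with metadata discovered during characterisation, and
# conflicts are recorded rather than silently resolved.
def cross_check(provided: dict, discovered: dict) -> dict:
    """Record agreements and conflicts between two metadata sources."""
    report = {}
    for field in provided.keys() | discovered.keys():
        p, d = provided.get(field), discovered.get(field)
        if p == d:
            report[field] = {"value": p, "status": "agrees"}
        elif p is None or d is None:
            report[field] = {"provided": p, "discovered": d,
                             "status": "single-source"}
        else:
            # Policy choice assumed here: keep both values, flag for review.
            report[field] = {"provided": p, "discovered": d,
                             "status": "conflict"}
    return report

# Example: the stated author differs from the author found in the file.
print(cross_check({"author": "J. Smith"}, {"author": "Jane Smith"}))
```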
Items are usually selected based on a clear preservation need, and a fine balance is needed between minimalist data and keeping everything.

OPF Reference toolset and the digital preservation process map

Practical tool usage, scalability and quality control

OPF monitors the number of downloads of the tools that it supports, and is in discussion with the users to design and implement a more sophisticated monitoring framework. Naturally, much of the work in which the tools are used is sensitive, and OPF is mindful of preserving that confidentiality. However, building a set of usage data will enable OPF and tool users to build statistics on the types of formats processed and the format errors found. This will help eliminate systematic errors in generated documents and will quantify the importance of various formats amongst the tool users. The monitoring framework will also enable OPF to respond to any issues found with the toolset itself. Today OPF runs a comprehensive testing process for each release of a tool, and holds a set of test data for each format to ensure reliability and preserve backwards compatibility wherever possible or practical.

Tool Name        Part of the process            File types handled
Fido             Identification                 Based on list of signatures in PRONOM
Format Sniff     Identification                 Based on list of signatures in PRONOM
veraPDF          Validation, Characterisation   PDF/A
jpylyzer         Validation, Characterisation   JPEG 2000
JHOVE            Validation, Characterisation   PDF, JPEG, TIFF, WAV, PNG, WARC, AIFF, XML, HTML, GZIP, ASCII text, UTF-8 text, MP3, GIF, JPEG 2000
xcorrsound       Quality check                  Sound files
DPF Manager      Validation, Characterisation   TIFF
PDF Test Suite   Test corpus                    PDF/A

Table 1. OPF Tools and how they map
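Table 1 can also be read as a dispatch table: identification (for example by Fido, which reports a PRONOM PUID) determines which validator a file is routed to. The Python sketch below shows one way such wiring might look; the specific PUID values, report handling and the assumption that the verapdf and jpylyzer command-line tools are installed and on the PATH are all illustrative, not a prescribed OPF configuration.

```python
# A sketch of routing files from identification to validation,
# following the tool mapping in Table 1. PUIDs and commands are
# illustrative assumptions.
import subprocess

# PRONOM PUIDs (as reported by an identification tool such as Fido)
# mapped to the validator handling that format, per Table 1.
PUID_TO_VALIDATOR = {
    "fmt/95": ["verapdf"],      # PDF/A-1a  -> veraPDF
    "x-fmt/392": ["jpylyzer"],  # JPEG 2000 -> jpylyzer
}

def validate(path: str, puid: str) -> str:
    """Run the validator mapped to the file's PUID; return its report."""
    command = PUID_TO_VALIDATOR.get(puid)
    if command is None:
        # Policy decision point: unmapped formats are flagged for review.
        return f"no validator configured for {puid}"
    # Both tools emit an XML report on stdout; capture it so it can be
    # carried forward into the archival information package.
    result = subprocess.run(command + [path], capture_output=True, text=True)
    return result.stdout
```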

[9] PREMIS Data Dictionary: a comprehensive, practical resource for implementing preservation metadata in digital archiving systems, with an accompanying report (providing context, a data model and assumptions), special topics, a glossary, usage examples, and a set of XML schemas developed to support use of the Data Dictionary.
[10] Metadata Encoding and Transmission Standard (METS): a metadata standard for encoding descriptive, administrative, and structural metadata regarding objects within a digital library, expressed using XML schema. The standard is maintained as part of the MARC standards of the Library of Congress, and is being developed as an initiative of the Digital Library Federation (DLF).