many years ago you would need to get each object out and examine it to re-create
that metadata. Happily, with a box of books it is easy to see the physical object,
read its title and even glance inside to see what it is about and when it was
published.
Once that single box of books becomes a vast warehouse full of boxes of books, the
task becomes daunting in scale, but the principle still holds, even though it might
take an army of people to read all of the books in the boxes.
Imagine, however, that your books have no covers or bindings. The box now
contains a set of paper sheets. If the sheets remain in sequence it is still possible to
recreate the complete book. And whilst that might turn into a massive jigsaw puzzle
(finding the right sheets in the right order for each book), there may be clues such
as page numbers, chapter numbers and the like to help. It is a lengthy and tedious
process, but still achievable.
Catalogues, metadata and digital objects (blobs)
In the digital world the problem is similar, but worse. Instead of books, imagine we
have a virtual representation of a book: a digital object made up of a collection of
bits and bytes.1
This digital object may be made up of many smaller pieces (much like pages in a
book, or pictures in an album). A generic storage object that can take a variety of
formats is known as a blob.
If we want to store that blob we might, as with the books, put it into a box and place
it in the attic. We now have a blob in a box. Indeed, we may have a whole bunch of
blobs in a box, left in the attic.
Books have an established form. Book covers and bindings have a recognised physical
shape and can usually be clearly identified. Computer storage technology is still
relatively immature, and its rate of change shows no sign of slowing down. Simple
word-processed files from 20 years ago are often incompatible with modern
programs, even ignoring the changes in the physical media upon which they are
stored.
So now we have a box of blobs from the attic and we need to look at some data.
Unlike before we can't just open the box and read the books. When we look into the
box of blobs, we aren't even able to identify any books, covers or pages, let alone the
order they should be in. And, what is worse, the 'language' is one we don't
recognise.
To find meaning in this data you would need to be a digital archaeologist and to have
a modern version of the Rosetta Stone.2 Even then, this is a task much less attractive
than piecing together the pages of a physical book.
Metadata is key to using digital data on a computer. The structure of files,
directories and naming hides a vast amount of data that we take for granted. So
much so that modern personal computers often don't even show the file
extension (.PDF, .DOCX, etc.) that is critical to how the machine interprets the data.
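To make the point about file extensions concrete, the following is a minimal sketch, in Python, of identifying a blob from its leading bytes rather than from its name. The signature table and the function name identify_by_signature are illustrative assumptions, not part of any particular preservation tool, and the table covers only a handful of well-known formats.

```python
# Leading byte signatures ("magic numbers") for a few common formats.
# Illustrative only: real format registries hold far more entries.
SIGNATURES = {
    b"%PDF-": "PDF document",
    b"PK\x03\x04": "ZIP container (also used by DOCX, XLSX and EPUB)",
    b"\x89PNG\r\n\x1a\n": "PNG image",
    b"\xff\xd8\xff": "JPEG image",
}


def identify_by_signature(data: bytes) -> str:
    """Return a format name based on the first bytes of the blob."""
    for magic, name in SIGNATURES.items():
        if data.startswith(magic):
            return name
    return "unknown"


# A blob whose name claims one format while its content says another.
misnamed = {"name": "report.docx", "data": b"%PDF-1.7\n% example content"}
print(misnamed["name"], "->", identify_by_signature(misnamed["data"]))
```

Here the name suggests a word-processor file, but the bytes themselves identify a PDF. When the extension is lost or wrong, the content is the only evidence left, and a blob with no recognisable signature simply comes back as "unknown".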
In this digital world, how does an archivist or librarian determine what is important
to store about a digital object?
What is it that may be necessary to provide the information to interpret the
collection of bits and bytes that we store away?
Given the rate of change that we have seen over past decades, assumptions built into
today's technology may not be valid, or even rational, in the future. How, then, do
we make sure that we record the relevant metadata so that nobody needs to
guess?
Data becomes information only when it has the relevant context. The ultimate
context is the entire world at a given date, so in this respect the art of digital
preservation is to determine the appropriate and relevant context for a digital object
so that it can be interpreted as information at some future date yet to be determined.
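As a rough illustration of recording context alongside a blob, the sketch below builds a small metadata record in Python. The function name describe_blob and every field name are hypothetical choices made for this example, not a preservation standard. Which fields are actually needed is precisely the question raised above.

```python
import hashlib
import json
from datetime import datetime, timezone


def describe_blob(data: bytes, original_name: str,
                  declared_format: str, created_with: str) -> dict:
    """Build a minimal, hypothetical metadata record for a blob."""
    return {
        "original_name": original_name,      # what the object was called
        "declared_format": declared_format,  # how its bytes should be read
        "created_with": created_with,        # software and system context
        "size_bytes": len(data),             # detects truncation
        "sha256": hashlib.sha256(data).hexdigest(),  # detects alteration
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }


record = describe_blob(
    b"Dear Sir, ...",
    original_name="letter.doc",
    declared_format="Microsoft Word 97 binary document",
    created_with="Word 97 on Windows 98",
)
print(json.dumps(record, indent=2))
```

The checksum and size let a future reader confirm that the bytes are unchanged. The remaining fields record assumptions that seem obvious today but may not be in a few decades.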
The Open Preservation Foundation
The EU sponsored a four-year (2006-2010) project called PLANETS3 (Preservation
and Long-term Access through Networked Services) that aimed to build practical
services and tools to help ensure long-term access to digital cultural and scientific
assets. Following that project the Open Preservation Foundation (OPF) was set up
to provide a long-term sustainable home for such open source digital preservation
tools.
The PLANETS consortium, coordinated by the British Library, brought together
expertise across Europe from national libraries and archives, leading research
universities and technology companies. Most of the members of that consortium
remain members of the OPF today. Over the past eight years OPF membership has
grown, as has the size of the organisation.
OPF continues to enable knowledge sharing and best practice work in addition to
adopting and sustaining digital preservation tools that are used by the international
community. OPF relies on its members for guidance and practical feedback on the
tools and resources under its stewardship and welcomes new members on a regular
basis.
Today OPF manages a reference set of tools for digital preservation and, to put them
into context, has developed a generic preservation process map.
1 Computer data storage and processing architecture is broken down into bits (each bit being a one or a zero),
gathered together into bytes (by recent convention 8 bits make a byte, but historically this has varied: the
DEC PDP-8, introduced in 1965, had a 12-bit architecture), gathered together into words (several bytes
handled in one operation).
martin wrigley, becky mcguinness, carl wilson the open preservation foundation
reference toolset
2 The Rosetta Stone, found in 1799 and dating from 196 BC, carries three versions of the same decree issued
by King Ptolemy V. Because one version is in Egyptian hieroglyphics and another in Ancient Greek, it allowed
other hieroglyphic texts to be interpreted.
3 https://planets-project.eu/ last accessed 3/10/18