Digital Preservation Process Map
Once a librarian or archivist has identified that a 'thing' should be preserved, they
need to catalogue and classify the item. This is as true of a digital 'thing' as of a
physical thing. One key question is how much context around the thing needs to be
recorded so that the information can be interpreted at a later date.
A great advantage with a digital thing is that much of the structural and technical
context can be collected programmatically. At the time of wanting to put the thing
into a box, we have the computer context to hand.
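As an illustration of collecting technical context programmatically, the following minimal Python sketch reads a file's size, modification time and a SHA-256 checksum. The function name `collect_context` and the choice of fields are assumptions made for this example, not part of any particular toolset.

```python
import hashlib
import os
import tempfile
from datetime import datetime, timezone


def collect_context(path):
    """Gather basic technical context for one file without modifying it."""
    info = os.stat(path)
    digest = hashlib.sha256()
    with open(path, "rb") as handle:  # read-only, binary mode
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return {
        "filename": os.path.basename(path),
        "size_bytes": info.st_size,
        "modified_utc": datetime.fromtimestamp(
            info.st_mtime, tz=timezone.utc
        ).isoformat(),
        "sha256": digest.hexdigest(),
    }


if __name__ == "__main__":
    # Demonstrate on a throwaway file so the sketch is self-contained.
    with tempfile.TemporaryDirectory() as folder:
        sample = os.path.join(folder, "letter.txt")
        with open(sample, "wb") as handle:
            handle.write(b"hello")
        print(collect_context(sample))
```

The file is only ever opened for reading, so gathering this context leaves the thing itself untouched.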
The process has several distinct stages. Principal among these are unpacking, identifying,
validating and characterising the thing and the data in that thing.
One golden rule is that the digital thing to be preserved should not be changed or
amended; any change should be made only to a deliberately created copy of the original.
Policies for translation, migration or application of fixes or redactions are highly
dependent on the purpose and goal of the preserving organisation. Because the original
is left untouched, each thing has its associated metadata stored alongside it. This
second item - a metadata thing - must then be stored away in the box too. The structure
and standards [4] for such a metadata thing are beyond the scope of this article.
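A minimal Python sketch of this idea follows: the metadata is written to a separate file next to the original, and a checksum is compared before and after to confirm the original was not altered. The `.metadata.json` suffix, the JSON layout and the function names are hypothetical choices for illustration; real metadata things would follow one of the established standards.

```python
import hashlib
import json
import os
import tempfile


def sha256_of(path):
    """Return the SHA-256 hex digest of a file's bytes."""
    with open(path, "rb") as handle:
        return hashlib.sha256(handle.read()).hexdigest()


def write_sidecar(path, metadata):
    """Write metadata next to `path` without touching the original."""
    before = sha256_of(path)
    sidecar_path = path + ".metadata.json"  # hypothetical naming convention
    record = dict(metadata, original=os.path.basename(path), sha256=before)
    with open(sidecar_path, "w", encoding="utf-8") as handle:
        json.dump(record, handle, indent=2, sort_keys=True)
    # Fixity check: the original must be byte-identical afterwards.
    if sha256_of(path) != before:
        raise RuntimeError("original changed while writing metadata")
    return sidecar_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        thing = os.path.join(folder, "report.doc")
        with open(thing, "wb") as handle:
            handle.write(b"original bytes")
        print(write_sidecar(thing, {"note": "received on deposit"}))
```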
The usual high-level model in use in digital preservation is the OAIS [5], and the process
map described here is part of the ingest/pre-ingest function within that model. It
accepts a Submission Information Package (SIP) and processes it for storage as an
Archival Information Package (AIP), to be later retrieved as a Dissemination
Information Package (DIP).
The OAIS model has been implemented in many different ways in various
organisations, and the standards continue to develop. The European E-ARK project is
part of that development and standardisation process, covering a wider scope of the
model than is described in this article.
The existing landscape of tools, processes and policies has grown out of practice
and research in a number of major organisations across the world. Typically, however,
this has been an under-resourced area. To help it develop, a number of European
Union funded projects have delivered aspects of it, and individual tools and elements
have been contributed by individuals and community-minded organisations.
The Open Preservation Foundation was born from one such European-funded
project, PLANETS, and has participated in several more along the way. It now
exists to support a set of open source tools, freely available and sustainable in the
long term, that cover a section of the ingest/pre-ingest function of the OAIS
model.
[Figure 1 stages: Identify; Validate; Characterise; Package, Quality Assurance, Review, Cross Check; Put into a Box (turn into an AIP)]
Figure 1 Digital Preservation process map for OPF Reference Toolset
Before we can even start looking at a thing, there is a stage in the process that solves
any issues with physical media, for example extraction from a floppy disc, Jaz
drive, micro tape or any other storage medium. There are many challenges in
that, but they are not addressed in this article, nor by the work of OPF. We start this
process at the point of having a digital thing that we can programmatically examine.
One important point is that the subset of the OAIS process
identified by OPF is applicable to virtually every organisation, and doesn't vary
much with the purpose or specific needs of a digital preservation
organisation. Everyone needs to do these common things, although the details,
order of processing, policies and choices may well differ.
The typical first step is to identify what the thing is. It may be a single file or a
complex structure (such as a ZIP file) that contains other discrete structured files
and objects. Personal computers today perform a very simple identification of
a file format, using the filename extension. The reader may like to try the
experiment of renaming a file with the filename extension of a different file format
(e.g. renaming a .docx file to .jpg) and seeing that the system is largely unable to
cope with such a change (don't forget to rename it back!).
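To illustrate identification by content rather than by filename extension, here is a minimal Python sketch that checks a file's leading bytes against a few well-known signatures and, when it finds a ZIP container, looks inside it recursively. The four signatures, the function names `identify` and `explode`, and the depth limit are assumptions chosen for this example; production identification tools rely on far larger signature registries.

```python
import io
import zipfile

# A few well-known leading byte signatures ("magic numbers").
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"PK\x03\x04", "ZIP container"),  # also the start of .docx files
]


def identify(data):
    """Guess a format from the leading bytes, ignoring the filename."""
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    return "unknown"


def explode(name, data, depth=0, max_depth=5):
    """Yield (depth, name, format) for an item and, recursively, ZIP members."""
    label = identify(data)
    yield depth, name, label
    if label == "ZIP container" and depth < max_depth:
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            for member in archive.infolist():
                if not member.is_dir():
                    yield from explode(
                        member.filename, archive.read(member), depth + 1, max_depth
                    )


def build_sample_zip():
    """Create a small in-memory ZIP so the sketch needs no external files."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("report.pdf", b"%PDF-1.4 sample")
        archive.writestr("notes.txt", b"plain text")
    return buffer.getvalue()


if __name__ == "__main__":
    # Named .jpg on purpose: the content, not the extension, decides.
    for depth, name, label in explode("holiday.jpg", build_sample_zip()):
        print("  " * depth + f"{name}: {label}")
```

Because the check reads the bytes themselves, a container misnamed with a .jpg extension is still reported as a ZIP container, and its members are identified in turn.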
4 Metadata standards include PREMIS, METS, MARC, Dublin Core and more
5 OAIS (Open Archival Information System) is an ISO standard reference model with six functional entities, described in ISO 14721:2012
martin wrigley, becky mcguinness, carl wilson the open preservation foundation
reference toolset
[Figure: detailed process map. Labels recoverable from the diagram: Thing (T), Meta Thing (M), T+; Container explosion (recursive); Validation policies; Characterisation policies; Packaging policies; Quality cross check policies; Fix/transform* (redact...); Fix/transform derivative (migrate...); *Quality check; Periodic re-check]
Thing is (or is becoming) a Submission Information Package