Etymologically, the notion of the library stems from the Greek word for the storage
site of "book" rolls (bibliotheké). It thus addresses one format and one dimension:
storage in space rather than in time (as is the case for the magnetic "roll": spools and
tapes). The library belongs to what Marshall McLuhan called The Gutenberg Galaxy
(1962), the age of the book and the printing press, as opposed to the modern media
age. Since Thomas Edison's invention of the phonograph, culture has been able
to store audiovisual signals directly, as physically real traces (the indexical, in
terms of Peircean semiotics), bypassing translation and abstraction into the symbolic
code of the alphabet.
Towards the end of the twentieth century, a radical extension of the traditional,
book-oriented task of national libraries took place. The Institut national de
l'audiovisuel in France receives reference copies of every audiovisual medium
produced in France since 1995. In Norway, the legal deposit act has been extended
to include at least one copy of any information available to the public, regardless of
medium. This procedure is automated and governed by a computer programme.
Suddenly, the institution of the library is thrown into the modern technological
media age. Algorithmic machines have automated a series of complex operations,
such as "harvesting" the websites under the national domain name. Measured in running
time, the National Library of Norway, after less than twenty years, now holds a full 180
years of audiovisual playing time. For the aesthetics of knowledge, the consequence
is new methods of research such as the "Digital Humanities".
Search "within its own medium": Towards content-based audiovisual retrieval
Very often legal paranoia (such as the copyright mania) leads to progress in
technomathematical knowledge: mighty algorithmic tools have been developed for
fingerprinting in the service of copyright identification, and for locating metadata
for media content that lacks metadata annotation. One of the most advanced
mass-applicable content-based search engines for audio data is firmly implemented in
the iPod. While a song is playing, the device can be directed to the sonic source with
the menu option "Music is being analysed", leading to an almost immediate recognition
of the song and the option of a (paid) download of this very song.
The traditional way of retrieving audio or images was to annotate such media
manually with text and to build a text-based management system that performs the
retrieval. Such a literal transcription of audiovisual evidence into symbolic notation
is an asymmetrical transformation, reducing the richness of aesthetic signals to
verbal semantics. The alternative way (content-based retrieval systems according to
the MPEG-7 standard) is to retrieve audiovisual evidence within its own medium (that
is, the aisthetic regime): "based on such analysis, it is possible to describe sound or
music by its spectral energy distribution, harmonic ratio or fundamental frequency"
(Kim, Moreau, Sikora, 2005, p. 2), allowing for a comparative classification of
sound categories.
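What describing a sound "by its spectral energy distribution" can mean in practice may be illustrated with a minimal sketch. It is not the normative MPEG-7 extraction procedure: the function names (power_spectrum, describe), the choice of the spectral centroid as a summary of the energy distribution, and the use of the strongest spectral peak as a crude stand-in for the fundamental frequency are all assumptions made for illustration, and a synthetic sine tone replaces an archived recording.

```python
import cmath
import math

def power_spectrum(samples):
    """Naive DFT power spectrum for bins 0..N/2 (O(N^2), adequate for a sketch)."""
    n = len(samples)
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(samples))
        spectrum.append(abs(coeff) ** 2)
    return spectrum

def describe(samples, sample_rate):
    """Two illustrative descriptors: spectral centroid and strongest peak (in Hz)."""
    power = power_spectrum(samples)
    n = len(samples)
    freqs = [k * sample_rate / n for k in range(len(power))]
    total = sum(power)
    centroid = sum(f * p for f, p in zip(freqs, power)) / total if total else 0.0
    # Skip bin 0 (the constant offset) when looking for the strongest component.
    peak_bin = max(range(1, len(power)), key=power.__getitem__)
    return {"spectral_centroid_hz": centroid, "peak_frequency_hz": freqs[peak_bin]}

# Assumption: a synthetic 50 Hz sine sampled at 800 Hz stands in for a recording.
sample_rate = 800
tone = [math.sin(2 * math.pi * 50 * t / sample_rate) for t in range(160)]
descriptors = describe(tone, sample_rate)
print(descriptors)
```

The description is derived from the signal alone; no word of annotation enters the record.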
Automatic systems that process audiovisual information allow search engines to
query the descriptions stored in a database. Thus, it is possible to identify identical,
similar or dissimilar audiovisual content. As long as such low- or high-level
wolfgang ernst order by fluctuation? classical archives and their
audiovisual counterparts
descriptions are (automatically) extracted from the audiovisual record itself,
depicting the variation of properties of audiovisual signals over time or frequency,
it makes sense to call the resulting database an "archive" rather than a "library
catalogue".
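How a search engine might query such stored descriptions can be sketched as a simple ranking by distance. The records, their two features (a spectral centroid and a harmonic ratio) and the scaling constants are hypothetical values chosen for illustration; they are not taken from any real archive or from the MPEG-7 specification.

```python
import math

# Hypothetical descriptor records: (spectral centroid in Hz, harmonic ratio 0..1).
database = {
    "speech_interview": (450.0, 0.80),
    "string_quartet": (900.0, 0.95),
    "street_noise": (2500.0, 0.10),
}

def distance(a, b, scales=(1000.0, 1.0)):
    """Euclidean distance after dividing each feature by a rough scale (assumed)."""
    return math.sqrt(sum(((x - y) / s) ** 2 for x, y, s in zip(a, b, scales)))

def query(description, db):
    """Rank stored records from most similar to least similar."""
    return sorted(db, key=lambda name: distance(description, db[name]))

# The query is itself a description extracted from a signal, not a word.
ranking = query((500.0, 0.75), database)
print(ranking)
```

Identical, similar and dissimilar content then correspond to zero, small and large distances in the descriptor space.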
The time domain description by the waveform represents a genuine option of
multimedia archives, media-archaeologically revealing characteristics of the original
audiovisual signal in its very aesthetic existence: the harmonicity of a signal, its tone
or image quality, down to discrete segments such as the pixel itself.
Such an analytic iconic turn makes visual memory mathematically accessible;
search engines like QBIC allow for image-based image retrieval by similarity, or query
by image content. A technical dispositive gains power over the human imaginary,
in contrast to the classical, paper- and text-based archive as the realm of the symbolic.
By far the largest image collection, it goes without saying, is the World Wide Web. In
order to efficiently retrieve pictorial data from this database, content-based methods
are an attractive alternative to the traditionally used method of manual textual indexing
(Müller, Wallhoff, Eickeler, Rigoll, 1999, p. 12-1).
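Query by image content can be sketched with one of the simplest similarity measures, the intersection of grey-value histograms. This is an illustrative stand-in and not the method used by QBIC or by Müller et al.; the toy "images" are hypothetical flat lists of 8-bit grey values, and the function names are chosen for this example only.

```python
def gray_histogram(pixels, bins=4):
    """Normalised histogram of 8-bit grey values (0..255)."""
    counts = [0] * bins
    for value in pixels:
        counts[min(value * bins // 256, bins - 1)] += 1
    return [c / len(pixels) for c in counts]

def histogram_intersection(h1, h2):
    """1.0 for identical distributions, 0.0 for disjoint ones."""
    return sum(min(a, b) for a, b in zip(h1, h2))

# Hypothetical collection: each image is a flat list of grey values.
collection = {
    "dark_scene": [10, 20, 30, 40, 50, 60],
    "bright_scene": [200, 210, 220, 230, 240, 250],
    "mixed_scene": [10, 20, 30, 200, 210, 220],
}
query_image = [15, 25, 35, 45, 205, 215]

query_hist = gray_histogram(query_image)
scores = {name: histogram_intersection(query_hist, gray_histogram(px))
          for name, px in collection.items()}
best_match = max(scores, key=scores.get)
print(best_match, scores)
```

The query is an image, and the answer is found by comparing distributions of pixel values rather than textual index terms.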
Classification by autocorrelation
Speaking to the archive does not achieve a real dialogue with the dead; what we hear
is rather the echo of our own voice. Computing now allows voices to be separated from
other sound sources by automatic subtraction (folding); "silence detection" itself
(the silence of archival space, its absence of voices) is a feature of the current
MPEG-7 standard for multimedia, notably of its AudioPower descriptor.
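Silence detection based on signal power can be sketched as follows. The frame size, the threshold and the function names are assumptions made for illustration, and the computation is a plain mean of squared amplitudes rather than the normative definition of the AudioPower descriptor.

```python
import math

def frame_power(samples, frame_size):
    """Mean squared amplitude of each non-overlapping frame."""
    return [sum(x * x for x in samples[i:i + frame_size]) / frame_size
            for i in range(0, len(samples) - frame_size + 1, frame_size)]

def silent_frames(samples, frame_size, threshold):
    """Indices of frames whose power falls below the (assumed) threshold."""
    return [i for i, p in enumerate(frame_power(samples, frame_size))
            if p < threshold]

# Synthetic signal: a tone, then near-silence, then the tone again.
tone = [0.5 * math.sin(2 * math.pi * t / 20) for t in range(100)]
hush = [0.001] * 100
signal = tone + hush + tone

silences = silent_frames(signal, frame_size=50, threshold=1e-4)
print(silences)
```

Silence is here not an absence in the record but a measurable property of it: a run of frames whose power stays below a chosen level.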
The detection of voices is achieved by autocorrelation, i.e. the comparison of a signal
with itself when shifted along the time axis. The programming language SuperCollider
thereby allows the recognition of periodic signals, in other words of
phonetic language (where vowels and their formants represent harmonic signals, as
opposed to aperiodic consonants), and their separation from non-harmonic
acoustics. To classify the sonosphere surrounding us automatically is a feature of
this new classification aesthetics. For the video area, feature extraction (as defined
by the MPEG-7 standard) is already at work, but has not yet been implemented in
practice, both for epistemological reasons (the cultural lag of "archiving" practices)
and because of technological difficulties.
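The principle of autocorrelation, comparing a signal with a time-shifted copy of itself, can be sketched in a few lines. The example is written in Python rather than SuperCollider, and the lag range, the function names and the two synthetic test signals (a sine as a "vowel-like" periodic signal, uniform noise as an aperiodic one) are assumptions made for illustration.

```python
import math
import random

def autocorrelation(samples, lag):
    """Correlation of the signal with itself shifted by `lag`, relative to its energy."""
    energy = sum(x * x for x in samples)
    if energy == 0:
        return 0.0
    shifted = sum(samples[t] * samples[t + lag]
                  for t in range(len(samples) - lag))
    return shifted / energy

def periodicity(samples, min_lag, max_lag):
    """Return (best_lag, strength): the lag with the highest autocorrelation."""
    best = max(range(min_lag, max_lag + 1),
               key=lambda lag: autocorrelation(samples, lag))
    return best, autocorrelation(samples, best)

random.seed(0)  # fixed seed so that the noise example is reproducible
vowel_like = [math.sin(2 * math.pi * t / 40) for t in range(400)]  # period: 40 samples
noise_like = [random.uniform(-1.0, 1.0) for _ in range(400)]       # aperiodic

vowel_lag, vowel_strength = periodicity(vowel_like, 20, 100)
_, noise_strength = periodicity(noise_like, 20, 100)
print(vowel_lag, round(vowel_strength, 3), round(noise_strength, 3))
```

A periodic signal correlates strongly with itself at a shift of one period, whereas noise does not; a threshold on this strength is one simple way to separate harmonic from non-harmonic acoustics.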
Most current shot transition detection focuses on detecting simple boundaries:
cuts. In most software tools for temporal video segmentation, the time-evolving
media event is transformed by shot detection (key-frames) into a static,
storyboard-like spatial (rather than temporal) arrangement. Remarkably, "memorisation"
here is not based on identification (the identity of the positive image), but on the
"kinem", the image-difference, a Cartesian (and Saussurean) aesthetics of
calculation.
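Cut detection by image-difference can be sketched as a threshold on the difference between consecutive frames. The toy video, the threshold value and the function names are hypothetical; actual segmentation tools typically use more robust measures, so this shows only the principle.

```python
def frame_difference(frame_a, frame_b):
    """Mean absolute difference between two equally sized grey-value frames."""
    return sum(abs(a - b) for a, b in zip(frame_a, frame_b)) / len(frame_a)

def detect_cuts(frames, threshold):
    """Indices i where frame i differs sharply from frame i - 1 (a hard cut)."""
    return [i for i in range(1, len(frames))
            if frame_difference(frames[i - 1], frames[i]) > threshold]

def key_frames(frames, cuts):
    """First frame of every shot: a static, storyboard-like summary."""
    return [frames[i] for i in [0] + cuts]

# Hypothetical toy video: each frame is a flat list of four grey values.
video = [
    [10, 10, 12, 11], [11, 10, 12, 12], [10, 11, 13, 11],  # shot 1 (dark)
    [200, 198, 205, 201], [199, 200, 204, 202],            # shot 2 (bright)
    [90, 95, 92, 91], [91, 94, 93, 90],                    # shot 3 (mid-grey)
]
cuts = detect_cuts(video, threshold=30)
storyboard = key_frames(video, cuts)
print(cuts, len(storyboard))
```

The shot boundaries are found without recognising anything in the images; only the difference between neighbouring frames is calculated.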
Michel Foucault, in his analysis The Order of Things, has elaborated on this
epistemological transition from the époque of similarity to the époque of
differences.
archives in liquid times