The following aspects of video archiving environments will inevitably change as
a result of the move to an all-digital environment.
First, the widening horizon of the archive induces a perceived loss of accuracy.
In the foreground, a larger part of the archive becomes readily available. In the
background, the increased connectivity provided by the internet brings into
conflict the archival codes (e.g. thesauri, ontologies) of archives that developed
in isolation, and this will demand the conversion and merging of coding systems.
This will inevitably create a perceived loss of accuracy for users confronted
with coding systems different from those to which they are accustomed.
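Where such a conversion happens, the perceived loss of accuracy can be made concrete. The following is a minimal Python sketch, assuming two hypothetical vocabularies: a local archive's fine-grained subject terms and a coarser shared thesaurus. Terms without an exact equivalent survive only as a broader term, and some have no counterpart at all. The term lists and the `LOCAL_TO_SHARED` table are invented for illustration; the exact/broader distinction loosely mirrors the mapping relations used in thesaurus standards such as SKOS.

```python
# Hypothetical mapping from a local archive's fine-grained subject terms to a
# coarser shared thesaurus. Each entry records the target term and whether the
# match is exact or only to a broader concept.
LOCAL_TO_SHARED = {
    "brass band": ("music", "broader"),
    "street organ": ("music", "broader"),
    "football": ("football", "exact"),
    "news bulletin": ("news", "broader"),
}


def convert_codes(local_terms):
    """Convert local codes to shared codes and report what precision is lost."""
    converted = []  # shared terms, without duplicates, in first-seen order
    lost = []       # local terms that survive only as a broader shared term
    unmapped = []   # local terms with no counterpart in the shared thesaurus
    for term in local_terms:
        if term not in LOCAL_TO_SHARED:
            unmapped.append(term)
            continue
        shared, match = LOCAL_TO_SHARED[term]
        if shared not in converted:
            converted.append(shared)
        if match == "broader":
            lost.append(term)
    return converted, lost, unmapped


codes, lost, unmapped = convert_codes(["brass band", "football", "carnival"])
print(codes)     # ['music', 'football']
print(lost)      # ['brass band']
print(unmapped)  # ['carnival']
```

A user who searched this archive for "brass band" under the local code now finds only "music" in the merged system, which is exactly the kind of experience described above.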
Second, where users were already accustomed to heterogeneous data sources,
computer-aided search will emphasize the variety and integration of data types.
The target content can be distinguished by type: visual (stills, photographs,
graphics, logos), audio (speech, music, noise), and text (scripts, summaries,
transcripts, reviews, letters, instructions, literature), as well as combinations
of these. As almost all subtypes can be combined with one another, the list of
integrated information objects to be analyzed cannot be fully formalized, but it
will nonetheless demand awareness of this variety.
Third, computer-aided systems will diversify search patterns. The new search
facilities will lie well outside the keyword-search paradigm. Interactive
systems will respond faster, so users will move earlier from well-posed
questions to more open-ended browsing. Besides precise target search, users will
frequently browse open-endedly and use different kinds of interaction and
presentation techniques to view the results. Searching through larger, more
heterogeneous, and possibly remote archives requires different search patterns,
including acceptance of working with different code systems. Archives will be
under pressure to perform better, however abstract the initial formulation of the
search.
Fourth, computer-aided archival systems will raise users' expectations. As we
argue later on, there is no realistic possibility of achieving the same
completeness and accuracy in the automatic annotation of an archive as in its
manually generated counterpart. Yet that is precisely what users will expect,
since all the information is 'in the computer'. The question is what to do with
that expectation: to combat it, to compromise on accuracy, or to accept
automation only when it delivers the same quality. The practice of use will
change, and hence, inevitably, the practice of archiving. The good news is that
there is still ample time to prepare for that change.
If archivists and users are to be spared frustration, the development of
automatic systems faces some additional challenges:
1. Systems must be designed and implemented to fit a daily work process.
Note: fitting into a daily work process may seem to add little value, but
experience in many other application areas of information systems has
repeatedly taught this lesson.
2. To that end, computerized archiving systems must deliver fast and accurate
retrieval results.
Note: for a computerized system, accuracy in the result is not necessarily
the same as accuracy in the annotation.
3. They must do so with robust methods for the automatic understanding of
non-ideal data.
Note: experience with early systems has led to considerable cynicism and
misunderstanding of their general applicability, because those systems were
tested only on a small set of perfect data.
In our view, the practices of users and system designers, as well as of archivists,
will change considerably before effective systems are introduced.
Constituent elements of video archival systems
The interpretation of multimedia requires attention from a wide variety of
disciplines that currently operate largely separately. The analysis of visual
information is studied in image processing and computer vision. The former
emphasizes image-in/image-out processes, whereas computer vision studies the
interpretation of static or dynamic scenes. Audio signals are studied for music
recognition as well as for speech recognition. Natural language processing aims
to deliver an interpretation of the content of a text.
By the nature of the information it processes, natural language processing starts
from semantically meaningful units, namely words. It is therefore no surprise
that understanding a multimedia object relies heavily on the successful
interpretation of its linguistic elements, whether written or spoken. Spoken
language requires the detection of speech and its conversion to text as an
intermediate step, but it still lends itself better to understanding than the
visual part. Visual information is so rich in content and variety, even for a
single object, that it appears difficult to handle by automatic analysis. As a
consequence of this difference in progress, the practices of these fields are
quite distinct. But although they have developed separately for twenty or thirty
years, progress is currently fastest where they cooperate across disciplines.
Automatic interpretation of visual, audio, or textual information is greatly
helped when the content is described in detail on the basis of ontologies or
other formal domain descriptions. Automatic interpretation is also supported if
general background knowledge is available on such things as word combinations,
pronunciation, faces, and shouts, and on their admissible variations, such as the
morphological variants of words, the variety in visual appearance, and variations
in the background.
CATALOGUS
ARNOLD W.M SMEULDERS, FRANCISKA DE JONG AND MARCEL WORRING MULTIMEDIA INFORMATION
TECHNOLOGY AND THE ANNOTATION OF VIDEO
Figure 2. [Diagram not recoverable from the source. Legible labels: visual, audio,
text; production; interpretation; knowledge; learning; presentation; systems
(computing, formats, databases, internet).]