They require a detailed description of the content of the message if they are to
meet the user's expectation of ready availability even when the context is
unknown or open-ended.
The word multimedia can also be read with the emphasis on media. It then
alludes to the multiplicity of channels by which we can deliver a message to the
public. Multimedia of the future encompasses both broadcasting and
narrowcasting. In fact, the success of television in covering a broad audience has
led to more channels. And, in turn, the multitude of channels has led to the
need for differentiation and narrow casting if a channel is to survive.
The point we want to make here is that whatever technological advances are
made in digital television and internet, more detailed meta-data and knowledge
of the target audience are needed to make it possible to match user profiles to
the meta-data of the archived content. So digital media will increase the need for
detailed meta-data.
The dominant type of information in information systems is still of the
numerical and coded type. These information systems are successful because
the message is directly encoded in the bit patterns. Hence, data processing is
equivalent to managing bit patterns. Multimedia information systems are
distinctly different. In this context, multimedia refers to visual information,
audio information, or textual information, whether or not in combination.
Multimedia information systems require elaborate information analysis of the
content. To the user, digital multimedia information is immediately available
(visible, audible, or readable), and most often also understandable - but not to
the machine. The discrepancy between the digital encoding and its semantic
interpretation is known as the semantic gap.
Because of the semantic gap, a completely automatic multimedia analysis cannot
be expected. One can wait for high quality and complete coverage before starting
to use automatic aids in annotation, but that will take quite a while. For a long
time yet, the performance of automatic annotation when measured against
manual annotation quality will appear to be abysmal at best. But in any case,
copying the manual annotation process is not the ultimate goal of a computer-
assisted search. And hence, manual annotation is not a good performance
indicator for machine annotation.
What does matter is whether it is possible to create an effective methodology in
which man and machine work together in an integrated way to successfully find
a target. In this paper, we review the state-of-the-art in multimedia analysis to
show how the latter can contribute to a process for the automatic annotation of
video content.
The challenge
The prime motivation for introducing automation in the generation of meta
data is that an all-digital recording process and post-process will enable faster
re-use. At the same time, the scope of re-use will be much broader than current
practice. And, computer networks will permit extension of the archive with
other virtual archives. This requires annotation of a much larger volume of data
as well as extending the number of topics to be covered, while at the same time
the anticipated response time decreases. In other words, the archive is under
pressure from all sides. Automatic analysis is an essential ingredient in meeting
present requirements.
We argue that automatic or computer-aided annotation cannot be seen as
separate from the work practices in which it will function. It has to be part of a
complete process of storing, enriching, and delivering multimedia information.
All of these elements will change when the archive becomes all-digital. For
digital archives, one cannot expect the flow of items into the archive, nor the
exchange of information in a search, to stay the same. The point to decide in a
multilateral view is what needs to be done to achieve a proper workflow around
the digital archive, or for that matter, an effective digital system around the
archivist.
CATALOGUS
iui!aiii&iu<iniocfliniiii[ii'ji
&IÜJÜ»l>j|l>IÜIÜJÜJLII 1(11(111
■LOlololoiaiü mui mini uL4.1l
iLÜIMHÜOiaiü IÜ1ÜJÜIÜLÜLGI
iLDIOIOIOIOIOMlOlOIMOLOl
LÜI-0IQIQI4UJ In in kul Ml-,M
HOlCLil»|i»:|iHDlDIDKiLtiL
4irKi|QIW[r.ilfllQIQ|i}lU|i)lQ
OlOIMDHHWIM 1DIDLHWI
ii nmieicioio in in in" iii him
lHIK4KHh IIHII11111 10-111 III III
11 i:imi'Ii:iii 11110|4 hi |0 Iii
HIMMKHILIII1111 IClOlOULHIl
(b)
96
ARNOLD W.M SMEULDERS, FRANCISKA DE JONG AND MARCEL WORRING MULTIMEDIA INFORMATION
TECHNOLOGY AND THE ANNOTATION OF VIDEO
Figure 1. The semantic gap. (a) Original image (b) A small part of an image as perceived
by a computer, (c) Display of a most simple approach to distinguish automatically the
foreground in the image from its lighter background as an essential step in the
interpretation of the image. Note that the result erroneously indicates that the windows
and the gutters are part of the background. Note also that the human eye easily restores
the proper segmentation result. Humans cannot help but identify a house in spite of the
distorted grey values and in spite of the fact they have never seen such a house before,
(d) An alternative approach shows the most clear contrast transitions in the image.
Where the house is relatively simple to identify by the contrast to the background, the
chimney in this example is hardly distinguishable as the foreground figure against the
house as the background. The semantic interpretation is easily made for humans whereas
it is incredibly hard to come up with a set of general rules (or computer programs) to
describe what makes a chimney, simply because the visual evidence is meager at best.
Semantic interpretation requires lifelong experience with meaning.
97