They require a detailed description of the content of the message if they are to meet the user's expectation of ready availability even when the context is unknown or open-ended. The word multimedia can also be read with the emphasis on media. It then alludes to the multiplicity of channels by which we can deliver a message to the public. Multimedia of the future encompasses both broadcasting and narrowcasting. In fact, the success of television in covering a broad audience has led to more channels. And, in turn, the multitude of channels has led to the need for differentiation and narrow casting if a channel is to survive. The point we want to make here is that whatever technological advances are made in digital television and internet, more detailed meta-data and knowledge of the target audience are needed to make it possible to match user profiles to the meta-data of the archived content. So digital media will increase the need for detailed meta-data. The dominant type of information in information systems is still of the numerical and coded type. These information systems are successful because the message is directly encoded in the bit patterns. Hence, data processing is equivalent to managing bit patterns. Multimedia information systems are distinctly different. In this context, multimedia refers to visual information, audio information, or textual information, whether or not in combination. Multimedia information systems require elaborate information analysis of the content. To the user, digital multimedia information is immediately available (visible, audible, or readable), and most often also understandable - but not to the machine. The discrepancy between the digital encoding and its semantic interpretation is known as the semantic gap. Because of the semantic gap, a completely automatic multimedia analysis cannot be expected. One can wait for high quality and complete coverage before starting to use automatic aids in annotation, but that will take quite a while. For a long time yet, the performance of automatic annotation when measured against manual annotation quality will appear to be abysmal at best. But in any case, copying the manual annotation process is not the ultimate goal of a computer- assisted search. And hence, manual annotation is not a good performance indicator for machine annotation. What does matter is whether it is possible to create an effective methodology in which man and machine work together in an integrated way to successfully find a target. In this paper, we review the state-of-the-art in multimedia analysis to show how the latter can contribute to a process for the automatic annotation of video content. The challenge The prime motivation for introducing automation in the generation of meta data is that an all-digital recording process and post-process will enable faster re-use. At the same time, the scope of re-use will be much broader than current practice. And, computer networks will permit extension of the archive with other virtual archives. This requires annotation of a much larger volume of data as well as extending the number of topics to be covered, while at the same time the anticipated response time decreases. In other words, the archive is under pressure from all sides. Automatic analysis is an essential ingredient in meeting present requirements. We argue that automatic or computer-aided annotation cannot be seen as separate from the work practices in which it will function. It has to be part of a complete process of storing, enriching, and delivering multimedia information. All of these elements will change when the archive becomes all-digital. For digital archives, one cannot expect the flow of items into the archive, nor the exchange of information in a search, to stay the same. The point to decide in a multilateral view is what needs to be done to achieve a proper workflow around the digital archive, or for that matter, an effective digital system around the archivist. CATALOGUS iui!aiii&iu<iniocfliniiii[ii'ji &IÜJÜ»l>j|l>IÜIÜJÜJLII 1(11(111 ■LOlololoiaiü mui mini uL4.1l iLÜIMHÜOiaiü IÜ1ÜJÜIÜLÜLGI iLDIOIOIOIOIOMlOlOIMOLOl LÜI-0IQIQI4UJ In in kul Ml-,M HOlCLil»|i»:|iHDlDIDKiLtiL 4irKi|QIW[r.ilfllQIQ|i}lU|i)lQ OlOIMDHHWIM 1DIDLHWI ii nmieicioio in in in" iii him lHIK4KHh IIHII11111 10-111 III III 11 i:imi'Ii:iii 11110|4 hi |0 Iii HIMMKHILIII1111 IClOlOULHIl (b) 96 ARNOLD W.M SMEULDERS, FRANCISKA DE JONG AND MARCEL WORRING MULTIMEDIA INFORMATION TECHNOLOGY AND THE ANNOTATION OF VIDEO Figure 1. The semantic gap. (a) Original image (b) A small part of an image as perceived by a computer, (c) Display of a most simple approach to distinguish automatically the foreground in the image from its lighter background as an essential step in the interpretation of the image. Note that the result erroneously indicates that the windows and the gutters are part of the background. Note also that the human eye easily restores the proper segmentation result. Humans cannot help but identify a house in spite of the distorted grey values and in spite of the fact they have never seen such a house before, (d) An alternative approach shows the most clear contrast transitions in the image. Where the house is relatively simple to identify by the contrast to the background, the chimney in this example is hardly distinguishable as the foreground figure against the house as the background. The semantic interpretation is easily made for humans whereas it is incredibly hard to come up with a set of general rules (or computer programs) to describe what makes a chimney, simply because the visual evidence is meager at best. Semantic interpretation requires lifelong experience with meaning. 97

Periodiekviewer Koninklijke Vereniging van Archivarissen

Jaarboeken Stichting Archiefpublicaties | 2005 | | pagina 50