Knowledge can be acquired by formalization, but more success has been achieved by learning rules from large datasets. In effect, a general rule of machine learning is: the more specific, the larger, and the more reliable the dataset, the better the result. More importantly, when learning from realistic datasets, the result is also more robust, able to cope with non-ideal circumstances. Modern natural language processing frequently uses techniques from information retrieval to capture the content of a message, and modern computer vision frequently uses machine learning and statistical pattern recognition to understand the content of a scene.

When designing real systems, a few aspects of the state of the art in system technology need to be considered. A proper choice of formats adds value, both in ease of exchange and in reliable storage. Formats cast a long shadow into the future, as new systems have to adapt to the old formats to remain useful. Therefore, a new format has to be selected with care (although even then, which formats will become popular can be predicted only to a limited extent). Databases help ensure that no information is lost while delivering optimal handling speed. Truly multimedia databases, with integrated formal knowledge descriptors of the multimedia content, are a hot topic of research.

Computer-aided video archives demand enormous computing and storage capacity to handle a stream of video data. A text stream is relatively condensed in its semantic content, but learning facts from text streams requires large datasets, which in turn require large computing power. Analysis of the audio signal requires more power, but real-time or near-real-time processing of the visual component is the most demanding. Computing power will continue to be an important consideration in practical video analysis for some time to come.
The solution to the storage and computing capacity needed for archiving and learning lies in grid computing: Internet-based distributed processing power. Interaction is the key to the user, and hence to the system, yet it is still poorly developed. Interrogation covers the ways in which a user can initiate a search: by specification, by browsing, by analogy, or by question and answer. Any interaction requires a carefully designed presentation of the result, which, in the case of video, calls for various kinds of summarization, since the screen offers only limited space. The interactive component of a system will become truly useful only when it can remember the preferred behavior and the preferred presentation, learned from previous sessions. With high-speed wireless technology on the threshold, there is ample opportunity to insert the meta-data at the production site.

Interacting with video archives

Interaction is an essential ingredient in any video archival system. It serves both the video archivist, in annotating the wealth of information, and the user accessing the archive. In the future these functions will merge, since a digital archive will eventually learn from the patterns of interaction of its users, as well as from user annotations of the data. To assist the archivist, the aim is to limit the time needed for annotation. The major assumption underlying tools for this purpose is that similar video content is likely to have the same annotation. Hence, after the archivist has provided some initial annotations, the system can propose collections of similar items that have a high probability of sharing the same annotation. By manually filtering out the small percentage of incorrectly labeled items, the archivist can completely annotate such collections. This strategy for limiting annotation time is particularly suited for simple bulk annotations.
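The propagation strategy above can be sketched as a simple nearest-neighbour scheme. This is a minimal illustration under stated assumptions, not the method of any particular archive system: each video item is assumed to be summarized by a numeric feature vector, similarity is measured with cosine similarity, and the threshold of 0.9, the function name `propose_annotations`, and the example shots are all hypothetical. Proposals are grouped by label so that the archivist can review one collection at a time and reject the few wrong items.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def propose_annotations(annotated, unannotated, threshold=0.9):
    """Suggest a label for each unannotated item from its most similar annotated item.

    annotated:   dict item_id -> (vector, label), provided by the archivist
    unannotated: dict item_id -> vector
    Returns dict label -> list of (item_id, similarity), best first, so the
    archivist can review each proposed collection and reject wrong items.
    """
    proposals = {}
    for item_id, vec in unannotated.items():
        best_label, best_sim = None, -1.0
        for ref_vec, label in annotated.values():
            sim = cosine(vec, ref_vec)
            if sim > best_sim:
                best_label, best_sim = label, sim
        # Only propose when the item is close enough to a known example.
        if best_label is not None and best_sim >= threshold:
            proposals.setdefault(best_label, []).append((item_id, best_sim))
    for items in proposals.values():
        items.sort(key=lambda pair: pair[1], reverse=True)
    return proposals


# Hypothetical 3-dimensional feature vectors for a handful of shots.
annotated = {
    "shot1": ([0.9, 0.1, 0.0], "studio"),
    "shot2": ([0.0, 0.2, 0.9], "football"),
}
unannotated = {
    "shot3": [0.8, 0.2, 0.1],  # close to shot1
    "shot4": [0.1, 0.1, 1.0],  # close to shot2
    "shot5": [0.5, 0.5, 0.5],  # not close to either
}
print(propose_annotations(annotated, unannotated))
```

Items that are not similar enough to any annotated item (here `shot5`) receive no proposal and are left for the expert, which matches the division between bulk annotation and more elaborate annotation.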
More elaborate annotation is better performed by an expert, one item at a time.

We now turn to the information needs of the user. There are various types of information exchange, leading to various types of query:

Query from a controlled vocabulary
In this query mode, the user enters query terms from the controlled vocabulary that the archivist used to annotate the data. In this case, specifying the query should be aided by a visual representation of the meta-data model used in annotation. When multimedia analysis tools are employed to index the video automatically with a set of controlled terms from the meta-data model, this approach can still be followed, with the essential difference that, in the interaction, both the system and the user should be aware that annotations have an associated probability of correctness.

Query by keywords or descriptors
It is impossible to foresee all possible annotations on which a user might query the archive. Hence the user should also be able to query the content of the archive directly. For text, this is a simple comparison of the words the user has provided with the words in the document. This is still feasible when the text in the archive is the result of speech recognition on the audio channel, but fuzzy matching techniques have to be used, since the speech recognition output frequently contains errors. For audio and video data, one will clearly not query for a specific set of sample or pixel values, as these are meaningless to the user. Descriptors of the data are required, which summarize and emphasize specific characteristics. It is difficult to decide what these should be if the purpose is not known beforehand. Hence, query by descriptors is often limited to rather general descriptors such as pitch or average volume for audio, and color, texture, and motion distributions for video.
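For query terms matched against speech-recognition transcripts, one common form of fuzzy matching is to accept transcript words within a small edit (Levenshtein) distance of the query term. The sketch below assumes whitespace tokenization and a fixed tolerance of one edit; `fuzzy_find`, the tolerance, and the example transcript are illustrative assumptions, and practical systems may instead match on phonetic similarity or on recognizer confidence.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character insertions,
    deletions and substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute, or match at no cost
        prev = cur
    return prev[-1]


def fuzzy_find(query: str, transcript: str, max_errors: int = 1):
    """Return (position, word) for every transcript word within max_errors
    edits of the query term. Matching is case-insensitive; punctuation is
    not stripped in this simplified sketch."""
    q = query.lower()
    return [(pos, word)
            for pos, word in enumerate(transcript.lower().split())
            if edit_distance(q, word) <= max_errors]


# Hypothetical recognizer output containing two recognition errors.
transcript = "the prime minister visited amsterdan and rotterdamm today"
print("amsterdam" in transcript.split())    # False: exact matching misses the word
print(fuzzy_find("Amsterdam", transcript))  # [(4, 'amsterdan')]
print(fuzzy_find("Rotterdam", transcript))  # [(6, 'rotterdamm')]
```

A fixed tolerance is the simplest choice; for short words a single edit already admits many unrelated matches, so a tolerance that grows with word length is a common refinement.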
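To make the idea of general descriptors concrete, the sketch below computes two of the kinds named above: a coarse color distribution (a joint RGB histogram) for a video frame, compared with histogram intersection, and the root-mean-square amplitude as a simple measure of average volume for audio. Frames are assumed to be lists of (r, g, b) tuples and audio a list of numeric samples; the bin count, the function names, and the toy frames are illustrative assumptions, not descriptors prescribed by the article.

```python
import math


def color_histogram(pixels, bins_per_channel=4):
    """Normalized joint RGB histogram of a frame given as (r, g, b) tuples in 0..255."""
    assert 256 % bins_per_channel == 0, "bins_per_channel must divide 256"
    step = 256 // bins_per_channel
    hist = [0.0] * (bins_per_channel ** 3)
    for r, g, b in pixels:
        # Quantize each channel, then flatten the 3-D bin index into one list index.
        idx = ((r // step) * bins_per_channel + g // step) * bins_per_channel + b // step
        hist[idx] += 1
    total = len(pixels)
    return [h / total for h in hist] if total else hist


def histogram_intersection(h1, h2):
    """Similarity in [0, 1] for normalized histograms; 1 means identical distributions."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def rms_volume(samples):
    """Root-mean-square amplitude: a simple 'average volume' descriptor for audio."""
    return math.sqrt(sum(s * s for s in samples) / len(samples)) if samples else 0.0


# Toy frames: two mostly green (sports field) frames and one mostly blue (studio) frame.
field_a = [(30, 160, 40)] * 90 + [(250, 250, 250)] * 10
field_b = [(40, 150, 50)] * 80 + [(255, 255, 255)] * 20
studio = [(20, 20, 120)] * 100

query = color_histogram(field_a)
print(histogram_intersection(query, color_histogram(field_b)))  # about 0.9
print(histogram_intersection(query, color_histogram(studio)))   # 0.0
print(rms_volume([0.5, -0.5, 0.5, -0.5]))                        # 0.5
```

Such descriptors say nothing about what a frame shows; they only summarize its color content, which is why they are useful mainly when the purpose of the query is not known beforehand.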
Query by full text, full audio, or full visual examples
Keywords or descriptors entered by the user provide the system with only limited information. Only in context can such queries lead to the desired information. The computer does not understand the context by itself, has no experience unless it is programmed in, and has no good feel for purpose. Therefore, computer search profits from more information in the query. One way to achieve this is to give examples of similar items. So, when the query is a full text item, computer retrieval has a better chance of being on target. Similarly, several pictures should be presented in a query rather than just one. And it is best in computer search to include counter examples, as they convey the intentions of the user much better than positive examples alone.

CATALOGUS | ARNOLD W.M. SMEULDERS, FRANCISKA DE JONG AND MARCEL WORRING | MULTIMEDIA INFORMATION TECHNOLOGY AND THE ANNOTATION OF VIDEO
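One well-known way to turn several positive examples and counter examples into a single query is Rocchio-style relevance feedback from text retrieval: average the feature vectors of the positive examples and subtract a weighted average of the counter examples. It is named here as a standard technique, not as the authors' method. The sketch assumes each item is described by a hypothetical feature vector (here: outdoor, crowd, water), with weights beta = 1.0 and gamma = 0.5 chosen for illustration.

```python
def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def rocchio_query(positives, negatives, beta=1.0, gamma=0.5):
    """Rocchio-style query from examples: beta * mean(positives) - gamma * mean(negatives).
    No initial keyword query is used; the examples alone define the query."""
    pos = mean_vector(positives)
    if not negatives:
        return [beta * p for p in pos]
    neg = mean_vector(negatives)
    return [beta * p - gamma * q for p, q in zip(pos, neg)]


def rank(archive, query):
    """Archive item ids sorted by dot-product score against the query, best first."""
    scores = {item: sum(a * b for a, b in zip(vec, query)) for item, vec in archive.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical features per item: [outdoor, crowd, water].
archive = {
    "quiet_beach": [1.0, 0.0, 1.0],
    "crowded_beach": [1.0, 1.0, 1.0],
    "city_street": [1.0, 0.8, 0.0],
}
positives = [[1.0, 0.2, 0.9], [0.9, 0.1, 1.0]]  # the user wants beach scenes like these
negatives = [[0.9, 1.0, 0.8]]                   # but not crowded ones

print(rank(archive, rocchio_query(positives, [])))         # crowded_beach ranks first
print(rank(archive, rocchio_query(positives, negatives)))  # quiet_beach ranks first
```

With positive examples alone, the crowded beach scores highest because it shares every feature with the examples; adding a single counter example shifts the query away from crowds, which illustrates why counter examples convey the user's intention better than positive examples alone.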

Koninklijke Vereniging van Archivarissen

Jaarboeken Stichting Archiefpublicaties | 2005