Jaarboeken Stichting Archiefpublicaties | 1 January 2005 | page 57 - Periodiekviewer Koninklijke Vereniging van Archivarissen

accessible through the Internet. An example of a very narrow domain is a logo recorded by scanning a document: the view is frontal and the illumination is perfect. When searching for logos in general video, for example to record the exposure time of the Coca-Cola logo during the Super Bowl, the domain is no longer narrow. The image of the logo is distorted by a skewed viewing angle, partially occluded, subject to changing illumination and shadow, and shown at varying magnification. So the repertoire of images admissible as countable Coca-Cola logos grows enormously. With at least 100 easily detectable viewing angles, a similar number of realistic illumination patterns, 1000 different ways to occlude the logo and still recognize it, and 10 different magnifications, the combinations yield on the order of 100 million (100 × 100 × 1000 × 10) views of one well-defined and simple object.

In general, in automatic analysis, the chances of success are better in systems working in narrower domains. Figure 8 lists narrow versus broad visual domains. The distinction between broad and narrow domains also exists for speech recognition tasks; Figure 9 gives examples.

As is well known among archivists, the reduction of a video to keywords and key features implies a severe reduction of the information in the message, a practice introduced at a time when archival codes had to be small. This is the keyword or key-feature funnel. In computerized systems there is no real need to settle for the minimal set of features. In the absence of an automatic understanding of context, larger sets of features will carry information about the context, which is implicit in manual search.

In the same way, the recognition of similarities is almost automatic to humans. For computers, however, similarity is a mystery until it is fully specified. In fact, similarity is a complex notion requiring detailed analysis. A few major types of similarity are indicated in Figure 10.
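To illustrate what it means for similarity to be fully specified, here is a minimal Python sketch in which a query has to name both its similarity measure and a threshold. The `Shot` record, its fields, the three measures and the sample data are all hypothetical and are not taken from the article; histogram intersection is used only as one simple stand-in for literal similarity.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Shot:
    """Hypothetical annotated video shot; every field is illustrative."""
    name: str
    histogram: Tuple[float, ...]  # normalized color histogram (sums to 1)
    person: str                   # who is depicted
    genre: str                    # e.g. "news", "sports"


def literal_similarity(a: Shot, b: Shot) -> float:
    """Histogram intersection: 1.0 for identical histograms, 0.0 for disjoint."""
    return sum(min(x, y) for x, y in zip(a.histogram, b.histogram))


def object_similarity(a: Shot, b: Shot) -> float:
    """Same depicted person, regardless of appearance."""
    return 1.0 if a.person == b.person else 0.0


def genre_similarity(a: Shot, b: Shot) -> float:
    """Same genre label."""
    return 1.0 if a.genre == b.genre else 0.0


# The measure is chosen by name, so it is an explicit part of the query.
MEASURES: Dict[str, Callable[[Shot, Shot], float]] = {
    "literal": literal_similarity,
    "object": object_similarity,
    "genre": genre_similarity,
}


def search(query: Shot, archive: List[Shot], measure: str, threshold: float) -> List[str]:
    """Return names of archive shots whose score reaches the threshold."""
    score = MEASURES[measure]
    return [shot.name for shot in archive if score(query, shot) >= threshold]


# Made-up sample data for the sketch.
QUERY = Shot("query", (0.5, 0.3, 0.2), "Bill Clinton", "news")
ARCHIVE = [
    Shot("same_frame", (0.5, 0.3, 0.2), "Bill Clinton", "news"),
    Shot("other_angle", (0.1, 0.2, 0.7), "Bill Clinton", "news"),
    Shot("soccer", (0.45, 0.35, 0.2), "unknown player", "sports"),
    Shot("weather", (0.2, 0.2, 0.6), "presenter", "news"),
]

if __name__ == "__main__":
    for measure, threshold in [("literal", 0.9), ("object", 1.0), ("genre", 1.0)]:
        print(measure, search(QUERY, ARCHIVE, measure, threshold))
```

With the same query shot, the three measures return three different result lists, which is why the choice of measure and its threshold belongs in the query itself.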
The degree and measure of similarity are an essential part of the query definition.

CATALOGUS

Trademark detection in letters
  standard camera, standard illumination; recognition success rate: reasonable
Station identification in video (edits)
  standard camera, noisy background; recognition success rate: good
Trademark search in stadium
  skew view, shadow, occlusion, fixed objects; recognition success rate: state of the art
Face detection
  frontal view, well-determined object class; recognition success: good, depends on pose
VIP identification
  well-lit conditions, skew view, abundant data of widely varying class; hard problem
Face identification
  any condition, very large class, minute visual differences among the members of the class; extremely hard problem
Object retrieval (this train)
  any recording condition, relatively narrow class, success depends on learned properties; state of the art
Object class retrieval (a train)
  for most object classes poorly defined: a broad class; poor detectors, useful when combined with other ones

Figure 8. Topics that are difficult and those that are no longer difficult at the current state of the art of computer vision.

ARNOLD W.M. SMEULDERS, FRANCISKA DE JONG AND MARCEL WORRING
MULTIMEDIA INFORMATION TECHNOLOGY AND THE ANNOTATION OF VIDEO

Speaker identification
  feasible with studio quality, prepared speech, known acoustic profile of speaker, quiet background, standardized intonation
Speaker recognition
  requires classification of acoustic profile and language use; allows speaker tracking
Large vocabulary recognition
  requires language models with broad lexical coverage; poorly-defined background
Recognition of read vs. spontaneous speech
  possibly overlapping speech makes recognition hard
Speaker independent recognition
  unknown speakers; training for acoustic profiles not feasible
Distorted voice
  dialects, non-native speakers, covert speech
Music detection vs. classification
  complex rhythms, quiet background

Figure 9. At the current state of the art in audio processing, what is relatively easy to process and what is not.

literal similarity: nearly identical appearance
  literal: same station logo
  perceptual similarity: same painting
object similarity: similar regardless of appearance
  same person picture: Bill Clinton
  subject, same story: Hijacking flight 203
genre: same class
  the same subgenre: soccer, weather, dialogue
  the same genre: sports, game show
semantically similar: identical meaning
  the same logical unit: anchor presents highlights
  the same topic: politicians discuss

Figure 10. Types of similarity important in computer-aided search.
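The size of the repertoire in the Coca-Cola logo example on this page can be checked by multiplying the four counts quoted there. The short Python sketch below does so under the assumption that the factors combine independently, so that every combination is a distinct admissible view. The all-ones entry for the scanned document is likewise an assumption, chosen to represent a single frontal, perfectly lit view.

```python
from math import prod
from typing import Dict

# Counts quoted in the text for logos in general video.
# Treating them as independent factors is an assumption of this sketch.
BROAD_DOMAIN: Dict[str, int] = {
    "viewing angles": 100,
    "illumination patterns": 100,
    "occlusions": 1000,
    "magnifications": 10,
}

# Assumed narrow domain: a scanned document with one frontal view,
# one illumination, no occlusion and one magnification.
NARROW_DOMAIN: Dict[str, int] = {
    "viewing angles": 1,
    "illumination patterns": 1,
    "occlusions": 1,
    "magnifications": 1,
}


def count_views(factors: Dict[str, int]) -> int:
    """Number of distinct views when every factor combines with every other."""
    return prod(factors.values())


if __name__ == "__main__":
    print(count_views(NARROW_DOMAIN))  # 1
    print(count_views(BROAD_DOMAIN))   # 100000000
```

Under these assumptions the broad domain holds 10^8 views of the same simple object, against a single view in the narrow domain.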
