accessible through the Internet. An example of a very narrow domain is a logo
recorded by scanning a document: the view is frontal and the illumination is
perfect.
When searching for logos in general video, for example to record the exposure
time of the Coca-Cola logo during the Super Bowl, the domain is no longer
narrow. The image of the logo may be distorted by a skewed viewing angle,
partially occluded, subject to changing illumination and shadow, and shown at
varying magnification. The repertoire of images admissible as countable
Coca-Cola logos therefore grows enormously. Assume at least 100 easily
detectable viewing angles, a similar number of realistic illumination
patterns, 1000 different ways to occlude the logo and still recognize it, and
10 different magnifications: combined, these yield on the order of 100 million
views of one well-defined and simple object. In general, in automatic
analysis, the chances of success are better in systems working in narrower
domains.
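The size of this repertoire can be checked with a few lines of arithmetic. The sketch below assumes that the four sources of variation combine independently, so that the number of views is simply the product of the counts quoted above; the dictionary keys are illustrative labels, not terms from an existing system.

```python
from math import prod

# Counts quoted in the text; each is a rough lower bound, not a measurement.
variations = {
    "viewing angles": 100,
    "illumination patterns": 100,  # "a similar number" as the viewing angles
    "occlusions": 1000,
    "magnifications": 10,
}

# Assumption: the factors vary independently, so the counts multiply.
total_views = prod(variations.values())
print(f"{total_views:,} admissible views of one logo")
```

Under these assumptions the product is 10^8, which is why even a simple, well-defined object becomes a hard target once the domain is broad.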
Consider the list of narrow versus broad visual domains in Figure 8.
The distinction between broad and narrow domains also exists for speech
recognition tasks. Consider the examples in Figure 9.
As is well known among archivists, reducing a video to keywords and key
features implies a severe loss of information in the message, a practice
established at a time when archival codes had to be small. This is the keyword
or key-feature funnel. In computerized systems there is no real need to settle
for the minimal set of features. In the absence of an automatic understanding
of context, larger sets of features will carry information about the context,
which is implicit in manual search.
In the same way, the recognition of similarities is almost automatic to humans.
For computers, however, similarity is a mystery until it is fully specified. In
fact, similarity is a complex notion requiring detailed analysis. A few major
types of similarity are indicated in Figure 10. The degree and measure of
similarity are an essential part of the query definition.
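To make this concrete, the following minimal sketch shows a query that carries both its similarity measure and the required degree of similarity. It is an illustration only: the class `SimilarityQuery`, the feature vectors, and the choice of cosine similarity with a threshold of 0.9 are all assumptions, not part of any system described here.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Callable, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """One possible measure: the cosine of the angle between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


@dataclass
class SimilarityQuery:
    """Hypothetical query: example, measure, and degree are all stated explicitly."""
    example: Vector
    measure: Callable[[Vector, Vector], float]
    min_degree: float

    def matches(self, candidate: Vector) -> bool:
        return self.measure(self.example, candidate) >= self.min_degree


# Made-up feature vectors standing in for archived shots.
archive = {"shot_a": [0.9, 0.1, 1.1], "shot_b": [0.0, 1.0, 0.0]}
query = SimilarityQuery(example=[1.0, 0.0, 1.0],
                        measure=cosine_similarity,
                        min_degree=0.9)
hits = [name for name, features in archive.items() if query.matches(features)]
print(hits)
```

Swapping in a different measure or threshold changes which shots count as similar, which is the sense in which the measure and degree belong to the query rather than to the archive.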
Task                                      Conditions and state of the art
Trademark detection in letters            standard camera, standard illumination; recognition success rate: reasonable
Station identification in video (edits)   standard camera, noisy background; recognition success rate: good
Trademark search in stadium               skew view, shadow, occlusion, fixed objects; recognition success rate: state of the art
Face detection                            frontal view, well-determined object class; recognition success: good, depends on pose
VIP identification                        well-lit conditions, skew view, abundant data of widely varying class; hard problem
Face identification                       any condition, very large class, minute visual differences among the members of the class; extremely hard problem
Object retrieval (this train)             any recording condition, relatively narrow class; success depends on learned properties; state of the art
Object class retrieval (a train)          for most object classes poorly defined: a broad class; poor detectors, useful when combined with other ones
Figure 8. Topics that are difficult and those that are no longer difficult at the current state of
the art of computer vision.
ARNOLD W.M SMEULDERS, FRANCISKA DE JONG AND MARCEL WORRING MULTIMEDIA INFORMATION
TECHNOLOGY AND THE ANNOTATION OF VIDEO
Task                                         Conditions and difficulties
Speaker identification                       feasible with studio quality, prepared speech, known acoustic profile of speaker, quiet background, standardized intonation
Speaker recognition                          requires classification of acoustic profile and language use; allows speaker tracking
Large vocabulary recognition                 requires language models with broad lexical coverage; poorly-defined background
Recognition of read vs. spontaneous speech   possibly overlapping speech makes recognition hard
Speaker independent recognition              unknown speakers; training for acoustic profiles not feasible
Distorted voice                              dialects, non-native speakers, covert speech
Music detection vs. classification           complex rhythms, quiet background
Figure 9. At the current state of the art in audio processing, what is relatively easy to process
and what is not.
Type                        Variant                  Criterion                         Example
literal similarity          literal                  nearly identical appearance       same station logo
                            perceptual similarity                                      same painting
object subject similarity   same person picture      similar regardless of appearance  Bill Clinton
                            same story                                                 Hijacking flight 203
genre                       the same subgenre        same class                        soccer, weather, dialogue
                            the same genre                                             sports, game show
semantically similar        the same logical unit    identical meaning                 anchor presents highlights
                            the same topic                                             politicians discuss
Figure 10. Types of similarity important in computer-aided search.