To evaluate search performance, TRECVID has defined an interactive search task
based on 25 topics. Users were given 15 minutes to find as many relevant items
as possible. Typical examples of a search include people walking with their dogs,
congressman Henry Hyde, people moving a stretcher, Benjamin Netanyahu, and
moving bicycles. To determine the performance, for each search NIST considers
the precision and recall figures of the best 100 results returned by the system.
The precision is defined as the number of correct items divided by 100, and the
recall as the number of correct items divided by the total number of relevant
items. For a top ranking performance [Snoek 2004], in 15 minutes, an expert
user, combining keyword search and query by similarity with a set of 32
automatically detected high-level concepts, can yield the following scores:
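Before turning to those scores, the precision and recall measures defined above can be made concrete with a short sketch. This is an illustrative computation only; the shot identifiers and counts are hypothetical, chosen so the numbers match one plausible topic outcome.

```python
def precision_recall_at_k(returned, relevant, k=100):
    """NIST-style evaluation over the best k results.

    Precision: correct items among the top k, divided by k.
    Recall: correct items among the top k, divided by the total
    number of relevant items in the collection.
    """
    top_k = returned[:k]
    correct = sum(1 for item in top_k if item in relevant)
    return correct / k, correct / len(relevant)

# Hypothetical example: the system returns 100 shots, 28 of which
# are correct, while the collection holds 67 relevant shots in total.
returned = [f"shot{i}" for i in range(100)]
relevant = {f"shot{i}" for i in range(28)} | {f"extra{i}" for i in range(39)}
p, r = precision_recall_at_k(returned, relevant)
print(f"precision={p:.2f}, recall={r:.2f}")  # precision=0.28, recall=0.42
```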
Automatic video annotation is still a difficult problem; performance varies
from very poor on some topics to reasonable on others. Moreover, not all topics
of this year's competition may be equally relevant in practice, but the progress made each year
is considerable. Even poor quality descriptors help in automatic annotation, and
they will improve through learning from larger data sets. When automated
analysis is combined with interaction, a useful new search paradigm will emerge.
In this paper we have indicated where progress is to be expected in automated
analysis, and which solutions are much further away. We have done so at the
risk of being ridiculed by our fellow researchers for painting a too simplistic
view. Nevertheless, as is always the case at the frontiers of technology, one
often gets answers to questions one has not asked. The answer is more
complicated than desirable, but this is inevitable as the leading edge of progress
follows its own internal logic.
Nevertheless, we hope we have been able to whet your appetite for the future
that computer-aided annotation will bring. We look forward to communicating
with you on where our vision of the modern archive needs amendment.
References
Figure 11. The interface for the interactive retrieval system [Snoek 2004]. The left-hand
screen is used to define a query based on keywords and concepts. Results are presented on
the right-hand screen and can be used as visual examples in query by example.
Topic | Precision | Recall
people walking with their dogs | 28% | 42%
tennis player contacting the ball | 10% | 19%
moving bicycles | 41% | 59%
Bill Clinton with at least part of a US flag visible | 35% | 36%
Arnold W.M. Smeulders, Franciska de Jong and Marcel Worring, Multimedia Information Technology and the Annotation of Video
[Fergus 2003] R. Fergus, P. Perona, A. Zisserman, Object class recognition by unsupervised scale
invariant learning. Proc. CVPR 2003, IEEE Press.
[Dejong 2000] F.M.G. de Jong, J.-L. Gauvain, D. Hiemstra, K. Netter, Language-Based
Multimedia Information Retrieval, in: Proceedings RIAO 2000: Content-Based Multimedia
Information Access (Paris, April 2000), ISBN 2-905450-07-X, 713-722.
[NIST] TREC Video retrieval evaluation, 2001-2004.
http://www-nlpir.nist.gov/projects/trecvid/
[Renals 2005] S. Renals, J. Goldman, F.M.G. de Jong et al., Accessing the spoken word, to
appear in: International Journal on Digital Libraries.
[Schmid 2002] K. Mikolajczyk, C. Schmid, Scale and affine invariant interest point detectors.
International Journal of Computer Vision, 63-86, 2004.
[Snoek 2004] C.G.M. Snoek, M. Worring, J.M. Geusebroek, D.C. Koelma, and F.J. Seinstra,
The MediaMill TRECVID 2004 Semantic Video Search Engine, in: Proceedings of the 13th Text
Retrieval Conference (TREC), Gaithersburg, USA, November 2004.
[Smeulders 2000] A.W.M. Smeulders, M. Worring, S. Santini, A. Gupta, R. Jain, Content-based
image retrieval at the end of the early years, IEEE Transactions on PAMI, 1349-1380, 2000.
[Trieschnigg 2005] D. Trieschnigg, W. Kraaij, Hierarchical topic detection in large digital news
archives, in Proceedings of the 5th Dutch Belgian Information Retrieval workshop (DIR), 2005.
[Wayne 2000] C. Wayne, Multilingual Topic Detection and Tracking: Successful Research
Enabled by Corpora and Evaluation, in: Proceedings of Language Resources and Evaluation
Conference (LREC), 1487-1494, 2000.
[Worring 2001] M. Worring, A. Bagdanov, J.C. Van Gemert, J.-M. Geusebroek,
Minh A. Hoang, T. Augustus, G. Schreiber, C.G.M. Snoek, J. Vendrig, J. Wielemaker,
A.W.M. Smeulders, Interactive Indexing and Retrieval of Multimedia Content, Proc. SOFSEM,
Springer-Verlag LNCS 2540, 135-148, 2002.
http://www.science.uva.nl/~mark/pub/2002/WorringSofSem02.pdf