Yearbooks, Stichting Archiefpublicaties | 1 January 2005 | page 55 - Periodical viewer, Koninklijke Vereniging van Archivarissen

[...] recognition characteristics, and would therefore be applicable to rapid subtitling of news broadcasts, but not to general video speech understanding. In the context of audio access, the main technology of interest is speech transcription. In principle, transcription technology detects which words were spoken, in what order, and at what point in time. Because of this time information, transcripts are the basis for generating a time-coded index, and therefore provide a good basis for spoken document retrieval: the search for audio or video fragments on the basis of their spoken content [Renals, 2005].

[Figure 6. Sketch of the flow in querying by audio example. Components: audio collection, audio files, query, word detection, feature files, text features, similarity, and feedback interaction.]

The models applied in speech transcription have to capture various aspects: recurring variations in the acoustics of speech, the set of sounds of a specific language, the combinations of sounds (syllables, words), and the possible combinations of words. The last of these requires large amounts of textual training data; as a consequence, the volume of the available text collections determines the success of the statistical language models. The more variation the model absorbs, the better the correct word combinations can be sifted out of all the candidate combinations suggested by the acoustic models. The current focus in the development of transcription technology is on tuning existing methods to more difficult domains and conditions, such as spontaneous speech, non-native speakers, and spoken content that is less dense than news.

Another ingredient for content-based search is machine learning and its impromptu counterpart: interaction.
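To illustrate how word-level timing turns transcripts into a time-coded index for spoken document retrieval, here is a minimal Python sketch. It assumes a hypothetical transcript format of (word, start time in seconds) pairs; the function names, the example recordings, and the 30-second co-occurrence window are illustrative choices, not the method of any particular transcription system.

```python
from collections import defaultdict

# Hypothetical transcript format: (word, start time in seconds) pairs, in spoken order.
Transcript = list[tuple[str, float]]
Index = dict[str, list[tuple[str, float]]]


def build_time_index(transcripts: dict[str, Transcript]) -> Index:
    """Map each lower-cased word to every (recording_id, start_time) at which it was spoken."""
    index: Index = defaultdict(list)
    for recording_id, words in transcripts.items():
        for word, start in words:
            index[word.lower()].append((recording_id, start))
    return dict(index)


def search(index: Index, query: str, window: float = 30.0) -> list[tuple[str, float]]:
    """Return (recording_id, start_time) of fragments where all query words occur
    within `window` seconds of an occurrence of the first query word."""
    terms = query.lower().split()
    if not terms or any(term not in index for term in terms):
        return []
    hits = []
    for rec, t0 in index[terms[0]]:
        # Keep the fragment only if every other term was spoken nearby in the same recording.
        if all(any(r == rec and abs(t - t0) <= window for r, t in index[term])
               for term in terms[1:]):
            hits.append((rec, t0))
    return sorted(hits)


# Usage with two made-up news transcripts.
transcripts = {
    "news_0412": [("the", 0.0), ("minister", 0.4), ("visited", 0.9), ("the", 1.3), ("harbour", 1.5)],
    "news_0413": [("harbour", 12.0), ("strike", 12.6), ("continues", 13.1)],
}
index = build_time_index(transcripts)
print(search(index, "harbour strike"))  # [('news_0413', 12.0)]
```

Because every hit carries a start time, a player can jump directly to the matching fragment instead of returning the whole recording.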
106 ARNOLD W.M SMEULDERS, FRANCISKA DE JONG AND MARCEL WORRING MULTIMEDIA INFORMATION TECHNOLOGY AND THE ANNOTATION OF VIDEO

Interaction has come to encompass user relevance feedback, interactive visualization of query results, and adaptable similarity measures [Worring 2001], yet a major advance in tools and machine power is required to benefit fully from interaction.

Machine learning techniques help overcome the incidental variations within a concept. A successful line of machine learning is to combine many weakly performing classifiers into stronger ones. All of these approaches have substantially improved the ability of machine learners to recognize concepts. The situation keeps improving in all of these respects except one: the amount of data. More data demands more annotation effort, up to the point at which the data set becomes so large that annotation is no longer feasible. Annotating thousands, and eventually hundreds of thousands, of pictures is hard to achieve. While machine power for ever larger numbers of computations is available, the manpower for annotation will become the bottleneck.

In this paper, we distinguish between visual, audio, and textual information. In this section we discuss recognition, defined as the unambiguous, context-free denotation of signs. In all practical circumstances, for example, the visual sign "A" refers to the first letter of the alphabet, so "A" is recognized rather than interpreted.

[Figure 7. Caption not recoverable. Diagram relating a bit stream to visual, textual, and audio information; labels include visual stream, text stream, audio stream, print file, speech file, signs, speech, sign spotting, speech spotting, OCR, and feedback.]

Textual information may take a visual form when it is printed on paper or held in a PDF file. Converting the printed version of a text into a stream of characters requires a computer function known by the generic name of optical character recognition (OCR).
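The strategy of combining many weakly performing classifiers into a stronger one is best known as boosting. The paper does not say which variant the authors have in mind; the following is a minimal sketch of one well-known form, AdaBoost, with single-threshold "decision stumps" as weak learners. The function names and the toy data are hypothetical and serve only to show the principle.

```python
import math


def train_stump(X, y, w):
    """Return (weighted_error, feature, threshold, polarity) of the best threshold stump.
    A stump predicts `polarity` when x[feature] >= threshold, else -polarity."""
    best = None
    for f in range(len(X[0])):
        for thr in sorted({x[f] for x in X}):
            for polarity in (1, -1):
                preds = [polarity if x[f] >= thr else -polarity for x in X]
                err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, f, thr, polarity)
    return best


def adaboost(X, y, rounds=10):
    """Train a weighted vote of stumps; labels in y must be +1 or -1."""
    n = len(X)
    w = [1.0 / n] * n  # start with uniform sample weights
    ensemble = []
    for _ in range(rounds):
        err, f, thr, pol = train_stump(X, y, w)
        err = max(err, 1e-10)  # guard against division by zero for a perfect stump
        if err >= 0.5:
            break  # the weak learner is no better than chance
        alpha = 0.5 * math.log((1 - err) / err)  # vote weight of this stump
        ensemble.append((alpha, f, thr, pol))
        # Misclassified samples gain weight, so the next stump focuses on them.
        w = [wi * math.exp(-alpha * yi * (pol if x[f] >= thr else -pol))
             for wi, x, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, x):
    """Sign of the weighted vote of all stumps."""
    score = sum(alpha * (pol if x[f] >= thr else -pol) for alpha, f, thr, pol in ensemble)
    return 1 if score >= 0 else -1


# Toy data: label +1 only inside the interval [2, 4]; no single stump separates it.
X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
y = [-1, -1, 1, 1, 1, -1, -1]
model = adaboost(X, y, rounds=3)
print([predict(model, x) for x in X])  # [-1, -1, 1, 1, 1, -1, -1]
```

With one round, the best stump still misclassifies two of the seven samples; three reweighted stumps together separate the interval, which is the effect the text describes.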
OCR is in wide use and is built into many search programs, with the result that paper scans and texts in computer files are now easily accessible. Depending on the quality of the scan, the quality of the OCR method, and its ability to recognize the font of the text, OCR can deliver near-perfect results. However, a guarantee that all information [...]
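How close OCR output comes to "near-perfect" is commonly quantified by the character error rate (CER): the edit (Levenshtein) distance between the OCR output and a reference transcription, divided by the length of the reference. The sketch below assumes a hand-corrected reference text is available; the function names and sample strings are illustrative only.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of character insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def character_error_rate(ocr_output: str, reference: str) -> float:
    """CER = edit distance / reference length; 0.0 means a perfect transcription."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return levenshtein(ocr_output, reference) / len(reference)


# Two typical OCR confusions: letter o read as digit 0, digit 0 read as letter O.
print(round(character_error_rate("Jaarb0ek 2O05", "Jaarboek 2005"), 3))  # 0.154
```

A CER of zero only certifies the pages that were checked against a reference; for unchecked scans it remains an estimate.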
