Odyssey presentations indexed in Brno University of Technology's superlectures.com

Josef Zizka, Igor Szoke and Honza Cernocky

SLTC Newsletter, February 2011

Superlectures.com is an innovative lecture video portal that lets users search spoken content, which significantly speeds up access to lecture video recordings. The aim of the portal is to make video content as easily searchable as any text document. The speech processing system automatically recognizes and indexes spoken Czech and English words.

The main features of the Brno University of Technology (BUT) lecture browser are:

  • Intuitive web-based interface
  • Search in speech
  • Automatically generated transcriptions and subtitles
  • Synchronization of slides
  • Various components (slides, speech transcript, links, related recordings, etc.)
  • Video streaming

Search in speech

The BUT lecture browser lets users search for what was spoken either across all lectures or within a specific one. If the search is performed globally, a list of talks matching the query is shown, and the user can then open the list of results for each talk. Results are displayed with their confidence scores, with the highest-confidence results shown first. The user can begin playing the video from any result. In playback mode, the words are highlighted as they are spoken. Each result is accompanied by the transcript of the surrounding segment and is shown on the timeline to help the user navigate.
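The grouping and ranking of results can be illustrated with a minimal Python sketch. It is not the portal's implementation: the Hit record, the talk identifiers and the plain substring match are assumptions made for illustration, whereas the real system searches an index.

```python
from collections import defaultdict
from typing import NamedTuple


class Hit(NamedTuple):
    """Hypothetical record for one recognized segment."""
    talk: str          # made-up talk identifier
    start: float       # start time of the segment in seconds
    text: str          # recognized words in the segment
    confidence: float  # recognizer confidence in [0, 1]


def search(hits, query, talk=None):
    """Return {talk: [matching hits, highest confidence first]}.

    If `talk` is given, the search is restricted to that talk;
    otherwise it runs over all talks.
    """
    query = query.lower()
    grouped = defaultdict(list)
    for hit in hits:
        if talk is not None and hit.talk != talk:
            continue
        # Simplification: plain substring match instead of an index lookup.
        if query in hit.text.lower():
            grouped[hit.talk].append(hit)
    for talk_hits in grouped.values():
        talk_hits.sort(key=lambda h: h.confidence, reverse=True)
    return dict(grouped)


# Illustrative data only, not real recognizer output.
HITS = [
    Hit("talk-a", 12.5, "the MLLR system was trained", 0.71),
    Hit("talk-a", 340.0, "our MLLR system improves", 0.93),
    Hit("talk-b", 88.2, "an MLLR system combined with JFA", 0.64),
    Hit("talk-b", 15.0, "joint factor analysis", 0.88),
]

results = search(HITS, "mllr system")
```

Playback from a result would then start at the start time of the selected hit.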

The transcripts are generated using the BUT speech recognition system. Members of the BUT Speech@FIT group designed the system architecture and trained the ASR language models. The recognition software is based on the BS-CORE library, developed jointly by BUT and its spin-off Phonexia. The indexing and search system is built on Apache Lucene.

Odyssey2010

In summer 2010, Odyssey: The Speaker and Language Recognition Workshop took place at the Faculty of Information Technology, Brno University of Technology. All talks were recorded using a fixed camera facing the projection screen and positioned in the middle of the lecture room. This set-up ensured that the recorded video included both the lecturer and the projected slides. Each speaker was asked to fill in a consent form specifying what could be done with the recording. Most agreed to make their recordings publicly available. However, some restricted availability to the workshop attendees, and some did not want their recordings made public at all.

Because the language used at the conference contains many technical terms, the recognition vocabulary had to be extended with new words. These were extracted mainly from scientific papers in the field of speech processing and from the presented slides. Special attention was given to the pronunciation of abbreviations such as JFA and GMM.
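One common way to handle abbreviations is to spell them out letter by letter in the pronunciation lexicon. The Python sketch below illustrates that idea only. The ARPAbet-style phone symbols, the lexicon layout and the function names are assumptions, since the phone set and tools used by the BUT system are not described here.

```python
# Letter names in ARPAbet-style phones (an assumed phone set).
LETTER_PHONES = {
    "A": "EY", "B": "B IY", "C": "S IY", "D": "D IY", "E": "IY",
    "F": "EH F", "G": "JH IY", "H": "EY CH", "I": "AY", "J": "JH EY",
    "K": "K EY", "L": "EH L", "M": "EH M", "N": "EH N", "O": "OW",
    "P": "P IY", "Q": "K Y UW", "R": "AA R", "S": "EH S", "T": "T IY",
    "U": "Y UW", "V": "V IY", "W": "D AH B AH L Y UW", "X": "EH K S",
    "Y": "W AY", "Z": "Z IY",
}


def spell_out(abbreviation):
    """Return a letter-by-letter pronunciation for an all-letter abbreviation."""
    phones = []
    for letter in abbreviation.upper():
        phones.extend(LETTER_PHONES[letter].split())
    return " ".join(phones)


def extend_lexicon(lexicon, abbreviations):
    """Return a copy of the lexicon with missing abbreviations added.

    Existing entries are kept untouched so that hand-checked
    pronunciations are never overwritten.
    """
    extended = dict(lexicon)
    for word in abbreviations:
        extended.setdefault(word, spell_out(word))
    return extended


# Hypothetical starting lexicon with a single entry.
lexicon = extend_lexicon({"speaker": "S P IY K ER"}, ["JFA", "GMM"])
```

Abbreviations that are read as words rather than spelled out would still need a manually written entry.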


Search results page with hits for "MLLR system"

Data processing

The lecture browser works with preprocessed data. As soon as a video recording is available, a sequence of scripts prepares the data for the lecture browser. First, the audio track is extracted from the video recording. It is then normalized, converted into a suitable format and merged back with the video. Next, the video recording is converted into Flash video format, and image thumbnails and an MP3 file are created. The audio track is processed by our speech processing system. Finally, files with the transcription, subtitles and other information are generated.
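The last step, turning time-stamped recognizer output into subtitles, can be sketched in Python as follows. The SubRip (SRT) output format, the (word, start, end) input tuples and the fixed number of words per cue are assumptions. The formats actually used by the lecture browser are not specified here.

```python
def format_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def words_to_srt(words, max_words=7):
    """Group (word, start, end) tuples into numbered SRT cues.

    Each cue spans from the start of its first word to the end of
    its last word and holds at most `max_words` words.
    """
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        start, end = chunk[0][1], chunk[-1][2]
        text = " ".join(word for word, _, _ in chunk)
        cues.append(
            f"{len(cues) + 1}\n"
            f"{format_timestamp(start)} --> {format_timestamp(end)}\n"
            f"{text}\n"
        )
    # A blank line separates consecutive cues.
    return "\n".join(cues)


# Illustrative word timings, not real recognizer output.
WORDS = [("the", 0.0, 0.2), ("MLLR", 0.2, 0.7), ("system", 0.7, 1.25)]
srt = words_to_srt(WORDS, max_words=2)
```

The same word timings could also drive the word highlighting during playback.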

Lectures, particularly in the academic environment, are mostly based on slide presentations. Knowing when a particular slide was shown in the video is an important cue for navigating the recording and is necessary for providing the user with a high-quality version of the projected slide. If the PPT or PDF file is available, it can be automatically synchronized with the video recording.
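Once the slide change times are known, finding the slide for a given playback position is a simple lookup. The Python sketch below shows only that lookup, not how the synchronization itself is computed. The list of change times and the function names are hypothetical.

```python
from bisect import bisect_right


def slide_at(change_times, t):
    """Return the 1-based number of the slide shown at time t (seconds).

    `change_times` must be sorted in ascending order; entry i is the
    time at which slide i + 1 first appears. Returns None if t is
    before the first slide.
    """
    index = bisect_right(change_times, t)
    return index if index > 0 else None


def seek_time(change_times, slide_number):
    """Return the time at which the given 1-based slide first appears."""
    return change_times[slide_number - 1]


# Hypothetical change times: slide 1 at 0 s, slide 2 at 95 s, slide 3 at 210 s.
CHANGES = [0.0, 95.0, 210.0]
current = slide_at(CHANGES, 100.0)
```

The first function lets the player show the right slide during playback, and the second lets the user jump to the moment a chosen slide appeared.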


Lecture page with transcription and synchronized slides

Conclusion

The ability to search within video can dramatically help users navigate the recordings. Because it uses a web-based interface, the lecture browser runs on many computers without the need to install any special software. The demo version, which contains recordings of the Odyssey workshop, is available at: http://www.superlectures.com/odyssey. This website also includes more information on the BUT lecture browser.

The lecture browser was developed primarily to help students at our faculty prepare for their final examinations. However, there are plenty of other cases where the BUT lecture browser can be of great help. We will be happy to hear about them.

Josef Zizka is a staff member of the BUT Speech@FIT group. He is responsible for superlectures.com system development and user interface. Email: zizkaj@fit.vutbr.cz

Igor Szoke is a researcher at the BUT Speech@FIT group. He is responsible for BUT's keyword spotting and spoken term detection technologies. Email: szoke@fit.vutbr.cz

Honza Cernocky is Head of the Department of Computer Graphics and Multimedia, Faculty of Information Technology, Brno University of Technology, and managing head of the BUT Speech@FIT group. Email: cernocky@fit.vutbr.cz