Scanning the Special Issue of the Proceedings of the IEEE on Speech Information Processing

Douglas O'Shaughnessy, Li Deng, Haizhou Li

SLTC Newsletter, August 2013

In May 2013, the Proceedings of the IEEE published a Special Issue (Vol. 101, No. 5) dedicated to Speech Information Processing: Theory and Applications. The Special Issue includes 10 invited papers, contributed by an international cadre of 26 technical leaders in the field. The electronic version is now available at IEEE Xplore.

It has been 13 years since the last special issue on speech processing appeared in the Proceedings; that issue focused on the lower-level signal processing aspects. In this Special Issue, the authors pay special attention to recent progress on the higher-level information processing aspects.

In the past decade, we have seen tremendous progress in speech information processing technology that has helped people gain access to information (e.g., voice-automated call centers and voice search) and overcome information overload (e.g., spoken document retrieval, speech understanding and speech translation). The research has been spurred by initiatives such as international benchmarking and standardization, and by increasingly fast and affordable computing facilities. The scope of speech information processing has gone beyond basic techniques such as speech recognition, synthesis and dialogue, and now extends to semantic understanding, translation and ranked information retrieval. The industry has made significant headway in speech technology adoption as well, leveraging recent advances to address real-world problems.

This Special Issue includes tutorial-style articles with the breadth and depth that graduate students, researchers, scientists and engineers need to understand the research problems and to implement the specific algorithms. The articles also provide extensive literature reviews and future perspectives on the important aspects of speech information processing. A glimpse of the Special Issue follows.

  1. D. O'Shaughnessy, Acoustic Analysis for Automatic Speech Recognition, Proceedings of the IEEE, Vol. 101, No. 5, May 2013, pp.1038-1053

    This paper presents the theory and practice for methods of speech analysis, as used for automatic speech recognition.

  2. E. Fosler-Lussier, Y. He, P. Jyothi, and R. Prabhavalkar, Conditional Random Fields in Speech, Audio, and Language Processing, Proceedings of the IEEE, Vol. 101, No. 5, May 2013, pp.1054-1075

    This paper provides a tutorial overview of conditional random fields, a discriminative sequence model, and their applications in audio, speech, and language processing.

  3. H. Hermansky, Multistream Recognition of Speech: Dealing With Unknown Unknowns, Proceedings of the IEEE, Vol. 101, No. 5, May 2013, pp.1076-1088

    Analysis of data on human auditory processing suggests a machine recognition paradigm in which parallel processing streams interact to deal with unexpected input signals.

  4. C-H. Lee and S. M. Siniscalchi, An Information-Extraction Approach to Speech Processing: Analysis, Detection, Verification, and Recognition, Proceedings of the IEEE, Vol. 101, No. 5, May 2013, pp.1089-1115

    This paper presents an integrated detection and verification approach to information extraction from speech that can be used for speech analysis and for the recognition of speech, speakers, and languages.

  5. X. He and L. Deng, Speech-Centric Information Processing: An Optimization-Oriented Approach, Proceedings of the IEEE, Vol. 101, No. 5, May 2013, pp.1116-1135

    The authors present a statistical framework for end-to-end system design in which the interactions between automatic speech recognition and downstream text-based processing tasks are fully incorporated and design consistency is established.

  6. H. Li, B. Ma, and K. A. Lee, Spoken Language Recognition: From Fundamentals to Practice, Proceedings of the IEEE, Vol. 101, No. 5, May 2013, pp.1136-1159

    This paper provides an introductory tutorial on the fundamentals of, and state-of-the-art solutions to, automatic spoken language recognition, from both phonological and computational perspectives. It also gives a comprehensive review of current trends and future research directions.

  7. S. Young, M. Gasic, B. Thomson, and J. D. Williams, POMDP-Based Statistical Spoken Dialog Systems: A Review, Proceedings of the IEEE, Vol. 101, No. 5, May 2013, pp.1160-1179

    This paper presents the theory and practice of belief tracking, policy optimization, parameter estimation, and fast learning.

  8. B. Zhou, Statistical Machine Translation for Speech: A Perspective on Structures, Learning, and Decoding, Proceedings of the IEEE, Vol. 101, No. 5, May 2013, pp.1180-1202

    The author takes a unified perspective on the key structure, learning, and decoding problems of statistical machine translation (SMT) models, noting connections and contrasts to automatic speech recognition (ASR), to aid the understanding of SMT and catalyze tighter integration of ASR and SMT.

  9. S. Narayanan and P. G. Georgiou, Behavioral Signal Processing: Deriving Human Behavioral Informatics From Speech and Language, Proceedings of the IEEE, Vol. 101, No. 5, May 2013, pp.1203-1233

    Computational techniques are presented to analyze and model expressed and perceived human behavior (variedly characterized as typical, atypical, distressed, and disordered) from speech and language cues, along with their applications in health, commerce, education, and beyond.

  10. K. Tokuda, Y. Nankaku, T. Toda, H. Zen, J. Yamagishi, and K. Oura, Speech Synthesis Based on Hidden Markov Models, Proceedings of the IEEE, Vol. 101, No. 5, May 2013, pp.1234-1252

    This paper gives a general overview of hidden Markov model (HMM)-based speech synthesis, an approach that offers great flexibility in changing speaker identities, emotions, and speaking styles.

    Douglas O'Shaughnessy is the Chair of the Speech and Language Processing Technical Committee.

    Li Deng is a Principal Researcher at Microsoft Research and Editor-in-Chief of the IEEE Transactions on Audio, Speech, and Language Processing.

    Haizhou Li is the Head of the Department of Human Language Technology at the Institute for Infocomm Research, Singapore. He is a Board Member of the International Speech Communication Association.