Book announcements
To post a book announcement, please email speechnewseds [at] listserv (dot) ieee [dot] org.
Automatic Speech and Speaker Recognition: Large Margin and Kernel Methods
Joseph Keshet and Samy Bengio, Eds.
John Wiley & Sons, March 2009, 268 pp., Hardcover. ISBN: 978-0-470-69683-5
This is the first book dedicated to uniting research related to speech and speaker recognition based on the recent advances in large margin and kernel methods. The first part of the book presents theoretical and practical foundations of large margin and kernel methods, from support vector machines to large margin methods for structured learning. The second part of the book is dedicated to acoustic modeling of continuous speech recognizers, where the grounds for practical large margin sequence learning are set. The third part introduces large margin methods for discriminative language modeling. The last part of the book is dedicated to applications: keyword spotting, speaker verification, and spectral clustering.
The book is an important reference for researchers and practitioners in the field of modern speech and speaker recognition. The purpose of the book is twofold: first, to set the theoretical foundation of large margin and kernel methods relevant to the speech recognition domain; second, to offer a practical guide to implementing these methods in that domain. The reader is presumed to have basic knowledge of large margin and kernel methods and of basic algorithms in speech and speaker recognition.
Contributors: Yasemin Altun, Francis Bach, Samy Bengio, Dan Chazan, Koby Crammer, Mark Gales, Yves Grandvalet, David Grangier, Michael I. Jordan, Joseph Keshet, Johnny Mariethoz, Brian Roark, Lawrence Saul, Fei Sha, Shai Shalev-Shwartz, Yoram Singer, Nathan Srebro.
http://www.wiley.com/WileyCDA/WileyTitle/productCd-0470696834.html
Theory and Applications of Digital Speech Processing
Lawrence R. Rabiner and Ronald W. Schafer
Pearson Publishers, 2010 (1042 pages)
Speech signal processing has been a dynamic and constantly developing field for more than 70 years. A steady stream of books has been written about speech processing, beginning with the classic texts of Fant (Acoustic Theory of Speech Production) and Flanagan (Speech Analysis, Synthesis and Perception) in the early 1970s and including the early textbook by Rabiner and Schafer (Digital Processing of Speech Signals).
The Rabiner and Schafer textbook was published in 1978 and, while it covered basic topics such as speech modeling, linear prediction, and cepstrum, entire areas of current interest were completely missing. This new book is intended to unify current theory and practice of digital speech processing and present the material in a new framework which we termed the "speech stack", namely a series of layers that build up knowledge and understanding of speech processing systems. The speech stack consists of the following four layers:
- fundamentals,
- digital representations of speech,
- speech algorithms, and
- speech applications.
The fundamentals layer encompasses the basics of speech processing, including acoustics, linguistics, pragmatics, and speech perception. The second layer includes the theory behind time-domain, frequency-domain, cepstral-domain and linear prediction-domain speech representations. The third layer discusses algorithms for speech processing, including methods for separating speech from background signals, methods for labeling regions of speech as voiced or unvoiced, methods for estimating the pitch period (or pitch frequency) of a voiced speech region, and methods for estimating the resonances (formants) of the vocal tract during speech regions. The fourth and final layer discusses broad speech application areas, including coding, synthesis, recognition and understanding of speech.
The table of contents is:
- Introduction to Digital Speech Processing
- Review of Fundamentals of Digital Signal Processing
- Fundamentals of Human Speech Production
- Hearing, Auditory Models, and Speech Perception
- Sound Propagation in the Human Vocal Tract
- Time-Domain Methods for Speech Processing
- Frequency-Domain Representations
- The Cepstrum and Homomorphic Speech Processing
- Linear Predictive Analysis of Speech Signals
- Algorithms for Estimating Speech Parameters
- Digital Coding of Speech Signals
- Frequency-Domain Coding of Speech and Audio
- Text-to-Speech Synthesis Methods
- Automatic Speech Recognition and Natural Language Understanding
The material in this book can be taught in a one-semester course in speech processing, assuming that students have taken a basic course on digital signal processing (DSP). To aid in the teaching process, each chapter contains a set of representative homework problems intended to reinforce the ideas discussed in that chapter. (A solutions guide is available from Pearson Publishers for instructors who use the book as a textbook for their courses.) Much of speech processing is, by its very nature, empirical; hence we have chosen to include a series of MATLAB exercises in each chapter (either within the text or as part of the set of homework problems) so as to reinforce the student’s understanding of the basic concepts of speech processing. Also provided on the course website (http://www.pearsonhighered.com/rabiner) are the speech files, databases, and MATLAB code needed to solve the MATLAB exercises, along with a series of demonstrations of a range of speech processing concepts.

