Book announcements

To post a book announcement, please email speechnewseds [at] listserv (dot) ieee [dot] org.

Statistical Language Models for Information Retrieval

ChengXiang Zhai
Synthesis Lectures on Human Language Technologies #1 (Morgan & Claypool Publishers), 2009, 141 pages

As online information grows dramatically, search engines such as Google are playing a more and more important role in our lives. Critical to all search engines is the problem of designing an effective retrieval model that can rank documents accurately for a given query. This has been a central research problem in information retrieval for several decades. In the past ten years, a new generation of retrieval models, often referred to as statistical language models, has been successfully applied to solve many different information retrieval problems. Compared with the traditional models such as the vector space model, these new models have a more sound statistical foundation and can leverage statistical estimation to optimize retrieval parameters. They can also be more easily adapted to model non-traditional and complex retrieval problems. Empirically, they tend to achieve comparable or better performance than a traditional model with less effort on parameter tuning. This book systematically reviews the large body of literature on applying statistical language models to information retrieval with an emphasis on the underlying principles, empirically effective language models, and language models developed for non-traditional retrieval tasks. All the relevant literature has been synthesized to make it easy for a reader to digest the research progress achieved so far and see the frontier of research in this area. The book also offers practitioners an informative introduction to a set of practically useful language models that can effectively solve a variety of retrieval problems. No prior knowledge about information retrieval is required, but some basic knowledge about probability and statistics would be useful for fully digesting all the details.

Table of Contents: Introduction / Overview of Information Retrieval Models / Simple Query Likelihood Retrieval Model / Complex Query Likelihood Model / Probabilistic Distance Retrieval Model / Language Models for Special Retrieval Tasks / Language Models for Latent Topic Analysis / Conclusions

http://dx.doi.org/10.2200/S00158ED1V01Y200811HLT001

This title is available online without charge to members of institutions that have licensed the Synthesis Digital Library of Engineering and Computer Science. Members of licensing institutions have unlimited access to download, save, and print the PDF without restriction; use of the book as a course text is encouraged. To find out whether your institution is a subscriber, visit http://www.morganclaypool.com/page/licensed, or just click on the book's URL above from an institutional IP address and attempt to download the PDF. Others may purchase the book from this URL as a PDF download for US$30 or in print for US$40. Printed copies are also available from Amazon and from booksellers worldwide at approximately US$40 or local currency equivalent.

Dependency Parsing

Sandra Kübler, Ryan McDonald, and Joakim Nivre
Synthesis Lectures on Human Language Technologies #2 (Morgan & Claypool Publishers), 2009, 127 pages

Dependency-based methods for syntactic parsing have become increasingly popular in natural language processing in recent years. This book gives a thorough introduction to the methods that are most widely used today. After an introduction to dependency grammar and dependency parsing, followed by a formal characterization of the dependency parsing problem, the book surveys the three major classes of parsing models that are in current use: transition-based, graph-based, and grammar-based models. It continues with a chapter on evaluation and one on the comparison of different methods, and it closes with a few words on current trends and future prospects of dependency parsing. The book presupposes a knowledge of basic concepts in linguistics and computer science, as well as some knowledge of parsing methods for constituency-based representations.

Table of Contents: Introduction / Dependency Parsing / Transition-Based Parsing / Graph-Based Parsing / Grammar-Based Parsing / Evaluation / Comparison / Final Thoughts

http://dx.doi.org/10.2200/S00169ED1V01Y200901HLT002

This title is available online without charge to members of institutions that have licensed the Synthesis Digital Library of Engineering and Computer Science. Members of licensing institutions have unlimited access to download, save, and print the PDF without restriction; use of the book as a course text is encouraged. To find out whether your institution is a subscriber, visit http://www.morganclaypool.com/page/licensed, or just click on the book's URL above from an institutional IP address and attempt to download the PDF. Others may purchase the book from this URL as a PDF download for US$30 or in print for US$40. Printed copies are also available from Amazon and from booksellers worldwide at approximately US$40 or local currency equivalent.

Introduction to Linguistic Annotation and Text Analytics

Graham Wilcock
Synthesis Lectures on Human Language Technologies #3 (Morgan & Claypool Publishers), 2009, 159 pages

Linguistic annotation and text analytics are active areas of research and development, with academic conferences and industry events such as the Linguistic Annotation Workshops and the annual Text Analytics Summits. This book provides a basic introduction to both fields, and aims to show that good linguistic annotations are the essential foundation for good text analytics. After briefly reviewing the basics of XML, with practical exercises illustrating in-line and stand-off annotations, a chapter is devoted to explaining the different levels of linguistic annotations. The reader is encouraged to create example annotations using the WordFreak linguistic annotation tool. The next chapter shows how annotations can be created automatically using statistical NLP tools, and compares two sets of tools, the OpenNLP and Stanford NLP tools. The second half of the book describes different annotation formats and gives practical examples of how to interchange annotations between different formats using XSLT transformations. The two main text analytics architectures, GATE and UIMA, are then described and compared, with practical exercises showing how to configure and customize them. The final chapter is an introduction to text analytics, describing the main applications and functions including named entity recognition, coreference resolution and information extraction, with practical examples using both open source and commercial tools. Copies of the example files, scripts, and stylesheets used in the book are available from the companion website, located at http://sites.morganclaypool.com/wilcock.

Table of Contents: Working with XML / Linguistic Annotation / Using Statistical NLP Tools / Annotation Interchange / Annotation Architectures / Text Analytics

http://dx.doi.org/10.2200/S00194ED1V01Y200905HLT003

This title is available online without charge to members of institutions that have licensed the Synthesis Digital Library of Engineering and Computer Science. Members of licensing institutions have unlimited access to download, save, and print the PDF without restriction; use of the book as a course text is encouraged. To find out whether your institution is a subscriber, visit http://www.morganclaypool.com/page/licensed, or just click on the book's URL above from an institutional IP address and attempt to download the PDF. Others may purchase the book from this URL as a PDF download for US$30 or in print for US$40. Printed copies are also available from Amazon and from booksellers worldwide at approximately US$40 or local currency equivalent.

Introduction to Chinese Natural Language Processing

Kam-Fai Wong, Wenjie Li, Ruifeng Xu, and Zheng-sheng Zhang
Synthesis Lectures on Human Language Technologies #4 (Morgan & Claypool Publishers), 2009, 148 pages

This book introduces Chinese language-processing issues and techniques to readers who already have a basic background in natural language processing (NLP). Since the major difference between Chinese and Western languages is at the word level, the book primarily focuses on Chinese morphological analysis and introduces the concept, structure, and interword semantics of Chinese words.

The following topics are covered: a general introduction to Chinese NLP; Chinese characters, morphemes, and words and the characteristics of Chinese words that have to be considered in NLP applications; Chinese word segmentation; unknown word detection; word meaning and Chinese linguistic resources; interword semantics based on word collocation and NLP techniques for collocation extraction.

Table of Contents: Introduction / Words in Chinese / Challenges in Chinese Morphological Processing / Chinese Word Segmentation / Unknown Word Identification / Word Meaning / Chinese Collocations / Automatic Chinese Collocation Extraction / Appendix / References / Author Biographies

http://dx.doi.org/10.2200/S00211ED1V01Y200909HLT004

This title is available online without charge to members of institutions that have licensed the Synthesis Digital Library of Engineering and Computer Science. Members of licensing institutions have unlimited access to download, save, and print the PDF without restriction; use of the book as a course text is encouraged. To find out whether your institution is a subscriber, visit http://www.morganclaypool.com/page/licensed, or just click on the book's URL above from an institutional IP address and attempt to download the PDF. Others may purchase the book from this URL as a PDF download for US$30 or in print for US$40. Printed copies are also available from Amazon and from booksellers worldwide at approximately US$40 or local currency equivalent.

Spoken Dialogue Systems

Kristiina Jokinen and Michael McTear
Synthesis Lectures on Human Language Technologies #5 (Morgan & Claypool Publishers), 2009, 151 pages

Considerable progress has been made in recent years in the development of dialogue systems that support robust and efficient human-machine interaction using spoken language. Spoken dialogue technology allows various interactive applications to be built and used for practical purposes, and research focuses on issues that aim to increase the system's communicative competence by including aspects of error correction, cooperation, multimodality, and adaptation in context.

This book gives a comprehensive view of state-of-the-art techniques that are used to build spoken dialogue systems. It provides an overview of the basic issues such as system architectures, various dialogue management methods, system evaluation, and also surveys advanced topics concerning extensions of the basic model to more conversational setups.

The goal of the book is to provide an introduction to the methods, problems, and solutions that are used in dialogue system development and evaluation. It presents dialogue modelling and system development issues relevant in both academic and industrial environments and also discusses requirements and challenges for advanced interaction management and future research.

Table of Contents: Preface / Introduction to Spoken Dialogue Systems / Dialogue Management / Error Handling / Case Studies: Advanced Approaches to Dialogue Management / Advanced Issues / Methodologies and Practices of Evaluation / Future Directions / References / Author Biographies

http://dx.doi.org/10.2200/S00204ED1V01Y200910HLT005

This title is available online without charge to members of institutions that have licensed the Synthesis Digital Library of Engineering and Computer Science. Members of licensing institutions have unlimited access to download, save, and print the PDF without restriction; use of the book as a course text is encouraged. To find out whether your institution is a subscriber, visit http://www.morganclaypool.com/page/licensed, or just click on the book's URL above from an institutional IP address and attempt to download the PDF. Others may purchase the book from this URL as a PDF download for US$30 or in print for US$40. Printed copies are also available from Amazon and from booksellers worldwide at approximately US$40 or local currency equivalent.

Semantic Domains in Computational Linguistics

Alfio Gliozzo, Carlo Strapparava
Springer, 2009, IX, 131 p., Hardcover. ISBN: 978-3-540-68156-4

Semantic fields are lexically coherent - the words they contain co-occur in texts. In this book the authors introduce and define semantic domains, a computational model for lexical semantics inspired by the theory of semantic fields. Semantic domains allow us to exploit domain features for texts, terms and concepts, and they can significantly boost the performance of natural-language processing systems. Semantic domains can be derived from existing lexical resources or can be acquired from corpora in an unsupervised manner. They also have the property of interlinguality, and they can be used to relate terms in different languages in multilingual application scenarios. The authors give a comprehensive explanation of the computational model, with detailed chapters on semantic domains, domain models, and applications of the technique in text categorization, word sense disambiguation, and cross-language text categorization. This book is suitable for researchers and graduate students in computational linguistics.

http://www.springer.com/linguistics/computational+linguistics/book/978-3-540-68156-4

A resource-light approach to morpho-syntactic tagging

Anna Feldman and Jirka Hana
Amsterdam/New York, NY 2010. XIV, 185 pp. (Language and Computers 70). ISBN: 978-90-420-2768-8 (Bound). ISBN: 978-90-420-2769-5 (E-Book).

While supervised corpus-based methods are highly accurate for different NLP tasks, including morphological tagging, they are difficult to port to other languages because they require resources that are expensive to create. As a result, many languages have no realistic prospect for morpho-syntactic annotation in the foreseeable future. The method presented in this book aims to overcome this problem by significantly limiting the necessary data and instead extrapolating the relevant information from another, related language. The approach has been tested on Catalan, Portuguese, and Russian. Although these languages are only relatively resource-poor, the same method can in principle be applied to any inflected language, as long as an annotated corpus of a related language is available. The time needed to adjust the system to a new language is a fraction of the time needed for systems with extensive, manually created resources: days instead of years.

This book touches upon a number of topics: typology, morphology, corpus linguistics, contrastive linguistics, linguistic annotation, computational linguistics and Natural Language Processing (NLP). Researchers and students who are interested in these scientific areas as well as in cross-lingual studies and applications will greatly benefit from this work. Scholars and practitioners in computer science and linguistics are the prospective readers of this book.

Contents: List of tables / List of figures / Preface / Introduction / Common tagging techniques / Previous resource-light approaches to NLP / Languages, corpora and tagsets / Quantifying language properties / Resource-light morphological analysis / Cross-language morphological tagging / Summary and further work / Bibliography / Appendices: Tagsets we use; Corpora; Language properties / Citation Index

Online info: http://www.rodopi.nl/senj.asp?BookId=LC+70

