Some thoughts on Language, Dialect and Accents in Speech and Language Technology

Martin Russell, Abualsoud Hanani and Michael Carey

SLTC Newsletter, February 2011

Accent and Dialect in Speech and Language Technology

In recent years the topics of 'accent' and 'dialect' have become common in speech and language technology research. A search of the Interspeech 2010 proceedings for the word 'dialect' returns 74 references, of which 64 refer to some extent to work concerned with variations caused by dialect or accent (approximately 8% of the total number of papers presented at Interspeech). Of these, 40% are concerned with speech science, referring to 'dialect' in the contexts of 18 different languages, and 60% with technology. In speech technology the most common references are to dialect as a source of variability in speech recognition; however, five papers address the problem of dialect recognition directly.

There is some ambiguity in the speech technology literature between the terms 'dialect' and 'accent', and some authors also use the term 'variety' (for example, Koller, Abad, Trancoso and Viana discuss varieties of Portuguese in [1]). In British English 'accent' normally refers to systematic variations in pronunciation associated with particular geographic regions, while 'dialect' also includes the use of characteristic words in those regions. So, for example, when a speaker from Yorkshire in the North of England pronounces "bath" to rhyme with "cat" rather than with "cart", they are exhibiting a Yorkshire (or at least northern English) accent, but when they use the word "lug" to mean "ear" or "flag" to mean "paving stone", these are examples of Yorkshire dialect [2]. These issues are discussed in more depth in the three volumes by Wells, probably the best-known works on the accents of English [3].

In these terms, most speech technology is concerned with accent. A search for the word 'accent' returns 1771 instances in 117 documents in the Interspeech 2010 proceedings, although, of course, 'accent' has more than one meaning in speech and language science, so not every instance refers to regional accent.

Early work on accent recognition follows the well-known characterisation of a language as a "dialect with an army and a navy" [4], with researchers typically using GMM-SVM (Gaussian Mixture Model / Support Vector Machine) and PRLM (Phone Recognition – Language Modelling) methods from Language Identification. However, some recent research has exploited specific properties of accents. In [5] Biadsy, Hirschberg and Collins use the fact that, at least to a first approximation, accents share the same phone set, but the realisation of these phones may differ. They build phone-dependent GMMs, which in turn are used to create 'supervectors' for classification using an SVM, and report improved performance compared with a conventional GMM-SVM-based language recognition system. Huckvale [6] takes this a step further with his ACCDIST measure by exploiting the fact that British English accents can be characterised by the similarities and differences between the realisations of vowels in specific words (for example, returning to the "bath" example above, for a northern English accent the 'distance' will be small between the vowels in "bath" and "cat" but large between the vowels in "bath" and "cart", whereas for a southern English accent the opposite will be the case). Huckvale reports an accent recognition accuracy of 92.3% on the 14-accent Accents of the British Isles speech corpus [7].
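The "bath"/"cat"/"cart" idea can be made concrete with a small Python sketch. It is only an illustration of the principle, not Huckvale's implementation: the (F1, F2) formant values are invented, each vowel is reduced to a single two-dimensional point, and comparing distance tables by Pearson correlation is an assumption of this sketch. All function and variable names are hypothetical.

```python
import math
from itertools import combinations


def distance_table(vowels):
    """Distances between the vowels of every pair of words, in a fixed word order."""
    words = sorted(vowels)
    return [
        math.sqrt(sum((p - q) ** 2 for p, q in zip(vowels[a], vowels[b])))
        for a, b in combinations(words, 2)
    ]


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def classify(speaker, references):
    """Return the accent whose reference table correlates best with the speaker's."""
    table = distance_table(speaker)
    return max(
        references,
        key=lambda accent: pearson(table, distance_table(references[accent])),
    )


# Invented (F1, F2) values in Hz, for illustration only.
REFERENCES = {
    "northern": {"bath": (750, 1450), "cat": (760, 1480), "cart": (680, 1100)},
    "southern": {"bath": (690, 1120), "cat": (770, 1500), "cart": (680, 1100)},
}

# A hypothetical speaker with higher formants overall but the northern pattern.
speaker = {"bath": (900, 1700), "cat": (910, 1720), "cart": (800, 1300)}

print(classify(speaker, REFERENCES))  # northern
```

Because only distances between a speaker's own vowels are compared, the toy speaker is matched to the northern reference even though their absolute formant values differ from both references.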

In more recent work in our laboratory we have applied techniques from language identification to the problem of discriminating between speakers from different social groups within the same accent, specifically between second-generation Asian and white speakers born in Birmingham. We achieved a recognition accuracy of 96.51% using 40 s of test data with a Language Identification system that fuses the outputs of several acoustic and phonotactic systems. This is much better than we expected and compares well with the 90.24% achieved by human listeners [8].
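As a rough illustration of what fusing the outputs of several subsystems can mean, the Python sketch below combines per-class scores from two subsystems by a weighted sum and picks the highest-scoring class. This is a minimal sketch under stated assumptions, not the system described in [8]: the scores, the equal weights, the class labels and the function names are all invented, and in practice fusion weights would normally be estimated on held-out data.

```python
def fuse_scores(scores_by_system, weights):
    """Weighted sum of per-class scores across subsystems."""
    classes = next(iter(scores_by_system.values())).keys()
    return {
        c: sum(weights[name] * scores[c] for name, scores in scores_by_system.items())
        for c in classes
    }


def decide(scores):
    """Return the class with the highest score."""
    return max(scores, key=scores.get)


# Invented scores for one test utterance from two hypothetical subsystems.
SCORES = {
    "acoustic": {"group_a": 0.2, "group_b": -0.1},
    "phonotactic": {"group_a": -0.3, "group_b": 0.6},
}
# Equal weights are an assumption of this sketch.
WEIGHTS = {"acoustic": 0.5, "phonotactic": 0.5}

fused = fuse_scores(SCORES, WEIGHTS)
print(decide(SCORES["acoustic"]), decide(fused))
```

In this toy case the acoustic subsystem alone prefers one group, while the fused score prefers the other, which is the kind of complementary evidence that fusion is meant to exploit.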

The fact that it is possible to decide automatically which accent group, or even which social group within a particular accent group, an individual belongs to, and to achieve this using as little as 40 s of data, has interesting implications for automatic speech recognition. First, it confirms that there are significant acoustic and phonotactic differences even within a "homogeneous" accent group. Second, it shows that these differences are sufficiently large to be detected automatically. Hence it may be possible to identify suitable acoustic, lexical and even grammatical models automatically for rapid adaptation. It will also be interesting to see if the ideas that are developing in the context of dialect and accent recognition can be 'pulled back' to achieve improved results in Language Identification.

References

  1. Oscar Koller, Alberto Abad, Isabel Trancoso and Ceu Viana, "Exploiting variety-dependent Phones in Portuguese Variety Identification applied to Broadcast News Transcription", Proc. Interspeech 2010, pp 749-752.
  2. The Yorkshire dialect website
  3. J. C. Wells, "Accents of English", volumes 1, 2 and 3, Cambridge University Press, 1982.
  4. "A language is just a dialect with an army and navy" - Wikipedia reference
  5. Fadi Biadsy, Julia Hirschberg, Michael Collins, "Dialect Recognition Using a Phone-GMM-Supervector-Based SVM Kernel", Proc. Interspeech 2010, pp 753-756.
  6. Mark Huckvale, "ACCDIST: An Accent Similarity Metric for Accent Recognition and Diagnosis", in Speaker Classification II, Lecture Notes in Computer Science, Volume 4441, 2007.
  7. ABI website
  8. Abualsoud Hanani, Martin Russell and Michael Carey, "Speech-Based Identification of Social Groups in a Single Accent of British English by Humans and Computers", to appear in Proc. IEEE ICASSP 2011

Martin Russell is a Professor, Michael Carey an Honorary Professor and Abualsoud Hanani a Research Student in the School of Electronic, Electrical and Computer Engineering at the University of Birmingham, UK. Email: m.j.russell@bham.ac.uk, m.carey@bham.ac.uk, aah648@bham.ac.uk