Hy Murveit, William Labov, Bruce Millar, and Wolfgang Hess talk to the Saras Institute

Olga Pustovalova, Antonio Roque, and Tadesse Anberbir

SLTC Newsletter, September 2009

We continue the series of excerpts of interviews from the History of Speech and Language Technology Project. In these segments Hy Murveit, William Labov, Bruce Millar, and Wolfgang Hess discuss how they became involved with the field of speech and language technology.

Hy Murveit

Transcribed by Olga Pustovalova, Moldova State University, Moldova

Q: We are going to start off by asking a question that we have asked everybody, which is: how did you get into speech?

A: Well, I'm pretty sure I know the answer to that one: I was in Berkeley, and I got my Master's degree and my PhD in Berkeley. I got sent to Berkeley by Bell Labs, they generously paid for my Master's degree, and I was looking for a Master's project. And in Berkeley you have to do something to write your Master's thesis, and you have to figure out what that was. So basically you go from office to office, and talk to professors, and say, you know, 'what have you got?' I went to Professor Messerschmitt's office, Dave Messerschmitt, who taught a lot of signal processing at Berkeley, probably still does, and I remember he just suggested speech recognition, and he had, I forget which book, it was some kind of cepstral analysis chapter. Dave is a great guy, and I am sure I would have done fine with Dave, but for whatever reason I kept on walking and also talked to Bob Brodersen, who became my thesis advisor. So Dave sort of lit this idea of speech recognition in me, but with Brodersen I connected... I think it was that I walked into his office, and there was a pile of two feet of paper on his desk, and the place was a mess, and I said, 'That's me! I would do that too.' (laughs)

So Bob was an IC guy, not really a speech recognition guy at all, but you know, you've got to build chips to do something. If the goal is to build integrated circuits, you've got to find an application for 'em, and he was looking for one in speech recognition, it was probably also in his mind. So that's what I wound up doing for my PhD, but first for my Master's, and so I connected with Bob, but I know it was Dave Messerschmitt who first planted the seed, I remember that vividly... Nelson Morgan from ICSI -- he was also, by the way, a graduate student at the same time -- and he says the same thing about how he saw the mess in Bob's office and said, 'This must be the advisor for me,' too. I don't know if Bob is that messy anymore, and I know Morgan is not that messy anymore, but I still am.

Q: That's great. And why don't you just tell us a little bit about what you did from there... and then how you proceeded after you got your doctorate.

A: Bob's focus in those days was Special Purpose Integrated Circuits... the focus was if you make special purpose integrated circuits they could outperform computers of those days, and perhaps be done cheaper, and the idea would be to push this kind of intelligence to the periphery and have all sorts of smart things and devices that could go around. So the goal was to make a speech recognition chip, and I became sort of the algorithm guy in the group, because somebody actually had to figure out how to do speech recognition, and other people were going to build the chips. I was sort of the software guy too, you know it was in the early days. This was 1978, I think, or probably the summer of '78 when I started, because I got to Berkeley in September '77, and I got my thesis eventually in '83. So believe it or not in like '78 most of the grad students in electrical engineering didn't program, you know, few of them started programming. But I was actually programming, and stuff like that, so I was sort of the software guy, and also I did experiments on dynamic time warp speech recognition, and learned a lot about that. There wasn't really an expertise in that when I joined Bob's group, but I sort of learned it and he's a smart guy, he may not have worked in the field, but he helped me a lot and got me connected with the right people. He introduced me to George White, who at the time was doing a lot of speech recognition work and George sent me code, and coached me in speech recognition. Anyway, so, eventually, I figured out with their help the dynamic time warping technology, and we built a chip that did a thousand-word, continuous-speech DTW, level-building kind of recognizer.

William Labov

Transcribed by Antonio Roque, University of Southern California

Q: One of the questions we've asked everybody is how they got into the field. And I notice that on your website you actually have a very interesting essay on that very point.

A: Yeah, I wrote it for undergraduates, but it's been pretty widely read. Basically I came to Harvard as an undergraduate. My advisor first said to me 'Where did you get this idolatry of science?' Because I was taking this one course, Chemistry B, and I walked out and I said, 'That man is intelligent, how did he know that I have an idolatry of science? But I do. And I think I'd like to be a part of it.' But I was majoring in English and Philosophy, and I went to work for a company... and I spent 11 years as a practical chemist formulating silkscreenings, and I was pretty good at it, but it gave me two kinds of experience. One, I learned that there's such a thing as being right or wrong. And if you spray a panel with a coating that you hope will last outdoors and you come back six months later and it's all cracked up you know you're wrong. You may not know why. But it gave me a sense of knowing that there's a reality out there that can prove you right or wrong. And I think that's different from a number of my colleagues who spent their lives in the university and they've never been tested that way. And when I decided to leave that field, it was because it didn't have the generality, I couldn't publish any of my results though I thought I was pretty good at what I was doing.

So I went back to this field of linguistics, and I remember writing an essay, a notebook full of notes, essays for an experimental science, and I found that the whole field, because these are very bright people arguing with each other vigorously about interesting questions, but the data was entirely their own reactions or what they got in a book, or formal elicitations from people, and all around me I saw people walking and talking and it occurred to me that we could build a science based upon the actual data from everyday life. And that's what I tried to do and I think it's still a minor part of linguistics, but it's had some continuity. So the whole idea is that you want to be tested by the world and find out with hard data whether you're right or wrong. So that's one part of it. And that also involved quantification, because there's so much variation in everyday life, so we had to introduce mathematical tools, and another way to look at it is this part of linguistics, at least dealing with change and variation, has become a quantitative study with a number of fairly sophisticated tools, though I don't think it's possible for all of linguistics to become quantitative, because it's not likely. But we have now the 34th annual meeting on New Ways of Analyzing Variation, and it's become a major part of the field.

Now it just so happened that the second big research project we got funded was from the Office of Education, and we were trying to answer the question, 'Is there any connection between the reading difficulties, reading failure, in the inner cities, and the difference in language between Black and White populations?' So I got involved in that enterprise and got some very interesting results about the difference between Black and White and showed that African-American English, as we now call it, is a very different system, and very systematic, but we didn't actually succeed in improving reading levels, instead we developed a whole line of scientific work, of which the work on African-American English is a very large part. But we keep returning to the question, 'Can we use our knowledge to change and improve the world?' And since I'm interested in language change and variation, I've done studies in a number of areas to show how language is changing, it became natural to ask, 'What are the practical applications of this kind of work?' And immediately you can see that speech recognition is an area that is important. The little bit that I knew about it, I knew that the big problem for speaker-independent recognition was dialect diversity. And gradually we've gotten a stronger and stronger hold on these topics and now we're just publishing this Atlas of North American English, which covers all of North America for the first time, and what we found and continue from the earlier findings is linguistic change is very active and increasing diversity is the rule rather than increasing convergence. So that the dialects of North America are more different from each other than they ever were.

Bruce Millar

Transcribed by Antonio Roque, University of Southern California

Q: [How did you get into the field of speech research?]

A: I didn't realize it at the time, but I actually went for an interview with the scientific civil service, in Britain, and ended up being interviewed by Walter Lawrence, of PAT fame and speech synthesis. That didn't have a great impact at the time, but I was later that year given an opportunity to do a PhD in a new multidisciplinary program at the University of Keele in the UK, under the supervision of Professor Donald MacKay. He was very much into human sensory processing of various kinds, with a heavy emphasis of his work in vision. It just happened that he had a postdoc who had previously done a PhD under him in the vision area, and was returning from a postdoc in the US, and this was Bill Ainsworth, who recently has left us unfortunately. And so we put together a small speech group and I started working in 1964, looking at very primitive forms of speech analysis. We had no computers then, we had simply the best we could do with analog electronics. Not being even an expert in electronics, but learning as I went along, really I got involved in doing things which were little more than the processing of microphone signals, looking at the time interval structure of speech, and generating patterns on cathode ray oscilloscopes, using photography, and then visual comparison between sounds and looking at the variability that occurred and the consistency that also occurred with different vowel sounds. And it was mostly vowel-based analysis at that stage. So that was how I got into it, and I completed a PhD in '68 in that area.

Q: Great. So you mentioned analog hardware. What kind of hardware?

A: We were just on the change from valve-based electronics into transistor work, in fact there was very little transistor expertise around in the laboratory where I was working, so it was basically looking at measurements of the time interval structure of clipped speech.

Q: Of clipped speech? Why were you interested in clipped speech?

A: Well, I mean the earlier work by Licklider and co. who had shown the intelligibility of clipped speech so we knew there was information there. Of course we may rather naively at that point have thought, 'Ah, it must be in the time intervals,' ignoring the fact that there's a fair bit of formant information available, and the formant harmonics by the clipping. But it was really driven by the limitations on how we could process things. I don't recall there being anything, not at that stage, even like filter banks around for us to use. And I guess funding wasn't all that plentiful. So raw research students were thrown into the lab with a bundle of components and boards and batteries and good wishes to see what you could do. So we did this. And we published a paper in 1965. My first exposure to the international community was in 1965, which looking back was really quite early. We went to the International Congress on Acoustics in Liège, and I was rather, again looking back, amazed to realize that a fellow student and I actually managed to sit next to Homer Dudley on a bus on an outing, and so we had an opportunity to connect with one that we now recognize as a leader in the field, way before we actually got into it. Also Jim Flanagan was there as a relatively, not junior, but certainly rising star, so I remember those people.

Wolfgang Hess

Transcribed by Tadesse Anberbir, Ajou University, Korea

Q: It is a great honor to be here with Professor Wolfgang Hess, and we appreciate your coming and agreeing to be interviewed. We are asking everybody the same question as a starting question, which is: How did you get into this field? What brought you into the field of speech and language?

A: My diploma thesis. I studied Electrical Engineering at the University of Stuttgart in the 60s, and I was a student of the late Eberhard Zwicker, who was involved in hearing research and he had developed a functional model of the ear which was, say, crudely speaking something like a channel vocoder or a filter bank with filters adapted to the characteristics of the critical bands. And he wanted to prove that this device was able to give a good preprocessor for speech recognition and especially for the recognition of 10 digits. And they looked for students to implement an interface between their analog filter and a computer to process these things on that.

Q: Which computer?

A: That was an absolutely old one, it was called ER56, a development by Alcatel, at that time SEL, Standard Elektrik Lorenz, in Germany, which still worked with valve tubes and had something like a memory of 10k and it worked with punched cards or punched tapes. And, so... it could be only programmed in machine code. Well, it was my task to make this interface and then I also had to be contacting the engineer who was in charge of that computer. Then he told me, 'Why are you going and trying to make all this like some kind of time normalization which would be necessary to detect the beginning and the ending of these numbers, why are you trying to do that in hardware? Come here and program it.' And so, as a diploma thesis, I made the device just fill in the frames into the computer, and did the rest by software. And that's what I did as a, say, six month research project, and I've stayed in that field ever since.

Acknowledgements and more information

These interviews were conducted by Dr. Janet Baker in 2005 and were transcribed by members of ISCA-SAC as described previously: 1, 2, 3, 4, 5, 6. Sunayana Sitaram of ISCA-SAC coordinated transcription efforts.

