Phonetic Arts

Matt Stuttle

SLTC Newsletter, July 2010

Phonetic Arts supplies speech synthesis technologies to the games industry. Although huge technological advances have steadily improved the quality and complexity of the graphics generated by computer games, games audio is only now moving beyond replaying a large library of static prerecorded lines. Phonetic Arts’ core product is a suite of software that enables games developers to build parametric voices from existing speech recordings and to recombine lines of dialog on the fly, massively increasing the variability of speech in games. Our product suite, PA Studio, has been developed in partnership with many leading game studios, ensuring that we can provide the best possible speech solutions to the games industry.

Phonetic Arts was formed in late 2006 by Paul Taylor, Ian Hodgson and Anthony Tomlinson, and has grown to 14 staff, all based at Phonetic Arts HQ in Cambridge. The original idea of bringing dynamic speech to games came out of a meeting a number of years ago with the developers of one of the leading sports titles.

Currently, there are two main released products. PA Generator is a toolset for making parametric synthetic voices from an existing set of recordings. These voices have two uses: generating placeholder dialog, and driving the low-footprint in-game engine. The other product is PA Composer, which gives games designers an off-line method for recombining existing recordings at the word level to generate high-quality speech files.

Phonetic Arts isn’t set up as a service provider: the tools to build these voices are designed for use directly by the customers. This allows customers to build and prototype voices much faster than they could through an intermediary company. Indeed, with A-list actors being used increasingly frequently in games, it would often not be possible to send recorded dialog to a different company. Additionally, the current games market is increasingly based around characters and franchises, with more than half of the current games chart being sequels. In this situation, a large corpus of recordings already exists before a project has even started, with as many as 20,000 lines of dialog for a single voice actor.

Together, Generator and Composer bookend the game dialog recording process. Currently, placeholder dialog is often recorded ad hoc by the audio programmer, or simply not added until near the end of development. Getting realistic dialog into the game earlier makes functional testing much easier (memory budgets are a key sticking point in games development), as well as design and pacing. Once the core voice artists have recorded their lines, the Composer wave-splicing technologies can eliminate the need for additional pickup sessions.

The potential advantages of these technologies for the games industry are numerous. Even when thousands of lines have been recorded, canned lines can become extremely repetitive for key game events, and variation of words and prosody would improve the game experience. In sports games, anaphora are currently used (“He’s passed it back to his team-mate”) because the full set of possibilities is too large to store or even feasibly record. In more extreme cases, there are stories of game releases held up for weeks due to a single copyrighted word used in the narration, or of entire voice-over systems being removed after poor feedback at the alpha play-test stage. Eliminating these sorts of mistakes will help games to be developed faster and with higher-quality, better-integrated voice dialog.
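
To make the scale of the sports commentary problem concrete, the following is a rough back-of-the-envelope sketch, not a description of how PA Composer works internally. The function names and the figures (500 player names, 20 carrier phrases) are hypothetical and chosen purely for illustration. It also assumes, optimistically, that a single recording of each name can be reused in any position in a sentence.

```python
def full_line_count(num_players: int, num_templates: int) -> int:
    """Lines to record if every passer/receiver pair is voiced in full."""
    # Ordered pairs: a player cannot pass to himself.
    ordered_pairs = num_players * (num_players - 1)
    return ordered_pairs * num_templates


def word_level_count(num_players: int, num_templates: int) -> int:
    """Recordings needed if names and carrier phrases are spliced together."""
    # One recording per name plus one per carrier phrase (an assumption).
    return num_players + num_templates


# Hypothetical figures, for illustration only.
players = 500
templates = 20

print(full_line_count(players, templates))   # 4990000
print(word_level_count(players, templates))  # 520
```

Under these assumptions, naming both players explicitly would need millions of complete lines, while word-level recombination needs only a few hundred source recordings, which is why games fall back on anaphora when only canned lines are available.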

There are still a number of challenges: games dialog is frequently much more expressive than any existing speech synthesis solution can produce, and, as expected with parametric speech, there is a constant call for higher-quality synthesis. However, with a dynamic, focused team and strong links with the biggest studios and developers in the industry, the possibilities ahead should be exciting and (literally) game-changing.

Matt Stuttle is Vice President, Research at Phonetic Arts. matt.stuttle@gmail.com