
Speech recognition for adventure games

On LinkedIn, someone posted a link to an example of Unity’s speech recognition API for Windows 10. It sounded simple to set up, so I decided to try it out. I’d done some basic research a while back to see if speech input might be a decent alternative to the text parser input in Cascade Quest.

With Unity 5.5 and Windows 10, it was just a matter of minutes before I had something very basic working.

With the Unity APIs, there are basically three ways to configure speech input:

  • A dictation recognizer, which can recognize general phrases. This requires internet connectivity though, so it is probably not suitable for the quick responses needed for a game.
  • A keyword recognizer, which simply recognizes single words (or series of words) from a fixed list. This is also not very useful for Cascade Quest, since I need to be able to recognize natural language, like the text parser can.
  • A grammar recognizer, which is provided a list of grammar rules to base its output on. This is really the only viable solution for me.

Unfortunately the grammar needs to be in the form of an SRGS XML file. This isn’t ideal, since I’m constructing the grammar (including the words to be used) from data available at runtime.
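To give a feel for what generating SRGS from runtime data involves, here is a minimal sketch in Python. It is not the game's actual code: the two-rule "verb noun" grammar, the rule ids (`command`, `verb`, `noun`) and the function names are assumptions made up for illustration, and a real text parser grammar would need many more rules. Only the SRGS element names (`grammar`, `rule`, `one-of`, `item`, `ruleref`) and the namespace come from the SRGS specification.

```python
import xml.etree.ElementTree as ET

SRGS_NS = "http://www.w3.org/2001/06/grammar"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Serialize SRGS elements without a prefix (xmlns="..." on the root).
ET.register_namespace("", SRGS_NS)


def _el(parent, tag, **attrs):
    """Create a child element in the SRGS namespace."""
    return ET.SubElement(parent, f"{{{SRGS_NS}}}{tag}", attrs)


def add_word_rule(grammar, rule_id, words):
    """Add <rule id=...><one-of><item>word</item>...</one-of></rule>."""
    rule = _el(grammar, "rule", id=rule_id)
    one_of = _el(rule, "one-of")
    for word in sorted(set(words)):
        _el(one_of, "item").text = word


def build_srgs(verbs, nouns, lang="en-US"):
    """Return SRGS XML text for a toy 'verb noun' command grammar (hypothetical shape)."""
    grammar = ET.Element(
        f"{{{SRGS_NS}}}grammar",
        {"version": "1.0", XML_LANG: lang, "root": "command"},
    )
    # The root rule simply chains a verb and a noun.
    command = _el(grammar, "rule", id="command", scope="public")
    _el(command, "ruleref", uri="#verb")
    _el(command, "ruleref", uri="#noun")
    add_word_rule(grammar, "verb", verbs)
    add_word_rule(grammar, "noun", nouns)
    return ET.tostring(grammar, encoding="unicode")


if __name__ == "__main__":
    print(build_srgs(["look", "take"], ["wolf", "wolves"]))
```

The returned string would still have to be written to a file before a grammar recognizer could load it, since the API expects a file rather than in-memory text.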

There were two main challenges to overcome:

  1. Turning the actual grammar rules into SRGS.
  2. “Cleaning” the word list so it can be used in the grammar.

The grammar

The text parser grammar was in a form that was challenging to convert to SRGS. To add to the problem, there is no way to debug issues with a broken grammar.

The API takes an XML file, loads it and then either works or doesn’t. It gives no indication of whether the SRGS is invalid or whether something just isn’t recognized. There are no errors or status (other than “everything is ok, I’m running”). On top of that, once an invalid grammar has been loaded, all future attempts with a correctly-functioning grammar will also fail to work (until Unity is closed and restarted). I don’t know if this is a problem with the underlying Windows APIs or with how Unity uses them. Either way, it made debugging extremely tedious and (until I figured out what was going on) confusing.
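Since the recognizer reports nothing, one option is to sanity-check the file outside Unity before loading it. The sketch below is a hypothetical pre-flight check in Python, not something the Unity or Windows APIs provide: it only confirms that the text is well-formed XML, that the root element is an SRGS `grammar`, and that the `root` attribute and every local `ruleref` point at a rule that exists. It is not a full SRGS validator, and whether these are the mistakes that actually trip up the recognizer is an assumption.

```python
import xml.etree.ElementTree as ET

NS = "{http://www.w3.org/2001/06/grammar}"


def check_grammar(xml_text):
    """Return a list of problems found; an empty list means none were detected."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]

    problems = []
    if root.tag != NS + "grammar":
        problems.append(f"unexpected root element: {root.tag}")

    # Collect the ids of all top-level rules.
    rule_ids = {rule.get("id") for rule in root.findall(NS + "rule")}

    # If a root rule is named, it has to exist.
    root_rule = root.get("root")
    if root_rule is not None and root_rule not in rule_ids:
        problems.append(f"root rule is not defined: {root_rule}")

    # Every local reference (uri="#name") has to resolve to a rule id.
    for ref in root.iter(NS + "ruleref"):
        uri = ref.get("uri", "")
        if uri.startswith("#") and uri[1:] not in rule_ids:
            problems.append(f"ruleref points at undefined rule: {uri}")

    return problems
```

Running a check like this before each load would at least separate "the file is broken" from "the file is fine but the phrase wasn't recognized", which the API itself does not distinguish.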

In the end, I had to hand-code a reasonable grammar, increasing the complexity bit by bit and testing at each step that it still worked.

The words

The next issue is the words, or more specifically, the transformations of the words (pluralizing, adding -ing or -ed, etc…). My game has rules for modifying the suffixes of words, but they work from completed text input back to a known word. For example, postmen is converted to postman, which is a known word. Likewise, wolves would be converted to wolf. We check the suffix of the word, and if it is a known plural suffix, we convert it to the singular form and validate that it is a known word.
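The "input back to known word" direction can be sketched roughly as follows. This is an illustration in Python, not the game's code: the rule table `PLURAL_RULES` and the function name `to_known_word` are made up for the example, and the real suffix rules are certainly more extensive.

```python
# Hypothetical (plural suffix, singular suffix) pairs, tried in order.
PLURAL_RULES = [("men", "man"), ("ves", "f"), ("es", ""), ("s", "")]


def to_known_word(word, known_words):
    """Map typed input back to a word the game knows, or return None."""
    if word in known_words:
        return word
    for plural_suffix, singular_suffix in PLURAL_RULES:
        if word.endswith(plural_suffix):
            stem = word[: len(word) - len(plural_suffix)]
            candidate = stem + singular_suffix
            # Only accept the transformation if it lands on a known word.
            if candidate in known_words:
                return candidate
    return None


if __name__ == "__main__":
    known = {"postman", "wolf", "key"}
    for typed in ["postmen", "wolves", "keys", "dragon"]:
        print(typed, "->", to_known_word(typed, known))
```

Because the result is always checked against the known words, an overly generous rule does no harm in this direction: a bad guess simply fails the lookup.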

The speech recognizer grammar needs to include all variations of a word, but that means going in the other direction: from the singular to the plural (or root to transformed). For “wolf”, we would generate a number of options: wolves, wolfs, wolfes. All these are valid transformations based on our suffix rules. But not all of them are real words. So to validate these transformations, we need to check against a master English word list. I found one online that contained about 500,000 words (including plurals, verb tenses, and such).
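Going from root to transformed word, then filtering against a word list, might look something like this. Again this is a hedged sketch in Python with invented names (`SUFFIX_RULES`, `candidate_plurals`, `real_plurals`) and only a handful of plural rules; the tiny `english_words` set stands in for the large master word list.

```python
# Hypothetical (singular suffix, plural suffix) pairs.
# An empty singular suffix matches every word.
SUFFIX_RULES = [("man", "men"), ("f", "ves"), ("", "s"), ("", "es")]


def candidate_plurals(word):
    """Apply every matching suffix rule, whether or not the result is a real word."""
    candidates = []
    for singular_suffix, plural_suffix in SUFFIX_RULES:
        if word.endswith(singular_suffix):
            stem = word[: len(word) - len(singular_suffix)]
            candidates.append(stem + plural_suffix)
    return candidates


def real_plurals(word, english_words):
    """Keep only the candidates that appear in the master word list."""
    return [c for c in candidate_plurals(word) if c in english_words]


if __name__ == "__main__":
    english_words = {"wolves", "postmen", "keys"}  # stand-in for the full list
    print(candidate_plurals("wolf"))            # ['wolves', 'wolfs', 'wolfes']
    print(real_plurals("wolf", english_words))  # ['wolves']
```

Holding the master list in a set keeps each lookup cheap, which matters when every known word produces several candidates to check.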

The result

The end result is something that works “ok”. I’m sure it would be a lot more accurate if I could limit the grammar only to words that are included in the currently loaded room (or at least weight those words more highly). I could do that, but I would need a different SRGS XML file for each room in the game. Even that isn’t a great solution though, because other pieces of logic in the game (which respond to text parser input) might be loaded or unloaded dynamically.

Anyway, speech recognition as an input method for Cascade Quest is something I’ll keep in my back pocket as a possibility.

How does it actually feel to play? “Ok”. I found it got tiring speaking to the game for very long. Would anyone actually use it? If I get it into a more reliable state, maybe I’ll do some playtesting to find out.

