Vocabulary Growth in Speech Recognition

People have been working on large vocabulary continuous speech recognition (LVCSR) since the 1980s. As computers became more powerful and speech recognition technology advanced, the meaning of "large vocabulary" changed. In the early days, a vocabulary of one thousand words was considered large. Today, vocabularies of one million tokens are the norm.

To analyze the trend, I collected 32 publications, published between 1988 and 2015, in which the authors built an LVCSR system, and recorded the size of each recognizer's vocabulary.

Across these publications, the size of the speech recognition vocabulary doubles roughly every four years.
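
One way a doubling time like this could be estimated is to fit a straight line to the base-2 logarithm of vocabulary size against publication year; the inverse of the slope is then the number of years per doubling. The sketch below is only an illustration under that assumption, not necessarily the method used for the figure above. The function name `doubling_time` and the data points are hypothetical: the points are synthetic, generated from an exact four-year doubling to check the fit, and are not the 32 collected publications.

```python
import math


def doubling_time(years, vocab_sizes):
    """Estimate years per doubling with a least-squares fit of
    log2(vocabulary size) against publication year."""
    logs = [math.log2(v) for v in vocab_sizes]
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, logs))
    slope = sxy / sxx  # doublings per year
    return 1.0 / slope  # years per doubling


# Synthetic data (NOT the collected publications): a vocabulary that
# starts at 1,000 words in 1988 and doubles exactly every four years.
synthetic_years = [1988, 1992, 1996, 2000]
synthetic_sizes = [1000 * 2 ** ((y - 1988) / 4) for y in synthetic_years]

print(doubling_time(synthetic_years, synthetic_sizes))
```

With real data, the list of publication years and the list of recorded vocabulary sizes would replace the synthetic lists. Fitting in log space keeps the few very large recent vocabularies from dominating the estimate.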