IARPA announces winners of its ASpIRE challenge
The Intelligence Advanced Research Projects Activity (IARPA), within the Office of the Director of National Intelligence (ODNI), announced on September 10 the winners of its speech recognition challenge, Automatic Speech Recognition in Reverberant Environments (ASpIRE). The winning teams, from Johns Hopkins University, Raytheon BBN Technologies, the Institute for Infocomm Research, and Brno University of Technology, will share $110,000 in prizes.
Typically, speech recognition systems are ‘trained’ on speech recorded in environments very similar to those in which they are expected to be used. The ASpIRE challenge contestants tackled a harder problem: building accurate systems for automatically transcribing speech recorded in noisy and reverberant environments without knowing anything about the recording devices or the acoustics of the space, and without training data that resembled the contest’s test conditions. At the start of the challenge, contestants were given telephone speech data with which to develop and train their systems over a period of roughly three months. Their systems were then tested on very different speech recordings, collected in noisy rooms of various sizes and shapes and with various microphone configurations. This mismatch between training data and test data is what made the ASpIRE challenge uniquely difficult.
The speech data were collected by the Linguistic Data Consortium. Appen Butler Hill transcribed the microphone recordings. MIT Lincoln Laboratory and IARPA together evaluated the results. InnoCentive managed the challenge website, including maintaining a leaderboard.
Challenge entries were scored under two evaluation conditions with equivalent background noise and reverberation:
- The Single Microphone Condition tested accuracy of speech recognition on recordings from single microphones selected arbitrarily from among six microphones placed in the room.
- The Multiple Microphone Condition tested accuracy of speech recognition on recordings from six different microphones recording at once.
All of the ASpIRE challenge winners delivered systems with more than a 50% reduction in word error rate (WER) compared to the IARPA baseline system. WER is the standard measure of accuracy for speech recognition systems; lower WER scores indicate more accurate systems.
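For readers unfamiliar with the metric, WER is conventionally computed as the number of word substitutions, deletions and insertions needed to turn a system's transcript into the reference transcript, divided by the number of words in the reference. The sketch below illustrates that conventional definition in Python. It is not the scoring tool used in the challenge; the function names, the simple whitespace tokenization, and the example sentences and scores are all assumptions made for illustration, and treating the 50% figure as a relative reduction is likewise an assumption.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref = reference.split()  # assumption: plain whitespace tokenization
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, system_wer: float) -> float:
    """Fraction by which a system lowers WER relative to a baseline."""
    return (baseline_wer - system_wer) / baseline_wer


# Hypothetical example: one substituted word in a six-word reference.
wer = word_error_rate("the cat sat on the mat", "the cat sat on a mat")

# Hypothetical scores, not contest results.
reduction = relative_wer_reduction(baseline_wer=0.40, system_wer=0.20)
```

Under these assumptions, a system that halves a baseline WER of 0.40 to 0.20 achieves a 50% relative reduction.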
The winners in the Single Microphone category are:
- the team from the Center for Language and Speech Processing, Johns Hopkins University (Vijayaditya Peddinti, Guoguo Chen, Dr. Daniel Povey, Dr. Sanjeev Khudanpur);
- the multi-institutional team from Raytheon BBN Technologies (Jeff Ma, Roger Hsiao, William Hartmann, Rich Schwartz, Stavros Tsakalidis), Brno University of Technology (Martin Karafiat, Lukas Burget, Igor Szoke, Frantisek Grezl), and Johns Hopkins University (Sri Harish Mallidi, Hynek Hermansky); and
- the team from the Institute for Infocomm Research, A*STAR, Singapore (Dr. Jonathan William Dennis and Dr. Tran Huy Dat).
The winner in the Multiple Microphone category is:
- the team from the Institute for Infocomm Research, A*STAR, Singapore (Dr. Jonathan William Dennis and Dr. Tran Huy Dat).
“We’re delighted with the diversity of solutions submitted by the ASpIRE challenge contestants,” said Mary Harper, IARPA’s program manager for the ASpIRE challenge. “Their performance under rigorous evaluation conditions suggests that accurate speech recognition – even for speech recorded in environments for which training data are unavailable – is possible.”
Source: ODNI