IARPA announces OpenCLIR challenge winners

The Intelligence Advanced Research Projects Activity, within the Office of the Director of National Intelligence, announced on November 18 the winners of the Open Cross Lingual Information Retrieval – OpenCLIR – prize challenge. Launched in June 2018, the challenge sought innovative approaches for retrieving information from audio and text documents, using English queries against documents written or spoken in another language. It focused on low-resource languages – less-studied languages for which large amounts of training data do not exist. One application of such a capability would be to support effective triage and analysis of large volumes of information in a variety of less-studied languages. For the OpenCLIR challenge, the language was Swahili.

“A cross-language information retrieval capability that can query both speech and text documents in a lower resource language has not yet been produced for mass consumption. This challenge highlighted state of the art techniques for machine translation and speech recognition development, and joint optimization of all the natural language processing required to achieve a usable end-to-end solution,” said IARPA program manager Carl Rubino. “Aside from limited speech recognition and machine translation data provided by IARPA to train a system, participants were challenged to employ data harvesting methods to improve baseline capabilities. Entries that relied only on the training data provided by the challenge were not expected to be competitive.”

Of the 37 registered teams – from 14 different countries – only five made it to the final evaluation stage. They were evaluated by the National Institute of Standards and Technology on a detection metric that rewarded the return of relevant documents and penalized false alarms and missed documents. Prizes were awarded in the text category, but no team met the minimum performance threshold required for speech.

To learn more about the challenge and see a complete list of winners, please click here.

Source: ODNI