IARPA releases explainable NLP RFI
On November 22, the Intelligence Advanced Research Projects Activity (IARPA) released a request for information (RFI) entitled “Evaluation of Neural Text Generation Models and Methods in Explainable NLP.” Responses are due by 12:00 p.m. Eastern on December 10.
IARPA is seeking information on established techniques, metrics and capabilities related to the evaluation of generated text and the evaluation of human-interpretable explanations for neural language model behavior. This RFI is issued for planning purposes only, and it does not constitute a formal solicitation for proposals or suggest the procurement of any material, data sets, etc. The following sections of this announcement contain details on the specific technology areas of interest, along with instructions for the submission of responses.
Background and Scope
Neural language models (NLMs) have achieved state-of-the-art performance on a wide variety of natural language tasks. In natural language generation in particular, models such as GPT-3 have produced strikingly human-like text. Methods to evaluate and explain these technologies have not kept pace with the technologies themselves.
Language generation models can be used for a variety of automated tasks involving modification of a pre-existing text, such as paraphrasing, style transfer, summarization, etc. Measuring success on these tasks can be challenging: a modified text must remain faithful to the meaning of the text from which it is derived (i.e., maintaining sense), while also exhibiting human-like fluency (i.e., soundness). Although numerous automated techniques for evaluating sense and soundness have been developed, techniques that require humans to grade generated text (e.g., with Likert scales or ranking) remain the gold standard.
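As a concrete illustration of the automated end of this spectrum, the sketch below computes a crude "sense" proxy: token-sequence similarity between a source text and its machine modification. This is a toy stand-in for the metrics the RFI alludes to (production evaluations typically use measures such as BLEU or BERTScore, or the human Likert/ranking judgments described above); the function name and example strings are illustrative only.

```python
import difflib

def sense_score(source: str, modified: str) -> float:
    """Crude automated 'sense' proxy: similarity between the token
    sequences of a source text and its machine modification.
    A value near 1.0 suggests the modification stayed close to the
    source; it says nothing about fluency ('soundness')."""
    return difflib.SequenceMatcher(
        None,
        source.lower().split(),
        modified.lower().split(),
    ).ratio()

print(sense_score("the cat sat on the mat",
                  "a cat sat on a mat"))  # → 0.666...
```

Note that a metric like this rewards surface overlap, which is exactly why human grading remains the gold standard: a paraphrase can preserve meaning while sharing few tokens with its source.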
Furthermore, as language generation models increasingly produce human-like content on the internet, there is growing interest from diverse stakeholders in capabilities to flag artificially generated text content in its many varieties. As in other text classification tasks, NLM classifiers have seen success in identifying machine-generated text; however, it is difficult to derive explanations for the predictions of multi-layer neural models, and the human user’s inability to understand and trust the rationale underpinning individual model predictions limits a system’s potential use cases.
There is a growing body of explainable NLP techniques, but many of the proposed methods for text classifier models involve delineating spans of input text that a model ‘attends’ to when predicting a label. A shortcoming of span-level explanations is that they do not identify actual linguistic features or structures (syntactic, morphological, discourse-level, etc.), even though there is mounting evidence that NLMs encode these aspects of human language.
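The span-level style of explanation the RFI describes can be sketched in a few lines: given per-token attribution scores (from attention, gradients, or any saliency method), the explanation is simply the highest-scoring window of text. The tokens and scores below are hypothetical, and the function is an illustrative toy, not any particular published method.

```python
def top_span(tokens, scores, width=3):
    """Return the contiguous window of `width` tokens with the highest
    total attribution score. This is the span-level explanation style
    the RFI describes: it highlights text, but names no syntactic,
    morphological, or discourse-level structure."""
    best_start = max(
        range(len(tokens) - width + 1),
        key=lambda i: sum(scores[i:i + width]),
    )
    return tokens[best_start:best_start + width]

# Hypothetical per-token scores from some saliency method.
tokens = ["This", "essay", "was", "not", "written", "by", "a", "person"]
scores = [0.02, 0.05, 0.10, 0.40, 0.35, 0.03, 0.02, 0.03]
print(top_span(tokens, scores))  # → ['was', 'not', 'written']
```

The output tells a user *where* the model looked, but not *why* those tokens matter linguistically, which is the gap the RFI highlights.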
Evaluating explanations presents an additional challenge, especially for text classification tasks, such as machine-generated text detection, where it is difficult to produce ground truth annotations. Unlike sentiment classification, where an annotator can generally identify which spans in a sentence express positive or negative sentiment, humans do not have clear intuitions about the kinds of features that differentiate human- vs. machine-generated text. Existing explainable NLP datasets have focused on tasks where humans have strong intuitions about the correct explanations for ground truth labels. The lack of ground truth datasets poses a challenge for evaluating explanations (though performance on downstream tasks that rely on explanations can serve as a non-ideal proxy for explanation quality).
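Where ground-truth rationales do exist, explanation quality can be scored as agreement between the token positions a model highlights and those a human annotator marked. The sketch below shows one common form of this comparison, F1 over token indices; the function and the index sets are illustrative assumptions, not a metric prescribed by the RFI.

```python
def rationale_f1(predicted: set, gold: set) -> float:
    """F1 over token indices: agreement between a model's explanation
    span and a human-annotated rationale. Usable only when ground-truth
    rationales exist, which is exactly what detection tasks lack."""
    true_pos = len(predicted & gold)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(predicted)
    recall = true_pos / len(gold)
    return 2 * precision * recall / (precision + recall)

# Token indices flagged by the model vs. marked by an annotator.
print(rationale_f1({3, 4, 5}, {4, 5, 6}))  # → 0.666...
```

For machine-generated text detection, where no such gold set exists, the downstream-task proxy mentioned above is often the only option.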
The purposes of this RFI are the following:
- Identification of novel human or automatic techniques, metrics and capabilities for evaluating the sense and soundness of machine modified text
- Identification of novel methods to derive human-interpretable explanations from NLM text classifiers
- Identification of novel techniques for measuring the quality of local explanations derived from NLMs