skip to main content
Language:
Search Limited to: Search Limited to: Resource type Show Results with: Show Results with: Search type Index

Benchmarking Long-tail Generalization with Likelihood Splits

arXiv.org, 2023-05

2023. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. ;http://creativecommons.org/licenses/by/4.0 ;EISSN: 2331-8422 ;DOI: 10.48550/arxiv.2210.06799

Full text available

Citations Cited by
  • Title:
    Benchmarking Long-tail Generalization with Likelihood Splits
  • Author: Godbole, Ameya ; Jia, Robin
  • Subjects: Benchmarks ; Computer Science - Computation and Language ; Natural language ; Natural language processing
  • Is Part Of: arXiv.org, 2023-05
  • Description: In order to reliably process natural language, NLP systems must generalize to the long tail of rare utterances. We propose a method to create challenging benchmarks that require generalizing to the tail of the distribution by re-splitting existing datasets. We create 'Likelihood Splits' where examples that are assigned lower likelihood by a pre-trained language model (LM) are placed in the test set, and more likely examples are in the training set. This simple approach can be customized to construct meaningful train-test splits for a wide range of tasks. Likelihood Splits surface more challenges than random splits: relative error rates of state-of-the-art models increase by 59% for semantic parsing on Spider, 93% for natural language inference on SNLI, and 33% for yes/no question answering on BoolQ, on our splits compared with the corresponding random splits. Moreover, Likelihood Splits create fairer benchmarks than adversarial filtering; when the LM used to create the splits is also employed as the task model, our splits do not unfairly penalize the LM.
  • Publisher: Ithaca: Cornell University Library, arXiv.org
  • Language: English
  • Identifier: EISSN: 2331-8422
    DOI: 10.48550/arxiv.2210.06799
  • Source: arXiv.org
    Free E Journals
    ROAD: Directory of Open Access Scholarly Resources
    ProQuest Central

Searching Remote Databases, Please Wait