TAPE: Assessing Few-shot Russian Language Understanding
Authors:
Ekaterina Taktasheva,
Tatiana Shavrina,
Alena Fenogenova,
Denis Shevelev,
Nadezhda Katricheva,
Maria Tikhonova,
Albina Akhmetgareeva,
Oleg Zinkevich,
Anastasiia Bashmakova,
Svetlana Iordanskaia,
Alena Spiridonova,
Valentina Kurenshchikova,
Ekaterina Artemova,
Vladislav Mikhailov
Abstract:
Recent advances in zero-shot and few-shot learning have shown promise for a range of research and practical purposes. However, this fast-growing area lacks standardized evaluation suites for non-English languages, hindering progress outside the Anglo-centric paradigm. To address this gap, we propose TAPE (Text Attack and Perturbation Evaluation), a novel benchmark that includes six complex NLU tasks for Russian, covering multi-hop reasoning, ethical concepts, logic, and commonsense knowledge. TAPE's design focuses on systematic zero-shot and few-shot NLU evaluation: (i) linguistically oriented adversarial attacks and perturbations for analyzing robustness, and (ii) subpopulations for nuanced interpretation. A detailed analysis of the autoregressive baselines indicates that simple spelling-based perturbations affect performance the most, while paraphrasing the input has a comparatively negligible effect. At the same time, the results demonstrate a significant gap between the neural and human baselines for most tasks. We publicly release TAPE (tape-benchmark.com) to foster research on robust LMs that can generalize to new tasks when little to no supervision is available.
Submitted 23 October, 2022;
originally announced October 2022.