Skip to main content

Showing 1–11 of 11 results for author: Ryskina, M

Searching in archive cs. Search in all archives.
.
  1. arXiv:2405.09605  [pdf, ps, other

    cs.CL cs.AI cs.LG

    Elements of World Knowledge (EWoK): A Cognition-Inspired Framework for Evaluating Basic World Knowledge in Language Models

    Authors: Anna A. Ivanova, Aalok Sathe, Benjamin Lipkin, Unnathi Kumar, Setayesh Radkani, Thomas H. Clark, Carina Kauf, Jennifer Hu, R. T. Pramod, Gabriel Grand, Vivian Paulun, Maria Ryskina, Ekin Akyürek, Ethan Wilcox, Nafisa Rashid, Leshem Choshen, Roger Levy, Evelina Fedorenko, Joshua Tenenbaum, Jacob Andreas

    Abstract: The ability to build and reason about models of the world is essential for situated language understanding. But evaluating world modeling capabilities in modern AI systems -- especially those based on language models -- has proven challenging, in large part because of the difficulty of disentangling conceptual knowledge about the world from knowledge of surface co-occurrence statistics. This paper… ▽ More

    Submitted 3 July, 2025; v1 submitted 15 May, 2024; originally announced May 2024.

    Comments: Accepted to Transactions of the ACL (TACL). Contains 25 pages (14 main), 6 figures. Visit http://ewok-core.github.io for data and code. Authors Anna Ivanova, Aalok Sathe, Benjamin Lipkin contributed equally

  2. Queer In AI: A Case Study in Community-Led Participatory AI

    Authors: Organizers Of QueerInAI, :, Anaelia Ovalle, Arjun Subramonian, Ashwin Singh, Claas Voelcker, Danica J. Sutherland, Davide Locatelli, Eva Breznik, Filip Klubička, Hang Yuan, Hetvi J, Huan Zhang, Jaidev Shriram, Kruno Lehman, Luca Soldaini, Maarten Sap, Marc Peter Deisenroth, Maria Leonor Pacheco, Maria Ryskina, Martin Mundt, Milind Agarwal, Nyx McLean, Pan Xu, A Pranav , et al. (26 additional authors not shown)

    Abstract: We present Queer in AI as a case study for community-led participatory design in AI. We examine how participatory design and intersectional tenets started and shaped this community's programs over the years. We discuss different challenges that emerged in the process, look at ways this organization has fallen short of operationalizing participatory and intersectional principles, and then assess th… ▽ More

    Submitted 8 June, 2023; v1 submitted 29 March, 2023; originally announced March 2023.

    Comments: To appear at FAccT 2023

    Journal ref: 2023 ACM Conference on Fairness, Accountability, and Transparency

  3. State-of-the-art generalisation research in NLP: A taxonomy and review

    Authors: Dieuwke Hupkes, Mario Giulianelli, Verna Dankers, Mikel Artetxe, Yanai Elazar, Tiago Pimentel, Christos Christodoulopoulos, Karim Lasri, Naomi Saphra, Arabella Sinclair, Dennis Ulmer, Florian Schottmann, Khuyagbaatar Batsuren, Kaiser Sun, Koustuv Sinha, Leila Khalatbari, Maria Ryskina, Rita Frieske, Ryan Cotterell, Zhijing Jin

    Abstract: The ability to generalise well is one of the primary desiderata of natural language processing (NLP). Yet, what 'good generalisation' entails and how it should be evaluated is not well understood, nor are there any evaluation standards for generalisation. In this paper, we lay the groundwork to address both of these issues. We present a taxonomy for characterising and understanding generalisation… ▽ More

    Submitted 12 January, 2024; v1 submitted 6 October, 2022; originally announced October 2022.

    Comments: This preprint was published as an Analysis article in Nature Machine Intelligence. Please refer to the published version when citing this work. 28 pages of content + 6 pages of appendix + 52 pages of references

    Journal ref: Nat Mach Intell 5, 1161-1174 (2023)

  4. arXiv:2205.03608  [pdf, other

    cs.CL

    UniMorph 4.0: Universal Morphology

    Authors: Khuyagbaatar Batsuren, Omer Goldman, Salam Khalifa, Nizar Habash, Witold Kieraś, Gábor Bella, Brian Leonard, Garrett Nicolai, Kyle Gorman, Yustinus Ghanggo Ate, Maria Ryskina, Sabrina J. Mielke, Elena Budianskaya, Charbel El-Khaissi, Tiago Pimentel, Michael Gasser, William Lane, Mohit Raj, Matt Coler, Jaime Rafael Montoya Samame, Delio Siticonatzi Camaiteri, Benoît Sagot, Esaú Zumaeta Rojas, Didier López Francis, Arturo Oncevay , et al. (71 additional authors not shown)

    Abstract: The Universal Morphology (UniMorph) project is a collaborative effort providing broad-coverage instantiated normalized morphological inflection tables for hundreds of diverse world languages. The project comprises two major thrusts: a language-independent feature schema for rich morphological annotation and a type-level resource of annotated data in diverse languages realizing that schema. This pa… ▽ More

    Submitted 19 June, 2022; v1 submitted 7 May, 2022; originally announced May 2022.

    Comments: LREC 2022; The first two authors made equal contributions

  5. arXiv:2109.09597  [pdf, other

    cs.CL cs.AI cs.GT

    Two Approaches to Building Collaborative, Task-Oriented Dialog Agents through Self-Play

    Authors: Arkady Arkhangorodsky, Scot Fang, Victoria Knight, Ajay Nagesh, Maria Ryskina, Kevin Knight

    Abstract: Task-oriented dialog systems are often trained on human/human dialogs, such as collected from Wizard-of-Oz interfaces. However, human/human corpora are frequently too small for supervised training to be effective. This paper investigates two approaches to training agent-bots and user-bots through self-play, in which they autonomously explore an API environment, discovering communication strategies… ▽ More

    Submitted 20 September, 2021; originally announced September 2021.

    Comments: 4 pages, 5 figures

  6. arXiv:2109.07230  [pdf, other

    cs.CL cs.LG

    Learning Mathematical Properties of Integers

    Authors: Maria Ryskina, Kevin Knight

    Abstract: Embedding words in high-dimensional vector spaces has proven valuable in many natural language applications. In this work, we investigate whether similarly-trained embeddings of integers can capture concepts that are useful for mathematical applications. We probe the integer embeddings for mathematical knowledge, apply them to a set of numerical reasoning tasks, and show that by learning the repre… ▽ More

    Submitted 15 September, 2021; originally announced September 2021.

    Comments: BlackboxNLP 2021

  7. arXiv:2106.12698  [pdf, other

    cs.CL

    Comparative Error Analysis in Neural and Finite-state Models for Unsupervised Character-level Transduction

    Authors: Maria Ryskina, Eduard Hovy, Taylor Berg-Kirkpatrick, Matthew R. Gormley

    Abstract: Traditionally, character-level transduction problems have been solved with finite-state models designed to encode structural and linguistic knowledge of the underlying process, whereas recent approaches rely on the power and flexibility of sequence-to-sequence models with attention. Focusing on the less explored unsupervised learning scenario, we compare the two model classes side by side and find… ▽ More

    Submitted 23 June, 2021; originally announced June 2021.

    Comments: Accepted to SIGMORPHON 2021

  8. arXiv:2102.08345  [pdf, other

    cs.CL

    NoiseQA: Challenge Set Evaluation for User-Centric Question Answering

    Authors: Abhilasha Ravichander, Siddharth Dalmia, Maria Ryskina, Florian Metze, Eduard Hovy, Alan W Black

    Abstract: When Question-Answering (QA) systems are deployed in the real world, users query them through a variety of interfaces, such as speaking to voice assistants, typing questions into a search engine, or even translating questions to languages supported by the QA system. While there has been significant community attention devoted to identifying correct answers in passages assuming a perfectly formed q… ▽ More

    Submitted 16 February, 2021; originally announced February 2021.

    Comments: EACL 2021

  9. arXiv:2005.02517  [pdf, other

    cs.CL

    Phonetic and Visual Priors for Decipherment of Informal Romanization

    Authors: Maria Ryskina, Matthew R. Gormley, Taylor Berg-Kirkpatrick

    Abstract: Informal romanization is an idiosyncratic process used by humans in informal digital communication to encode non-Latin script languages into Latin character sets found on common keyboards. Character substitution choices differ between users but have been shown to be governed by the same main principles observed across a variety of languages---namely, character pairs are often associated through ph… ▽ More

    Submitted 5 May, 2020; originally announced May 2020.

    Comments: To appear at ACL 2020

  10. Where New Words Are Born: Distributional Semantic Analysis of Neologisms and Their Semantic Neighborhoods

    Authors: Maria Ryskina, Ella Rabinovich, Taylor Berg-Kirkpatrick, David R. Mortensen, Yulia Tsvetkov

    Abstract: We perform statistical analysis of the phenomenon of neology, the process by which new words emerge in a language, using large diachronic corpora of English. We investigate the importance of two factors, semantic sparsity and frequency growth rates of semantic neighbors, formalized in the distributional semantics paradigm. We show that both factors are predictive of word emergence although we find… ▽ More

    Submitted 21 January, 2020; originally announced January 2020.

    Comments: SCiL 2020

    Journal ref: Proceedings of the Society for Computation in Linguistics 3.1 (2020): 43-52

  11. arXiv:1704.07875  [pdf, other

    cs.CL

    Automatic Compositor Attribution in the First Folio of Shakespeare

    Authors: Maria Ryskina, Hannah Alpert-Abrams, Dan Garrette, Taylor Berg-Kirkpatrick

    Abstract: Compositor attribution, the clustering of pages in a historical printed document by the individual who set the type, is a bibliographic task that relies on analysis of orthographic variation and inspection of visual details of the printed page. In this paper, we introduce a novel unsupervised model that jointly describes the textual and visual features needed to distinguish compositors. Applied to… ▽ More

    Submitted 25 April, 2017; originally announced April 2017.

    Comments: Short paper (6 pages) accepted at ACL 2017