Showing 1–11 of 11 results for author: Prud'hommeaux, E

Searching in archive cs.
  1. arXiv:2503.24299  [pdf, ps, other]

    cs.DB cs.AI

    Shape Expressions with Inheritance

    Authors: Iovka Boneva, Jose Emilio Labra Gayo, Eric Prud'hommeaux, Katherine Thornton, Andra Waagmeester

    Abstract: We formally introduce an inheritance mechanism for the Shape Expressions language (ShEx). It is inspired by inheritance in object-oriented programming languages, and provides similar advantages such as reuse, modularity, and more flexible data modelling. Using an example, we explain the main features of the inheritance mechanism. We present its syntax and formal semantics. The semantics is an exte…

    Submitted 31 March, 2025; originally announced March 2025.

    Comments: Accepted in Extended Semantic Web Conference, ESWC, 2025

  2. arXiv:2209.13778  [pdf, other]

    cs.CL

    Data-driven Parsing Evaluation for Child-Parent Interactions

    Authors: Zoey Liu, Emily Prud'hommeaux

    Abstract: We present a syntactic dependency treebank for naturalistic child and child-directed speech in English (MacWhinney, 2000). Our annotations largely followed the guidelines of the Universal Dependencies project (UD (Zeman et al., 2022)), with detailed extensions to lexical/syntactic structures unique to conversational speech (in opposition to written texts). Compared to existing UD-style spoken tree…

    Submitted 27 September, 2022; originally announced September 2022.

  3. arXiv:2208.12888  [pdf, other]

    cs.CL cs.SD eess.AS

    Investigating data partitioning strategies for crosslinguistic low-resource ASR evaluation

    Authors: Zoey Liu, Justin Spence, Emily Prud'hommeaux

    Abstract: Many automatic speech recognition (ASR) data sets include a single pre-defined test set consisting of one or more speakers whose speech never appears in the training set. This "hold-speaker(s)-out" data partitioning strategy, however, may not be ideal for data sets in which the number of speakers is very small. This study investigates ten different data split methods for five languages with minima…

    Submitted 26 August, 2022; originally announced August 2022.

  4. arXiv:2205.03608  [pdf, other]

    cs.CL

    UniMorph 4.0: Universal Morphology

    Authors: Khuyagbaatar Batsuren, Omer Goldman, Salam Khalifa, Nizar Habash, Witold Kieraś, Gábor Bella, Brian Leonard, Garrett Nicolai, Kyle Gorman, Yustinus Ghanggo Ate, Maria Ryskina, Sabrina J. Mielke, Elena Budianskaya, Charbel El-Khaissi, Tiago Pimentel, Michael Gasser, William Lane, Mohit Raj, Matt Coler, Jaime Rafael Montoya Samame, Delio Siticonatzi Camaiteri, Benoît Sagot, Esaú Zumaeta Rojas, Didier López Francis, Arturo Oncevay, et al. (71 additional authors not shown)

    Abstract: The Universal Morphology (UniMorph) project is a collaborative effort providing broad-coverage instantiated normalized morphological inflection tables for hundreds of diverse world languages. The project comprises two major thrusts: a language-independent feature schema for rich morphological annotation and a type-level resource of annotated data in diverse languages realizing that schema. This pa…

    Submitted 19 June, 2022; v1 submitted 7 May, 2022; originally announced May 2022.

    Comments: LREC 2022; The first two authors made equal contributions

  5. arXiv:2204.05541  [pdf, other]

    cs.CL

    Not always about you: Prioritizing community needs when developing endangered language technology

    Authors: Zoey Liu, Crystal Richardson, Richard Hatcher Jr, Emily Prud'hommeaux

    Abstract: Languages are classified as low-resource when they lack the quantity of data necessary for training statistical and machine learning tools and models. Causes of resource scarcity vary but can include poor access to technology for developing these resources, a relatively small population of speakers, or a lack of urgency for collecting such resources in bilingual populations where the second langua…

    Submitted 12 April, 2022; originally announced April 2022.

    Comments: To appear in ACL 2022

  6. arXiv:2201.01845  [pdf, other]

    cs.CL

    Data-driven Model Generalizability in Crosslinguistic Low-resource Morphological Segmentation

    Authors: Zoey Liu, Emily Prud'hommeaux

    Abstract: Common designs of model evaluation typically focus on monolingual settings, where different models are compared according to their performance on a single data set that is assumed to be representative of all possible data for the task at hand. While this may be reasonable for a large data set, this assumption is difficult to maintain in low-resource scenarios, where artifacts of the data collectio…

    Submitted 11 April, 2022; v1 submitted 5 January, 2022; originally announced January 2022.

    Comments: Published in TACL (https://direct.mit.edu/tacl/article/doi/10.1162/tacl_a_00467/110437/Data-driven-Model-Generalizability-in)

  7. arXiv:2005.05477  [pdf, other]

    cs.CL

    Neural Polysynthetic Language Modelling

    Authors: Lane Schwartz, Francis Tyers, Lori Levin, Christo Kirov, Patrick Littell, Chi-kiu Lo, Emily Prud'hommeaux, Hyunji Hayley Park, Kenneth Steimel, Rebecca Knowles, Jeffrey Micher, Lonny Strunk, Han Liu, Coleman Haley, Katherine J. Zhang, Robbie Jimmerson, Vasilisa Andriyanets, Aldrian Obaja Muis, Naoki Otani, Jong Hyuk Park, Zhisong Zhang

    Abstract: Research in natural language processing commonly assumes that approaches that work well for English and other widely-used languages are "language agnostic". In high-resource languages, especially those that are analytic, a common approach is to treat morphologically-distinct variants of a common root as completely independent word types. This assumes that there are limited morphological infle…

    Submitted 13 May, 2020; v1 submitted 11 May, 2020; originally announced May 2020.

  8. arXiv:2004.13203  [pdf, other]

    cs.CL

    A Summary of the First Workshop on Language Technology for Language Documentation and Revitalization

    Authors: Graham Neubig, Shruti Rijhwani, Alexis Palmer, Jordan MacKenzie, Hilaria Cruz, Xinjian Li, Matthew Lee, Aditi Chaudhary, Luke Gessler, Steven Abney, Shirley Anugrah Hayati, Antonios Anastasopoulos, Olga Zamaraeva, Emily Prud'hommeaux, Jennette Child, Sara Child, Rebecca Knowles, Sarah Moeller, Jeffrey Micher, Yiyuan Li, Sydney Zink, Mengzhou Xia, Roshan S Sharma, Patrick Littell

    Abstract: Despite recent advances in natural language processing and other language technology, the application of such technology to language documentation and conservation has been limited. In August 2019, a workshop was held at Carnegie Mellon University in Pittsburgh to attempt to bring together language community members, documentary linguists, and technologists to discuss how to bridge this gap and cr…

    Submitted 27 April, 2020; originally announced April 2020.

    Comments: Accepted at SLTU-CCURL 2020

  9. arXiv:1701.08924  [pdf, other]

    cs.DB cs.NI

    Validating and describing linked data portals using shapes

    Authors: Jose-Emilio Labra-Gayo, Eric Prud'hommeaux, Harold Solbrig, Iovka Boneva

    Abstract: Linked data portals need to be able to advertise and describe the structure of their content. A sufficiently expressive and intuitive schema language will allow portals to communicate these structures. Validation tools will aid in the publication and maintenance of linked data and increase their quality. Two schema language proposals have recently emerged for describing the structures of RDF gra…

    Submitted 31 January, 2017; originally announced January 2017.

  10. arXiv:1607.04809  [pdf, other]

    cs.AI

    Knowledge Representation on the Web revisited: Tools for Prototype Based Ontologies

    Authors: Michael Cochez, Stefan Decker, Eric Prud'hommeaux

    Abstract: In recent years RDF and OWL have become the most common knowledge representation languages in use on the Web, propelled by the recommendation of the W3C. In this paper we present a practical implementation of a different kind of knowledge representation based on Prototypes. In detail, we present a concrete syntax easily and effectively parsable by applications. We also present extensible implement…

    Submitted 16 July, 2016; originally announced July 2016.

    Comments: Related software available from https://github.com/miselico/knowledgebase/

  11. arXiv:1510.05555  [pdf, other]

    cs.DB

    Shape Expressions Schemas

    Authors: Iovka Boneva, Jose E. Labra Gayo, Eric G. Prud'hommeaux, Sławek Staworko

    Abstract: We present Shape Expressions (ShEx), an expressive schema language for RDF designed to provide a high-level, user-friendly syntax with intuitive semantics. ShEx allows one to describe the vocabulary and the structure of an RDF graph, and to constrain the allowed values for the properties of a node. It includes an algebraic grouping operator, a choice operator, cardinality constraints for the number o…

    Submitted 16 November, 2015; v1 submitted 19 October, 2015; originally announced October 2015.