Showing 1–3 of 3 results for author: Blasi, D E

Searching in archive cs.

  1. arXiv:2106.02289  [pdf, other]

    cs.CL

    Modeling the Unigram Distribution

    Authors: Irene Nikkarinen, Tiago Pimentel, Damián E. Blasi, Ryan Cotterell

    Abstract: The unigram distribution is the non-contextual probability of finding a specific word form in a corpus. While of central importance to the study of language, it is commonly approximated by each word's sample frequency in the corpus. This approach, being highly dependent on sample size, assigns zero probability to any out-of-vocabulary (OOV) word form. As a result, it produces negatively biased pro…

    Submitted 4 June, 2021; originally announced June 2021.

    Comments: Irene Nikkarinen and Tiago Pimentel contributed equally to this work. Accepted to the Findings of ACL 2021. Code available at https://github.com/irenenikk/modelling-unigram

  2. arXiv:2106.00877  [pdf, other]

    cs.CL

    Evaluating Word Embeddings with Categorical Modularity

    Authors: Sílvia Casacuberta, Karina Halevy, Damián E. Blasi

    Abstract: We introduce categorical modularity, a novel low-resource intrinsic metric to evaluate word embedding quality. Categorical modularity is a graph modularity metric based on the $k$-nearest neighbor graph constructed with embedding vectors of words from a fixed set of semantic categories, in which the goal is to measure the proportion of words that have nearest neighbors within the same categories.…

    Submitted 1 June, 2021; originally announced June 2021.

    Comments: Accepted to Findings of ACL 2021 (Long Paper)

  3. arXiv:1906.05906  [pdf, other]

    cs.CL

    Meaning to Form: Measuring Systematicity as Information

    Authors: Tiago Pimentel, Arya D. McCarthy, Damián E. Blasi, Brian Roark, Ryan Cotterell

    Abstract: A longstanding debate in semiotics centers on the relationship between linguistic signs and their corresponding semantics: is there an arbitrary relationship between a word form and its meaning, or does some systematic phenomenon pervade? For instance, does the character bigram \textit{gl} have any systematic relationship to the meaning of words like \textit{glisten}, \textit{gleam} and \textit{gl…

    Submitted 26 July, 2019; v1 submitted 13 June, 2019; originally announced June 2019.

    Comments: Accepted for publication at ACL 2019