Showing 1–6 of 6 results for author: Bareket, D

  1. Do Pretrained Contextual Language Models Distinguish between Hebrew Homograph Analyses?

    Authors: Avi Shmidman, Cheyn Shmuel Shmidman, Dan Bareket, Moshe Koppel, Reut Tsarfaty

    Abstract: Semitic morphologically-rich languages (MRLs) are characterized by extreme word ambiguity. Because most vowels are omitted in standard texts, many of the words are homographs with multiple possible analyses, each with a different pronunciation and different morphosyntactic properties. This ambiguity goes beyond word-sense disambiguation (WSD), and may include token segmentation into multiple word…

    Submitted 11 May, 2024; originally announced May 2024.

    Journal ref: In Proceedings of EACL 2023, 849-864 (2023)

  2. arXiv:2305.05302

    cs.CL

    The Perfect Victim: Computational Analysis of Judicial Attitudes towards Victims of Sexual Violence

    Authors: Eliya Habba, Renana Keydar, Dan Bareket, Gabriel Stanovsky

    Abstract: We develop computational models to analyze court statements in order to assess judicial attitudes toward victims of sexual violence in the Israeli court system. The study examines the resonance of "rape myths" in the criminal justice system's response to sex crimes, in particular in judicial assessment of victim's credibility. We begin by formulating an ontology for evaluating judicial attitudes t…

    Submitted 9 May, 2023; originally announced May 2023.

  3. arXiv:2211.15199

    cs.CL

    Large Pre-Trained Models with Extra-Large Vocabularies: A Contrastive Analysis of Hebrew BERT Models and a New One to Outperform Them All

    Authors: Eylon Gueta, Avi Shmidman, Shaltiel Shmidman, Cheyn Shmuel Shmidman, Joshua Guedalia, Moshe Koppel, Dan Bareket, Amit Seker, Reut Tsarfaty

    Abstract: We present a new pre-trained language model (PLM) for modern Hebrew, termed AlephBERTGimmel, which employs a much larger vocabulary (128K items) than standard Hebrew PLMs before. We perform a contrastive analysis of this model against all previous Hebrew PLMs (mBERT, heBERT, AlephBERT) and assess the effects of larger vocabularies on task performance. Our experiments show that larger vocabularies…

    Submitted 15 May, 2023; v1 submitted 28 November, 2022; originally announced November 2022.

  4. arXiv:2104.04052

    cs.CL

    AlephBERT: A Hebrew Large Pre-Trained Language Model to Start-off your Hebrew NLP Application With

    Authors: Amit Seker, Elron Bandel, Dan Bareket, Idan Brusilovsky, Refael Shaked Greenfeld, Reut Tsarfaty

    Abstract: Large Pre-trained Language Models (PLMs) have become ubiquitous in the development of language understanding technology and lie at the heart of many artificial intelligence advances. While advances reported for English using PLMs are unprecedented, reported advances using PLMs in Hebrew are few and far between. The problem is twofold. First, Hebrew resources available for training NLP models are n…

    Submitted 8 April, 2021; originally announced April 2021.

  5. Neural Modeling for Named Entities and Morphology (NEMO^2)

    Authors: Dan Bareket, Reut Tsarfaty

    Abstract: Named Entity Recognition (NER) is a fundamental NLP task, commonly formulated as classification over a sequence of tokens. Morphologically-Rich Languages (MRLs) pose a challenge to this basic formulation, as the boundaries of Named Entities do not necessarily coincide with token boundaries, rather, they respect morphological boundaries. To address NER in MRLs we then need to answer two fundamental…

    Submitted 10 May, 2021; v1 submitted 30 July, 2020; originally announced July 2020.

    Comments: Accepted to TACL. This is a pre-MIT Press publication version

  6. arXiv:2005.01330

    cs.CL

    From SPMRL to NMRL: What Did We Learn (and Unlearn) in a Decade of Parsing Morphologically-Rich Languages (MRLs)?

    Authors: Reut Tsarfaty, Dan Bareket, Stav Klein, Amit Seker

    Abstract: It has been exactly a decade since the first establishment of SPMRL, a research initiative unifying multiple research efforts to address the peculiar challenges of Statistical Parsing for Morphologically-Rich Languages (MRLs). Here we reflect on parsing MRLs in that decade, highlight the solutions and lessons learned for the architectural, modeling and lexical challenges in the pre-neural era, and…

    Submitted 4 May, 2020; originally announced May 2020.