
Showing 1–2 of 2 results for author: Goldin, G

Searching in archive cs.
  1. arXiv:2407.20581

    cs.CL

    Knesset-DictaBERT: A Hebrew Language Model for Parliamentary Proceedings

    Authors: Gili Goldin, Shuly Wintner

    Abstract: We present Knesset-DictaBERT, a large Hebrew language model fine-tuned on the Knesset Corpus, which comprises Israeli parliamentary proceedings. The model is based on the DictaBERT architecture and demonstrates significant improvements in understanding parliamentary language according to the MLM task. We provide a detailed evaluation of the model's performance, showing improvements in perplexity a…

    Submitted 30 July, 2024; originally announced July 2024.

    Comments: 3 pages, 1 table

    MSC Class: 68T50

  2. The Knesset Corpus: An Annotated Corpus of Hebrew Parliamentary Proceedings

    Authors: Gili Goldin, Nick Howell, Noam Ordan, Ella Rabinovich, Shuly Wintner

    Abstract: We present the Knesset Corpus, a corpus of Hebrew parliamentary proceedings containing over 30 million sentences (over 384 million tokens) from all the (plenary and committee) protocols held in the Israeli parliament between 1998 and 2022. Sentences are annotated with morpho-syntactic information and are associated with detailed meta-information reflecting demographic and political properties of t…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: 28 pages, 7 figures

    MSC Class: 68T50; ACM Class: I.2.7