
Showing 1–2 of 2 results for author: Samo, G

Searching in archive cs.
  1. arXiv:2409.06622

    cs.CL

    Exploring Italian sentence embeddings properties through multi-tasking

    Authors: Vivi Nastase, Giuseppe Samo, Chunyang Jiang, Paola Merlo

    Abstract: We investigate to what degree existing LLMs encode abstract linguistic information in Italian in a multi-task setting. We exploit curated synthetic data on a large scale -- several Blackbird Language Matrices (BLMs) problems in Italian -- and use them to study how sentence representations built using pre-trained language models encode specific syntactic and semantic information. We use a two-level…

    Submitted 29 November, 2024; v1 submitted 10 September, 2024; originally announced September 2024.

    Comments: 11 pages, 6 figures, 4 tables

    MSC Class: 68T50

    ACM Class: I.2.7

  2. arXiv:2409.06567

    cs.CL

    Exploring syntactic information in sentence embeddings through multilingual subject-verb agreement

    Authors: Vivi Nastase, Chunyang Jiang, Giuseppe Samo, Paola Merlo

    Abstract: In this paper, our goal is to investigate to what degree multilingual pretrained language models capture cross-linguistically valid abstract linguistic representations. We take the approach of developing curated synthetic data on a large scale, with specific properties, and using them to study sentence representations built using pretrained language models. We use a new multiple-choice task and da…

    Submitted 29 November, 2024; v1 submitted 10 September, 2024; originally announced September 2024.

    Comments: 13 pages, 5 tables, 6 figures

    MSC Class: 68T50

    ACM Class: I.2.7