Showing 1–2 of 2 results for author: Janes, J

Searching in archive cs.
  1. arXiv:2411.10068

    cs.CV

    Diachronic Document Dataset for Semantic Layout Analysis

    Authors: Thibault Clérice, Juliette Janes, Hugo Scheithauer, Sarah Bénière, Florian Cafiero, Laurent Romary, Simon Gabay, Benoît Sagot

    Abstract: We present a novel, open-access dataset designed for semantic layout analysis, built to support document recreation workflows through mapping with the Text Encoding Initiative (TEI) standard. This dataset includes 7,254 annotated pages spanning a large temporal range (1600-2024) of digitised and born-digital materials across diverse document types (magazines, papers from sciences and humanities, P…

    Submitted 15 November, 2024; originally announced November 2024.

  2. arXiv:2408.04554

    cs.CL

    Molyé: A Corpus-based Approach to Language Contact in Colonial France

    Authors: Rasul Dent, Juliette Janès, Thibault Clérice, Pedro Ortiz Suarez, Benoît Sagot

    Abstract: Whether or not several Creole languages which developed during the early modern period can be considered genetic descendants of European languages has been the subject of intense debate. This is in large part due to the absence of evidence of intermediate forms. This work introduces a new open corpus, the Molyé corpus, which combines stereotypical representations of three kinds of language variati…

    Submitted 8 August, 2024; originally announced August 2024.

    Comments: 8 main pages and 3 pages of references