Showing 1–9 of 9 results for author: Lorré, J

Searching in archive cs.
  1. arXiv:2504.02604  [pdf, other]

    cs.CL cs.SD eess.AS

    LinTO Audio and Textual Datasets to Train and Evaluate Automatic Speech Recognition in Tunisian Arabic Dialect

    Authors: Hedi Naouara, Jean-Pierre Lorré, Jérôme Louradour

    Abstract: Developing Automatic Speech Recognition (ASR) systems for Tunisian Arabic Dialect is challenging due to the dialect's linguistic complexity and the scarcity of annotated speech datasets. To address these challenges, we propose the LinTO audio and textual datasets -- comprehensive resources that capture phonological and lexical features of Tunisian Arabic Dialect. These datasets include a variety o…

    Submitted 3 April, 2025; originally announced April 2025.

  2. arXiv:2503.12294  [pdf, other]

    cs.CL cs.AI

    The Lucie-7B LLM and the Lucie Training Dataset: Open resources for multilingual language generation

    Authors: Olivier Gouvert, Julie Hunter, Jérôme Louradour, Christophe Cerisara, Evan Dufraisse, Yaya Sy, Laura Rivière, Jean-Pierre Lorré, OpenLLM-France community

    Abstract: We present both the Lucie Training Dataset and the Lucie-7B foundation model. The Lucie Training Dataset is a multilingual collection of textual corpora centered around French and designed to offset anglo-centric biases found in many datasets for large language model pretraining. Its French data is pulled not only from traditional web sources, but also from French cultural heritage documents, fill…

    Submitted 15 March, 2025; originally announced March 2025.

  3. arXiv:2311.16840  [pdf, ps, other]

    cs.CL cs.AI

    The Claire French Dialogue Dataset

    Authors: Julie Hunter, Jérôme Louradour, Virgile Rennard, Ismaïl Harrando, Guokan Shang, Jean-Pierre Lorré

    Abstract: We present the Claire French Dialogue Dataset (CFDD), a resource created by members of LINAGORA Labs in the context of the OpenLLM France initiative. CFDD is a corpus containing roughly 160 million words from transcripts and stage plays in French that we have assembled and publicly released in an effort to further the development of multilingual, open source language models. This paper describes t…

    Submitted 28 November, 2023; originally announced November 2023.

  4. arXiv:2004.02913  [pdf, other]

    cs.CL cs.LG

    Speaker-change Aware CRF for Dialogue Act Classification

    Authors: Guokan Shang, Antoine Jean-Pierre Tixier, Michalis Vazirgiannis, Jean-Pierre Lorré

    Abstract: Recent work in Dialogue Act (DA) classification approaches the task as a sequence labeling problem, using neural network models coupled with a Conditional Random Field (CRF) as the last layer. CRF models the conditional probability of the target DA label sequence given the input utterance sequence. However, the task involves another important input sequence, that of speakers, which is ignored by p…

    Submitted 24 June, 2023; v1 submitted 6 April, 2020; originally announced April 2020.

    Comments: typo fix: argmin -> argmax

  5. arXiv:1907.09334  [pdf]

    cs.HC

    LinTO : Assistant vocal open-source respectueux des données personnelles pour les réunions d'entreprise

    Authors: Jean-Pierre Lorré, Isabelle Ferrané, Francisco Madrigal, Michalis Vazirgiannis, Christophe Bourguignat

    Abstract: This paper presents the first results of the PIA "Grands Défis du Numérique" research project LinTO. The goal of this project is to develop a conversational assistant to help the company's employees, particularly during meetings. LinTO is an interactive device equipped with microphones, a screen and a 360$^\circ$ camera, which allows to control the room, query company's information system, helps f…

    Submitted 17 July, 2019; originally announced July 2019.

    Comments: in French. Applications Pratiques de l'Intelligence Artificielle, Jul 2019, Toulouse, France

  6. arXiv:1904.09491  [pdf, other]

    cs.CL cs.LG

    Energy-based Self-attentive Learning of Abstractive Communities for Spoken Language Understanding

    Authors: Guokan Shang, Antoine Jean-Pierre Tixier, Michalis Vazirgiannis, Jean-Pierre Lorré

    Abstract: Abstractive community detection is an important spoken language understanding task, whose goal is to group utterances in a conversation according to whether they can be jointly summarized by a common abstractive sentence. This paper provides a novel approach to this task. We first introduce a neural contextual utterance encoder featuring three types of self-attention mechanisms. We then train it u…

    Submitted 7 November, 2019; v1 submitted 20 April, 2019; originally announced April 2019.

    Comments: Update baselines

  7. arXiv:1805.05271  [pdf, other]

    cs.CL

    Unsupervised Abstractive Meeting Summarization with Multi-Sentence Compression and Budgeted Submodular Maximization

    Authors: Guokan Shang, Wensi Ding, Zekun Zhang, Antoine Jean-Pierre Tixier, Polykarpos Meladianos, Michalis Vazirgiannis, Jean-Pierre Lorré

    Abstract: We introduce a novel graph-based framework for abstractive meeting speech summarization that is fully unsupervised and does not rely on any annotations. Our work combines the strengths of multiple recent approaches while addressing their weaknesses. Moreover, we leverage recent advances in word embeddings and graph degeneracy applied to NLP to take exterior semantic knowledge into account, and to…

    Submitted 14 November, 2018; v1 submitted 14 May, 2018; originally announced May 2018.

    Comments: Published as a long paper at ACL 2018. v2: updated Figure 3

  8. Knowledge-based system for collaborative process specification

    Authors: Frederick Benaben, Vatcharaphun Rajsiri, Jean-Pierre Lorré, Hervé Pingaud

    Abstract: This paper presents an ontology-based approach for the design of a collaborative business process model (CBP). This CBP is considered as a specification of needs in order to build a collaboration information system (CIS) for a network of organisations. The study is a part of a model driven engineering approach of the CIS in a specific enterprise interoperability framework that will be summarised.…

    Submitted 30 September, 2015; originally announced September 2015.

    Comments: <10.1016/j.compind.2009.10.012>

    Journal ref: Computers and Industrial Engineering, Elsevier, 2010, 61 (2), pp.161-175

  9. arXiv:1509.09067  [pdf]

    cs.SE

    Semantic issues in model-driven management of information system interoperability

    Authors: Frederick Benaben, Nicolas Boissel-Dallier, Hervé Pingaud, Jean-Pierre Lorré

    Abstract: The MISE Project (Mediation Information System Engineering) aims at providing collaborating organizations with a Mediation Information System (MIS) in charge of supporting interoperability of a collaborative network. MISE proposes an overall MIS design method according to a model-driven approach, based on model transformations. This MIS is in charge of managing (i) information, (ii) functions and…

    Submitted 30 September, 2015; originally announced September 2015.

    Comments: http://www.tandfonline.com/toc/tcim20/current

    Journal ref: International Journal of Computer Integrated Manufacturing (IJCIM), 2013, 26 (11), pp.1042-1053