-
Agro-STAY : Collecte de données et analyse des informations en agriculture alternative issues de YouTube [Agro-STAY: Collecting data and analyzing information on alternative agriculture from YouTube]
Authors:
Laura Maxim,
Julien Rabatel,
Jean-Marc Douguet,
Natalia Grabar,
Roberto Interdonato,
Sébastien Loustau,
Mathieu Roche,
Maguelonne Teisseire
Abstract:
In response to the current crises (climatic, social, economic), self-sufficiency, a set of practices that combine energy sobriety, self-production of food and energy, and self-construction, is attracting growing interest. The CNRS STAY project (Savoirs Techniques pour l'Auto-suffisance, sur YouTube) explores this topic by analyzing techniques shared on YouTube. We present Agro-STAY, a platform designed for the collection, processing, and visualization of data from YouTube videos and their comments. We use Natural Language Processing (NLP) techniques and language models, which enable a fine-grained analysis of the alternative agricultural practices described online.
Submitted 13 December, 2024;
originally announced December 2024.
-
Evaluation of Geographical Distortions in Language Models: A Crucial Step Towards Equitable Representations
Authors:
Rémy Decoupes,
Roberto Interdonato,
Mathieu Roche,
Maguelonne Teisseire,
Sarah Valentin
Abstract:
Language models are now essential tools for improving efficiency in many professional tasks such as writing, coding, or learning. For this reason, it is imperative to identify their inherent biases. In the field of Natural Language Processing, five sources of bias are well identified: data, annotation, representation, models, and research design. This study focuses on biases related to geographical knowledge. We explore the connection between geography and language models by highlighting their tendency to misrepresent spatial information, which leads to distortions in the representation of geographical distances. This study introduces four indicators that assess these distortions by comparing geographical and semantic distances. Experiments are conducted with these four indicators on ten widely used language models. The results underscore the need to inspect and rectify spatial biases in language models to ensure accurate and equitable representations.
Submitted 26 April, 2024;
originally announced April 2024.
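The abstract above does not define its four indicators, so the following is only a minimal sketch of the general idea of comparing geographical and semantic distances. The place coordinates are approximate, the three-dimensional "embeddings" are hypothetical toy vectors rather than outputs of any real language model, and the use of a Pearson correlation is an assumption made for illustration, not the authors' method.

```python
import math
from itertools import combinations


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))  # 6371 km: mean Earth radius


def cosine_distance(u, v):
    """1 - cosine similarity between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return 1.0 - dot / norm


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Approximate (lat, lon) coordinates of three cities.
coords = {
    "Paris": (48.8566, 2.3522),
    "London": (51.5074, -0.1278),
    "Tokyo": (35.6762, 139.6503),
}
# Hypothetical toy embeddings (assumption: a real study would take these
# from a language model's representation of each place name).
embeddings = {
    "Paris": [0.9, 0.1, 0.2],
    "London": [0.8, 0.3, 0.1],
    "Tokyo": [0.1, 0.9, 0.4],
}

pairs = list(combinations(coords, 2))
geo = [haversine_km(coords[a], coords[b]) for a, b in pairs]
sem = [cosine_distance(embeddings[a], embeddings[b]) for a, b in pairs]

# A correlation close to 1 would mean semantic distances track geographical ones.
r = pearson(geo, sem)
print(f"correlation between geographical and semantic distances: {r:.3f}")
```

A low or negative correlation on real embeddings would be one possible sign of the kind of distortion the abstract describes.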
-
An evaluation framework for comparing epidemic intelligence systems
Authors:
Nejat Arinik,
Roberto Interdonato,
Mathieu Roche,
Maguelonne Teisseire
Abstract:
In the context of Epidemic Intelligence, many Event-Based Surveillance (EBS) systems have been proposed in the literature to promote the early identification and characterization of potential health threats from online sources of any nature. Each EBS system has its own surveillance definitions and priorities, which makes selecting the most appropriate EBS system for a given situation a challenge for end users. In this work, we propose a new evaluation framework to address this issue. It first transforms the raw input epidemiological event data into a set of normalized events at multiple granularities, then conducts a descriptive retrospective analysis based on four evaluation objectives: spatial, temporal, thematic and source analysis. We illustrate its relevance by applying it to an Avian Influenza dataset collected by a selection of EBS systems, and show how our framework makes it possible to identify their strengths and drawbacks in terms of epidemic surveillance.
Submitted 30 March, 2023;
originally announced March 2023.
-
Methodology for identifying study sites in scientific corpus
Authors:
Eric Kergosien,
Marie-Noëlle Bessagnet,
Maguelonne Teisseire,
Joachim Schöpfel,
Mohammad Amin Farvardin,
Stéphane Chaudiron,
Bernard Jacquemin,
Annig Le Parc-Lacayrelle,
Mathieu Roche,
Christian Sallaberry,
Jean-Philippe Tonneau,
Marie-Noelle Bessagnet,
Amin Farvardin,
Annig Lacayrelle
Abstract:
The TERRE-ISTEX project aims to identify the evolution of research work in relation to study areas, disciplinary crossings and concrete research methods, based on the heterogeneous digital content available in scientific corpora. The project is divided into three main actions: (1) to identify the periods and places which have been the subject of empirical studies, as reflected in the publications of the analyzed corpus, (2) to identify the themes addressed in these works and (3) to develop a web-based geographical information retrieval (GIR) tool. The first two actions involve approaches combining Natural Language Processing patterns with text mining methods. By crossing the three dimensions (spatial, thematic and temporal) in a GIR engine, it will be possible to understand what research has been carried out on which territories and at what time. In the project, the experiments are carried out on a heterogeneous corpus including electronic theses and scientific articles from the ISTEX digital libraries and the CIRAD research center.
Submitted 13 August, 2018;
originally announced August 2018.
-
Automatic Identification of Research Fields in Scientific Papers
Authors:
Eric Kergosien,
Amin Farvardin,
Maguelonne Teisseire,
Marie-Noëlle Bessagnet,
Joachim Schöpfel,
Stéphane Chaudiron,
Bernard Jacquemin,
Annig Le Parc-Lacayrelle,
Mathieu Roche,
Christian Sallaberry,
Jean-Philippe Tonneau
Abstract:
The TERRE-ISTEX project aims to identify scientific research dealing with specific geographical areas, based on heterogeneous digital content available in scientific papers. The project is divided into three main work packages: (1) identification of the periods and places of empirical studies, as reflected in the publications drawn from the analyzed text samples, (2) identification of the themes which appear in these documents, and (3) development of a web-based geographical information retrieval (GIR) tool. The first two work packages combine Natural Language Processing patterns with text mining methods. The integration of the spatial, thematic and temporal dimensions in a GIR tool contributes to a better understanding of what kind of research has been carried out, of its topics and of its geographical and historical coverage. Another original feature of the TERRE-ISTEX project is the heterogeneous character of the corpus, which includes PhD theses and scientific articles from the ISTEX digital libraries and the CIRAD research center.
Submitted 8 June, 2018;
originally announced June 2018.
-
GeT_Move: An Efficient and Unifying Spatio-Temporal Pattern Mining Algorithm for Moving Objects
Authors:
Phan Nhat Hai,
Pascal Poncelet,
Maguelonne Teisseire
Abstract:
Recent improvements in positioning technology have led to a much wider availability of massive moving object data. A crucial task is to find the moving objects that travel together. Usually, these object sets are called spatio-temporal patterns. Due to the emergence of many different kinds of spatio-temporal patterns in recent years, different approaches have been proposed to extract them. However, each approach only focuses on mining a specific kind of pattern. Mining and managing patterns is therefore both painstaking, because of the large number of algorithms involved, and time consuming. Moreover, these algorithms have to be executed again whenever new data are added to the existing database. To address these issues, we first redefine spatio-temporal patterns in the itemset context. Secondly, we propose a unifying approach, named GeT_Move, which uses a frequent closed itemset-based spatio-temporal pattern-mining algorithm to mine and manage different spatio-temporal patterns. GeT_Move is implemented in two versions: GeT_Move and Incremental GeT_Move. To optimize efficiency and to remove the need for parameter setting, we also propose a Parameter Free Incremental GeT_Move algorithm. Comprehensive experiments are performed on real datasets as well as large synthetic datasets to demonstrate the effectiveness and efficiency of our approaches.
Submitted 4 April, 2012;
originally announced April 2012.
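The abstract above mentions recasting spatio-temporal patterns in the itemset context and mining frequent closed itemsets. The sketch below is not GeT_Move itself: it is a brute-force illustration of frequent closed itemset mining on a tiny example. The encoding (each moving object is a transaction whose items are hypothetical time-stamped cluster labels such as `t1:c1`) and the function name `closed_itemsets` are assumptions made for illustration.

```python
from itertools import combinations

# Hypothetical toy input: for each moving object, the set of time-stamped
# clusters it was observed in (for example "t1:c1" = cluster c1 at time t1).
object_clusters = {
    "o1": {"t1:c1", "t2:c3", "t3:c5"},
    "o2": {"t1:c1", "t2:c3", "t3:c5"},
    "o3": {"t1:c1", "t2:c4", "t3:c5"},
    "o4": {"t1:c2", "t2:c4", "t3:c6"},
}


def closed_itemsets(transactions, min_support):
    """Brute-force frequent closed itemsets.

    The intersection of the item sets of any group of transactions is a
    closed itemset, so enumerating groups of at least `min_support`
    transactions yields every non-empty frequent closed itemset.
    This is exponential in the number of transactions: toy inputs only.
    Returns a dict mapping each closed itemset to the objects supporting it.
    """
    ids = sorted(transactions)
    closed = {}
    for size in range(min_support, len(ids) + 1):
        for group in combinations(ids, size):
            items = frozenset(set.intersection(*(transactions[o] for o in group)))
            if not items:
                continue
            # Full support: every object whose clusters include the itemset.
            support = frozenset(o for o in ids if items <= transactions[o])
            closed[items] = support
    return closed


patterns = closed_itemsets(object_clusters, min_support=2)
for items, objects in sorted(patterns.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(objects), "share", sorted(items))
```

Under this encoding, each closed itemset pairs a maximal set of shared clusters with the group of objects that passed through all of them, which is one way to read "objects that travel together".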
-
Low temperature reflectivity study of ZnO/(Zn,Mg)O quantum wells grown on M-plane ZnO substrates
Authors:
Luc Beaur,
Thierry Bretagnon,
Christelle Brimont,
Thierry Guillet,
Bernard Gil,
Dimitri Tainoff,
M. Teisseire,
J. M. Chauveau
Abstract:
We report the growth of high-quality ZnO/Zn$_{0.8}$Mg$_{0.2}$O quantum wells on M-plane oriented ZnO substrates. The optical properties of these quantum wells are studied using reflectance spectroscopy. The optical spectra reveal strong in-plane optical anisotropies, as predicted by group theory, and marked reflectance structures, which are evidence of good interface morphologies. Signatures of confined excitons built from the spin-orbit split-off valence band, the analog of exciton C in bulk ZnO, are detected in normal-incidence reflectivity experiments using a photon polarized along the c axis of the wurtzite lattice. Experiments performed with an orthogonal photon polarization, at $90^{\circ}$ to this axis, reveal confined states that are analogs of the A and B bulk excitons. Envelope function calculations which include the excitonic interaction account well for the experimental results.
Submitted 17 October, 2011; v1 submitted 10 January, 2011;
originally announced January 2011.