-
Two-way automata and transducers with planar behaviours are aperiodic
Authors:
Lê Thành Dũng Nguyên,
Camille Noûs,
Cécilia Pradic
Abstract:
We consider a notion of planarity for two-way finite automata and transducers, inspired by Temperley-Lieb monoids of planar diagrams. We show that this restriction captures star-free languages and first-order transductions.
Submitted 20 July, 2023;
originally announced July 2023.
-
The Domino problem is undecidable on every rhombus subshift
Authors:
Benjamin Hellouin de Menibus,
Victor H. Lutfalla,
Camille Noûs
Abstract:
We extend the classical Domino problem to any tiling of rhombus-shaped tiles. For any subshift X of edge-to-edge rhombus tilings, such as the Penrose subshift, we prove that the associated X-Domino problem is $Π^0_1$-hard and therefore undecidable. It is $Π^0_1$-complete when the subshift X is given by a computable sequence of forbidden patterns.
Submitted 23 February, 2023;
originally announced February 2023.
-
Rumor Classification through a Multimodal Fusion Framework and Ensemble Learning
Authors:
Abderrazek Azri,
Cécile Favre,
Nouria Harbi,
Jérôme Darmont,
Camille Noûs
Abstract:
The proliferation of rumors on social media has become a major concern due to its ability to create a devastating impact. Manually assessing the veracity of social media messages is a very time-consuming task that machine learning can greatly assist. Most message veracity verification methods only exploit textual contents and metadata. Very few take both textual and visual contents, and more particularly images, into account. Moreover, prior works have used many classical machine learning models to detect rumors. However, although recent studies have proven the effectiveness of ensemble machine learning approaches, such models have seldom been applied. Thus, in this paper, we propose a set of advanced image features that are inspired by the field of image quality assessment, and introduce the Multimodal fusiON framework to assess message veracIty in social neTwORks (MONITOR), which exploits all message features by exploring various machine learning models. Moreover, we demonstrate the effectiveness of ensemble learning algorithms for rumor detection by using five metalearning models. Finally, we conduct extensive experiments on two real-world datasets. Results show that MONITOR outperforms state-of-the-art machine learning baselines and that ensemble models significantly increase MONITOR's performance.
Submitted 4 January, 2023;
originally announced February 2023.
-
Some Notes on Polyadic Concept Analysis
Authors:
Alexandre Bazin,
Giacomo Kahn,
Camille Noûs
Abstract:
Despite the popularity of Formal Concept Analysis (FCA) as a mathematical framework for data analysis, some of its extensions are still considered arcane. Polyadic Concept Analysis (PCA) is one of the most promising yet understudied of these extensions. This formalism offers many interesting open questions but is hindered in its dissemination by complex notations and a lack of agreed-upon basic definitions. In this paper, we discuss in a mostly informal way the fundamental differences between FCA and PCA in the relation between contexts, conceptual structures, and rules. We identify open questions, present partial results on the maximal size of concept n-lattices and suggest new research directions.
Submitted 28 March, 2022;
originally announced March 2022.
-
Minimizing subject-dependent calibration for BCI with Riemannian transfer learning
Authors:
Salim Khazem,
Sylvain Chevallier,
Quentin Barthélemy,
Karim Haroun,
Camille Noûs
Abstract:
Calibration is still an important issue for user experience in Brain-Computer Interfaces (BCI). Common experimental designs often involve a lengthy training period that increases cognitive fatigue before the user even starts using the BCI. Reducing or suppressing this subject-dependent calibration is possible by relying on advanced machine learning techniques, such as transfer learning. Building on Riemannian BCI, we present a simple and effective scheme to train a classifier on data recorded from different subjects, to reduce the calibration while preserving good performance. The main novelty of this paper is to propose a unique approach that could be applied to very different paradigms. To demonstrate the robustness of this approach, we conducted a meta-analysis on multiple datasets for three BCI paradigms: event-related potentials (P300), motor imagery and SSVEP. Relying on the MOABB open source framework to ensure the reproducibility of the experiments and the statistical analysis, the results clearly show that the proposed approach could be applied to any kind of BCI paradigm and, in most cases, significantly improves classifier reliability. We point out some key features to further improve transfer learning methods.
Submitted 23 November, 2021;
originally announced November 2021.
-
Calling to CNN-LSTM for Rumor Detection: A Deep Multi-channel Model for Message Veracity Classification in Microblogs
Authors:
Abderrazek Azri,
Cécile Favre,
Nouria Harbi,
Jérôme Darmont,
Camille Noûs
Abstract:
Known for their low-cost, easily accessible, real-time and valuable information, social media also wildly spread unverified or fake news. Rumors can notably cause severe damage to individuals and society. Therefore, rumor detection on social media has recently attracted tremendous attention. Most rumor detection approaches focus on rumor feature analysis and social features, i.e., metadata in social media. Unfortunately, these features are data-specific and may not always be available, e.g., when the rumor has just popped up and not yet propagated. In contrast, post contents (including images or videos) play an important role and can indicate the diffusion purpose of a rumor. Furthermore, rumor classification is also closely related to opinion mining and sentiment analysis. Yet, to the best of our knowledge, exploiting images and sentiments has been little investigated. Considering the available multimodal features from microblogs, notably, we propose in this paper an end-to-end model called deepMONITOR that is based on deep neural networks and allows quite accurate automated rumor verification, by utilizing all three characteristics: post textual and image contents, as well as sentiment. deepMONITOR concatenates image features with the joint text and sentiment features to produce a reliable, fused classification. We conduct extensive experiments on two large-scale, real-world datasets. The results show that deepMONITOR achieves a higher accuracy than state-of-the-art methods.
Submitted 11 October, 2021;
originally announced October 2021.
-
MONITOR: A Multimodal Fusion Framework to Assess Message Veracity in Social Networks
Authors:
Abderrazek Azri,
Cécile Favre,
Nouria Harbi,
Jérôme Darmont,
Camille Noûs
Abstract:
Users of social networks tend to post and share content with little restraint. Hence, rumors and fake news can quickly spread on a huge scale. This may pose a threat to the credibility of social media and can cause serious consequences in real life. Therefore, the task of rumor detection and verification has become extremely important. Assessing the veracity of a social media message (e.g., by fact checkers) involves analyzing the text of the message, its context and any multimedia attachment. This is a very time-consuming task that machine learning can greatly assist. In the literature, most message veracity verification methods only exploit textual contents and metadata. Very few take both textual and visual contents, and more particularly images, into account. In this paper, we second the hypothesis that exploiting all of the components of a social media post enhances the accuracy of veracity detection. To further the state of the art, we first propose using a set of advanced image features that are inspired by the field of image quality assessment, which effectively contributes to rumor detection. These metrics are good indicators for the detection of fake images, even for those generated by advanced techniques like generative adversarial networks (GANs). Then, we introduce the Multimodal fusiON framework to assess message veracIty in social neTwORks (MONITOR), which exploits all message features (i.e., text, social context, and image features) by supervised machine learning. Such algorithms provide interpretability and explainability in the decisions taken, which we believe is particularly important in the context of rumor verification. Experimental results show that MONITOR can detect rumors with an accuracy of 96% and 89% on the MediaEval benchmark and the FakeNewsNet dataset, respectively. These results are significantly better than those of state-of-the-art machine learning baselines.
Submitted 6 September, 2021;
originally announced September 2021.
-
Joint Management and Analysis of Textual Documents and Tabular Data within the AUDAL Data Lake
Authors:
Pegdwendé Sawadogo,
Jérôme Darmont,
Camille Noûs
Abstract:
In 2010, the concept of data lake emerged as an alternative to data warehouses for big data management. Data lakes follow a schema-on-read approach to provide rich and flexible analyses. However, although trendy in both industry and academia, the concept of data lake is still maturing, and there are still few methodological approaches to data lake design. Thus, we introduce a new approach to design a data lake and propose an extensive metadata system to activate richer features than those usually supported in data lake approaches. We implement our approach in the AUDAL data lake, where we jointly exploit both textual documents and tabular data, in contrast with structured and/or semi-structured data typically processed in data lakes from the literature. Furthermore, we also innovate by leveraging metadata to activate both data retrieval and content analysis, including Text-OLAP and SQL querying. Finally, we show the feasibility of our approach using a real-world use case on the one hand, and a benchmark on the other hand.
Submitted 3 September, 2021;
originally announced September 2021.
-
ArchaeoDAL: A Data Lake for Archaeological Data Management and Analytics
Authors:
Pengfei Liu,
Sabine Loudcher,
Jérôme Darmont,
Camille Noûs
Abstract:
With new emerging technologies, such as satellites and drones, archaeologists collect data over large areas. However, it becomes difficult to process such data in time. Archaeological data also have many different formats (images, texts, sensor data) and can be structured, semi-structured and unstructured. Such variety makes data difficult to collect, store, manage, search and analyze effectively. A few approaches have been proposed, but none of them covers the full data lifecycle nor provides an efficient data management system. Hence, we propose the use of a data lake to provide centralized data stores to host heterogeneous data, as well as tools for data quality checking, cleaning, transformation, and analysis. In this paper, we propose a generic, flexible and complete data lake architecture. Our metadata management system exploits goldMEDAL, which is the most complete metadata model currently available. Finally, we detail the concrete implementation of this architecture dedicated to an archaeological project.
Submitted 23 July, 2021;
originally announced July 2021.
-
goldMEDAL : une nouvelle contribution à la modélisation générique des métadonnées des lacs de données
Authors:
Etienne Scholly,
Pegdwendé Sawadogo,
Pengfei Liu,
Javier Espinosa-Oviedo,
Cécile Favre,
Sabine Loudcher,
Jérôme Darmont,
Camille Noûs
Abstract:
We summarize here a paper published in 2021 at the DOLAP international workshop, associated with the EDBT and ICDT conferences. We propose goldMEDAL, a generic metadata model for data lakes based on four concepts and three modeling levels: conceptual, logical and physical.
Submitted 5 July, 2021;
originally announced July 2021.
-
Comparison-free polyregular functions
Authors:
Lê Thành Dũng Tito Nguyên,
Camille Noûs,
Cécilia Pradic
Abstract:
This paper introduces a new automata-theoretic class of string-to-string functions with polynomial growth. Several equivalent definitions are provided: a machine model which is a restricted variant of pebble transducers, and a few inductive definitions that close the class of regular functions under certain operations. Our motivation for studying this class comes from another characterization, which we merely mention here but prove elsewhere, based on a $λ$-calculus with a linear type system. As their name suggests, these comparison-free polyregular functions form a subclass of polyregular functions; we prove that the inclusion is strict. We also show that they are incomparable with HDT0L transductions, closed under usual function composition -- but not under a certain "map" combinator -- and satisfy a comparison-free version of the pebble minimization theorem. On the broader topic of polynomial growth transductions, we also consider the recently introduced layered streaming string transducers (SSTs), or equivalently k-marble transducers. We prove that a function can be obtained by composing such transducers together if and only if it is polyregular, and that k-layered SSTs (or k-marble transducers) are closed under "map" and equivalent to a corresponding notion of (k+1)-layered HDT0L systems.
Submitted 22 February, 2023; v1 submitted 18 May, 2021;
originally announced May 2021.
-
Coining goldMEDAL: A New Contribution to Data Lake Generic Metadata Modeling
Authors:
Etienne Scholly,
Pegdwendé Sawadogo,
Pengfei Liu,
Javier Alfonso Espinosa-Oviedo,
Cécile Favre,
Sabine Loudcher,
Jérôme Darmont,
Camille Noûs
Abstract:
The rise of big data has revolutionized data exploitation practices and led to the emergence of new concepts. Among them, data lakes have emerged as large heterogeneous data repositories that can be analyzed by various methods. An efficient data lake requires a metadata system that addresses the many problems arising when dealing with big data. In consequence, the study of data lake metadata models is currently an active research topic and many proposals have been made in this regard. However, existing metadata models are either tailored for a specific use case or insufficiently generic to manage different types of data lakes, including our previous model MEDAL. In this paper, we generalize MEDAL's concepts in a new metadata model called goldMEDAL. Moreover, we compare goldMEDAL with the most recent state-of-the-art metadata models aiming at genericity and show that we can reproduce these metadata models with goldMEDAL's concepts. As a proof of concept, we also illustrate that goldMEDAL allows the design of various data lakes by presenting three different use cases.
Submitted 24 March, 2021;
originally announced March 2021.
-
RIGOLETTO -- RIemannian GeOmetry LEarning: applicaTion To cOnnectivity. A contribution to the Clinical BCI Challenge -- WCCI2020
Authors:
Marie-Constance Corsi,
Florian Yger,
Sylvain Chevallier,
Camille Noûs
Abstract:
This short technical report describes the approach submitted to the Clinical BCI Challenge-WCCI2020. This submission aims to classify motor imagery tasks from EEG signals and relies on Riemannian Geometry, with a twist. Instead of using the classical covariance matrices, we also rely on measures of functional connectivity. Our approach ranked 1st on task 1 of the competition.
Submitted 11 March, 2021; v1 submitted 9 February, 2021;
originally announced February 2021.
-
Data Lakes for Digital Humanities
Authors:
Jérôme Darmont,
Cécile Favre,
Sabine Loudcher,
Camille Noûs
Abstract:
Traditional data in Digital Humanities projects bear various formats (structured, semi-structured, textual) and need substantial transformations (encoding and tagging, stemming, lemmatization, etc.) to be managed and analyzed. To fully master this process, we propose the use of data lakes as a solution to data siloing and big data variety problems. We describe data lake projects we currently run in close collaboration with researchers in humanities and social sciences and discuss the lessons learned running these projects.
Submitted 4 December, 2020;
originally announced December 2020.
-
Implicit automata in typed $λ$-calculi II: streaming transducers vs categorical semantics
Authors:
Lê Thành Dũng Nguyên,
Camille Noûs,
Cécilia Pradic
Abstract:
We characterize regular string transductions as programs in a linear $λ$-calculus with additives. One direction of this equivalence is proved by encoding copyless streaming string transducers (SSTs), which compute regular functions, into our $λ$-calculus. For the converse, we consider a categorical framework for defining automata and transducers over words, which allows us to relate register updates in SSTs to the semantics of the linear $λ$-calculus in a suitable monoidal closed category. To illustrate the relevance of monoidal closure to automata theory, we also leverage this notion to give abstract generalizations of the arguments showing that copyless SSTs may be determinized and that the composition of two regular functions may be implemented by a copyless SST. Our main result is then generalized from strings to trees using a similar approach. In doing so, we exhibit a connection between a feature of streaming tree transducers and the multiplicative/additive distinction of linear logic.
Keywords: MSO transductions, implicit complexity, Dialectica categories, Church encodings
Submitted 25 August, 2021; v1 submitted 3 August, 2020;
originally announced August 2020.
-
Symbolic coding of linear complexity for generic translations of the torus, using continued fractions
Authors:
N. Pytheas Fogg,
C. Noûs
Abstract:
In this paper, we prove that almost every translation of $\mathbb{T}^2$ admits a symbolic coding which has linear complexity $2n+1$. The partitions are constructed with Rauzy fractals associated with sequences of substitutions, which are produced by a particular extended continued fraction algorithm in projective dimension $2$. More generally, in dimension $d\geq 1$, we study extended measured continued fraction algorithms, which associate to each direction a subshift generated by substitutions, called an $S$-adic subshift. We give some conditions which imply the existence, for almost every direction, of a translation of the torus $\mathbb{T}^d$ and a nice generating partition, such that the associated coding is a conjugacy with the subshift.
Submitted 25 May, 2020;
originally announced May 2020.
-
#P-completeness of counting update digraphs, cacti, and a series-parallel decomposition method
Authors:
Camille Noûs,
Kévin Perrot,
Sylvain Sené,
Lucas Venturini
Abstract:
Automata networks are a very general model of interacting entities, with applications to biological phenomena such as gene regulation. In many contexts, the order in which entities update their state is unknown, and the dynamics may be very sensitive to changes in this schedule of updates. Since the works of Aracena et al., it is known that update digraphs are pertinent objects to study non-equivalent block-sequential update schedules. We prove that counting the number of equivalence classes, which is a tight upper bound on the synchronism sensitivity of a given network, is #P-complete. The problem is nevertheless computable in quasi-quadratic time for oriented cacti, and for oriented series-parallel graphs thanks to a decomposition method.
Submitted 5 April, 2020;
originally announced April 2020.