-
Pseudo-likelihood produces associative memories able to generalize, even for asymmetric couplings
Authors:
Francesco D'Amico,
Dario Bocchi,
Luca Maria Del Bono,
Saverio Rossi,
Matteo Negri
Abstract:
Energy-based probabilistic models learned by maximizing the likelihood of the data are limited by the intractability of the partition function. A widely used workaround is to maximize the pseudo-likelihood, which replaces the global normalization with tractable local normalizations. Here we show that, in the zero-temperature limit, a network trained to maximize pseudo-likelihood naturally implemen…
▽ More
Energy-based probabilistic models learned by maximizing the likelihood of the data are limited by the intractability of the partition function. A widely used workaround is to maximize the pseudo-likelihood, which replaces the global normalization with tractable local normalizations. Here we show that, in the zero-temperature limit, a network trained to maximize pseudo-likelihood naturally implements an associative memory: if the training set is small, patterns become fixed-point attractors whose basins of attraction exceed those of any classical Hopfield rule. We explain quantitatively this effect on uncorrelated random patterns. Moreover, we show that, for different structured datasets coming from computer science (random feature model, MNIST), physics (spin glasses) and biology (proteins), as the number of training examples increases the learned network goes beyond memorization, developing meaningful attractors with non-trivial correlations with test examples, thus showing the ability to generalize. Our results therefore reveal pseudo-likelihood works both as an efficient inference tool and as a principled mechanism for memory and generalization.
△ Less
Submitted 7 July, 2025;
originally announced July 2025.
-
Identity resolution of software metadata using Large Language Models
Authors:
Eva Martín del Pico,
Josep Lluís Gelpí,
Salvador Capella-Gutiérrez
Abstract:
Software is an essential component of research. However, little attention has been paid to it compared with that paid to research data. Recently, there has been an increase in efforts to acknowledge and highlight the importance of software in research activities.
Structured metadata from platforms like bio.tools, Bioconductor, and Galaxy ToolShed offers valuable insights into research software i…
▽ More
Software is an essential component of research. However, little attention has been paid to it compared with that paid to research data. Recently, there has been an increase in efforts to acknowledge and highlight the importance of software in research activities.
Structured metadata from platforms like bio.tools, Bioconductor, and Galaxy ToolShed offers valuable insights into research software in the Life Sciences. Although originally intended to support discovery and integration, this metadata can be repurposed for large-scale analysis of software practices. However, its quality and completeness vary across platforms, reflecting diverse documentation practices.
To gain a comprehensive view of software development and sustainability, consolidating this metadata is necessary, but requires robust mechanisms to address its heterogeneity and scale.
This article presents an evaluation of instruction-tuned large language models for the task of software metadata identity resolution, a critical step in assembling a cohesive collection of research software. Such a collection is the reference component for the Software Observatory at OpenEBench, a platform that aggregates metadata to monitor the FAIRness of research software in the Life Sciences.
We benchmarked multiple models against a human-annotated gold standard, examined their behavior on ambiguous cases, and introduced an agreement-based proxy for high-confidence automated decisions. The proxy achieved high precision and statistical robustness, while also highlighting the limitations of current models and the broader challenges of automating semantic judgment in FAIR-aligned software metadata across registries and repositories.
△ Less
Submitted 29 May, 2025;
originally announced May 2025.
-
On the performance of machine-learning-assisted Monte Carlo in sampling from simple statistical physics models
Authors:
Luca Maria Del Bono,
Federico Ricci-Tersenghi,
Francesco Zamponi
Abstract:
Recent years have seen a rise in the application of machine learning techniques to aid the simulation of hard-to-sample systems that cannot be studied using traditional methods. Despite the introduction of many different architectures and procedures, a wide theoretical understanding is still lacking, with the risk of suboptimal implementations. As a first step to address this gap, we provide here…
▽ More
Recent years have seen a rise in the application of machine learning techniques to aid the simulation of hard-to-sample systems that cannot be studied using traditional methods. Despite the introduction of many different architectures and procedures, a wide theoretical understanding is still lacking, with the risk of suboptimal implementations. As a first step to address this gap, we provide here a complete analytic study of the widely-used Sequential Tempering procedure applied to a shallow MADE architecture for the Curie-Weiss model. The contribution of this work is twofold: firstly, we give a description of the optimal weights and of the training under Gradient Descent optimization. Secondly, we compare what happens in Sequential Tempering with and without the addition of local Metropolis Monte Carlo steps. We are thus able to give theoretical predictions on the best procedure to apply in this case. This work establishes a clear theoretical basis for the integration of machine learning techniques into Monte Carlo sampling and optimization.
△ Less
Submitted 15 June, 2025; v1 submitted 28 May, 2025;
originally announced May 2025.
-
GPU acceleration of non-equilibrium Green's function calculation using OpenACC and CUDA FORTRAN
Authors:
Jia Yin,
Khaled Z. Ibrahim,
Mauro Del Ben,
Jack Deslippe,
Yang-hao Chan,
Chao Yang
Abstract:
The numerical solution of the Kadanoff-Baym nonlinear integro-differential equations, which yields the non-equilibrium Green's functions (NEGFs) of quantum many-body systems, poses significant computational challenges due to its high computational complexity. In this work, we present efficient implementations of a numerical method for solving these equations on distributed-memory architectures, in…
▽ More
The numerical solution of the Kadanoff-Baym nonlinear integro-differential equations, which yields the non-equilibrium Green's functions (NEGFs) of quantum many-body systems, poses significant computational challenges due to its high computational complexity. In this work, we present efficient implementations of a numerical method for solving these equations on distributed-memory architectures, including many-core CPUs and multi-GPU systems. For CPU-based platforms, we adopt a hybrid MPI/OpenMP programming model to exploit both inter-node and intra-node parallelism. On GPU-accelerated systems, we implement the method using two distinct approaches: MPI/OpenACC and MPI/CUDA FORTRAN. Several optimization strategies are employed to enhance GPU performance, including techniques to maximize computational resource utilization and minimize the overhead associated with kernel launches and memory management. Although OpenACC is easy to use, CUDA FORTRAN provides more advanced features for configuring and managing multiple levels of concurrency, while also simplifying memory allocation and data movement between host and device. This flexibility translates into significant performance improvements. We compare the performance of the three implementations and demonstrate that the GPU-based approaches achieve substantial speedups over CPU-based implementations. Furthermore, both CPU and GPU versions exhibit excellent strong and weak scaling, confirming the scalability and efficiency of our approach for large-scale NEGF computations.
△ Less
Submitted 25 May, 2025;
originally announced May 2025.
-
I'll believe it when I see it: Images increase misinformation sharing in Vision-Language Models
Authors:
Alice Plebe,
Timothy Douglas,
Diana Riazi,
R. Maria del Rio-Chanona
Abstract:
Large language models are increasingly integrated into news recommendation systems, raising concerns about their role in spreading misinformation. In humans, visual content is known to boost credibility and shareability of information, yet its effect on vision-language models (VLMs) remains unclear. We present the first study examining how images influence VLMs' propensity to reshare news content,…
▽ More
Large language models are increasingly integrated into news recommendation systems, raising concerns about their role in spreading misinformation. In humans, visual content is known to boost credibility and shareability of information, yet its effect on vision-language models (VLMs) remains unclear. We present the first study examining how images influence VLMs' propensity to reshare news content, whether this effect varies across model families, and how persona conditioning and content attributes modulate this behavior. To support this analysis, we introduce two methodological contributions: a jailbreaking-inspired prompting strategy that elicits resharing decisions from VLMs while simulating users with antisocial traits and political alignments; and a multimodal dataset of fact-checked political news from PolitiFact, paired with corresponding images and ground-truth veracity labels. Experiments across model families reveal that image presence increases resharing rates by 4.8% for true news and 15.0% for false news. Persona conditioning further modulates this effect: Dark Triad traits amplify resharing of false news, whereas Republican-aligned profiles exhibit reduced veracity sensitivity. Of all the tested models, only Claude-3-Haiku demonstrates robustness to visual misinformation. These findings highlight emerging risks in multimodal model behavior and motivate the development of tailored evaluation frameworks and mitigation strategies for personalized AI systems. Code and dataset are available at: https://github.com/3lis/misinfo_vlm
△ Less
Submitted 19 May, 2025;
originally announced May 2025.
-
Can Generative AI agents behave like humans? Evidence from laboratory market experiments
Authors:
R. Maria del Rio-Chanona,
Marco Pangallo,
Cars Hommes
Abstract:
We explore the potential of Large Language Models (LLMs) to replicate human behavior in economic market experiments. Compared to previous studies, we focus on dynamic feedback between LLM agents: the decisions of each LLM impact the market price at the current step, and so affect the decisions of the other LLMs at the next step. We compare LLM behavior to market dynamics observed in laboratory set…
▽ More
We explore the potential of Large Language Models (LLMs) to replicate human behavior in economic market experiments. Compared to previous studies, we focus on dynamic feedback between LLM agents: the decisions of each LLM impact the market price at the current step, and so affect the decisions of the other LLMs at the next step. We compare LLM behavior to market dynamics observed in laboratory settings and assess their alignment with human participants' behavior. Our findings indicate that LLMs do not adhere strictly to rational expectations, displaying instead bounded rationality, similarly to human participants. Providing a minimal context window i.e. memory of three previous time steps, combined with a high variability setting capturing response heterogeneity, allows LLMs to replicate broad trends seen in human experiments, such as the distinction between positive and negative feedback markets. However, differences remain at a granular level--LLMs exhibit less heterogeneity in behavior than humans. These results suggest that LLMs hold promise as tools for simulating realistic human behavior in economic contexts, though further research is needed to refine their accuracy and increase behavioral diversity.
△ Less
Submitted 12 May, 2025;
originally announced May 2025.
-
Diffusion Models for conditional MRI generation
Authors:
Miguel Herencia García del Castillo,
Ricardo Moya Garcia,
Manuel Jesús Cerezo Mazón,
Ekaitz Arriola Garcia,
Pablo Menéndez Fernández-Miranda
Abstract:
In this article, we present a Latent Diffusion Model (LDM) for the generation of brain Magnetic Resonance Imaging (MRI), conditioning its generation based on pathology (Healthy, Glioblastoma, Sclerosis, Dementia) and acquisition modality (T1w, T1ce, T2w, Flair, PD).
To evaluate the quality of the generated images, the Fréchet Inception Distance (FID) and Multi-Scale Structural Similarity Index (…
▽ More
In this article, we present a Latent Diffusion Model (LDM) for the generation of brain Magnetic Resonance Imaging (MRI), conditioning its generation based on pathology (Healthy, Glioblastoma, Sclerosis, Dementia) and acquisition modality (T1w, T1ce, T2w, Flair, PD).
To evaluate the quality of the generated images, the Fréchet Inception Distance (FID) and Multi-Scale Structural Similarity Index (MS-SSIM) metrics were employed. The results indicate that the model generates images with a distribution similar to real ones, maintaining a balance between visual fidelity and diversity. Additionally, the model demonstrates extrapolation capability, enabling the generation of configurations that were not present in the training data.
The results validate the potential of the model to increase in the number of samples in clinical datasets, balancing underrepresented classes, and evaluating AI models in medicine, contributing to the development of diagnostic tools in radiology without compromising patient privacy.
△ Less
Submitted 25 February, 2025;
originally announced February 2025.
-
From Tools to Teammates: Evaluating LLMs in Multi-Session Coding Interactions
Authors:
Nathanaël Carraz Rakotonirina,
Mohammed Hamdy,
Jon Ander Campos,
Lucas Weber,
Alberto Testoni,
Marzieh Fadaee,
Sandro Pezzelle,
Marco Del Tredici
Abstract:
Large Language Models (LLMs) are increasingly used in working environments for a wide range of tasks, excelling at solving individual problems in isolation. However, are they also able to effectively collaborate over long-term interactions? To investigate this, we introduce MemoryCode, a synthetic multi-session dataset designed to test LLMs' ability to track and execute simple coding instructions…
▽ More
Large Language Models (LLMs) are increasingly used in working environments for a wide range of tasks, excelling at solving individual problems in isolation. However, are they also able to effectively collaborate over long-term interactions? To investigate this, we introduce MemoryCode, a synthetic multi-session dataset designed to test LLMs' ability to track and execute simple coding instructions amid irrelevant information, simulating a realistic setting. While all the models we tested handle isolated instructions well, even the performance of state-of-the-art models like GPT-4o deteriorates when instructions are spread across sessions. Our analysis suggests this is due to their failure to retrieve and integrate information over long instruction chains. Our results highlight a fundamental limitation of current LLMs, restricting their ability to collaborate effectively in long interactions.
△ Less
Submitted 6 June, 2025; v1 submitted 19 February, 2025;
originally announced February 2025.
-
Towards Safer Chatbots: A Framework for Policy Compliance Evaluation of Custom GPTs
Authors:
David Rodriguez,
William Seymour,
Jose M. Del Alamo,
Jose Such
Abstract:
Large Language Models (LLMs) have gained unprecedented prominence, achieving widespread adoption across diverse domains and integrating deeply into society. The capability to fine-tune general-purpose LLMs, such as Generative Pre-trained Transformers (GPT), for specific tasks has facilitated the emergence of numerous Custom GPTs. These tailored models are increasingly made available through dedica…
▽ More
Large Language Models (LLMs) have gained unprecedented prominence, achieving widespread adoption across diverse domains and integrating deeply into society. The capability to fine-tune general-purpose LLMs, such as Generative Pre-trained Transformers (GPT), for specific tasks has facilitated the emergence of numerous Custom GPTs. These tailored models are increasingly made available through dedicated marketplaces, such as OpenAI's GPT Store. However, their black-box nature introduces significant safety and compliance risks. In this work, we present a scalable framework for the automated evaluation of Custom GPTs against OpenAI's usage policies, which define the permissible behaviors of these systems. Our framework integrates three core components: (1) automated discovery and data collection of models from the GPT store, (2) a red-teaming prompt generator tailored to specific policy categories and the characteristics of each target GPT, and (3) an LLM-as-a-judge technique to analyze each prompt-response pair for potential policy violations. We validate our framework with a manually annotated ground truth, and evaluate it through a large-scale study with 782 Custom GPTs across three categories: Romantic, Cybersecurity, and Academic GPTs. Our manual annotation process achieved an F1 score of 0.975 in identifying policy violations, confirming the reliability of the framework's assessments. The results reveal that 58.7% of the analyzed models exhibit indications of non-compliance, exposing weaknesses in the GPT store's review and approval processes. Furthermore, our findings indicate that a model's popularity does not correlate with compliance, and non-compliance issues largely stem from behaviors inherited from base models rather than user-driven customizations. We believe this approach is extendable to other chatbot platforms and policy domains, improving LLM-based systems safety.
△ Less
Submitted 14 April, 2025; v1 submitted 3 February, 2025;
originally announced February 2025.
-
Real-Time Brain Tumor Detection in Intraoperative Ultrasound Using YOLO11: From Model Training to Deployment in the Operating Room
Authors:
Santiago Cepeda,
Olga Esteban-Sinovas,
Roberto Romero,
Vikas Singh,
Prakash Shetty,
Aliasgar Moiyadi,
Ilyess Zemmoura,
Giuseppe Roberto Giammalva,
Massimiliano Del Bene,
Arianna Barbotti,
Francesco DiMeco,
Timothy R. West,
Brian V. Nahed,
Ignacio Arrese,
Roberto Hornero,
Rosario Sarabia
Abstract:
Intraoperative ultrasound (ioUS) is a valuable tool in brain tumor surgery due to its versatility, affordability, and seamless integration into the surgical workflow. However, its adoption remains limited, primarily because of the challenges associated with image interpretation and the steep learning curve required for effective use. This study aimed to enhance the interpretability of ioUS images…
▽ More
Intraoperative ultrasound (ioUS) is a valuable tool in brain tumor surgery due to its versatility, affordability, and seamless integration into the surgical workflow. However, its adoption remains limited, primarily because of the challenges associated with image interpretation and the steep learning curve required for effective use. This study aimed to enhance the interpretability of ioUS images by developing a real-time brain tumor detection system deployable in the operating room. We collected 2D ioUS images from the Brain Tumor Intraoperative Database (BraTioUS) and the public ReMIND dataset, annotated with expert-refined tumor labels. Using the YOLO11 architecture and its variants, we trained object detection models to identify brain tumors. The dataset included 1,732 images from 192 patients, divided into training, validation, and test sets. Data augmentation expanded the training set to 11,570 images. In the test dataset, YOLO11s achieved the best balance of precision and computational efficiency, with a mAP@50 of 0.95, mAP@50-95 of 0.65, and a processing speed of 34.16 frames per second. The proposed solution was prospectively validated in a cohort of 15 consecutively operated patients diagnosed with brain tumors. Neurosurgeons confirmed its seamless integration into the surgical workflow, with real-time predictions accurately delineating tumor regions. These findings highlight the potential of real-time object detection algorithms to enhance ioUS-guided brain tumor surgery, addressing key challenges in interpretation and providing a foundation for future development of computer vision-based tools for neuro-oncological surgery.
△ Less
Submitted 27 January, 2025;
originally announced January 2025.
-
Humanity's Last Exam
Authors:
Long Phan,
Alice Gatti,
Ziwen Han,
Nathaniel Li,
Josephina Hu,
Hugh Zhang,
Chen Bo Calvin Zhang,
Mohamed Shaaban,
John Ling,
Sean Shi,
Michael Choi,
Anish Agrawal,
Arnav Chopra,
Adam Khoja,
Ryan Kim,
Richard Ren,
Jason Hausenloy,
Oliver Zhang,
Mantas Mazeika,
Dmitry Dodonov,
Tung Nguyen,
Jaeho Lee,
Daron Anderson,
Mikhail Doroshenko,
Alun Cennyth Stokes
, et al. (1084 additional authors not shown)
Abstract:
Benchmarks are important tools for tracking the rapid advancements in large language model (LLM) capabilities. However, benchmarks are not keeping pace in difficulty: LLMs now achieve over 90\% accuracy on popular benchmarks like MMLU, limiting informed measurement of state-of-the-art LLM capabilities. In response, we introduce Humanity's Last Exam (HLE), a multi-modal benchmark at the frontier of…
▽ More
Benchmarks are important tools for tracking the rapid advancements in large language model (LLM) capabilities. However, benchmarks are not keeping pace in difficulty: LLMs now achieve over 90\% accuracy on popular benchmarks like MMLU, limiting informed measurement of state-of-the-art LLM capabilities. In response, we introduce Humanity's Last Exam (HLE), a multi-modal benchmark at the frontier of human knowledge, designed to be the final closed-ended academic benchmark of its kind with broad subject coverage. HLE consists of 2,500 questions across dozens of subjects, including mathematics, humanities, and the natural sciences. HLE is developed globally by subject-matter experts and consists of multiple-choice and short-answer questions suitable for automated grading. Each question has a known solution that is unambiguous and easily verifiable, but cannot be quickly answered via internet retrieval. State-of-the-art LLMs demonstrate low accuracy and calibration on HLE, highlighting a significant gap between current LLM capabilities and the expert human frontier on closed-ended academic questions. To inform research and policymaking upon a clear understanding of model capabilities, we publicly release HLE at https://lastexam.ai.
△ Less
Submitted 19 April, 2025; v1 submitted 24 January, 2025;
originally announced January 2025.
-
Addressing Multilabel Imbalance with an Efficiency-Focused Approach Using Diffusion Model-Generated Synthetic Samples
Authors:
Francisco Charte,
Miguel Ángel Dávila,
María Dolores Pérez-Godoy,
María José del Jesus
Abstract:
Predictive models trained on imbalanced data tend to produce biased results. This problem is exacerbated when there is not just one output label, but a set of them. This is the case for multilabel learning (MLL) algorithms used to classify patterns, rank labels, or learn the distribution of outputs. Many solutions have been proposed in the literature. The one that can be applied universally, indep…
▽ More
Predictive models trained on imbalanced data tend to produce biased results. This problem is exacerbated when there is not just one output label, but a set of them. This is the case for multilabel learning (MLL) algorithms used to classify patterns, rank labels, or learn the distribution of outputs. Many solutions have been proposed in the literature. The one that can be applied universally, independent of the algorithm used to build the model, is data resampling. The generation of new instances associated with minority labels, so that empty areas of the feature space are filled, helps to improve the obtained models. The quality of these new instances depends on the algorithm used to generate them. In this paper, a diffusion model tailored to produce new instances for MLL data, called MLDM (\textit{MultiLabel Diffusion Model}), is proposed. Diffusion models have been mainly used to generate artificial images and videos. Our proposed MLDM is based on this type of models. The experiments conducted compare MLDM with several other MLL resampling algorithms. The results show that MLDM is competitive while it improves efficiency.
△ Less
Submitted 18 January, 2025;
originally announced January 2025.
-
Responsible AI Governance: A Response to UN Interim Report on Governing AI for Humanity
Authors:
Sarah Kiden,
Bernd Stahl,
Beverley Townsend,
Carsten Maple,
Charles Vincent,
Fraser Sampson,
Geoff Gilbert,
Helen Smith,
Jayati Deshmukh,
Jen Ross,
Jennifer Williams,
Jesus Martinez del Rincon,
Justyna Lisinska,
Karen O'Shea,
Márjory Da Costa Abreu,
Nelly Bencomo,
Oishi Deb,
Peter Winter,
Phoebe Li,
Philip Torr,
Pin Lean Lau,
Raquel Iniesta,
Gopal Ramchurn,
Sebastian Stein,
Vahid Yazdanpanah
Abstract:
This report presents a comprehensive response to the United Nation's Interim Report on Governing Artificial Intelligence (AI) for Humanity. It emphasizes the transformative potential of AI in achieving the Sustainable Development Goals (SDGs) while acknowledging the need for robust governance to mitigate associated risks. The response highlights opportunities for promoting equitable, secure, and i…
▽ More
This report presents a comprehensive response to the United Nation's Interim Report on Governing Artificial Intelligence (AI) for Humanity. It emphasizes the transformative potential of AI in achieving the Sustainable Development Goals (SDGs) while acknowledging the need for robust governance to mitigate associated risks. The response highlights opportunities for promoting equitable, secure, and inclusive AI ecosystems, which should be supported by investments in infrastructure and multi-stakeholder collaborations across jurisdictions. It also underscores challenges, including societal inequalities exacerbated by AI, ethical concerns, and environmental impacts. Recommendations advocate for legally binding norms, transparency, and multi-layered data governance models, alongside fostering AI literacy and capacity-building initiatives. Internationally, the report calls for harmonising AI governance frameworks with established laws, human rights standards, and regulatory approaches. The report concludes with actionable principles for fostering responsible AI governance through collaboration among governments, industry, academia, and civil society, ensuring the development of AI aligns with universal human values and the public good.
△ Less
Submitted 31 December, 2024; v1 submitted 29 November, 2024;
originally announced December 2024.
-
The NetMob2024 Dataset: Population Density and OD Matrices from Four LMIC Countries
Authors:
Wenlan Zhang,
Miguel Nunez del Prado,
Vincent Gauthier,
Sveta Milusheva
Abstract:
The NetMob24 dataset offers a unique opportunity for researchers from a range of academic fields to access comprehensive spatiotemporal data sets spanning four countries (India, Mexico, Indonesia, and Colombia) over the course of two years (2019 and 2020). This dataset, developed in collaboration with Cuebiq (Also referred to as Spectus), comprises privacy-preserving aggregated data sets derived f…
▽ More
The NetMob24 dataset offers a unique opportunity for researchers from a range of academic fields to access comprehensive spatiotemporal data sets spanning four countries (India, Mexico, Indonesia, and Colombia) over the course of two years (2019 and 2020). This dataset, developed in collaboration with Cuebiq (Also referred to as Spectus), comprises privacy-preserving aggregated data sets derived from mobile application (app) data collected from users who have voluntarily consented to anonymous data collection for research purposes. It is our hope that this reference dataset will foster the production of new research methods and the reproducibility of research outcomes.
△ Less
Submitted 2 October, 2024; v1 submitted 1 October, 2024;
originally announced October 2024.
-
Quantification of stylistic differences in human- and ASR-produced transcripts of African American English
Authors:
Annika Heuser,
Tyler Kendall,
Miguel del Rio,
Quinten McNamara,
Nishchal Bhandari,
Corey Miller,
Migüel Jetté
Abstract:
Common measures of accuracy used to assess the performance of automatic speech recognition (ASR) systems, as well as human transcribers, conflate multiple sources of error. Stylistic differences, such as verbatim vs non-verbatim, can play a significant role in ASR performance evaluation when differences exist between training and test datasets. The problem is compounded for speech from underrepres…
▽ More
Common measures of accuracy used to assess the performance of automatic speech recognition (ASR) systems, as well as human transcribers, conflate multiple sources of error. Stylistic differences, such as verbatim vs non-verbatim, can play a significant role in ASR performance evaluation when differences exist between training and test datasets. The problem is compounded for speech from underrepresented varieties, where the speech to orthography mapping is not as standardized. We categorize the kinds of stylistic differences between 6 transcription versions, 4 human- and 2 ASR-produced, of 10 hours of African American English (AAE) speech. Focusing on verbatim features and AAE morphosyntactic features, we investigate the interactions of these categories with how well transcripts can be compared via word error rate (WER). The results, and overall analysis, help clarify how ASR outputs are a function of the decisions made by the training data's human transcribers.
△ Less
Submitted 4 September, 2024;
originally announced September 2024.
-
UnLearning from Experience to Avoid Spurious Correlations
Authors:
Jeff Mitchell,
Jesús Martínez del Rincón,
Niall McLaughlin
Abstract:
While deep neural networks can achieve state-of-the-art performance in many tasks, these models are more fragile than they appear. They are prone to learning spurious correlations in their training data, leading to surprising failure cases. In this paper, we propose a new approach that addresses the issue of spurious correlations: UnLearning from Experience (ULE). Our method is based on using two…
▽ More
While deep neural networks can achieve state-of-the-art performance in many tasks, these models are more fragile than they appear. They are prone to learning spurious correlations in their training data, leading to surprising failure cases. In this paper, we propose a new approach that addresses the issue of spurious correlations: UnLearning from Experience (ULE). Our method is based on using two classification models trained in parallel: student and teacher models. Both models receive the same batches of training data. The student model is trained with no constraints and pursues the spurious correlations in the data. The teacher model is trained to solve the same classification problem while avoiding the mistakes of the student model. As training is done in parallel, the better the student model learns the spurious correlations, the more robust the teacher model becomes. The teacher model uses the gradient of the student's output with respect to its input to unlearn mistakes made by the student. We show that our method is effective on the Waterbirds, CelebA, Spawrious and UrbanCars datasets.
△ Less
Submitted 4 September, 2024;
originally announced September 2024.
-
Foundation Models for Music: A Survey
Authors:
Yinghao Ma,
Anders Øland,
Anton Ragni,
Bleiz MacSen Del Sette,
Charalampos Saitis,
Chris Donahue,
Chenghua Lin,
Christos Plachouras,
Emmanouil Benetos,
Elona Shatri,
Fabio Morreale,
Ge Zhang,
György Fazekas,
Gus Xia,
Huan Zhang,
Ilaria Manco,
Jiawen Huang,
Julien Guinot,
Liwei Lin,
Luca Marinelli,
Max W. Y. Lam,
Megha Sharma,
Qiuqiang Kong,
Roger B. Dannenberg,
Ruibin Yuan
, et al. (17 additional authors not shown)
Abstract:
In recent years, foundation models (FMs) such as large language models (LLMs) and latent diffusion models (LDMs) have profoundly impacted diverse sectors, including music. This comprehensive review examines state-of-the-art (SOTA) pre-trained models and foundation models in music, spanning from representation learning, generative learning and multimodal learning. We first contextualise the signifi…
▽ More
In recent years, foundation models (FMs) such as large language models (LLMs) and latent diffusion models (LDMs) have profoundly impacted diverse sectors, including music. This comprehensive review examines state-of-the-art (SOTA) pre-trained models and foundation models in music, spanning from representation learning, generative learning and multimodal learning. We first contextualise the significance of music in various industries and trace the evolution of AI in music. By delineating the modalities targeted by foundation models, we discover many of the music representations are underexplored in FM development. Then, emphasis is placed on the lack of versatility of previous methods on diverse music applications, along with the potential of FMs in music understanding, generation and medical application. By comprehensively exploring the details of the model pre-training paradigm, architectural choices, tokenisation, finetuning methodologies and controllability, we emphasise the important topics that should have been well explored, like instruction tuning and in-context learning, scaling law and emergent ability, as well as long-sequence modelling etc. A dedicated section presents insights into music agents, accompanied by a thorough analysis of datasets and evaluations essential for pre-training and downstream tasks. Finally, by underscoring the vital importance of ethical considerations, we advocate that following research on FM for music should focus more on such issues as interpretability, transparency, human responsibility, and copyright issues. The paper offers insights into future challenges and trends on FMs for music, aiming to shape the trajectory of human-AI collaboration in the music realm.
△ Less
Submitted 3 September, 2024; v1 submitted 26 August, 2024;
originally announced August 2024.
-
Flexible Multi-Dimensional FFTs for Plane Wave Density Functional Theory Codes
Authors:
Doru Thom Popovici,
Mauro del Ben,
Osni Marques,
Andrew Canning
Abstract:
Multi-dimensional Fourier transforms are key mathematical building blocks that appear in a wide range of applications from materials science, physics, chemistry and even machine learning. Over the past years, a multitude of software packages targeting distributed multi-dimensional Fourier transforms have been developed. Most variants attempt to offer efficient implementations for single transforms…
▽ More
Multi-dimensional Fourier transforms are key mathematical building blocks that appear in a wide range of applications from materials science, physics, chemistry and even machine learning. Over the past years, a multitude of software packages targeting distributed multi-dimensional Fourier transforms have been developed. Most variants attempt to offer efficient implementations for single transforms applied on data mapped onto rectangular grids. However, not all scientific applications conform to this pattern, i.e. plane wave Density Functional Theory codes require multi-dimensional Fourier transforms applied on data represented as batches of spheres. Typically, the implementations for this use case are hand-coded and tailored for the requirements of each application. In this work, we present the Fastest Fourier Transform from Berkeley (FFTB) a distributed framework that offers flexible implementations for both regular/non-regular data grids and batched/non-batched transforms. We provide a flexible implementations with a user-friendly API that captures most of the use cases. Furthermore, we provide implementations for both CPU and GPU platforms, showing that our approach offers improved execution time and scalability on the HP Cray EX supercomputer. In addition, we outline the need for flexible implementations for different use cases of the software package.
△ Less
Submitted 8 June, 2024;
originally announced June 2024.
-
Large Language Models: A New Approach for Privacy Policy Analysis at Scale
Authors:
David Rodriguez,
Ian Yang,
Jose M. Del Alamo,
Norman Sadeh
Abstract:
The number and dynamic nature of web and mobile applications presents significant challenges for assessing their compliance with data protection laws. In this context, symbolic and statistical Natural Language Processing (NLP) techniques have been employed for the automated analysis of these systems' privacy policies. However, these techniques typically require labor-intensive and potentially erro…
▽ More
The number and dynamic nature of web and mobile applications presents significant challenges for assessing their compliance with data protection laws. In this context, symbolic and statistical Natural Language Processing (NLP) techniques have been employed for the automated analysis of these systems' privacy policies. However, these techniques typically require labor-intensive and potentially error-prone manually annotated datasets for training and validation. This research proposes the application of Large Language Models (LLMs) as an alternative for effectively and efficiently extracting privacy practices from privacy policies at scale. Particularly, we leverage well-known LLMs such as ChatGPT and Llama 2, and offer guidance on the optimal design of prompts, parameters, and models, incorporating advanced strategies such as few-shot learning. We further illustrate its capability to detect detailed and varied privacy practices accurately. Using several renowned datasets in the domain as a benchmark, our evaluation validates its exceptional performance, achieving an F1 score exceeding 93%. Besides, it does so with reduced costs, faster processing times, and fewer technical knowledge requirements. Consequently, we advocate for LLM-based solutions as a sound alternative to traditional NLP techniques for the automated analysis of privacy policies at scale.
△ Less
Submitted 31 May, 2024;
originally announced May 2024.
-
Mitigating Challenges of the Space Environment for Onboard Artificial Intelligence: Design Overview of the Imaging Payload on SpIRIT
Authors:
Miguel Ortiz del Castillo,
Jonathan Morgan,
Jack McRobbie,
Clint Therakam,
Zaher Joukhadar,
Robert Mearns,
Simon Barraclough,
Richard Sinnott,
Andrew Woods,
Chris Bayliss,
Kris Ehinger,
Ben Rubinstein,
James Bailey,
Airlie Chapman,
Michele Trenti
Abstract:
Artificial intelligence (AI) and autonomous edge computing in space are emerging areas of interest to augment capabilities of nanosatellites, where modern sensors generate orders of magnitude more data than can typically be transmitted to mission control. Here, we present the hardware and software design of an onboard AI subsystem hosted on SpIRIT. The system is optimised for on-board computer vis…
▽ More
Artificial intelligence (AI) and autonomous edge computing in space are emerging areas of interest to augment capabilities of nanosatellites, where modern sensors generate orders of magnitude more data than can typically be transmitted to mission control. Here, we present the hardware and software design of an onboard AI subsystem hosted on SpIRIT. The system is optimised for on-board computer vision experiments based on visible light and long wave infrared cameras. This paper highlights the key design choices made to maximise the robustness of the system in harsh space conditions, and their motivation relative to key mission requirements, such as limited compute resources, resilience to cosmic radiation, extreme temperature variations, distribution shifts, and very low transmission bandwidths. The payload, called Loris, consists of six visible light cameras, three infrared cameras, a camera control board and a Graphics Processing Unit (GPU) system-on-module. Loris enables the execution of AI models with on-orbit fine-tuning as well as a next-generation image compression algorithm, including progressive coding. This innovative approach not only enhances the data processing capabilities of nanosatellites but also lays the groundwork for broader applications to remote sensing from space.
△ Less
Submitted 12 April, 2024;
originally announced April 2024.
-
Test Time Training for Industrial Anomaly Segmentation
Authors:
Alex Costanzino,
Pierluigi Zama Ramirez,
Mirko Del Moro,
Agostino Aiezzo,
Giuseppe Lisanti,
Samuele Salti,
Luigi Di Stefano
Abstract:
Anomaly Detection and Segmentation (AD&S) is crucial for industrial quality control. While existing methods excel in generating anomaly scores for each pixel, practical applications require producing a binary segmentation to identify anomalies. Due to the absence of labeled anomalies in many real scenarios, standard practices binarize these maps based on some statistics derived from a validation s…
▽ More
Anomaly Detection and Segmentation (AD&S) is crucial for industrial quality control. While existing methods excel in generating anomaly scores for each pixel, practical applications require producing a binary segmentation to identify anomalies. Due to the absence of labeled anomalies in many real scenarios, standard practices binarize these maps based on some statistics derived from a validation set containing only nominal samples, resulting in poor segmentation performance. This paper addresses this problem by proposing a test time training strategy to improve the segmentation performance. Indeed, at test time, we can extract rich features directly from anomalous samples to train a classifier that can discriminate defects effectively. Our general approach can work downstream to any AD&S method that provides an anomaly score map as output, even in multimodal settings. We demonstrate the effectiveness of our approach over baselines through extensive experimentation and evaluation on MVTec AD and MVTec 3D-AD.
△ Less
Submitted 4 April, 2024;
originally announced April 2024.
-
Cost-Effective Methodology for Complex Tuning Searches in HPC: Navigating Interdependencies and Dimensionality
Authors:
Adrian Perez Dieguez,
Min Choi,
Mahmut Okyay,
Mauro Del Ben,
Bryan M. Wong,
Khaled Z. Ibrahim
Abstract:
Tuning searches are pivotal in High-Performance Computing (HPC), addressing complex optimization challenges in computational applications. The complexity arises not only from finely tuning parameters within routines but also potential interdependencies among them, rendering traditional optimization methods inefficient. Instead of scrutinizing interdependencies among parameters and routines, practi…
▽ More
Tuning searches are pivotal in High-Performance Computing (HPC), addressing complex optimization challenges in computational applications. The complexity arises not only from finely tuning parameters within routines but also potential interdependencies among them, rendering traditional optimization methods inefficient. Instead of scrutinizing interdependencies among parameters and routines, practitioners often face the dilemma of conducting independent tuning searches for each routine, thereby overlooking interdependence, or pursuing a more resource-intensive joint search for all routines. This decision is driven by the consideration that some interdependence analysis and high-dimensional decomposition techniques in literature may be prohibitively expensive in HPC tuning searches. Our methodology adapts and refines these methods to ensure computational feasibility while maximizing performance gains in real-world scenarios. Our methodology leverages a cost-effective interdependence analysis to decide whether to merge several tuning searches into a joint search or conduct orthogonal searches. Tested on synthetic functions with varying levels of parameter interdependence, our methodology efficiently explores the search space. In comparison to Bayesian-optimization-based full independent or fully joint searches, our methodology suggested an optimized breakdown of independent and merged searches that led to final configurations up to 8% more accurate, reducing the search time by up to 95%. When applied to GPU-offloaded Real-Time Time-Dependent Density Functional Theory (RT-TDDFT), an application in computational materials science that challenges modern HPC autotuners, our methodology achieved an effective tuning search. Its adaptability and efficiency extend beyond RT-TDDFT, making it valuable for related applications in HPC.
△ Less
Submitted 12 March, 2024;
originally announced March 2024.
-
WorkArena: How Capable Are Web Agents at Solving Common Knowledge Work Tasks?
Authors:
Alexandre Drouin,
Maxime Gasse,
Massimo Caccia,
Issam H. Laradji,
Manuel Del Verme,
Tom Marty,
Léo Boisvert,
Megh Thakkar,
Quentin Cappart,
David Vazquez,
Nicolas Chapados,
Alexandre Lacoste
Abstract:
We study the use of large language model-based agents for interacting with software via web browsers. Unlike prior work, we focus on measuring the agents' ability to perform tasks that span the typical daily work of knowledge workers utilizing enterprise software systems. To this end, we propose WorkArena, a remote-hosted benchmark of 33 tasks based on the widely-used ServiceNow platform. We also…
▽ More
We study the use of large language model-based agents for interacting with software via web browsers. Unlike prior work, we focus on measuring the agents' ability to perform tasks that span the typical daily work of knowledge workers utilizing enterprise software systems. To this end, we propose WorkArena, a remote-hosted benchmark of 33 tasks based on the widely-used ServiceNow platform. We also introduce BrowserGym, an environment for the design and evaluation of such agents, offering a rich set of actions as well as multimodal observations. Our empirical evaluation reveals that while current agents show promise on WorkArena, there remains a considerable gap towards achieving full task automation. Notably, our analysis uncovers a significant performance disparity between open and closed-source LLMs, highlighting a critical area for future exploration and development in the field.
△ Less
Submitted 23 July, 2024; v1 submitted 12 March, 2024;
originally announced March 2024.
-
Monitoring the Venice Lagoon: an IoT Cloud-Based Sensor Nerwork Approach
Authors:
Filippo Campagnaro,
Matin Ghalkhani,
Riccardo Tumiati,
Federico Marin,
Matteo Del Grande,
Alessandro Pozzebon,
Davide De Battisti,
Roberto Francescon,
Michele Zorzi
Abstract:
Monitoring the coastal area of the Venice Lagoon is of significant importance. While the impact of global warming is felt worldwide, coastal and littoral regions bear the brunt more prominently. These areas not only face the threat of rising sea levels but also contend with the escalating occurrence of seaquakes and floods. Additionally, the intricate ecosystems of rivers, seas, and lakes undergo…
▽ More
Monitoring the coastal area of the Venice Lagoon is of significant importance. While the impact of global warming is felt worldwide, coastal and littoral regions bear the brunt more prominently. These areas not only face the threat of rising sea levels but also contend with the escalating occurrence of seaquakes and floods. Additionally, the intricate ecosystems of rivers, seas, and lakes undergo profound transformations due to climate change and pollutants.
Employing devices like the SENSWICH floating wireless sensor presented in this article and similar measurement instruments proves invaluable to automate environmental monitoring, hence eliminating the need for manual sampling campaigns. The utilization of wireless measurement devices offers cost-effectiveness, real-time analysis, and a reduction in human resource requirements. Storing data in cloud services further enhances the ability to monitor parameter changes over extended time intervals.
In this article, we present an enhanced sensing device aimed at automating water quality assessment, while considering power consumption and reducing circuit complexity. Specifically, we will introduce the new schematic and circuit of SENSWICH which had changes in circuit and electronic aspects. Furthermore, we outline the methodology for aggregating data in a cloud service environment, such as Amazon Web Service (AWS), and using Grafana for visualization.
△ Less
Submitted 29 April, 2024; v1 submitted 11 March, 2024;
originally announced March 2024.
-
To Err Is Human, but Llamas Can Learn It Too
Authors:
Agnes Luhtaru,
Taido Purason,
Martin Vainikko,
Maksym Del,
Mark Fishel
Abstract:
This study explores enhancing grammatical error correction (GEC) through artificial error generation (AEG) using language models (LMs). Specifically, we fine-tune Llama 2-based LMs for error generation and find that this approach yields synthetic errors akin to human errors. Next, we train GEC Llama models with the help of these artificial errors and outperform previous state-of-the-art error corr…
▽ More
This study explores enhancing grammatical error correction (GEC) through artificial error generation (AEG) using language models (LMs). Specifically, we fine-tune Llama 2-based LMs for error generation and find that this approach yields synthetic errors akin to human errors. Next, we train GEC Llama models with the help of these artificial errors and outperform previous state-of-the-art error correction models, with gains ranging between 0.8 and 6 F0.5 points across all tested languages (German, Ukrainian, and Estonian). Moreover, we demonstrate that generating errors by fine-tuning smaller sequence-to-sequence models and prompting large commercial LMs (GPT-3.5 and GPT-4) also results in synthetic errors beneficially affecting error generation models.
△ Less
Submitted 4 October, 2024; v1 submitted 8 March, 2024;
originally announced March 2024.
-
Efficient anytime algorithms to solve the bi-objective Next Release Problem
Authors:
Miguel Ángel Domínguez-Ríos,
Francisco Chicano,
Enrique Alba,
Isabel María del Águila,
José del Sagrado
Abstract:
The Next Release Problem consists in selecting a subset of requirements to develop in the next release of a software product. The selection should be done in a way that maximizes the satisfaction of the stakeholders while the development cost is minimized and the constraints of the requirements are fulfilled. Recent works have solved the problem using exact methods based on Integer Linear Programm…
▽ More
The Next Release Problem consists in selecting a subset of requirements to develop in the next release of a software product. The selection should be done in a way that maximizes the satisfaction of the stakeholders while the development cost is minimized and the constraints of the requirements are fulfilled. Recent works have solved the problem using exact methods based on Integer Linear Programming. In practice, there is no need to compute all the efficient solutions of the problem; a well-spread set in the objective space is more convenient for the decision maker. The exact methods used in the past to find the complete Pareto front explore the objective space in a lexicographic order or use a weighted sum of the objectives to solve a single-objective problem, finding only supported solutions. In this work, we propose five new methods that maintain a well-spread set of solutions at any time during the search, so that the decision maker can stop the algorithm when a large enough set of solutions is found. The methods are called anytime due to this feature. They find both supported and non-supported solutions, and can complete the whole Pareto front if the time provided is long enough.
△ Less
Submitted 7 February, 2024;
originally announced February 2024.
-
Stability prediction of the software requirements specification
Authors:
J. del Sagrado,
I. M. del Águila
Abstract:
Complex decision-making is a prominent aspect of Requirements Engineering. This work presents the Bayesian network Requisites that predicts whether the requirements specification documents have to be revised. We show how to validate Requisites by means of metrics obtained from a large complex software project. Besides, this Bayesian network has been integrated into a software tool by defining a co…
▽ More
Complex decision-making is a prominent aspect of Requirements Engineering. This work presents the Bayesian network Requisites that predicts whether the requirements specification documents have to be revised. We show how to validate Requisites by means of metrics obtained from a large complex software project. Besides, this Bayesian network has been integrated into a software tool by defining a communication interface inside a multilayer architecture to add this a new decision making functionality. It provides requirements engineers a way of exploring the software requirement specification by combining requirement metrics and the probability values estimated by the Bayesian network.
△ Less
Submitted 23 January, 2024;
originally announced January 2024.
-
Assisted Requirements Selection by Clustering
Authors:
José del Sagrado,
Isabel M del Águila
Abstract:
Requirements selection is a decision-making process that enables project managers to focus on the deliverables that add most value to the project outcome. This task is performed to define which features or requirements will be developed in the next release. It is a complex multi-criteria decision process that has been focused by many research works because a balance between business profits and in…
▽ More
Requirements selection is a decision-making process that enables project managers to focus on the deliverables that add most value to the project outcome. This task is performed to define which features or requirements will be developed in the next release. It is a complex multi-criteria decision process that has been focused by many research works because a balance between business profits and investment is needed. The spectrum of prioritization techniques spans from simple and qualitative to elaborated analytic prioritization approaches that fall into the category of optimization algorithms. This work studies the combination of the qualitative MoSCoW method and cluster analysis for requirements selection. The feasibility of our methodology has been tested on three case studies (with 20, 50 and 100 requirements). In each of them, the requirements have been clustered, then the clustering configurations found have been evaluated using internal validation measures for the compactness, connectivity and separability of the clusters. The experimental results show the validity of clustering strategies for the identification of the core set of requirements for the software product, being the number of categories proposed by MoSCoW a good starting point in requirements prioritization and negotiation.
△ Less
Submitted 23 January, 2024;
originally announced January 2024.
-
PARDINUS: Weakly supervised discarding of photo-trapping empty images based on autoencoders
Authors:
David de la Rosa,
Antonio J Rivera,
María J del Jesus,
Francisco Charte
Abstract:
Photo-trapping cameras are widely employed for wildlife monitoring. Those cameras take photographs when motion is detected to capture images where animals appear. A significant portion of these images are empty - no wildlife appears in the image. Filtering out those images is not a trivial task since it requires hours of manual work from biologists. Therefore, there is a notable interest in automa…
▽ More
Photo-trapping cameras are widely employed for wildlife monitoring. Those cameras take photographs when motion is detected to capture images where animals appear. A significant portion of these images are empty - no wildlife appears in the image. Filtering out those images is not a trivial task since it requires hours of manual work from biologists. Therefore, there is a notable interest in automating this task. Automatic discarding of empty photo-trapping images is still an open field in the area of Machine Learning. Existing solutions often rely on state-of-the-art supervised convolutional neural networks that require the annotation of the images in the training phase. PARDINUS (Weakly suPervised discARDINg of photo-trapping empty images based on aUtoencoderS) is constructed on the foundation of weakly supervised learning and proves that this approach equals or even surpasses other fully supervised methods that require further labeling work.
△ Less
Submitted 22 December, 2023;
originally announced December 2023.
-
Evaluating the Relationship Between News Source Sharing and Political Beliefs
Authors:
Sofía M del Pozo,
Sebastián Pinto,
Matteo Serafino,
Federico Moss,
Tomás Cicchini,
Hernán A Makse,
Pablo Balenzuela
Abstract:
In an era marked by an abundance of news sources, access to information significantly influences public opinion. Notably, the bias of news sources often serves as an indicator of individuals' political leanings. This study explores this hypothesis by examining the news sharing behavior of politically active social media users, whose political ideologies were identified in a previous study. Using c…
▽ More
In an era marked by an abundance of news sources, access to information significantly influences public opinion. Notably, the bias of news sources often serves as an indicator of individuals' political leanings. This study explores this hypothesis by examining the news sharing behavior of politically active social media users, whose political ideologies were identified in a previous study. Using correspondence analysis, we estimate the Media Sharing Index (MSI), a measure that captures bias in media outlets and user preferences within a hidden space. During Argentina's 2019 election on Twitter, we observed a predictable pattern: center-right individuals predominantly shared media from center-right biased outlets. However, it is noteworthy that those with center-left inclinations displayed a more diverse media consumption, which is a significant finding. Despite a noticeable polarization based on political affiliation observed in a retweet network analysis, center-left users showed more diverse media sharing preferences, particularly concerning the MSI. Although these findings are specific to Argentina, the developed methodology can be applied in other countries to assess the correlation between users' political leanings and the media they share.
△ Less
Submitted 15 October, 2024; v1 submitted 17 November, 2023;
originally announced November 2023.
-
Analyzing User Ideologies and Shared News During the 2019 Argentinian Elections
Authors:
Sofía M del Pozo,
Sebastián Pinto,
Matteo Serafino,
Lucio Garcia,
Hernán A Makse,
Pablo Balenzuela
Abstract:
The extensive data generated on social media platforms allow us to gain insights over trending topics and public opinions. Additionally, it offers a window into user behavior, including their content engagement and news sharing habits. In this study, we analyze the relationship between users' political ideologies and the news they share during Argentina's 2019 election period. Our findings reveal…
▽ More
The extensive data generated on social media platforms allow us to gain insights over trending topics and public opinions. Additionally, it offers a window into user behavior, including their content engagement and news sharing habits. In this study, we analyze the relationship between users' political ideologies and the news they share during Argentina's 2019 election period. Our findings reveal that users predominantly share news that aligns with their political beliefs, despite accessing media outlets with diverse political leanings. Moreover, we observe a consistent pattern of users sharing articles related to topics biased to their preferred candidates, highlighting a deeper level of political alignment in online discussions. We believe that this systematic analysis framework can be applied to similar scenarios in different countries, especially those marked by significant political polarization, akin to Argentina.
△ Less
Submitted 25 April, 2024; v1 submitted 12 October, 2023;
originally announced October 2023.
-
Navigation with shadow prices to optimize multi-commodity flow rates
Authors:
Ignacio Boero,
Igor Spasojevic,
Mariana del Castillo,
George Pappas,
Vijay Kumar,
Alejandro Ribeiro
Abstract:
We propose a method for providing communication network infrastructure in autonomous multi-agent teams. In particular, we consider a set of communication agents that are placed alongside regular agents from the system in order to improve the rate of information transfer between the latter. In order to find the optimal positions to place such agents, we define a flexible performance function that a…
▽ More
We propose a method for providing communication network infrastructure in autonomous multi-agent teams. In particular, we consider a set of communication agents that are placed alongside regular agents from the system in order to improve the rate of information transfer between the latter. In order to find the optimal positions to place such agents, we define a flexible performance function that adapts to network requirements for different systems. We provide an algorithm based on shadow prices of a related convex optimization problem in order to drive the configuration of the complete system towards a local maximum. We apply our method to three different performance functions associated with three practical scenarios in which we show both the performance of the algorithm and the flexibility it allows for optimizing different network requirements.
△ Less
Submitted 25 September, 2023;
originally announced September 2023.
-
Saving temporary exhibitions in virtual environments: the Digital Renaissance of Ulisse Aldrovandi -- acquisition and digitisation of cultural heritage objects
Authors:
Roberto Balzani,
Sebastian Barzaghi,
Gabriele Bitelli,
Federica Bonifazi,
Alice Bordignon,
Luca Cipriani,
Simona Colitti,
Federica Collina,
Marilena Daquino,
Francesca Fabbri,
Bruno Fanini,
Filippo Fantini,
Daniele Ferdani,
Giulia Fiorini,
Elena Formia,
Anna Forte,
Federica Giacomini,
Valentina Alena Girelli,
Bianca Gualandi,
Ivan Heibi,
Alessandro Iannucci,
Rachele Manganelli Del Fà,
Arcangelo Massari,
Arianna Moretti,
Silvio Peroni
, et al. (8 additional authors not shown)
Abstract:
As per the objectives of Project CHANGES, particularly its thematic sub-project on the use of virtual technologies for museums and art collections, our goal was to obtain a digital twin of the temporary exhibition on Ulisse Aldrovandi called "The Other Renaissance", and make it accessible to users online. After a preliminary study of the exhibition, focussing on acquisition constraints and related…
▽ More
As per the objectives of Project CHANGES, particularly its thematic sub-project on the use of virtual technologies for museums and art collections, our goal was to obtain a digital twin of the temporary exhibition on Ulisse Aldrovandi called "The Other Renaissance", and make it accessible to users online. After a preliminary study of the exhibition, focussing on acquisition constraints and related solutions, we proceeded with the digital twin creation by acquiring, processing, modelling, optimising, exporting, and metadating the exhibition. We made hybrid use of two acquisition techniques to create new digital cultural heritage objects and environments, and we used open technologies, formats, and protocols to make available the final digital product. Here, we describe the process of collecting and curating bibliographical exhibition (meta)data and the beginning of the digital twin creation to foster its findability, accessibility, interoperability, and reusability. The creation of the digital twin is currently ongoing.
△ Less
Submitted 27 December, 2023; v1 submitted 30 August, 2023;
originally announced August 2023.
-
Intelligent Assistant Language Understanding On Device
Authors:
Cecilia Aas,
Hisham Abdelsalam,
Irina Belousova,
Shruti Bhargava,
Jianpeng Cheng,
Robert Daland,
Joris Driesen,
Federico Flego,
Tristan Guigue,
Anders Johannsen,
Partha Lal,
Jiarui Lu,
Joel Ruben Antony Moniz,
Nathan Perkins,
Dhivya Piraviperumal,
Stephen Pulman,
Diarmuid Ó Séaghdha,
David Q. Sun,
John Torr,
Marco Del Vecchio,
Jay Wacker,
Jason D. Williams,
Hong Yu
Abstract:
It has recently become feasible to run personal digital assistants on phones and other personal devices. In this paper we describe a design for a natural language understanding system that runs on device. In comparison to a server-based assistant, this system is more private, more reliable, faster, more expressive, and more accurate. We describe what led to key choices about architecture and techn…
▽ More
It has recently become feasible to run personal digital assistants on phones and other personal devices. In this paper we describe a design for a natural language understanding system that runs on device. In comparison to a server-based assistant, this system is more private, more reliable, faster, more expressive, and more accurate. We describe what led to key choices about architecture and technologies. For example, some approaches in the dialog systems literature are difficult to maintain over time in a deployment setting. We hope that sharing learnings from our practical experiences may help inform future work in the research community.
△ Less
Submitted 7 August, 2023;
originally announced August 2023.
-
Are Large Language Models a Threat to Digital Public Goods? Evidence from Activity on Stack Overflow
Authors:
Maria del Rio-Chanona,
Nadzeya Laurentsyeva,
Johannes Wachs
Abstract:
Large language models like ChatGPT efficiently provide users with information about various topics, presenting a potential substitute for searching the web and asking people for help online. But since users interact privately with the model, these models may drastically reduce the amount of publicly available human-generated data and knowledge resources. This substitution can present a significant…
▽ More
Large language models like ChatGPT efficiently provide users with information about various topics, presenting a potential substitute for searching the web and asking people for help online. But since users interact privately with the model, these models may drastically reduce the amount of publicly available human-generated data and knowledge resources. This substitution can present a significant problem in securing training data for future models. In this work, we investigate how the release of ChatGPT changed human-generated open data on the web by analyzing the activity on Stack Overflow, the leading online Q\&A platform for computer programming. We find that relative to its Russian and Chinese counterparts, where access to ChatGPT is limited, and to similar forums for mathematics, where ChatGPT is less capable, activity on Stack Overflow significantly decreased. A difference-in-differences model estimates a 16\% decrease in weekly posts on Stack Overflow. This effect increases in magnitude over time, and is larger for posts related to the most widely used programming languages. Posts made after ChatGPT get similar voting scores than before, suggesting that ChatGPT is not merely displacing duplicate or low-quality content. These results suggest that more users are adopting large language models to answer questions and they are better substitutes for Stack Overflow for languages for which they have more training data. Using models like ChatGPT may be more efficient for solving certain programming problems, but its widespread adoption and the resulting shift away from public exchange on the web will limit the open data people and models can learn from in the future.
△ Less
Submitted 14 July, 2023;
originally announced July 2023.
-
ATLAS: Automatically Detecting Discrepancies Between Privacy Policies and Privacy Labels
Authors:
Akshath Jain,
David Rodriguez,
Jose M. del Alamo,
Norman Sadeh
Abstract:
Privacy policies are long, complex documents that end-users seldom read. Privacy labels aim to ameliorate these issues by providing succinct summaries of salient data practices. In December 2020, Apple began requiring that app developers submit privacy labels describing their apps' data practices. Yet, research suggests that app developers often struggle to do so. In this paper, we automatically i…
▽ More
Privacy policies are long, complex documents that end-users seldom read. Privacy labels aim to ameliorate these issues by providing succinct summaries of salient data practices. In December 2020, Apple began requiring that app developers submit privacy labels describing their apps' data practices. Yet, research suggests that app developers often struggle to do so. In this paper, we automatically identify possible discrepancies between mobile app privacy policies and their privacy labels. Such discrepancies could be indicators of potential privacy compliance issues.
We introduce the Automated Privacy Label Analysis System (ATLAS). ATLAS includes three components: a pipeline to systematically retrieve iOS App Store listings and privacy policies; an ensemble-based classifier capable of predicting privacy labels from the text of privacy policies with 91.3% accuracy using state-of-the-art NLP techniques; and a discrepancy analysis mechanism that enables a large-scale privacy analysis of the iOS App Store.
Our system has enabled us to analyze 354,725 iOS apps. We find several interesting trends. For example, only 40.3% of apps in the App Store provide easily accessible privacy policies, and only 29.6% of apps provide both accessible privacy policies and privacy labels. Among apps that provide both, 88.0% have at least one possible discrepancy between the text of their privacy policy and their privacy label, which could be indicative of a potential compliance issue. We find that, on average, apps have 5.32 such potential compliance issues.
We hope that ATLAS will help app developers, researchers, regulators, and mobile app stores alike. For example, app developers could use our classifier to check for discrepancies between their privacy policies and privacy labels, and regulators could use our system to help review apps at scale for potential compliance issues.
△ Less
Submitted 24 May, 2023;
originally announced June 2023.
-
A Networked Multi-Agent System for Mobile Wireless Infrastructure on Demand
Authors:
Miguel Calvo-Fullana,
Mikhail Gerasimenko,
Daniel Mox,
Leopoldo Agorio,
Mariana del Castillo,
Vijay Kumar,
Alejandro Ribeiro,
Juan Andres Bazerque
Abstract:
Despite the prevalence of wireless connectivity in urban areas around the globe, there remain numerous and diverse situations where connectivity is insufficient or unavailable. To address this, we introduce mobile wireless infrastructure on demand, a system of UAVs that can be rapidly deployed to establish an ad-hoc wireless network. This network has the capability of reconfiguring itself dynamica…
▽ More
Despite the prevalence of wireless connectivity in urban areas around the globe, there remain numerous and diverse situations where connectivity is insufficient or unavailable. To address this, we introduce mobile wireless infrastructure on demand, a system of UAVs that can be rapidly deployed to establish an ad-hoc wireless network. This network has the capability of reconfiguring itself dynamically to satisfy and maintain the required quality of communication. The system optimizes the positions of the UAVs and the routing of data flows throughout the network to achieve this quality of service (QoS). By these means, task agents using the network simply request a desired QoS, and the system adapts accordingly while allowing them to move freely. We have validated this system both in simulation and in real-world experiments. The results demonstrate that our system effectively offers mobile wireless infrastructure on demand, extending the operational range of task agents and supporting complex mobility patterns, all while ensuring connectivity and being resilient to agent failures.
△ Less
Submitted 16 September, 2024; v1 submitted 14 June, 2023;
originally announced June 2023.
-
mldr.resampling: Efficient Reference Implementations of Multilabel Resampling Algorithms
Authors:
Antonio J. Rivera,
Miguel A. Dávila,
David Elizondo,
María J. del Jesus,
Francisco Charte
Abstract:
Resampling algorithms are a useful approach to deal with imbalanced learning in multilabel scenarios. These methods have to deal with singularities in the multilabel data, such as the occurrence of frequent and infrequent labels in the same instance. Implementations of these methods are sometimes limited to the pseudocode provided by their authors in a paper. This Original Software Publication pre…
▽ More
Resampling algorithms are a useful approach to deal with imbalanced learning in multilabel scenarios. These methods have to deal with singularities in the multilabel data, such as the occurrence of frequent and infrequent labels in the same instance. Implementations of these methods are sometimes limited to the pseudocode provided by their authors in a paper. This Original Software Publication presents mldr.resampling, a software package that provides reference implementations for eleven multilabel resampling methods, with an emphasis on efficiency since these algorithms are usually time-consuming.
△ Less
Submitted 30 May, 2023; v1 submitted 26 May, 2023;
originally announced May 2023.
-
Error-Tolerant Exact Query Learning of Finite Set Partitions with Same-Cluster Oracle
Authors:
Adela Frances DePavia,
Olga Medrano Martín del Campo,
Erasmo Tani
Abstract:
This paper initiates the study of active learning for exact recovery of partitions exclusively through access to a same-cluster oracle in the presence of bounded adversarial error. We first highlight a novel connection between learning partitions and correlation clustering. Then we use this connection to build a Rényi-Ulam style analytical framework for this problem, and prove upper and lower boun…
▽ More
This paper initiates the study of active learning for exact recovery of partitions exclusively through access to a same-cluster oracle in the presence of bounded adversarial error. We first highlight a novel connection between learning partitions and correlation clustering. Then we use this connection to build a Rényi-Ulam style analytical framework for this problem, and prove upper and lower bounds on its worst-case query complexity. Further, we bound the expected performance of a relevant randomized algorithm. Finally, we study the relationship between adaptivity and query complexity for this problem and related variants.
△ Less
Submitted 16 June, 2023; v1 submitted 22 May, 2023;
originally announced May 2023.
-
An Estimation of Distribution Algorithm based on interactions between requirements to solve the bi-objective Next Release Problem
Authors:
Jose del Sagrado,
Jose Antonio Sierra Ibanez,
Isabel M. del Aguila
Abstract:
Selecting the appropriate requirements to develop in the next release of an open market software product under evolution, is a compulsory step of each software development project. This selection should be done by maximizing stakeholders' satisfaction and minimizing development costs, while keeping constraints. In this work we investigate what is the requirements interactions impact when searching…
▽ More
Selecting the appropriate requirements to develop in the next release of an open market software product under evolution, is a compulsory step of each software development project. This selection should be done by maximizing stakeholders' satisfaction and minimizing development costs, while keeping constraints. In this work we investigate what is the requirements interactions impact when searching for solutions of the bi-objective Next Release Problem. In one hand, these interactions are explicitly included in two algorithms: a branch and bound algorithm and an estimation of distribution algorithm (EDA). And on the other, we study the performance of these not previously used solving approaches by applying them in several instances of small, medium and large size data sets. We find that interactions inclusion do enhance the search and when time restrictions exists, as in the case of the bi-objective Next Release Problem, EDAs have proven to be stable and reliable locating a large number of solutions on the reference Pareto front.
△ Less
Submitted 6 February, 2023;
originally announced February 2023.
-
EvoAAA: An evolutionary methodology for automated \neural autoencoder architecture search
Authors:
Francisco Charte,
Antonio J. Rivera,
Francisco Martínez,
María J. del Jesus
Abstract:
Machine learning models work better when curated features are provided to them. Feature engineering methods have been usually used as a preprocessing step to obtain or build a proper feature set. In late years, autoencoders (a specific type of symmetrical neural network) have been widely used to perform representation learning, proving their competitiveness against classical feature engineering al…
▽ More
Machine learning models work better when curated features are provided to them. Feature engineering methods have been usually used as a preprocessing step to obtain or build a proper feature set. In late years, autoencoders (a specific type of symmetrical neural network) have been widely used to perform representation learning, proving their competitiveness against classical feature engineering algorithms. The main obstacle in the use of autoencoders is finding a good architecture, a process that most experts confront manually. An automated autoencoder architecture search procedure, based on evolutionary methods, is proposed in this paper. The methodology is tested against nine heterogeneous data sets. The obtained results show the ability of this approach to find better architectures, able to concentrate most of the useful information in a minimized coding, in a reduced time.
△ Less
Submitted 15 January, 2023;
originally announced January 2023.
-
True Detective: A Deep Abductive Reasoning Benchmark Undoable for GPT-3 and Challenging for GPT-4
Authors:
Maksym Del,
Mark Fishel
Abstract:
Large language models (LLMs) have demonstrated solid zero-shot reasoning capabilities, which is reflected in their performance on the current test tasks. This calls for a more challenging benchmark requiring highly advanced reasoning ability to be solved. In this paper, we introduce such a benchmark, consisting of 191 long-form (1200 words on average) mystery narratives constructed as detective pu…
▽ More
Large language models (LLMs) have demonstrated solid zero-shot reasoning capabilities, which is reflected in their performance on the current test tasks. This calls for a more challenging benchmark requiring highly advanced reasoning ability to be solved. In this paper, we introduce such a benchmark, consisting of 191 long-form (1200 words on average) mystery narratives constructed as detective puzzles. Puzzles are sourced from the "5 Minute Mystery" platform and include a multiple-choice question for evaluation. Only 47% of humans solve a puzzle successfully on average, while the best human solvers achieve over 80% success rate. We show that GPT-3 models barely outperform random on this benchmark (with 28% accuracy) while state-of-the-art GPT-4 solves only 38% of puzzles. This indicates that there is still a significant gap in the deep reasoning abilities of LLMs and humans and highlights the need for further research in this area. Our work introduces a challenging benchmark for future studies on reasoning in language models and contributes to a better understanding of the limits of LLMs' abilities.
△ Less
Submitted 1 June, 2023; v1 submitted 20 December, 2022;
originally announced December 2022.
-
Quantum Clustering with k-Means: a Hybrid Approach
Authors:
Alessandro Poggiali,
Alessandro Berti,
Anna Bernasconi,
Gianna M. Del Corso,
Riccardo Guidotti
Abstract:
Quantum computing is a promising paradigm based on quantum theory for performing fast computations. Quantum algorithms are expected to surpass their classical counterparts in terms of computational complexity for certain tasks, including machine learning. In this paper, we design, implement, and evaluate three hybrid quantum k-Means algorithms, exploiting different degree of parallelism. Indeed, e…
▽ More
Quantum computing is a promising paradigm based on quantum theory for performing fast computations. Quantum algorithms are expected to surpass their classical counterparts in terms of computational complexity for certain tasks, including machine learning. In this paper, we design, implement, and evaluate three hybrid quantum k-Means algorithms, exploiting different degree of parallelism. Indeed, each algorithm incrementally leverages quantum parallelism to reduce the complexity of the cluster assignment step up to a constant cost. In particular, we exploit quantum phenomena to speed up the computation of distances. The core idea is that the computation of distances between records and centroids can be executed simultaneously, thus saving time, especially for big datasets. We show that our hybrid quantum k-Means algorithms can be more efficient than the classical version, still obtaining comparable clustering results.
△ Less
Submitted 15 December, 2022; v1 submitted 13 December, 2022;
originally announced December 2022.
-
Cross-lingual Similarity of Multilingual Representations Revisited
Authors:
Maksym Del,
Mark Fishel
Abstract:
Related works used indexes like CKA and variants of CCA to measure the similarity of cross-lingual representations in multilingual language models. In this paper, we argue that assumptions of CKA/CCA align poorly with one of the motivating goals of cross-lingual learning analysis, i.e., explaining zero-shot cross-lingual transfer. We highlight what valuable aspects of cross-lingual similarity thes…
▽ More
Related works used indexes like CKA and variants of CCA to measure the similarity of cross-lingual representations in multilingual language models. In this paper, we argue that assumptions of CKA/CCA align poorly with one of the motivating goals of cross-lingual learning analysis, i.e., explaining zero-shot cross-lingual transfer. We highlight what valuable aspects of cross-lingual similarity these indexes fail to capture and provide a motivating case study \textit{demonstrating the problem empirically}. Then, we introduce \textit{Average Neuron-Wise Correlation (ANC)} as a straightforward alternative that is exempt from the difficulties of CKA/CCA and is good specifically in a cross-lingual context. Finally, we use ANC to construct evidence that the previously introduced ``first align, then predict'' pattern takes place not only in masked language models (MLMs) but also in multilingual models with \textit{causal language modeling} objectives (CLMs). Moreover, we show that the pattern extends to the \textit{scaled versions} of the MLMs and CLMs (up to 85x original mBERT).\footnote{Our code is publicly available at \url{https://github.com/TartuNLP/xsim}}
△ Less
Submitted 4 December, 2022;
originally announced December 2022.
-
Data Fusion in Neuromarketing: Multimodal Analysis of Biosignals, Lifecycle Stages, Current Advances, Datasets, Trends, and Challenges
Authors:
Mario Quiles Pérez,
Enrique Tomás Martínez Beltrán,
Sergio López Bernal,
Eduardo Horna Prat,
Luis Montesano Del Campo,
Lorenzo Fernández Maimó,
Alberto Huertas Celdrán
Abstract:
The primary goal of any company is to increase its profits by improving both the quality of its products and how they are advertised. In this context, neuromarketing seeks to enhance the promotion of products and generate a greater acceptance on potential buyers. Traditionally, neuromarketing studies have relied on a single biosignal to obtain feedback from presented stimuli. However, thanks to ne…
▽ More
The primary goal of any company is to increase its profits by improving both the quality of its products and how they are advertised. In this context, neuromarketing seeks to enhance the promotion of products and generate a greater acceptance on potential buyers. Traditionally, neuromarketing studies have relied on a single biosignal to obtain feedback from presented stimuli. However, thanks to new devices and technological advances studying this area of knowledge, recent trends indicate a shift towards the fusion of diverse biosignals. An example is the usage of electroencephalography for understanding the impact of an advertisement at the neural level and visual tracking to identify the stimuli that induce such impacts. This emerging pattern determines which biosignals to employ for achieving specific neuromarketing objectives. Furthermore, the fusion of data from multiple sources demands advanced processing methodologies. Despite these complexities, there is a lack of literature that adequately collates and organizes the various data sources and the applied processing techniques for the research objectives pursued. To address these challenges, the current paper conducts a comprehensive analysis of the objectives, biosignals, and data processing techniques employed in neuromarketing research. This study provides both the technical definition and a graphical distribution of the elements under revision. Additionally, it presents a categorization based on research objectives and provides an overview of the combinatory methodologies employed. After this, the paper examines primary public datasets designed for neuromarketing research together with others whose main purpose is not neuromarketing, but can be used for this matter. Ultimately, this work provides a historical perspective on the evolution of techniques across various phases over recent years and enumerates key lessons learned.
△ Less
Submitted 21 August, 2023; v1 submitted 30 August, 2022;
originally announced September 2022.
-
Lessons from a Space Lab -- An Image Acquisition Perspective
Authors:
Leo Pauly,
Michele Lynn Jamrozik,
Miguel Ortiz Del Castillo,
Olivia Borgue,
Inder Pal Singh,
Mohatashem Reyaz Makhdoomi,
Olga-Orsalia Christidi-Loumpasefski,
Vincent Gaudilliere,
Carol Martinez,
Arunkumar Rathinam,
Andreas Hein,
Miguel Olivares-Mendez,
Djamila Aouada
Abstract:
The use of Deep Learning (DL) algorithms has improved the performance of vision-based space applications in recent years. However, generating large amounts of annotated data for training these DL algorithms has proven challenging. While synthetically generated images can be used, the DL models trained on synthetic data are often susceptible to performance degradation, when tested in real-world env…
▽ More
The use of Deep Learning (DL) algorithms has improved the performance of vision-based space applications in recent years. However, generating large amounts of annotated data for training these DL algorithms has proven challenging. While synthetically generated images can be used, the DL models trained on synthetic data are often susceptible to performance degradation, when tested in real-world environments. In this context, the Interdisciplinary Center of Security, Reliability and Trust (SnT) at the University of Luxembourg has developed the 'SnT Zero-G Lab', for training and validating vision-based space algorithms in conditions emulating real-world space environments. An important aspect of the SnT Zero-G Lab development was the equipment selection. From the lessons learned during the lab development, this article presents a systematic approach combining market survey and experimental analyses for equipment selection. In particular, the article focus on the image acquisition equipment in a space lab: background materials, cameras and illumination lamps. The results from the experiment analyses show that the market survey complimented by experimental analyses is required for effective equipment selection in a space lab development project.
△ Less
Submitted 6 December, 2022; v1 submitted 18 August, 2022;
originally announced August 2022.
-
Mental health concerns prelude the Great Resignation: Evidence from Social Media
Authors:
R. Maria del Rio-Chanona,
Alejandro Hermida-Carrillo,
Melody Sepahpour-Fard,
Luning Sun,
Renata Topinkova,
Ljubica Nedelkoska
Abstract:
To study the causes of the 2021 Great Resignation, we use text analysis to investigate the changes in work- and quit-related posts between 2018 and 2021 on Reddit. We find that the Reddit discourse evolution resembles the dynamics of the U.S. quit and layoff rates. Furthermore, when the COVID-19 pandemic started, conversations related to working from home, switching jobs, work-related distress, an…
▽ More
To study the causes of the 2021 Great Resignation, we use text analysis to investigate the changes in work- and quit-related posts between 2018 and 2021 on Reddit. We find that the Reddit discourse evolution resembles the dynamics of the U.S. quit and layoff rates. Furthermore, when the COVID-19 pandemic started, conversations related to working from home, switching jobs, work-related distress, and mental health increased. We distinguish between general work-related and specific quit-related discourse changes using a difference-in-differences method. Our main finding is that mental health and work-related distress topics disproportionally increased among quit-related posts since the onset of the pandemic, likely contributing to the Great Resignation. Along with better labor market conditions, some relief came beginning-to-mid-2021 when these concerns decreased. Our study validates the use of forums such as Reddit for studying emerging economic phenomena in real time, complementing traditional labor market surveys and administrative data.
△ Less
Submitted 16 August, 2022;
originally announced August 2022.
-
Low-Resource Dense Retrieval for Open-Domain Question Answering: A Comprehensive Survey
Authors:
Xiaoyu Shen,
Svitlana Vakulenko,
Marco del Tredici,
Gianni Barlacchi,
Bill Byrne,
Adrià de Gispert
Abstract:
Dense retrieval (DR) approaches based on powerful pre-trained language models (PLMs) achieved significant advances and have become a key component for modern open-domain question-answering systems. However, they require large amounts of manual annotations to perform competitively, which is infeasible to scale. To address this, a growing body of research works have recently focused on improving DR…
▽ More
Dense retrieval (DR) approaches based on powerful pre-trained language models (PLMs) achieved significant advances and have become a key component for modern open-domain question-answering systems. However, they require large amounts of manual annotations to perform competitively, which is infeasible to scale. To address this, a growing body of research works have recently focused on improving DR performance under low-resource scenarios. These works differ in what resources they require for training and employ a diverse set of techniques. Understanding such differences is crucial for choosing the right technique under a specific low-resource scenario. To facilitate this understanding, we provide a thorough structured overview of mainstream techniques for low-resource DR. Based on their required resources, we divide the techniques into three main categories: (1) only documents are needed; (2) documents and questions are needed; and (3) documents and question-answer pairs are needed. For every technique, we introduce its general-form algorithm, highlight the open issues and pros and cons. Promising directions are outlined for future research.
△ Less
Submitted 5 August, 2022;
originally announced August 2022.
-
ROI: A method for identifying organizations receiving personal data
Authors:
David Rodriguez,
Jose M. Del Alamo,
Miguel Cozar,
Boni Garcia
Abstract:
Many studies have exposed the massive collection of personal data in the digital ecosystem through, for instance, websites, mobile apps, or smart devices. This fact goes unnoticed by most users, who are also unaware that the collectors are sharing their personal data with many different organizations around the globe. This paper assesses techniques available in the state of the art to identify the…
▽ More
Many studies have exposed the massive collection of personal data in the digital ecosystem through, for instance, websites, mobile apps, or smart devices. This fact goes unnoticed by most users, who are also unaware that the collectors are sharing their personal data with many different organizations around the globe. This paper assesses techniques available in the state of the art to identify the organizations receiving this personal data. Based on our findings, we propose ROI (Receiver Organization Identifier), a fully automated method that combines different techniques to achieve a 95.71% precision score in identifying an organization receiving personal data. We demonstrate our method in the wild by evaluating 10,000 Android apps and exposing the organizations that receive users' personal data.
△ Less
Submitted 25 July, 2023; v1 submitted 19 April, 2022;
originally announced April 2022.
-
From Rewriting to Remembering: Common Ground for Conversational QA Models
Authors:
Marco Del Tredici,
Xiaoyu Shen,
Gianni Barlacchi,
Bill Byrne,
Adrià de Gispert
Abstract:
In conversational QA, models have to leverage information in previous turns to answer upcoming questions. Current approaches, such as Question Rewriting, struggle to extract relevant information as the conversation unwinds. We introduce the Common Ground (CG), an approach to accumulate conversational information as it emerges and select the relevant information at every turn. We show that CG offer…
▽ More
In conversational QA, models have to leverage information in previous turns to answer upcoming questions. Current approaches, such as Question Rewriting, struggle to extract relevant information as the conversation unwinds. We introduce the Common Ground (CG), an approach to accumulate conversational information as it emerges and select the relevant information at every turn. We show that CG offers a more efficient and human-like way to exploit conversational information compared to existing approaches, leading to improvements on Open Domain Conversational QA.
△ Less
Submitted 8 April, 2022;
originally announced April 2022.