-
GraphOmni: A Comprehensive and Extendable Benchmark Framework for Large Language Models on Graph-theoretic Tasks
Authors:
Hao Xu,
Xiangru Jian,
Xinjian Zhao,
Wei Pang,
Chao Zhang,
Suyuchen Wang,
Qixin Zhang,
Joao Monteiro,
Qiuzhuang Sun,
Tianshu Yu
Abstract:
In this paper, we presented GraphOmni, a comprehensive benchmark framework for systematically evaluating the graph reasoning capabilities of LLMs. By analyzing critical dimensions, including graph types, serialization formats, and prompt schemes, we provided extensive insights into the strengths and limitations of current LLMs. Our empirical findings emphasize that no single serialization or prompting strategy consistently outperforms others. Motivated by these insights, we propose a reinforcement learning-based approach that dynamically selects the best serialization-prompt pairings, resulting in significant accuracy improvements. GraphOmni's modular and extensible design establishes a robust foundation for future research, facilitating advancements toward general-purpose graph reasoning models.
Submitted 17 April, 2025;
originally announced April 2025.
-
CountPath: Automating Fragment Counting in Digital Pathology
Authors:
Ana Beatriz Vieira,
Maria Valente,
Diana Montezuma,
Tomé Albuquerque,
Liliana Ribeiro,
Domingos Oliveira,
João Monteiro,
Sofia Gonçalves,
Isabel M. Pinto,
Jaime S. Cardoso,
Arlindo L. Oliveira
Abstract:
Quality control of medical images is a critical component of digital pathology, ensuring that diagnostic images meet required standards. A pre-analytical task within this process is the verification of the number of specimen fragments, a process that ensures that the number of fragments on a slide matches the number documented in the macroscopic report. This step is important to ensure that the slides contain the appropriate diagnostic material from the grossing process, thereby guaranteeing the accuracy of subsequent microscopic examination and diagnosis. Traditionally, this assessment is performed manually, requiring significant time and effort while being subject to significant variability due to its subjective nature. To address these challenges, this study explores an automated approach to fragment counting using the YOLOv9 and Vision Transformer models. Our results demonstrate that the automated system achieves a level of performance comparable to expert assessments, offering a reliable and efficient alternative to manual counting. Additionally, we present findings on interobserver variability, showing that the automated approach achieves an accuracy of 86%, which falls within the range of variation observed among experts (82-88%), further supporting its potential for integration into routine pathology workflows.
Submitted 13 March, 2025;
originally announced March 2025.
-
One Model to Train them All: Hierarchical Self-Distillation for Enhanced Early Layer Embeddings
Authors:
Andrea Gurioli,
Federico Pennino,
João Monteiro,
Maurizio Gabbrielli
Abstract:
Deploying language models often requires handling model size vs. performance trade-offs to satisfy downstream latency constraints while preserving the model's usefulness. Model distillation is commonly employed to reduce model size while maintaining acceptable performance. However, distillation can be inefficient since it involves multiple training steps. In this work, we introduce MODULARSTARENCODER, a modular multi-exit encoder with 1B parameters, useful for multiple tasks within the scope of code retrieval. MODULARSTARENCODER is trained with a novel self-distillation mechanism that significantly improves lower-layer representations, allowing different portions of the model to be used while still maintaining a good trade-off in terms of performance. Our architecture focuses on enhancing text-to-code and code-to-code search by systematically capturing syntactic and semantic structures across multiple levels of representation. Specific encoder layers are targeted as exit heads, allowing higher layers to guide earlier layers during training. This self-distillation effect improves intermediate representations, increasing retrieval recall at no extra training cost. In addition to the multi-exit scheme, our approach integrates a repository-level contextual loss that maximally utilizes the training context window, further enhancing the learned representations. We also release a new dataset constructed via code translation, seamlessly expanding traditional text-to-code benchmarks with code-to-code pairs across diverse programming languages. Experimental results highlight the benefits of self-distillation through multi-exit supervision.
Submitted 4 March, 2025;
originally announced March 2025.
-
Software development projects as a way for multidisciplinary soft and future skills education
Authors:
Krzysztof Podlaski,
Michal Beczkowski,
Katharina Simbeck,
Katrin Dziergwa,
Derek O'Reilly,
Shane Dowdall,
Joao Monteiro,
Catarina Oliveira Lucas,
Johanna Hautamaki,
Heikki Ahonen,
Hiram Bollaert,
Philippe Possemiers,
Zofia Stawska
Abstract:
Soft and future skills are in high demand in the modern job market. These skills are required for both technical and non-technical people. It is difficult to teach these competencies in a classical academic environment.
The paper presents a possible approach to teaching soft and future skills in a short, intensive joint project. In our case, it is a project within the Erasmus+ framework, but it can be organized in many different frameworks.
In the project we use problem-based learning, active learning and group-work teaching methodologies. Moreover, the approach puts high emphasis on diversity. We arrange a multidisciplinary set of students in groups. Each group works on software development tasks. This type of project demands diversity, and only a part of the team needs technical skills. In our case, less than half of the participants had a computer science background. Additionally, software development projects are usually interesting for non-technical students.
The multicultural, multidisciplinary and international aspects are very important in a modern global working environment. On the other hand, the short duration of the project and its intensity make it possible to simulate stressful situations in real-world tasks. The effects of the project on the required competencies are measured using the KYSS method.
The results prove that the presented method increased participants' soft skills in communication, cooperation, digital skills and self-reflection.
Submitted 20 March, 2025; v1 submitted 28 February, 2025;
originally announced February 2025.
-
PairBench: A Systematic Framework for Selecting Reliable Judge VLMs
Authors:
Aarash Feizi,
Sai Rajeswar,
Adriana Romero-Soriano,
Reihaneh Rabbany,
Spandana Gella,
Valentina Zantedeschi,
João Monteiro
Abstract:
As large vision language models (VLMs) are increasingly used as automated evaluators, understanding their ability to effectively compare data pairs as instructed in the prompt becomes essential. To address this, we present PairBench, a low-cost framework that systematically evaluates VLMs as customizable similarity tools across various modalities and scenarios. Through PairBench, we introduce four metrics that represent key desiderata of similarity scores: alignment with human annotations, consistency for data pairs irrespective of their order, smoothness of similarity distributions, and controllability through prompting. Our analysis demonstrates that no model, whether closed- or open-source, is superior on all metrics; the optimal choice depends on an auto evaluator's desired behavior (e.g., a smooth vs. a sharp judge), highlighting risks of widespread adoption of VLMs as evaluators without thorough assessment. For instance, the majority of VLMs struggle with maintaining symmetric similarity scores regardless of order. Additionally, our results show that the performance of VLMs on the metrics in PairBench closely correlates with popular benchmarks, showcasing its predictive power in ranking models.
Submitted 24 February, 2025; v1 submitted 20 February, 2025;
originally announced February 2025.
-
Explain-Query-Test: Self-Evaluating LLMs Via Explanation and Comprehension Discrepancy
Authors:
Saeid Asgari Taghanaki,
Joao Monteiro
Abstract:
Large language models (LLMs) have demonstrated remarkable proficiency in generating detailed and coherent explanations of complex concepts. However, the extent to which these models truly comprehend the concepts they articulate remains unclear. To assess the level of comprehension of a model relative to the content it generates, we implemented a self-evaluation pipeline where models: (i) given a topic generate an excerpt with information about the topic, (ii) given an excerpt generate question-answer pairs, and finally (iii) given a question generate an answer. We refer to this self-evaluation approach as Explain-Query-Test (EQT). Interestingly, the accuracy on generated questions resulting from running the EQT pipeline correlates strongly with the model performance as verified by typical benchmarks such as MMLU-Pro. In other words, EQT's performance is predictive of MMLU-Pro's, and EQT can be used to rank models without the need for any external source of evaluation data other than lists of topics of interest. Moreover, our results reveal a disparity between the models' ability to produce detailed explanations and their performance on questions related to those explanations. This gap highlights fundamental limitations in the internal knowledge representation and reasoning abilities of current LLMs. We release the code at https://github.com/asgsaeid/EQT.
Submitted 8 March, 2025; v1 submitted 20 January, 2025;
originally announced January 2025.
-
Performance Control in Early Exiting to Deploy Large Models at the Same Cost of Smaller Ones
Authors:
Mehrnaz Mofakhami,
Reza Bayat,
Ioannis Mitliagkas,
Joao Monteiro,
Valentina Zantedeschi
Abstract:
Early Exiting (EE) is a promising technique for speeding up inference by adaptively allocating compute resources to data points based on their difficulty. The approach enables predictions to exit at earlier layers for simpler samples while reserving more computation for challenging ones. In this study, we first present a novel perspective on the EE approach, showing that larger models deployed with EE can achieve higher performance than smaller models while maintaining similar computational costs. As existing EE approaches rely on confidence estimation at each exit point, we further study the impact of overconfidence on the controllability of the compute-performance trade-off. We introduce Performance Control Early Exiting (PCEE), a method that enables accuracy thresholding by basing decisions not on a data point's confidence but on the average accuracy of samples with similar confidence levels from a held-out validation set. In our experiments, we show that PCEE offers a simple yet computationally efficient approach that provides better control over performance than standard confidence-based approaches, and allows us to scale up model sizes to yield performance gain while reducing the computational cost.
Submitted 26 December, 2024;
originally announced December 2024.
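The thresholding idea described in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the equal-width confidence binning, the function names `fit_bin_accuracy` and `choose_exit`, and the toy validation data are all assumptions made for illustration. The point it shows is that an exit is taken only when the average held-out accuracy of samples with similar confidence meets the target, rather than when the raw confidence does.

```python
def fit_bin_accuracy(confidences, correct, n_bins=10):
    """Average validation accuracy per confidence bin (None for empty bins).

    Equal-width bins over [0, 1] are an assumption of this sketch.
    """
    hits = [0] * n_bins
    counts = [0] * n_bins
    for conf, ok in zip(confidences, correct):
        k = min(int(conf * n_bins), n_bins - 1)
        hits[k] += int(ok)
        counts[k] += 1
    return [h / n if n else None for h, n in zip(hits, counts)]


def choose_exit(sample_confidences, bin_accuracy_per_exit, threshold):
    """Index of the first exit whose estimated accuracy meets the threshold.

    Falls back to the last exit when no earlier exit qualifies.
    """
    for layer, (conf, table) in enumerate(
        zip(sample_confidences, bin_accuracy_per_exit)
    ):
        n_bins = len(table)
        estimate = table[min(int(conf * n_bins), n_bins - 1)]
        if estimate is not None and estimate >= threshold:
            return layer
    return len(sample_confidences) - 1


# Hypothetical held-out data for two exits. Exit 0 is overconfident:
# its high-confidence predictions are right only half of the time.
tables = [
    fit_bin_accuracy([0.95, 0.92, 0.97, 0.91], [1, 0, 1, 0]),
    fit_bin_accuracy([0.93, 0.96, 0.94, 0.99], [1, 1, 1, 1]),
]

# A new sample with confidence 0.95 at both exits and a 0.9 accuracy target.
chosen = choose_exit([0.95, 0.95], tables, threshold=0.9)
print(chosen)
```

In this toy case a plain confidence threshold of 0.9 would stop at exit 0, whereas the accuracy-based rule defers to exit 1 because the held-out accuracy in that confidence range at exit 0 is too low.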
-
BigDocs: An Open Dataset for Training Multimodal Models on Document and Code Tasks
Authors:
Juan Rodriguez,
Xiangru Jian,
Siba Smarak Panigrahi,
Tianyu Zhang,
Aarash Feizi,
Abhay Puri,
Akshay Kalkunte,
François Savard,
Ahmed Masry,
Shravan Nayak,
Rabiul Awal,
Mahsa Massoud,
Amirhossein Abaskohi,
Zichao Li,
Suyuchen Wang,
Pierre-André Noël,
Mats Leon Richter,
Saverio Vadacchino,
Shubham Agarwal,
Sanket Biswas,
Sara Shanian,
Ying Zhang,
Noah Bolger,
Kurt MacDonald,
Simon Fauvel
, et al. (18 additional authors not shown)
Abstract:
Multimodal AI has the potential to significantly enhance document-understanding tasks, such as processing receipts, understanding workflows, extracting data from documents, and summarizing reports. Code generation tasks that require long-structured outputs can also be enhanced by multimodality. Despite this, their use in commercial applications is often limited due to limited access to training data and restrictive licensing, which hinders open access. To address these limitations, we introduce BigDocs-7.5M, a high-quality, open-access dataset comprising 7.5 million multimodal documents across 30 tasks. We use an efficient data curation process to ensure our data is high-quality and license-permissive. Our process emphasizes accountability, responsibility, and transparency through filtering rules, traceable metadata, and careful content analysis. Additionally, we introduce BigDocs-Bench, a benchmark suite with 10 novel tasks where we create datasets that reflect real-world use cases involving reasoning over Graphical User Interfaces (GUI) and code generation from images. Our experiments show that training with BigDocs-Bench improves average performance up to 25.8% over closed-source GPT-4o in document reasoning and structured output tasks such as Screenshot2HTML or Image2Latex generation. Finally, human evaluations showed a preference for outputs from models trained on BigDocs over GPT-4o. This suggests that BigDocs can help both academics and the open-source community utilize and improve AI tools to enhance multimodal capabilities and document reasoning. The project is hosted at https://bigdocs.github.io .
Submitted 17 March, 2025; v1 submitted 5 December, 2024;
originally announced December 2024.
-
Explorative Imitation Learning: A Path Signature Approach for Continuous Environments
Authors:
Nathan Gavenski,
Juarez Monteiro,
Felipe Meneguzzi,
Michael Luck,
Odinaldo Rodrigues
Abstract:
Some imitation learning methods combine behavioural cloning with self-supervision to infer actions from state pairs. However, most rely on a large number of expert trajectories to increase generalisation and human intervention to capture key aspects of the problem, such as domain constraints. In this paper, we propose Continuous Imitation Learning from Observation (CILO), a new method augmenting imitation learning with two important features: (i) exploration, allowing for more diverse state transitions, requiring fewer expert trajectories and resulting in fewer training iterations; and (ii) path signatures, allowing for automatic encoding of constraints, through the creation of non-parametric representations of agents and expert trajectories. We compared CILO with a baseline and two leading imitation learning methods in five environments. It had the best overall performance of all methods in all environments, outperforming the expert in two of them.
Submitted 22 July, 2024; v1 submitted 5 July, 2024;
originally announced July 2024.
-
RepLiQA: A Question-Answering Dataset for Benchmarking LLMs on Unseen Reference Content
Authors:
Joao Monteiro,
Pierre-Andre Noel,
Etienne Marcotte,
Sai Rajeswar,
Valentina Zantedeschi,
David Vazquez,
Nicolas Chapados,
Christopher Pal,
Perouz Taslakian
Abstract:
Large Language Models (LLMs) are trained on vast amounts of data, most of which is automatically scraped from the internet. This data includes encyclopedic documents that harbor a vast amount of general knowledge (e.g., Wikipedia) but also potentially overlap with benchmark datasets used for evaluating LLMs. Consequently, evaluating models on test splits that might have leaked into the training set is prone to misleading conclusions. To foster sound evaluation of language models, we introduce a new test dataset named RepLiQA, suited for question-answering and topic retrieval tasks. RepLiQA is a collection of five splits of test sets, four of which have not been released to the internet or exposed to LLM APIs prior to this publication. Each sample in RepLiQA comprises (1) a reference document crafted by a human annotator and depicting an imaginary scenario (e.g., a news article) absent from the internet; (2) a question about the document's topic; (3) a ground-truth answer derived directly from the information in the document; and (4) the paragraph extracted from the reference document containing the answer. As such, accurate answers can only be generated if a model can find relevant content within the provided document. We run a large-scale benchmark comprising several state-of-the-art LLMs to uncover differences in performance across models of various types and sizes in a context-conditional language modeling setting. Released splits of RepLiQA can be found here: https://huggingface.co/datasets/ServiceNow/repliqa.
Submitted 5 November, 2024; v1 submitted 17 June, 2024;
originally announced June 2024.
-
XC-Cache: Cross-Attending to Cached Context for Efficient LLM Inference
Authors:
João Monteiro,
Étienne Marcotte,
Pierre-André Noël,
Valentina Zantedeschi,
David Vázquez,
Nicolas Chapados,
Christopher Pal,
Perouz Taslakian
Abstract:
In-context learning (ICL) approaches typically leverage prompting to condition decoder-only language model generation on reference information. Just-in-time processing of a context is inefficient due to the quadratic cost of self-attention operations, and caching is desirable. However, caching transformer states can easily require almost as much space as the model parameters. When the right context isn't known in advance, caching ICL can be challenging. This work addresses these limitations by introducing models that, inspired by the encoder-decoder architecture, use cross-attention to condition generation on reference text without the prompt. More precisely, we leverage pre-trained decoder-only models and only train a small number of added layers. We use Question-Answering (QA) as a testbed to evaluate the ability of our models to perform conditional generation and observe that they outperform ICL, are comparable to fine-tuned prompted LLMs, and drastically reduce the space footprint relative to standard KV caching by two orders of magnitude.
Submitted 1 November, 2024; v1 submitted 23 April, 2024;
originally announced April 2024.
-
Parallelization Strategies for the Randomized Kaczmarz Algorithm on Large-Scale Dense Systems
Authors:
Inês Ferreira,
Juan A. Acebrón,
José Monteiro
Abstract:
The Kaczmarz algorithm is an iterative technique designed to solve consistent linear systems of equations. It falls within the category of row-action methods, focusing on handling one equation per iteration. This characteristic makes it especially useful in solving very large systems. The recent introduction of a randomized version, the Randomized Kaczmarz method, renewed interest in the algorithm, leading to the development of numerous variations. Parallel implementations of both the original and the Randomized Kaczmarz method have since been proposed. However, previous work has addressed sparse linear systems, whereas we focus on solving dense systems. In this paper, we explore in detail approaches to parallelizing the Kaczmarz method for both shared and distributed memory for large dense systems. In particular, we implemented the Randomized Kaczmarz with Averaging (RKA) method that, for inconsistent systems, unlike the standard Randomized Kaczmarz algorithm, reduces the final error of the solution. While efficient parallelization of this algorithm is not achievable, we introduce a block version of the averaging method that can outperform the RKA method.
Submitted 30 January, 2024;
originally announced January 2024.
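For readers unfamiliar with the row-action scheme mentioned in the abstract above, the following is a minimal sequential sketch of the standard Randomized Kaczmarz iteration, in which each step projects the current iterate onto the hyperplane of one equation sampled with probability proportional to its squared row norm. It is not the parallel or averaged (RKA) variant studied in the paper; the function name, the iteration count, and the small dense test system are assumptions chosen for illustration.

```python
import random


def randomized_kaczmarz(A, b, iters=2000, seed=0):
    """Approximate a solution of the consistent system A x = b.

    A is a list of rows (lists of floats); b is a list of floats.
    """
    rng = random.Random(seed)
    n = len(A[0])
    # Squared Euclidean norm of each row, used for sampling and projection.
    norms_sq = [sum(a * a for a in row) for row in A]
    x = [0.0] * n
    for _ in range(iters):
        # Sample one equation with probability proportional to its squared norm.
        i = rng.choices(range(len(A)), weights=norms_sq)[0]
        row = A[i]
        residual = b[i] - sum(a * xj for a, xj in zip(row, x))
        step = residual / norms_sq[i]
        # Orthogonal projection of x onto the hyperplane row . x = b[i].
        x = [xj + step * a for xj, a in zip(x, row)]
    return x


# Small consistent dense system built from a known solution (illustrative only).
A = [[2.0, 1.0], [1.0, 3.0], [1.0, -1.0]]
x_true = [1.0, -2.0]
b = [sum(a * xj for a, xj in zip(row, x_true)) for row in A]

x_hat = randomized_kaczmarz(A, b)
print(x_hat)
```

Each iteration touches a single row of the matrix, which is why the method is attractive for very large systems and also why efficient parallelization is not straightforward.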
-
Group Robust Classification Without Any Group Information
Authors:
Christos Tsirigotis,
Joao Monteiro,
Pau Rodriguez,
David Vazquez,
Aaron Courville
Abstract:
Empirical risk minimization (ERM) is sensitive to spurious correlations in the training data, which poses a significant risk when deploying systems trained under this paradigm in high-stake applications. While the existing literature focuses on maximizing group-balanced or worst-group accuracy, estimating these accuracies is hindered by costly bias annotations. This study contends that current bias-unsupervised approaches to group robustness continue to rely on group information to achieve optimal performance. Firstly, these methods implicitly assume that all group combinations are represented during training. To illustrate this, we introduce a systematic generalization task on the MPI3D dataset and discover that current algorithms fail to improve the ERM baseline when combinations of observed attribute values are missing. Secondly, bias labels are still crucial for effective model selection, restricting the practicality of these methods in real-world scenarios. To address these limitations, we propose a revised methodology for training and validating debiased models in an entirely bias-unsupervised manner. We achieve this by employing pretrained self-supervised models to reliably extract bias information, which enables the integration of a logit adjustment training loss with our validation criterion. Our empirical analysis on synthetic and real-world tasks provides evidence that our approach overcomes the identified challenges and consistently enhances robust accuracy, attaining performance which is competitive with or outperforms that of state-of-the-art methods, which, conversely, rely on bias labels for validation.
Submitted 27 October, 2023;
originally announced October 2023.
-
Expecting The Unexpected: Towards Broad Out-Of-Distribution Detection
Authors:
Charles Guille-Escuret,
Pierre-André Noël,
Ioannis Mitliagkas,
David Vazquez,
Joao Monteiro
Abstract:
Improving the reliability of deployed machine learning systems often involves developing methods to detect out-of-distribution (OOD) inputs. However, existing research often narrowly focuses on samples from classes that are absent from the training set, neglecting other types of plausible distribution shifts. This limitation reduces the applicability of these methods in real-world scenarios, where systems encounter a wide variety of anomalous inputs. In this study, we categorize five distinct types of distribution shifts and critically evaluate the performance of recent OOD detection methods on each of them. We publicly release our benchmark under the name BROAD (Benchmarking Resilience Over Anomaly Diversity). Our findings reveal that while these methods excel in detecting unknown classes, their performance is inconsistent when encountering other types of distribution shifts. In other words, they only reliably detect unexpected inputs that they have been specifically designed to expect. As a first step toward broad OOD detection, we learn a generative model of existing detection scores with a Gaussian mixture. By doing so, we present an ensemble approach that offers a more consistent and comprehensive solution for broad OOD detection, demonstrating superior performance compared to existing methods. Our code to download BROAD and reproduce our experiments is publicly available.
Submitted 9 December, 2024; v1 submitted 22 August, 2023;
originally announced August 2023.
-
A Fast Monte Carlo algorithm for evaluating matrix functions with application in complex networks
Authors:
Nicolas L. Guidotti,
Juan A. Acebrón,
José Monteiro
Abstract:
We propose a novel stochastic algorithm that randomly samples entire rows and columns of the matrix as a way to approximate an arbitrary matrix function using the power series expansion. This contrasts with existing Monte Carlo methods, which only work with one entry at a time, resulting in a significantly better convergence rate than the original approach. To assess the applicability of our method, we compute the subgraph centrality and total communicability of several large networks. In all benchmarks analyzed so far, the performance of our method was significantly superior to the competition, being able to scale up to 64 CPU cores with remarkable efficiency.
Submitted 20 September, 2024; v1 submitted 2 August, 2023;
originally announced August 2023.
-
StarCoder: may the source be with you!
Authors:
Raymond Li,
Loubna Ben Allal,
Yangtian Zi,
Niklas Muennighoff,
Denis Kocetkov,
Chenghao Mou,
Marc Marone,
Christopher Akiki,
Jia Li,
Jenny Chim,
Qian Liu,
Evgenii Zheltonozhskii,
Terry Yue Zhuo,
Thomas Wang,
Olivier Dehaene,
Mishig Davaadorj,
Joel Lamy-Poirier,
João Monteiro,
Oleh Shliazhko,
Nicolas Gontier,
Nicholas Meade,
Armel Zebaze,
Ming-Ho Yee,
Logesh Kumar Umapathi,
Jian Zhu
, et al. (42 additional authors not shown)
Abstract:
The BigCode community, an open-scientific collaboration working on the responsible development of Large Language Models for Code (Code LLMs), introduces StarCoder and StarCoderBase: 15.5B parameter models with 8K context length, infilling capabilities and fast large-batch inference enabled by multi-query attention. StarCoderBase is trained on 1 trillion tokens sourced from The Stack, a large collection of permissively licensed GitHub repositories with inspection tools and an opt-out process. We fine-tuned StarCoderBase on 35B Python tokens, resulting in the creation of StarCoder. We perform the most comprehensive evaluation of Code LLMs to date and show that StarCoderBase outperforms every open Code LLM that supports multiple programming languages and matches or outperforms the OpenAI code-cushman-001 model. Furthermore, StarCoder outperforms every model that is fine-tuned on Python, can be prompted to achieve 40% pass@1 on HumanEval, and still retains its performance on other programming languages. We take several important steps towards a safe open-access model release, including an improved PII redaction pipeline and a novel attribution tracing tool, and make the StarCoder models publicly available under a more commercially viable version of the Open Responsible AI Model license.
Submitted 13 December, 2023; v1 submitted 9 May, 2023;
originally announced May 2023.
-
Self-Supervised Adversarial Imitation Learning
Authors:
Juarez Monteiro,
Nathan Gavenski,
Felipe Meneguzzi,
Rodrigo C. Barros
Abstract:
Behavioural cloning is an imitation learning technique that teaches an agent how to behave via expert demonstrations. Recent approaches use self-supervision of fully-observable unlabelled snapshots of the states to decode state pairs into actions. However, the iterative learning scheme employed by these techniques is prone to getting trapped in bad local minima. Previous work uses goal-aware strategies to solve this issue. However, this requires manual intervention to verify whether an agent has reached its goal. We address this limitation by incorporating a discriminator into the original framework, offering two key advantages and directly solving a learning problem previous work had. First, it disposes of the manual intervention requirement. Second, it helps in learning by guiding function approximation based on the state transition of the expert's trajectories. Third, the discriminator solves a learning issue commonly present in the policy model, which is to sometimes perform a 'no action' within the environment until the agent finally halts.
Submitted 21 April, 2023;
originally announced April 2023.
-
An interpretable machine learning system for colorectal cancer diagnosis from pathology slides
Authors:
Pedro C. Neto,
Diana Montezuma,
Sara P. Oliveira,
Domingos Oliveira,
João Fraga,
Ana Monteiro,
João Monteiro,
Liliana Ribeiro,
Sofia Gonçalves,
Stefan Reinhard,
Inti Zlobec,
Isabel M. Pinto,
Jaime S. Cardoso
Abstract:
Considering the profound transformation affecting pathology practice, we aimed to develop a scalable artificial intelligence (AI) system to diagnose colorectal cancer from whole-slide images (WSI). For this, we propose a deep learning (DL) system that learns from weak labels, a sampling strategy that reduces the number of training samples by a factor of six without compromising performance, an approach to leverage a small subset of fully annotated samples, and a prototype with explainable predictions, active learning features and parallelisation. Noting some problems in the literature, this study is conducted with one of the largest colorectal WSI datasets, comprising approximately 10,500 WSIs. Of these samples, 900 are testing samples. Furthermore, the robustness of the proposed method is assessed with two additional external datasets (TCGA and PAIP) and a dataset of samples collected directly from the proposed prototype. Our proposed method predicts, for the patch-based tiles, a class based on the severity of the dysplasia and uses that information to classify the whole slide. It is trained with an interpretable mixed-supervision scheme to leverage the domain knowledge introduced by pathologists through spatial annotations. The mixed-supervision scheme allowed for an intelligent sampling strategy effectively evaluated in several different scenarios without compromising the performance. On the internal dataset, the method shows an accuracy of 93.44% and a sensitivity between positive (low-grade and high-grade dysplasia) and non-neoplastic samples of 0.996. Performance on the external test samples varied, with TCGA being the most challenging dataset, with an overall accuracy of 84.91% and a sensitivity of 0.996.
Submitted 30 April, 2024; v1 submitted 6 January, 2023;
originally announced January 2023.
-
CADet: Fully Self-Supervised Out-Of-Distribution Detection With Contrastive Learning
Authors:
Charles Guille-Escuret,
Pau Rodriguez,
David Vazquez,
Ioannis Mitliagkas,
Joao Monteiro
Abstract:
Handling out-of-distribution (OOD) samples has become a major stake in the real-world deployment of machine learning systems. This work explores the use of self-supervised contrastive learning for the simultaneous detection of two types of OOD samples: unseen classes and adversarial perturbations. First, we pair self-supervised contrastive learning with the maximum mean discrepancy (MMD) two-sample test. This approach enables us to robustly test whether two independent sets of samples originate from the same distribution, and we demonstrate its effectiveness by discriminating between CIFAR-10 and CIFAR-10.1 with higher confidence than previous work. Motivated by this success, we introduce CADet (Contrastive Anomaly Detection), a novel method for OOD detection of single samples. CADet draws inspiration from MMD, but leverages the similarity between contrastive transformations of the same sample. CADet outperforms existing adversarial detection methods in identifying adversarially perturbed samples on ImageNet and achieves comparable performance to unseen label detection methods on two challenging benchmarks: ImageNet-O and iNaturalist. Significantly, CADet is fully self-supervised and requires neither labels for in-distribution samples nor access to OOD examples.
Submitted 9 December, 2024; v1 submitted 4 October, 2022;
originally announced October 2022.
-
Constraining Representations Yields Models That Know What They Don't Know
Authors:
Joao Monteiro,
Pau Rodriguez,
Pierre-Andre Noel,
Issam Laradji,
David Vazquez
Abstract:
A well-known failure mode of neural networks is that they may confidently return erroneous predictions. Such unsafe behaviour is particularly frequent when the use case slightly differs from the training context, and/or in the presence of an adversary. This work presents a novel direction to address these issues in a broad, general manner: imposing class-aware constraints on a model's internal activation patterns. Specifically, we assign to each class a unique, fixed, randomly-generated binary vector - hereafter called class code - and train the model so that its cross-depths activation patterns predict the appropriate class code according to the input sample's class. The resulting predictors are dubbed Total Activation Classifiers (TAC), and TACs may either be trained from scratch, or used with negligible cost as a thin add-on on top of a frozen, pre-trained neural network. The distance between a TAC's activation pattern and the closest valid code acts as an additional confidence score, besides the default unTAC'ed prediction head's. In the add-on case, the original neural network's inference head is completely unaffected (so its accuracy remains the same) but we now have the option to use TAC's own confidence and prediction when determining which course of action to take in a hypothetical production workflow. In particular, we show that TAC strictly improves the value derived from models allowed to reject/defer. We provide further empirical evidence that TAC works well on multiple types of architectures and data modalities and that it is at least as good as state-of-the-art alternative confidence scores derived from existing models.
Submitted 19 April, 2023; v1 submitted 30 August, 2022;
originally announced August 2022.
-
Monotonicity Regularization: Improved Penalties and Novel Applications to Disentangled Representation Learning and Robust Classification
Authors:
Joao Monteiro,
Mohamed Osama Ahmed,
Hossein Hajimirsadeghi,
Greg Mori
Abstract:
We study settings where gradient penalties are used alongside risk minimization with the goal of obtaining predictors satisfying different notions of monotonicity. Specifically, we present two sets of contributions. In the first part of the paper, we show that different choices of penalties define the regions of the input space where the property is observed. As such, previous methods result in models that are monotonic only in a small volume of the input space. We thus propose an approach that uses mixtures of training instances and random points to populate the space and enforce the penalty in a much larger region. As a second set of contributions, we introduce regularization strategies that enforce other notions of monotonicity in different settings. In this case, we consider applications, such as image classification and generative modeling, where monotonicity is not a hard constraint but can help improve some aspects of the model. Namely, we show that inducing monotonicity can be beneficial in applications such as: (1) allowing for controllable data generation, (2) defining strategies to detect anomalous data, and (3) generating explanations for predictions. Our proposed approaches do not introduce relevant computational overhead while leading to efficient procedures that provide extra benefits over baseline models.
Submitted 17 May, 2022;
originally announced May 2022.
-
Efficient and Eventually Consistent Collective Operations
Authors:
Roman Iakymchuk,
Amandio Faustino,
Andrew Emerson,
Joao Barreto,
Valeria Bartsch,
Rodrigo Rodrigues,
Jose C. Monteiro
Abstract:
Collective operations are common features of parallel programming models that are frequently used in High-Performance Computing (HPC) and machine/deep learning (ML/DL) applications. In strong scaling scenarios, collective operations can negatively impact the overall application performance: with the increase in core count, the load per rank decreases, while the time spent in collective operations increases logarithmically.
In this article, we propose a design for eventually consistent collectives suitable for ML/DL computations by reducing communication in Broadcast and Reduce, as well as by exploring the Stale Synchronous Parallel (SSP) synchronization model for the Allreduce collective. Moreover, we also enrich the GASPI ecosystem with frequently used classic/consistent collective operations -- such as Allreduce for large messages and AlltoAll used in an HPC code. Our implementations show promising preliminary results with significant improvements, especially for Allreduce and AlltoAll, compared to the vendor-provided MPI alternatives.
Submitted 31 March, 2022;
originally announced March 2022.
-
Dynamic Page Placement on Real Persistent Memory Systems
Authors:
Miguel Marques,
Ilia Kuzmin,
João Barreto,
José Monteiro,
Rodrigo Rodrigues
Abstract:
As persistent memory (PM) technologies emerge, hybrid memory architectures combining DRAM with PM bring the potential to provide a tiered, byte-addressable main memory of unprecedented capacity. Nearly a decade after the first proposals for these hybrid architectures, the real technology has finally reached commercial availability with Intel Optane(TM) DC Persistent Memory (DCPMM). This raises the challenge of designing systems that realize this potential in practice, namely through effective approaches that dynamically decide at which memory tier pages should be placed. In this paper, we are the first, to our knowledge, to systematically analyze tiered page placement on real DCPMM-based systems. To this end, we start by revisiting the assumptions of state-of-the-art proposals, and confronting them with the idiosyncrasies of today's off-the-shelf DCPMM-equipped architectures. This empirical study reveals that some of the key design choices in the literature rely on important assumptions that are not verified in present-day DRAM-DCPMM memory architectures. Based on the lessons from this study, we design and implement HyPlacer, a tool for tiered page placement in off-the-shelf Linux-based systems equipped with DRAM+DCPMM. In contrast to previous proposals, HyPlacer follows an approach guided by two main practicality principles: 1) it is tailored to the performance idiosyncrasies of off-the-shelf DRAM+DCPMM systems; and 2) it can be seamlessly integrated into Linux with minimal kernel-mode components, while ensuring extensibility to other HMAs and other data placement policies. Our experimental evaluation of HyPlacer shows that it outperforms both solutions proposed in past literature and placement options that are currently available in off-the-shelf DCPMM-equipped Linux systems, reaching an improvement of up to 11x when compared to the default memory policy in Linux.
Submitted 23 December, 2021;
originally announced December 2021.
-
Domain Conditional Predictors for Domain Adaptation
Authors:
Joao Monteiro,
Xavier Gibert,
Jianqiao Feng,
Vincent Dumoulin,
Dar-Shyang Lee
Abstract:
Learning guarantees often rely on assumptions of i.i.d. data, which will likely be violated in practice once predictors are deployed to perform real-world tasks. Domain adaptation approaches thus appeared as a useful framework yielding extra flexibility in that distinct train and test data distributions are supported, provided that other assumptions are satisfied such as covariate shift, which expects the conditional distributions over labels to be independent of the underlying data distribution. Several approaches were introduced in order to induce generalization across varying train and test data sources, and those often rely on the general idea of domain-invariance, in such a way that the data-generating distributions are to be disregarded by the prediction model. In this contribution, we tackle the problem of generalizing across data sources by approaching it from the opposite direction: we consider a conditional modeling approach in which predictions, in addition to being dependent on the input data, use information relative to the underlying data-generating distribution. For instance, the model has an explicit mechanism to adapt to changing environments and/or new data sources. We argue that such an approach is more generally applicable than current domain adaptation methods since it does not require extra assumptions such as covariate shift and further yields simpler training algorithms that avoid a common source of training instabilities caused by minimax formulations, often employed in domain-invariant methods.
Submitted 25 June, 2021;
originally announced June 2021.
-
Particle-In-Cell Simulation using Asynchronous Tasking
Authors:
Nicolas Guidotti,
Pedro Ceyrat,
João Barreto,
José Monteiro,
Rodrigo Rodrigues,
Ricardo Fonseca,
Xavier Martorell,
Antonio J. Peña
Abstract:
Recently, task-based programming models have emerged as a prominent alternative among shared-memory parallel programming paradigms. Inherently asynchronous, these models provide native support for dynamic load balancing and incorporate data flow concepts to selectively synchronize the tasks. However, tasking models are yet to be widely adopted by the HPC community and their effective advantages when applied to non-trivial, real-world HPC applications are still not well comprehended. In this paper, we study the parallelization of a production electromagnetic particle-in-cell (EM-PIC) code for kinetic plasma simulations exploring different strategies using asynchronous task-based models. Our fully asynchronous implementation not only significantly outperforms a conventional, synchronous approach but also achieves near perfect scaling for 48 cores.
Submitted 29 August, 2021; v1 submitted 23 June, 2021;
originally announced June 2021.
-
The Challenges of Assessing and Evaluating the Students at Distance
Authors:
Fernando Almeida,
José Monteiro
Abstract:
The COVID-19 pandemic has had a strong effect on higher education institutions with the closure of classroom teaching activities. In this unprecedented crisis, of global proportion, educators and families had to deal with unpredictability and learn new ways of teaching. This short essay aims to explore the challenges posed to Portuguese higher education institutions and to analyze the challenges posed to evaluation models. To this end, the relevance of formative and summative assessment models in distance education is explored and the perception of teachers and students about the practices adopted in remote assessment is discussed. On the teachers' side, there is a high concern about adopting fraud-free models, and an excessive focus on the summative assessment component that in the distance learning model has less preponderance when compared to the gradual monitoring and assessment processes of the students, while on the students' side, problems arise regarding equipment to follow the teaching sessions and concerns about their privacy, particularly when intrusive IT solutions request access to their cameras, audio, and desktop.
Submitted 30 January, 2021;
originally announced February 2021.
-
Imitating Unknown Policies via Exploration
Authors:
Nathan Gavenski,
Juarez Monteiro,
Roger Granada,
Felipe Meneguzzi,
Rodrigo C. Barros
Abstract:
Behavioral cloning is an imitation learning technique that teaches an agent how to behave through expert demonstrations. Recent approaches use self-supervision of fully-observable unlabeled snapshots of the states to decode state-pairs into actions. However, the iterative learning scheme of these techniques is prone to getting stuck in bad local minima. We address these limitations by incorporating a two-phase model into the original framework, which learns from unlabeled observations via exploration, substantially improving traditional behavioral cloning by exploiting (i) a sampling mechanism to prevent bad local minima, (ii) a sampling mechanism to improve exploration, and (iii) self-attention modules to capture global features. The resulting technique outperforms the previous state-of-the-art in four different environments by a large margin.
Submitted 12 August, 2020;
originally announced August 2020.
-
Augmented Behavioral Cloning from Observation
Authors:
Juarez Monteiro,
Nathan Gavenski,
Roger Granada,
Felipe Meneguzzi,
Rodrigo Barros
Abstract:
Imitation from observation is a computational technique that teaches an agent how to mimic the behavior of an expert by observing only the sequence of states from the expert demonstrations. Recent approaches learn the inverse dynamics of the environment and an imitation policy by interleaving epochs of both models while changing the demonstration data. However, such approaches often get stuck in sub-optimal solutions that are distant from the expert, limiting their imitation effectiveness. We address this problem with a novel approach that overcomes the problem of reaching bad local minima by exploring: (i) a self-attention mechanism that better captures global features of the states; and (ii) a sampling strategy that regulates the observations that are used for learning. We show empirically that our approach outperforms the state-of-the-art approaches in four different environments by a large margin.
Submitted 28 April, 2020;
originally announced April 2020.
-
HAPRec: Hybrid Activity and Plan Recognizer
Authors:
Roger Granada,
Ramon Fraga Pereira,
Juarez Monteiro,
Leonardo Amado,
Rodrigo C. Barros,
Duncan Ruiz,
Felipe Meneguzzi
Abstract:
Computer-based assistants have recently attracted much interest due to their applicability to ambient assisted living. Such assistants have to detect and recognize the high-level activities and goals performed by the assisted human beings. In this work, we demonstrate activity recognition in an indoor environment in order to identify the goal that the subject of the video is pursuing. Our hybrid approach combines an action recognition module and a goal recognition algorithm to identify the ultimate goal of the subject in the video.
Submitted 28 April, 2020;
originally announced April 2020.
-
An end-to-end approach for the verification problem: learning the right distance
Authors:
Joao Monteiro,
Isabela Albuquerque,
Jahangir Alam,
R Devon Hjelm,
Tiago Falk
Abstract:
In this contribution, we augment the metric learning setting by introducing a parametric pseudo-distance, trained jointly with the encoder. Several interpretations are thus drawn for the learned distance-like model's output. We first show it approximates a likelihood ratio which can be used for hypothesis tests, and that it further induces a large divergence across the joint distributions of pairs of examples from the same and from different classes. Evaluation is performed under the verification setting consisting of determining whether sets of examples belong to the same class, even if such classes are novel and were never presented to the model during training. Empirical evaluation shows that this method defines an end-to-end approach for the verification problem, able to attain better performance than simple scorers such as those based on cosine similarity and further outperforming widely used downstream classifiers. We further observe that training is much simplified under the proposed approach compared to metric learning with actual distances, requiring no complex scheme to harvest pairs of examples.
Submitted 14 August, 2020; v1 submitted 21 February, 2020;
originally announced February 2020.
-
Multi-task self-supervised learning for Robust Speech Recognition
Authors:
Mirco Ravanelli,
Jianyuan Zhong,
Santiago Pascual,
Pawel Swietojanski,
Joao Monteiro,
Jan Trmal,
Yoshua Bengio
Abstract:
Despite the growing interest in unsupervised learning, extracting meaningful knowledge from unlabelled audio remains an open challenge. To take a step in this direction, we recently proposed a problem-agnostic speech encoder (PASE), that combines a convolutional encoder followed by multiple neural networks, called workers, tasked to solve self-supervised problems (i.e., ones that do not require manual annotations as ground truth). PASE was shown to capture relevant speech information, including speaker voice-print and phonemes. This paper proposes PASE+, an improved version of PASE for robust speech recognition in noisy and reverberant environments. To this end, we employ an online speech distortion module, that contaminates the input signals with a variety of random disturbances. We then propose a revised encoder that better learns short- and long-term speech dynamics with an efficient combination of recurrent and convolutional networks. Finally, we refine the set of workers used in self-supervision to encourage better cooperation. Results on TIMIT, DIRHA and CHiME-5 show that PASE+ significantly outperforms both the previous version of PASE as well as common acoustic features. Interestingly, PASE+ learns transferable representations suitable for highly mismatched acoustic conditions.
Submitted 17 April, 2020; v1 submitted 24 January, 2020;
originally announced January 2020.
-
A Simplified Fully Quantized Transformer for End-to-end Speech Recognition
Authors:
Alex Bie,
Bharat Venkitesh,
Joao Monteiro,
Md. Akmal Haidar,
Mehdi Rezagholizadeh
Abstract:
While significant improvements have been made in recent years in terms of end-to-end automatic speech recognition (ASR) performance, such improvements were obtained through the use of very large neural networks, unfit for embedded use on edge devices. That being said, in this paper, we work on simplifying and compressing Transformer-based encoder-decoder architectures for the end-to-end ASR task. We empirically introduce a more compact Speech-Transformer by investigating the impact of discarding particular modules on the performance of the model. Moreover, we evaluate reducing the numerical precision of our network's weights and activations while maintaining the performance of the full-precision model. Our experiments show that we can reduce the number of parameters of the full-precision model and then further compress the model 4x by fully quantizing to 8-bit fixed point precision.
Submitted 24 March, 2020; v1 submitted 8 November, 2019;
originally announced November 2019.
-
Generalizing to unseen domains via distribution matching
Authors:
Isabela Albuquerque,
João Monteiro,
Mohammad Darvishi,
Tiago H. Falk,
Ioannis Mitliagkas
Abstract:
Supervised learning results typically rely on assumptions of i.i.d. data. Unfortunately, those assumptions are commonly violated in practice. In this work, we tackle this problem by focusing on domain generalization: a formalization where the data generating process at test time may yield samples from never-before-seen domains (distributions). Our work relies on the following lemma: by minimizing a notion of discrepancy between all pairs from a set of given domains, we also minimize the discrepancy between any pairs of mixtures of domains. Using this result, we derive a generalization bound for our setting. We then show that low risk over unseen domains can be achieved by representing the data in a space where (i) the training distributions are indistinguishable, and (ii) relevant information for the task at hand is preserved. Minimizing the terms in our bound yields an adversarial formulation which estimates and minimizes pairwise discrepancies. We validate our proposed strategy on standard domain generalization benchmarks, outperforming a number of recently introduced methods. Notably, we tackle a real-world application where the underlying data corresponds to multi-channel electroencephalography time series from different subjects, each considered as a distinct domain.
Submitted 15 September, 2021; v1 submitted 2 November, 2019;
originally announced November 2019.
-
Cross-Subject Statistical Shift Estimation for Generalized Electroencephalography-based Mental Workload Assessment
Authors:
Isabela Albuquerque,
João Monteiro,
Olivier Rosanne,
Abhishek Tiwari,
Jean-François Gagnon,
Tiago H. Falk
Abstract:
Assessment of mental workload in real-world conditions is key to ensuring the performance of workers executing tasks that demand sustained attention. Previous literature has employed electroencephalography (EEG) to this end despite having observed that EEG correlates of mental workload vary across subjects and physical strain, thus making it difficult to devise models capable of presenting reliable performance across users. Domain adaptation consists of a set of strategies that aim to improve the performance of machine learning systems on data unseen at training time. Such methods, however, might rely on assumptions over the considered data distributions, which typically do not hold for applications of EEG data. Motivated by this observation, in this work we propose a strategy to estimate two types of discrepancies between multiple data distributions, namely marginal and conditional shifts, observed on data collected from different subjects. Besides shedding light on the assumptions that hold for a particular dataset, the estimates of statistical shifts obtained with the proposed approach can be used for investigating other aspects of a machine learning pipeline, such as quantitatively assessing the effectiveness of domain adaptation strategies. In particular, we consider EEG data collected from individuals performing mental tasks while running on a treadmill and pedaling on a stationary bike and explore the effects of different normalization strategies commonly used to mitigate cross-subject variability. We show the effects that different normalization schemes have on statistical shifts and their relationship with the accuracy of mental workload prediction as assessed on participants unseen at training time.
Submitted 22 September, 2021; v1 submitted 20 June, 2019;
originally announced June 2019.
-
Classifying Norm Conflicts using Learned Semantic Representations
Authors:
João Paulo Aires,
Roger Granada,
Juarez Monteiro,
Rodrigo C. Barros,
Felipe Meneguzzi
Abstract:
While most social norms are informal, they are often formalized by companies in contracts to regulate trades of goods and services. When poorly written, contracts may contain normative conflicts resulting from opposing deontic meanings or contradictory specifications. As contracts tend to be long and contain many norms, manually identifying such conflicts requires human effort, which is time-consuming and error-prone. Automating this task benefits contract makers by increasing productivity and making conflict identification more reliable. To address this problem, we introduce an approach to detect and classify norm conflicts in contracts by converting them into latent representations that preserve both syntactic and semantic information and training a model to classify norm conflicts into four conflict types. Our results set a new state of the art when compared to a previous approach.
Submitted 13 May, 2019;
originally announced June 2019.
-
Learning to navigate image manifolds induced by generative adversarial networks for unsupervised video generation
Authors:
Isabela Albuquerque,
João Monteiro,
Tiago H. Falk
Abstract:
In this work, we introduce a two-step framework for generative modeling of temporal data. Specifically, the generative adversarial networks (GANs) setting is employed to generate synthetic scenes of moving objects. To do so, we propose a two-step training scheme in which a generator of static frames is trained first. Afterwards, a recurrent model is trained with the goal of providing a sequence of inputs to the previously trained frame generator, thus yielding scenes which look natural. The adversarial setting is employed in both training steps. However, with the aim of avoiding known training instabilities in GANs, a multiple-discriminator approach is used to train both models. Results on the studied video dataset indicate that, by employing such an approach, the recurrent part is able to learn how to coherently navigate the image manifold induced by the frame generator, thus yielding more natural-looking scenes.
Submitted 23 January, 2019;
originally announced January 2019.
-
Multi-objective training of Generative Adversarial Networks with multiple discriminators
Authors:
Isabela Albuquerque,
João Monteiro,
Thang Doan,
Breandan Considine,
Tiago Falk,
Ioannis Mitliagkas
Abstract:
Recent literature has demonstrated promising results for training Generative Adversarial Networks by employing a set of discriminators, in contrast to the traditional game involving one generator against a single adversary. Such methods perform single-objective optimization on some simple consolidation of the losses, e.g. an arithmetic average. In this work, we revisit the multiple-discriminator setting by framing the simultaneous minimization of losses provided by different models as a multi-objective optimization problem. Specifically, we evaluate the performance of multiple gradient descent and the hypervolume maximization algorithm on a number of different datasets. Moreover, we argue that the previously proposed methods and hypervolume maximization can all be seen as variations of multiple gradient descent in which the update direction can be computed efficiently. Our results indicate that hypervolume maximization presents a better compromise between sample quality and computational cost than previous methods.
Submitted 24 January, 2019;
originally announced January 2019.
-
Generative Adversarial Speaker Embedding Networks for Domain Robust End-to-End Speaker Verification
Authors:
Gautam Bhattacharya,
Joao Monteiro,
Jahangir Alam,
Patrick Kenny
Abstract:
This article presents a novel approach for learning domain-invariant speaker embeddings using Generative Adversarial Networks. The main idea is to confuse a domain discriminator so that it can't tell if embeddings are from the source or target domains. We train several GAN variants using our proposed framework and apply them to the speaker verification task. On the challenging NIST-SRE 2016 dataset, we are able to match the performance of a strong baseline x-vector system. In contrast to the baseline systems, which are dependent on dimensionality reduction (LDA) and an external classifier (PLDA), our proposed speaker embeddings can be scored using simple cosine distance. This is achieved by optimizing our models end-to-end, using an angular margin loss function. Furthermore, we are able to significantly boost verification performance by averaging our different GAN models at the score level, achieving a relative improvement of 7.2% over the baseline.
Submitted 7 November, 2018;
originally announced November 2018.
-
On-line Adaptative Curriculum Learning for GANs
Authors:
Thang Doan,
Joao Monteiro,
Isabela Albuquerque,
Bogdan Mazoure,
Audrey Durand,
Joelle Pineau,
R Devon Hjelm
Abstract:
Generative Adversarial Networks (GANs) can successfully approximate a probability distribution and produce realistic samples. However, open questions such as sufficient convergence conditions and mode collapse still persist. In this paper, we build on existing work in the area by proposing a novel framework for training the generator against an ensemble of discriminator networks, which can be seen as a one-student/multiple-teachers setting. We formalize this problem within the full-information adversarial bandit framework, where we evaluate the capability of an algorithm to select mixtures of discriminators for providing the generator with feedback during learning. To this end, we propose a reward function which reflects the progress made by the generator and dynamically update the mixture weights allocated to each discriminator. We also draw connections between our algorithm and stochastic optimization methods and then show that existing approaches using multiple discriminators in the literature can be recovered from our framework. We argue that less expressive discriminators are smoother and have a general coarse-grained view of the map of modes, which forces the generator to cover a wide portion of the data distribution support. On the other hand, highly expressive discriminators ensure sample quality. Finally, experimental results show that our approach improves sample quality and diversity over existing baselines by effectively learning a curriculum. These results also support the claim that weaker discriminators have higher entropy, improving mode coverage. Keywords: multiple discriminators, curriculum learning, multiple resolutions discriminators, multi-armed bandits, generative adversarial networks, smooth discriminators, multi-discriminator gan training, multiple experts.
Submitted 11 March, 2019; v1 submitted 31 July, 2018;
originally announced August 2018.
-
Generalizable Adversarial Examples Detection Based on Bi-model Decision Mismatch
Authors:
João Monteiro,
Isabela Albuquerque,
Zahid Akhtar,
Tiago H. Falk
Abstract:
Modern applications of artificial neural networks have yielded remarkable performance gains in a wide range of tasks. However, recent studies have discovered that such modelling strategies are vulnerable to Adversarial Examples, i.e. examples with subtle perturbations, often too small to be perceptible to humans, that can nevertheless easily fool neural networks. Defense techniques against adversarial examples have been proposed, but ensuring robust performance against varying or novel types of attacks remains an open problem. In this work, we focus on the detection setting, in which case attackers become identifiable while models remain vulnerable. In particular, we employ the decision layer of independently trained models as features for posterior detection. The proposed framework does not require any prior knowledge of adversarial example generation techniques, and can be directly employed along with unmodified off-the-shelf models. Experiments on the standard MNIST and CIFAR10 datasets deliver empirical evidence that this detection approach generalizes well across not only different adversarial example generation methods but also quality degradation attacks. Non-linear binary classifiers trained on top of our proposed features can achieve a high detection rate (>90%) in a set of white-box attacks and maintain such performance when tested against unseen attacks.
Submitted 22 April, 2019; v1 submitted 21 February, 2018;
originally announced February 2018.
-
A Tutorial on Canonical Correlation Methods
Authors:
Viivi Uurtio,
João M. Monteiro,
Jaz Kandola,
John Shawe-Taylor,
Delmiro Fernandez-Reyes,
Juho Rousu
Abstract:
Canonical correlation analysis is a family of multivariate statistical methods for the analysis of paired sets of variables. Since its proposition, canonical correlation analysis has for instance been extended to extract relations between two sets of variables when the sample size is insufficient in relation to the data dimensionality, when the relations have been considered to be non-linear, and when the dimensionality is too large for human interpretation. This tutorial explains the theory of canonical correlation analysis including its regularised, kernel, and sparse variants. Additionally, the deep and Bayesian CCA extensions are briefly reviewed. Together with the numerical examples, this overview provides a coherent compendium on the applicability of the variants of canonical correlation analysis. By bringing together techniques for solving the optimisation problems, evaluating the statistical significance and generalisability of the canonical correlation model, and interpreting the relations, we hope that this article can serve as a hands-on tool for applying canonical correlation methods in data analysis.
Submitted 7 November, 2017;
originally announced November 2017.
-
Building an Effective Data Warehousing for Financial Sector
Authors:
Jose Ferreira,
Fernando Almeida,
Jose Monteiro
Abstract:
This article presents the implementation process of a Data Warehouse and a multidimensional analysis of business data for a holding company in the financial sector. The goal is to create a business intelligence system that, in a simple, quick but also versatile way, allows access to updated, aggregated, real and/or projected information regarding bank account balances. The established system extracts and processes the operational database information which supports cash management information by using Integration Services and Analysis Services tools from Microsoft SQL Server. The end-user interface is a pivot table, properly arranged to explore the information made available by the produced cube. The results have shown that the adoption of online analytical processing cubes offers better performance and provides a more automated and robust process to analyze current and provisional aggregated financial data balances compared to the current process based on static reports built from transactional databases.
Submitted 18 September, 2017;
originally announced September 2017.
-
e-commerce business models in the context of web3.0 paradigm
Authors:
Fernando Almeida,
José D. Santos,
José A. Monteiro
Abstract:
Web 3.0 promises to have a significant effect on users and businesses. It will change how people work and play, how companies use information to market and sell their products, as well as how they operate their businesses. The basic shift occurring in Web 3.0 is from information-centric to knowledge-centric patterns of computing. Web 3.0 will enable people and machines to connect, evolve, share and use knowledge on an unprecedented scale and in new ways that make our experience of the Internet better. Additionally, semantic technologies have the potential to drive significant improvements in capabilities and life cycle economics through cost reductions, improved efficiencies, enhanced effectiveness, and new functionalities that were not possible or economically feasible before. In this paper we look to the semantic web and Web 3.0 technologies as enablers for the creation of value and the appearance of new business models. To that end, we analyze the role and impact of Web 3.0 in business and we identify nine potential business models, based on direct and undirected revenue sources, which have emerged with the appearance of semantic web technologies.
Submitted 9 January, 2014;
originally announced January 2014.
-
Optimally Solving the MCM Problem Using Pseudo-Boolean Satisfiability
Authors:
Nuno P. Lopes,
Levent Aksoy,
Vasco Manquinho,
José Monteiro
Abstract:
In this report, we describe three encodings of the multiple constant multiplication (MCM) problem to pseudo-boolean satisfiability (PBS), and introduce an algorithm to solve the MCM problem optimally. To the best of our knowledge, the proposed encodings and the optimization algorithm are the first formalization of the MCM problem in a PBS manner. This report evaluates the complexity of the problem size and the performance of several PBS solvers over three encodings.
Submitted 17 May, 2011; v1 submitted 11 November, 2010;
originally announced November 2010.