
Showing 1–16 of 16 results for author: Martins, A F T

Searching in archive stat.
  1. arXiv:2409.07131  [pdf, other]

    cs.CL cs.LG stat.ML

    Reranking Laws for Language Generation: A Communication-Theoretic Perspective

    Authors: António Farinhas, Haau-Sing Li, André F. T. Martins

    Abstract: To ensure large language models (LLMs) are used safely, one must reduce their propensity to hallucinate or to generate unacceptable answers. A simple and often used strategy is to first let the LLM generate multiple hypotheses and then employ a reranker to choose the best one. In this paper, we draw a parallel between this strategy and the use of redundancy to decrease the error rate in noisy comm…

    Submitted 10 February, 2025; v1 submitted 11 September, 2024; originally announced September 2024.

    Comments: NeurIPS 2024 (spotlight)

  2. arXiv:2310.01262  [pdf, other]

    cs.LG stat.ML

    Non-Exchangeable Conformal Risk Control

    Authors: António Farinhas, Chrysoula Zerva, Dennis Ulmer, André F. T. Martins

    Abstract: Split conformal prediction has recently sparked great interest due to its ability to provide formally guaranteed uncertainty sets or intervals for predictions made by black-box neural models, ensuring a predefined probability of containing the actual ground truth. While the original formulation assumes data exchangeability, some extensions handle non-exchangeable data, which is often the case in m…

    Submitted 26 January, 2024; v1 submitted 2 October, 2023; originally announced October 2023.

    Comments: ICLR 2024

  3. arXiv:2301.07473  [pdf, other]

    cs.LG stat.ML

    Discrete Latent Structure in Neural Networks

    Authors: Vlad Niculae, Caio F. Corro, Nikita Nangia, Tsvetomila Mihaylova, André F. T. Martins

    Abstract: Many types of data from fields including natural language processing, computer vision, and bioinformatics, are well represented by discrete, compositional structures such as trees, sequences, or matchings. Latent structure models are a powerful tool for learning to extract such representations, offering a way to incorporate structural bias, discover insight about the data, and interpret decisions.…

    Submitted 22 November, 2024; v1 submitted 18 January, 2023; originally announced January 2023.

    ACM Class: I.2.6

  4. arXiv:2203.02336  [pdf, other]

    cs.LG stat.ME

    Differentiable Causal Discovery Under Latent Interventions

    Authors: Gonçalo R. A. Faria, André F. T. Martins, Mário A. T. Figueiredo

    Abstract: Recent work has shown promising results in causal discovery by leveraging interventional data with gradient-based methods, even when the intervened variables are unknown. However, previous work assumes that the correspondence between samples and interventions is known, which is often unrealistic. We envision a scenario with an extensive dataset sampled from multiple intervention distributions and…

    Submitted 4 March, 2022; originally announced March 2022.

    Journal ref: Proceedings of the First Conference on Causal Learning and Reasoning, PMLR 177:253-274, 2022

  5. arXiv:2108.01988  [pdf, other]

    cs.LG cs.AI stat.ML

    Sparse Continuous Distributions and Fenchel-Young Losses

    Authors: André F. T. Martins, Marcos Treviso, António Farinhas, Pedro M. Q. Aguiar, Mário A. T. Figueiredo, Mathieu Blondel, Vlad Niculae

    Abstract: Exponential families are widely used in machine learning, including many distributions in continuous and discrete domains (e.g., Gaussian, Dirichlet, Poisson, and categorical distributions via the softmax transformation). Distributions in each of these families have fixed support. In contrast, for finite domains, recent work on sparse alternatives to softmax (e.g., sparsemax, $α$-entmax, and fused…

    Submitted 4 August, 2022; v1 submitted 4 August, 2021; originally announced August 2021.

    Comments: JMLR 2022 camera ready version. arXiv admin note: text overlap with arXiv:2006.07214

  6. arXiv:2007.01919  [pdf, other]

    cs.LG stat.ML

    Efficient Marginalization of Discrete and Structured Latent Variables via Sparsity

    Authors: Gonçalo M. Correia, Vlad Niculae, Wilker Aziz, André F. T. Martins

    Abstract: Training neural network models with discrete (categorical or structured) latent variables can be computationally challenging, due to the need for marginalization over large or combinatorial sets. To circumvent this issue, one typically resorts to sampling-based approximations of the true marginal, requiring noisy gradient estimators (e.g., score function estimator) or continuous relaxations with l…

    Submitted 28 December, 2020; v1 submitted 3 July, 2020; originally announced July 2020.

    Comments: Accepted for spotlight presentation at NeurIPS 2020

  7. arXiv:2006.07214  [pdf, other]

    cs.LG cs.CL cs.CV stat.ML

    Sparse and Continuous Attention Mechanisms

    Authors: André F. T. Martins, António Farinhas, Marcos Treviso, Vlad Niculae, Pedro M. Q. Aguiar, Mário A. T. Figueiredo

    Abstract: Exponential families are widely used in machine learning; they include many distributions in continuous and discrete domains (e.g., Gaussian, Dirichlet, Poisson, and categorical distributions via the softmax transformation). Distributions in each of these families have fixed support. In contrast, for finite domains, there has been recent work on sparse alternatives to softmax (e.g. sparsemax and a…

    Submitted 29 October, 2020; v1 submitted 12 June, 2020; originally announced June 2020.

    Comments: Accepted for spotlight presentation at NeurIPS 2020

  8. arXiv:2001.04437  [pdf, other]

    cs.LG cs.CL stat.ML

    LP-SparseMAP: Differentiable Relaxed Optimization for Sparse Structured Prediction

    Authors: Vlad Niculae, André F. T. Martins

    Abstract: Structured prediction requires manipulating a large number of combinatorial structures, e.g., dependency trees or alignments, either as latent or output variables. Recently, the SparseMAP method has been proposed as a differentiable, sparse alternative to maximum a posteriori (MAP) and marginal inference. SparseMAP returns a combination of a small number of structures, a desirable property in some…

    Submitted 5 August, 2020; v1 submitted 13 January, 2020; originally announced January 2020.

    Comments: 34 pages, 5 tables, 4 figures. ICML 2020

  9. arXiv:1909.00015  [pdf, other]

    cs.CL stat.ML

    Adaptively Sparse Transformers

    Authors: Gonçalo M. Correia, Vlad Niculae, André F. T. Martins

    Abstract: Attention mechanisms have become ubiquitous in NLP. Recent architectures, notably the Transformer, learn powerful context-aware word representations through layered, multi-headed attention. The multiple heads learn diverse types of word relationships. However, with standard softmax attention, all attention heads are dense, assigning a non-zero weight to all context words. In this work, we introduc…

    Submitted 6 September, 2019; v1 submitted 30 August, 2019; originally announced September 2019.

    Comments: Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019, Hong Kong, China

  10. arXiv:1907.10348  [pdf, ps, other]

    cs.LG stat.ML

    Notes on Latent Structure Models and SPIGOT

    Authors: André F. T. Martins, Vlad Niculae

    Abstract: These notes aim to shed light on the recently proposed structured projected intermediate gradient optimization technique (SPIGOT, Peng et al., 2018). SPIGOT is a variant of the straight-through estimator (Bengio et al., 2013) which bypasses gradients of the argmax function by back-propagating a surrogate "gradient." We provide a new interpretation to the proposed gradient and put this technique in…

    Submitted 24 July, 2019; originally announced July 2019.

    Comments: 7 pages

  11. arXiv:1901.02324  [pdf, other]

    stat.ML cs.LG

    Learning with Fenchel-Young Losses

    Authors: Mathieu Blondel, André F. T. Martins, Vlad Niculae

    Abstract: Over the past decades, numerous loss functions have been proposed for a variety of supervised learning tasks, including regression, classification, ranking, and more generally structured prediction. Understanding the core principles and theoretical properties underpinning these losses is key to choose the right loss for the right problem, as well as to create new losses which combine their st…

    Submitted 2 March, 2020; v1 submitted 8 January, 2019; originally announced January 2019.

    Comments: In Journal of Machine Learning Research, volume 21

  12. arXiv:1809.00653  [pdf, other]

    cs.CL cs.LG stat.ML

    Towards Dynamic Computation Graphs via Sparse Latent Structure

    Authors: Vlad Niculae, André F. T. Martins, Claire Cardie

    Abstract: Deep NLP models benefit from underlying structures in the data---e.g., parse trees---typically extracted using off-the-shelf parsers. Recent attempts to jointly learn the latent structure encounter a tradeoff: either make factorization assumptions that limit expressiveness, or sacrifice end-to-end differentiability. Using the recently proposed SparseMAP inference, which retrieves a sparse distribu…

    Submitted 3 September, 2018; originally announced September 2018.

    Comments: EMNLP 2018; 9 pages (incl. appendix)

    MSC Class: 68T50 ACM Class: I.2.6; I.2.7

  13. arXiv:1805.09717  [pdf, other]

    stat.ML cs.LG

    Learning Classifiers with Fenchel-Young Losses: Generalized Entropies, Margins, and Algorithms

    Authors: Mathieu Blondel, André F. T. Martins, Vlad Niculae

    Abstract: This paper studies Fenchel-Young losses, a generic way to construct convex loss functions from a regularization function. We analyze their properties in depth, showing that they unify many well-known loss functions and allow to create useful new ones easily. Fenchel-Young losses constructed from a generalized entropy, including the Shannon and Tsallis entropies, induce predictive probability distr…

    Submitted 22 February, 2019; v1 submitted 24 May, 2018; originally announced May 2018.

    Comments: In proceedings of AISTATS 2019

  14. arXiv:1802.04223  [pdf, other]

    stat.ML cs.CL cs.LG

    SparseMAP: Differentiable Sparse Structured Inference

    Authors: Vlad Niculae, André F. T. Martins, Mathieu Blondel, Claire Cardie

    Abstract: Structured prediction requires searching over a combinatorial number of structures. To tackle it, we introduce SparseMAP: a new method for sparse structured inference, and its natural loss function. SparseMAP automatically selects only a few global structures: it is situated between MAP inference, which picks a single structure, and marginal inference, which assigns probability mass to all structu…

    Submitted 20 June, 2018; v1 submitted 12 February, 2018; originally announced February 2018.

    Comments: Published in ICML 2018. 14 pages, including appendix

    MSC Class: 68T50 ACM Class: I.2.6; I.2.6

  15. arXiv:1602.02068  [pdf, other]

    cs.CL cs.LG stat.ML

    From Softmax to Sparsemax: A Sparse Model of Attention and Multi-Label Classification

    Authors: André F. T. Martins, Ramón Fernandez Astudillo

    Abstract: We propose sparsemax, a new activation function similar to the traditional softmax, but able to output sparse probabilities. After deriving its properties, we show how its Jacobian can be efficiently computed, enabling its use in a network trained with backpropagation. Then, we propose a new smooth and convex loss function which is the sparsemax analogue of the logistic loss. We reveal an unexpect…

    Submitted 8 February, 2016; v1 submitted 5 February, 2016; originally announced February 2016.

    Comments: Minor corrections

  16. arXiv:1010.2770  [pdf, other]

    stat.ML

    Online Multiple Kernel Learning for Structured Prediction

    Authors: Andre F. T. Martins, Mario A. T. Figueiredo, Pedro M. Q. Aguiar, Noah A. Smith, Eric P. Xing

    Abstract: Despite the recent progress towards efficient multiple kernel learning (MKL), the structured output case remains an open research front. Current approaches involve repeatedly solving a batch learning problem, which makes them inadequate for large scale scenarios. We propose a new family of online proximal algorithms for MKL (as well as for group-lasso and variants thereof), which overcomes that dr…

    Submitted 13 October, 2010; originally announced October 2010.