Showing 1–22 of 22 results for author: Gallego, V

  1. arXiv:2506.11702  [pdf, ps, other]

    cs.CL cs.AI

    Configurable Preference Tuning with Rubric-Guided Synthetic Data

    Authors: Víctor Gallego

    Abstract: Models of human feedback for AI alignment, such as those underpinning Direct Preference Optimization (DPO), often bake in a singular, static set of preferences, limiting adaptability. This paper challenges the assumption of monolithic preferences by introducing Configurable Preference Tuning (CPT), a novel framework for endowing language models with the ability to dynamically adjust their behavior…

    Submitted 13 June, 2025; originally announced June 2025.

    Comments: Accepted to ICML 2025 Workshop on Models of Human Feedback for AI Alignment

  2. arXiv:2502.07985  [pdf, other]

    cs.CL cs.AI

    MetaSC: Test-Time Safety Specification Optimization for Language Models

    Authors: Víctor Gallego

    Abstract: We propose a novel dynamic safety framework that optimizes language model (LM) safety reasoning at inference time without modifying model weights. Building on recent advances in self-critique methods, our approach leverages a meta-critique mechanism that iteratively updates safety prompts (termed specifications) to drive the critique and revision process adaptively. This test-time optimization not o…

    Submitted 7 April, 2025; v1 submitted 11 February, 2025; originally announced February 2025.

    Comments: Published at ICLR 2025 Workshop on Foundation Models in the Wild

    Journal ref: ICLR 2025 Workshop on Foundation Models in the Wild

  3. arXiv:2406.07188  [pdf, other]

    cs.CL cs.AI

    Merging Improves Self-Critique Against Jailbreak Attacks

    Authors: Victor Gallego

    Abstract: The robustness of large language models (LLMs) against adversarial manipulations, such as jailbreak attacks, remains a significant challenge. In this work, we propose an approach that enhances the self-critique capability of the LLM and further fine-tunes it over sanitized synthetic data. This is done with the addition of an external critic model that can be merged with the original, thus bolsteri…

    Submitted 14 July, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

    Comments: Published at ICML 2024 Workshop on Foundation Models in the Wild

  4. arXiv:2404.00495  [pdf, other]

    cs.CL cs.AI

    Configurable Safety Tuning of Language Models with Synthetic Preference Data

    Authors: Victor Gallego

    Abstract: State-of-the-art language model fine-tuning techniques, such as Direct Preference Optimization (DPO), restrict user control by hard-coding predefined behaviors into the model. To address this, we propose a novel method, Configurable Safety Tuning (CST), that augments DPO using synthetic preference data to facilitate flexible safety configuration of LLMs at inference time. CST overcomes the constra…

    Submitted 30 March, 2024; originally announced April 2024.

  5. arXiv:2402.08005  [pdf, other]

    cs.CL cs.LG

    Refined Direct Preference Optimization with Synthetic Data for Behavioral Alignment of LLMs

    Authors: Víctor Gallego

    Abstract: In this paper, we introduce refined Direct Preference Optimization (rDPO), a method for improving the behavioral alignment of Large Language Models (LLMs) without the need for human-annotated data. The method involves creating synthetic data using self-critique prompting by a teacher LLM and then utilising a generalized DPO loss function to distil to a student LLM. The loss function incorpo…

    Submitted 12 February, 2024; originally announced February 2024.

    Comments: Pre-print. Submitted to the ICLR 2024 Workshop on Representational Alignment (Re-Align)

  6. arXiv:2312.01957  [pdf, other]

    cs.CL cs.LG

    Distilled Self-Critique of LLMs with Synthetic Data: a Bayesian Perspective

    Authors: Victor Gallego

    Abstract: This paper proposes an interpretation of RLAIF as Bayesian inference by introducing distilled Self-Critique (dSC), which refines the outputs of an LLM through a Gibbs sampler that is later distilled into a fine-tuned model. Only requiring synthetic data, dSC is exercised in experiments regarding safety, sentiment, and privacy control, showing it can be a viable and cheap alternative to align LLMs.…

    Submitted 11 April, 2024; v1 submitted 4 December, 2023; originally announced December 2023.

    Comments: Accepted to ICLR 2024 (TinyPapers track)

    Journal ref: The Second Tiny Papers Track at ICLR 2024

  7. arXiv:2308.07929  [pdf, other]

    cs.CV cs.LG

    Fast Adaptation with Bradley-Terry Preference Models in Text-To-Image Classification and Generation

    Authors: Victor Gallego

    Abstract: Recently, large multimodal models, such as CLIP and Stable Diffusion, have experienced tremendous success in both foundations and applications. However, as these models increase in parameter size and computational requirements, it becomes more challenging for users to personalize them for specific tasks or preferences. In this work, we address the problem of adapting the previous models towards…

    Submitted 21 September, 2023; v1 submitted 15 July, 2023; originally announced August 2023.

    Comments: Accepted to Proceedings of the 23rd European Young Statisticians Meeting (EYSM)

  8. arXiv:2308.06385  [pdf, other]

    cs.CL cs.AI

    ZYN: Zero-Shot Reward Models with Yes-No Questions for RLAIF

    Authors: Victor Gallego

    Abstract: In this work, we address the problem of directing the text generation of a language model (LM) towards a desired behavior, aligning the generated text with the preferences of the human operator. We propose using another, instruction-tuned language model as a critic reward model in a zero-shot way thanks to the prompt of a Yes-No question that represents the user preferences, without requiring furt…

    Submitted 14 December, 2023; v1 submitted 11 August, 2023; originally announced August 2023.

    Comments: pre-print, work in progress

  9. arXiv:2302.06427  [pdf]

    cs.OH cs.AR cs.SE

    HERMES: qualification of High pErformance pRogrammable Microprocessor and dEvelopment of Software ecosystem

    Authors: Nadia Ibellaatti, Edouard Lepape, Alp Kilic, Kaya Akyel, Kassem Chouayakh, Fabrizio Ferrandi, Claudio Barone, Serena Curzel, Michele Fiorito, Giovanni Gozzi, Miguel Masmano, Ana Risquez Navarro, Manuel Muñoz, Vicente Nicolau Gallego, Patricia Lopez Cueva, Jean-noel Letrillard, Franck Wartel

    Abstract: European efforts to boost competitiveness in the sector of space services promote the research and development of advanced software and hardware solutions. The EU-funded HERMES project contributes to the effort by qualifying radiation-hardened, high-performance programmable microprocessors, and by developing a software ecosystem that facilitates the deployment of complex applications on such platf…

    Submitted 9 February, 2023; originally announced February 2023.

    Comments: Accepted for publication at DATE 2023

  10. arXiv:2209.12330  [pdf, other]

    cs.CV cs.LG

    Personalizing Text-to-Image Generation via Aesthetic Gradients

    Authors: Victor Gallego

    Abstract: This work proposes aesthetic gradients, a method to personalize a CLIP-conditioned diffusion model by guiding the generative process towards custom aesthetics defined by the user from a set of images. The approach is validated with qualitative and quantitative experiments, using the recent stable diffusion model and several aesthetically-filtered datasets. Code is released at https://github.com/vi…

    Submitted 25 September, 2022; originally announced September 2022.

    Comments: Submitted to NeurIPS 2022 Machine Learning for Creativity and Design Workshop

  11. arXiv:2208.01740  [pdf, other]

    cs.AI eess.SY

    From Single Aircraft to Communities: A Neutral Interpretation of Air Traffic Complexity Dynamics

    Authors: Ralvi Isufaj, Marsel Omeri, Miquel Angel Piera, Jaume Saez Valls, Christian Eduardo Verdonk Gallego

    Abstract: Present air traffic complexity metrics are defined considering the interests of different management layers of ATM. These layers have different objectives which in practice compete to maximize their own goals, which leads to fragmented decision making. This fragmentation together with competing KPAs requires transparent and neutral air traffic information to pave the way for an explainable set of…

    Submitted 15 July, 2022; originally announced August 2022.

    Comments: 21 pages, 30 figures, 2 tables, submitted to Transportation Research Part C

  12. arXiv:2207.07049  [pdf, other]

    stat.ML cs.LG

    How do tuna schools associate to dFADs? A study using echo-sounder buoys to identify global patterns

    Authors: Manuel Navarro-García, Daniel Precioso, Kathryn Gavira-O'Neill, Alberto Torres-Barrán, David Gordo, Víctor Gallego, David Gómez-Ullate

    Abstract: Based on the data gathered by echo-sounder buoys attached to drifting Fish Aggregating Devices (dFADs) across tropical oceans, the current study applies a Machine Learning protocol to examine the temporal trends of tuna schools' association to drifting objects. Using a binary output, metrics typically used in the literature were adapted to account for the fact that the entire tuna aggregation unde…

    Submitted 14 July, 2022; originally announced July 2022.

  13. arXiv:2109.13232  [pdf, other]

    stat.ML cs.LG

    Contributions to Large Scale Bayesian Inference and Adversarial Machine Learning

    Authors: Víctor Gallego

    Abstract: The rampant adoption of ML methodologies has revealed that models are usually adopted to make decisions without taking into account the uncertainties in their predictions. More critically, they can be vulnerable to adversarial examples. Thus, we believe that developing ML systems that take into account predictive uncertainties and are robust against adversarial examples is a must for critical, rea…

    Submitted 25 September, 2021; originally announced September 2021.

    Comments: PhD thesis

  14. arXiv:2101.10721  [pdf, other]

    cs.GT cs.LG econ.TH

    Data sharing games

    Authors: Víctor Gallego, Roi Naveiro, David Ríos Insua, Wolfram Rozas

    Abstract: Data sharing issues pervade online social and economic environments. To foster social progress, it is important to develop models of the interaction between data producers and consumers that can promote the rise of cooperation between the involved parties. We formalize this interaction as a game, the data sharing game, based on the Iterated Prisoner's Dilemma and deal with it through multi-agent r…

    Submitted 26 January, 2021; originally announced January 2021.

  15. arXiv:2007.02613  [pdf, ps, other]

    cs.GT stat.AP

    Adversarial Risk Analysis (Overview)

    Authors: David Banks, Víctor Gallego, Roi Naveiro, David Ríos Insua

    Abstract: Adversarial risk analysis (ARA) is a relatively new area of research that informs decision-making when facing intelligent opponents and uncertain outcomes. It enables an analyst to express her Bayesian beliefs about an opponent's utilities, capabilities, probabilities and the type of strategic calculation that the opponent is using. Within that framework, the analyst then solves the problem from t…

    Submitted 6 July, 2020; originally announced July 2020.

  16. arXiv:2004.08705  [pdf, other]

    stat.ML cs.CR cs.LG stat.CO

    Protecting Classifiers From Attacks. A Bayesian Approach

    Authors: Victor Gallego, Roi Naveiro, Alberto Redondo, David Rios Insua, Fabrizio Ruggeri

    Abstract: Classification problems in security settings are usually modeled as confrontations in which an adversary tries to fool a classifier by manipulating the covariates of instances to obtain a benefit. Most approaches to such problems have focused on game-theoretic ideas with strong underlying common knowledge assumptions, which are not realistic in the security realm. We provide an alternative Bayesian f…

    Submitted 18 April, 2020; originally announced April 2020.

  17. arXiv:2003.03546  [pdf, other]

    cs.AI cs.LG stat.CO stat.ML

    Adversarial Machine Learning: Bayesian Perspectives

    Authors: David Rios Insua, Roi Naveiro, Victor Gallego, Jason Poulos

    Abstract: Adversarial Machine Learning (AML) is emerging as a major field aimed at protecting machine learning (ML) systems against security threats: in certain scenarios there may be adversaries that actively manipulate input data to fool learning systems. This creates a new class of security vulnerabilities that ML systems may face, and a new desirable property called adversarial robustness essential to t…

    Submitted 22 February, 2024; v1 submitted 7 March, 2020; originally announced March 2020.

    Journal ref: Journal of the American Statistical Association. Volume 118, 2023 - Issue 543

  18. arXiv:1908.09744  [pdf, other]

    cs.LG stat.ML

    Variationally Inferred Sampling Through a Refined Bound for Probabilistic Programs

    Authors: Victor Gallego, David Rios Insua

    Abstract: A framework to boost the efficiency of Bayesian inference in probabilistic programs is introduced by embedding a sampler inside a variational posterior approximation. We call it the refined variational approximation. Its strength lies both in ease of implementation and automatic tuning of the sampler parameters to speed up mixing time using automatic differentiation. Several strategies to appr…

    Submitted 22 February, 2020; v1 submitted 26 August, 2019; originally announced August 2019.

  19. arXiv:1908.08773  [pdf, other]

    cs.LG stat.ML

    Opponent Aware Reinforcement Learning

    Authors: Victor Gallego, Roi Naveiro, David Rios Insua, David Gomez-Ullate Oteiza

    Abstract: We introduce Threatened Markov Decision Processes (TMDPs) as an extension of the classical Markov Decision Process framework for Reinforcement Learning (RL). TMDPs allow supporting a decision maker against potential opponents in an RL context. We also propose a level-k thinking scheme resulting in a novel learning approach to deal with TMDPs. After introducing our framework and deriving theoretical…

    Submitted 26 August, 2019; v1 submitted 22 August, 2019; originally announced August 2019.

    Comments: Substantially extends the previous work: https://www.aaai.org/ojs/index.php/AAAI/article/view/5106. This article draws heavily from arXiv:1809.01560

  20. arXiv:1812.00071  [pdf, other]

    stat.ML cs.LG

    Stochastic Gradient MCMC with Repulsive Forces

    Authors: Victor Gallego, David Rios Insua

    Abstract: We propose a unifying view of two different Bayesian inference algorithms, Stochastic Gradient Markov Chain Monte Carlo (SG-MCMC) and Stein Variational Gradient Descent (SVGD), leading to improved and efficient novel sampling schemes. We show that SVGD combined with a noise term can be framed as a multiple chain SG-MCMC method. Instead of treating each parallel chain independently from others, our…

    Submitted 22 February, 2020; v1 submitted 30 November, 2018; originally announced December 2018.

    Comments: Extends the workshop version

  21. arXiv:1809.01560  [pdf, other]

    cs.LG cs.AI cs.CR stat.ML

    Reinforcement Learning under Threats

    Authors: Victor Gallego, Roi Naveiro, David Rios Insua

    Abstract: In several reinforcement learning (RL) scenarios, mainly in security settings, there may be adversaries trying to interfere with the reward generating process. In this paper, we introduce Threatened Markov Decision Processes (TMDPs), which provide a framework to support a decision maker against a potential adversary in RL. Furthermore, we propose a level-$k$ thinking scheme resulting in a new lear…

    Submitted 30 July, 2019; v1 submitted 5 September, 2018; originally announced September 2018.

    Comments: Extends the version published in the Proceedings of the AAAI Conference on Artificial Intelligence 33, https://www.aaai.org/ojs/index.php/AAAI/article/view/5106

  22. arXiv:1801.03050  [pdf, other]

    stat.ML econ.EM q-fin.RM stat.AP

    Assessing the effect of advertising expenditures upon sales: a Bayesian structural time series model

    Authors: Víctor Gallego, Pablo Suárez-García, Pablo Angulo, David Gómez-Ullate

    Abstract: We propose a robust implementation of the Nerlove–Arrow model using a Bayesian structural time series model to explain the relationship between the advertising expenditures of a country-wide fast-food franchise network and its weekly sales. Thanks to the flexibility and modularity of the model, it is well suited to generalization to other markets or situations. Its Bayesian nature facilitates incorp…

    Submitted 29 May, 2019; v1 submitted 9 January, 2018; originally announced January 2018.

    Comments: Published at Applied Stochastic Models in Business and Industry, https://onlinelibrary.wiley.com/doi/full/10.1002/asmb.2460

    Journal ref: Appl Stochastic Models Bus Ind. 2019; 1-13