
Showing 1–27 of 27 results for author: Borde, H S d O

  1. arXiv:2507.05300  [pdf, ps, other]

    cs.CV cs.AI cs.CL cs.LG

    Structured Captions Improve Prompt Adherence in Text-to-Image Models (Re-LAION-Caption 19M)

    Authors: Nicholas Merchant, Haitz Sáez de Ocáriz Borde, Andrei Cristian Popescu, Carlos Garcia Jurado Suarez

    Abstract: We argue that generative text-to-image models often struggle with prompt adherence due to the noisy and unstructured nature of large-scale datasets like LAION-5B. This forces users to rely heavily on prompt engineering to elicit desirable outputs. In this work, we propose that enforcing a consistent caption structure during training can significantly improve model controllability and alignment. We…

    Submitted 6 July, 2025; originally announced July 2025.

    Comments: 7-page main paper + appendix, 18 figures

  2. arXiv:2507.02944  [pdf, ps, other]

    cs.LG

    Beyond Parallelism: Synergistic Computational Graph Effects in Multi-Head Attention

    Authors: Haitz Sáez de Ocáriz Borde

    Abstract: Multi-head attention powers Transformer networks, the primary deep learning architecture behind the success of large language models (LLMs). Yet, the theoretical advantages of multi-head versus single-head attention, beyond mere parallel processing, remain underexplored. In this paper, we reframe multi-head attention as a system of potentially synergistic computational graphs, where each head func…

    Submitted 28 June, 2025; originally announced July 2025.

  3. arXiv:2507.01806  [pdf, ps, other]

    cs.LG cs.AI cs.CL stat.ML

    LoRA Fine-Tuning Without GPUs: A CPU-Efficient Meta-Generation Framework for LLMs

    Authors: Reza Arabpour, Haitz Sáez de Ocáriz Borde, Anastasis Kratsios

    Abstract: Low-Rank Adapters (LoRAs) have transformed the fine-tuning of Large Language Models (LLMs) by enabling parameter-efficient updates. However, their widespread adoption remains limited by the reliance on GPU-based training. In this work, we propose a theoretically grounded approach to LoRA fine-tuning designed specifically for users with limited computational resources, particularly those restricted…

    Submitted 2 July, 2025; originally announced July 2025.

    Comments: 5-page main paper (excluding references) + 11-page appendix, 3 tables, 1 figure. Accepted to ICML 2025 Workshop on Efficient Systems for Foundation Models

  4. arXiv:2506.14530  [pdf, ps, other]

    stat.ML cs.AI cs.LG cs.NE math.ST

    Sharp Generalization Bounds for Foundation Models with Asymmetric Randomized Low-Rank Adapters

    Authors: Anastasis Kratsios, Tin Sum Cheng, Aurelien Lucchi, Haitz Sáez de Ocáriz Borde

    Abstract: Low-Rank Adaptation (LoRA) has emerged as a widely adopted parameter-efficient fine-tuning (PEFT) technique for foundation models. Recent work has highlighted an inherent asymmetry in the initialization of LoRA's low-rank factors, which has been present since its inception and was presumably derived experimentally. This paper focuses on providing a comprehensive theoretical characterization of asy…

    Submitted 17 June, 2025; originally announced June 2025.

  5. arXiv:2505.23331  [pdf, ps, other]

    cs.CV cs.AI cs.LG

    Fine-Tuning Next-Scale Visual Autoregressive Models with Group Relative Policy Optimization

    Authors: Matteo Gallici, Haitz Sáez de Ocáriz Borde

    Abstract: Fine-tuning pre-trained generative models with Reinforcement Learning (RL) has emerged as an effective approach for aligning outputs more closely with nuanced human preferences. In this paper, we investigate the application of Group Relative Policy Optimization (GRPO) to fine-tune next-scale visual autoregressive (VAR) models. Our empirical results demonstrate that this approach enables alignment…

    Submitted 28 June, 2025; v1 submitted 29 May, 2025; originally announced May 2025.

  6. arXiv:2503.09008  [pdf, other]

    cs.LG cs.AI

    Towards Quantifying Long-Range Interactions in Graph Machine Learning: a Large Graph Dataset and a Measurement

    Authors: Huidong Liang, Haitz Sáez de Ocáriz Borde, Baskaran Sripathmanathan, Michael Bronstein, Xiaowen Dong

    Abstract: Long-range dependencies are critical for effective graph representation learning, yet most existing datasets focus on small graphs tailored to inductive tasks, offering limited insight into long-range interactions. Current evaluations primarily compare models employing global attention (e.g., graph transformers) with those using local neighborhood aggregation (e.g., message-passing neural networks…

    Submitted 11 March, 2025; originally announced March 2025.

    Comments: work in progress

  7. arXiv:2502.04226  [pdf, other]

    cs.CV cs.LG cs.NE stat.CO stat.ML

    Keep It Light! Simplifying Image Clustering Via Text-Free Adapters

    Authors: Yicen Li, Haitz Sáez de Ocáriz Borde, Anastasis Kratsios, Paul D. McNicholas

    Abstract: Many competitive clustering pipelines have a multi-modal design, leveraging large language models (LLMs) or other text encoders, and text-image pairs, which are often unavailable in real-world downstream applications. Additionally, such frameworks are generally complicated to train and require substantial computational resources, making widespread adoption challenging. In this work, we show that i…

    Submitted 6 February, 2025; originally announced February 2025.

  8. arXiv:2411.00835  [pdf, other]

    cs.LG

    Scalable Message Passing Neural Networks: No Need for Attention in Large Graph Representation Learning

    Authors: Haitz Sáez de Ocáriz Borde, Artem Lukoianov, Anastasis Kratsios, Michael Bronstein, Xiaowen Dong

    Abstract: We propose Scalable Message Passing Neural Networks (SMPNNs) and demonstrate that, by integrating standard convolutional message passing into a Pre-Layer Normalization Transformer-style block instead of attention, we can produce high-performing deep message-passing-based Graph Neural Networks (GNNs). This modification yields results competitive with the state-of-the-art in large graph transductive…

    Submitted 29 October, 2024; originally announced November 2024.

  9. arXiv:2408.13885  [pdf, other]

    cs.LG cs.DM cs.NE math.MG stat.ML

    Neural Spacetimes for DAG Representation Learning

    Authors: Haitz Sáez de Ocáriz Borde, Anastasis Kratsios, Marc T. Law, Xiaowen Dong, Michael Bronstein

    Abstract: We propose a class of trainable deep learning-based geometries called Neural Spacetimes (NSTs), which can universally represent nodes in weighted directed acyclic graphs (DAGs) as events in a spacetime manifold. While most works in the literature focus on undirected graph representation learning or causality embedding separately, our differentiable geometry can encode both graph edge weights in it…

    Submitted 9 March, 2025; v1 submitted 25 August, 2024; originally announced August 2024.

    Comments: 12 pages: main body and 19 pages: appendix

  10. arXiv:2407.09926  [pdf, ps, other]

    cs.LG cs.AI

    Metric Learning for Clifford Group Equivariant Neural Networks

    Authors: Riccardo Ali, Paulina Kulytė, Haitz Sáez de Ocáriz Borde, Pietro Liò

    Abstract: Clifford Group Equivariant Neural Networks (CGENNs) leverage Clifford algebras and multivectors as an alternative approach to incorporating group equivariance to ensure symmetry constraints in neural representations. In principle, this formulation generalizes to orthogonal groups and preserves equivariance regardless of the metric signature. However, previous works have restricted internal network…

    Submitted 13 July, 2024; originally announced July 2024.

    Comments: Workshop on Geometry-grounded Representation Learning and Generative Modeling (GRaM) at the ICML 2024

  11. arXiv:2405.15891  [pdf, other]

    cs.CV cs.GR cs.LG

    Score Distillation via Reparametrized DDIM

    Authors: Artem Lukoianov, Haitz Sáez de Ocáriz Borde, Kristjan Greenewald, Vitor Campagnolo Guizilini, Timur Bagautdinov, Vincent Sitzmann, Justin Solomon

    Abstract: While 2D diffusion models generate realistic, high-detail images, 3D shape generation methods like Score Distillation Sampling (SDS) built on these 2D diffusion models produce cartoon-like, over-smoothed shapes. To help explain this discrepancy, we show that the image guidance used in Score Distillation can be understood as the velocity field of a 2D denoising generative process, up to the choice…

    Submitted 10 October, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

    Comments: NeurIPS 2024. 28 pages, 30 figures. Revision: additional comparisons and ablations studies

  12. arXiv:2402.16842  [pdf, other]

    cs.LG

    Asymmetry in Low-Rank Adapters of Foundation Models

    Authors: Jiacheng Zhu, Kristjan Greenewald, Kimia Nadjahi, Haitz Sáez de Ocáriz Borde, Rickard Brüel Gabrielsson, Leshem Choshen, Marzyeh Ghassemi, Mikhail Yurochkin, Justin Solomon

    Abstract: Parameter-efficient fine-tuning optimizes large, pre-trained foundation models by updating a subset of parameters; in this class, Low-Rank Adaptation (LoRA) is particularly effective. Inspired by an effort to investigate the different roles of LoRA matrices during fine-tuning, this paper characterizes and leverages unexpected asymmetry in the importance of low-rank adapter matrices. Specifically,…

    Submitted 27 February, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

    Comments: 17 pages, 2 figures, 9 tables

  13. arXiv:2402.16308  [pdf, other]

    cs.RO

    DreamUp3D: Object-Centric Generative Models for Single-View 3D Scene Understanding and Real-to-Sim Transfer

    Authors: Yizhe Wu, Haitz Sáez de Ocáriz Borde, Jack Collins, Oiwi Parker Jones, Ingmar Posner

    Abstract: 3D scene understanding for robotic applications exhibits a unique set of requirements including real-time inference, object-centric latent representation learning, accurate 6D pose estimation and 3D reconstruction of objects. Current methods for scene understanding typically rely on a combination of trained models paired with either an explicit or learnt volumetric representation, all of which hav…

    Submitted 26 February, 2024; originally announced February 2024.

  14. arXiv:2402.03460  [pdf, other]

    stat.ML cs.LG cs.NE math.CO math.NA

    Approximation Rates and VC-Dimension Bounds for (P)ReLU MLP Mixture of Experts

    Authors: Anastasis Kratsios, Haitz Sáez de Ocáriz Borde, Takashi Furuya, Marc T. Law

    Abstract: Mixture-of-Experts (MoEs) can scale up beyond traditional deep learning models by employing a routing strategy in which each input is processed by a single "expert" deep learning model. This strategy allows us to scale up the number of parameters defining the MoE while maintaining sparse activation, i.e., MoEs only load a small number of their total parameters into GPU VRAM for the forward pass de…

    Submitted 25 May, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

  15. arXiv:2311.11891  [pdf, other]

    cs.LG cs.SI stat.ML

    AMES: A Differentiable Embedding Space Selection Framework for Latent Graph Inference

    Authors: Yuan Lu, Haitz Sáez de Ocáriz Borde, Pietro Liò

    Abstract: In real-world scenarios, although data entities may possess inherent relationships, the specific graph illustrating their connections might not be directly accessible. Latent graph inference addresses this issue by enabling Graph Neural Networks (GNNs) to operate on point cloud data, dynamically learning the necessary graph structure. These graphs are often derived from a latent embedding space, w…

    Submitted 20 November, 2023; originally announced November 2023.

  16. arXiv:2310.15003  [pdf, other]

    cs.LG cs.DM cs.NE math.MG

    Neural Snowflakes: Universal Latent Graph Inference via Trainable Latent Geometries

    Authors: Haitz Sáez de Ocáriz Borde, Anastasis Kratsios

    Abstract: The inductive bias of a graph neural network (GNN) is largely encoded in its specified graph. Latent graph inference relies on latent geometric representations to dynamically rewire or infer a GNN's graph to maximize the GNN's predictive downstream performance, but it lacks solid theoretical foundations in terms of embedding-based representation guarantees. This paper addresses this issue by intro…

    Submitted 9 March, 2025; v1 submitted 23 October, 2023; originally announced October 2023.

    Comments: 9 Pages + Appendix, 2 Figures, 9 Tables

  17. arXiv:2310.12395  [pdf, other]

    cs.LG stat.ML

    Closed-Form Diffusion Models

    Authors: Christopher Scarvelis, Haitz Sáez de Ocáriz Borde, Justin Solomon

    Abstract: Score-based generative models (SGMs) sample from a target distribution by iteratively transforming noise using the score function of the perturbed target. For any finite training set, this score function can be evaluated in closed form, but the resulting SGM memorizes its training data and does not generate novel samples. In practice, one approximates the score by training a neural network via sco…

    Submitted 5 May, 2025; v1 submitted 18 October, 2023; originally announced October 2023.

    Comments: Published in TMLR, May 2025

  18. arXiv:2309.05678  [pdf, other]

    cs.LG

    Gromov-Hausdorff Distances for Comparing Product Manifolds of Model Spaces

    Authors: Haitz Saez de Ocariz Borde, Alvaro Arroyo, Ismael Morales, Ingmar Posner, Xiaowen Dong

    Abstract: Recent studies propose enhancing machine learning models by aligning the geometric characteristics of the latent space with the underlying data structure. Instead of relying solely on Euclidean space, researchers have suggested using hyperbolic and spherical spaces with constant curvature, or their combinations (known as product manifolds), to improve model performance. However, there exists no pr…

    Submitted 9 September, 2023; originally announced September 2023.

    Comments: arXiv admin note: substantial text overlap with arXiv:2309.04810

  19. arXiv:2309.04810  [pdf, other]

    cs.LG stat.ML

    Neural Latent Geometry Search: Product Manifold Inference via Gromov-Hausdorff-Informed Bayesian Optimization

    Authors: Haitz Saez de Ocariz Borde, Alvaro Arroyo, Ismael Morales, Ingmar Posner, Xiaowen Dong

    Abstract: Recent research indicates that the performance of machine learning models can be improved by aligning the geometry of the latent space with the underlying data structure. Rather than relying solely on Euclidean space, researchers have proposed using hyperbolic and spherical spaces with constant curvature, or combinations thereof, to better model the latent space and enhance model performance. Howe…

    Submitted 27 October, 2023; v1 submitted 9 September, 2023; originally announced September 2023.

  20. arXiv:2308.09250  [pdf, other]

    cs.LG cs.DM cs.NE math.MG math.NA

    Capacity Bounds for Hyperbolic Neural Network Representations of Latent Tree Structures

    Authors: Anastasis Kratsios, Ruiyang Hong, Haitz Sáez de Ocáriz Borde

    Abstract: We study the representation capacity of deep hyperbolic neural networks (HNNs) with a ReLU activation function. We establish the first proof that HNNs can $\varepsilon$-isometrically embed any finite weighted tree into a hyperbolic space of dimension $d$ at least equal to $2$ with prescribed sectional curvature $κ<0$, for any $\varepsilon> 1$ (where $\varepsilon=1$ being optimal). We establish rig…

    Submitted 17 August, 2023; originally announced August 2023.

    Comments: 22 Pages + References, 1 Table, 4 Figures

    MSC Class: 68T07; 30L05; 68R12; 05C05

  21. arXiv:2303.11754  [pdf, ps, other]

    cs.LG

    Projections of Model Spaces for Latent Graph Inference

    Authors: Haitz Sáez de Ocáriz Borde, Álvaro Arroyo, Ingmar Posner

    Abstract: Graph Neural Networks leverage the connectivity structure of graphs as an inductive bias. Latent graph inference focuses on learning an adequate graph structure to diffuse information on and improve the downstream performance of the model. In this work we employ stereographic projections of the hyperbolic and spherical model spaces, as well as products of Riemannian manifolds, for the purpose of l…

    Submitted 12 April, 2023; v1 submitted 21 March, 2023; originally announced March 2023.

    Comments: Accepted at the ICLR 2023 Workshop on Physics for Machine Learning

  22. arXiv:2211.16199  [pdf, other]

    cs.LG

    Latent Graph Inference using Product Manifolds

    Authors: Haitz Sáez de Ocáriz Borde, Anees Kazi, Federico Barbero, Pietro Liò

    Abstract: Graph Neural Networks usually rely on the assumption that the graph topology is available to the network as well as optimal for the downstream task. Latent graph inference allows models to dynamically learn the intrinsic graph structure of problems where the connectivity patterns of data may not be directly accessible. In this work, we generalize the discrete Differentiable Graph Module (dDGM) for…

    Submitted 27 June, 2023; v1 submitted 26 November, 2022; originally announced November 2022.

  23. arXiv:2209.13410  [pdf, other]

    cs.LG q-bio.BM q-bio.QM

    Graph Neural Network Expressivity and Meta-Learning for Molecular Property Regression

    Authors: Haitz Sáez de Ocáriz Borde, Federico Barbero

    Abstract: We demonstrate the applicability of model-agnostic algorithms for meta-learning, specifically Reptile, to GNN models in molecular regression tasks. Using meta-learning we are able to learn new chemical prediction tasks with only a few model updates, as compared to using randomly initialized GNNs which require learning each regression task from scratch. We experimentally show that GNN layer express…

    Submitted 24 November, 2022; v1 submitted 24 September, 2022; originally announced September 2022.

  24. arXiv:2206.08702  [pdf, other]

    cs.LG math.AT math.DG

    Sheaf Neural Networks with Connection Laplacians

    Authors: Federico Barbero, Cristian Bodnar, Haitz Sáez de Ocáriz Borde, Michael Bronstein, Petar Veličković, Pietro Liò

    Abstract: A Sheaf Neural Network (SNN) is a type of Graph Neural Network (GNN) that operates on a sheaf, an object that equips a graph with vector spaces over its nodes and edges and linear maps between these spaces. SNNs have been shown to have useful theoretical properties that help tackle issues arising from heterophily and over-smoothing. One complication intrinsic to these models is finding a good shea…

    Submitted 17 June, 2022; originally announced June 2022.

    Comments: Presented at the ICML 2022 Workshop on Topology, Algebra, and Geometry in Machine Learning

  25. arXiv:2111.13297  [pdf, other]

    cs.LG

    Latent Space based Memory Replay for Continual Learning in Artificial Neural Networks

    Authors: Haitz Sáez de Ocáriz Borde

    Abstract: Memory replay may be key to learning in biological brains, which manage to learn new tasks continually without catastrophically interfering with previous knowledge. On the other hand, artificial neural networks suffer from catastrophic forgetting and tend to only perform well on tasks that they were recently trained on. In this work we explore the application of latent space based memory replay fo…

    Submitted 5 January, 2022; v1 submitted 25 November, 2021; originally announced November 2021.

  26. arXiv:2111.00328  [pdf, other]

    physics.flu-dyn cs.LG

    Multi-Task Learning based Convolutional Models with Curriculum Learning for the Anisotropic Reynolds Stress Tensor in Turbulent Duct Flow

    Authors: Haitz Sáez de Ocáriz Borde, David Sondak, Pavlos Protopapas

    Abstract: The Reynolds-averaged Navier-Stokes (RANS) equations require accurate modeling of the anisotropic Reynolds stress tensor. Traditional closure models, while sophisticated, often only apply to restricted flow configurations. Researchers have started using machine learning approaches to tackle this problem by developing more general closure models informed by data. In this work we build upon recent c…

    Submitted 31 January, 2022; v1 submitted 30 October, 2021; originally announced November 2021.

  27. arXiv:2104.09476  [pdf, other]

    q-fin.PR cs.LG

    Interpretability in deep learning for finance: a case study for the Heston model

    Authors: Damiano Brigo, Xiaoshan Huang, Andrea Pallavicini, Haitz Saez de Ocariz Borde

    Abstract: Deep learning is a powerful tool whose applications in quantitative finance are growing every day. Yet, artificial neural networks behave as black boxes and this hinders validation and accountability processes. Being able to interpret the inner functioning and the input-output relationship of these networks has become key for the acceptance of such tools. In this paper we focus on the calibration…

    Submitted 19 April, 2021; originally announced April 2021.

    MSC Class: 68T07; 91G20; 91G60