Search | arXiv e-print repository

doi 10.1109/MCI.2025.3563859

Don't Forget your Inverse DDIM for Image Editing

Authors: Guillermo Gomez-Trenado, Pablo Mesejo, Oscar Cordón, Stéphane Lathuilière

Abstract: The field of text-to-image generation has undergone significant advancements with the introduction of diffusion models. Nevertheless, the challenge of editing real images persists, as most methods are either computationally intensive or produce poor reconstructions. This paper introduces SAGE (Self-Attention Guidance for image Editing), a novel technique leveraging pre-trained diffusion models for image editing. SAGE builds upon the DDIM algorithm and incorporates a novel guidance mechanism utilizing the self-attention layers of the diffusion U-Net. This mechanism computes a reconstruction objective based on attention maps generated during the inverse DDIM process, enabling efficient reconstruction of unedited regions without the need to precisely reconstruct the entire input image. Thus, SAGE directly addresses the key challenges in image editing. The superiority of SAGE over other methods is demonstrated through quantitative and qualitative evaluations and confirmed by a statistically validated comprehensive user study, in which all 47 surveyed users preferred SAGE over competing methods. Additionally, SAGE ranks as the top-performing method in seven out of 10 quantitative analyses and secures second and third places in the remaining three.

Submitted 14 May, 2025; originally announced May 2025.

Comments: 12 pages, 12 figures, code available at https://guillermogotre.github.io/sage/

ACM Class: I.2.10; I.5.0

arXiv:2501.08068 [pdf, ps, other]

A Roadmap to Guide the Integration of LLMs in Hierarchical Planning

Authors: Israel Puerta-Merino, Carlos Núñez-Molina, Pablo Mesejo, Juan Fernández-Olivares

Abstract: Recent advances in Large Language Models (LLMs) are fostering their integration into several reasoning-related fields, including Automated Planning (AP). However, their integration into Hierarchical Planning (HP), a subfield of AP that leverages hierarchical knowledge to enhance planning performance, remains largely unexplored. In this preliminary work, we propose a roadmap to address this gap and harness the potential of LLMs for HP. To this end, we present a taxonomy of integration methods, exploring how LLMs can be utilized within the HP life cycle. Additionally, we provide a benchmark with a standardized dataset for evaluating the performance of future LLM-based HP approaches, and present initial results for a state-of-the-art HP planner and LLM planner. As expected, the latter exhibits limited performance (3% correct plans, and none with a correct hierarchical decomposition) but serves as a valuable baseline for future approaches.

Submitted 14 January, 2025; originally announced January 2025.

Comments: 5 pages, 0 figures, to be published in the AAAI Workshop on Planning in the Era of LLMs ( https://llmforplanning.github.io )

arXiv:2310.02167 [pdf, ps, other]

Towards a Unified Framework for Sequential Decision Making

Authors: Carlos Núñez-Molina, Pablo Mesejo, Juan Fernández-Olivares

Abstract: In recent years, the integration of Automated Planning (AP) and Reinforcement Learning (RL) has seen a surge of interest. To perform this integration, a general framework for Sequential Decision Making (SDM) would prove immensely useful, as it would help us understand how AP and RL fit together. In this preliminary work, we attempt to provide such a framework, suitable for any method ranging from Classical Planning to Deep RL, by drawing on concepts from Probability Theory and Bayesian inference. We formulate an SDM task as a set of training and test Markov Decision Processes (MDPs), to account for generalization. We provide a general algorithm for SDM which we hypothesize every SDM method is based on. According to it, every SDM algorithm can be seen as a procedure that iteratively improves its solution estimate by leveraging the task knowledge available. Finally, we derive a set of formulas and algorithms for calculating interesting properties of SDM tasks and methods, which make possible their empirical evaluation and comparison.

Submitted 3 October, 2023; originally announced October 2023.

Comments: 10 pages, 0 figures

MSC Class: I.2.8

Journal ref: Carlos Núñez Molina, Pablo Mesejo, & Juan Fernández-Olivares. (2023). Towards a Unified Framework for Sequential Decision Making. In ICAPS PRL Workshop on Bridging the Gap Between AI Planning and Reinforcement Learning

arXiv:2308.11905 [pdf, other]

doi 10.24963/ijcai.2024/747

On Using Admissible Bounds for Learning Forward Search Heuristics

Authors: Carlos Núñez-Molina, Masataro Asai, Pablo Mesejo, Juan Fernández-Olivares

Abstract: In recent years, there has been growing interest in utilizing modern machine learning techniques to learn heuristic functions for forward search algorithms. Despite this, there has been little theoretical understanding of what they should learn, how to train them, and why we do so. This lack of understanding has resulted in the adoption of diverse training targets (suboptimal vs optimal costs vs admissible heuristics) and loss functions (e.g., square vs absolute errors) in the literature. In this work, we focus on how to effectively utilize the information provided by admissible heuristics in heuristic learning. We argue that learning from poly-time admissible heuristics by minimizing mean square errors (MSE) is not the correct approach, since its result is merely a noisy, inadmissible copy of an efficiently computable heuristic. Instead, we propose to model the learned heuristic as a truncated Gaussian, where admissible heuristics are used not as training targets but as lower bounds of this distribution. This results in a different loss function from the MSE commonly employed in the literature, which implicitly models the learned heuristic as a Gaussian distribution. We conduct experiments where both MSE and our novel loss function are applied to learning a heuristic from optimal plan costs. Results show that our proposed method converges faster during training and yields better heuristics.

Submitted 7 May, 2024; v1 submitted 23 August, 2023; originally announced August 2023.

Comments: 19 pages, 2 figures

MSC Class: I.2.8

Journal ref: Carlos Núñez Molina, Masataro Asai, Pablo Mesejo, & Juan Fernández-Olivares. (2024). On using admissible bounds for learning forward search heuristics. In IJCAI, pages 6761-6769

arXiv:2304.10590 [pdf, other]

doi 10.1145/3663366

A Review of Symbolic, Subsymbolic and Hybrid Methods for Sequential Decision Making

Authors: Carlos Núñez-Molina, Pablo Mesejo, Juan Fernández-Olivares

Abstract: In the field of Sequential Decision Making (SDM), two paradigms have historically vied for supremacy: Automated Planning (AP) and Reinforcement Learning (RL). In the spirit of reconciliation, this article reviews AP, RL and hybrid methods (e.g., novel learn to plan techniques) for solving Sequential Decision Processes (SDPs), focusing on their knowledge representation: symbolic, subsymbolic, or a combination. It also covers methods for learning the SDP structure. Finally, we compare the advantages and drawbacks of the existing methods and conclude that neurosymbolic AI poses a promising approach for SDM, since it combines AP and RL with a hybrid knowledge representation.

Submitted 5 July, 2024; v1 submitted 20 April, 2023; originally announced April 2023.

Comments: 35 pages, 16 figures

ACM Class: A.2; I.2.4; I.2.6; I.2.8

Journal ref: Carlos Núñez Molina, Pablo Mesejo, & Juan Fernández-Olivares. (2024). A review of symbolic, subsymbolic and hybrid methods for sequential decision making. ACM Computing Surveys, 56(11), Article 272, 1-36

arXiv:2302.09899 [pdf, other]

A Survey on Semi-Supervised Semantic Segmentation

Authors: Adrian Peláez-Vegas, Pablo Mesejo, Julián Luengo

Abstract: Semantic segmentation is one of the most challenging tasks in computer vision. However, in many applications, a frequent obstacle is the lack of labeled images, due to the high cost of pixel-level labeling. In this scenario, it makes sense to approach the problem from a semi-supervised point of view, where both labeled and unlabeled images are exploited. In recent years this line of research has gained much interest and many approaches have been published in this direction. Therefore, the main objective of this study is to provide an overview of the current state of the art in semi-supervised semantic segmentation, offering an updated taxonomy of all existing methods to date. This is complemented by experiments with a variety of models representing all the categories of the taxonomy on the most widely used benchmark datasets in the literature, and a final discussion on the results obtained, the challenges and the most promising lines of future research.

Submitted 20 February, 2023; originally announced February 2023.

arXiv:2301.10280 [pdf, other]

doi 10.3233/FAIA240978

NeSIG: A Neuro-Symbolic Method for Learning to Generate Planning Problems

Authors: Carlos Núñez-Molina, Pablo Mesejo, Juan Fernández-Olivares

Abstract: In the field of Automated Planning there is often the need for a set of planning problems from a particular domain, e.g., to be used as training data for Machine Learning or as benchmarks in planning competitions. In most cases, these problems are created either by hand or by a domain-specific generator, putting a burden on the human designers. In this paper we propose NeSIG, to the best of our knowledge the first domain-independent method for automatically generating planning problems that are valid, diverse and difficult to solve. We formulate problem generation as a Markov Decision Process and train two generative policies with Deep Reinforcement Learning to generate problems with the desired properties. We conduct experiments on three classical domains, comparing our approach against handcrafted, domain-specific instance generators and various ablations. Results show NeSIG is able to automatically generate valid and diverse problems of much greater difficulty (15.5 times more on geometric average) than domain-specific generators, while simultaneously reducing human effort when compared to them. Additionally, it can generalize to larger problems than those seen during training.

Submitted 16 July, 2024; v1 submitted 24 January, 2023; originally announced January 2023.

Comments: 15 pages, 9 figures

ACM Class: I.2.6; I.2.8

Journal ref: Carlos Núñez Molina, Pablo Mesejo, & Juan Fernández-Olivares. (2024). NeSIG: A neuro-symbolic method for learning to generate planning problems. In ECAI, volume 392, pages 4084-4091

arXiv:2209.06399 [pdf, other]

doi 10.1109/TEVC.2022.3220747

A Survey on Evolutionary Computation for Computer Vision and Image Analysis: Past, Present, and Future Trends

Authors: Ying Bi, Bing Xue, Pablo Mesejo, Stefano Cagnoni, Mengjie Zhang

Abstract: Computer vision (CV) is a big and important field in artificial intelligence covering a wide range of applications. Image analysis is a major task in CV aiming to extract, analyse and understand the visual content of images. However, image-related tasks are very challenging due to many factors, e.g., high variations across images, high dimensionality, domain expertise requirement, and image distortions. Evolutionary computation (EC) approaches have been widely used for image analysis with significant achievement. However, there is no comprehensive survey of existing EC approaches to image analysis. To fill this gap, this paper provides a comprehensive survey covering all essential EC approaches to important image analysis tasks including edge detection, image segmentation, image feature analysis, image classification, object detection, and others. This survey aims to provide a better understanding of evolutionary computer vision (ECV) by discussing the contributions of different approaches and exploring how and why EC is used for CV and image analysis. The applications, challenges, issues, and trends associated with this research field are also discussed and summarised to provide further guidelines and opportunities for future research.

Submitted 13 September, 2022; originally announced September 2022.

Comments: Conditionally accepted by IEEE Transactions on Evolutionary Computation

Journal ref: IEEE Transactions on Evolutionary Computation, 2022, https://ieeexplore.ieee.org/document/9943992/

arXiv:2207.11025 [pdf, other]

Custom Structure Preservation in Face Aging

Authors: Guillermo Gomez-Trenado, Stéphane Lathuilière, Pablo Mesejo, Óscar Cordón

Abstract: In this work, we propose a novel architecture for face age editing that can produce structural modifications while maintaining relevant details present in the original image. We disentangle the style and content of the input image and propose a new decoder network that adopts a style-based strategy to combine the style and content representations of the input image while conditioning the output on the target age. We go beyond existing aging methods, allowing users to adjust the degree of structure preservation in the input image during inference. To this purpose, we introduce a masking mechanism, the CUstom Structure Preservation module, that distinguishes relevant regions in the input image from those that should be discarded. CUSP requires no additional supervision. Finally, our quantitative and qualitative analyses, which include a user study, show that our method outperforms prior art and demonstrate the effectiveness of our strategy regarding image editing and adjustable structure preservation. Code and pretrained models are available at https://github.com/guillermogotre/CUSP.

Submitted 22 July, 2022; originally announced July 2022.

Comments: 36 pages, 21 figures

arXiv:2009.11204 [pdf, other]

Learning Visual Voice Activity Detection with an Automatically Annotated Dataset

Authors: Sylvain Guy, Stéphane Lathuilière, Pablo Mesejo, Radu Horaud

Abstract: Visual voice activity detection (V-VAD) uses visual features to predict whether a person is speaking or not. V-VAD is useful whenever audio VAD (A-VAD) is inefficient either because the acoustic signal is difficult to analyze or because it is simply missing. We propose two deep architectures for V-VAD, one based on facial landmarks and one based on optical flow. Moreover, available datasets, used for learning and for testing V-VAD, lack content variability. We introduce a novel methodology to automatically create and annotate very large datasets in-the-wild, WildVVAD, based on combining A-VAD with face detection and tracking. A thorough empirical evaluation shows the advantage of training the proposed deep V-VAD models with this dataset.

Submitted 16 October, 2020; v1 submitted 23 September, 2020; originally announced September 2020.

Comments: International Conference on Pattern Recognition, Milan, Italy, January 2021

arXiv:1902.10953 [pdf, other]

Extended Gaze Following: Detecting Objects in Videos Beyond the Camera Field of View

Authors: Benoit Massé, Stéphane Lathuilière, Pablo Mesejo, Radu Horaud

Abstract: In this paper we address the problems of detecting objects of interest in a video and of estimating their locations, solely from the gaze directions of people present in the video. Objects can be indistinctly located inside or outside the camera field of view. We refer to this problem as extended gaze following. The contributions of the paper are as follows. First, we propose a novel spatial representation of the gaze directions adopting a top-view perspective. Second, we develop several convolutional encoder/decoder networks to predict object locations and compare them with heuristics and with classical learning-based approaches. Third, in order to train the proposed models, we generate a very large number of synthetic scenarios employing a probabilistic formulation. Finally, our methodology is empirically validated using a publicly available dataset.

Submitted 28 February, 2019; originally announced February 2019.

Comments: FG 2019

arXiv:1810.05193 [pdf, other]

Understanding Priors in Bayesian Neural Networks at the Unit Level

Authors: Mariia Vladimirova, Jakob Verbeek, Pablo Mesejo, Julyan Arbel

Abstract: We investigate deep Bayesian neural networks with Gaussian weight priors and a class of ReLU-like nonlinearities. Bayesian neural networks with Gaussian priors are well known to induce an L2, "weight decay", regularization. Our results characterize a more intricate regularization effect at the level of the unit activations. Our main result establishes that the induced prior distribution on the units before and after activation becomes increasingly heavy-tailed with the depth of the layer. We show that first layer units are Gaussian, second layer units are sub-exponential, and units in deeper layers are characterized by sub-Weibull distributions. Our results provide new theoretical insight on deep Bayesian neural networks, which we corroborate with simulation experiments.

Submitted 10 May, 2019; v1 submitted 11 October, 2018; originally announced October 2018.

Comments: 10 pages, 5 figures, ICML'19 conference

arXiv:1808.09211 [pdf, other]

DeepGUM: Learning Deep Robust Regression with a Gaussian-Uniform Mixture Model

Authors: Stéphane Lathuilière, Pablo Mesejo, Xavier Alameda-Pineda, Radu Horaud

Abstract: In this paper, we address the problem of how to robustly train a ConvNet for regression, or deep robust regression. Traditionally, deep regression employs the L2 loss function, known to be sensitive to outliers, i.e. samples that either lie at an abnormal distance away from the majority of the training samples, or that correspond to wrongly annotated targets. This means that, during back-propagation, outliers may bias the training process due to the high magnitude of their gradient. In this paper, we propose DeepGUM: a deep regression model that is robust to outliers thanks to the use of a Gaussian-uniform mixture model. We derive an optimization algorithm that alternates between the unsupervised detection of outliers using expectation-maximization, and the supervised training with cleaned samples using stochastic gradient descent. DeepGUM is able to adapt to a continuously evolving outlier distribution, avoiding the need to manually impose any threshold on the proportion of outliers in the training set. Extensive experimental evaluations on four different tasks (facial and fashion landmark detection, age and head pose estimation) lead us to conclude that our novel robust technique provides reliability in the presence of various types of noise and protection against a high percentage of outliers.

Submitted 28 August, 2018; originally announced August 2018.

Comments: accepted at ECCV 2018

arXiv:1803.08450 [pdf, other]

doi 10.1109/TPAMI.2019.2910523

A Comprehensive Analysis of Deep Regression

Authors: Stéphane Lathuilière, Pablo Mesejo, Xavier Alameda-Pineda, Radu Horaud

Abstract: Deep learning revolutionized data science, and recently its popularity has grown exponentially, as did the number of papers employing deep networks. Vision tasks, such as human pose estimation, did not escape from this trend. There is a large number of deep models, where small changes in the network architecture, or in the data pre-processing, together with the stochastic nature of the optimization procedures, produce notably different results, making it extremely difficult to sift methods that significantly outperform others. This situation motivates the current study, in which we perform a systematic evaluation and statistical analysis of vanilla deep regression, i.e. convolutional neural networks with a linear regression top layer. This is the first comprehensive analysis of deep regression techniques. We perform experiments on four vision problems, and report confidence intervals for the median performance as well as the statistical significance of the results, if any. Surprisingly, the variability due to different data pre-processing procedures generally eclipses the variability due to modifications in the network architecture. Our results reinforce the hypothesis according to which, in general, a general-purpose network (e.g. VGG-16 or ResNet-50) adequately tuned can yield results close to the state-of-the-art without having to resort to more complex and ad-hoc regression models.

Submitted 24 September, 2020; v1 submitted 22 March, 2018; originally announced March 2018.

Comments: Published in IEEE TPAMI

Journal ref: IEEE TPAMI, Volume: 42, Issue: 9, Sept. 1 2020

arXiv:1711.06834 [pdf, other]

doi 10.1016/j.patrec.2018.05.023

Neural Network Based Reinforcement Learning for Audio-Visual Gaze Control in Human-Robot Interaction

Authors: Stéphane Lathuilière, Benoit Massé, Pablo Mesejo, Radu Horaud

Abstract: This paper introduces a novel neural network-based reinforcement learning approach for robot gaze control. Our approach enables a robot to learn and to adapt its gaze control strategy for human-robot interaction neither with the use of external sensors nor with human supervision. The robot learns to focus its attention onto groups of people from its own audio-visual experiences, independently of the number of people, of their positions and of their physical appearances. In particular, we use a recurrent neural network architecture in combination with Q-learning to find an optimal action-selection policy; we pre-train the network using a simulated environment that mimics realistic scenarios that involve speaking/silent participants, thus avoiding the need for tedious sessions of a robot interacting with people. Our experimental evaluation suggests that the proposed method is robust against parameter estimation, i.e. the parameter values yielded by the method do not have a decisive impact on the performance. The best results are obtained when both audio and visual information is jointly used. Experiments with the Nao robot indicate that our framework is a step forward towards the autonomous learning of socially acceptable gaze behavior.

Submitted 23 April, 2018; v1 submitted 18 November, 2017; originally announced November 2017.

Comments: Paper submitted to Pattern Recognition Letters

Journal ref: Pattern Recognition Letters, vol. 118, 2019, 61-71

Showing 1–15 of 15 results for author: Mesejo, P