Search | arXiv e-print repository

arXiv:2505.19896 [pdf, ps, other]

Large Language Models as Autonomous Spacecraft Operators in Kerbal Space Program

Authors: Alejandro Carrasco, Victor Rodriguez-Fernandez, Richard Linares

Abstract: Recent trends are emerging in the use of Large Language Models (LLMs) as autonomous agents that take actions based on the content of the user text prompts. We intend to apply these concepts to the field of Control in space, enabling LLMs to play a significant role in the decision-making process for autonomous satellite operations. As a first step towards this goal, we have developed a pure LLM-bas… ▽ More Recent trends are emerging in the use of Large Language Models (LLMs) as autonomous agents that take actions based on the content of the user text prompts. We intend to apply these concepts to the field of Control in space, enabling LLMs to play a significant role in the decision-making process for autonomous satellite operations. As a first step towards this goal, we have developed a pure LLM-based solution for the Kerbal Space Program Differential Games (KSPDG) challenge, a public software design competition where participants create autonomous agents for maneuvering satellites involved in non-cooperative space operations, running on the KSP game engine. Our approach leverages prompt engineering, few-shot prompting, and fine-tuning techniques to create an effective LLM-based agent that ranked 2nd in the competition. To the best of our knowledge, this work pioneers the integration of LLM agents into space research. The project comprises several open repositories to facilitate replication and further research. The codebase is accessible on \href{https://github.com/ARCLab-MIT/kspdg}{GitHub}, while the trained models and datasets are available on \href{https://huggingface.co/OhhTuRnz}{Hugging Face}. Additionally, experiment tracking and detailed results can be reviewed on \href{https://wandb.ai/carrusk/huggingface}{Weights \& Biases △ Less

Submitted 26 May, 2025; originally announced May 2025.

Comments: Non revised version for paper going to be published in Journal of Advances in Space Research

arXiv:2504.20099 [pdf, other]

Decoding Latent Spaces: Assessing the Interpretability of Time Series Foundation Models for Visual Analytics

Authors: Inmaculada Santamaria-Valenzuela, Victor Rodriguez-Fernandez, Javier Huertas-Tato, Jong Hyuk Park, David Camacho

Abstract: The present study explores the interpretability of latent spaces produced by time series foundation models, focusing on their potential for visual analysis tasks. Specifically, we evaluate the MOMENT family of models, a set of transformer-based, pre-trained architectures for multivariate time series tasks such as: imputation, prediction, classification, and anomaly detection. We evaluate the capac… ▽ More The present study explores the interpretability of latent spaces produced by time series foundation models, focusing on their potential for visual analysis tasks. Specifically, we evaluate the MOMENT family of models, a set of transformer-based, pre-trained architectures for multivariate time series tasks such as: imputation, prediction, classification, and anomaly detection. We evaluate the capacity of these models on five datasets to capture the underlying structures in time series data within their latent space projection and validate whether fine tuning improves the clarity of the resulting embedding spaces. Notable performance improvements in terms of loss reduction were observed after fine tuning. Visual analysis shows limited improvement in the interpretability of the embeddings, requiring further work. Results suggest that, although Time Series Foundation Models such as MOMENT are robust, their latent spaces may require additional methodological refinements to be adequately interpreted, such as alternative projection techniques, loss functions, or data preprocessing strategies. Despite the limitations of MOMENT, foundation models supose a big reduction in execution time and so a great advance for interactive visual analytics. △ Less

Submitted 26 April, 2025; originally announced April 2025.

Comments: Currently under review at the International Journal of Interactive Multimedia and Artificial Intelligence (IJIMAI)

arXiv:2501.16588 [pdf, other]

Fine-Tuned Language Models as Space Systems Controllers

Authors: Enrico M. Zucchelli, Di Wu, Julia Briden, Christian Hofmann, Victor Rodriguez-Fernandez, Richard Linares

Abstract: Large language models (LLMs), or foundation models (FMs), are pretrained transformers that coherently complete sentences auto-regressively. In this paper, we show that LLMs can control simplified space systems after some additional training, called fine-tuning. We look at relatively small language models, ranging between 7 and 13 billion parameters. We focus on four problems: a three-dimensional s… ▽ More Large language models (LLMs), or foundation models (FMs), are pretrained transformers that coherently complete sentences auto-regressively. In this paper, we show that LLMs can control simplified space systems after some additional training, called fine-tuning. We look at relatively small language models, ranging between 7 and 13 billion parameters. We focus on four problems: a three-dimensional spring toy problem, low-thrust orbit transfer, low-thrust cislunar control, and powered descent guidance. The fine-tuned LLMs are capable of controlling systems by generating sufficiently accurate outputs that are multi-dimensional vectors with up to 10 significant digits. We show that for several problems the amount of data required to perform fine-tuning is smaller than what is generally required of traditional deep neural networks (DNNs), and that fine-tuned LLMs are good at generalizing outside of the training dataset. Further, the same LLM can be fine-tuned with data from different problems, with only minor performance degradation with respect to LLMs trained for a single application. This work is intended as a first step towards the development of a general space systems controller. △ Less

Submitted 27 January, 2025; originally announced January 2025.

Journal ref: Proceedings of the AAS/AIAA Astrodynamics Specialist Conference, paper number AAS 24-445, Broomfield, CO, August 2024

arXiv:2501.07802 [pdf, other]

doi 10.2514/6.2025-1543

Visual Language Models as Operator Agents in the Space Domain

Authors: Alejandro Carrasco, Marco Nedungadi, Enrico M. Zucchelli, Amit Jain, Victor Rodriguez-Fernandez, Richard Linares

Abstract: This paper explores the application of Vision-Language Models (VLMs) as operator agents in the space domain, focusing on both software and hardware operational paradigms. Building on advances in Large Language Models (LLMs) and their multimodal extensions, we investigate how VLMs can enhance autonomous control and decision-making in space missions. In the software context, we employ VLMs within th… ▽ More This paper explores the application of Vision-Language Models (VLMs) as operator agents in the space domain, focusing on both software and hardware operational paradigms. Building on advances in Large Language Models (LLMs) and their multimodal extensions, we investigate how VLMs can enhance autonomous control and decision-making in space missions. In the software context, we employ VLMs within the Kerbal Space Program Differential Games (KSPDG) simulation environment, enabling the agent to interpret visual screenshots of the graphical user interface to perform complex orbital maneuvers. In the hardware context, we integrate VLMs with robotic systems equipped with cameras to inspect and diagnose physical space objects, such as satellites. Our results demonstrate that VLMs can effectively process visual and textual data to generate contextually appropriate actions, competing with traditional methods and non-multimodal LLMs in simulation tasks, and showing promise in real-world applications. △ Less

Submitted 13 January, 2025; originally announced January 2025.

Comments: Updated version of the paper presented in 2025 AIAA SciTech. https://arc.aiaa.org/doi/10.2514/6.2025-1543

arXiv:2410.05097 [pdf]

DreamSat: Towards a General 3D Model for Novel View Synthesis of Space Objects

Authors: Nidhi Mathihalli, Audrey Wei, Giovanni Lavezzi, Peng Mun Siew, Victor Rodriguez-Fernandez, Hodei Urrutxua, Richard Linares

Abstract: Novel view synthesis (NVS) enables to generate new images of a scene or convert a set of 2D images into a comprehensive 3D model. In the context of Space Domain Awareness, since space is becoming increasingly congested, NVS can accurately map space objects and debris, improving the safety and efficiency of space operations. Similarly, in Rendezvous and Proximity Operations missions, 3D models can… ▽ More Novel view synthesis (NVS) enables to generate new images of a scene or convert a set of 2D images into a comprehensive 3D model. In the context of Space Domain Awareness, since space is becoming increasingly congested, NVS can accurately map space objects and debris, improving the safety and efficiency of space operations. Similarly, in Rendezvous and Proximity Operations missions, 3D models can provide details about a target object's shape, size, and orientation, allowing for better planning and prediction of the target's behavior. In this work, we explore the generalization abilities of these reconstruction techniques, aiming to avoid the necessity of retraining for each new scene, by presenting a novel approach to 3D spacecraft reconstruction from single-view images, DreamSat, by fine-tuning the Zero123 XL, a state-of-the-art single-view reconstruction model, on a high-quality dataset of 190 high-quality spacecraft models and integrating it into the DreamGaussian framework. We demonstrate consistent improvements in reconstruction quality across multiple metrics, including Contrastive Language-Image Pretraining (CLIP) score (+0.33%), Peak Signal-to-Noise Ratio (PSNR) (+2.53%), Structural Similarity Index (SSIM) (+2.38%), and Learned Perceptual Image Patch Similarity (LPIPS) (+0.16%) on a test set of 30 previously unseen spacecraft images. Our method addresses the lack of domain-specific 3D reconstruction tools in the space industry by leveraging state-of-the-art diffusion models and 3D Gaussian splatting techniques. This approach maintains the efficiency of the DreamGaussian framework while enhancing the accuracy and detail of spacecraft reconstructions. The code for this work can be accessed on GitHub (https://github.com/ARCLab-MIT/space-nvs). △ Less

Submitted 7 October, 2024; originally announced October 2024.

Comments: Presented at the 75th International Astronautical Congress, October 2024, Milan, Italy

arXiv:2408.08676 [pdf, other]

doi 10.5281/zenodo.13885579

Fine-tuning LLMs for Autonomous Spacecraft Control: A Case Study Using Kerbal Space Program

Authors: Alejandro Carrasco, Victor Rodriguez-Fernandez, Richard Linares

Abstract: Recent trends are emerging in the use of Large Language Models (LLMs) as autonomous agents that take actions based on the content of the user text prompt. This study explores the use of fine-tuned Large Language Models (LLMs) for autonomous spacecraft control, using the Kerbal Space Program Differential Games suite (KSPDG) as a testing environment. Traditional Reinforcement Learning (RL) approache… ▽ More Recent trends are emerging in the use of Large Language Models (LLMs) as autonomous agents that take actions based on the content of the user text prompt. This study explores the use of fine-tuned Large Language Models (LLMs) for autonomous spacecraft control, using the Kerbal Space Program Differential Games suite (KSPDG) as a testing environment. Traditional Reinforcement Learning (RL) approaches face limitations in this domain due to insufficient simulation capabilities and data. By leveraging LLMs, specifically fine-tuning models like GPT-3.5 and LLaMA, we demonstrate how these models can effectively control spacecraft using language-based inputs and outputs. Our approach integrates real-time mission telemetry into textual prompts processed by the LLM, which then generate control actions via an agent. The results open a discussion about the potential of LLMs for space operations beyond their nominal use for text-related tasks. Future work aims to expand this methodology to other space control tasks and evaluate the performance of different LLM families. The code is available at this URL: \texttt{https://github.com/ARCLab-MIT/kspdg}. △ Less

Submitted 16 August, 2024; originally announced August 2024.

Comments: ESA SPAICE Conference 2024. arXiv admin note: text overlap with arXiv:2404.00413

arXiv:2408.04692 [pdf, other]

doi 10.1007/978-3-031-74127-2_21

Exploring Scalability in Large-Scale Time Series in DeepVATS framework

Authors: Inmaculada Santamaria-Valenzuela, Victor Rodriguez-Fernandez, David Camacho

Abstract: Visual analytics is essential for studying large time series due to its ability to reveal trends, anomalies, and insights. DeepVATS is a tool that merges Deep Learning (Deep) with Visual Analytics (VA) for the analysis of large time series data (TS). It has three interconnected modules. The Deep Learning module, developed in R, manages the load of datasets and Deep Learning models from and to the… ▽ More Visual analytics is essential for studying large time series due to its ability to reveal trends, anomalies, and insights. DeepVATS is a tool that merges Deep Learning (Deep) with Visual Analytics (VA) for the analysis of large time series data (TS). It has three interconnected modules. The Deep Learning module, developed in R, manages the load of datasets and Deep Learning models from and to the Storage module. This module also supports models training and the acquisition of the embeddings from the latent space of the trained model. The Storage module operates using the Weights and Biases system. Subsequently, these embeddings can be analyzed in the Visual Analytics module. This module, based on an R Shiny application, allows the adjustment of the parameters related to the projection and clustering of the embeddings space. Once these parameters are set, interactive plots representing both the embeddings, and the time series are shown. This paper introduces the tool and examines its scalability through log analytics. The execution time evolution is examined while the length of the time series is varied. This is achieved by resampling a large data series into smaller subsets and logging the main execution and rendering times for later analysis of scalability. △ Less

Submitted 8 August, 2024; originally announced August 2024.

Comments: Admitted pending publication in Lecture Notes in Network and Systems (LNNS) series (Springer). Code available at https://github.com/vrodriguezf/deepvats

Report number: The 13th Conference on Information Technology and its Applications

Journal ref: The 13th Conference on Information Technology and its Applications, Lecture Notes in Networks and Systems, vol. 937, Springer, Cham, 2024, pp. 244-255

arXiv:2408.03691 [pdf, other]

doi 10.5281/zenodo.13885649

Generative Design of Periodic Orbits in the Restricted Three-Body Problem

Authors: Alvaro Francisco Gil, Walther Litteri, Victor Rodriguez-Fernandez, David Camacho, Massimiliano Vasile

Abstract: The Three-Body Problem has fascinated scientists for centuries and it has been crucial in the design of modern space missions. Recent developments in Generative Artificial Intelligence hold transformative promise for addressing this longstanding problem. This work investigates the use of Variational Autoencoder (VAE) and its internal representation to generate periodic orbits. We utilize a compreh… ▽ More The Three-Body Problem has fascinated scientists for centuries and it has been crucial in the design of modern space missions. Recent developments in Generative Artificial Intelligence hold transformative promise for addressing this longstanding problem. This work investigates the use of Variational Autoencoder (VAE) and its internal representation to generate periodic orbits. We utilize a comprehensive dataset of periodic orbits in the Circular Restricted Three-Body Problem (CR3BP) to train deep-learning architectures that capture key orbital characteristics, and we set up physical evaluation metrics for the generated trajectories. Through this investigation, we seek to enhance the understanding of how Generative AI can improve space mission planning and astrodynamics research, leading to novel, data-driven approaches in the field. △ Less

Submitted 7 August, 2024; originally announced August 2024.

Comments: SPAICE Conference 2024 (7 pages)

Journal ref: Proceedings of SPAICE 2024, pp. 430-436

arXiv:2406.15847 [pdf, other]

Enhancing Solar Driver Forecasting with Multivariate Transformers

Authors: Sergio Sanchez-Hurtado, Victor Rodriguez-Fernandez, Julia Briden, Peng Mun Siew, Richard Linares

Abstract: In this work, we develop a comprehensive framework for F10.7, S10.7, M10.7, and Y10.7 solar driver forecasting with a time series Transformer (PatchTST). To ensure an equal representation of high and low levels of solar activity, we construct a custom loss function to weight samples based on the distance between the solar driver's historical distribution and the training set. The solar driver fore… ▽ More In this work, we develop a comprehensive framework for F10.7, S10.7, M10.7, and Y10.7 solar driver forecasting with a time series Transformer (PatchTST). To ensure an equal representation of high and low levels of solar activity, we construct a custom loss function to weight samples based on the distance between the solar driver's historical distribution and the training set. The solar driver forecasting framework includes an 18-day lookback window and forecasts 6 days into the future. When benchmarked against the Space Environment Technologies (SET) dataset, our model consistently produces forecasts with a lower standard mean error in nearly all cases, with improved prediction accuracy during periods of high solar activity. All the code is available on Github https://github.com/ARCLab-MIT/sw-driver-forecaster. △ Less

Submitted 1 August, 2024; v1 submitted 22 June, 2024; originally announced June 2024.

Comments: Short paper accepted for oral presentation at the SPAICE Conference 2024 (https://spaice.esa.int/)

arXiv:2404.00413 [pdf, other]

Language Models are Spacecraft Operators

Authors: Victor Rodriguez-Fernandez, Alejandro Carrasco, Jason Cheng, Eli Scharf, Peng Mun Siew, Richard Linares

Abstract: Recent trends are emerging in the use of Large Language Models (LLMs) as autonomous agents that take actions based on the content of the user text prompts. We intend to apply these concepts to the field of Guidance, Navigation, and Control in space, enabling LLMs to have a significant role in the decision-making process for autonomous satellite operations. As a first step towards this goal, we hav… ▽ More Recent trends are emerging in the use of Large Language Models (LLMs) as autonomous agents that take actions based on the content of the user text prompts. We intend to apply these concepts to the field of Guidance, Navigation, and Control in space, enabling LLMs to have a significant role in the decision-making process for autonomous satellite operations. As a first step towards this goal, we have developed a pure LLM-based solution for the Kerbal Space Program Differential Games (KSPDG) challenge, a public software design competition where participants create autonomous agents for maneuvering satellites involved in non-cooperative space operations, running on the KSP game engine. Our approach leverages prompt engineering, few-shot prompting, and fine-tuning techniques to create an effective LLM-based agent that ranked 2nd in the competition. To the best of our knowledge, this work pioneers the integration of LLM agents into space research. Code is available at https://github.com/ARCLab-MIT/kspdg. △ Less

Submitted 30 March, 2024; originally announced April 2024.

Comments: Source code available on Github at: https://github.com/ARCLab-MIT/kspdg

arXiv:2402.18743 [pdf, other]

doi 10.1016/j.eswa.2020.113708

A revision on Multi-Criteria Decision Making methods for Multi-UAV Mission Planning Support

Authors: Cristian Ramirez-Atencia, Victor Rodriguez-Fernandez, David Camacho

Abstract: Over the last decade, Unmanned Aerial Vehicles (UAVs) have been extensively used in many commercial applications due to their manageability and risk avoidance. One of the main problems considered is the Mission Planning for multiple UAVs, where a solution plan must be found satisfying the different constraints of the problem. This problem has multiple variables that must be optimized simultaneousl… ▽ More Over the last decade, Unmanned Aerial Vehicles (UAVs) have been extensively used in many commercial applications due to their manageability and risk avoidance. One of the main problems considered is the Mission Planning for multiple UAVs, where a solution plan must be found satisfying the different constraints of the problem. This problem has multiple variables that must be optimized simultaneously, such as the makespan, the cost of the mission or the risk. Therefore, the problem has a lot of possible optimal solutions, and the operator must select the final solution to be executed among them. In order to reduce the workload of the operator in this decision process, a Decision Support System (DSS) becomes necessary. In this work, a DSS consisting of ranking and filtering systems, which order and reduce the optimal solutions, has been designed. With regard to the ranking system, a wide range of Multi-Criteria Decision Making (MCDM) methods, including some fuzzy MCDM, are compared on a multi-UAV mission planning scenario, in order to study which method could fit better in a multi-UAV decision support system. Expert operators have evaluated the solutions returned, and the results show, on the one hand, that fuzzy methods generally achieve better average scores, and on the other, that all of the tested methods perform better when the preferences of the operators are biased towards a specific variable, and worse when their preferences are balanced. For the filtering system, a similarity function based on the proximity of the solutions has been designed, and on top of that, a threshold is tuned empirically to decide how to filter solutions without losing much of the hypervolume of the space of solutions. △ Less

Submitted 28 February, 2024; originally announced February 2024.

Comments: Preprint submitted and acepted in Expert Systems with Applications

Journal ref: Expert Systems with Applications, Volume 160, 2020, 113708

arXiv:2401.04212 [pdf, other]

Towards a Machine Learning-Based Approach to Predict Space Object Density Distributions

Authors: Victor Rodriguez-Fernandez, Sumiyajav Sarangerel, Peng Mun Siew, Pablo Machuca, Daniel Jang, Richard Linares

Abstract: With the rapid increase in the number of Anthropogenic Space Objects (ASOs), Low Earth Orbit (LEO) is facing significant congestion, thereby posing challenges to space operators and risking the viability of the space environment for varied uses. Current models for examining this evolution, while detailed, are computationally demanding. To address these issues, we propose a novel machine learning-b… ▽ More With the rapid increase in the number of Anthropogenic Space Objects (ASOs), Low Earth Orbit (LEO) is facing significant congestion, thereby posing challenges to space operators and risking the viability of the space environment for varied uses. Current models for examining this evolution, while detailed, are computationally demanding. To address these issues, we propose a novel machine learning-based model, as an extension of the MIT Orbital Capacity Tool (MOCAT). This advanced model is designed to accelerate the propagation of ASO density distributions, and it is trained on hundreds of simulations generated by an established and accurate model of the space environment evolution. We study how different deep learning-based solutions can potentially be good candidates for ASO propagation and manage the high-dimensionality of the data. To assess the model's capabilities, we conduct experiments in long term forecasting scenarios (around 100 years), analyze how and why the performance degrades over time, and discuss potential solutions to make this solution better. △ Less

Submitted 8 January, 2024; originally announced January 2024.

Comments: 2024 AIAA SciTech Forum, 8-12 January 2024, Orlando, FL, USA

arXiv:2312.06854 [pdf, other]

Self-supervised Machine Learning Based Approach to Orbit Modelling Applied to Space Traffic Management

Authors: Emma Stevenson, Victor Rodriguez-Fernandez, Hodei Urrutxua, Vincent Morand, David Camacho

Abstract: This paper presents a novel methodology for improving the performance of machine learning based space traffic management tasks through the use of a pre-trained orbit model. Taking inspiration from BERT-like self-supervised language models in the field of natural language processing, we introduce ORBERT, and demonstrate the ability of such a model to leverage large quantities of readily available o… ▽ More This paper presents a novel methodology for improving the performance of machine learning based space traffic management tasks through the use of a pre-trained orbit model. Taking inspiration from BERT-like self-supervised language models in the field of natural language processing, we introduce ORBERT, and demonstrate the ability of such a model to leverage large quantities of readily available orbit data to learn meaningful representations that can be used to aid in downstream tasks. As a proof of concept of this approach we consider the task of all vs. all conjunction screening, phrased here as a machine learning time series classification task. We show that leveraging unlabelled orbit data leads to improved performance, and that the proposed approach can be particularly beneficial for tasks where the availability of labelled data is limited. △ Less

Submitted 11 December, 2023; originally announced December 2023.

Comments: Presented at the 2021 International Association for the Advancement of Space Safety (IAASS) Conf

Journal ref: Proc. 11th International Association for the Advancement of Space Safety (IAASS 2021) Conf

arXiv:2310.16912 [pdf, other]

Transformer-based Atmospheric Density Forecasting

Authors: Julia Briden, Peng Mun Siew, Victor Rodriguez-Fernandez, Richard Linares

Abstract: As the peak of the solar cycle approaches in 2025 and the ability of a single geomagnetic storm to significantly alter the orbit of Resident Space Objects (RSOs), techniques for atmospheric density forecasting are vital for space situational awareness. While linear data-driven methods, such as dynamic mode decomposition with control (DMDc), have been used previously for forecasting atmospheric den… ▽ More As the peak of the solar cycle approaches in 2025 and the ability of a single geomagnetic storm to significantly alter the orbit of Resident Space Objects (RSOs), techniques for atmospheric density forecasting are vital for space situational awareness. While linear data-driven methods, such as dynamic mode decomposition with control (DMDc), have been used previously for forecasting atmospheric density, deep learning-based forecasting has the ability to capture nonlinearities in data. By learning multiple layer weights from historical atmospheric density data, long-term dependencies in the dataset are captured in the mapping between the current atmospheric density state and control input to the atmospheric density state at the next timestep. This work improves upon previous linear propagation methods for atmospheric density forecasting, by developing a nonlinear transformer-based architecture for atmospheric density forecasting. Empirical NRLMSISE-00 and JB2008, as well as physics-based TIEGCM atmospheric density models are compared for forecasting with DMDc and with the transformer-based propagator. △ Less

Submitted 25 October, 2023; originally announced October 2023.

Comments: Conference: 24th Advanced Maui Optical and Space Surveillance Technologies At: Maui, Hawaii, United States

arXiv:2302.03858 [pdf, other]

doi 10.1016/j.knosys.2023.110793

DeepVATS: Deep Visual Analytics for Time Series

Authors: Victor Rodriguez-Fernandez, David Montalvo, Francesco Piccialli, Grzegorz J. Nalepa, David Camacho

Abstract: The field of Deep Visual Analytics (DVA) has recently arisen from the idea of developing Visual Interactive Systems supported by deep learning, in order to provide them with large-scale data processing capabilities and to unify their implementation across different data and domains. In this paper we present DeepVATS, an open-source tool that brings the field of DVA into time series data. DeepVATS… ▽ More The field of Deep Visual Analytics (DVA) has recently arisen from the idea of developing Visual Interactive Systems supported by deep learning, in order to provide them with large-scale data processing capabilities and to unify their implementation across different data and domains. In this paper we present DeepVATS, an open-source tool that brings the field of DVA into time series data. DeepVATS trains, in a self-supervised way, a masked time series autoencoder that reconstructs patches of a time series, and projects the knowledge contained in the embeddings of that model in an interactive plot, from which time series patterns and anomalies emerge and can be easily spotted. The tool includes a back-end for data processing pipeline and model training, as well as a front-end with a interactive user interface. We report on results that validate the utility of DeepVATS, running experiments on both synthetic and real datasets. The code is publicly available on https://github.com/vrodriguezf/deepvats △ Less

Submitted 19 May, 2023; v1 submitted 7 February, 2023; originally announced February 2023.

Comments: Submitted to Elsevier's Knowledge Based Systems journal. Code available at https://github.com/vrodriguezf/deepvats

Journal ref: Knowledge-Based Systems, 277, 2023, p.110793

arXiv:2112.00599 [pdf, other]

doi 10.1007/978-3-030-91608-4_41

An implementation of the "Guess who?" game using CLIP

Authors: Arnau Martí Sarri, Victor Rodriguez-Fernandez

Abstract: CLIP (Contrastive Language-Image Pretraining) is an efficient method for learning computer vision tasks from natural language supervision that has powered a recent breakthrough in deep learning due to its zero-shot transfer capabilities. By training from image-text pairs available on the internet, the CLIP model transfers non-trivially to most tasks without the need for any data set specific train… ▽ More CLIP (Contrastive Language-Image Pretraining) is an efficient method for learning computer vision tasks from natural language supervision that has powered a recent breakthrough in deep learning due to its zero-shot transfer capabilities. By training from image-text pairs available on the internet, the CLIP model transfers non-trivially to most tasks without the need for any data set specific training. In this work, we use CLIP to implement the engine of the popular game "Guess who?", so that the player interacts with the game using natural language prompts and CLIP automatically decides whether an image in the game board fulfills that prompt or not. We study the performance of this approach by benchmarking on different ways of prompting the questions to CLIP, and show the limitations of its zero-shot capabilites. △ Less

Submitted 30 November, 2021; originally announced December 2021.

Comments: Code available at https://github.com/ArnauDIMAI/CLIP-GuessWho

Journal ref: Intelligent Data Engineering and Automated Learning (IDEAL 2021). Lecture Notes in Computer Science, vol 13113

Showing 1–16 of 16 results for author: Rodriguez-Fernandez, V