-
i-QLS: Quantum-supported Algorithm for Least Squares Optimization in Non-Linear Regression
Authors:
Supreeth Mysore Venkatesh,
Antonio Macaluso,
Diego Arenas,
Matthias Klusch,
Andreas Dengel
Abstract:
We propose an iterative quantum-assisted least squares (i-QLS) optimization method that leverages quantum annealing to overcome the scalability and precision limitations of prior quantum least squares approaches. Unlike traditional QUBO-based formulations, which suffer from a qubit overhead due to fixed discretization, our approach refines the solution space iteratively, enabling exponential convergence while maintaining a constant qubit requirement per iteration. This iterative refinement transforms the problem into an anytime algorithm, allowing for flexible computational trade-offs. Furthermore, we extend our framework beyond linear regression to non-linear function approximation via spline-based modeling, demonstrating its adaptability to complex regression tasks. We empirically validate i-QLS on the D-Wave quantum annealer, showing that our method efficiently scales to high-dimensional problems, achieving competitive accuracy with classical solvers while outperforming prior quantum approaches. Experiments confirm that i-QLS enables near-term quantum hardware to perform regression tasks with improved precision and scalability, paving the way for practical quantum-assisted machine learning applications.
Submitted 5 May, 2025;
originally announced May 2025.
-
On What Depends the Robustness of Multi-source Models to Missing Data in Earth Observation?
Authors:
Francisco Mena,
Diego Arenas,
Miro Miranda,
Andreas Dengel
Abstract:
In recent years, the development of robust multi-source models has emerged in the Earth Observation (EO) field. These are models that leverage data from diverse sources to improve predictive accuracy when there is missing data. Despite these advancements, the factors influencing the varying effectiveness of such models remain poorly understood. In this study, we evaluate the predictive performance of six state-of-the-art multi-source models in predicting scenarios where either a single data source is missing or only a single source is available. Our analysis reveals that the efficacy of these models is intricately tied to the nature of the task, the complementarity among data sources, and the model design. Surprisingly, we observe instances where the removal of certain data sources leads to improved predictive performance, challenging the assumption that incorporating all available data is always beneficial. These findings prompt critical reflections on model complexity and the necessity of all collected data sources, potentially shaping the way for more streamlined approaches in EO applications.
Submitted 25 March, 2025;
originally announced March 2025.
-
Missing Data as Augmentation in the Earth Observation Domain: A Multi-View Learning Approach
Authors:
Francisco Mena,
Diego Arenas,
Andreas Dengel
Abstract:
Multi-view learning (MVL) leverages multiple sources or views of data to enhance machine learning model performance and robustness. This approach has been successfully used in the Earth Observation (EO) domain, where views have a heterogeneous nature and can be affected by missing data. Despite the negative effect that missing data has on model predictions, the ML literature has used it as an augmentation technique to improve model generalization, like masking the input data. Inspired by this, we introduce novel methods for EO applications tailored to MVL with missing views. Our methods integrate the combination of a set to simulate all combinations of missing views as different training samples. Instead of replacing missing data with a numerical value, we use dynamic merge functions, like average, and more complex ones like Transformer. This allows the MVL model to entirely ignore the missing views, enhancing its predictive robustness. We experiment on four EO datasets with temporal and static views, including state-of-the-art methods from the EO domain. The results indicate that our methods improve model robustness under conditions of moderate missingness, and improve the predictive performance when all views are present. The proposed methods offer a single adaptive solution to operate effectively with any combination of available views.
Submitted 2 January, 2025;
originally announced January 2025.
-
Increasing the Robustness of Model Predictions to Missing Sensors in Earth Observation
Authors:
Francisco Mena,
Diego Arenas,
Andreas Dengel
Abstract:
Multi-sensor ML models for EO aim to enhance prediction accuracy by integrating data from various sources. However, the presence of missing data poses a significant challenge, particularly in non-persistent sensors that can be affected by external factors. Existing literature has explored strategies like temporal dropout and sensor-invariant models to address the generalization to missing data issues. Inspired by these works, we study two novel methods tailored for multi-sensor scenarios, namely Input Sensor Dropout (ISensD) and Ensemble Sensor Invariant (ESensI). Through experimentation on three multi-sensor temporal EO datasets, we demonstrate that these methods effectively increase the robustness of model predictions to missing sensors. Particularly, we focus on how the predictive performance of models drops when sensors are missing at different levels. We observe that ensemble multi-sensor models are the most robust to the lack of sensors. In addition, the sensor dropout component in ISensD shows promising robustness results.
Submitted 4 September, 2024; v1 submitted 22 July, 2024;
originally announced July 2024.
-
In the Search for Optimal Multi-view Learning Models for Crop Classification with Global Remote Sensing Data
Authors:
Francisco Mena,
Diego Arenas,
Andreas Dengel
Abstract:
Studying and analyzing cropland is a difficult task due to its dynamic and heterogeneous growth behavior. Usually, diverse data sources can be collected for its estimation. Although deep learning models have proven to excel in the crop classification task, they face substantial challenges when dealing with multiple inputs, named Multi-View Learning (MVL). The methods used in the MVL scenario can be structured based on the encoder architecture, the fusion strategy, and the optimization technique. The literature has primarily focused on using specific encoder architectures for local regions, lacking a deeper exploration of other components in the MVL methodology. In contrast, we investigate the simultaneous selection of the fusion strategy and encoder architecture, assessing global-scale cropland and crop-type classifications. We use a range of five fusion strategies (Input, Feature, Decision, Ensemble, Hybrid) and five temporal encoders (LSTM, GRU, TempCNN, TAE, L-TAE) as possible configurations in the MVL method. We use the CropHarvest dataset for validation, which provides optical, radar, weather time series, and topographic information as input data. We found that in scenarios with a limited number of labeled samples, a unique configuration is insufficient for all the cases. Instead, a specialized combination should be meticulously sought, including an encoder and fusion strategy. To streamline this search process, we suggest identifying the optimal encoder architecture tailored for a particular fusion strategy, and then determining the most suitable fusion strategy for the classification task. We provide a methodological framework for researchers exploring crop classification through an MVL methodology.
Submitted 4 September, 2024; v1 submitted 25 March, 2024;
originally announced March 2024.
-
Impact Assessment of Missing Data in Model Predictions for Earth Observation Applications
Authors:
Francisco Mena,
Diego Arenas,
Marcela Charfuelan,
Marlon Nuske,
Andreas Dengel
Abstract:
Earth observation (EO) applications involving complex and heterogeneous data sources are commonly approached with machine learning models. However, there is a common assumption that data sources will be persistently available. Different situations could affect the availability of EO sources, like noise, clouds, or satellite mission failures. In this work, we assess the impact of missing temporal and static EO sources in trained models across four datasets with classification and regression tasks. We compare the predictive quality of different methods and find that some are naturally more robust to missing data. The Ensemble strategy, in particular, achieves a prediction robustness up to 100%. We evidence that missing scenarios are significantly more challenging in regression than classification tasks. Finally, we find that the optical view is the most critical view when it is missing individually.
Submitted 13 May, 2024; v1 submitted 21 March, 2024;
originally announced March 2024.
-
Adaptive Fusion of Multi-view Remote Sensing data for Optimal Sub-field Crop Yield Prediction
Authors:
Francisco Mena,
Deepak Pathak,
Hiba Najjar,
Cristhian Sanchez,
Patrick Helber,
Benjamin Bischke,
Peter Habelitz,
Miro Miranda,
Jayanth Siddamsetty,
Marlon Nuske,
Marcela Charfuelan,
Diego Arenas,
Michaela Vollmer,
Andreas Dengel
Abstract:
Accurate crop yield prediction is of utmost importance for informed decision-making in agriculture, aiding farmers and industry stakeholders. However, this task is complex and depends on multiple factors, such as environmental conditions, soil properties, and management practices. Combining heterogeneous data views poses a fusion challenge, like identifying the view-specific contribution to the predictive task. We present a novel multi-view learning approach to predict crop yield for different crops (soybean, wheat, rapeseed) and regions (Argentina, Uruguay, and Germany). Our multi-view input data includes multi-spectral optical images from Sentinel-2 satellites and weather data as dynamic features during the crop growing season, complemented by static features like soil properties and topographic information. To effectively fuse the data, we introduce a Multi-view Gated Fusion (MVGF) model, comprising dedicated view-encoders and a Gated Unit (GU) module. The view-encoders handle the heterogeneity of data sources with varying temporal resolutions by learning a view-specific representation. These representations are adaptively fused via a weighted sum. The fusion weights are computed for each sample by the GU using a concatenation of the view-representations. The MVGF model is trained at sub-field level with 10 m resolution pixels. Our evaluations show that the MVGF outperforms conventional models on the same task, achieving the best results by incorporating all the data sources, unlike the usual fusion results in the literature. For Argentina, the MVGF model achieves an R² value of 0.68 at sub-field yield prediction, while at field level evaluation (comparing field averages), it reaches around 0.80 across different countries. The GU module learned different weights based on the country and crop-type, aligning with the variable significance of each data source to the prediction task.
Submitted 22 January, 2024;
originally announced January 2024.
-
Predicting Crop Yield With Machine Learning: An Extensive Analysis Of Input Modalities And Models On a Field and sub-field Level
Authors:
Deepak Pathak,
Miro Miranda,
Francisco Mena,
Cristhian Sanchez,
Patrick Helber,
Benjamin Bischke,
Peter Habelitz,
Hiba Najjar,
Jayanth Siddamsetty,
Diego Arenas,
Michaela Vollmer,
Marcela Charfuelan,
Marlon Nuske,
Andreas Dengel
Abstract:
We introduce a simple yet effective early fusion method for crop yield prediction that handles multiple input modalities with different temporal and spatial resolutions. We use high-resolution crop yield maps as ground truth data to train crop and machine learning model agnostic methods at the sub-field level. We use Sentinel-2 satellite imagery as the primary modality for input data with other complementary modalities, including weather, soil, and DEM data. The proposed method uses input modalities available with global coverage, making the framework globally scalable. We explicitly highlight the importance of input modalities for crop yield prediction and emphasize that the best-performing combination of input modalities depends on region, crop, and chosen model.
Submitted 17 August, 2023;
originally announced August 2023.
-
A Comparative Assessment of Multi-view fusion learning for Crop Classification
Authors:
Francisco Mena,
Diego Arenas,
Marlon Nuske,
Andreas Dengel
Abstract:
With a rapidly increasing amount and diversity of remote sensing (RS) data sources, there is a strong need for multi-view learning modeling. This is a complex task when considering the differences in resolution, magnitude, and noise of RS data. The typical approach for merging multiple RS sources has been input-level fusion, but other - more advanced - fusion strategies may outperform this traditional approach. This work assesses different fusion strategies for crop classification in the CropHarvest dataset. The fusion methods proposed in this work outperform models based on individual views and previous fusion methods. We do not find one single fusion method that consistently outperforms all other approaches. Instead, we present a comparison of multi-view fusion methods for three different datasets and show that, depending on the test region, different methods obtain the best performance. Despite this, we suggest a preliminary criterion for the selection of fusion methods.
Submitted 10 August, 2023;
originally announced August 2023.
-
Common Practices and Taxonomy in Deep Multi-view Fusion for Remote Sensing Applications
Authors:
Francisco Mena,
Diego Arenas,
Marlon Nuske,
Andreas Dengel
Abstract:
The advances in remote sensing technologies have boosted applications for Earth observation. These technologies provide multiple observations or views with different levels of information. They might contain static or temporary views with different levels of resolution, in addition to having different types and amounts of noise due to sensor calibration or deterioration. A great variety of deep learning models have been applied to fuse the information from these multiple views, known as deep multi-view or multi-modal fusion learning. However, the approaches in the literature vary greatly since different terminology is used to refer to similar concepts or different illustrations are given to similar techniques. This article gathers works on multi-view fusion for Earth observation by focusing on the common practices and approaches used in the literature. We summarize and structure insights from several different publications concentrating on unifying points and ideas. In this manuscript, we provide a harmonized terminology while at the same time mentioning the various alternative terms that are used in literature. The topics covered by the works reviewed focus on supervised learning with the use of neural network models. We hope this review, with a long list of recent references, can support future research and lead to a unified advance in the area.
Submitted 20 December, 2022;
originally announced January 2023.
-
MLJ: A Julia package for composable machine learning
Authors:
Anthony D. Blaom,
Franz Kiraly,
Thibaut Lienart,
Yiannis Simillides,
Diego Arenas,
Sebastian J. Vollmer
Abstract:
MLJ (Machine Learning in Julia) is an open source software package providing a common interface for interacting with machine learning models written in Julia and other languages. It provides tools and meta-algorithms for selecting, tuning, evaluating, composing and comparing those models, with a focus on flexible model composition. In this design overview we detail chief novelties of the framework, together with the clear benefits of Julia over the dominant multi-language alternatives.
Submitted 3 November, 2020; v1 submitted 23 July, 2020;
originally announced July 2020.
-
Design choices for productive, secure, data-intensive research at scale in the cloud
Authors:
Diego Arenas,
Jon Atkins,
Claire Austin,
David Beavan,
Alvaro Cabrejas Egea,
Steven Carlysle-Davies,
Ian Carter,
Rob Clarke,
James Cunningham,
Tom Doel,
Oliver Forrest,
Evelina Gabasova,
James Geddes,
James Hetherington,
Radka Jersakova,
Franz Kiraly,
Catherine Lawrence,
Jules Manser,
Martin T. O'Reilly,
James Robinson,
Helen Sherwood-Taylor,
Serena Tierney,
Catalina A. Vallejos,
Sebastian Vollmer,
Kirstie Whitaker
Abstract:
We present a policy and process framework for secure environments for productive data science research projects at scale, by combining prevailing data security threat and risk profiles into five sensitivity tiers, and, at each tier, specifying recommended policies for data classification, data ingress, software ingress, data egress, user access, user device control, and analysis environments. By presenting design patterns for security choices for each tier, and using software defined infrastructure so that a different, independent, secure research environment can be instantiated for each project appropriate to its classification, we hope to maximise researcher productivity and minimise risk, allowing research organisations to operate with confidence.
Submitted 15 September, 2019; v1 submitted 23 August, 2019;
originally announced August 2019.
-
Solving the Periodic Timetabling Problem using a Genetic Algorithm
Authors:
Diego Arenas,
Remy Chevirer,
Said Hanafi,
Joaquin Rodriguez
Abstract:
In railway operations, a timetable is established to determine the departure and arrival times for the trains or other rolling stock at the different stations or relevant points inside the rail network or a subset of this network. The elaboration of this timetable is done to respond to the commercial requirements for both passenger and freight traffic, but it must also respect a set of security and capacity constraints associated with the railway network, rolling stock and legislation. Combining these requirements and constraints, as well as the large number of trains and schedules to plan, makes the preparation of a feasible timetable a complex and time-consuming process that normally takes several months to complete. This article addresses the problem of generating periodic timetables, which means that the involved trains operate in a recurrent pattern. For instance, the trains belonging to the same train line depart from some station every 15 minutes or every hour. To tackle the problem, we present a constraint-based model suitable for this kind of problem. Then, we propose a genetic algorithm, allowing a rapid generation of feasible periodic timetables. Finally, two case studies are presented: the first describes a subset of the Netherlands rail network, and the second a large portion of the Nord-Pas-de-Calais regional rail network. Both are solved using our algorithm, and the results are presented and discussed.
Submitted 24 November, 2014;
originally announced November 2014.