-
Graph-Radiomic Learning (GrRAiL) Descriptor to Characterize Imaging Heterogeneity in Confounding Tumor Pathologies
Authors:
Dheerendranath Battalapalli,
Apoorva Safai,
Maria Jaramillo,
Hyemin Um,
Gustavo Adalfo Pineda Ortiz,
Ulas Bagci,
Manmeet Singh Ahluwalia,
Marwa Ismail,
Pallavi Tiwari
Abstract:
A significant challenge in solid tumors is reliably distinguishing confounding pathologies from malignant neoplasms on routine imaging. While radiomics methods seek surrogate markers of lesion heterogeneity on CT/MRI, many aggregate features across the region of interest (ROI) and miss complex spatial relationships among varying intensity compositions. We present a new Graph-Radiomic Learning (GrR…
▽ More
A significant challenge in solid tumors is reliably distinguishing confounding pathologies from malignant neoplasms on routine imaging. While radiomics methods seek surrogate markers of lesion heterogeneity on CT/MRI, many aggregate features across the region of interest (ROI) and miss complex spatial relationships among varying intensity compositions. We present a new Graph-Radiomic Learning (GrRAiL) descriptor for characterizing intralesional heterogeneity (ILH) on clinical MRI scans. GrRAiL (1) identifies clusters of sub-regions using per-voxel radiomic measurements, then (2) computes graph-theoretic metrics to quantify spatial associations among clusters. The resulting weighted graphs encode higher-order spatial relationships within the ROI, aiming to reliably capture ILH and disambiguate confounding pathologies from malignancy. To assess efficacy and clinical feasibility, GrRAiL was evaluated in n=947 subjects spanning three use cases: differentiating tumor recurrence from radiation effects in glioblastoma (GBM; n=106) and brain metastasis (n=233), and stratifying pancreatic intraductal papillary mucinous neoplasms (IPMNs) into no+low vs high risk (n=608). In a multi-institutional setting, GrRAiL consistently outperformed state-of-the-art baselines - Graph Neural Networks (GNNs), textural radiomics, and intensity-graph analysis. In GBM, cross-validation (CV) and test accuracies for recurrence vs pseudo-progression were 89% and 78% with >10% test-accuracy gains over comparators. In brain metastasis, CV and test accuracies for recurrence vs radiation necrosis were 84% and 74% (>13% improvement). For IPMN risk stratification, CV and test accuracies were 84% and 75%, showing >10% improvement.
△ Less
Submitted 23 September, 2025;
originally announced September 2025.
-
SKYLINK: Scalable and Resilient Link Management in LEO Satellite Network
Authors:
Wanja de Sombre,
Arash Asadi,
Debopam Bhattacherjee,
Deepak Vasisht,
Andrea Ortiz
Abstract:
The rapid growth of space-based services has established LEO satellite networks as a promising option for global broadband connectivity. Next-generation LEO networks leverage inter-satellite links (ISLs) to provide faster and more reliable communications compared to traditional bent-pipe architectures, even in remote regions. However, the high mobility of satellites, dynamic traffic patterns, and…
▽ More
The rapid growth of space-based services has established LEO satellite networks as a promising option for global broadband connectivity. Next-generation LEO networks leverage inter-satellite links (ISLs) to provide faster and more reliable communications compared to traditional bent-pipe architectures, even in remote regions. However, the high mobility of satellites, dynamic traffic patterns, and potential link failures pose significant challenges for efficient and resilient routing. To address these challenges, we model the LEO satellite network as a time-varying graph comprising a constellation of satellites and ground stations. Our objective is to minimize a weighted sum of average delay and packet drop rate. Each satellite independently decides how to distribute its incoming traffic to neighboring nodes in real time. Given the infeasibility of finding optimal solutions at scale, due to the exponential growth of routing options and uncertainties in link capacities, we propose SKYLINK, a novel fully distributed learning strategy for link management in LEO satellite networks. SKYLINK enables each satellite to adapt to the time-varying network conditions, ensuring real-time responsiveness, scalability to millions of users, and resilience to network failures, while maintaining low communication overhead and computational complexity. To support the evaluation of SKYLINK at global scale, we develop a new simulator for large-scale LEO satellite networks. For 25.4 million users, SKYLINK reduces the weighted sum of average delay and drop rate by 29% compared to the bent-pipe approach, and by 92% compared to Dijkstra. It lowers drop rates by 95% relative to k-shortest paths, 99% relative to Dijkstra, and 74% compared to the bent-pipe baseline, while achieving up to 46% higher throughput. At the same time, SKYLINK maintains constant computational complexity with respect to constellation size.
△ Less
Submitted 10 September, 2025;
originally announced September 2025.
-
Optimizing Cloud-to-GPU Throughput for Deep Learning With Earth Observation Data
Authors:
Akram Zaytar,
Caleb Robinson,
Girmaw Abebe Tadesse,
Tammy Glazer,
Gilles Hacheme,
Anthony Ortiz,
Rahul M Dodhia,
Juan M Lavista Ferres
Abstract:
Training deep learning models on petabyte-scale Earth observation (EO) data requires separating compute resources from data storage. However, standard PyTorch data loaders cannot keep modern GPUs utilized when streaming GeoTIFF files directly from cloud storage. In this work, we benchmark GeoTIFF loading throughput from both cloud object storage and local SSD, systematically testing different load…
▽ More
Training deep learning models on petabyte-scale Earth observation (EO) data requires separating compute resources from data storage. However, standard PyTorch data loaders cannot keep modern GPUs utilized when streaming GeoTIFF files directly from cloud storage. In this work, we benchmark GeoTIFF loading throughput from both cloud object storage and local SSD, systematically testing different loader configurations and data parameters. We focus on tile-aligned reads and worker thread pools, using Bayesian optimization to find optimal settings for each storage type. Our optimized configurations increase remote data loading throughput by 20x and local throughput by 4x compared to default settings. On three public EO benchmarks, models trained with optimized remote loading achieve the same accuracy as local training within identical time budgets. We improve validation IoU by 6-15% and maintain 85-95% GPU utilization versus 0-30% with standard configurations. Code is publicly available at https://github.com/microsoft/pytorch-cloud-geotiff-optimization
△ Less
Submitted 6 June, 2025;
originally announced June 2025.
-
Core-Set Selection for Data-efficient Land Cover Segmentation
Authors:
Keiller Nogueira,
Akram Zaytar,
Wanli Ma,
Ribana Roscher,
Ronny Hänsch,
Caleb Robinson,
Anthony Ortiz,
Simone Nsutezo,
Rahul Dodhia,
Juan M. Lavista Ferres,
Oktay Karakuş,
Paul L. Rosin
Abstract:
The increasing accessibility of remotely sensed data and the potential of such data to inform large-scale decision-making has driven the development of deep learning models for many Earth Observation tasks. Traditionally, such models must be trained on large datasets. However, the common assumption that broadly larger datasets lead to better outcomes tends to overlook the complexities of the data…
▽ More
The increasing accessibility of remotely sensed data and the potential of such data to inform large-scale decision-making has driven the development of deep learning models for many Earth Observation tasks. Traditionally, such models must be trained on large datasets. However, the common assumption that broadly larger datasets lead to better outcomes tends to overlook the complexities of the data distribution, the potential for introducing biases and noise, and the computational resources required for processing and storing vast datasets. Therefore, effective solutions should consider both the quantity and quality of data. In this paper, we propose six novel core-set selection methods for selecting important subsets of samples from remote sensing image segmentation datasets that rely on imagery only, labels only, and a combination of each. We benchmark these approaches against a random-selection baseline on three commonly used land cover classification datasets: DFC2022, Vaihingen, and Potsdam. In each of the datasets, we demonstrate that training on a subset of samples outperforms the random baseline, and some approaches outperform training on all available data. This result shows the importance and potential of data-centric learning for the remote sensing domain. The code is available at https://github.com/keillernogueira/data-centric-rs-classification/.
△ Less
Submitted 1 August, 2025; v1 submitted 2 May, 2025;
originally announced May 2025.
-
A Human Digital Twin Architecture for Knowledge-based Interactions and Context-Aware Conversations
Authors:
Abdul Mannan Mohammed,
Azhar Ali Mohammad,
Jason A. Ortiz,
Carsten Neumann,
Grace Bochenek,
Dirk Reiners,
Carolina Cruz-Neira
Abstract:
Recent developments in Artificial Intelligence (AI) and Machine Learning (ML) are creating new opportunities for Human-Autonomy Teaming (HAT) in tasks, missions, and continuous coordinated activities. A major challenge is enabling humans to maintain awareness and control over autonomous assets, while also building trust and supporting shared contextual understanding. To address this, we present a…
▽ More
Recent developments in Artificial Intelligence (AI) and Machine Learning (ML) are creating new opportunities for Human-Autonomy Teaming (HAT) in tasks, missions, and continuous coordinated activities. A major challenge is enabling humans to maintain awareness and control over autonomous assets, while also building trust and supporting shared contextual understanding. To address this, we present a real-time Human Digital Twin (HDT) architecture that integrates Large Language Models (LLMs) for knowledge reporting, answering, and recommendation, embodied in a visual interface.
The system applies a metacognitive approach to enable personalized, context-aware responses aligned with the human teammate's expectations. The HDT acts as a visually and behaviorally realistic team member, integrated throughout the mission lifecycle, from training to deployment to after-action review. Our architecture includes speech recognition, context processing, AI-driven dialogue, emotion modeling, lip-syncing, and multimodal feedback. We describe the system design, performance metrics, and future development directions for more adaptive and realistic HAT systems.
△ Less
Submitted 3 April, 2025;
originally announced April 2025.
-
What Can 240,000 New Credit Transactions Tell Us About the Impact of NGEU Funds?
Authors:
Alvaro Ortiz,
Tomasa Rodrigo,
David Sarasa,
Sirenia Vazquez
Abstract:
Using a panel data local projections model and controlling for firm characteristics, procurement bid attributes, and macroeconomic conditions, the study estimates the dynamic effects of procurement awards on new lending, a more precise measure than the change in the stock of credit. The analysis further examines heterogeneity in credit responses based on firm size, industry, credit maturity, and v…
▽ More
Using a panel data local projections model and controlling for firm characteristics, procurement bid attributes, and macroeconomic conditions, the study estimates the dynamic effects of procurement awards on new lending, a more precise measure than the change in the stock of credit. The analysis further examines heterogeneity in credit responses based on firm size, industry, credit maturity, and value chain position of the firms. The empirical evidence confirms that public procurement awards significantly increase new lending, with NGEU-funded contracts generating stronger credit expansion than traditional procurement during the recent period. The results show that the impact of NGEU procurement programs aligns closely with historical procurement impacts, with differences driven mainly by lower utilization rates. Moreover, integrating high-frequency financial data with procurement records highlights the potential of Big Data in refining public policy design.
△ Less
Submitted 4 April, 2025; v1 submitted 16 March, 2025;
originally announced April 2025.
-
Global Renewables Watch: A Temporal Dataset of Solar and Wind Energy Derived from Satellite Imagery
Authors:
Caleb Robinson,
Anthony Ortiz,
Allen Kim,
Rahul Dodhia,
Andrew Zolli,
Shivaprakash K Nagaraju,
James Oakleaf,
Joe Kiesecker,
Juan M. Lavista Ferres
Abstract:
We present a comprehensive global temporal dataset of commercial solar photovoltaic (PV) farms and onshore wind turbines, derived from high-resolution satellite imagery analyzed quarterly from the fourth quarter of 2017 to the second quarter of 2024. We create this dataset by training deep learning-based segmentation models to identify these renewable energy installations from satellite imagery, t…
▽ More
We present a comprehensive global temporal dataset of commercial solar photovoltaic (PV) farms and onshore wind turbines, derived from high-resolution satellite imagery analyzed quarterly from the fourth quarter of 2017 to the second quarter of 2024. We create this dataset by training deep learning-based segmentation models to identify these renewable energy installations from satellite imagery, then deploy them on over 13 trillion pixels covering the world. For each detected feature, we estimate the construction date and the preceding land use type. This dataset offers crucial insights into progress toward sustainable development goals and serves as a valuable resource for policymakers, researchers, and stakeholders aiming to assess and promote effective strategies for renewable energy deployment. Our final spatial dataset includes 375,197 individual wind turbines and 86,410 solar PV installations. We aggregate our predictions to the country level -- estimating total power capacity based on construction date, solar PV area, and number of windmills -- and find an $r^2$ value of $0.96$ and $0.93$ for solar PV and onshore wind respectively compared to IRENA's most recent 2023 country-level capacity estimates.
△ Less
Submitted 18 March, 2025;
originally announced March 2025.
-
FLAVARS: A Multimodal Foundational Language and Vision Alignment Model for Remote Sensing
Authors:
Isaac Corley,
Simone Fobi Nsutezo,
Anthony Ortiz,
Caleb Robinson,
Rahul Dodhia,
Juan M. Lavista Ferres,
Peyman Najafirad
Abstract:
Remote sensing imagery is dense with objects and contextual visual information. There is a recent trend to combine paired satellite images and text captions for pretraining performant encoders for downstream tasks. However, while contrastive image-text methods like CLIP enable vision-language alignment and zero-shot classification ability, vision-only downstream performance tends to degrade compar…
▽ More
Remote sensing imagery is dense with objects and contextual visual information. There is a recent trend to combine paired satellite images and text captions for pretraining performant encoders for downstream tasks. However, while contrastive image-text methods like CLIP enable vision-language alignment and zero-shot classification ability, vision-only downstream performance tends to degrade compared to image-only pretraining, such as MAE. In this paper, we propose FLAVARS, a pretraining method that combines the best of both contrastive learning and masked modeling, along with geospatial alignment via contrastive location encoding. We find that FLAVARS significantly outperforms a baseline of SkyCLIP for vision-only tasks such as KNN classification and semantic segmentation, +6\% mIOU on SpaceNet1, while retaining the ability to perform zero-shot classification, unlike MAE pretrained methods.
△ Less
Submitted 14 January, 2025;
originally announced January 2025.
-
PGRID: Power Grid Reconstruction in Informal Developments Using High-Resolution Aerial Imagery
Authors:
Simone Fobi Nsutezo,
Amrita Gupta,
Duncan Kebut,
Seema Iyer,
Luana Marotti,
Rahul Dodhia,
Juan M. Lavista Ferres,
Anthony Ortiz
Abstract:
As of 2023, a record 117 million people have been displaced worldwide, more than double the number from a decade ago [22]. Of these, 32 million are refugees under the UNHCR mandate, with 8.7 million residing in refugee camps. A critical issue faced by these populations is the lack of access to electricity, with 80% of the 8.7 million refugees and displaced persons in camps globally relying on trad…
▽ More
As of 2023, a record 117 million people have been displaced worldwide, more than double the number from a decade ago [22]. Of these, 32 million are refugees under the UNHCR mandate, with 8.7 million residing in refugee camps. A critical issue faced by these populations is the lack of access to electricity, with 80% of the 8.7 million refugees and displaced persons in camps globally relying on traditional biomass for cooking and lacking reliable power for essential tasks such as cooking and charging phones. Often, the burden of collecting firewood falls on women and children, who frequently travel up to 20 kilometers into dangerous areas, increasing their vulnerability.[7]
Electricity access could significantly alleviate these challenges, but a major obstacle is the lack of accurate power grid infrastructure maps, particularly in resource-constrained environments like refugee camps, needed for energy access planning. Existing power grid maps are often outdated, incomplete, or dependent on costly, complex technologies, limiting their practicality. To address this issue, PGRID is a novel application-based approach, which utilizes high-resolution aerial imagery to detect electrical poles and segment electrical lines, creating precise power grid maps. PGRID was tested in the Turkana region of Kenya, specifically the Kakuma and Kalobeyei Camps, covering 84 km2 and housing over 200,000 residents.
Our findings show that PGRID delivers high-fidelity power grid maps especially in unplanned settlements, with F1-scores of 0.71 and 0.82 for pole detection and line segmentation, respectively. This study highlights a practical application for leveraging open data and limited labels to improve power grid mapping in unplanned settlements, where the growing number of displaced persons urgently need sustainable energy infrastructure solutions.
△ Less
Submitted 10 December, 2024;
originally announced December 2024.
-
Residual-based Attention Physics-informed Neural Networks for Spatio-Temporal Ageing Assessment of Transformers Operated in Renewable Power Plants
Authors:
Ibai Ramirez,
Joel Pino,
David Pardo,
Mikel Sanz,
Luis del Rio,
Alvaro Ortiz,
Kateryna Morozovska,
Jose I. Aizpurua
Abstract:
Transformers are crucial for reliable and efficient power system operations, particularly in supporting the integration of renewable energy. Effective monitoring of transformer health is critical to maintain grid stability and performance. Thermal insulation ageing is a key transformer failure mode, which is generally tracked by monitoring the hotspot temperature (HST). However, HST measurement is…
▽ More
Transformers are crucial for reliable and efficient power system operations, particularly in supporting the integration of renewable energy. Effective monitoring of transformer health is critical to maintain grid stability and performance. Thermal insulation ageing is a key transformer failure mode, which is generally tracked by monitoring the hotspot temperature (HST). However, HST measurement is complex, costly, and often estimated from indirect measurements. Existing HST models focus on space-agnostic thermal models, providing worst-case HST estimates. This article introduces a spatio-temporal model for transformer winding temperature and ageing estimation, which leverages physics-based partial differential equations (PDEs) with data-driven Neural Networks (NN) in a Physics Informed Neural Networks (PINNs) configuration to improve prediction accuracy and acquire spatio-temporal resolution. The computational accuracy of the PINN model is improved through the implementation of the Residual-Based Attention (PINN-RBA) scheme that accelerates the PINN model convergence. The PINN-RBA model is benchmarked against self-adaptive attention schemes and classical vanilla PINN configurations. For the first time, PINN based oil temperature predictions are used to estimate spatio-temporal transformer winding temperature values, validated through PDE numerical solution and fiber optic sensor measurements. Furthermore, the spatio-temporal transformer ageing model is inferred, which supports transformer health management decision-making. Results are validated with a distribution transformer operating on a floating photovoltaic power plant.
△ Less
Submitted 3 October, 2024; v1 submitted 10 May, 2024;
originally announced May 2024.
-
ConvDTW-ACS: Audio Segmentation for Track Type Detection During Car Manufacturing
Authors:
Álvaro López-Chilet,
Zhaoyi Liu,
Jon Ander Gómez,
Carlos Alvarez,
Marivi Alonso Ortiz,
Andres Orejuela Mesa,
David Newton,
Friedrich Wolf-Monheim,
Sam Michiels,
Danny Hughes
Abstract:
This paper proposes a method for Acoustic Constrained Segmentation (ACS) in audio recordings of vehicles driven through a production test track, delimiting the boundaries of surface types in the track. ACS is a variant of classical acoustic segmentation where the sequence of labels is known, contiguous and invariable, which is especially useful in this work as the test track has a standard configu…
▽ More
This paper proposes a method for Acoustic Constrained Segmentation (ACS) in audio recordings of vehicles driven through a production test track, delimiting the boundaries of surface types in the track. ACS is a variant of classical acoustic segmentation where the sequence of labels is known, contiguous and invariable, which is especially useful in this work as the test track has a standard configuration of surface types. The proposed ConvDTW-ACS method utilizes a Convolutional Neural Network for classifying overlapping image chunks extracted from the full audio spectrogram. Then, our custom Dynamic Time Warping algorithm aligns the sequence of predicted probabilities to the sequence of surface types in the track, from which timestamps of the surface type boundaries can be extracted. The method was evaluated on a real-world dataset collected from the Ford Manufacturing Plant in Valencia (Spain), achieving a mean error of 166 milliseconds when delimiting, within the audio, the boundaries of the surfaces in the track. The results demonstrate the effectiveness of the proposed method in accurately segmenting different surface types, which could enable the development of more specialized AI systems to improve the quality inspection process.
△ Less
Submitted 28 February, 2024;
originally announced February 2024.
-
A Change Detection Reality Check
Authors:
Isaac Corley,
Caleb Robinson,
Anthony Ortiz
Abstract:
In recent years, there has been an explosion of proposed change detection deep learning architectures in the remote sensing literature. These approaches claim to offer state-of-the-art performance on different standard benchmark datasets. However, has the field truly made significant progress? In this paper we perform experiments which conclude a simple U-Net segmentation baseline without training…
▽ More
In recent years, there has been an explosion of proposed change detection deep learning architectures in the remote sensing literature. These approaches claim to offer state-of-the-art performance on different standard benchmark datasets. However, has the field truly made significant progress? In this paper we perform experiments which conclude a simple U-Net segmentation baseline without training tricks or complicated architectural changes is still a top performer for the task of change detection.
△ Less
Submitted 12 April, 2024; v1 submitted 10 February, 2024;
originally announced February 2024.
-
Is K-fold cross validation the best model selection method for Machine Learning?
Authors:
Juan M Gorriz,
R. Martin Clemente,
F Segovia,
J Ramirez,
A Ortiz,
J. Suckling
Abstract:
As a technique that can compactly represent complex patterns, machine learning has significant potential for predictive inference. K-fold cross-validation (CV) is the most common approach to ascertaining the likelihood that a machine learning outcome is generated by chance, and it frequently outperforms conventional hypothesis testing. This improvement uses measures directly obtained from machine…
▽ More
As a technique that can compactly represent complex patterns, machine learning has significant potential for predictive inference. K-fold cross-validation (CV) is the most common approach to ascertaining the likelihood that a machine learning outcome is generated by chance, and it frequently outperforms conventional hypothesis testing. This improvement uses measures directly obtained from machine learning classifications, such as accuracy, that do not have a parametric description. To approach a frequentist analysis within machine learning pipelines, a permutation test or simple statistics from data partitions (i.e., folds) can be added to estimate confidence intervals. Unfortunately, neither parametric nor non-parametric tests solve the inherent problems of partitioning small sample-size datasets and learning from heterogeneous data sources. The fact that machine learning strongly depends on the learning parameters and the distribution of data across folds recapitulates familiar difficulties around excess false positives and replication. A novel statistical test based on K-fold CV and the Upper Bound of the actual risk (K-fold CUBV) is proposed, where uncertain predictions of machine learning with CV are bounded by the worst case through the evaluation of concentration inequalities. Probably Approximately Correct-Bayesian upper bounds for linear classifiers in combination with K-fold CV are derived and used to estimate the actual risk. The performance with simulated and neuroimaging datasets suggests that K-fold CUBV is a robust criterion for detecting effects and validating accuracy values obtained from machine learning and classical CV schemes, while avoiding excess false positives.
△ Less
Submitted 8 November, 2024; v1 submitted 29 January, 2024;
originally announced January 2024.
-
The Best Time for an Update: Risk-Sensitive Minimization of Age-Based Metrics
Authors:
Wanja de Sombre,
Andrea Ortiz,
Frank Aurzada,
Anja Klein
Abstract:
Popular methods to quantify transmitted data quality are the Age of Information (AoI), the Query Age of Information (QAoI), and the Age of Incorrect Information (AoII). We consider these metrics in a point-to-point wireless communication system, where the transmitter monitors a process and sends status updates to a receiver. The challenge is to decide on the best time for an update, balancing the…
▽ More
Popular methods to quantify transmitted data quality are the Age of Information (AoI), the Query Age of Information (QAoI), and the Age of Incorrect Information (AoII). We consider these metrics in a point-to-point wireless communication system, where the transmitter monitors a process and sends status updates to a receiver. The challenge is to decide on the best time for an update, balancing the transmission energy and the age-based metric at the receiver. Due to the inherent risk of high age-based metric values causing complications such as unstable system states, we introduce the new concept of risky states to denote states with high age-based metric. We use this new notion of risky states to quantify and minimize this risk of experiencing high age-based metrics by directly deriving the frequency of risky states as a novel risk-metric. Building on this foundation, we introduce two risk-sensitive strategies for AoI, QAoI and AoII. The first strategy uses system knowledge, i.e., channel quality and packet arrival probability, to find an optimal strategy that transmits when the age-based metric exceeds a tunable threshold. A lower threshold leads to higher risk-sensitivity. The second strategy uses an enhanced Q-learning approach and balances the age-based metric, the transmission energy and the frequency of risky states without requiring knowledge about the system. Numerical results affirm our risk-sensitive strategies' high effectiveness.
△ Less
Submitted 17 March, 2025; v1 submitted 3 January, 2024;
originally announced January 2024.
-
Seeing the roads through the trees: A benchmark for modeling spatial dependencies with aerial imagery
Authors:
Caleb Robinson,
Isaac Corley,
Anthony Ortiz,
Rahul Dodhia,
Juan M. Lavista Ferres,
Peyman Najafirad
Abstract:
Fully understanding a complex high-resolution satellite or aerial imagery scene often requires spatial reasoning over a broad relevant context. The human object recognition system is able to understand object in a scene over a long-range relevant context. For example, if a human observes an aerial scene that shows sections of road broken up by tree canopy, then they will be unlikely to conclude th…
▽ More
Fully understanding a complex high-resolution satellite or aerial imagery scene often requires spatial reasoning over a broad relevant context. The human object recognition system is able to understand object in a scene over a long-range relevant context. For example, if a human observes an aerial scene that shows sections of road broken up by tree canopy, then they will be unlikely to conclude that the road has actually been broken up into disjoint pieces by trees and instead think that the canopy of nearby trees is occluding the road. However, there is limited research being conducted to understand long-range context understanding of modern machine learning models. In this work we propose a road segmentation benchmark dataset, Chesapeake Roads Spatial Context (RSC), for evaluating the spatial long-range context understanding of geospatial machine learning models and show how commonly used semantic segmentation models can fail at this task. For example, we show that a U-Net trained to segment roads from background in aerial imagery achieves an 84% recall on unoccluded roads, but just 63.5% recall on roads covered by tree canopy despite being trained to model both the same way. We further analyze how the performance of models changes as the relevant context for a decision (unoccluded roads in our case) varies in distance. We release the code to reproduce our experiments and dataset of imagery and masks to encourage future research in this direction -- https://github.com/isaaccorley/ChesapeakeRSC.
△ Less
Submitted 12 January, 2024;
originally announced January 2024.
-
EEG Connectivity Analysis Using Denoising Autoencoders for the Detection of Dyslexia
Authors:
Francisco Jesus Martinez-Murcia,
Andrés Ortiz,
Juan Manuel Górriz,
Javier Ramírez,
Pedro Javier Lopez-Perez,
Miguel López-Zamora,
Juan Luis Luque
Abstract:
The Temporal Sampling Framework (TSF) theorizes that the characteristic phonological difficulties of dyslexia are caused by an atypical oscillatory sampling at one or more temporal rates. The LEEDUCA study conducted a series of Electroencephalography (EEG) experiments on children listening to amplitude modulated (AM) noise with slow-rythmic prosodic (0.5-1 Hz), syllabic (4-8 Hz) or the phoneme (12…
▽ More
The Temporal Sampling Framework (TSF) theorizes that the characteristic phonological difficulties of dyslexia are caused by an atypical oscillatory sampling at one or more temporal rates. The LEEDUCA study conducted a series of Electroencephalography (EEG) experiments on children listening to amplitude modulated (AM) noise with slow-rythmic prosodic (0.5-1 Hz), syllabic (4-8 Hz) or the phoneme (12-40 Hz) rates, aimed at detecting differences in perception of oscillatory sampling that could be associated with dyslexia. The purpose of this work is to check whether these differences exist and how they are related to children's performance in different language and cognitive tasks commonly used to detect dyslexia. To this purpose, temporal and spectral inter-channel EEG connectivity was estimated, and a denoising autoencoder (DAE) was trained to learn a low-dimensional representation of the connectivity matrices. This representation was studied via correlation and classification analysis, which revealed ability in detecting dyslexic subjects with an accuracy higher than 0.8, and balanced accuracy around 0.7. Some features of the DAE representation were significantly correlated ($p<0.005$) with children's performance in language and cognitive tasks of the phonological hypothesis category such as phonological awareness and rapid symbolic naming, as well as reading efficiency and reading comprehension. Finally, a deeper analysis of the adjacency matrix revealed a reduced bilateral connection between electrodes of the temporal lobe (roughly the primary auditory cortex) in DD subjects, as well as an increased connectivity of the F7 electrode, placed roughly on Broca's area. These results pave the way for a complementary assessment of dyslexia using more objective methodologies such as EEG.
△ Less
Submitted 23 November, 2023;
originally announced November 2023.
-
Convolutional Neural Networks for Neuroimaging in Parkinson's Disease: Is Preprocessing Needed?
Authors:
Francisco J. Martinez-Murcia,
Juan M. Górriz,
Javier Ramírez,
Andrés Ortiz
Abstract:
Spatial and intensity normalization are nowadays a prerequisite for neuroimaging analysis. Influenced by voxel-wise and other univariate comparisons, where these corrections are key, they are commonly applied to any type of analysis and imaging modalities. Nuclear imaging modalities such as PET-FDG or FP-CIT SPECT, a common modality used in Parkinson's Disease diagnosis, are especially dependent o…
▽ More
Spatial and intensity normalization are nowadays a prerequisite for neuroimaging analysis. Influenced by voxel-wise and other univariate comparisons, where these corrections are key, they are commonly applied to any type of analysis and imaging modalities. Nuclear imaging modalities such as PET-FDG or FP-CIT SPECT, a common modality used in Parkinson's Disease diagnosis, are especially dependent on intensity normalization. However, these steps are computationally expensive and furthermore, they may introduce deformations in the images, altering the information contained in them. Convolutional Neural Networks (CNNs), for their part, introduce position invariance to pattern recognition, and have been proven to classify objects regardless of their orientation, size, angle, etc. Therefore, a question arises: how well can CNNs account for spatial and intensity differences when analysing nuclear brain imaging? Are spatial and intensity normalization still needed? To answer this question, we have trained four different CNN models based on well-established architectures, using or not different spatial and intensity normalization preprocessing. The results show that a sufficiently complex model such as our three-dimensional version of the ALEXNET can effectively account for spatial differences, achieving a diagnosis accuracy of 94.1% with an area under the ROC curve of 0.984. The visualization of the differences via saliency maps shows that these models are correctly finding patterns that match those found in the literature, without the need of applying any complex spatial normalization procedure. However, the intensity normalization -- and its type -- is revealed as very influential in the results and accuracy of the trained model, and therefore must be well accounted.
△ Less
Submitted 21 November, 2023;
originally announced November 2023.
-
APIS: A paired CT-MRI dataset for ischemic stroke segmentation challenge
Authors:
Santiago Gómez,
Daniel Mantilla,
Gustavo Garzón,
Edgar Rangel,
Andrés Ortiz,
Franklin Sierra-Jerez,
Fabio Martínez
Abstract:
Stroke is the second leading cause of mortality worldwide. Immediate attention and diagnosis play a crucial role regarding patient prognosis. The key to diagnosis consists in localizing and delineating brain lesions. Standard stroke examination protocols include the initial evaluation from a non-contrast CT scan to discriminate between hemorrhage and ischemia. However, non-contrast CTs may lack se…
▽ More
Stroke is the second leading cause of mortality worldwide. Immediate attention and diagnosis play a crucial role regarding patient prognosis. The key to diagnosis consists in localizing and delineating brain lesions. Standard stroke examination protocols include the initial evaluation from a non-contrast CT scan to discriminate between hemorrhage and ischemia. However, non-contrast CTs may lack sensitivity in detecting subtle ischemic changes in the acute phase. As a result, complementary diffusion-weighted MRI studies are captured to provide valuable insights, allowing to recover and quantify stroke lesions. This work introduced APIS, the first paired public dataset with NCCT and ADC studies of acute ischemic stroke patients. APIS was presented as a challenge at the 20th IEEE International Symposium on Biomedical Imaging 2023, where researchers were invited to propose new computational strategies that leverage paired data and deal with lesion segmentation over CT sequences. Despite all the teams employing specialized deep learning tools, the results suggest that the ischemic stroke segmentation task from NCCT remains challenging. The annotated dataset remains accessible to the public upon registration, inviting the scientific community to deal with stroke characterization from NCCT but guided with paired DWI information.
△ Less
Submitted 26 September, 2023;
originally announced September 2023.
-
Decentralized Online Learning in Task Assignment Games for Mobile Crowdsensing
Authors:
Bernd Simon,
Andrea Ortiz,
Walid Saad,
Anja Klein
Abstract:
The problem of coordinated data collection is studied for a mobile crowdsensing (MCS) system. A mobile crowdsensing platform (MCSP) sequentially publishes sensing tasks to the available mobile units (MUs) that signal their willingness to participate in a task by sending sensing offers back to the MCSP. From the received offers, the MCSP decides the task assignment. A stable task assignment must ad…
▽ More
The problem of coordinated data collection is studied for a mobile crowdsensing (MCS) system. A mobile crowdsensing platform (MCSP) sequentially publishes sensing tasks to the available mobile units (MUs) that signal their willingness to participate in a task by sending sensing offers back to the MCSP. From the received offers, the MCSP decides the task assignment. A stable task assignment must address two challenges: the MCSP's and MUs' conflicting goals, and the uncertainty about the MUs' required efforts and preferences. To overcome these challenges a novel decentralized approach combining matching theory and online learning, called collision-avoidance multi-armed bandit with strategic free sensing (CA-MAB-SFS), is proposed. The task assignment problem is modeled as a matching game considering the MCSP's and MUs' individual goals while the MUs learn their efforts online. Our innovative "free-sensing" mechanism significantly improves the MU's learning process while reducing collisions during task allocation. The stable regret of CA-MAB-SFS, i.e., the loss of learning, is analytically shown to be bounded by a sublinear function, ensuring the convergence to a stable optimal solution. Simulation results show that CA-MAB-SFS increases the MUs' and the MCSP's satisfaction compared to state-of-the-art methods while reducing the average task completion time by at least 16%.
△ Less
Submitted 19 September, 2023;
originally announced September 2023.
-
SeBaSi system-level Integrated Access and Backhaul simulator for self-backhauling
Authors:
Amir Ashtari Gargari,
Matteo Pagin,
Andrea Ortiz,
Nairy Moghadas Gholian,
Michele Polese,
Michele Zorzi
Abstract:
millimeter wave (mmWave) and sub-terahertz (THz) communications have the potential of increasing mobile network throughput drastically. However, the challenging propagation conditions experienced at mmWave and beyond frequencies can potentially limit the range of the wireless link down to a few meters, compared to up to kilometers for sub-6GHz links. Thus, increasing the density of base station de…
▽ More
millimeter wave (mmWave) and sub-terahertz (THz) communications have the potential of increasing mobile network throughput drastically. However, the challenging propagation conditions experienced at mmWave and beyond frequencies can potentially limit the range of the wireless link down to a few meters, compared to up to kilometers for sub-6GHz links. Thus, increasing the density of base station deployments is required to achieve sufficient coverage in the Radio Access Network (RAN). To such end, 3rd Generation Partnership Project (3GPP) introduced wireless backhauled base stations with Integrated Access and Backhaul (IAB), a key technology to achieve dense networks while preventing the need for costly fiber deployments. In this paper, we introduce SeBaSi, a system-level simulator for IAB networks, and demonstrate its functionality by simulating IAB deployments in Manhattan, New York City and Padova. Finally, we show how SeBaSi can represent a useful tool for the performance evaluation of self-backhauled cellular networks, thanks to its high level of network abstraction, coupled with its open and customizable design, which allows users to extend it to support novel technologies such as Reconfigurable Intelligent Surfaces (RISs)
△ Less
Submitted 11 September, 2023;
originally announced September 2023.
-
Poverty rate prediction using multi-modal survey and earth observation data
Authors:
Simone Fobi,
Manuel Cardona,
Elliott Collins,
Caleb Robinson,
Anthony Ortiz,
Tina Sederholm,
Rahul Dodhia,
Juan Lavista Ferres
Abstract:
This work presents an approach for combining household demographic and living standards survey questions with features derived from satellite imagery to predict the poverty rate of a region. Our approach utilizes visual features obtained from a single-step featurization method applied to freely available 10m/px Sentinel-2 surface reflectance satellite imagery. These visual features are combined wi…
▽ More
This work presents an approach for combining household demographic and living standards survey questions with features derived from satellite imagery to predict the poverty rate of a region. Our approach utilizes visual features obtained from a single-step featurization method applied to freely available 10m/px Sentinel-2 surface reflectance satellite imagery. These visual features are combined with ten survey questions in a proxy means test (PMT) to estimate whether a household is below the poverty line. We show that the inclusion of visual features reduces the mean error in poverty rate estimates from 4.09% to 3.88% over a nationally representative out-of-sample test set. In addition to including satellite imagery features in proxy means tests, we propose an approach for selecting a subset of survey questions that are complementary to the visual features extracted from satellite imagery. Specifically, we design a survey variable selection approach guided by the full survey and image features and use the approach to determine the most relevant set of small survey questions to include in a PMT. We validate the choice of small survey questions in a downstream task of predicting the poverty rate using the small set of questions. This approach results in the best performance -- errors in poverty rate decrease from 4.09% to 3.71%. We show that extracted visual features encode geographic and urbanization differences between regions.
△ Less
Submitted 21 July, 2023;
originally announced July 2023.
-
Rapid building damage assessment workflow: An implementation for the 2023 Rolling Fork, Mississippi tornado event
Authors:
Caleb Robinson,
Simone Fobi Nsutezo,
Anthony Ortiz,
Tina Sederholm,
Rahul Dodhia,
Cameron Birge,
Kasie Richards,
Kris Pitcher,
Paulo Duarte,
Juan M. Lavista Ferres
Abstract:
Rapid and accurate building damage assessments from high-resolution satellite imagery following a natural disaster is essential to inform and optimize first responder efforts. However, performing such building damage assessments in an automated manner is non-trivial due to the challenges posed by variations in disaster-specific damage, diversity in satellite imagery, and the dearth of extensive, l…
▽ More
Rapid and accurate building damage assessments from high-resolution satellite imagery following a natural disaster is essential to inform and optimize first responder efforts. However, performing such building damage assessments in an automated manner is non-trivial due to the challenges posed by variations in disaster-specific damage, diversity in satellite imagery, and the dearth of extensive, labeled datasets. To circumvent these issues, this paper introduces a human-in-the-loop workflow for rapidly training building damage assessment models after a natural disaster. This article details a case study using this workflow, executed in partnership with the American Red Cross during a tornado event in Rolling Fork, Mississippi in March, 2023. The output from our human-in-the-loop modeling process achieved a precision of 0.86 and recall of 0.80 for damaged buildings when compared to ground truth data collected post-disaster. This workflow was implemented end-to-end in under 2 hours per satellite imagery scene, highlighting its potential for real-time deployment.
△ Less
Submitted 24 August, 2023; v1 submitted 21 June, 2023;
originally announced June 2023.
-
Mask Conditional Synthetic Satellite Imagery
Authors:
Van Anh Le,
Varshini Reddy,
Zixi Chen,
Mengyuan Li,
Xinran Tang,
Anthony Ortiz,
Simone Fobi Nsutezo,
Caleb Robinson
Abstract:
In this paper we propose a mask-conditional synthetic image generation model for creating synthetic satellite imagery datasets. Given a dataset of real high-resolution images and accompanying land cover masks, we show that it is possible to train an upstream conditional synthetic imagery generator, use that generator to create synthetic imagery with the land cover masks, then train a downstream mo…
▽ More
In this paper we propose a mask-conditional synthetic image generation model for creating synthetic satellite imagery datasets. Given a dataset of real high-resolution images and accompanying land cover masks, we show that it is possible to train an upstream conditional synthetic imagery generator, use that generator to create synthetic imagery with the land cover masks, then train a downstream model on the synthetic imagery and land cover masks that achieves similar test performance to a model that was trained with the real imagery. Further, we find that incorporating a mixture of real and synthetic imagery acts as a data augmentation method, producing better models than using only real imagery (0.5834 vs. 0.5235 mIoU). Finally, we find that encouraging diversity of outputs in the upstream model is a necessary component for improved downstream task performance. We have released code for reproducing our work on GitHub, see https://github.com/ms-synthetic-satellite-image/synthetic-satellite-imagery .
△ Less
Submitted 8 February, 2023;
originally announced February 2023.
-
Safehaul: Risk-Averse Learning for Reliable mmWave Self-Backhauling in 6G Networks
Authors:
Amir Ashtari Gargari,
Andrea Ortiz,
Matteo Pagin,
Anja Klein,
Matthias Hollick,
Michele Zorzi,
Arash Asadi
Abstract:
Wireless backhauling at millimeter-wave frequencies (mmWave) in static scenarios is a well-established practice in cellular networks. However, highly directional and adaptive beamforming in today's mmWave systems have opened new possibilities for self-backhauling. Tapping into this potential, 3GPP has standardized Integrated Access and Backhaul (IAB) allowing the same base station serve both acces…
▽ More
Wireless backhauling at millimeter-wave frequencies (mmWave) in static scenarios is a well-established practice in cellular networks. However, highly directional and adaptive beamforming in today's mmWave systems have opened new possibilities for self-backhauling. Tapping into this potential, 3GPP has standardized Integrated Access and Backhaul (IAB) allowing the same base station serve both access and backhaul traffic. Although much more cost-effective and flexible, resource allocation and path selection in IAB mmWave networks is a formidable task. To date, prior works have addressed this challenge through a plethora of classic optimization and learning methods, generally optimizing a Key Performance Indicator (KPI) such as throughput, latency, and fairness, and little attention has been paid to the reliability of the KPI. We propose Safehaul, a risk-averse learning-based solution for IAB mmWave networks. In addition to optimizing average performance, Safehaul ensures reliability by minimizing the losses in the tail of the performance distribution. We develop a novel simulator and show via extensive simulations that Safehaul not only reduces the latency by up to 43.2% compared to the benchmarks but also exhibits significantly more reliable performance (e.g., 71.4% less variance in achieved latency).
△ Less
Submitted 12 January, 2023; v1 submitted 9 January, 2023;
originally announced January 2023.
-
Neural source/sink phase connectivity in developmental dyslexia by means of interchannel causality
Authors:
I. RodrÍguez-RodrÍguez,
A. Ortiz,
N. J. Gallego-Molina,
M. A. Formoso,
W. L. Woo
Abstract:
While the brain connectivity network can inform the understanding and diagnosis of developmental dyslexia, its cause-effect relationships have not yet enough been examined. Employing electroencephalography signals and band-limited white noise stimulus at 4.8 Hz (prosodic-syllabic frequency), we measure the phase Granger causalities among channels to identify differences between dyslexic learners a…
▽ More
While the brain connectivity network can inform the understanding and diagnosis of developmental dyslexia, its cause-effect relationships have not yet enough been examined. Employing electroencephalography signals and band-limited white noise stimulus at 4.8 Hz (prosodic-syllabic frequency), we measure the phase Granger causalities among channels to identify differences between dyslexic learners and controls, thereby proposing a method to calculate directional connectivity. As causal relationships run in both directions, we explore three scenarios, namely channels' activity as sources, as sinks, and in total. Our proposed method can be used for both classification and exploratory analysis. In all scenarios, we find confirmation of the established right-lateralized Theta sampling network anomaly, in line with the temporal sampling framework's assumption of oscillatory differences in the Theta and Gamma bands. Further, we show that this anomaly primarily occurs in the causal relationships of channels acting as sinks, where it is significantly more pronounced than when only total activity is observed. In the sink scenario, our classifier obtains 0.84 and 0.88 accuracy and 0.87 and 0.93 AUC for the Theta and Gamma bands, respectively.
△ Less
Submitted 2 January, 2023;
originally announced January 2023.
-
Fast building segmentation from satellite imagery and few local labels
Authors:
Caleb Robinson,
Anthony Ortiz,
Hogeun Park,
Nancy Lozano Gracia,
Jon Kher Kaw,
Tina Sederholm,
Rahul Dodhia,
Juan M. Lavista Ferres
Abstract:
Innovations in computer vision algorithms for satellite image analysis can enable us to explore global challenges such as urbanization and land use change at the planetary level. However, domain shift problems are a common occurrence when trying to replicate models that drive these analyses to new areas, particularly in the developing world. If a model is trained with imagery and labels from one l…
▽ More
Innovations in computer vision algorithms for satellite image analysis can enable us to explore global challenges such as urbanization and land use change at the planetary level. However, domain shift problems are a common occurrence when trying to replicate models that drive these analyses to new areas, particularly in the developing world. If a model is trained with imagery and labels from one location, then it usually will not generalize well to new locations where the content of the imagery and data distributions are different. In this work, we consider the setting in which we have a single large satellite imagery scene over which we want to solve an applied problem -- building footprint segmentation. Here, we do not necessarily need to worry about creating a model that generalizes past the borders of our scene but can instead train a local model. We show that surprisingly few labels are needed to solve the building segmentation problem with very high-resolution (0.5m/px) satellite imagery with this setting in mind. Our best model trained with just 527 sparse polygon annotations (an equivalent of 1500 x 1500 densely labeled pixels) has a recall of 0.87 over held out footprints and a R2 of 0.93 on the task of counting the number of buildings in 200 x 200-meter windows. We apply our models over high-resolution imagery in Amman, Jordan in a case study on urban change detection.
△ Less
Submitted 10 June, 2022;
originally announced June 2022.
-
A hypothesis-driven method based on machine learning for neuroimaging data analysis
Authors:
JM Gorriz,
R. Martin-Clemente,
C. G. Puntonet,
A. Ortiz,
J. Ramirez,
J. Suckling
Abstract:
There remains an open question about the usefulness and the interpretation of Machine learning (MLE) approaches for discrimination of spatial patterns of brain images between samples or activation states. In the last few decades, these approaches have limited their operation to feature extraction and linear classification tasks for between-group inference. In this context, statistical inference is…
▽ More
There remains an open question about the usefulness and the interpretation of Machine learning (MLE) approaches for discrimination of spatial patterns of brain images between samples or activation states. In the last few decades, these approaches have limited their operation to feature extraction and linear classification tasks for between-group inference. In this context, statistical inference is assessed by randomly permuting image labels or by the use of random effect models that consider between-subject variability. These multivariate MLE-based statistical pipelines, whilst potentially more effective for detecting activations than hypotheses-driven methods, have lost their mathematical elegance, ease of interpretation, and spatial localization of the ubiquitous General linear Model (GLM). Recently, the estimation of the conventional GLM has been demonstrated to be connected to an univariate classification task when the design matrix is expressed as a binary indicator matrix. In this paper we explore the complete connection between the univariate GLM and MLE \emph{regressions}. To this purpose we derive a refined statistical test with the GLM based on the parameters obtained by a linear Support Vector Regression (SVR) in the \emph{inverse} problem (SVR-iGLM). Subsequently, random field theory (RFT) is employed for assessing statistical significance following a conventional GLM benchmark. Experimental results demonstrate how parameter estimations derived from each model (mainly GLM and SVR) result in different experimental design estimates that are significantly related to the predefined functional task. Moreover, using real data from a multisite initiative the proposed MLE-based inference demonstrates statistical power and the control of false positives, outperforming the regular GLM.
△ Less
Submitted 17 February, 2022; v1 submitted 9 February, 2022;
originally announced February 2022.
-
An Artificial Intelligence Dataset for Solar Energy Locations in India
Authors:
Anthony Ortiz,
Dhaval Negandhi,
Sagar R Mysorekar,
Joseph Kiesecker,
Shivaprakash K Nagaraju,
Caleb Robinson,
Priyal Bhatia,
Aditi Khurana,
Jane Wang,
Felipe Oviedo,
Juan Lavista Ferres
Abstract:
Rapid development of renewable energy sources, particularly solar photovoltaics (PV), is critical to mitigate climate change. As a result, India has set ambitious goals to install 500 gigawatts of solar energy capacity by 2030. Given the large footprint projected to meet renewables energy targets, the potential for land use conflicts over environmental values is high. To expedite development of so…
▽ More
Rapid development of renewable energy sources, particularly solar photovoltaics (PV), is critical to mitigate climate change. As a result, India has set ambitious goals to install 500 gigawatts of solar energy capacity by 2030. Given the large footprint projected to meet renewables energy targets, the potential for land use conflicts over environmental values is high. To expedite development of solar energy, land use planners will need access to up-to-date and accurate geo-spatial information of PV infrastructure. In this work, we developed a spatially explicit machine learning model to map utility-scale solar projects across India using freely available satellite imagery with a mean accuracy of 92%. Our model predictions were validated by human experts to obtain a dataset of 1363 solar PV farms. Using this dataset, we measure the solar footprint across India and quantified the degree of landcover modification associated with the development of PV infrastructure. Our analysis indicates that over 74% of solar development In India was built on landcover types that have natural ecosystem preservation, or agricultural value.
△ Less
Submitted 30 June, 2022; v1 submitted 31 January, 2022;
originally announced February 2022.
-
TorchGeo: Deep Learning With Geospatial Data
Authors:
Adam J. Stewart,
Caleb Robinson,
Isaac A. Corley,
Anthony Ortiz,
Juan M. Lavista Ferres,
Arindam Banerjee
Abstract:
Remotely sensed geospatial data are critical for applications including precision agriculture, urban planning, disaster monitoring and response, and climate change research, among others. Deep learning methods are particularly promising for modeling many remote sensing tasks given the success of deep neural networks in similar computer vision tasks and the sheer volume of remotely sensed imagery a…
▽ More
Remotely sensed geospatial data are critical for applications including precision agriculture, urban planning, disaster monitoring and response, and climate change research, among others. Deep learning methods are particularly promising for modeling many remote sensing tasks given the success of deep neural networks in similar computer vision tasks and the sheer volume of remotely sensed imagery available. However, the variance in data collection methods and handling of geospatial metadata make the application of deep learning methodology to remotely sensed data nontrivial. For example, satellite imagery often includes additional spectral bands beyond red, green, and blue and must be joined to other geospatial data sources that can have differing coordinate systems, bounds, and resolutions. To help realize the potential of deep learning for remote sensing applications, we introduce TorchGeo, a Python library for integrating geospatial data into the PyTorch deep learning ecosystem. TorchGeo provides data loaders for a variety of benchmark datasets, composable datasets for generic geospatial data sources, samplers for geospatial data, and transforms that work with multispectral imagery. TorchGeo is also the first library to provide pre-trained models for multispectral satellite imagery (e.g., models that use all bands from the Sentinel-2 satellites), allowing for advances in transfer learning on downstream remote sensing tasks with limited labeled data. We use TorchGeo to create reproducible benchmark results on existing datasets and benchmark our proposed method for preprocessing geospatial imagery on the fly. TorchGeo is open source and available on GitHub: https://github.com/microsoft/torchgeo.
△ Less
Submitted 17 September, 2022; v1 submitted 16 November, 2021;
originally announced November 2021.
-
MSC-VO: Exploiting Manhattan and Structural Constraints for Visual Odometry
Authors:
Joan P. Company-Corcoles,
Emilio Garcia-Fidalgo,
Alberto Ortiz
Abstract:
Visual odometry algorithms tend to degrade when facing low-textured scenes -from e.g. human-made environments-, where it is often difficult to find a sufficient number of point features. Alternative geometrical visual cues, such as lines, which can often be found within these scenarios, can become particularly useful. Moreover, these scenarios typically present structural regularities, such as par…
▽ More
Visual odometry algorithms tend to degrade when facing low-textured scenes -from e.g. human-made environments-, where it is often difficult to find a sufficient number of point features. Alternative geometrical visual cues, such as lines, which can often be found within these scenarios, can become particularly useful. Moreover, these scenarios typically present structural regularities, such as parallelism or orthogonality, and hold the Manhattan World assumption. Under these premises, in this work, we introduce MSC-VO, an RGB-D -based visual odometry approach that combines both point and line features and leverages, if exist, those structural regularities and the Manhattan axes of the scene. Within our approach, these structural constraints are initially used to estimate accurately the 3D position of the extracted lines. These constraints are also combined next with the estimated Manhattan axes and the reprojection errors of points and lines to refine the camera pose by means of local map optimization. Such a combination enables our approach to operate even in the absence of the aforementioned constraints, allowing the method to work for a wider variety of scenarios. Furthermore, we propose a novel multi-view Manhattan axes estimation procedure that mainly relies on line features. MSC-VO is assessed using several public datasets, outperforming other state-of-the-art solutions, and comparing favourably even with some SLAM methods.
△ Less
Submitted 5 November, 2021;
originally announced November 2021.
-
LiODOM: Adaptive Local Mapping for Robust LiDAR-Only Odometry
Authors:
Emilio Garcia-Fidalgo,
Joan P. Company-Corcoles,
Francisco Bonnin-Pascual,
Alberto Ortiz
Abstract:
In the last decades, Light Detection And Ranging (LiDAR) technology has been extensively explored as a robust alternative for self-localization and mapping. These approaches typically state ego-motion estimation as a non-linear optimization problem dependent on the correspondences established between the current point cloud and a map, whatever its scope, local or global. This paper proposes LiODOM…
▽ More
In the last decades, Light Detection And Ranging (LiDAR) technology has been extensively explored as a robust alternative for self-localization and mapping. These approaches typically state ego-motion estimation as a non-linear optimization problem dependent on the correspondences established between the current point cloud and a map, whatever its scope, local or global. This paper proposes LiODOM, a novel LiDAR-only ODOmetry and Mapping approach for pose estimation and map-building, based on minimizing a loss function derived from a set of weighted point-to-line correspondences with a local map abstracted from the set of available point clouds. Furthermore, this work places a particular emphasis on map representation given its relevance for quick data association. To efficiently represent the environment, we propose a data structure that combined with a hashing scheme allows for fast access to any section of the map. LiODOM is validated by means of a set of experiments on public datasets, for which it compares favourably against other solutions. Its performance on-board an aerial platform is also reported.
△ Less
Submitted 27 July, 2022; v1 submitted 5 November, 2021;
originally announced November 2021.
-
Survey Propagation: A Resource Allocation Solution for Large Wireless Networks
Authors:
Andrea Ortiz,
Daniel Barragan-Yani
Abstract:
The ever-increasing number of nodes in current and future wireless communication networks brings unprecedented challenges for the allocation of the available communication resources. This is caused by the combinatorial nature of the resource allocation problems, which limits the performance of state-of-the-art techniques when the network size increases. In this paper, we take a new direction and i…
▽ More
The ever-increasing number of nodes in current and future wireless communication networks brings unprecedented challenges for the allocation of the available communication resources. This is caused by the combinatorial nature of the resource allocation problems, which limits the performance of state-of-the-art techniques when the network size increases. In this paper, we take a new direction and investigate how methods from statistical physics can be used to address resource allocation problems in large networks. To this aim, we propose a novel model of the wireless network based on a type of disordered physical systems called spin glasses. We show that resource allocation problems have the same structure as the problem of finding specific configurations in spin glasses. Based on this parallel, we investigate the use of the Survey Propagation method from statistical physics in the solution of resource allocation problems in wireless networks. Through numerical simulations we show that the proposed statistical-physics-based resource allocation algorithm is a promising tool for the efficient allocation of communication resources in large wireless communications networks. Given a fixed number of resources, we are able to serve a larger number of nodes, compared to state-of-the-art reference schemes, without introducing more interference into the system
△ Less
Submitted 6 January, 2022; v1 submitted 20 October, 2021;
originally announced October 2021.
-
Big Data Information and Nowcasting: Consumption and Investment from Bank Transactions in Turkey
Authors:
Ali B. Barlas,
Seda Guler Mert,
Berk Orkun Isa,
Alvaro Ortiz,
Tomasa Rodrigo,
Baris Soybilgen,
Ege Yazgan
Abstract:
We use the aggregate information from individual-to-firm and firm-to-firm in Garanti BBVA Bank transactions to mimic domestic private demand. Particularly, we replicate the quarterly national accounts aggregate consumption and investment (gross fixed capital formation) and its bigger components (Machinery and Equipment and Construction) in real time for the case of Turkey. In order to validate the…
▽ More
We use the aggregate information from individual-to-firm and firm-to-firm in Garanti BBVA Bank transactions to mimic domestic private demand. Particularly, we replicate the quarterly national accounts aggregate consumption and investment (gross fixed capital formation) and its bigger components (Machinery and Equipment and Construction) in real time for the case of Turkey. In order to validate the usefulness of the information derived from these indicators we test the nowcasting ability of both indicators to nowcast the Turkish GDP using different nowcasting models. The results are successful and confirm the usefulness of Consumption and Investment Banking transactions for nowcasting purposes. The value of the Big data information is more relevant at the beginning of the nowcasting process, when the traditional hard data information is scarce. This makes this information specially relevant for those countries where statistical release lags are longer like the Emerging Markets.
△ Less
Submitted 5 July, 2021;
originally announced July 2021.
-
Detecting Cattle and Elk in the Wild from Space
Authors:
Caleb Robinson,
Anthony Ortiz,
Lacey Hughey,
Jared A. Stabach,
Juan M. Lavista Ferres
Abstract:
Localizing and counting large ungulates -- hoofed mammals like cows and elk -- in very high-resolution satellite imagery is an important task for supporting ecological studies. Prior work has shown that this is feasible with deep learning based methods and sub-meter multi-spectral satellite imagery. We extend this line of work by proposing a baseline method, CowNet, that simultaneously estimates t…
▽ More
Localizing and counting large ungulates -- hoofed mammals like cows and elk -- in very high-resolution satellite imagery is an important task for supporting ecological studies. Prior work has shown that this is feasible with deep learning based methods and sub-meter multi-spectral satellite imagery. We extend this line of work by proposing a baseline method, CowNet, that simultaneously estimates the number of animals in an image (counts), as well as predicts their location at a pixel level (localizes). We also propose an methodology for evaluating such models on counting and localization tasks across large scenes that takes the uncertainty of noisy labels and the information needed by stakeholders in ecological monitoring tasks into account. Finally, we benchmark our baseline method with state of the art vision methods for counting objects in scenes. We specifically test the temporal generalization of the resulting models over a large landscape in Point Reyes Seashore, CA. We find that the LC-FCN model performs the best and achieves an average precision between 0.56 and 0.61 and an average recall between 0.78 and 0.92 over three held out test scenes.
△ Less
Submitted 29 June, 2021;
originally announced June 2021.
-
Tiled sparse coding in eigenspaces for the COVID-19 diagnosis in chest X-ray images
Authors:
Juan E. Arco,
Andrés Ortiz,
Javier Ramírez,
Juan M Gorriz
Abstract:
The ongoing crisis of the COVID-19 (Coronavirus disease 2019) pandemic has changed the world. According to the World Health Organization (WHO), 4 million people have died due to this disease, whereas there have been more than 180 million confirmed cases of COVID-19. The collapse of the health system in many countries has demonstrated the need of developing tools to automatize the diagnosis of the…
▽ More
The ongoing crisis of the COVID-19 (Coronavirus disease 2019) pandemic has changed the world. According to the World Health Organization (WHO), 4 million people have died due to this disease, whereas there have been more than 180 million confirmed cases of COVID-19. The collapse of the health system in many countries has demonstrated the need of developing tools to automatize the diagnosis of the disease from medical imaging. Previous studies have used deep learning for this purpose. However, the performance of this alternative highly depends on the size of the dataset employed for training the algorithm. In this work, we propose a classification framework based on sparse coding in order to identify the pneumonia patterns associated with different pathologies. Specifically, each chest X-ray (CXR) image is partitioned into different tiles. The most relevant features extracted from PCA are then used to build the dictionary within the sparse coding procedure. Once images are transformed and reconstructed from the elements of the dictionary, classification is performed from the reconstruction errors of individual patches associated with each image. Performance is evaluated in a real scenario where simultaneously differentiation between four different pathologies: control vs bacterial pneumonia vs viral pneumonia vs COVID-19. The accuracy when identifying the presence of pneumonia is 93.85%, whereas 88.11% is obtained in the 4-class classification context. The excellent results and the pioneering use of sparse coding in this scenario evidence the applicability of this approach as an aid for clinicians in a real-world environment.
△ Less
Submitted 28 June, 2021;
originally announced June 2021.
-
Becoming Good at AI for Good
Authors:
Meghana Kshirsagar,
Caleb Robinson,
Siyu Yang,
Shahrzad Gholami,
Ivan Klyuzhin,
Sumit Mukherjee,
Md Nasir,
Anthony Ortiz,
Felipe Oviedo,
Darren Tanner,
Anusua Trivedi,
Yixi Xu,
Ming Zhong,
Bistra Dilkina,
Rahul Dodhia,
Juan M. Lavista Ferres
Abstract:
AI for good (AI4G) projects involve developing and applying artificial intelligence (AI) based solutions to further goals in areas such as sustainability, health, humanitarian aid, and social justice. Developing and deploying such solutions must be done in collaboration with partners who are experts in the domain in question and who already have experience in making progress towards such goals. Ba…
▽ More
AI for good (AI4G) projects involve developing and applying artificial intelligence (AI) based solutions to further goals in areas such as sustainability, health, humanitarian aid, and social justice. Developing and deploying such solutions must be done in collaboration with partners who are experts in the domain in question and who already have experience in making progress towards such goals. Based on our experiences, we detail the different aspects of this type of collaboration broken down into four high-level categories: communication, data, modeling, and impact, and distill eleven takeaways to guide such projects in the future. We briefly describe two case studies to illustrate how some of these takeaways were applied in practice during our past collaborations.
△ Less
Submitted 3 May, 2021; v1 submitted 23 April, 2021;
originally announced April 2021.
-
Temporal EigenPAC for dyslexia diagnosis
Authors:
Nicolás Gallego-Molina,
Marco Formoso,
Andrés Ortiz,
Francisco J. Martínez-Murcia,
Juan L. Luque
Abstract:
Electroencephalography signals allow to explore the functional activity of the brain cortex in a non-invasive way. However, the analysis of these signals is not straightforward due to the presence of different artifacts and the very low signal-to-noise ratio. Cross-Frequency Coupling (CFC) methods provide a way to extract information from EEG, related to the synchronization among frequency bands.…
▽ More
Electroencephalography signals allow to explore the functional activity of the brain cortex in a non-invasive way. However, the analysis of these signals is not straightforward due to the presence of different artifacts and the very low signal-to-noise ratio. Cross-Frequency Coupling (CFC) methods provide a way to extract information from EEG, related to the synchronization among frequency bands. However, CFC methods are usually applied in a local way, computing the interaction between phase and amplitude at the same electrode. In this work we show a method to compute PAC features among electrodes to study the functional connectivity. Moreover, this has been applied jointly with Principal Component Analysis to explore patterns related to Dyslexia in 7-years-old children. The developed methodology reveals the temporal evolution of PAC-based connectivity. Directions of greatest variance computed by PCA are called eigenPACs here, since they resemble the classical \textit{eigenfaces} representation. The projection of PAC data onto the eigenPACs provide a set of features that has demonstrates their discriminative capability, specifically in the Beta-Gamma bands.
△ Less
Submitted 13 April, 2021;
originally announced April 2021.
-
Modelling Brain Connectivity Networks by Graph Embedding for Dyslexia Diagnosis
Authors:
Marco A. Formoso,
Andrés Ortiz,
Francisco J. Martínez-Murcia,
Nicolás Gallego-Molina,
Juan L. Luque
Abstract:
Several methods have been developed to extract information from electroencephalograms (EEG). One of them is Phase-Amplitude Coupling (PAC) which is a type of Cross-Frequency Coupling (CFC) method, consisting in measure the synchronization of phase and amplitude for the different EEG bands and electrodes. This provides information regarding brain areas that are synchronously activated, and eventual…
▽ More
Several methods have been developed to extract information from electroencephalograms (EEG). One of them is Phase-Amplitude Coupling (PAC) which is a type of Cross-Frequency Coupling (CFC) method, consisting in measure the synchronization of phase and amplitude for the different EEG bands and electrodes. This provides information regarding brain areas that are synchronously activated, and eventually, a marker of functional connectivity between these areas. In this work, intra and inter electrode PAC is computed obtaining the relationship among different electrodes used in EEG. The connectivity information is then treated as a graph in which the different nodes are the electrodes and the edges PAC values between them. These structures are embedded to create a feature vector that can be further used to classify multichannel EEG samples. The proposed method has been applied to classified EEG samples acquired using specific auditory stimuli in a task designed for dyslexia disorder diagnosis in seven years old children EEG's. The proposed method provides AUC values up to 0.73 and allows selecting the most discriminant electrodes and EEG bands.
△ Less
Submitted 12 April, 2021;
originally announced April 2021.
-
Temporal Cluster Matching for Change Detection of Structures from Satellite Imagery
Authors:
Caleb Robinson,
Anthony Ortiz,
Juan M. Lavista Ferres,
Brandon Anderson,
Daniel E. Ho
Abstract:
Longitudinal studies are vital to understanding dynamic changes of the planet, but labels (e.g., buildings, facilities, roads) are often available only for a single point in time. We propose a general model, Temporal Cluster Matching (TCM), for detecting building changes in time series of remotely sensed imagery when footprint labels are observed only once. The intuition behind the model is that t…
▽ More
Longitudinal studies are vital to understanding dynamic changes of the planet, but labels (e.g., buildings, facilities, roads) are often available only for a single point in time. We propose a general model, Temporal Cluster Matching (TCM), for detecting building changes in time series of remotely sensed imagery when footprint labels are observed only once. The intuition behind the model is that the relationship between spectral values inside and outside of building's footprint will change when a building is constructed (or demolished). For instance, in rural settings, the pre-construction area may look similar to the surrounding environment until the building is constructed. Similarly, in urban settings, the pre-construction areas will look different from the surrounding environment until construction. We further propose a heuristic method for selecting the parameters of our model which allows it to be applied in novel settings without requiring data labeling efforts (to fit the parameters). We apply our model over a dataset of poultry barns from 2016/2017 high-resolution aerial imagery in the Delmarva Peninsula and a dataset of solar farms from a 2020 mosaic of Sentinel 2 imagery in India. Our results show that our model performs as well when fit using the proposed heuristic as it does when fit with labeled data, and further, that supervised versions of our model perform the best among all the baselines we test against. Finally, we show that our proposed approach can act as an effective data augmentation strategy -- it enables researchers to augment existing structure footprint labels along the time dimension and thus use imagery from multiple points in time to train deep learning models. We show that this improves the spatial generalization of such models when evaluated on the same change detection task.
△ Less
Submitted 29 June, 2021; v1 submitted 17 March, 2021;
originally announced March 2021.
-
Probabilistic combination of eigenlungs-based classifiers for COVID-19 diagnosis in chest CT images
Authors:
Juan E. Arco,
Andrés Ortiz,
Javier Ramírez,
Francisco J. Martínez-Murcia,
Yu-Dong Zhang,
Jordi Broncano,
M. Álvaro Berbís,
Javier Royuela-del-Val,
Antonio Luna,
Juan M. Górriz
Abstract:
The outbreak of the COVID-19 (Coronavirus disease 2019) pandemic has changed the world. According to the World Health Organization (WHO), there have been more than 100 million confirmed cases of COVID-19, including more than 2.4 million deaths. It is extremely important the early detection of the disease, and the use of medical imaging such as chest X-ray (CXR) and chest Computed Tomography (CCT)…
▽ More
The outbreak of the COVID-19 (Coronavirus disease 2019) pandemic has changed the world. According to the World Health Organization (WHO), there have been more than 100 million confirmed cases of COVID-19, including more than 2.4 million deaths. It is extremely important the early detection of the disease, and the use of medical imaging such as chest X-ray (CXR) and chest Computed Tomography (CCT) have proved to be an excellent solution. However, this process requires clinicians to do it within a manual and time-consuming task, which is not ideal when trying to speed up the diagnosis. In this work, we propose an ensemble classifier based on probabilistic Support Vector Machine (SVM) in order to identify pneumonia patterns while providing information about the reliability of the classification. Specifically, each CCT scan is divided into cubic patches and features contained in each one of them are extracted by applying kernel PCA. The use of base classifiers within an ensemble allows our system to identify the pneumonia patterns regardless of their size or location. Decisions of each individual patch are then combined into a global one according to the reliability of each individual classification: the lower the uncertainty, the higher the contribution. Performance is evaluated in a real scenario, yielding an accuracy of 97.86%. The large performance obtained and the simplicity of the system (use of deep learning in CCT images would result in a huge computational cost) evidence the applicability of our proposal in a real-world environment.
△ Less
Submitted 26 September, 2022; v1 submitted 4 March, 2021;
originally announced March 2021.
-
A DCNN-based Arbitrarily-Oriented Object Detector for Quality Control and Inspection Application
Authors:
Kai Yao,
Alberto Ortiz,
Francisco Bonnin-Pascual
Abstract:
Following the success of machine vision systems for on-line automated quality control and inspection processes, an object recognition solution is presented in this work for two different specific applications, i.e., the detection of quality control items in surgery toolboxes prepared for sterilizing in a hospital, as well as the detection of defects in vessel hulls to prevent potential structural…
▽ More
Following the success of machine vision systems for on-line automated quality control and inspection processes, an object recognition solution is presented in this work for two different specific applications, i.e., the detection of quality control items in surgery toolboxes prepared for sterilizing in a hospital, as well as the detection of defects in vessel hulls to prevent potential structural failures. The solution has two stages. First, a feature pyramid architecture based on Single Shot MultiBox Detector (SSD) is used to improve the detection performance, and a statistical analysis based on ground truth is employed to select parameters of a range of default boxes. Second, a lightweight neural network is exploited to achieve oriented detection results using a regression method. The first stage of the proposed method is capable of detecting the small targets considered in the two scenarios. In the second stage, despite the simplicity, it is efficient to detect elongated targets while maintaining high running efficiency.
△ Less
Submitted 30 June, 2022; v1 submitted 18 January, 2021;
originally announced January 2021.
-
Reducing bias and increasing utility by federated generative modeling of medical images using a centralized adversary
Authors:
Jean-Francois Rajotte,
Sumit Mukherjee,
Caleb Robinson,
Anthony Ortiz,
Christopher West,
Juan Lavista Ferres,
Raymond T Ng
Abstract:
We introduce FELICIA (FEderated LearnIng with a CentralIzed Adversary) a generative mechanism enabling collaborative learning. In particular, we show how a data owner with limited and biased data could benefit from other data owners while keeping data from all the sources private. This is a common scenario in medical image analysis where privacy legislation prevents data from being shared outside…
▽ More
We introduce FELICIA (FEderated LearnIng with a CentralIzed Adversary) a generative mechanism enabling collaborative learning. In particular, we show how a data owner with limited and biased data could benefit from other data owners while keeping data from all the sources private. This is a common scenario in medical image analysis where privacy legislation prevents data from being shared outside local premises. FELICIA works for a large family of Generative Adversarial Networks (GAN) architectures including vanilla and conditional GANs as demonstrated in this work. We show that by using the FELICIA mechanism, a data owner with limited image samples can generate high-quality synthetic images with high utility while neither data owners has to provide access to its data. The sharing happens solely through a central discriminator that has access limited to synthetic data. Here, utility is defined as classification performance on a real test set. We demonstrate these benefits on several realistic healthcare scenarions using benchmark image datasets (MNIST, CIFAR-10) as well as on medical images for the task of skin lesion classification. With multiple experiments, we show that even in the worst cases, combining FELICIA with real data gracefully achieves performance on par with real data while most results significantly improves the utility.
△ Less
Submitted 28 August, 2021; v1 submitted 18 January, 2021;
originally announced January 2021.
-
Machine Learning for Glacier Monitoring in the Hindu Kush Himalaya
Authors:
Shimaa Baraka,
Benjamin Akera,
Bibek Aryal,
Tenzing Sherpa,
Finu Shresta,
Anthony Ortiz,
Kris Sankaran,
Juan Lavista Ferres,
Mir Matin,
Yoshua Bengio
Abstract:
Glacier mapping is key to ecological monitoring in the hkh region. Climate change poses a risk to individuals whose livelihoods depend on the health of glacier ecosystems. In this work, we present a machine learning based approach to support ecological monitoring, with a focus on glaciers. Our approach is based on semi-automated mapping from satellite images. We utilize readily available remote se…
▽ More
Glacier mapping is key to ecological monitoring in the hkh region. Climate change poses a risk to individuals whose livelihoods depend on the health of glacier ecosystems. In this work, we present a machine learning based approach to support ecological monitoring, with a focus on glaciers. Our approach is based on semi-automated mapping from satellite images. We utilize readily available remote sensing data to create a model to identify and outline both clean ice and debris-covered glaciers from satellite imagery. We also release data and develop a web tool that allows experts to visualize and correct model predictions, with the ultimate aim of accelerating the glacier mapping process.
△ Less
Submitted 9 December, 2020;
originally announced December 2020.
-
Uncertainty-driven ensembles of deep architectures for multiclass classification. Application to COVID-19 diagnosis in chest X-ray images
Authors:
Juan E. Arco,
A. Ortiz,
J. Ramirez,
F. J. Martinez-Murcia,
Yu-Dong Zhang,
Juan M. Gorriz
Abstract:
Respiratory diseases kill million of people each year. Diagnosis of these pathologies is a manual, time-consuming process that has inter and intra-observer variability, delaying diagnosis and treatment. The recent COVID-19 pandemic has demonstrated the need of developing systems to automatize the diagnosis of pneumonia, whilst Convolutional Neural Network (CNNs) have proved to be an excellent opti…
▽ More
Respiratory diseases kill million of people each year. Diagnosis of these pathologies is a manual, time-consuming process that has inter and intra-observer variability, delaying diagnosis and treatment. The recent COVID-19 pandemic has demonstrated the need of developing systems to automatize the diagnosis of pneumonia, whilst Convolutional Neural Network (CNNs) have proved to be an excellent option for the automatic classification of medical images. However, given the need of providing a confidence classification in this context it is crucial to quantify the reliability of the model's predictions. In this work, we propose a multi-level ensemble classification system based on a Bayesian Deep Learning approach in order to maximize performance while quantifying the uncertainty of each classification decision. This tool combines the information extracted from different architectures by weighting their results according to the uncertainty of their predictions. Performance of the Bayesian network is evaluated in a real scenario where simultaneously differentiating between four different pathologies: control vs bacterial pneumonia vs viral pneumonia vs COVID-19 pneumonia. A three-level decision tree is employed to divide the 4-class classification into three binary classifications, yielding an accuracy of 98.06% and overcoming the results obtained by recent literature. The reduced preprocessing needed for obtaining this high performance, in addition to the information provided about the reliability of the predictions evidence the applicability of the system to be used as an aid for clinicians.
△ Less
Submitted 27 November, 2020;
originally announced November 2020.
-
Dyslexia detection from EEG signals using SSA component correlation and Convolutional Neural Networks
Authors:
Andrés Ortiz,
Francisco J. Martinez-Murcia,
Marco A. Formoso,
Juan Luis Luque,
Auxiliadora Sánchez
Abstract:
Objective dyslexia diagnosis is not a straighforward task since it is traditionally performed by means of the intepretation of different behavioural tests. Moreover, these tests are only applicable to readers. This way, early diagnosis requires the use of specific tasks not only related to reading. Thus, the use of Electroencephalography (EEG) constitutes an alternative for an objective and early…
▽ More
Objective dyslexia diagnosis is not a straighforward task since it is traditionally performed by means of the intepretation of different behavioural tests. Moreover, these tests are only applicable to readers. This way, early diagnosis requires the use of specific tasks not only related to reading. Thus, the use of Electroencephalography (EEG) constitutes an alternative for an objective and early diagnosis that can be used with pre-readers. In this way, the extraction of relevant features in EEG signals results crucial for classification. However, the identification of the most relevant features is not straighforward, and predefined statistics in the time or frequency domain are not always discriminant enough. On the other hand, classical processing of EEG signals based on extracting EEG bands frequency descriptors, usually make some assumptions on the raw signals that could cause indormation loosing. In this work we propose an alternative for analysis in the frequency domain based on Singluar Spectrum Analysis (SSA) to split the raw signal into components representing different oscillatory modes. Moreover, correlation matrices obtained for each component among EEG channels are classfied using a Convolutional Neural network.
△ Less
Submitted 26 October, 2020;
originally announced October 2020.
-
A Weakly-Supervised Semantic Segmentation Approach based on the Centroid Loss: Application to Quality Control and Inspection
Authors:
Kai Yao,
Alberto Ortiz,
Francisco Bonnin-Pascual
Abstract:
It is generally accepted that one of the critical parts of current vision algorithms based on deep learning and convolutional neural networks is the annotation of a sufficient number of images to achieve competitive performance. This is particularly difficult for semantic segmentation tasks since the annotation must be ideally generated at the pixel level. Weakly-supervised semantic segmentation a…
▽ More
It is generally accepted that one of the critical parts of current vision algorithms based on deep learning and convolutional neural networks is the annotation of a sufficient number of images to achieve competitive performance. This is particularly difficult for semantic segmentation tasks since the annotation must be ideally generated at the pixel level. Weakly-supervised semantic segmentation aims at reducing this cost by employing simpler annotations that, hence, are easier, cheaper and quicker to produce. In this paper, we propose and assess a new weakly-supervised semantic segmentation approach making use of a novel loss function whose goal is to counteract the effects of weak annotations. To this end, this loss function comprises several terms based on partial cross-entropy losses, being one of them the Centroid Loss. This term induces a clustering of the image pixels in the object classes under consideration, whose aim is to improve the training of the segmentation network by guiding the optimization. The performance of the approach is evaluated against datasets from two different industry-related case studies: while one involves the detection of instances of a number of different object classes in the context of a quality control application, the other stems from the visual inspection domain and deals with the localization of images areas whose pixels correspond to scene surface points affected by a specific sort of defect. The detection results that are reported for both cases show that, despite the differences among them and the particular challenges, the use of weak annotations do not prevent from achieving a competitive performance level for both.
△ Less
Submitted 4 March, 2021; v1 submitted 26 October, 2020;
originally announced October 2020.
-
LiPo-LCD: Combining Lines and Points for Appearance-based Loop Closure Detection
Authors:
Joan P. Company-Corcoles,
Emilio Garcia-Fidalgo,
Alberto Ortiz
Abstract:
Visual SLAM approaches typically depend on loop closure detection to correct the inconsistencies that may arise during the map and camera trajectory calculations, typically making use of point features for detecting and closing the existing loops. In low-textured scenarios, however, it is difficult to find enough point features and, hence, the performance of these solutions drops drastically. An a…
▽ More
Visual SLAM approaches typically depend on loop closure detection to correct the inconsistencies that may arise during the map and camera trajectory calculations, typically making use of point features for detecting and closing the existing loops. In low-textured scenarios, however, it is difficult to find enough point features and, hence, the performance of these solutions drops drastically. An alternative for human-made scenarios, due to their structural regularity, is the use of geometrical cues such as straight segments, frequently present within these environments. Under this context, in this paper we introduce LiPo-LCD, a novel appearance-based loop closure detection method that integrates lines and points. Adopting the idea of incremental Bag-of-Binary-Words schemes, we build separate BoW models for each feature, and use them to retrieve previously seen images using a late fusion strategy. Additionally, a simple but effective mechanism, based on the concept of island, groups similar images close in time to reduce the image candidate search effort. A final step validates geometrically the loop candidates by incorporating the detected lines by means of a process comprising a line feature matching stage, followed by a robust spatial verification stage, now combining both lines and points. As it is reported in the paper, LiPo-LCD compares well with several state-of-the-art solutions for a number of datasets involving different environmental conditions.
△ Less
Submitted 3 September, 2020;
originally announced September 2020.
-
Evaluation of a Skill-based Control Architecture for a Visual Inspection-oriented Aerial Platform
Authors:
Emilio Garcia-Fidalgo,
Francisco Bonnin-Pascual,
Joan P. Company-Corcoles,
Alberto Ortiz
Abstract:
The periodic inspection of vessels is a fundamental task to ensure their integrity and avoid maritime accidents. Currently, these inspections represent a high cost for the ship owner, in addition to the danger that this kind of hostile environment entails for the surveyors. In these situations, robotic platforms turn out to be useful not only for safety reasons, but also to reduce vessel downtimes…
▽ More
The periodic inspection of vessels is a fundamental task to ensure their integrity and avoid maritime accidents. Currently, these inspections represent a high cost for the ship owner, in addition to the danger that this kind of hostile environment entails for the surveyors. In these situations, robotic platforms turn out to be useful not only for safety reasons, but also to reduce vessel downtimes and simplify the inspection procedures. Under this context, in this paper we report on the evaluation of a new control architecture devised to drive an aerial platform during these inspection procedures. The control architecture, based on an extensive use of behaviour-based high-level control, implements visual inspection-oriented functionalities, while releases the operator from the complexities of inspection flights and ensures the integrity of the platform. Apart from the control software, the full system comprises a multi-rotor platform equipped with a suitable set of sensors to permit teleporting the surveyor to the areas that need inspection. The paper provides an extensive set of testing results in different scenarios, under different operational conditions and over real vessels, in order to demonstrate the suitability of the platform for this kind of tasks.
△ Less
Submitted 3 September, 2020;
originally announced September 2020.
-
Mining self-similarity: Label super-resolution with epitomic representations
Authors:
Nikolay Malkin,
Anthony Ortiz,
Caleb Robinson,
Nebojsa Jojic
Abstract:
We show that simple patch-based models, such as epitomes, can have superior performance to the current state of the art in semantic segmentation and label super-resolution, which uses deep convolutional neural networks. We derive a new training algorithm for epitomes which allows, for the first time, learning from very large data sets and derive a label super-resolution algorithm as a statistical…
▽ More
We show that simple patch-based models, such as epitomes, can have superior performance to the current state of the art in semantic segmentation and label super-resolution, which uses deep convolutional neural networks. We derive a new training algorithm for epitomes which allows, for the first time, learning from very large data sets and derive a label super-resolution algorithm as a statistical inference algorithm over epitomic representations. We illustrate our methods on land cover mapping and medical image analysis tasks.
△ Less
Submitted 13 December, 2021; v1 submitted 23 April, 2020;
originally announced April 2020.
-
A Neural Approach to Ordinal Regression for the Preventive Assessment of Developmental Dyslexia
Authors:
F. J. Martinez-Murcia,
A. Ortiz,
Marco A. Formoso,
M. Lopez-Zamora,
J. L. Luque,
A. Giménez
Abstract:
Developmental Dyslexia (DD) is a learning disability related to the acquisition of reading skills that affects about 5% of the population. DD can have an enormous impact on the intellectual and personal development of affected children, so early detection is key to implementing preventive strategies for teaching language. Research has shown that there may be biological underpinnings to DD that aff…
▽ More
Developmental Dyslexia (DD) is a learning disability related to the acquisition of reading skills that affects about 5% of the population. DD can have an enormous impact on the intellectual and personal development of affected children, so early detection is key to implementing preventive strategies for teaching language. Research has shown that there may be biological underpinnings to DD that affect phoneme processing, and hence these symptoms may be identifiable before reading ability is acquired, allowing for early intervention. In this paper we propose a new methodology to assess the risk of DD before students learn to read. For this purpose, we propose a mixed neural model that calculates risk levels of dyslexia from tests that can be completed at the age of 5 years. Our method first trains an auto-encoder, and then combines the trained encoder with an optimized ordinal regression neural network devised to ensure consistency of predictions. Our experiments show that the system is able to detect unaffected subjects two years before it can assess the risk of DD based mainly on phonological processing, giving a specificity of 0.969 and a correct rate of more than 0.92. In addition, the trained encoder can be used to transform test results into an interpretable subject spatial distribution that facilitates risk assessment and validates methodology.
△ Less
Submitted 20 October, 2020; v1 submitted 6 February, 2020;
originally announced February 2020.