
ViewDelta: Text-Prompted Change Detection in Unaligned Images

Authors: Subin Varghese, Joshua Gao, Vedhus Hoskere

Abstract: Detecting changes between images is fundamental in applications such as infrastructure assessment, environmental monitoring, and industrial automation. Existing supervised models demonstrate strong performance but are inherently limited by the scope of their training data, requiring retraining to recognize novel changes. To overcome this limitation, we introduce a novel change detection task utilizing textual prompts alongside two potentially unaligned images to produce binary segmentations highlighting user-relevant changes. This text-conditioned framework significantly broadens the scope of change detection, enabling unparalleled flexibility and straightforward scalability by incorporating diverse future datasets without restriction to specific change types. As a first approach to address this challenge, we propose ViewDelta, a multimodal architecture extending the vision transformer into the domain of text-conditioned change detection. ViewDelta establishes a robust baseline, demonstrating flexibility across various scenarios and achieving competitive results compared to specialized, fine-tuned models trained on aligned images. Moreover, we create and release the first text-prompt-conditioned change detection dataset, comprising 501,153 image pairs with corresponding textual prompts and annotated labels. Extensive experiments confirm the robustness and versatility of our model across diverse environments, including indoor, outdoor, street-level, synthetic, and satellite imagery. https://joshuakgao.github.io/viewdelta/

Submitted 18 March, 2025; v1 submitted 10 December, 2024; originally announced December 2024.

arXiv:2412.04664

Multiclass Post-Earthquake Building Assessment Integrating Optical and SAR Satellite Imagery, Ground Motion, and Soil Data with Transformers

Authors: Deepank Singh, Vedhus Hoskere, Pietro Milillo

Abstract: Timely and accurate assessments of building damage are crucial for effective response and recovery in the aftermath of earthquakes. Conventional preliminary damage assessments (PDA) often rely on manual door-to-door inspections, which are not only time-consuming but also pose significant safety risks. To safely expedite the PDA process, researchers have studied the applicability of satellite imagery processed with heuristic and machine learning approaches. These approaches output binary or, more recently, multiclass damage states at the scale of a block or a single building. However, the current performance of such approaches limits practical applicability. To address this limitation, we introduce a metadata-enriched, transformer-based framework that combines high-resolution post-earthquake satellite imagery with building-specific metadata relevant to the seismic performance of the structure. Our model achieves state-of-the-art performance in multiclass post-earthquake damage identification for buildings from the Turkey-Syria earthquake on February 6, 2023. Specifically, we demonstrate that incorporating metadata, such as seismic intensity indicators, soil properties, and SAR damage proxy maps, not only enhances the model's accuracy and ability to distinguish between damage classes, but also improves its generalizability across various regions. Furthermore, we conducted a detailed, class-wise analysis of feature importance to understand the model's decision-making across different levels of building damage. This analysis reveals how individual metadata features uniquely contribute to predictions for each damage class. By leveraging both satellite imagery and metadata, our proposed framework enables faster and more accurate damage assessments for precise, multiclass, building-level evaluations that can improve disaster response and accelerate recovery efforts for affected communities.

Submitted 26 February, 2025; v1 submitted 5 December, 2024; originally announced December 2024.

Comments: 28 Pages, 12 Figures

arXiv:2409.16381

Instance Segmentation of Reinforced Concrete Bridges with Synthetic Point Clouds

Authors: Asad Ur Rahman, Vedhus Hoskere

Abstract: The National Bridge Inspection Standards require detailed element-level bridge inspections. Traditionally, inspectors manually assign condition ratings to structural components based on observed damage, but this process is labor-intensive and time-consuming. Automating the element-level bridge inspection process can facilitate more comprehensive condition documentation to improve overall bridge management. While semantic segmentation of bridge point clouds has been studied, research on instance segmentation of bridge elements is limited, partly due to the lack of annotated datasets and the difficulty in generalizing trained models. To address this, we propose a novel approach for generating synthetic data using three distinct methods. Our framework leverages the Mask3D transformer model, optimized with hyperparameter tuning and a novel occlusion technique. The model achieves state-of-the-art performance on real LiDAR and photogrammetry bridge point clouds, demonstrating the potential of the framework for automating element-level bridge inspections.

Submitted 24 September, 2024; originally announced September 2024.

Comments: 33 pages, 12 figures, Submitted to "Automation in Construction"

arXiv:2407.16874

Vision-Based Adaptive Robotics for Autonomous Surface Crack Repair

Authors: Joshua Genova, Eric Cabrera, Vedhus Hoskere

Abstract: Surface cracks in infrastructure can lead to significant deterioration and costly maintenance if not efficiently repaired. Manual repair methods are labor-intensive, time-consuming, and imprecise, and thus difficult to scale to large areas. While advances in robotic perception and manipulation have furthered autonomous crack repair, existing methods still face three key challenges: (i) accurate localization of cracks within the robot's coordinate frame, (ii) adaptability to varying crack depths and widths, and (iii) validation of the repair process under realistic conditions. This paper presents an adaptive, autonomous system for surface crack detection and repair using robotics with advanced sensing technologies to enhance precision and safety for humans. The system uses an RGB-D camera for crack detection, a laser scanner for precise measurement, and an extruder and pump for material deposition. To address the first challenge, the laser scanner is used to refine the crack coordinates for accurate localization. Furthermore, our approach demonstrates that an adaptive crack-filling method is more efficient and effective than a fixed-speed approach, with experimental results confirming both precision and consistency. In addition, to ensure real-world applicability and testing repeatability, we introduce a novel validation procedure using 3D-printed crack specimens that accurately simulate real-world conditions. This research contributes to the evolving field of human-robot interaction in construction by demonstrating how adaptive robotic systems can reduce the need for manual labor, improve safety, and enhance the efficiency of maintenance operations, ultimately paving the way for more sophisticated and integrated construction robotics.

Submitted 16 October, 2024; v1 submitted 23 July, 2024; originally announced July 2024.

Comments: 22 pages, 14 figures, submitted to Advanced Engineering Informatics

arXiv:2406.18012

View-Invariant Pixelwise Anomaly Detection in Multi-object Scenes with Adaptive View Synthesis

Authors: Subin Varghese, Vedhus Hoskere

Abstract: The built environment, encompassing critical infrastructure such as bridges and buildings, requires diligent monitoring of unexpected anomalies or deviations from a normal state in captured imagery. Anomaly detection methods could aid in automating this task; however, deploying anomaly detection effectively in such environments presents significant challenges that have not been evaluated before. These challenges include varying camera viewpoints, the presence of multiple objects within a scene, and the absence of labeled anomaly data for training. To address these comprehensively, we introduce and formalize Scene Anomaly Detection (Scene AD) as the task of unsupervised, pixel-wise anomaly localization under these specific real-world conditions. Evaluating progress in Scene AD required the development of ToyCity, the first multi-object, multi-view real-image dataset for unsupervised anomaly detection. Our initial evaluations using ToyCity revealed that established anomaly detection baselines struggle to achieve robust pixel-level localization. To address this, two data augmentation strategies were created to generate additional synthetic images of non-anomalous regions to enhance generalizability. However, the addition of these synthetic images alone provided only minor improvements. Thus, OmniAD, a refinement of the Reverse Distillation methodology, was created to establish a stronger baseline. Our experiments demonstrate that OmniAD, when used with augmented views, yields a 64.33% increase in pixel-wise F1 score over Reverse Distillation with no augmentation. Collectively, this work offers the Scene AD task definition, the ToyCity benchmark, the view synthesis augmentation approaches, and the OmniAD method. Project Page: https://drags99.github.io/OmniAD/
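
The headline result is stated as a relative increase in pixel-wise F1. As a minimal sketch, assuming binary ground-truth and predicted anomaly masks of equal shape (the abstract does not specify thresholding or how scores are aggregated across images), pixel-wise F1 and the relative increase between two methods could be computed as below. The function names and toy masks are hypothetical and do not reproduce any ToyCity result.

```python
import numpy as np

def pixelwise_f1(pred: np.ndarray, gt: np.ndarray) -> float:
    """F1 over all pixels, treating anomalous pixels (True) as the positive class."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()
    fp = np.logical_and(pred, ~gt).sum()
    fn = np.logical_and(~pred, gt).sum()
    denom = 2 * tp + fp + fn
    # If neither mask has any positive pixel, treat it as perfect agreement.
    return 1.0 if denom == 0 else float(2 * tp / denom)

def relative_increase(new: float, baseline: float) -> float:
    """Percentage increase of `new` over `baseline` (e.g. 64.33 means +64.33%)."""
    return 100.0 * (new - baseline) / baseline

# Hypothetical 4x4 masks, for illustration only.
gt = np.zeros((4, 4), dtype=bool)
gt[1:3, 1:3] = True                    # 4 anomalous pixels
baseline_pred = np.zeros((4, 4), dtype=bool)
baseline_pred[1, 1:3] = True           # finds 2 of the 4 anomalous pixels
improved_pred = gt.copy()
improved_pred[0, 0] = True             # finds all 4, plus 1 false positive

f1_base = pixelwise_f1(baseline_pred, gt)
f1_new = pixelwise_f1(improved_pred, gt)
print(round(f1_base, 4), round(f1_new, 4), round(relative_increase(f1_new, f1_base), 2))
```

Here the baseline scores 2/3 and the improved mask 8/9, a relative increase of about 33.33%; note that a relative increase is not the same as a difference in percentage points.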

Submitted 19 May, 2025; v1 submitted 25 June, 2024; originally announced June 2024.

arXiv:2305.12052

Deep Learning Hydrodynamic Forecasting for Flooded Region Assessment in Near-Real-Time (DL Hydro-FRAN)

Authors: Francisco Haces-Garcia, Natalya Maslennikova, Craig L Glennie, Hanadi S Rifai, Vedhus Hoskere, Nima Ekhtari

Abstract: Hydrodynamic flood modeling improves hydrologic and hydraulic prediction of storm events. However, the computationally intensive numerical solutions required for high-resolution hydrodynamics have historically prevented their implementation in near-real-time flood forecasting. This study examines whether several Deep Neural Network (DNN) architectures are suitable for optimizing hydrodynamic flood models. Several pluvial flooding events were simulated in a low-relief, high-resolution urban environment using a 2D HEC-RAS hydrodynamic model. These simulations were assembled into a training set for the DNNs, which were then used to forecast flooding depths and velocities. The DNNs' forecasts were compared with those of the hydrodynamic flood models and showed good agreement, with a median RMSE of around 2 mm for cell flooding depths in the study area. The DNNs also substantially reduced forecast computation time, providing forecasts 34.2 to 72.4 times faster than conventional hydrodynamic models. Results in the study area differed little between HEC-RAS's Full Momentum Equations and Diffusion Equations; however, important numerical stability considerations were discovered that affect equation selection and DNN architecture configuration. Overall, the results from this study show that DNNs can greatly optimize hydrodynamic flood modeling and enable near-real-time hydrodynamic flood forecasting.
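
The accuracy and speed figures above combine two simple statistics: a per-event RMSE between predicted and simulated cell flooding depths, summarized by a median, and a ratio of run times. A minimal sketch, assuming depth grids stored as NumPy arrays in metres and wall-clock times in seconds; the function names, arrays, and timings are synthetic placeholders rather than data from the study, and the abstract does not state exactly over what the median was taken (per-event aggregation is an assumption here).

```python
import numpy as np

def rmse(pred: np.ndarray, ref: np.ndarray) -> float:
    """Root-mean-square error between predicted and reference depth grids."""
    return float(np.sqrt(np.mean((pred - ref) ** 2)))

def median_rmse_mm(pairs) -> float:
    """Median over events of per-event RMSE, converted from metres to millimetres."""
    return 1000.0 * float(np.median([rmse(p, r) for p, r in pairs]))

def speedup(t_reference_s: float, t_surrogate_s: float) -> float:
    """How many times faster the surrogate (e.g. a DNN) is than the reference model."""
    return t_reference_s / t_surrogate_s

# Synthetic example: three events on a 2x2 grid (depths in metres).
ref = np.array([[0.10, 0.20], [0.30, 0.40]])
pairs = [
    (ref + 0.001, ref),   # uniform 1 mm error
    (ref + 0.002, ref),   # uniform 2 mm error
    (ref + 0.004, ref),   # uniform 4 mm error
]
print(round(median_rmse_mm(pairs), 3))   # median of the 1, 2 and 4 mm errors
print(speedup(3600.0, 60.0))             # hypothetical 1 h vs 1 min run times
```

Using the median rather than the mean keeps a single poorly forecast event from dominating the summary, which may be one reason to report it this way.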

Submitted 5 July, 2023; v1 submitted 19 May, 2023; originally announced May 2023.

Comments: 21 pages, 8 figures

arXiv:1809.09195

Towards Automated Post-Earthquake Inspections with Deep Learning-based Condition-Aware Models

Authors: Vedhus Hoskere, Yasutaka Narazaki, Tu A. Hoang, Billie F. Spencer Jr

Abstract: In the aftermath of an earthquake, rapid structural inspections are required to get citizens back into their homes and offices in a safe and timely manner. These inspections are typically conducted by municipal authorities through structural engineer volunteers. As manual inspections can be time-consuming, laborious, and dangerous, research has been underway to develop methods to help speed up and increase the automation of the entire process. Researchers typically envisage the use of unmanned aerial vehicles (UAVs) for data acquisition and computer vision for data processing to extract actionable information. In this work, we propose a new framework to generate vision-based condition-aware models that can serve as the basis for speeding up or automating higher-level inspection decisions. The condition-aware models are generated by projecting the predictions of trained deep-learning models on a set of images of a structure onto a 3D mesh model generated through multi-view stereo from the same image set. Deep fully convolutional residual networks are used for semantic segmentation of images of buildings to provide (i) damage information, such as cracks and spalling, and (ii) contextual information, such as the presence of a building and visually identifiable components like windows and doors. The proposed methodology was implemented on a damaged building that was surveyed by the authors after the Central Mexico Earthquake in September 2017 and was qualitatively evaluated. Results demonstrate the promise of the proposed method towards the ultimate goal of rapid and automated post-earthquake inspections.

Submitted 24 September, 2018; originally announced September 2018.

arXiv:1806.06820

Automated Bridge Component Recognition using Video Data

Authors: Yasutaka Narazaki, Vedhus Hoskere, Tu A. Hoang, Billie F. Spencer Jr

Abstract: This paper investigates the automated recognition of structural bridge components using video data. Although understanding video data for structural inspections is straightforward for human inspectors, the same task has not yet been fully realized with machine learning methods. In particular, single-frame image processing techniques, such as convolutional neural networks (CNNs), are not expected to identify structural components accurately when the image is a close-up view that lacks contextual information about where on the structure it was taken. Inspired by the significant progress in video processing techniques, this study investigates automated bridge component recognition using video data, where information from past frames is used to augment the understanding of the current frame. A new simulated video dataset is created to train the machine learning algorithms. Then, CNNs with recurrent architectures are designed and applied to the automated bridge component recognition task. Results are presented for simulated video data, as well as video collected in the field.

Submitted 27 September, 2018; v1 submitted 18 June, 2018; originally announced June 2018.

arXiv:1805.06042

Automated Vision-based Bridge Component Extraction Using Multiscale Convolutional Neural Networks

Authors: Yasutaka Narazaki, Vedhus Hoskere, Tu A. Hoang, Billie F. Spencer Jr

Abstract: Image data has great potential to aid post-earthquake visual inspections of civil engineering structures due to the ease of data acquisition and the advantages in capturing visual information. A variety of techniques have been applied to detect damage automatically from a close-up image of a structural component. However, applying these automatic damage detection methods becomes increasingly difficult when the image includes multiple components from different structures. To reduce false-positive alarms, critical structural components need to be recognized first, and the damage alarms then need to be filtered using the component recognition results. To achieve this goal, this study aims at recognizing and extracting bridge components from images of urban scenes. The bridge component recognition begins with pixel-wise classification of an image into 10 scene classes. Then, the original image and the scene classification results are combined to classify the image pixels into five component classes. Multi-scale convolutional neural networks (multi-scale CNNs) are used to perform the pixel-wise classification, and the classification results are post-processed by averaging within superpixels and smoothing with conditional random fields (CRFs). The performance of the bridge component extraction is evaluated in terms of accuracy and consistency.
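
The superpixel-averaging step of the post-processing can be illustrated with a short NumPy sketch. It assumes the pixel-wise classifier has already produced per-pixel class probabilities and that a superpixel label map was computed separately (for example with an algorithm such as SLIC; the abstract does not say which was used). The function name and toy arrays are hypothetical, and the CRF smoothing step is not shown.

```python
import numpy as np

def superpixel_average(probs: np.ndarray, segments: np.ndarray) -> np.ndarray:
    """Average per-pixel class probabilities within each superpixel.

    probs:    (H, W, C) class probabilities from a pixel-wise classifier.
    segments: (H, W) non-negative integer superpixel ids.
    Returns an (H, W, C) array where every pixel carries its superpixel's mean.
    """
    h, w, c = probs.shape
    flat_ids = segments.reshape(-1)
    flat_probs = probs.reshape(-1, c)
    n_seg = int(flat_ids.max()) + 1
    sums = np.zeros((n_seg, c))
    np.add.at(sums, flat_ids, flat_probs)            # per-superpixel probability sums
    counts = np.bincount(flat_ids, minlength=n_seg)  # number of pixels per superpixel
    means = sums / np.maximum(counts, 1)[:, None]    # guard against unused ids
    return means[flat_ids].reshape(h, w, c)

# Hypothetical 2x3 image with 2 classes and two superpixels (left column vs. the rest).
probs = np.array([
    [[0.9, 0.1], [0.4, 0.6], [0.2, 0.8]],
    [[0.7, 0.3], [0.6, 0.4], [0.1, 0.9]],
])
segments = np.array([[0, 1, 1], [0, 1, 1]])
labels = superpixel_average(probs, segments).argmax(axis=-1)
print(labels)
```

In this toy case the pixel at row 1, column 1 would be labeled class 0 on its own, but after averaging it takes its superpixel's majority label (class 1), which is the kind of label consistency the averaging step is meant to enforce.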

Submitted 15 May, 2018; originally announced May 2018.

arXiv:1805.06041

Vision-based Automated Bridge Component Recognition Integrated With High-level Scene Understanding

Authors: Yasutaka Narazaki, Vedhus Hoskere, Tu A. Hoang, Billie F. Spencer

Abstract: Image data has great potential to aid conventional visual inspections of civil engineering structures due to the ease of data acquisition and the advantages in capturing visual information. A variety of techniques have been proposed to detect damage, such as cracks and spalling, in a close-up image of a single component (e.g., columns and road surfaces). However, these techniques commonly suffer from severe false positives, especially when the image includes multiple components of different structures. To reduce false positives and extract reliable information about the structures' conditions, detection and localization of critical structural components are important first steps preceding damage assessment. This study aims at recognizing bridge structural and non-structural components from images of urban scenes. During bridge component recognition, every image pixel is classified into one of five classes (non-bridge, columns, beams and slabs, other structural, other nonstructural) by multi-scale convolutional neural networks (multi-scale CNNs). To reduce false positives and obtain consistent labels, the component classifications are integrated with scene understanding by an additional classifier with 10 higher-level scene classes (building, greenery, person, pavement, signs and poles, vehicles, bridges, water, sky, and others). The bridge component recognition integrated with scene understanding is compared with the naive approach without scene classification in terms of accuracy, false positives, and consistency to demonstrate the effectiveness of the integrated approach.

Submitted 15 May, 2018; originally announced May 2018.

arXiv:1805.01055

Vision-based Structural Inspection using Multiscale Deep Convolutional Neural Networks

Authors: Vedhus Hoskere, Yasutaka Narazaki, Tu Hoang, Billie F. Spencer Jr

Abstract: Current methods of practice for inspection of civil infrastructure typically involve visual assessments conducted manually by trained inspectors. For post-earthquake structural inspections, the number of structures to be inspected often far exceeds the capability of the available inspectors. The labor-intensive and time-consuming nature of manual inspection has motivated research into algorithms for automated damage identification using computer vision techniques. In this paper, a novel damage localization and classification technique based on a state-of-the-art computer vision algorithm is presented to address several key limitations of current computer vision techniques. The proposed algorithm carries out a pixel-wise classification of each image at multiple scales using a deep convolutional neural network and can recognize six different types of damage. The resulting output is a segmented image in which the portion of the image representing damage is outlined and classified as one of the trained damage categories. The proposed method is evaluated in terms of pixel accuracy, and its application to real-world images is shown.

Submitted 2 May, 2018; originally announced May 2018.

Showing 1–11 of 11 results for author: Hoskere, V