-
The Urban Model Platform: A Public Backbone for Modeling and Simulation in Urban Digital Twins
Authors:
Rico H Herzog,
Till Degkwitz,
Trivik Verma
Abstract:
Urban digital twins are increasingly perceived as a way to pool the growing digital resources of cities for the purpose of a more sustainable and integrated urban planning. Models and simulations are central to this undertaking: They enable "what if?" scenarios, create insights and describe relationships between the vast data that is being collected. However, the process of integrating and subsequ…
▽ More
Urban digital twins are increasingly perceived as a way to pool the growing digital resources of cities for the purpose of a more sustainable and integrated urban planning. Models and simulations are central to this undertaking: They enable "what if?" scenarios, create insights and describe relationships between the vast data that is being collected. However, the process of integrating and subsequently using models in urban digital twins is an inherently complex undertaking. It raises questions about how to represent urban complexity, how to deal with uncertain assUrban Model Platformtions and modeling paradigms, and how to capture underlying power relations. Existent approaches in the domain largely focus on monolithic and centralized solutions in the tradition of neoliberal city-making, oftentimes prohibiting pluralistic and open interoperable models. Using a participatory design for participatory systems approach together with the City of Hamburg, Germany, we find that an open Urban Model Platform can function both as a public technological backbone for modeling and simulation in urban digital twins and as a socio-technical framework for a collaborative and pluralistic representation of urban processes. Such a platform builds on open standards, allows for a decentralized integration of models, enables communication between models and supports a multi-model approach to representing urban systems.
△ Less
Submitted 16 June, 2025; v1 submitted 12 June, 2025;
originally announced June 2025.
-
Atlantes: A system of GPS transformers for global-scale real-time maritime intelligence
Authors:
Henry Herzog,
Joshua Hansen,
Yawen Zhang,
Patrick Beukema
Abstract:
Unsustainable exploitation of the oceans exacerbated by global warming is threatening coastal communities worldwide. Accurate and timely monitoring of maritime activity is an essential step to effective governance and to inform future policy. In support of this complex global-scale effort, we built Atlantes, a deep learning based system that provides the first-ever real-time view of vessel behavio…
▽ More
Unsustainable exploitation of the oceans exacerbated by global warming is threatening coastal communities worldwide. Accurate and timely monitoring of maritime activity is an essential step to effective governance and to inform future policy. In support of this complex global-scale effort, we built Atlantes, a deep learning based system that provides the first-ever real-time view of vessel behavior at global scale. Atlantes leverages a series of bespoke transformers to distill a high volume, continuous stream of GPS messages emitted by hundreds of thousands of vessels into easily quantifiable behaviors. The combination of low latency and high performance enables operationally relevant decision-making and successful interventions on the high seas where illegal and exploitative activity is too common. Atlantes is already in use by hundreds of organizations worldwide. Here we provide an overview of the model and infrastructure that enables this system to function efficiently and cost-effectively at global-scale and in real-time.
△ Less
Submitted 26 April, 2025;
originally announced April 2025.
-
Contour Integration Underlies Human-Like Vision
Authors:
Ben Lonnqvist,
Elsa Scialom,
Abdulkadir Gokce,
Zehra Merchant,
Michael H. Herzog,
Martin Schrimpf
Abstract:
Despite the tremendous success of deep learning in computer vision, models still fall behind humans in generalizing to new input distributions. Existing benchmarks do not investigate the specific failure points of models by analyzing performance under many controlled conditions. Our study systematically dissects where and why models struggle with contour integration -- a hallmark of human vision -…
▽ More
Despite the tremendous success of deep learning in computer vision, models still fall behind humans in generalizing to new input distributions. Existing benchmarks do not investigate the specific failure points of models by analyzing performance under many controlled conditions. Our study systematically dissects where and why models struggle with contour integration -- a hallmark of human vision -- by designing an experiment that tests object recognition under various levels of object fragmentation. Humans (n=50) perform at high accuracy, even with few object contours present. This is in contrast to models which exhibit substantially lower sensitivity to increasing object contours, with most of the over 1,000 models we tested barely performing above chance. Only at very large scales ($\sim5B$ training dataset size) do models begin to approach human performance. Importantly, humans exhibit an integration bias -- a preference towards recognizing objects made up of directional fragments over directionless fragments. We find that not only do models that share this property perform better at our task, but that this bias also increases with model training dataset size, and training models to exhibit contour integration leads to high shape bias. Taken together, our results suggest that contour integration is a hallmark of object vision that underlies object recognition performance, and may be a mechanism learned from data at scale.
△ Less
Submitted 7 April, 2025;
originally announced April 2025.
-
Galileo: Learning Global & Local Features of Many Remote Sensing Modalities
Authors:
Gabriel Tseng,
Anthony Fuller,
Marlena Reil,
Henry Herzog,
Patrick Beukema,
Favyen Bastani,
James R. Green,
Evan Shelhamer,
Hannah Kerner,
David Rolnick
Abstract:
We introduce a highly multimodal transformer to represent many remote sensing modalities - multispectral optical, synthetic aperture radar, elevation, weather, pseudo-labels, and more - across space and time. These inputs are useful for diverse remote sensing tasks, such as crop mapping and flood detection. However, learning shared representations of remote sensing data is challenging, given the d…
▽ More
We introduce a highly multimodal transformer to represent many remote sensing modalities - multispectral optical, synthetic aperture radar, elevation, weather, pseudo-labels, and more - across space and time. These inputs are useful for diverse remote sensing tasks, such as crop mapping and flood detection. However, learning shared representations of remote sensing data is challenging, given the diversity of relevant data modalities, and because objects of interest vary massively in scale, from small boats (1-2 pixels and fast) to glaciers (thousands of pixels and slow). We present a novel self-supervised learning algorithm that extracts multi-scale features across a flexible set of input modalities through masked modeling. Our dual global and local contrastive losses differ in their targets (deep representations vs. shallow input projections) and masking strategies (structured vs. not). Our Galileo is a single generalist model that outperforms SoTA specialist models for satellite images and pixel time series across eleven benchmarks and multiple tasks.
△ Less
Submitted 4 June, 2025; v1 submitted 13 February, 2025;
originally announced February 2025.
-
VeriFence: Lightweight and Precise Spectre Defenses for Untrusted Linux Kernel Extensions
Authors:
Luis Gerhorst,
Henriette Herzog,
Peter Wägemann,
Maximilian Ott,
Rüdiger Kapitza,
Timo Hönig
Abstract:
High-performance IO demands low-overhead communication between user- and kernel space. This demand can no longer be fulfilled by traditional system calls. Linux's extended Berkeley Packet Filter (BPF) avoids user-/kernel transitions by just-in-time compiling user-provided bytecode and executing it in kernel mode with near-native speed. To still isolate BPF programs from the kernel, they are static…
▽ More
High-performance IO demands low-overhead communication between user- and kernel space. This demand can no longer be fulfilled by traditional system calls. Linux's extended Berkeley Packet Filter (BPF) avoids user-/kernel transitions by just-in-time compiling user-provided bytecode and executing it in kernel mode with near-native speed. To still isolate BPF programs from the kernel, they are statically analyzed for memory- and type-safety, which imposes some restrictions but allows for good expressiveness and high performance. However, to mitigate the Spectre vulnerabilities disclosed in 2018, defenses which reject potentially-dangerous programs had to be deployed. We find that this affects 31% to 54% of programs in a dataset with 844 real-world BPF programs from popular open-source projects. To solve this, users are forced to disable the defenses to continue using the programs, which puts the entire system at risk.
To enable secure and expressive untrusted Linux kernel extensions, we propose VeriFence, an enhancement to the kernel's Spectre defenses that reduces the number of BPF application programs rejected from 54% to zero. We measure VeriFence's overhead for all mainstream performance-sensitive applications of BPF (i.e., event tracing, profiling, and packet processing) and find that it improves significantly upon the status-quo where affected BPF programs are either unusable or enable transient execution attacks on the kernel.
△ Less
Submitted 8 January, 2025; v1 submitted 30 April, 2024;
originally announced May 2024.
-
Satellite Imagery and AI: A New Era in Ocean Conservation, from Research to Deployment and Impact (Version. 2.0)
Authors:
Patrick Beukema,
Favyen Bastani,
Yawen Zheng,
Piper Wolters,
Henry Herzog,
Joe Ferdinando
Abstract:
Illegal, unreported, and unregulated (IUU) fishing poses a global threat to ocean habitats. Publicly available satellite data offered by NASA, the European Space Agency (ESA), and the U.S. Geological Survey (USGS), provide an opportunity to actively monitor this activity. Effectively leveraging satellite data for maritime conservation requires highly reliable machine learning models operating glob…
▽ More
Illegal, unreported, and unregulated (IUU) fishing poses a global threat to ocean habitats. Publicly available satellite data offered by NASA, the European Space Agency (ESA), and the U.S. Geological Survey (USGS), provide an opportunity to actively monitor this activity. Effectively leveraging satellite data for maritime conservation requires highly reliable machine learning models operating globally with minimal latency. This paper introduces four specialized computer vision models designed for a variety of sensors including Sentinel-1 (synthetic aperture radar), Sentinel-2 (optical imagery), Landsat 8-9 (optical imagery), and Suomi-NPP/NOAA-20/NOAA-21 (nighttime lights). It also presents best practices for developing and deploying global-scale real-time satellite based computer vision. All of the models are open sourced under permissive licenses. These models have all been deployed in Skylight, a real-time maritime monitoring platform, which is provided at no cost to users worldwide.
△ Less
Submitted 29 May, 2025; v1 submitted 5 December, 2023;
originally announced December 2023.
-
Latent Noise Segmentation: How Neural Noise Leads to the Emergence of Segmentation and Grouping
Authors:
Ben Lonnqvist,
Zhengqing Wu,
Michael H. Herzog
Abstract:
Humans are able to segment images effortlessly without supervision using perceptual grouping. Here, we propose a counter-intuitive computational approach to solving unsupervised perceptual grouping and segmentation: that they arise because of neural noise, rather than in spite of it. We (1) mathematically demonstrate that under realistic assumptions, neural noise can be used to separate objects fr…
▽ More
Humans are able to segment images effortlessly without supervision using perceptual grouping. Here, we propose a counter-intuitive computational approach to solving unsupervised perceptual grouping and segmentation: that they arise because of neural noise, rather than in spite of it. We (1) mathematically demonstrate that under realistic assumptions, neural noise can be used to separate objects from each other; (2) that adding noise in a DNN enables the network to segment images even though it was never trained on any segmentation labels; and (3) that segmenting objects using noise results in segmentation performance that aligns with the perceptual grouping phenomena observed in humans, and is sample-efficient. We introduce the Good Gestalt (GG) datasets -- six datasets designed to specifically test perceptual grouping, and show that our DNN models reproduce many important phenomena in human perception, such as illusory contours, closure, continuity, proximity, and occlusion. Finally, we (4) show that our model improves performance on our GG datasets compared to other tested unsupervised models by $24.9\%$. Together, our results suggest a novel unsupervised segmentation method requiring few assumptions, a new explanation for the formation of perceptual grouping, and a novel potential benefit of neural noise.
△ Less
Submitted 23 October, 2024; v1 submitted 28 September, 2023;
originally announced September 2023.
-
Frequency-Based Vulnerability Analysis of Deep Learning Models against Image Corruptions
Authors:
Harshitha Machiraju,
Michael H. Herzog,
Pascal Frossard
Abstract:
Deep learning models often face challenges when handling real-world image corruptions. In response, researchers have developed image corruption datasets to evaluate the performance of deep neural networks in handling such corruptions. However, these datasets have a significant limitation: they do not account for all corruptions encountered in real-life scenarios. To address this gap, we present MU…
▽ More
Deep learning models often face challenges when handling real-world image corruptions. In response, researchers have developed image corruption datasets to evaluate the performance of deep neural networks in handling such corruptions. However, these datasets have a significant limitation: they do not account for all corruptions encountered in real-life scenarios. To address this gap, we present MUFIA (Multiplicative Filter Attack), an algorithm designed to identify the specific types of corruptions that can cause models to fail. Our algorithm identifies the combination of image frequency components that render a model susceptible to misclassification while preserving the semantic similarity to the original image. We find that even state-of-the-art models trained to be robust against known common corruptions struggle against the low visibility-based corruptions crafted by MUFIA. This highlights the need for more comprehensive approaches to enhance model robustness against a wider range of real-world image corruptions.
△ Less
Submitted 12 June, 2023;
originally announced June 2023.
-
A comment on Guo et al. [arXiv:2206.11228]
Authors:
Ben Lonnqvist,
Harshitha Machiraju,
Michael H. Herzog
Abstract:
In a recent article, Guo et al. [arXiv:2206.11228] report that adversarially trained neural representations in deep networks may already be as robust as corresponding primate IT neural representations. While we find the paper's primary experiment illuminating, we have doubts about the interpretation and phrasing of the results presented in the paper.
In a recent article, Guo et al. [arXiv:2206.11228] report that adversarially trained neural representations in deep networks may already be as robust as corresponding primate IT neural representations. While we find the paper's primary experiment illuminating, we have doubts about the interpretation and phrasing of the results presented in the paper.
△ Less
Submitted 2 August, 2022;
originally announced August 2022.
-
Identifying public values and spatial conflicts in urban planning
Authors:
Rico H. Herzog,
Juliana E. Gonçalves,
Geertje Slingerland,
Reinout Kleinhans,
Holger Prang,
Frances Brazier,
Trivik Verma
Abstract:
Identifying the diverse and often competing values of citizens, and resolving the consequent public value conflicts, are of significant importance for inclusive and integrated urban development. Scholars have highlighted that relational, value-laden urban space gives rise to many diverse conflicts that vary both spatially and temporally. Although notions of public value conflicts have been conceiv…
▽ More
Identifying the diverse and often competing values of citizens, and resolving the consequent public value conflicts, are of significant importance for inclusive and integrated urban development. Scholars have highlighted that relational, value-laden urban space gives rise to many diverse conflicts that vary both spatially and temporally. Although notions of public value conflicts have been conceived in theory, there are very few empirical studies that identify such values and their conflicts in urban space. Building on public value theory and using a case-study mixed-methods approach, this paper proposes a new approach to empirically investigate public value conflicts in urban space. Using unstructured participatory data of 4,528 citizen contributions from a Public Participation Geographic Information Systems in Hamburg, Germany, natural language processing and spatial clustering techniques are used to identify areas of potential value conflicts. Four expert workshops assess and interpret these quantitative findings. Integrating both quantitative and qualitative results, 19 general public values and a total of 9 archetypical conflicts are identified. On the basis of these results, this paper proposes a new conceptual tool of Public Value Spheres that extends the theoretical notion of public-value conflicts and helps to further account for the value-laden nature of urban space.
△ Less
Submitted 18 July, 2022; v1 submitted 11 July, 2022;
originally announced July 2022.
-
Empirical Advocacy of Bio-inspired Models for Robust Image Recognition
Authors:
Harshitha Machiraju,
Oh-Hyeon Choung,
Michael H. Herzog,
Pascal Frossard
Abstract:
Deep convolutional neural networks (DCNNs) have revolutionized computer vision and are often advocated as good models of the human visual system. However, there are currently many shortcomings of DCNNs, which preclude them as a model of human vision. There are continuous attempts to use features of the human visual system to improve the robustness of neural networks to data perturbations. We provi…
▽ More
Deep convolutional neural networks (DCNNs) have revolutionized computer vision and are often advocated as good models of the human visual system. However, there are currently many shortcomings of DCNNs, which preclude them as a model of human vision. There are continuous attempts to use features of the human visual system to improve the robustness of neural networks to data perturbations. We provide a detailed analysis of such bio-inspired models and their properties. To this end, we benchmark the robustness of several bio-inspired models against their most comparable baseline DCNN models. We find that bio-inspired models tend to be adversarially robust without requiring any special data augmentation. Additionally, we find that bio-inspired models beat adversarially trained models in the presence of more real-world common corruptions. Interestingly, we also find that bio-inspired models tend to use both low and mid-frequency information, in contrast to other DCNN models. We find that this mix of frequency information makes them robust to both adversarial perturbations and common corruptions.
△ Less
Submitted 18 May, 2022;
originally announced May 2022.
-
Bio-inspired Robustness: A Review
Authors:
Harshitha Machiraju,
Oh-Hyeon Choung,
Pascal Frossard,
Michael. H Herzog
Abstract:
Deep convolutional neural networks (DCNNs) have revolutionized computer vision and are often advocated as good models of the human visual system. However, there are currently many shortcomings of DCNNs, which preclude them as a model of human vision. For example, in the case of adversarial attacks, where adding small amounts of noise to an image, including an object, can lead to strong misclassifi…
▽ More
Deep convolutional neural networks (DCNNs) have revolutionized computer vision and are often advocated as good models of the human visual system. However, there are currently many shortcomings of DCNNs, which preclude them as a model of human vision. For example, in the case of adversarial attacks, where adding small amounts of noise to an image, including an object, can lead to strong misclassification of that object. But for humans, the noise is often invisible. If vulnerability to adversarial noise cannot be fixed, DCNNs cannot be taken as serious models of human vision. Many studies have tried to add features of the human visual system to DCNNs to make them robust against adversarial attacks. However, it is not fully clear whether human vision inspired components increase robustness because performance evaluations of these novel components in DCNNs are often inconclusive. We propose a set of criteria for proper evaluation and analyze different models according to these criteria. We finally sketch future efforts to make DCCNs one step closer to the model of human vision.
△ Less
Submitted 16 March, 2021;
originally announced March 2021.