-
The Patterns of Life Human Mobility Simulation
Authors:
Hossein Amiri,
Will Kohn,
Shiyang Ruan,
Joon-Seok Kim,
Hamdi Kavak,
Andrew Crooks,
Dieter Pfoser,
Carola Wenk,
Andreas Zufle
Abstract:
We demonstrate the Patterns of Life Simulation to create realistic simulations of human mobility in a city. This simulation has recently been used to generate massive amounts of trajectory and check-in data. Our demonstration focuses on using the simulation in two ways: (1) using the graphical user interface (GUI), and (2) running the simulation headless by disabling the GUI for faster data generation. We further demonstrate how the Patterns of Life simulation can be used to simulate any region on Earth by using publicly available data from OpenStreetMap (OSM). Finally, we also demonstrate how recent improvements to the scalability of the simulation allow simulating up to 100,000 individual agents for years of simulation time. During our demonstration, as well as offline using our guides on GitHub, participants will learn: (1) the theories of human behavior driving the Patterns of Life simulation, (2) how to run the simulation to generate massive amounts of synthetic yet realistic trajectory data, (3) how to run the simulation for a region of interest chosen by participants using OSM data, (4) the scalability of the simulation and the properties of the generated data, and (5) how to manage thousands of parallel simulation instances running concurrently.
Submitted 11 October, 2024; v1 submitted 30 September, 2024;
originally announced October 2024.
-
Trajectory Anomaly Detection with Language Models
Authors:
Jonathan Mbuya,
Dieter Pfoser,
Antonios Anastasopoulos
Abstract:
This paper presents a novel approach for trajectory anomaly detection using an autoregressive causal-attention model, termed LM-TAD. This method leverages the similarities between language statements and trajectories, both of which consist of ordered elements requiring coherence through external rules and contextual variations. By treating trajectories as sequences of tokens, our model learns the probability distributions over trajectories, enabling the identification of anomalous locations with high precision. We incorporate user-specific tokens to account for individual behavior patterns, enhancing anomaly detection tailored to user context. Our experiments demonstrate the effectiveness of LM-TAD on both synthetic and real-world datasets. In particular, the model outperforms existing methods on the Pattern of Life (PoL) dataset by detecting user-contextual anomalies and achieves competitive results on the Porto taxi dataset, highlighting its adaptability and robustness. Additionally, we introduce the use of perplexity and surprisal rate metrics for detecting outliers and pinpointing specific anomalous locations within trajectories. The LM-TAD framework supports various trajectory representations, including GPS coordinates, staypoints, and activity types, proving its versatility in handling diverse trajectory data. Moreover, our approach is well-suited for online trajectory anomaly detection, significantly reducing computational latency by caching key-value states of the attention mechanism, thereby avoiding repeated computations.
Submitted 18 September, 2024;
originally announced September 2024.
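The perplexity and surprisal scoring mentioned in the abstract above can be illustrated with a minimal sketch. It assumes a language model has already assigned a probability to each trajectory token; the token names and probability values below are made up for illustration, and the exact metric definitions used by LM-TAD may differ from these textbook forms.

```python
import math


def surprisal(prob):
    """Surprisal (in nats) of one token given its predicted probability."""
    return -math.log(prob)


def perplexity(token_probs):
    """Perplexity of a trajectory: exp of the mean per-token surprisal."""
    mean_surprisal = sum(surprisal(p) for p in token_probs) / len(token_probs)
    return math.exp(mean_surprisal)


def most_surprising(tokens, token_probs):
    """Return the (token, surprisal) pair with the highest surprisal."""
    scored = [(tok, surprisal(p)) for tok, p in zip(tokens, token_probs)]
    return max(scored, key=lambda pair: pair[1])


# Hypothetical staypoint trajectory with made-up model probabilities.
tokens = ["home", "work", "gym", "casino", "home"]
probs = [0.60, 0.50, 0.30, 0.01, 0.40]

trajectory_score = perplexity(probs)             # high value: unusual trajectory
anomalous_stop = most_surprising(tokens, probs)  # pinpoints the unlikely location
```

A trajectory-level threshold on perplexity would flag outlier trajectories, while the per-token surprisal points to the specific location that made the trajectory unlikely.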
-
Extracting the U.S. building types from OpenStreetMap data
Authors:
Henrique F. de Arruda,
Sandro M. Reia,
Shiyang Ruan,
Kuldip S. Atwal,
Hamdi Kavak,
Taylor Anderson,
Dieter Pfoser
Abstract:
Building type information is crucial for population estimation, traffic planning, urban planning, and emergency response applications. Although essential, such data is often not readily available. To alleviate this problem, this work creates a comprehensive dataset by providing residential/non-residential building classification covering the entire United States. We propose and utilize an unsupervised machine learning method to classify building types based on building footprints and available OpenStreetMap (OSM) information. The classification result is validated using authoritative ground truth data for select counties in the U.S. The validation shows a high precision for non-residential building classification and a high recall for residential buildings. We identified various approaches to improving the quality of the classification, such as removing sheds and garages from the dataset. Furthermore, analyzing the misclassifications revealed that they are mainly due to missing and scarce metadata in OSM. A major result of this work is a dataset that classifies 67,705,475 buildings. We hope that this data is of value to the scientific community, including urban and transportation planners.
Submitted 9 September, 2024;
originally announced September 2024.
-
Urban Mobility Assessment Using LLMs
Authors:
Prabin Bhandari,
Antonios Anastasopoulos,
Dieter Pfoser
Abstract:
Understanding urban mobility patterns and analyzing how people move around cities helps improve the overall quality of life and supports the development of more livable, efficient, and sustainable urban areas. A challenging aspect of this work is the collection of mobility data by means of user tracking or travel surveys, given the associated privacy concerns, noncompliance, and high cost. This work proposes an innovative AI-based approach for synthesizing travel surveys by prompting large language models (LLMs), aiming to leverage their vast amount of relevant background knowledge and text generation capabilities. Our study evaluates the effectiveness of this approach across various U.S. metropolitan areas by comparing the results against existing survey data at different granularity levels. These levels include (i) pattern level, which compares aggregated metrics like the average number of locations traveled and travel time, (ii) trip level, which focuses on comparing trips as whole units using transition probabilities, and (iii) activity chain level, which examines the sequence of locations visited by individuals. Our work covers several proprietary and open-source LLMs, revealing that open-source base models like Llama-2, when fine-tuned on even a limited amount of actual data, can generate synthetic data that closely mimics the actual travel survey data, and as such provides an argument for using such data in mobility studies.
Submitted 22 August, 2024;
originally announced September 2024.
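The trip-level comparison described in the abstract above relies on transition probabilities between locations. The sketch below shows one plain way to estimate them from activity chains; the chains and location labels are invented for illustration, and the paper's actual estimation procedure is not specified here.

```python
from collections import Counter, defaultdict


def transition_probabilities(chains):
    """Estimate P(next location | current location) from activity chains."""
    counts = defaultdict(Counter)
    for chain in chains:
        # Count every consecutive (origin, destination) pair in the chain.
        for src, dst in zip(chain, chain[1:]):
            counts[src][dst] += 1
    return {
        src: {dst: n / sum(dsts.values()) for dst, n in dsts.items()}
        for src, dsts in counts.items()
    }


# Hypothetical activity chains, e.g. one per survey respondent.
chains = [
    ["home", "work", "home"],
    ["home", "shop", "home"],
    ["home", "work", "shop", "home"],
]

probs = transition_probabilities(chains)
```

Computing the same table for synthetic and real survey data gives two distributions per origin that can then be compared with a divergence measure of choice.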
-
Are Large Language Models Geospatially Knowledgeable?
Authors:
Prabin Bhandari,
Antonios Anastasopoulos,
Dieter Pfoser
Abstract:
Despite the impressive performance of Large Language Models (LLMs) for various natural language processing tasks, little is known about their comprehension of geographic data and related ability to facilitate informed geospatial decision-making. This paper investigates the extent of geospatial knowledge, awareness, and reasoning abilities encoded within such pretrained LLMs. With a focus on autoregressive language models, we devise experimental approaches related to (i) probing LLMs for geo-coordinates to assess geospatial knowledge, (ii) using geospatial and non-geospatial prepositions to gauge their geospatial awareness, and (iii) utilizing a multidimensional scaling (MDS) experiment to assess the models' geospatial reasoning capabilities and to determine locations of cities based on prompting. Our results confirm that it takes not only larger but also more sophisticated LLMs to synthesize geospatial knowledge from textual information. As such, this research contributes to understanding the potential and limitations of LLMs in dealing with geospatial information.
Submitted 9 October, 2023;
originally announced October 2023.
-
Disentangled Dynamic Graph Deep Generation
Authors:
Wenbin Zhang,
Liming Zhang,
Dieter Pfoser,
Liang Zhao
Abstract:
Deep generative models for graphs have exhibited promising performance in ever-increasing domains such as the design of molecules (i.e., graphs of atoms) and structure prediction of proteins (i.e., graphs of amino acids). Existing work typically focuses on static rather than dynamic graphs, which are actually very important in applications such as protein folding, molecule reactions, and human mobility. Extending existing deep generative models from static to dynamic graphs is a challenging task, which requires handling the factorization of static and dynamic characteristics as well as mutual interactions among node and edge patterns. This paper proposes a novel framework of factorized deep generative models to achieve interpretable dynamic graph generation. Various generative models are proposed to characterize conditional independence among node, edge, static, and dynamic factors. Then, variational optimization strategies as well as dynamic graph decoders are proposed based on newly designed factorized variational autoencoders and recurrent graph deconvolutions. Extensive experiments on multiple datasets demonstrate the effectiveness of the proposed models.
Submitted 19 January, 2021; v1 submitted 14 October, 2020;
originally announced October 2020.
-
Factorized Deep Generative Models for Trajectory Generation with Spatiotemporal-Validity Constraints
Authors:
Liming Zhang,
Liang Zhao,
Dieter Pfoser
Abstract:
Trajectory data generation is an important domain that characterizes the generative process of mobility data. Traditional methods heavily rely on predefined heuristics and distributions and are weak in learning unknown mechanisms. Inspired by the success of deep generative neural networks for images and texts, a fast-developing research topic is deep generative models for trajectory data which can learn expressively explanatory models for sophisticated latent patterns. This is a nascent yet promising domain for many applications. We first propose novel deep generative models factorizing time-variant and time-invariant latent variables that characterize global and local semantics, respectively. We then develop new inference strategies based on variational inference and constrained optimization to encapsulate the spatiotemporal validity. New deep neural network architectures have been developed to implement the inference and generation models with newly-generalized latent variable priors. The proposed methods achieved significant improvements in quantitative and qualitative evaluations in extensive experiments.
Submitted 19 September, 2020;
originally announced September 2020.
-
TG-GAN: Continuous-time Temporal Graph Generation with Deep Generative Models
Authors:
Liming Zhang,
Liang Zhao,
Shan Qin,
Dieter Pfoser
Abstract:
The recent deep generative models for static graphs that are now being actively developed have achieved significant success in areas such as molecule design. However, many real-world problems involve temporal graphs whose topology and attribute values evolve dynamically over time, including important applications such as protein folding, human mobility networks, and social network growth. Deep generative models for temporal graphs are not yet well understood, and existing techniques for static graphs are not adequate for temporal graphs since they cannot 1) encode and decode continuously-varying graph topology chronologically, 2) enforce validity via temporal constraints, or 3) ensure efficiency for information-lossless temporal resolution. To address these challenges, we propose a new model, called "Temporal Graph Generative Adversarial Network" (TG-GAN), for continuous-time temporal graph generation, by modeling the deep generative process for truncated temporal random walks and their compositions. Specifically, we first propose a novel temporal graph generator that jointly models truncated edge sequences, time budgets, and node attributes, with novel activation functions that enforce temporal validity constraints under a recurrent architecture. In addition, a new temporal graph discriminator is proposed, which combines time and node encoding operations over a recurrent architecture to distinguish the generated sequences from the real ones sampled by a newly-developed truncated temporal random walk sampler. Extensive experiments on both synthetic and real-world datasets demonstrate that TG-GAN significantly outperforms the comparison methods in efficiency and effectiveness.
Submitted 9 June, 2020; v1 submitted 17 May, 2020;
originally announced May 2020.
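The idea of a truncated temporal random walk, referenced in the abstract above, can be sketched as follows. This is a simplified illustration under stated assumptions, not the sampler from the paper: edges are `(source, target, timestamp)` tuples, the walk is truncated at `max_len` edges, and temporal validity is taken to mean non-decreasing timestamps. The function name and the toy edge list are hypothetical.

```python
import random


def temporal_random_walk(edges, max_len, rng):
    """Sample a walk of at most max_len edges with non-decreasing timestamps.

    edges: non-empty list of (source, target, timestamp) tuples.
    rng:   a random.Random instance, passed in for reproducibility.
    """
    walk = [rng.choice(edges)]
    while len(walk) < max_len:
        _, node, t = walk[-1]
        # Only edges leaving the current node at the same or a later time
        # keep the walk temporally valid.
        candidates = [e for e in edges if e[0] == node and e[2] >= t]
        if not candidates:
            break
        walk.append(rng.choice(candidates))
    return walk


# Hypothetical temporal graph: (source, target, timestamp).
edges = [
    ("a", "b", 1.0),
    ("b", "c", 2.5),
    ("b", "a", 0.5),
    ("c", "a", 3.0),
    ("a", "c", 4.0),
]

walk = temporal_random_walk(edges, max_len=4, rng=random.Random(0))
```

Every sampled walk is connected and time-respecting by construction, which is the property a discriminator would expect of real sequences.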
-
Station-to-User Transfer Learning: Towards Explainable User Clustering Through Latent Trip Signatures Using Tidal-Regularized Non-Negative Matrix Factorization
Authors:
Liming Zhang,
Andreas Züfle,
Dieter Pfoser
Abstract:
Urban areas provide us with a treasure trove of available data capturing almost every aspect of a population's life. This work focuses on mobility data and how it will help improve our understanding of urban mobility patterns. Readily available and sizable farecard data captures trips in a public transportation network. However, such data typically lacks temporal modalities, and as such the task of inferring trip semantics, station function, and user profile is quite challenging. As existing approaches focus on either station-level or user-level signals, they are prone to overfitting and generate less credible and insightful results. To properly learn such characteristics from trip data, we propose a Collective Learning Framework through Latent Representation, which augments user-level learning with collective patterns learned from station-level signals. This framework uses a novel, so-called Tidal-Regularized Non-negative Matrix Factorization method, which incorporates domain knowledge in the form of temporal passenger flow patterns in generic Non-negative Matrix Factorization. To evaluate our model performance, a user stability test based on the classical Rand Index is introduced as a metric to benchmark different unsupervised learning models. We provide a qualitative analysis of the station functions and user profiles for the Washington D.C. metro and show how our method supports spatiotemporal intra-city mobility exploration.
Submitted 27 April, 2020;
originally announced April 2020.
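The classical Rand Index mentioned in the abstract above measures how often two clusterings agree on whether a pair of items belongs together. The sketch below computes it directly from the definition; the label lists are invented, and how the paper builds its user stability test on top of this index is not shown here.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of item pairs on which two clusterings agree.

    A pair counts as an agreement when both clusterings put the two items
    in the same cluster, or both put them in different clusters.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("label lists must have the same length")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        raise ValueError("need at least two items")
    agreements = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agreements / len(pairs)


# Hypothetical cluster assignments of four users from two model runs.
run_1 = [0, 0, 1, 1]
run_2 = [1, 1, 0, 0]   # same partition, different label names
run_3 = [0, 1, 0, 1]   # a different partition

stable = rand_index(run_1, run_2)
unstable = rand_index(run_1, run_3)
```

Because the index only looks at pair co-membership, it is unaffected by how the clusters happen to be numbered in each run.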
-
ReHub. Extending Hub Labels for Reverse k-Nearest Neighbor Queries on Large-Scale networks
Authors:
Alexandros Efentakis,
Dieter Pfoser
Abstract:
Quite recently, the algorithmic community has focused on solving multiple shortest-path query problems beyond simple vertex-to-vertex queries, especially in the context of road networks. Unfortunately, this research cannot be generalized to large-scale graphs, e.g., social or collaboration networks, or to efficiently answer Reverse k-Nearest Neighbor (RkNN) queries, which are of practical relevance to a wide range of applications. To remedy this, we propose ReHub, a novel main-memory algorithm that extends the Hub Labeling technique to efficiently answer RkNN queries on large-scale networks. Our experimentation will show that ReHub is the best overall solution for this type of query, requiring only minimal preprocessing and providing very fast query times.
Submitted 10 July, 2015; v1 submitted 7 April, 2015;
originally announced April 2015.
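To make the RkNN query from the abstract above concrete, here is a brute-force baseline on a small weighted graph. It is not the ReHub algorithm and uses no hub labels: it simply runs Dijkstra from every object, which is exactly the cost an index-based method aims to avoid. The graph, the object set, and the function names are illustrative assumptions.

```python
import heapq


def dijkstra(graph, source):
    """Shortest-path distances from source; graph maps node -> [(nbr, weight)]."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def rknn(graph, objects, query, k):
    """Objects that would have `query` among their k nearest neighbours.

    An object p qualifies when fewer than k other objects are strictly
    closer to p than the query vertex is.
    """
    result = []
    for p in objects:
        if p == query:
            continue
        dist = dijkstra(graph, p)
        if query not in dist:
            continue  # query unreachable from p
        d_query = dist[query]
        closer = sum(
            1 for o in objects
            if o != p and dist.get(o, float("inf")) < d_query
        )
        if closer < k:
            result.append(p)
    return result


# Hypothetical undirected path graph: a -1- b -1- c -5- d
graph = {
    "a": [("b", 1.0)],
    "b": [("a", 1.0), ("c", 1.0)],
    "c": [("b", 1.0), ("d", 5.0)],
    "d": [("c", 5.0)],
}

answer = rknn(graph, objects=["a", "c", "d"], query="b", k=1)
```

In this toy graph, `d` is excluded for `k=1` because object `c` is closer to it than the query vertex `b`.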
-
SALT. A unified framework for all shortest-path query variants on road networks
Authors:
Alexandros Efentakis,
Dieter Pfoser,
Yannis Vassiliou
Abstract:
Although recent scientific output focuses on multiple shortest-path (SP) problem definitions for road networks, none of the existing solutions efficiently answers all the different types of SP queries. This work proposes SALT, a novel framework that not only efficiently answers SP-related queries but also k-nearest neighbor queries not handled by previous approaches. Our solution offers all the benefits needed for practical use-cases, including excellent query performance and very short preprocessing times, thus making it also a viable option for dynamic road networks, i.e., edge weights changing frequently due to traffic updates. The proposed SALT framework is a deployable software solution capturing a range of network-related query problems under one "algorithmic hood".
Submitted 2 November, 2014;
originally announced November 2014.
-
Towards Knowledge-Enriched Path Computation
Authors:
Georgios Skoumas,
Klaus Arthur Schmid,
Gregor Jossé,
Andreas Züfle,
Mario A. Nascimento,
Matthias Renz,
Dieter Pfoser
Abstract:
Directions and paths, as commonly provided by navigation systems, are usually derived considering absolute metrics, e.g., finding the shortest path within an underlying road network. With the aid of crowdsourced geospatial data we aim at obtaining paths that not only minimize distance but also lead through more popular areas using knowledge generated by users. We extract spatial relations such as "nearby" or "next to" from travel blogs that define closeness between pairs of points of interest (PoIs) and quantify each of these relations using a probabilistic model. Subsequently, we create a relationship graph where each node corresponds to a PoI and each edge describes the spatial connection between the respective PoIs. Using Bayesian inference we obtain a probabilistic measure of spatial closeness according to the crowd. Applying this measure to the corresponding road network, we obtain an altered cost function which does not exclusively rely on distance and enriches the actual road network by taking crowdsourced spatial relations into account. Finally, we propose two routing algorithms on the enriched road networks. To evaluate our approach, we use Flickr photo data as a ground truth for popularity. Our experimental results, based on real-world datasets, show that the paths computed w.r.t. our alternative cost function yield competitive solutions in terms of path length while also providing more "popular" paths, making routing easier and more informative for the user.
Submitted 9 September, 2014;
originally announced September 2014.
-
Location Estimation Using Crowdsourced Geospatial Narratives
Authors:
Georgios Skoumas,
Dieter Pfoser,
Anastasios Kyrillidis
Abstract:
The "crowd" has become a very important geospatial data provider. Subsumed under the term Volunteered Geographic Information (VGI), non-expert users have been providing a wealth of quantitative geospatial data online. With spatial reasoning being a basic form of human cognition, narratives expressing geospatial experiences, e.g., travel blogs, would provide an even bigger source of geospatial data. Textual narratives typically contain qualitative data in the form of objects and spatial relationships. The scope of this work is (i) to extract these relationships from user-generated texts, (ii) to quantify them and (iii) to reason about object locations based only on this qualitative data. We use information extraction methods to identify toponyms and spatial relationships and to formulate a quantitative approach based on distance and orientation features to represent the latter. Positional probability distributions for spatial relationships are determined by means of a greedy Expectation Maximization-based (EM) algorithm. These estimates are then used to "triangulate" the positions of unknown object locations. Experiments using a text corpus harvested from travel blog sites establish the considerable location estimation accuracy of the proposed approach.
Submitted 25 August, 2014;
originally announced August 2014.
-
A Comparison and Evaluation of Map Construction Algorithms
Authors:
Mahmuda Ahmed,
Sophia Karagiorgou,
Dieter Pfoser,
Carola Wenk
Abstract:
Map construction methods automatically produce and/or update road network datasets using vehicle tracking data. Enabled by the ubiquitous generation of georeferenced tracking data, there has been a recent surge in map construction algorithms coming from different computer science domains. A cross-comparison of the various algorithms is still very rare, since (i) algorithms and constructed maps are generally not publicly available and (ii) there is no standard approach to assess the result quality, given the lack of benchmark data and quantitative evaluation methods. This work represents a first comprehensive attempt to benchmark map construction algorithms. We provide an evaluation and comparison of seven algorithms using four datasets and four different evaluation measures. In addition to this comprehensive comparison, we make our datasets, source code of map construction algorithms and evaluation measures publicly available on mapconstruction.org. This site has been established as a repository for map construction data and algorithms and we invite other researchers to contribute by uploading code and benchmark data supporting their contributions to map construction algorithms.
Submitted 12 June, 2014; v1 submitted 19 February, 2014;
originally announced February 2014.
-
On Quantifying Qualitative Geospatial Data: A Probabilistic Approach
Authors:
Georgios Skoumas,
Dieter Pfoser,
Anastasios Kyrillidis
Abstract:
Living in the era of data deluge, we have witnessed a web content explosion, largely due to the massive availability of User-Generated Content (UGC). In this work, we specifically consider the problem of geospatial information extraction and representation, where one can exploit diverse sources of information (such as image and audio data, text data, etc.), going beyond traditional volunteered geographic information. Our ambition is to include available narrative information in an effort to better explain geospatial relationships: with spatial reasoning being a basic form of human cognition, narratives expressing such experiences typically contain qualitative spatial data, i.e., spatial objects and spatial relationships.
To this end, we formulate a quantitative approach for the representation of qualitative spatial relations extracted from UGC in the form of texts. The proposed method quantifies such relations based on multiple text observations. Such observations provide distance and orientation features which are utilized by a greedy Expectation Maximization-based (EM) algorithm to infer a probability distribution over predefined spatial relationships; the latter represent the quantified relationships under user-defined probabilistic assumptions. We evaluate the applicability and quality of the proposed approach using real UGC data originating from an actual travel blog text corpus. To verify the quality of the result, we generate grid-based maps visualizing the spatial extent of the various relations.
Submitted 18 November, 2013;
originally announced November 2013.