-
Construction of Urban Greenland Resources Collaborative Management Platform
Authors:
Dongyang Lyu,
Xiaoqi Li,
Zongwei Li
Abstract:
Nowadays, environmental protection has become a global consensus. At the same time, with the rapid development of science and technology, urbanisation has become a phenomenon that has become the norm. Therefore, the urban greening management system is an essential component in protecting the urban environment. The system utilises a transparent management process known as" monitoring - early warnin…
▽ More
Nowadays, environmental protection has become a global consensus. At the same time, with the rapid development of science and technology, urbanisation has become a phenomenon that has become the norm. Therefore, the urban greening management system is an essential component in protecting the urban environment. The system utilises a transparent management process known as" monitoring - early warning - response - optimisation," which enhances the tracking of greening resources, streamlines maintenance scheduling, and encourages employee involvement in planning. Designed with a microservice architecture, the system can improve the utilisation of greening resources by 30%, increase citizen satisfaction by 20%, and support carbon neutrality objectives, ultimately making urban governance more intelligent and focused on the community. The Happy City Greening Management System effectively manages gardeners, trees, flowers, and green spaces. It comprises modules for gardener management, purchase and supplier management, tree and flower management, and maintenance planning. Its automation feature allows for real-time updates of greening data, thereby enhancing decision-making. The system is built using Java for the backend and MySQL for data storage, complemented by a user-friendly frontend designed with the Vue framework. Additionally, it leverages features from the Spring Boot framework to enhance maintainability and scalability.
△ Less
Submitted 4 June, 2025; v1 submitted 4 June, 2025;
originally announced June 2025.
-
FLEX: A Backbone for Diffusion-Based Modeling of Spatio-temporal Physical Systems
Authors:
N. Benjamin Erichson,
Vinicius Mikuni,
Dongwei Lyu,
Yang Gao,
Omri Azencot,
Soon Hoe Lim,
Michael W. Mahoney
Abstract:
We introduce FLEX (FLow EXpert), a backbone architecture for generative modeling of spatio-temporal physical systems using diffusion models. FLEX operates in the residual space rather than on raw data, a modeling choice that we motivate theoretically, showing that it reduces the variance of the velocity field in the diffusion model, which helps stabilize training. FLEX integrates a latent Transfor…
▽ More
We introduce FLEX (FLow EXpert), a backbone architecture for generative modeling of spatio-temporal physical systems using diffusion models. FLEX operates in the residual space rather than on raw data, a modeling choice that we motivate theoretically, showing that it reduces the variance of the velocity field in the diffusion model, which helps stabilize training. FLEX integrates a latent Transformer into a U-Net with standard convolutional ResNet layers and incorporates a redesigned skip connection scheme. This hybrid design enables the model to capture both local spatial detail and long-range dependencies in latent space. To improve spatio-temporal conditioning, FLEX uses a task-specific encoder that processes auxiliary inputs such as coarse or past snapshots. Weak conditioning is applied to the shared encoder via skip connections to promote generalization, while strong conditioning is applied to the decoder through both skip and bottleneck features to ensure reconstruction fidelity. FLEX achieves accurate predictions for super-resolution and forecasting tasks using as few as two reverse diffusion steps. It also produces calibrated uncertainty estimates through sampling. Evaluations on high-resolution 2D turbulence data show that FLEX outperforms strong baselines and generalizes to out-of-distribution settings, including unseen Reynolds numbers, physical observables (e.g., fluid flow velocity fields), and boundary conditions.
△ Less
Submitted 22 May, 2025;
originally announced May 2025.
-
Depth3DLane: Monocular 3D Lane Detection via Depth Prior Distillation
Authors:
Dongxin Lyu,
Han Huang,
Cheng Tan,
Zimu Li
Abstract:
Monocular 3D lane detection is challenging due to the difficulty in capturing depth information from single-camera images. A common strategy involves transforming front-view (FV) images into bird's-eye-view (BEV) space through inverse perspective mapping (IPM), facilitating lane detection using BEV features. However, IPM's flat-ground assumption and loss of contextual information lead to inaccurac…
▽ More
Monocular 3D lane detection is challenging due to the difficulty in capturing depth information from single-camera images. A common strategy involves transforming front-view (FV) images into bird's-eye-view (BEV) space through inverse perspective mapping (IPM), facilitating lane detection using BEV features. However, IPM's flat-ground assumption and loss of contextual information lead to inaccuracies in reconstructing 3D information, especially height. In this paper, we introduce a BEV-based framework to address these limitations and improve 3D lane detection accuracy. Our approach incorporates a Hierarchical Depth-Aware Head that provides multi-scale depth features, mitigating the flat-ground assumption by enhancing spatial awareness across varying depths. Additionally, we leverage Depth Prior Distillation to transfer semantic depth knowledge from a teacher model, capturing richer structural and contextual information for complex lane structures. To further refine lane continuity and ensure smooth lane reconstruction, we introduce a Conditional Random Field module that enforces spatial coherence in lane predictions. Extensive experiments validate that our method achieves state-of-the-art performance in terms of z-axis error and outperforms other methods in the field in overall performance. The code is released at: https://anonymous.4open.science/r/Depth3DLane-DCDD.
△ Less
Submitted 25 April, 2025;
originally announced April 2025.
-
ABCDWaveNet: Advancing Robust Road Ponding Detection in Fog through Dynamic Frequency-Spatial Synergy
Authors:
Ronghui Zhang,
Dakang Lyu,
Tengfei Li,
Yunfan Wu,
Ujjal Manandhar,
Benfei Wang,
Junzhou Chen,
Bolin Gao,
Danwei Wang,
Yiqiu Tan
Abstract:
Road ponding presents a significant threat to vehicle safety, particularly in adverse fog conditions, where reliable detection remains a persistent challenge for Advanced Driver Assistance Systems (ADAS). To address this, we propose ABCDWaveNet, a novel deep learning framework leveraging Dynamic Frequency-Spatial Synergy for robust ponding detection in fog. The core of ABCDWaveNet achieves this sy…
▽ More
Road ponding presents a significant threat to vehicle safety, particularly in adverse fog conditions, where reliable detection remains a persistent challenge for Advanced Driver Assistance Systems (ADAS). To address this, we propose ABCDWaveNet, a novel deep learning framework leveraging Dynamic Frequency-Spatial Synergy for robust ponding detection in fog. The core of ABCDWaveNet achieves this synergy by integrating dynamic convolution for adaptive feature extraction across varying visibilities with a wavelet-based module for synergistic frequency-spatial feature enhancement, significantly improving robustness against fog interference. Building on this foundation, ABCDWaveNet captures multi-scale structural and contextual information, subsequently employing an Adaptive Attention Coupling Gate (AACG) to adaptively fuse global and local features for enhanced accuracy. To facilitate realistic evaluations under combined adverse conditions, we introduce the Foggy Low-Light Puddle dataset. Extensive experiments demonstrate that ABCDWaveNet establishes new state-of-the-art performance, achieving significant Intersection over Union (IoU) gains of 3.51%, 1.75%, and 1.03% on the Foggy-Puddle, Puddle-1000, and our Foggy Low-Light Puddle datasets, respectively. Furthermore, its processing speed of 25.48 FPS on an NVIDIA Jetson AGX Orin confirms its suitability for ADAS deployment. These findings underscore the effectiveness of the proposed Dynamic Frequency-Spatial Synergy within ABCDWaveNet, offering valuable insights for developing proactive road safety solutions capable of operating reliably in challenging weather conditions.
△ Less
Submitted 7 April, 2025;
originally announced April 2025.
-
Potentials and Limitations of Large-scale, Individual-level Mobile Location Data for Food Acquisition Analysis
Authors:
Duanya Lyu,
Luyu Liu,
Catherine Campbell,
Yuxuan Zhang,
Xiang Yan
Abstract:
Understanding food acquisition is crucial for developing strategies to combat food insecurity, a major public health concern. The emergence of large-scale mobile location data (typically exemplified by GPS data), which captures people's movement over time at high spatiotemporal resolutions, offer a new approach to study this topic. This paper evaluates the potential and limitations of large-scale…
▽ More
Understanding food acquisition is crucial for developing strategies to combat food insecurity, a major public health concern. The emergence of large-scale mobile location data (typically exemplified by GPS data), which captures people's movement over time at high spatiotemporal resolutions, offer a new approach to study this topic. This paper evaluates the potential and limitations of large-scale GPS data for food acquisition analysis through a case study. Using a high-resolution dataset of 286 million GPS records from individuals in Jacksonville, Florida, we conduct a case study to assess the strengths of GPS data in capturing spatiotemporal patterns of food outlet visits while also discussing key limitations, such as potential data biases and algorithmic uncertainties. Our findings confirm that GPS data can generate valuable insights about food acquisition behavior but may significantly underestimate visitation frequency to food outlets. Robustness checks highlight how algorithmic choices-especially regarding food outlet classification and visit identification-can influence research results. Our research underscores the value of GPS data in place-based health studies while emphasizing the need for careful consideration of data coverage, representativeness, algorithmic choices, and the broader implications of study findings.
△ Less
Submitted 23 March, 2025;
originally announced March 2025.
-
UPCMR: A Universal Prompt-guided Model for Random Sampling Cardiac MRI Reconstruction
Authors:
Donghang Lyu,
Chinmay Rao,
Marius Staring,
Matthias J. P. van Osch,
Mariya Doneva,
Hildo J. Lamb,
Nicola Pezzotti
Abstract:
Cardiac magnetic resonance imaging (CMR) is vital for diagnosing heart diseases, but long scan time remains a major drawback. To address this, accelerated imaging techniques have been introduced by undersampling k-space, which reduces the quality of the resulting images. Recent deep learning advancements aim to speed up scanning while preserving quality, but adapting to various sampling modes and…
▽ More
Cardiac magnetic resonance imaging (CMR) is vital for diagnosing heart diseases, but long scan time remains a major drawback. To address this, accelerated imaging techniques have been introduced by undersampling k-space, which reduces the quality of the resulting images. Recent deep learning advancements aim to speed up scanning while preserving quality, but adapting to various sampling modes and undersampling factors remains challenging. Therefore, building a universal model is a promising direction. In this work, we introduce UPCMR, a universal unrolled model designed for CMR reconstruction. This model incorporates two kinds of learnable prompts, undersampling-specific prompt and spatial-specific prompt, and integrates them with a UNet structure in each block. Overall, by using the CMRxRecon2024 challenge dataset for training and validation, the UPCMR model highly enhances reconstructed image quality across all random sampling scenarios through an effective training strategy compared to some traditional methods, demonstrating strong adaptability potential for this task.
△ Less
Submitted 18 February, 2025;
originally announced February 2025.
-
Defining and Evaluating Visual Language Models' Basic Spatial Abilities: A Perspective from Psychometrics
Authors:
Wenrui Xu,
Dalin Lyu,
Weihang Wang,
Jie Feng,
Chen Gao,
Yong Li
Abstract:
The Theory of Multiple Intelligences underscores the hierarchical nature of cognitive capabilities. To advance Spatial Artificial Intelligence, we pioneer a psychometric framework defining five Basic Spatial Abilities (BSAs) in Visual Language Models (VLMs): Spatial Perception, Spatial Relation, Spatial Orientation, Mental Rotation, and Spatial Visualization. Benchmarking 13 mainstream VLMs throug…
▽ More
The Theory of Multiple Intelligences underscores the hierarchical nature of cognitive capabilities. To advance Spatial Artificial Intelligence, we pioneer a psychometric framework defining five Basic Spatial Abilities (BSAs) in Visual Language Models (VLMs): Spatial Perception, Spatial Relation, Spatial Orientation, Mental Rotation, and Spatial Visualization. Benchmarking 13 mainstream VLMs through nine validated psychometric experiments reveals significant gaps versus humans (average score 24.95 vs. 68.38), with three key findings: 1) VLMs mirror human hierarchies (strongest in 2D orientation, weakest in 3D rotation) with independent BSAs (Pearson's r<0.4); 2) Smaller models such as Qwen2-VL-7B surpass larger counterparts, with Qwen leading (30.82) and InternVL2 lagging (19.6); 3) Interventions like chain-of-thought (0.100 accuracy gain) and 5-shot training (0.259 improvement) show limits from architectural constraints. Identified barriers include weak geometry encoding and missing dynamic simulation. By linking psychometric BSAs to VLM capabilities, we provide a diagnostic toolkit for spatial intelligence evaluation, methodological foundations for embodied AI development, and a cognitive science-informed roadmap for achieving human-like spatial intelligence.
△ Less
Submitted 20 February, 2025; v1 submitted 17 February, 2025;
originally announced February 2025.
-
Efficient MedSAMs: Segment Anything in Medical Images on Laptop
Authors:
Jun Ma,
Feifei Li,
Sumin Kim,
Reza Asakereh,
Bao-Hiep Le,
Dang-Khoa Nguyen-Vu,
Alexander Pfefferle,
Muxin Wei,
Ruochen Gao,
Donghang Lyu,
Songxiao Yang,
Lennart Purucker,
Zdravko Marinov,
Marius Staring,
Haisheng Lu,
Thuy Thanh Dao,
Xincheng Ye,
Zhi Li,
Gianluca Brugnara,
Philipp Vollmuth,
Martha Foltyn-Dumitru,
Jaeyoung Cho,
Mustafa Ahmed Mahmutoglu,
Martin Bendszus,
Irada Pflüger
, et al. (57 additional authors not shown)
Abstract:
Promptable segmentation foundation models have emerged as a transformative approach to addressing the diverse needs in medical images, but most existing models require expensive computing, posing a big barrier to their adoption in clinical practice. In this work, we organized the first international competition dedicated to promptable medical image segmentation, featuring a large-scale dataset spa…
▽ More
Promptable segmentation foundation models have emerged as a transformative approach to addressing the diverse needs in medical images, but most existing models require expensive computing, posing a big barrier to their adoption in clinical practice. In this work, we organized the first international competition dedicated to promptable medical image segmentation, featuring a large-scale dataset spanning nine common imaging modalities from over 20 different institutions. The top teams developed lightweight segmentation foundation models and implemented an efficient inference pipeline that substantially reduced computational requirements while maintaining state-of-the-art segmentation accuracy. Moreover, the post-challenge phase advanced the algorithms through the design of performance booster and reproducibility tasks, resulting in improved algorithms and validated reproducibility of the winning solution. Furthermore, the best-performing algorithms have been incorporated into the open-source software with a user-friendly interface to facilitate clinical adoption. The data and code are publicly available to foster the further development of medical image segmentation foundation models and pave the way for impactful real-world applications.
△ Less
Submitted 20 December, 2024;
originally announced December 2024.
-
MCP-MedSAM: A Powerful Lightweight Medical Segment Anything Model Trained with a Single GPU in Just One Day
Authors:
Donghang Lyu,
Ruochen Gao,
Marius Staring
Abstract:
Medical image segmentation involves partitioning medical images into meaningful regions, with a focus on identifying anatomical structures and lesions. It has broad applications in healthcare, and deep learning methods have enabled significant advancements in automating this process. Recently, the introduction of the Segmentation Anything Model (SAM), the first foundation model for segmentation ta…
▽ More
Medical image segmentation involves partitioning medical images into meaningful regions, with a focus on identifying anatomical structures and lesions. It has broad applications in healthcare, and deep learning methods have enabled significant advancements in automating this process. Recently, the introduction of the Segmentation Anything Model (SAM), the first foundation model for segmentation task, has prompted researchers to adapt it for the medical domain to improve performance across various tasks. However, SAM's large model size and high GPU requirements hinder its scalability and development in the medical domain. In this work, we propose MCP-MedSAM, a powerful and lightweight medical SAM model designed to be trainable on a single A100 GPU with 40GB of memory within one day while delivering superior segmentation performance. Recognizing the significant internal differences between modalities and the need for direct segmentation target information within bounding boxes, we introduce two kinds of prompts: the modality prompt and the content prompt. After passing through the prompt encoder, their embedding representations can further improve the segmentation performance by incorporating more relevant information without adding significant training overhead. Additionally, we adopt an effective modality-based data sampling strategy to address data imbalance between modalities, ensuring more balanced performance across all modalities. Our method was trained and evaluated using a large-scale challenge dataset, compared to top-ranking methods on the challenge leaderboard, MCP-MedSAM achieved superior performance while requiring only one day of training on a single GPU. The code is publicly available at \textcolor{blue}{https://github.com/dong845/MCP-MedSAM}.}
△ Less
Submitted 14 May, 2025; v1 submitted 8 December, 2024;
originally announced December 2024.
-
Evaluation of uncertainty estimations for Gaussian process regression based machine learning interatomic potentials
Authors:
Matthias Holzenkamp,
Dongyu Lyu,
Ulrich Kleinekathöfer,
Peter Zaspel
Abstract:
Uncertainty estimations for machine learning interatomic potentials (MLIPs) are crucial for quantifying model error and identifying informative training samples in active learning strategies. In this study, we evaluate uncertainty estimations of Gaussian process regression (GPR)-based MLIPs, including the predictive GPR standard deviation and ensemble-based uncertainties. We do this in terms of ca…
▽ More
Uncertainty estimations for machine learning interatomic potentials (MLIPs) are crucial for quantifying model error and identifying informative training samples in active learning strategies. In this study, we evaluate uncertainty estimations of Gaussian process regression (GPR)-based MLIPs, including the predictive GPR standard deviation and ensemble-based uncertainties. We do this in terms of calibration and in terms of impact on model performance in an active learning scheme. We consider GPR models with Coulomb and Smooth Overlap of Atomic Positions (SOAP) representations as inputs to predict potential energy surfaces and excitation energies of molecules. Regarding calibration, we find that ensemble-based uncertainty estimations show already poor global calibration (e.g., averaged over the whole test set). In contrast, the GPR standard deviation shows good global calibration, but when grouping predictions by their uncertainty, we observe a systematical bias for predictions with high uncertainty. Although an increasing uncertainty correlates with an increasing bias, the bias is not captured quantitatively by the uncertainty. Therefore, the GPR standard deviation can be useful to identify predictions with a high bias and error but, without further knowledge, should not be interpreted as a quantitative measure for a potential error range. Selecting the samples with the highest GPR standard deviation from a fixed configuration space leads to a model that overemphasizes the borders of the configuration space represented in the fixed dataset. This may result in worse performance in more densely sampled areas but better generalization for extrapolation tasks.
△ Less
Submitted 9 January, 2025; v1 submitted 27 October, 2024;
originally announced October 2024.
-
AGSENet: A Robust Road Ponding Detection Method for Proactive Traffic Safety
Authors:
Ronghui Zhang,
Shangyu Yang,
Dakang Lyu,
Zihan Wang,
Junzhou Chen,
Yilong Ren,
Bolin Gao,
Zhihan Lv
Abstract:
Road ponding, a prevalent traffic hazard, poses a serious threat to road safety by causing vehicles to lose control and leading to accidents ranging from minor fender benders to severe collisions. Existing technologies struggle to accurately identify road ponding due to complex road textures and variable ponding coloration influenced by reflection characteristics. To address this challenge, we pro…
▽ More
Road ponding, a prevalent traffic hazard, poses a serious threat to road safety by causing vehicles to lose control and leading to accidents ranging from minor fender benders to severe collisions. Existing technologies struggle to accurately identify road ponding due to complex road textures and variable ponding coloration influenced by reflection characteristics. To address this challenge, we propose a novel approach called Self-Attention-based Global Saliency-Enhanced Network (AGSENet) for proactive road ponding detection and traffic safety improvement. AGSENet incorporates saliency detection techniques through the Channel Saliency Information Focus (CSIF) and Spatial Saliency Information Enhancement (SSIE) modules. The CSIF module, integrated into the encoder, employs self-attention to highlight similar features by fusing spatial and channel information. The SSIE module, embedded in the decoder, refines edge features and reduces noise by leveraging correlations across different feature levels. To ensure accurate and reliable evaluation, we corrected significant mislabeling and missing annotations in the Puddle-1000 dataset. Additionally, we constructed the Foggy-Puddle and Night-Puddle datasets for road ponding detection in low-light and foggy conditions, respectively. Experimental results demonstrate that AGSENet outperforms existing methods, achieving IoU improvements of 2.03\%, 0.62\%, and 1.06\% on the Puddle-1000, Foggy-Puddle, and Night-Puddle datasets, respectively, setting a new state-of-the-art in this field. Finally, we verified the algorithm's reliability on edge computing devices. This work provides a valuable reference for proactive warning research in road traffic safety.
△ Less
Submitted 22 October, 2024;
originally announced October 2024.
-
Tuning Frequency Bias of State Space Models
Authors:
Annan Yu,
Dongwei Lyu,
Soon Hoe Lim,
Michael W. Mahoney,
N. Benjamin Erichson
Abstract:
State space models (SSMs) leverage linear, time-invariant (LTI) systems to effectively learn sequences with long-range dependencies. By analyzing the transfer functions of LTI systems, we find that SSMs exhibit an implicit bias toward capturing low-frequency components more effectively than high-frequency ones. This behavior aligns with the broader notion of frequency bias in deep learning model t…
▽ More
State space models (SSMs) leverage linear, time-invariant (LTI) systems to effectively learn sequences with long-range dependencies. By analyzing the transfer functions of LTI systems, we find that SSMs exhibit an implicit bias toward capturing low-frequency components more effectively than high-frequency ones. This behavior aligns with the broader notion of frequency bias in deep learning model training. We show that the initialization of an SSM assigns it an innate frequency bias and that training the model in a conventional way does not alter this bias. Based on our theory, we propose two mechanisms to tune frequency bias: either by scaling the initialization to tune the inborn frequency bias; or by applying a Sobolev-norm-based filter to adjust the sensitivity of the gradients to high-frequency inputs, which allows us to change the frequency bias via training. Using an image-denoising task, we empirically show that we can strengthen, weaken, or even reverse the frequency bias using both mechanisms. By tuning the frequency bias, we can also improve SSMs' performance on learning long-range sequences, averaging an 88.26% accuracy on the Long-Range Arena (LRA) benchmark tasks.
△ Less
Submitted 2 October, 2024;
originally announced October 2024.
-
InsightPulse: An IoT-based System for User Experience Interview Analysis
Authors:
Dian Lyu,
Yuetong Lu,
Jassie He,
Murad Mehrab Abrar,
Ruijun Xie,
John Raiti
Abstract:
Conducting efficient and effective user experience (UX) interviews often poses challenges, such as maintaining focus on key topics and managing the duration of interviews and post-interview analyses. To address these issues, this paper introduces InsightPulse, an Internet of Things (IoT)-based hardware and software system designed to streamline and enhance the UX interview process through speech a…
▽ More
Conducting efficient and effective user experience (UX) interviews often poses challenges, such as maintaining focus on key topics and managing the duration of interviews and post-interview analyses. To address these issues, this paper introduces InsightPulse, an Internet of Things (IoT)-based hardware and software system designed to streamline and enhance the UX interview process through speech analysis and Artificial Intelligence. InsightPulse provides real-time support during user interviews by automatically identifying and highlighting key discussion points, proactively suggesting follow-up questions, and generating thematic summaries. These features enable more insightful discoveries and help to manage interview duration effectively. Additionally, the system features a robust backend analytics dashboard that simplifies the post-interview review process, thus facilitating the quick extraction of actionable insights and enhancing overall UX research efficiency.
△ Less
Submitted 23 September, 2024;
originally announced October 2024.
-
Swin-LiteMedSAM: A Lightweight Box-Based Segment Anything Model for Large-Scale Medical Image Datasets
Authors:
Ruochen Gao,
Donghang Lyu,
Marius Staring
Abstract:
Medical imaging is essential for the diagnosis and treatment of diseases, with medical image segmentation as a subtask receiving high attention. However, automatic medical image segmentation models are typically task-specific and struggle to handle multiple scenarios, such as different imaging modalities and regions of interest. With the introduction of the Segment Anything Model (SAM), training a…
▽ More
Medical imaging is essential for the diagnosis and treatment of diseases, with medical image segmentation as a subtask receiving high attention. However, automatic medical image segmentation models are typically task-specific and struggle to handle multiple scenarios, such as different imaging modalities and regions of interest. With the introduction of the Segment Anything Model (SAM), training a universal model for various clinical scenarios has become feasible. Recently, several Medical SAM (MedSAM) methods have been proposed, but these models often rely on heavy image encoders to achieve high performance, which may not be practical for real-world applications due to their high computational demands and slow inference speed. To address this issue, a lightweight version of the MedSAM (LiteMedSAM) can provide a viable solution, achieving high performance while requiring fewer resources and less time. In this work, we introduce Swin-LiteMedSAM, a new variant of LiteMedSAM. This model integrates the tiny Swin Transformer as the image encoder, incorporates multiple types of prompts, including box-based points and scribble generated from a given bounding box, and establishes skip connections between the image encoder and the mask decoder. In the \textit{Segment Anything in Medical Images on Laptop} challenge (CVPR 2024), our approach strikes a good balance between segmentation performance and speed, demonstrating significantly improved overall results across multiple modalities compared to the LiteMedSAM baseline provided by the challenge organizers. Our proposed model achieved a DSC score of \textbf{0.8678} and an NSD score of \textbf{0.8844} on the validation set. On the final test set, it attained a DSC score of \textbf{0.8193} and an NSD score of \textbf{0.8461}, securing fourth place in the challenge.
△ Less
Submitted 11 September, 2024;
originally announced September 2024.
-
Slicing Input Features to Accelerate Deep Learning: A Case Study with Graph Neural Networks
Authors:
Zhengjia Xu,
Dingyang Lyu,
Jinghui Zhang
Abstract:
As graphs grow larger, full-batch GNN training becomes hard for single GPU memory. Therefore, to enhance the scalability of GNN training, some studies have proposed sampling-based mini-batch training and distributed graph learning. However, these methods still have drawbacks, such as performance degradation and heavy communication. This paper introduces SliceGCN, a feature-sliced distributed large…
▽ More
As graphs grow larger, full-batch GNN training becomes hard for single GPU memory. Therefore, to enhance the scalability of GNN training, some studies have proposed sampling-based mini-batch training and distributed graph learning. However, these methods still have drawbacks, such as performance degradation and heavy communication. This paper introduces SliceGCN, a feature-sliced distributed large-scale graph learning method. SliceGCN slices the node features, with each computing device, i.e., GPU, handling partial features. After each GPU processes its share, partial representations are obtained and concatenated to form complete representations, enabling a single GPU's memory to handle the entire graph structure. This aims to avoid the accuracy loss typically associated with mini-batch training (due to incomplete graph structures) and to reduce inter-GPU communication during message passing (the forward propagation process of GNNs). To study and mitigate potential accuracy reductions due to slicing features, this paper proposes feature fusion and slice encoding. Experiments were conducted on six node classification datasets, yielding some interesting analytical results. These results indicate that while SliceGCN does not enhance efficiency on smaller datasets, it does improve efficiency on larger datasets. Additionally, we found that SliceGCN and its variants have better convergence, feature fusion and slice encoding can make training more stable, reduce accuracy fluctuations, and this study also discovered that the design of SliceGCN has a potentially parameter-efficient nature.
△ Less
Submitted 21 August, 2024;
originally announced August 2024.
-
BLADE: Benchmarking Language Model Agents for Data-Driven Science
Authors:
Ken Gu,
Ruoxi Shang,
Ruien Jiang,
Keying Kuang,
Richard-John Lin,
Donghe Lyu,
Yue Mao,
Youran Pan,
Teng Wu,
Jiaqian Yu,
Yikun Zhang,
Tianmai M. Zhang,
Lanyi Zhu,
Mike A. Merrill,
Jeffrey Heer,
Tim Althoff
Abstract:
Data-driven scientific discovery requires the iterative integration of scientific domain knowledge, statistical expertise, and an understanding of data semantics to make nuanced analytical decisions, e.g., about which variables, transformations, and statistical models to consider. LM-based agents equipped with planning, memory, and code execution capabilities have the potential to support data-dri…
▽ More
Data-driven scientific discovery requires the iterative integration of scientific domain knowledge, statistical expertise, and an understanding of data semantics to make nuanced analytical decisions, e.g., about which variables, transformations, and statistical models to consider. LM-based agents equipped with planning, memory, and code execution capabilities have the potential to support data-driven science. However, evaluating agents on such open-ended tasks is challenging due to multiple valid approaches, partially correct steps, and different ways to express the same decisions. To address these challenges, we present BLADE, a benchmark to automatically evaluate agents' multifaceted approaches to open-ended research questions. BLADE consists of 12 datasets and research questions drawn from existing scientific literature, with ground truth collected from independent analyses by expert data scientists and researchers. To automatically evaluate agent responses, we developed corresponding computational methods to match different representations of analyses to this ground truth. Though language models possess considerable world knowledge, our evaluation shows that they are often limited to basic analyses. However, agents capable of interacting with the underlying data demonstrate improved, but still non-optimal, diversity in their analytical decision making. Our work enables the evaluation of agents for data-driven science and provides researchers deeper insights into agents' analysis approaches.
△ Less
Submitted 20 August, 2024; v1 submitted 18 August, 2024;
originally announced August 2024.
-
Peer Review as A Multi-Turn and Long-Context Dialogue with Role-Based Interactions
Authors:
Cheng Tan,
Dongxin Lyu,
Siyuan Li,
Zhangyang Gao,
Jingxuan Wei,
Siqi Ma,
Zicheng Liu,
Stan Z. Li
Abstract:
Large Language Models (LLMs) have demonstrated wide-ranging applications across various fields and have shown significant potential in the academic peer-review process. However, existing applications are primarily limited to static review generation based on submitted papers, which fail to capture the dynamic and iterative nature of real-world peer reviews. In this paper, we reformulate the peer-r…
▽ More
Large Language Models (LLMs) have demonstrated wide-ranging applications across various fields and have shown significant potential in the academic peer-review process. However, existing applications are primarily limited to static review generation based on submitted papers, which fail to capture the dynamic and iterative nature of real-world peer reviews. In this paper, we reformulate the peer-review process as a multi-turn, long-context dialogue, incorporating distinct roles for authors, reviewers, and decision makers. We construct a comprehensive dataset containing over 26,841 papers with 92,017 reviews collected from multiple sources, including the top-tier conference and prestigious journal. This dataset is meticulously designed to facilitate the applications of LLMs for multi-turn dialogues, effectively simulating the complete peer-review process. Furthermore, we propose a series of metrics to evaluate the performance of LLMs for each role under this reformulated peer-review setting, ensuring fair and comprehensive evaluations. We believe this work provides a promising perspective on enhancing the LLM-driven peer-review process by incorporating dynamic, role-based interactions. It aligns closely with the iterative and interactive nature of real-world academic peer review, offering a robust foundation for future research and development in this area. We open-source the dataset at https://github.com/chengtan9907/ReviewMT.
△ Less
Submitted 9 June, 2024;
originally announced June 2024.
-
WaveCastNet: An AI-enabled Wavefield Forecasting Framework for Earthquake Early Warning
Authors:
Dongwei Lyu,
Rie Nakata,
Pu Ren,
Michael W. Mahoney,
Arben Pitarka,
Nori Nakata,
N. Benjamin Erichson
Abstract:
Large earthquakes can be destructive and quickly wreak havoc on a landscape. To mitigate immediate threats, early warning systems have been developed to alert residents, emergency responders, and critical infrastructure operators seconds to a minute before seismic waves arrive. These warnings provide time to take precautions and prevent damage. The success of these systems relies on fast, accurate…
▽ More
Large earthquakes can be destructive and quickly wreak havoc on a landscape. To mitigate immediate threats, early warning systems have been developed to alert residents, emergency responders, and critical infrastructure operators seconds to a minute before seismic waves arrive. These warnings provide time to take precautions and prevent damage. The success of these systems relies on fast, accurate predictions of ground motion intensities, which is challenging due to the complex physics of earthquakes, wave propagation, and their intricate spatial and temporal interactions. To improve early warning, we propose a novel AI-enabled framework, WaveCastNet, for forecasting ground motions from large earthquakes. WaveCastNet integrates a novel convolutional Long Expressive Memory (ConvLEM) model into a sequence to sequence (seq2seq) forecasting framework to model long-term dependencies and multi-scale patterns in both space and time. WaveCastNet, which shares weights across spatial and temporal dimensions, requires fewer parameters compared to more resource-intensive models like transformers and thus, in turn, reduces inference times. Importantly, WaveCastNet also generalizes better than transformer-based models to different seismic scenarios, including to more rare and critical situations with higher magnitude earthquakes. Our results using simulated data from the San Francisco Bay Area demonstrate the capability to rapidly predict the intensity and timing of destructive ground motions. Importantly, our proposed approach does not require estimating earthquake magnitudes and epicenters, which are prone to errors using conventional approaches; nor does it require empirical ground motion models, which fail to capture strongly heterogeneous wave propagation effects.
△ Less
Submitted 30 May, 2024;
originally announced May 2024.
-
Characterizing Regional Importance in Cities with Human Mobility Motifs in Metro Networks
Authors:
Shuyang Shi,
Ding Lyu,
Lin Wang,
Xiaofan Wang,
Guanrong Chen
Abstract:
Uncovering higher-order spatiotemporal dependencies within human mobility networks offers valuable insights into the analysis of urban structures. In most existing studies, human mobility networks are typically constructed by aggregating all trips without distinguishing who takes which specific trip. Instead, we claim individual mobility motifs, higher-order structures generated by daily trips of…
▽ More
Uncovering higher-order spatiotemporal dependencies within human mobility networks offers valuable insights into the analysis of urban structures. In most existing studies, human mobility networks are typically constructed by aggregating all trips without distinguishing who takes which specific trip. Instead, we claim individual mobility motifs, higher-order structures generated by daily trips of people, as fundamental units of human mobility networks. In this paper, we propose two network construction frameworks at the level of mobility motifs in characterizing regional importance in cities. Firstly, we enhance the structural dependencies within mobility motifs and proceed to construct mobility networks based on the enhanced mobility motifs. Secondly, taking inspiration from PageRank, we speculate that people would allocate values of importance to destinations according to their trip intentions. A motif-wise network construction framework is proposed based on the established mechanism. Leveraging large-scale metro data across cities, we construct three types of human mobility networks and characterize the regional importance by node importance indicators. Our comparison results suggest that the motif-based mobility network outperforms the classic mobility network, thus highlighting the efficacy of the introduced human mobility motifs. Finally, we demonstrate that the performance in characterizing the regional importance is significantly improved by our motif-wise framework.
△ Less
Submitted 7 May, 2024;
originally announced May 2024.
-
DEFA: Efficient Deformable Attention Acceleration via Pruning-Assisted Grid-Sampling and Multi-Scale Parallel Processing
Authors:
Yansong Xu,
Dongxu Lyu,
Zhenyu Li,
Zilong Wang,
Yuzhou Chen,
Gang Wang,
Zhican Wang,
Haomin Li,
Guanghui He
Abstract:
Multi-scale deformable attention (MSDeformAttn) has emerged as a key mechanism in various vision tasks, demonstrating explicit superiority attributed to multi-scale grid-sampling. However, this newly introduced operator incurs irregular data access and enormous memory requirement, leading to severe PE underutilization. Meanwhile, existing approaches for attention acceleration cannot be directly ap…
▽ More
Multi-scale deformable attention (MSDeformAttn) has emerged as a key mechanism in various vision tasks, demonstrating explicit superiority attributed to multi-scale grid-sampling. However, this newly introduced operator incurs irregular data access and enormous memory requirement, leading to severe PE underutilization. Meanwhile, existing approaches for attention acceleration cannot be directly applied to MSDeformAttn due to lack of support for this distinct procedure. Therefore, we propose a dedicated algorithm-architecture co-design dubbed DEFA, the first-of-its-kind method for MSDeformAttn acceleration. At the algorithm level, DEFA adopts frequency-weighted pruning and probability-aware pruning for feature maps and sampling points respectively, alleviating the memory footprint by over 80%. At the architecture level, it explores the multi-scale parallelism to boost the throughput significantly and further reduces the memory access via fine-grained layer fusion and feature map reusing. Extensively evaluated on representative benchmarks, DEFA achieves 10.1-31.9x speedup and 20.3-37.7x energy efficiency boost compared to powerful GPUs. It also rivals the related accelerators by 2.2-3.7x energy efficiency improvement while providing pioneering support for MSDeformAttn.
△ Less
Submitted 16 March, 2024;
originally announced March 2024.
-
A Capability Maturity Model for Urban Dataset Meta-data
Authors:
Mark S. Fox,
Bart Gajderowicz,
Dishu Lyu
Abstract:
In the current environment of data generation and publication, there is an ever-growing number of datasets available for download. This growth precipitates an existing challenge: sourcing and integrating relevant datasets for analysis is becoming more complex. Despite efforts by open data platforms, obstacles remain, predominantly rooted in inadequate metadata, unsuitable data presentation, compli…
▽ More
In the current environment of data generation and publication, there is an ever-growing number of datasets available for download. This growth precipitates an existing challenge: sourcing and integrating relevant datasets for analysis is becoming more complex. Despite efforts by open data platforms, obstacles remain, predominantly rooted in inadequate metadata, unsuitable data presentation, complications in pinpointing desired data, and data integration. This paper delves into the intricacies of dataset retrieval, emphasizing the pivotal role of metadata in aligning datasets with user queries. Through an exploration of existing literature, it underscores prevailing issues such as the identification of valuable metadata and the development of tools to maintain and annotate them effectively. The central contribution of this research is the proposition of a dataset metadata maturity model. Deriving inspiration from software engineering maturity models, this framework delineates a progression from rudimentary metadata documentation to advanced levels, aiding dataset creators in their documentation efforts. The model encompasses seven pivotal dimensions, spanning content to quality information, each stratified across five maturity levels to guide the optimal documentation of datasets, ensuring ease of discovery, relevance assessment, and comprehensive dataset understanding. This paper also incorporates the maturity model into a data cataloguing tool called CKAN through a custom plugin, CKANext-udc. The plugin introduces custom fields based on different maturity levels, allows for user interface customisation, and integrates with a graph database, converting catalogue data into a knowledge graph based on the Maturity Model ontology.
△ Less
Submitted 11 October, 2024; v1 submitted 7 February, 2024;
originally announced February 2024.
-
Causal Structure Learning Supervised by Large Language Model
Authors:
Taiyu Ban,
Lyuzhou Chen,
Derui Lyu,
Xiangyu Wang,
Huanhuan Chen
Abstract:
Causal discovery from observational data is pivotal for deciphering complex relationships. Causal Structure Learning (CSL), which focuses on deriving causal Directed Acyclic Graphs (DAGs) from data, faces challenges due to vast DAG spaces and data sparsity. The integration of Large Language Models (LLMs), recognized for their causal reasoning capabilities, offers a promising direction to enhance C…
▽ More
Causal discovery from observational data is pivotal for deciphering complex relationships. Causal Structure Learning (CSL), which focuses on deriving causal Directed Acyclic Graphs (DAGs) from data, faces challenges due to vast DAG spaces and data sparsity. The integration of Large Language Models (LLMs), recognized for their causal reasoning capabilities, offers a promising direction to enhance CSL by infusing it with knowledge-based causal inferences. However, existing approaches utilizing LLMs for CSL have encountered issues, including unreliable constraints from imperfect LLM inferences and the computational intensity of full pairwise variable analyses. In response, we introduce the Iterative LLM Supervised CSL (ILS-CSL) framework. ILS-CSL innovatively integrates LLM-based causal inference with CSL in an iterative process, refining the causal DAG using feedback from LLMs. This method not only utilizes LLM resources more efficiently but also generates more robust and high-quality structural constraints compared to previous methodologies. Our comprehensive evaluation across eight real-world datasets demonstrates ILS-CSL's superior performance, setting a new standard in CSL efficacy and showcasing its potential to significantly advance the field of causal discovery. The codes are available at \url{https://github.com/tyMadara/ILS-CSL}.
△ Less
Submitted 20 November, 2023;
originally announced November 2023.
-
SpOctA: A 3D Sparse Convolution Accelerator with Octree-Encoding-Based Map Search and Inherent Sparsity-Aware Processing
Authors:
Dongxu Lyu,
Zhenyu Li,
Yuzhou Chen,
Jinming Zhang,
Ningyi Xu,
Guanghui He
Abstract:
Point-cloud-based 3D perception has attracted great attention in various applications including robotics, autonomous driving and AR/VR. In particular, the 3D sparse convolution (SpConv) network has emerged as one of the most popular backbones due to its excellent performance. However, it poses severe challenges to real-time perception on general-purpose platforms, such as lengthy map search latenc…
▽ More
Point-cloud-based 3D perception has attracted great attention in various applications including robotics, autonomous driving and AR/VR. In particular, the 3D sparse convolution (SpConv) network has emerged as one of the most popular backbones due to its excellent performance. However, it poses severe challenges to real-time perception on general-purpose platforms, such as lengthy map search latency, high computation cost, and enormous memory footprint. In this paper, we propose SpOctA, a SpConv accelerator that enables high-speed and energy-efficient point cloud processing. SpOctA parallelizes the map search by utilizing algorithm-architecture co-optimization based on octree encoding, thereby achieving 8.8-21.2x search speedup. It also attenuates the heavy computational workload by exploiting inherent sparsity of each voxel, which eliminates computation redundancy and saves 44.4-79.1% processing latency. To optimize on-chip memory management, a SpConv-oriented non-uniform caching strategy is introduced to reduce external memory access energy by 57.6% on average. Implemented on a 40nm technology and extensively evaluated on representative benchmarks, SpOctA rivals the state-of-the-art SpConv accelerators by 1.1-6.9x speedup with 1.5-3.1x energy efficiency improvement.
△ Less
Submitted 17 August, 2023;
originally announced August 2023.
-
Mitigating Prior Errors in Causal Structure Learning: Towards LLM driven Prior Knowledge
Authors:
Lyuzhou Chen,
Taiyu Ban,
Xiangyu Wang,
Derui Lyu,
Huanhuan Chen
Abstract:
Causal structure learning, a prominent technique for encoding cause and effect relationships among variables, through Bayesian Networks (BNs). Merely recovering causal structures from real-world observed data lacks precision, while the development of Large Language Models (LLM) is opening a new frontier of causality. LLM presents strong capability in discovering causal relationships between variab…
▽ More
Causal structure learning, a prominent technique for encoding cause and effect relationships among variables, through Bayesian Networks (BNs). Merely recovering causal structures from real-world observed data lacks precision, while the development of Large Language Models (LLM) is opening a new frontier of causality. LLM presents strong capability in discovering causal relationships between variables with the "text" inputs defining the investigated variables, leading to a potential new hierarchy and new ladder of causality. We aim an critical issue in the emerging topic of LLM based causal structure learning, to tackle erroneous prior causal statements from LLM, which is seldom considered in the current context of expert dominating prior resources. As a pioneer attempt, we propose a BN learning strategy resilient to prior errors without need of human intervention. Focusing on the edge-level prior, we classify the possible prior errors into three types: order-consistent, order-reversed, and irrelevant, and provide their theoretical impact on the Structural Hamming Distance (SHD) under the presumption of sufficient data. Intriguingly, we discover and prove that only the order-reversed error contributes to an increase in a unique acyclic closed structure, defined as a "quasi-circle". Leveraging this insight, a post-hoc strategy is employed to identify the order-reversed prior error by its impact on the increment of "quasi-circles". Through empirical evaluation on both real and synthetic datasets, we demonstrate our strategy's robustness against prior errors. Specifically, we highlight its substantial ability to resist order-reversed errors while maintaining the majority of correct prior knowledge.
△ Less
Submitted 12 June, 2023;
originally announced June 2023.
-
AutoRepair: Automated Repair for AI-Enabled Cyber-Physical Systems under Safety-Critical Conditions
Authors:
Deyun Lyu,
Jiayang Song,
Zhenya Zhang,
Zhijie Wang,
Tianyi Zhang,
Lei Ma,
Jianjun Zhao
Abstract:
Cyber-Physical Systems (CPS) have been widely deployed in safety-critical domains such as transportation, power and energy. Recently, there comes an increasing demand in employing deep neural networks (DNNs) in CPS for more intelligent control and decision making in sophisticated industrial safety-critical conditions, giving birth to the class of DNN controllers. However, due to the inherent uncer…
▽ More
Cyber-Physical Systems (CPS) have been widely deployed in safety-critical domains such as transportation, power and energy. Recently, there comes an increasing demand in employing deep neural networks (DNNs) in CPS for more intelligent control and decision making in sophisticated industrial safety-critical conditions, giving birth to the class of DNN controllers. However, due to the inherent uncertainty and opaqueness of DNNs, concerns about the safety of DNN-enabled CPS are also surging. In this work, we propose an automated framework named AutoRepair that, given a safety requirement, identifies unsafe control behavior in a DNN controller and repairs them through an optimization-based method. Having an unsafe signal of system execution, AutoRepair iteratively explores the control decision space and searches for the optimal corrections for the DNN controller in order to satisfy the safety requirements. We conduct a comprehensive evaluation of AutoRepair on 6 instances of industry-level DNN-enabled CPS from different safety-critical domains. Evaluation results show that AutoRepair successfully repairs critical safety issues in the DNN controllers, and significantly improves the reliability of CPS.
△ Less
Submitted 12 April, 2023;
originally announced April 2023.
-
RVE Analysis in LS-DYNA for High-fidelity Multiscale Material Modeling
Authors:
Haoyan Wei,
Dandan Lyu,
Wei Hu,
C. T. Wu
Abstract:
In modern engineering designs, advanced materials (e.g., fiber/particle-reinforced polymers, metallic alloys, laminar composites, etc.) are widely used, where microscale heterogeneities such as grains, inclusions, voids, micro-cracks, and interfaces significantly affect the macroscopic constitutive behaviors. Obviously, an accurate description of the multiscale material behaviors is of great impor…
▽ More
In modern engineering designs, advanced materials (e.g., fiber/particle-reinforced polymers, metallic alloys, laminar composites, etc.) are widely used, where microscale heterogeneities such as grains, inclusions, voids, micro-cracks, and interfaces significantly affect the macroscopic constitutive behaviors. Obviously, an accurate description of the multiscale material behaviors is of great importance to the success of material design and structural analysis. The Representative Volume Element (RVE) analysis method provides a rigorous means to obtain homogenized macroscopic material properties at the upper length scale from the properties of the material constituents and structures at a lower length scale. Recently, we have developed an RVE module (keyword: *RVE_ANALYSIS_FEM) in the multiphysics simulation software LS-DYNA to enable high-fidelity virtual testing of numerically re-constructed material samples at user-specified characteristic length scales. In this article, a brief introduction to this new feature will be given.
△ Less
Submitted 21 October, 2022;
originally announced October 2022.
-
Fine-Grained Population Mobility Data-Based Community-Level COVID-19 Prediction Model
Authors:
Pengyue Jia,
Ling Chen,
Dandan Lyu
Abstract:
Predicting the number of infections in the anti-epidemic process is extremely beneficial to the government in developing anti-epidemic strategies, especially in fine-grained geographic units. Previous works focus on low spatial resolution prediction, e.g., county-level, and preprocess data to the same geographic level, which loses some useful information. In this paper, we propose a fine-grained p…
▽ More
Predicting the number of infections in the anti-epidemic process is extremely beneficial to the government in developing anti-epidemic strategies, especially in fine-grained geographic units. Previous works focus on low spatial resolution prediction, e.g., county-level, and preprocess data to the same geographic level, which loses some useful information. In this paper, we propose a fine-grained population mobility data-based model (FGC-COVID) utilizing data of two geographic levels for community-level COVID-19 prediction. We use the population mobility data between Census Block Groups (CBGs), which is a finer-grained geographic level than community, to build the graph and capture the dependencies between CBGs using graph neural networks (GNNs). To mine as finer-grained patterns as possible for prediction, a spatial weighted aggregation module is introduced to aggregate the embeddings of CBGs to community level based on their geographic affiliation and spatial autocorrelation. Extensive experiments on 300 days LA city COVID-19 data indicate our model outperforms existing forecasting models on community-level COVID-19 prediction.
△ Less
Submitted 15 July, 2022; v1 submitted 13 February, 2022;
originally announced February 2022.
-
PRIMA: Planner-Reasoner Inside a Multi-task Reasoning Agent
Authors:
Daoming Lyu,
Bo Liu,
Jianshu Chen
Abstract:
We consider the problem of multi-task reasoning (MTR), where an agent can solve multiple tasks via (first-order) logic reasoning. This capability is essential for human-like intelligence due to its strong generalizability and simplicity for handling multiple tasks. However, a major challenge in developing effective MTR is the intrinsic conflict between reasoning capability and efficiency. An MTR-c…
▽ More
We consider the problem of multi-task reasoning (MTR), where an agent can solve multiple tasks via (first-order) logic reasoning. This capability is essential for human-like intelligence due to its strong generalizability and simplicity for handling multiple tasks. However, a major challenge in developing effective MTR is the intrinsic conflict between reasoning capability and efficiency. An MTR-capable agent must master a large set of "skills" to tackle diverse tasks, but executing a particular task at the inference stage requires only a small subset of immediately relevant skills. How can we maintain broad reasoning capability and also efficient specific-task performance? To address this problem, we propose a Planner-Reasoner framework capable of state-of-the-art MTR capability and high efficiency. The Reasoner models shareable (first-order) logic deduction rules, from which the Planner selects a subset to compose into efficient reasoning paths. The entire model is trained in an end-to-end manner using deep reinforcement learning, and experimental studies over a variety of domains validate its effectiveness.
△ Less
Submitted 12 February, 2022; v1 submitted 1 February, 2022;
originally announced February 2022.
-
STOPS: Short-Term-based Volatility-controlled Policy Search and its Global Convergence
Authors:
Liangliang Xu,
Daoming Lyu,
Yangchen Pan,
Aiwen Jiang,
Bo Liu
Abstract:
It remains challenging to deploy existing risk-averse approaches to real-world applications. The reasons are multi-fold, including the lack of global optimality guarantee and the necessity of learning from long-term consecutive trajectories. Long-term consecutive trajectories are prone to involving visiting hazardous states, which is a major concern in the risk-averse setting. This paper proposes…
▽ More
It remains challenging to deploy existing risk-averse approaches to real-world applications. The reasons are multi-fold, including the lack of global optimality guarantee and the necessity of learning from long-term consecutive trajectories. Long-term consecutive trajectories are prone to involving visiting hazardous states, which is a major concern in the risk-averse setting. This paper proposes Short-Term VOlatility-controlled Policy Search (STOPS), a novel algorithm that solves risk-averse problems by learning from short-term trajectories instead of long-term trajectories. Short-term trajectories are more flexible to generate, and can avoid the danger of hazardous state visitations. By using an actor-critic scheme with an overparameterized two-layer neural network, our algorithm finds a globally optimal policy at a sublinear rate with proximal policy optimization and natural policy gradient, with effectiveness comparable to the state-of-the-art convergence rate of risk-neutral policy-search methods. The algorithm is evaluated on challenging Mujoco robot simulation tasks under the mean-variance evaluation metric. Both theoretical analysis and experimental results demonstrate a state-of-the-art level of STOPS' performance among existing risk-averse policy search methods.
△ Less
Submitted 22 July, 2022; v1 submitted 24 January, 2022;
originally announced January 2022.
-
When Cyber-Physical Systems Meet AI: A Benchmark, an Evaluation, and a Way Forward
Authors:
Jiayang Song,
Deyun Lyu,
Zhenya Zhang,
Zhijie Wang,
Tianyi Zhang,
Lei Ma
Abstract:
Cyber-physical systems (CPS) have been broadly deployed in safety-critical domains, such as automotive systems, avionics, medical devices, etc. In recent years, Artificial Intelligence (AI) has been increasingly adopted to control CPS. Despite the popularity of AI-enabled CPS, few benchmarks are publicly available. There is also a lack of deep understanding on the performance and reliability of AI…
▽ More
Cyber-physical systems (CPS) have been broadly deployed in safety-critical domains, such as automotive systems, avionics, medical devices, etc. In recent years, Artificial Intelligence (AI) has been increasingly adopted to control CPS. Despite the popularity of AI-enabled CPS, few benchmarks are publicly available. There is also a lack of deep understanding on the performance and reliability of AI-enabled CPS across different industrial domains. To bridge this gap, we initiate to create a public benchmark of industry-level CPS in seven domains and build AI controllers for them via state-of-the-art deep reinforcement learning (DRL) methods. Based on that, we further perform a systematic evaluation of these AI-enabled systems with their traditional counterparts to identify the current challenges and explore future opportunities. Our key findings include (1) AI controllers do not always outperform traditional controllers, (2) existing CPS testing techniques (falsification, specifically) fall short of analyzing AI-enabled CPS, and (3) building a hybrid system that strategically combines and switches between AI controllers and traditional controllers can achieve better performance across different domains. Our results highlight the need for new testing techniques for AI-enabled CPS and the need for more investigations into hybrid CPS systems to achieve optimal performance and reliability.
△ Less
Submitted 19 April, 2022; v1 submitted 8 November, 2021;
originally announced November 2021.
-
TME-BNA: Temporal Motif-Preserving Network Embedding with Bicomponent Neighbor Aggregation
Authors:
Ling Chen,
Da Wang,
Dandan Lyu,
Xing Tang,
Hongyu Shi
Abstract:
Evolving temporal networks serve as the abstractions of many real-life dynamic systems, e.g., social network and e-commerce. The purpose of temporal network embedding is to map each node to a time-evolving low-dimension vector for downstream tasks, e.g., link prediction and node classification. The difficulty of temporal network embedding lies in how to utilize the topology and time information jo…
▽ More
Evolving temporal networks serve as the abstractions of many real-life dynamic systems, e.g., social network and e-commerce. The purpose of temporal network embedding is to map each node to a time-evolving low-dimension vector for downstream tasks, e.g., link prediction and node classification. The difficulty of temporal network embedding lies in how to utilize the topology and time information jointly to capture the evolution of a temporal network. In response to this challenge, we propose a temporal motif-preserving network embedding method with bicomponent neighbor aggregation, named TME-BNA. Considering that temporal motifs are essential to the understanding of topology laws and functional properties of a temporal network, TME-BNA constructs additional edge features based on temporal motifs to explicitly utilize complex topology with time information. In order to capture the topology dynamics of nodes, TME-BNA utilizes Graph Neural Networks (GNNs) to aggregate the historical and current neighbors respectively according to the timestamps of connected edges. Experiments are conducted on three public temporal network datasets, and the results show the effectiveness of TME-BNA.
△ Less
Submitted 26 October, 2021;
originally announced October 2021.
-
Multi-Relation Aware Temporal Interaction Network Embedding
Authors:
Ling Chen,
Shanshan Yu,
Dandan Lyu,
Da Wang
Abstract:
Temporal interaction networks are formed in many fields, e.g., e-commerce, online education, and social network service. Temporal interaction network embedding can effectively mine the information in temporal interaction networks, which is of great significance to the above fields. Usually, the occurrence of an interaction affects not only the nodes directly involved in the interaction (interactin…
▽ More
Temporal interaction networks are formed in many fields, e.g., e-commerce, online education, and social network service. Temporal interaction network embedding can effectively mine the information in temporal interaction networks, which is of great significance to the above fields. Usually, the occurrence of an interaction affects not only the nodes directly involved in the interaction (interacting nodes), but also the neighbor nodes of interacting nodes. However, existing temporal interaction network embedding methods only use historical interaction relations to mine neighbor nodes, ignoring other relation types. In this paper, we propose a multi-relation aware temporal interaction network embedding method (MRATE). Based on historical interactions, MRATE mines historical interaction relations, common interaction relations, and interaction sequence similarity relations to obtain the neighbor based embeddings of interacting nodes. The hierarchical multi-relation aware aggregation method in MRATE first employs graph attention networks (GATs) to aggregate the interaction impacts propagated through a same relation type and then combines the aggregated interaction impacts from multiple relation types through the self-attention mechanism. Experiments are conducted on three public temporal interaction network datasets, and the experimental results show the effectiveness of MRATE.
△ Less
Submitted 9 October, 2021;
originally announced October 2021.
-
Investigating and Modeling the Dynamics of Long Ties
Authors:
Ding Lyu,
Yuan Yuan,
Lin Wang,
Xiaofan Wang,
Alex Pentland
Abstract:
Long ties, the social ties that bridge different communities, are widely believed to play crucial roles in spreading novel information in social networks. However, some existing network theories and prediction models indicate that long ties might dissolve quickly or eventually become redundant, thus putting into question the long-term value of long ties. Our empirical analysis of real-world dynami…
▽ More
Long ties, the social ties that bridge different communities, are widely believed to play crucial roles in spreading novel information in social networks. However, some existing network theories and prediction models indicate that long ties might dissolve quickly or eventually become redundant, thus putting into question the long-term value of long ties. Our empirical analysis of real-world dynamic networks shows that contrary to such reasoning, long ties are more likely to persist than other social ties, and that many of them constantly function as social bridges without being embedded in local networks. Using a novel cost-benefit analysis model combined with machine learning, we show that long ties are highly beneficial, which instinctively motivates people to expend extra effort to maintain them. This partly explains why long ties are more persistent than what has been suggested by many existing theories and models. Overall, our study suggests the need for social interventions that can promote the formation of long ties, such as mixing people with diverse backgrounds.
△ Less
Submitted 2 April, 2022; v1 submitted 22 September, 2021;
originally announced September 2021.
-
Multi-Level Visual Similarity Based Personalized Tourist Attraction Recommendation Using Geo-Tagged Photos
Authors:
Ling Chen,
Dandan Lyu,
Shanshan Yu,
Gencai Chen
Abstract:
Geo-tagged photo based tourist attraction recommendation can discover users' travel preferences from their taken photos, so as to recommend suitable tourist attractions to them. However, existing visual content based methods cannot fully exploit the user and tourist attraction information of photos to extract visual features, and do not differentiate the significances of different photos. In this…
▽ More
Geo-tagged photo based tourist attraction recommendation can discover users' travel preferences from their taken photos, so as to recommend suitable tourist attractions to them. However, existing visual content based methods cannot fully exploit the user and tourist attraction information of photos to extract visual features, and do not differentiate the significances of different photos. In this paper, we propose multi-level visual similarity based personalized tourist attraction recommendation using geo-tagged photos (MEAL). MEAL utilizes the visual contents of photos and interaction behavior data to obtain the final embeddings of users and tourist attractions, which are then used to predict the visit probabilities. Specifically, by crossing the user and tourist attraction information of photos, we define four visual similarity levels and introduce a corresponding quintuplet loss to embed the visual contents of photos. In addition, to capture the significances of different photos, we exploit the self-attention mechanism to obtain the visual representations of users and tourist attractions. We conducted experiments on a dataset crawled from Flickr, and the experimental results proved the advantage of this method.
△ Less
Submitted 27 January, 2023; v1 submitted 16 September, 2021;
originally announced September 2021.
-
TDM: Trustworthy Decision-Making via Interpretability Enhancement
Authors:
Daoming Lyu,
Fangkai Yang,
Hugh Kwon,
Wen Dong,
Levent Yilmaz,
Bo Liu
Abstract:
Human-robot interactive decision-making is increasingly becoming ubiquitous, and trust is an influential factor in determining the reliance on autonomy. However, it is not reasonable to trust systems that are beyond our comprehension, and typical machine learning and data-driven decision-making are black-box paradigms that impede interpretability. Therefore, it is critical to establish computation…
▽ More
Human-robot interactive decision-making is increasingly becoming ubiquitous, and trust is an influential factor in determining the reliance on autonomy. However, it is not reasonable to trust systems that are beyond our comprehension, and typical machine learning and data-driven decision-making are black-box paradigms that impede interpretability. Therefore, it is critical to establish computational trustworthy decision-making mechanisms enhanced by interpretability-aware strategies. To this end, we propose a Trustworthy Decision-Making (TDM) framework, which integrates symbolic planning into sequential decision-making. The framework learns interpretable subtasks that result in a complex, higher-level composite task that can be formally evaluated using the proposed trust metric. TDM enables the subtask-level interpretability by design and converges to an optimal symbolic plan from the learned subtasks. Moreover, a TDM-based algorithm is introduced to demonstrate the unification of symbolic planning with other sequential-decision making algorithms, reaping the benefits of both. Experimental results validate the effectiveness of trust-score-based planning while improving the interpretability of subtasks.
△ Less
Submitted 13 August, 2021;
originally announced August 2021.
-
APAN: Asynchronous Propagation Attention Network for Real-time Temporal Graph Embedding
Authors:
Xuhong Wang,
Ding Lyu,
Mengjian Li,
Yang Xia,
Qi Yang,
Xinwen Wang,
Xinguang Wang,
Ping Cui,
Yupu Yang,
Bowen Sun,
Zhenyu Guo
Abstract:
Limited by the time complexity of querying k-hop neighbors in a graph database, most graph algorithms cannot be deployed online and execute millisecond-level inference. This problem dramatically limits the potential of applying graph algorithms in certain areas, such as financial fraud detection. Therefore, we propose Asynchronous Propagation Attention Network, an asynchronous continuous time dyna…
▽ More
Limited by the time complexity of querying k-hop neighbors in a graph database, most graph algorithms cannot be deployed online and execute millisecond-level inference. This problem dramatically limits the potential of applying graph algorithms in certain areas, such as financial fraud detection. Therefore, we propose Asynchronous Propagation Attention Network, an asynchronous continuous time dynamic graph algorithm for real-time temporal graph embedding. Traditional graph models usually execute two serial operations: first graph computation and then model inference. We decouple model inference and graph computation step so that the heavy graph query operations will not damage the speed of model inference. Extensive experiments demonstrate that the proposed method can achieve competitive performance and 8.7 times inference speed improvement in the meantime.
△ Less
Submitted 26 March, 2021; v1 submitted 23 November, 2020;
originally announced November 2020.
-
Variance-Reduced Off-Policy Memory-Efficient Policy Search
Authors:
Daoming Lyu,
Qi Qi,
Mohammad Ghavamzadeh,
Hengshuai Yao,
Tianbao Yang,
Bo Liu
Abstract:
Off-policy policy optimization is a challenging problem in reinforcement learning (RL). The algorithms designed for this problem often suffer from high variance in their estimators, which results in poor sample efficiency, and have issues with convergence. A few variance-reduced on-policy policy gradient algorithms have been recently proposed that use methods from stochastic optimization to reduce…
▽ More
Off-policy policy optimization is a challenging problem in reinforcement learning (RL). The algorithms designed for this problem often suffer from high variance in their estimators, which results in poor sample efficiency, and have issues with convergence. A few variance-reduced on-policy policy gradient algorithms have been recently proposed that use methods from stochastic optimization to reduce the variance of the gradient estimate in the REINFORCE algorithm. However, these algorithms are not designed for the off-policy setting and are memory-inefficient, since they need to collect and store a large ``reference'' batch of samples from time to time. To achieve variance-reduced off-policy-stable policy optimization, we propose an algorithm family that is memory-efficient, stochastically variance-reduced, and capable of learning from off-policy samples. Empirical studies validate the effectiveness of the proposed approaches.
△ Less
Submitted 14 September, 2020;
originally announced September 2020.
-
ATG-PVD: Ticketing Parking Violations on A Drone
Authors:
Hengli Wang,
Yuxuan Liu,
Huaiyang Huang,
Yuheng Pan,
Wenbin Yu,
Jialin Jiang,
Dianbin Lyu,
Mohammud J. Bocus,
Ming Liu,
Ioannis Pitas,
Rui Fan
Abstract:
In this paper, we introduce a novel suspect-and-investigate framework, which can be easily embedded in a drone for automated parking violation detection (PVD). Our proposed framework consists of: 1) SwiftFlow, an efficient and accurate convolutional neural network (CNN) for unsupervised optical flow estimation; 2) Flow-RCNN, a flow-guided CNN for car detection and classification; and 3) an illegal…
▽ More
In this paper, we introduce a novel suspect-and-investigate framework, which can be easily embedded in a drone for automated parking violation detection (PVD). Our proposed framework consists of: 1) SwiftFlow, an efficient and accurate convolutional neural network (CNN) for unsupervised optical flow estimation; 2) Flow-RCNN, a flow-guided CNN for car detection and classification; and 3) an illegally parked car (IPC) candidate investigation module developed based on visual SLAM. The proposed framework was successfully embedded in a drone from ATG Robotics. The experimental results demonstrate that, firstly, our proposed SwiftFlow outperforms all other state-of-the-art unsupervised optical flow estimation approaches in terms of both speed and accuracy; secondly, IPC candidates can be effectively and efficiently detected by our proposed Flow-RCNN, with a better performance than our baseline network, Faster-RCNN; finally, the actual IPCs can be successfully verified by our investigation module after drone re-localization.
△ Less
Submitted 21 August, 2020;
originally announced August 2020.
-
Stable and Efficient Policy Evaluation
Authors:
Daoming Lyu,
Bo Liu,
Matthieu Geist,
Wen Dong,
Saad Biaz,
Qi Wang
Abstract:
Policy evaluation algorithms are essential to reinforcement learning due to their ability to predict the performance of a policy. However, there are two long-standing issues lying in this prediction problem that need to be tackled: off-policy stability and on-policy efficiency. The conventional temporal difference (TD) algorithm is known to perform very well in the on-policy setting, yet is not of…
▽ More
Policy evaluation algorithms are essential to reinforcement learning due to their ability to predict the performance of a policy. However, there are two long-standing issues lying in this prediction problem that need to be tackled: off-policy stability and on-policy efficiency. The conventional temporal difference (TD) algorithm is known to perform very well in the on-policy setting, yet is not off-policy stable. On the other hand, the gradient TD and emphatic TD algorithms are off-policy stable, but are not on-policy efficient. This paper introduces novel algorithms that are both off-policy stable and on-policy efficient by using the oblique projection method. The empirical experimental results on various domains validate the effectiveness of the proposed approach.
△ Less
Submitted 27 December, 2021; v1 submitted 6 June, 2020;
originally announced June 2020.
-
A Human-Centered Data-Driven Planner-Actor-Critic Architecture via Logic Programming
Authors:
Daoming Lyu,
Fangkai Yang,
Bo Liu,
Steven Gustafson
Abstract:
Recent successes of Reinforcement Learning (RL) allow an agent to learn policies that surpass human experts but suffers from being time-hungry and data-hungry. By contrast, human learning is significantly faster because prior and general knowledge and multiple information resources are utilized. In this paper, we propose a Planner-Actor-Critic architecture for huMAN-centered planning and learning…
▽ More
Recent successes of Reinforcement Learning (RL) allow an agent to learn policies that surpass human experts but suffers from being time-hungry and data-hungry. By contrast, human learning is significantly faster because prior and general knowledge and multiple information resources are utilized. In this paper, we propose a Planner-Actor-Critic architecture for huMAN-centered planning and learning (PACMAN), where an agent uses its prior, high-level, deterministic symbolic knowledge to plan for goal-directed actions, and also integrates the Actor-Critic algorithm of RL to fine-tune its behavior towards both environmental rewards and human feedback. This work is the first unified framework where knowledge-based planning, RL, and human teaching jointly contribute to the policy learning of an agent. Our experiments demonstrate that PACMAN leads to a significant jump-start at the early stage of learning, converges rapidly and with small variance, and is robust to inconsistent, infrequent, and misleading feedback.
△ Less
Submitted 18 September, 2019;
originally announced September 2019.
-
A Joint Planning and Learning Framework for Human-Aided Decision-Making
Authors:
Daoming Lyu,
Fangkai Yang,
Bo Liu,
Steven Gustafson
Abstract:
Conventional reinforcement learning (RL) allows an agent to learn policies via environmental rewards only, with a long and slow learning curve, especially at the beginning stage. On the contrary, human learning is usually much faster because prior and general knowledge and multiple information resources are utilized. In this paper, we propose a \textbf{P}lanner-\textbf{A}ctor-\textbf{C}ritic archi…
▽ More
Conventional reinforcement learning (RL) allows an agent to learn policies via environmental rewards only, with a long and slow learning curve, especially at the beginning stage. On the contrary, human learning is usually much faster because prior and general knowledge and multiple information resources are utilized. In this paper, we propose a \textbf{P}lanner-\textbf{A}ctor-\textbf{C}ritic architecture for hu\textbf{MAN}-centered planning and learning (\textbf{PACMAN}), where an agent uses prior, high-level, deterministic symbolic knowledge to plan for goal-directed actions. PACMAN integrates Actor-Critic algorithm of RL to fine-tune its behavior towards both environmental rewards and human feedback. To the best our knowledge, This is the first unified framework where knowledge-based planning, RL, and human teaching jointly contribute to the policy learning of an agent. Our experiments demonstrate that PACMAN leads to a significant jump-start at the early stage of learning, converges rapidly and with small variance, and is robust to inconsistent, infrequent, and misleading feedback.
△ Less
Submitted 24 December, 2019; v1 submitted 17 June, 2019;
originally announced June 2019.
-
Knowledge-Based Sequential Decision-Making Under Uncertainty
Authors:
Daoming Lyu
Abstract:
Deep reinforcement learning (DRL) algorithms have achieved great success on sequential decision-making problems, yet is criticized for the lack of data-efficiency and explainability. Especially, explainability of subtasks is critical in hierarchical decision-making since it enhances the transparency of black-box-style DRL methods and helps the RL practitioners to understand the high-level behavior…
▽ More
Deep reinforcement learning (DRL) algorithms have achieved great success on sequential decision-making problems, yet is criticized for the lack of data-efficiency and explainability. Especially, explainability of subtasks is critical in hierarchical decision-making since it enhances the transparency of black-box-style DRL methods and helps the RL practitioners to understand the high-level behavior of the system better. To improve the data-efficiency and explainability of DRL, declarative knowledge is introduced in this work and a novel algorithm is proposed by integrating DRL with symbolic planning. Experimental analysis on publicly available benchmarks validates the explainability of the subtasks and shows that our method can outperform the state-of-the-art approach in terms of data-efficiency.
△ Less
Submitted 15 May, 2020; v1 submitted 16 May, 2019;
originally announced May 2019.
-
SDRL: Interpretable and Data-efficient Deep Reinforcement Learning Leveraging Symbolic Planning
Authors:
Daoming Lyu,
Fangkai Yang,
Bo Liu,
Steven Gustafson
Abstract:
Deep reinforcement learning (DRL) has gained great success by learning directly from high-dimensional sensory inputs, yet is notorious for the lack of interpretability. Interpretability of the subtasks is critical in hierarchical decision-making as it increases the transparency of black-box-style DRL approach and helps the RL practitioners to understand the high-level behavior of the system better…
▽ More
Deep reinforcement learning (DRL) has gained great success by learning directly from high-dimensional sensory inputs, yet is notorious for the lack of interpretability. Interpretability of the subtasks is critical in hierarchical decision-making as it increases the transparency of black-box-style DRL approach and helps the RL practitioners to understand the high-level behavior of the system better. In this paper, we introduce symbolic planning into DRL and propose a framework of Symbolic Deep Reinforcement Learning (SDRL) that can handle both high-dimensional sensory inputs and symbolic planning. The task-level interpretability is enabled by relating symbolic actions to options.This framework features a planner -- controller -- meta-controller architecture, which takes charge of subtask scheduling, data-driven subtask learning, and subtask evaluation, respectively. The three components cross-fertilize each other and eventually converge to an optimal symbolic plan along with the learned subtasks, bringing together the advantages of long-term planning capability with symbolic knowledge and end-to-end reinforcement learning directly from a high-dimensional sensory input. Experimental results validate the interpretability of subtasks, along with improved data efficiency compared with state-of-the-art approaches.
△ Less
Submitted 28 February, 2019; v1 submitted 31 October, 2018;
originally announced November 2018.
-
A Block Coordinate Ascent Algorithm for Mean-Variance Optimization
Authors:
Bo Liu,
Tengyang Xie,
Yangyang Xu,
Mohammad Ghavamzadeh,
Yinlam Chow,
Daoming Lyu,
Daesub Yoon
Abstract:
Risk management in dynamic decision problems is a primary concern in many fields, including financial investment, autonomous driving, and healthcare. The mean-variance function is one of the most widely used objective functions in risk management due to its simplicity and interpretability. Existing algorithms for mean-variance optimization are based on multi-time-scale stochastic approximation, wh…
▽ More
Risk management in dynamic decision problems is a primary concern in many fields, including financial investment, autonomous driving, and healthcare. The mean-variance function is one of the most widely used objective functions in risk management due to its simplicity and interpretability. Existing algorithms for mean-variance optimization are based on multi-time-scale stochastic approximation, whose learning rate schedules are often hard to tune, and have only asymptotic convergence proof. In this paper, we develop a model-free policy search framework for mean-variance optimization with finite-sample error bound analysis (to local optima). Our starting point is a reformulation of the original mean-variance function with its Fenchel dual, from which we propose a stochastic block coordinate ascent policy search algorithm. Both the asymptotic convergence guarantee of the last iteration's solution and the convergence rate of the randomly picked solution are provided, and their applicability is demonstrated on several benchmark domains.
△ Less
Submitted 1 November, 2018; v1 submitted 6 September, 2018;
originally announced September 2018.
-
PEORL: Integrating Symbolic Planning and Hierarchical Reinforcement Learning for Robust Decision-Making
Authors:
Fangkai Yang,
Daoming Lyu,
Bo Liu,
Steven Gustafson
Abstract:
Reinforcement learning and symbolic planning have both been used to build intelligent autonomous agents. Reinforcement learning relies on learning from interactions with real world, which often requires an unfeasibly large amount of experience. Symbolic planning relies on manually crafted symbolic knowledge, which may not be robust to domain uncertainties and changes. In this paper we present a un…
▽ More
Reinforcement learning and symbolic planning have both been used to build intelligent autonomous agents. Reinforcement learning relies on learning from interactions with real world, which often requires an unfeasibly large amount of experience. Symbolic planning relies on manually crafted symbolic knowledge, which may not be robust to domain uncertainties and changes. In this paper we present a unified framework {\em PEORL} that integrates symbolic planning with hierarchical reinforcement learning (HRL) to cope with decision-making in a dynamic environment with uncertainties.
Symbolic plans are used to guide the agent's task execution and learning, and the learned experience is fed back to symbolic knowledge to improve planning. This method leads to rapid policy search and robust symbolic plans in complex domains. The framework is tested on benchmark domains of HRL.
△ Less
Submitted 5 June, 2018; v1 submitted 20 April, 2018;
originally announced April 2018.
-
O$^2$TD: (Near)-Optimal Off-Policy TD Learning
Authors:
Bo Liu,
Daoming Lyu,
Wen Dong,
Saad Biaz
Abstract:
Temporal difference learning and Residual Gradient methods are the most widely used temporal difference based learning algorithms; however, it has been shown that none of their objective functions is optimal w.r.t approximating the true value function $V$. Two novel algorithms are proposed to approximate the true value function $V$. This paper makes the following contributions: (1) A batch algorit…
▽ More
Temporal difference learning and Residual Gradient methods are the most widely used temporal difference based learning algorithms; however, it has been shown that none of their objective functions is optimal w.r.t approximating the true value function $V$. Two novel algorithms are proposed to approximate the true value function $V$. This paper makes the following contributions: (1) A batch algorithm that can help find the approximate optimal off-policy prediction of the true value function $V$. (2) A linear computational cost (per step) near-optimal algorithm that can learn from a collection of off-policy samples. (3) A new perspective of the emphatic temporal difference learning which bridges the gap between off-policy optimality and off-policy stability.
△ Less
Submitted 19 April, 2017; v1 submitted 17 April, 2017;
originally announced April 2017.