-
BEVCALIB: LiDAR-Camera Calibration via Geometry-Guided Bird's-Eye View Representations
Authors:
Weiduo Yuan,
Jerry Li,
Justin Yue,
Divyank Shah,
Konstantinos Karydis,
Hang Qiu
Abstract:
Accurate LiDAR-camera calibration is fundamental to fusing multi-modal perception in autonomous driving and robotic systems. Traditional calibration methods require extensive data collection in controlled environments and cannot compensate for the transformation changes during the vehicle/robot movement. In this paper, we propose the first model that uses bird's-eye view (BEV) features to perform LiDAR-camera calibration from raw data, termed BEVCALIB. To achieve this, we extract camera BEV features and LiDAR BEV features separately and fuse them into a shared BEV feature space. To fully utilize the geometric information from the BEV feature, we introduce a novel feature selector to filter the most important features in the transformation decoder, which reduces memory consumption and enables efficient training. Extensive evaluations on KITTI, NuScenes, and our own dataset demonstrate that BEVCALIB establishes a new state of the art. Under various noise conditions, BEVCALIB outperforms the best baseline in the literature by an average of (47.08%, 82.32%) on the KITTI dataset and (78.17%, 68.29%) on the NuScenes dataset, in terms of (translation, rotation), respectively. In the open-source domain, it improves the best reproducible baseline by one order of magnitude. Our code and demo results are available at https://cisl.ucr.edu/BEVCalib.
Submitted 3 June, 2025;
originally announced June 2025.
-
Hyperbolic-PDE GNN: Spectral Graph Neural Networks in the Perspective of A System of Hyperbolic Partial Differential Equations
Authors:
Juwei Yue,
Haikuo Li,
Jiawei Sheng,
Xiaodong Li,
Taoyu Su,
Tingwen Liu,
Li Guo
Abstract:
Graph neural networks (GNNs) leverage message passing mechanisms to learn the topological features of graph data. Traditional GNNs learn node features in a spatial domain unrelated to the topology, which can hardly ensure topological features. In this paper, we formulate message passing as a system of hyperbolic partial differential equations (hyperbolic PDEs), constituting a dynamical system that explicitly maps node representations into a particular solution space. This solution space is spanned by a set of eigenvectors describing the topological structure of graphs. Within this system, for any moment in time, a node's features can be decomposed into a superposition of the basis of eigenvectors. This not only enhances the interpretability of message passing but also enables the explicit extraction of fundamental characteristics about the topological structure. Furthermore, by solving this system of hyperbolic partial differential equations, we establish a connection with spectral graph neural networks (spectral GNNs), serving as a message passing enhancement paradigm for spectral GNNs. We further introduce polynomials to approximate arbitrary filter functions. Extensive experiments demonstrate that the paradigm of hyperbolic PDEs not only exhibits strong flexibility but also significantly enhances the performance of various spectral GNNs across diverse graph tasks.
Submitted 28 May, 2025;
originally announced May 2025.
-
Graph Wave Networks
Authors:
Juwei Yue,
Haikuo Li,
Jiawei Sheng,
Yihan Guo,
Xinghua Zhang,
Chuan Zhou,
Tingwen Liu,
Li Guo
Abstract:
Dynamics modeling has been introduced as a novel paradigm in message passing (MP) of graph neural networks (GNNs). Existing methods consider MP between nodes as a heat diffusion process, and leverage the heat equation to model the temporal evolution of nodes in the embedding space. However, the heat equation can hardly depict the wave nature of graph signals in graph signal processing. Besides, the heat equation is essentially a partial differential equation (PDE) involving a first partial derivative of time, whose numerical solution usually has low stability, and leads to inefficient model training. In this paper, we would like to depict more wave details in MP, since graph signals are essentially wave signals that can be seen as a superposition of a series of waves in the form of eigenvectors. This motivates us to consider MP as a wave propagation process to capture the temporal evolution of wave signals in the space. Based on the wave equation in physics, we innovatively develop a graph wave equation to leverage the wave propagation on graphs. In detail, we demonstrate that the graph wave equation can be connected to traditional spectral GNNs, facilitating the design of graph wave networks based on various Laplacians and enhancing the performance of the spectral GNNs. Besides, the graph wave equation is a PDE involving a second partial derivative of time, which has stronger stability on graphs than the heat equation that involves a first partial derivative of time. Additionally, we theoretically prove that the numerical solution derived from the graph wave equation is constantly stable, which significantly enhances model efficiency while preserving performance. Extensive experiments show that GWNs achieve SOTA and efficient performance on benchmark datasets, and exhibit outstanding performance in addressing challenging graph problems, such as over-smoothing and heterophily.
Submitted 28 May, 2025; v1 submitted 26 May, 2025;
originally announced May 2025.
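The contrast the Graph Wave Networks abstract draws between first-order-in-time (heat) and second-order-in-time (wave) dynamics can be illustrated on a toy graph. The sketch below is not the authors' model: it assumes the textbook forms dx/dt = -Lx and d²x/dt² = -c²Lx with the combinatorial Laplacian L of a 4-node path graph, and uses explicit Euler and central-difference time stepping chosen purely for illustration. The function names, step size `dt`, and wave speed `c` are all hypothetical.

```python
def laplacian_apply(x):
    """Apply the Laplacian L = D - A of a path graph to the node signal x."""
    n = len(x)
    out = []
    for i in range(n):
        left = x[i] - x[i - 1] if i > 0 else 0.0
        right = x[i] - x[i + 1] if i < n - 1 else 0.0
        out.append(left + right)
    return out


def simulate_heat(x0, steps, dt=0.1):
    """Explicit Euler for dx/dt = -L x (assumed form of graph heat diffusion)."""
    x = list(x0)
    traj = [x]
    for _ in range(steps):
        lx = laplacian_apply(x)
        x = [xi - dt * li for xi, li in zip(x, lx)]
        traj.append(x)
    return traj


def simulate_wave(x0, steps, dt=0.1, c=1.0):
    """Central differences for d2x/dt2 = -c^2 L x, starting with zero velocity."""
    prev = list(x0)  # zero initial velocity: the "previous" state equals x0
    x = list(x0)
    traj = [x]
    for _ in range(steps):
        lx = laplacian_apply(x)
        nxt = [2.0 * xi - pi - (c * dt) ** 2 * li
               for xi, pi, li in zip(x, prev, lx)]
        prev, x = x, nxt
        traj.append(x)
    return traj


def spread(x):
    """Difference between the largest and smallest node value."""
    return max(x) - min(x)


x0 = [1.0, 0.0, 0.0, 0.0]  # all signal starts on the first node
heat = simulate_heat(x0, 500)
wave = simulate_wave(x0, 500)
print("heat spread after 500 steps:", spread(heat[-1]))
print("largest wave spread in last 100 steps:",
      max(spread(x) for x in wave[-100:]))
```

In this toy setting the heat trajectory flattens toward the node average, which is the over-smoothing behaviour the abstract mentions, while the wave trajectory keeps oscillating. Nothing here reproduces the stability result claimed in the paper.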
-
Recent Deep Learning in Crowd Behaviour Analysis: A Brief Review
Authors:
Jiangbei Yue,
He Wang
Abstract:
Crowd behaviour analysis is essential to numerous real-world applications, such as public safety and urban planning, and therefore has been studied for decades. In the last decade or so, the development of deep learning has significantly propelled the research on crowd behaviours. This chapter reviews recent advances in crowd behaviour analysis using deep learning. We mainly review the research in two core tasks in this field, crowd behaviour prediction and recognition. We broadly cover how different deep neural networks, after first being proposed in machine learning, are applied to analysing crowd behaviours. This includes pure deep neural network models as well as recent development of methodologies combining physics with deep learning. In addition, representative studies are discussed and compared in detail. Finally, we discuss the effectiveness of existing methods and future research directions in this rapidly evolving field. This chapter aims to provide a high-level summary of the ongoing deep learning research in crowd behaviour analysis. It intends to help new researchers who have just entered this field to obtain an overall understanding of the ongoing research, as well as to provide a retrospective analysis for existing researchers to identify possible future directions.
Submitted 23 May, 2025;
originally announced May 2025.
-
MedSG-Bench: A Benchmark for Medical Image Sequences Grounding
Authors:
Jingkun Yue,
Siqi Zhang,
Zinan Jia,
Huihuan Xu,
Zongbo Han,
Xiaohong Liu,
Guangyu Wang
Abstract:
Visual grounding is essential for precise perception and reasoning in multimodal large language models (MLLMs), especially in medical imaging domains. While existing medical visual grounding benchmarks primarily focus on single-image scenarios, real-world clinical applications often involve sequential images, where accurate lesion localization across different modalities and temporal tracking of disease progression (e.g., pre- vs. post-treatment comparison) require fine-grained cross-image semantic alignment and context-aware reasoning. To remedy the underrepresentation of image sequences in existing medical visual grounding benchmarks, we propose MedSG-Bench, the first benchmark tailored for Medical Image Sequences Grounding. It comprises eight VQA-style tasks, formulated into two paradigms of the grounding tasks, including 1) Image Difference Grounding, which focuses on detecting change regions across images, and 2) Image Consistency Grounding, which emphasizes detection of consistent or shared semantics across sequential images. MedSG-Bench covers 76 public datasets, 10 medical imaging modalities, and a wide spectrum of anatomical structures and diseases, totaling 9,630 question-answer pairs. We benchmark both general-purpose MLLMs (e.g., Qwen2.5-VL) and medical-domain specialized MLLMs (e.g., HuatuoGPT-vision), observing that even the advanced models exhibit substantial limitations in medical sequential grounding tasks. To advance this field, we construct MedSG-188K, a large-scale instruction-tuning dataset tailored for sequential visual grounding, and further develop MedSeq-Grounder, an MLLM designed to facilitate future research on fine-grained understanding across medical sequential images. The benchmark, dataset, and model are available at https://huggingface.co/MedSG-Bench
Submitted 17 May, 2025;
originally announced May 2025.
-
Enhancing Satellite Object Localization with Dilated Convolutions and Attention-aided Spatial Pooling
Authors:
Seraj Al Mahmud Mostafa,
Chenxi Wang,
Jia Yue,
Yuta Hozumi,
Jianwu Wang
Abstract:
Object localization in satellite imagery is particularly challenging due to the high variability of objects, low spatial resolution, and interference from noise and dominant features such as clouds and city lights. In this research, we focus on three satellite datasets: upper atmospheric Gravity Waves (GW), mesospheric Bores (Bore), and Ocean Eddies (OE), each presenting its own unique challenges. These challenges include the variability in the scale and appearance of the main object patterns, where the size, shape, and feature extent of objects of interest can differ significantly. To address these challenges, we introduce YOLO-DCAP, a novel enhanced version of YOLOv5 designed to improve object localization in these complex scenarios. YOLO-DCAP incorporates a Multi-scale Dilated Residual Convolution (MDRC) block to capture multi-scale features at scale with varying dilation rates, and an Attention-aided Spatial Pooling (AaSP) module to focus on the global relevant spatial regions, enhancing feature selection. These structural improvements help to better localize objects in satellite imagery. Experimental results demonstrate that YOLO-DCAP significantly outperforms both the YOLO base model and state-of-the-art approaches, achieving an average improvement of 20.95% in mAP50 and 32.23% in IoU over the base model, and 7.35% and 9.84% respectively over state-of-the-art alternatives, consistently across all three satellite datasets. These consistent gains across all three satellite datasets highlight the robustness and generalizability of the proposed approach. Our code is open sourced at https://github.com/AI-4-atmosphere-remote-sensing/satellite-object-localization.
Submitted 8 May, 2025;
originally announced May 2025.
-
Diffuson-Dominated Thermal Transport Crossover from Ordered to Liquid-like Cu$_3$BiS$_3$: The Negligible Role of Ion Hopping
Authors:
Jincheng Yue,
Jiongzhi Zheng,
Xingchen Shen,
Chun-Chuen Yang,
Shuyao Lin,
Yanhui Liu,
Tian Cui
Abstract:
Fundamentally understanding lattice dynamics and thermal transport behavior in liquid-like, partially occupied compounds remains a long-standing challenge in condensed matter physics. Here, we investigate the microscopic mechanisms underlying the ultralow thermal conductivity in ordered/liquid-like Cu$_3$BiS$_3$ by combining experimental methods with first-principles calculations. We first experimentally synthesize and characterize the ordered structure and liquid-like, partially Cu-atom occupied Cu$_3$BiS$_3$ structure with increasing temperature. We then combine self-consistent phonon calculations, including bubble-diagram corrections, with the Wigner transport equation, considering both phonon propagation and diffuson contributions, to evaluate the anharmonic lattice dynamics and thermal conductivity in phase-change Cu$_3$BiS$_3$. Our theoretical model predicts an ultralow thermal conductivity of 0.34 W/m/K at 400 K, dominated by diffuson contributions, which accurately reproduces and explains the experimental data. Importantly, the machine-learning-based molecular dynamics (MD) simulations not only reproduced the partially Cu-atom occupied Cu$_3$BiS$_3$ structure with the space group $\mathrm{P2_12_12_1}$ but also successfully replicated the thermal conductivity obtained from experiments and Wigner transport calculations. This observation highlights the negligible impact of ionic mobility arising from partially occupied Cu sites on the thermal conductivity in diffuson-dominated thermal transport compounds. Our work not only sheds light on the minimal impact of ionic mobility on ultralow thermal conductivity in phase-change materials but also demonstrates that the Wigner transport equation accurately describes thermal transport behavior in partially occupied phases with diffuson-dominant thermal transport.
Submitted 4 May, 2025;
originally announced May 2025.
-
Harmonizing Intra-coherence and Inter-divergence in Ensemble Attacks for Adversarial Transferability
Authors:
Zhaoyang Ma,
Zhihao Wu,
Wang Lu,
Xin Gao,
Jinghang Yue,
Taolin Zhang,
Lipo Wang,
Youfang Lin,
Jing Wang
Abstract:
The development of model ensemble attacks has significantly improved the transferability of adversarial examples, but this progress also poses severe threats to the security of deep neural networks. Existing methods, however, face two critical challenges: insufficient capture of shared gradient directions across models and a lack of adaptive weight allocation mechanisms. To address these issues, we propose a novel method, Harmonized Ensemble for Adversarial Transferability (HEAT), which introduces domain generalization into adversarial example generation for the first time. HEAT consists of two key modules: the Consensus Gradient Direction Synthesizer, which uses Singular Value Decomposition to synthesize shared gradient directions; and the Dual-Harmony Weight Orchestrator, which dynamically balances intra-domain coherence, stabilizing gradients within individual models, and inter-domain diversity, enhancing transferability across models. Experimental results demonstrate that HEAT significantly outperforms existing methods across various datasets and settings, offering a new perspective and direction for adversarial attack research.
Submitted 2 May, 2025;
originally announced May 2025.
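The idea of extracting a shared gradient direction from several surrogate models, as the HEAT abstract describes, can be sketched in a few lines. This is an illustrative assumption rather than the paper's Consensus Gradient Direction Synthesizer: per-model gradients are stacked as rows of a matrix, and the dominant right singular vector is taken as the consensus direction. To stay dependency-free, the sketch uses power iteration in place of a full SVD routine; the function name, iteration count, and toy gradients are hypothetical.

```python
import math


def dominant_direction(grads, iters=200):
    """Dominant right singular vector of the matrix whose rows are `grads`.

    Computed by power iteration on G^T G (a stand-in for a full SVD).
    """
    dim = len(grads[0])
    v = [1.0 / math.sqrt(dim)] * dim  # arbitrary unit-norm starting vector
    for _ in range(iters):
        # proj = G v, then w = G^T proj
        proj = [sum(gj * vj for gj, vj in zip(g, v)) for g in grads]
        w = [sum(p * g[j] for p, g in zip(proj, grads)) for j in range(dim)]
        norm = math.sqrt(sum(wj * wj for wj in w))
        v = [wj / norm for wj in w]
    # A singular vector is defined up to sign; orient it along the mean gradient.
    mean = [sum(g[j] for g in grads) / len(grads) for j in range(dim)]
    if sum(m * vj for m, vj in zip(mean, v)) < 0:
        v = [-vj for vj in v]
    return v


# Toy gradients from three hypothetical surrogate models: they agree on the
# first coordinate and disagree (with small magnitude) on the second.
grads = [[1.0, 0.1], [1.0, -0.1], [0.9, 0.0]]
shared = dominant_direction(grads)
print("consensus direction:", shared)
```

In the toy example the consensus direction aligns with the first coordinate, where the three gradients agree, and the conflicting second coordinate is suppressed.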
-
DeepAndes: A Self-Supervised Vision Foundation Model for Multi-Spectral Remote Sensing Imagery of the Andes
Authors:
Junlin Guo,
James R. Zimmer-Dauphinee,
Jordan M. Nieusma,
Siqi Lu,
Quan Liu,
Ruining Deng,
Can Cui,
Jialin Yue,
Yizhe Lin,
Tianyuan Yao,
Juming Xiong,
Junchao Zhu,
Chongyu Qu,
Yuechen Yang,
Mitchell Wilkes,
Xiao Wang,
Parker VanValkenburgh,
Steven A. Wernke,
Yuankai Huo
Abstract:
By mapping sites at large scales using remotely sensed data, archaeologists can generate unique insights into long-term demographic trends, inter-regional social networks, and past adaptations to climate change. Remote sensing surveys complement field-based approaches, and their reach can be especially great when combined with deep learning and computer vision techniques. However, conventional supervised deep learning methods face challenges in annotating fine-grained archaeological features at scale. While recent vision foundation models have shown remarkable success in learning large-scale remote sensing data with minimal annotations, most off-the-shelf solutions are designed for RGB images rather than multi-spectral satellite imagery, such as the 8-band data used in our study. In this paper, we introduce DeepAndes, a transformer-based vision foundation model trained on three million multi-spectral satellite images, specifically tailored for Andean archaeology. DeepAndes incorporates a customized DINOv2 self-supervised learning algorithm optimized for 8-band multi-spectral imagery, marking the first foundation model designed explicitly for the Andes region. We evaluate its image understanding performance through imbalanced image classification, image instance retrieval, and pixel-level semantic segmentation tasks. Our experiments show that DeepAndes achieves superior F1 scores, mean average precision, and Dice scores in few-shot learning scenarios, significantly outperforming models trained from scratch or pre-trained on smaller datasets. This underscores the effectiveness of large-scale self-supervised pre-training in archaeological remote sensing. Codes will be available on https://github.com/geopacha/DeepAndes.
Submitted 28 April, 2025;
originally announced April 2025.
-
Mitigating Modality Bias in Multi-modal Entity Alignment from a Causal Perspective
Authors:
Taoyu Su,
Jiawei Sheng,
Duohe Ma,
Xiaodong Li,
Juwei Yue,
Mengxiao Song,
Yingkai Tang,
Tingwen Liu
Abstract:
Multi-Modal Entity Alignment (MMEA) aims to retrieve equivalent entities from different Multi-Modal Knowledge Graphs (MMKGs), a critical information retrieval task. Existing studies have explored various fusion paradigms and consistency constraints to improve the alignment of equivalent entities, while overlooking that the visual modality may not always contribute positively. Empirically, entities with low-similarity images usually generate unsatisfactory performance, highlighting the limitation of overly relying on visual features. We believe the model can be biased toward the visual modality, leading to a shortcut image-matching task. To address this, we propose a counterfactual debiasing framework for MMEA, termed CDMEA, which investigates visual modality bias from a causal perspective. Our approach aims to leverage both visual and graph modalities to enhance MMEA while suppressing the direct causal effect of the visual modality on model predictions. By estimating the Total Effect (TE) of both modalities and excluding the Natural Direct Effect (NDE) of the visual modality, we ensure that the model predicts based on the Total Indirect Effect (TIE), effectively utilizing both modalities and reducing visual modality bias. Extensive experiments on 9 benchmark datasets show that CDMEA outperforms 14 state-of-the-art methods, especially in low-similarity, high-noise, and low-resource data scenarios.
Submitted 15 May, 2025; v1 submitted 27 April, 2025;
originally announced April 2025.
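The counterfactual arithmetic in the CDMEA abstract (predict from the Total Indirect Effect, obtained by removing the visual modality's Natural Direct Effect from the Total Effect) can be written out for a handful of candidate scores. The sketch below uses the standard causal-inference definitions and is only an assumption about how such scores might be combined; the three score lists and the function names are hypothetical, not outputs or APIs of the authors' model.

```python
def total_indirect_effect(fused, visual_only, baseline):
    """Per-candidate TIE = TE - NDE.

    fused:       scores with both visual and graph inputs observed
    visual_only: scores with the visual input observed and the graph input
                 replaced by a counterfactual (uninformative) value
    baseline:    scores with both inputs replaced by counterfactual values
    """
    te = [f - b for f, b in zip(fused, baseline)]          # Total Effect
    nde = [v - b for v, b in zip(visual_only, baseline)]   # Natural Direct Effect
    return [t - n for t, n in zip(te, nde)]


def argmax(scores):
    """Index of the highest score."""
    return max(range(len(scores)), key=lambda i: scores[i])


# Three candidate entities; candidate 0 wins mostly on image similarity.
fused = [0.9, 0.7, 0.2]
visual_only = [0.8, 0.2, 0.1]
baseline = [0.3, 0.3, 0.3]

tie = total_indirect_effect(fused, visual_only, baseline)
print("ranked first by fused score:", argmax(fused))
print("ranked first by TIE:", argmax(tie))
```

Because the baseline cancels, TIE reduces to the fused score minus the visual-only score, so a candidate that scores highly on image similarity alone loses its advantage.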
-
Think Hierarchically, Act Dynamically: Hierarchical Multi-modal Fusion and Reasoning for Vision-and-Language Navigation
Authors:
Junrong Yue,
Yifan Zhang,
Chuan Qin,
Bo Li,
Xiaomin Lie,
Xinlei Yu,
Wenxin Zhang,
Zhendong Zhao
Abstract:
Vision-and-Language Navigation (VLN) aims to enable embodied agents to follow natural language instructions and reach target locations in real-world environments. While prior methods often rely on either global scene representations or object-level features, these approaches are insufficient for capturing the complex interactions across modalities required for accurate navigation. In this paper, we propose a Multi-level Fusion and Reasoning Architecture (MFRA) to enhance the agent's ability to reason over visual observations, language instructions and navigation history. Specifically, MFRA introduces a hierarchical fusion mechanism that aggregates multi-level features, ranging from low-level visual cues to high-level semantic concepts, across multiple modalities. We further design a reasoning module that leverages fused representations to infer navigation actions through instruction-guided attention and dynamic context integration. By selectively capturing and combining relevant visual, linguistic, and temporal signals, MFRA improves decision-making accuracy in complex navigation scenarios. Extensive experiments on benchmark VLN datasets including REVERIE, R2R, and SOON demonstrate that MFRA achieves superior performance compared to state-of-the-art methods, validating the effectiveness of multi-level modal fusion for embodied navigation.
Submitted 24 April, 2025; v1 submitted 23 April, 2025;
originally announced April 2025.
-
Application of Single-cell Deep Learning in Elucidating the Mapping Relationship Between Visceral and Body Surface Inflammatory Patterns
Authors:
Haixiang Huang,
Bingbing Shen,
Zhenwei Zhang,
Jianming Yue,
Lu Mei,
Qiusheng Chen
Abstract:
As a system of integrated homeostasis, life is susceptible to disruptions by visceral inflammation, which can disturb internal environment equilibrium. The role of body-spread subcutaneous fascia (scFascia) in this process is poorly understood. In the rat model of Salmonella-induced dysentery, scRNA-seq of scFascia and deep-learning analysis revealed Warburg-like metabolic reprogramming in macrophages (MPs) with reduced citrate cycle activity. Cd34+/Pdgfra+ telocytes (CPTCs) regulated MP differentiation and proliferation via Wnt/Fgf signaling, suggesting a pathological crosstalk pattern in the scFascia, herein termed the fascia-visceral inflammatory crosstalk pattern (FVICP). PySCENIC analysis indicated increased activity of the transcription factors Fosl1, Nfkb2, and Atf4, modulated by CPTC signaling to MPs, downregulating aerobic respiration and upregulating cell cycle, DNA replication, and transcription. This study highlights scFascia's role in immunomodulation and metabolic reprogramming during visceral inflammation, underscoring its function in systemic homeostasis.
Submitted 20 March, 2025;
originally announced March 2025.
-
Learning Extremely High Density Crowds as Active Matters
Authors:
Feixiang He,
Jiangbei Yue,
Jialin Zhu,
Armin Seyfried,
Dan Casas,
Julien Pettré,
He Wang
Abstract:
Video-based high-density crowd analysis and prediction has been a long-standing topic in computer vision. It is notoriously difficult due to, but not limited to, the lack of high-quality data and complex crowd dynamics. Consequently, it has been relatively understudied. In this paper, we propose a new approach that aims to learn from in-the-wild videos, often with low quality where it is difficult to track individuals or count heads. The key novelty is a new physics prior to model crowd dynamics. We model high-density crowds as active matter, a continuum with active particles subject to stochastic forces, named 'crowd material'. Our physics model is combined with neural networks, resulting in a neural stochastic differential equation system which can mimic the complex crowd dynamics. Due to the lack of similar research, we adapt a range of existing methods which are close to ours for comparison. Through exhaustive evaluation, we show our model outperforms existing methods in analyzing and forecasting extremely high-density crowds. Furthermore, since our model is a continuous-time physics model, it can be used for simulation and analysis, providing strong interpretability. This is categorically different from most deep learning methods, which are discrete-time models and black boxes.
Submitted 15 March, 2025;
originally announced March 2025.
-
3D Student Splatting and Scooping
Authors:
Jialin Zhu,
Jiangbei Yue,
Feixiang He,
He Wang
Abstract:
Recently, 3D Gaussian Splatting (3DGS) has provided a new framework for novel view synthesis, and has sparked a new wave of research in neural rendering and related applications. As 3DGS is becoming a foundational component of many models, any improvement on 3DGS itself can bring huge benefits. To this end, we aim to improve the fundamental paradigm and formulation of 3DGS. We argue that as an unnormalized mixture model, it needs to be neither Gaussians nor splatting. We subsequently propose a new mixture model consisting of flexible Student's t distributions, with both positive (splatting) and negative (scooping) densities. We name our model Student Splatting and Scooping, or SSS. While providing better expressivity, SSS also poses new challenges in learning. Therefore, we also propose a new principled sampling approach for optimization. Through exhaustive evaluation and comparison, across multiple datasets, settings, and metrics, we demonstrate that SSS outperforms existing methods in terms of quality and parameter efficiency, e.g. achieving matching or better quality with similar numbers of components, and obtaining comparable results while reducing the component number by as much as 82%.
Submitted 11 April, 2025; v1 submitted 13 March, 2025;
originally announced March 2025.
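The mixture described in the Student Splatting and Scooping abstract, with Student's t components that carry positive (splatting) or negative (scooping) weights, can be illustrated in one dimension. The sketch below is a simplified assumption, not the authors' 3D formulation: it uses the unnormalized 1D Student's t kernel (1 + z²/ν)^(-(ν+1)/2), and the component tuples `(weight, mu, sigma, nu)` are hypothetical.

```python
import math


def student_t_kernel(x, mu, sigma, nu):
    """Unnormalized 1D Student's t kernel; equals 1 at x == mu."""
    z = (x - mu) / sigma
    return (1.0 + z * z / nu) ** (-(nu + 1.0) / 2.0)


def gaussian_kernel(x, mu, sigma):
    """Unnormalized 1D Gaussian kernel, for comparison of tail behaviour."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z)


def signed_mixture(x, components):
    """Sum of weighted kernels; negative weights carve density away."""
    return sum(w * student_t_kernel(x, mu, sigma, nu)
               for (w, mu, sigma, nu) in components)


# One broad positive component plus one narrow negative component at the
# same centre: the negative one "scoops" a dip out of the positive bump.
components = [(1.0, 0.0, 1.0, 3.0), (-0.5, 0.0, 0.2, 3.0)]

print("mixture at centre:", signed_mixture(0.0, components))
print("mixture away from centre:", signed_mixture(2.0, components))
print("t kernel tail (nu=1) at z=4:", student_t_kernel(4.0, 0.0, 1.0, 1.0))
print("Gaussian tail at z=4:", gaussian_kernel(4.0, 0.0, 1.0))
```

A small degrees-of-freedom value `nu` gives much heavier tails than a Gaussian, while a large `nu` approaches the Gaussian shape, which is one sense in which the t family is more flexible.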
-
RobuRCDet: Enhancing Robustness of Radar-Camera Fusion in Bird's Eye View for 3D Object Detection
Authors:
Jingtong Yue,
Zhiwei Lin,
Xin Lin,
Xiaoyu Zhou,
Xiangtai Li,
Lu Qi,
Yongtao Wang,
Ming-Hsuan Yang
Abstract:
While recent low-cost radar-camera approaches have shown promising results in multi-modal 3D object detection, both sensors face challenges from environmental and intrinsic disturbances. Poor lighting or adverse weather conditions degrade camera performance, while radar suffers from noise and positional ambiguity. Achieving robust radar-camera 3D object detection requires consistent performance across varying conditions, a topic that has not yet been fully explored. In this work, we first conduct a systematic analysis of robustness in radar-camera detection on five kinds of noise and propose RobuRCDet, a robust object detection model in BEV. Specifically, we design a 3D Gaussian Expansion (3DGE) module to mitigate inaccuracies in radar points, including position, Radar Cross-Section (RCS), and velocity. The 3DGE uses RCS and velocity priors to generate a deformable kernel map and variance for kernel size adjustment and value distribution. Additionally, we introduce a weather-adaptive fusion module, which adaptively fuses radar and camera features based on camera signal confidence. Extensive experiments on the popular benchmark, nuScenes, show that our model achieves competitive results in regular and noisy conditions.
Submitted 18 February, 2025;
originally announced February 2025.
-
TastepepAI, An artificial intelligence platform for taste peptide de novo design
Authors:
Jianda Yue,
Tingting Li,
Jian Ouyang,
Jiawei Xu,
Hua Tan,
Zihui Chen,
Changsheng Han,
Huanyu Li,
Songping Liang,
Zhonghua Liu,
Zhonghua Liu,
Ying Wang
Abstract:
Taste peptides have emerged as promising natural flavoring agents attributed to their unique organoleptic properties, high safety profile, and potential health benefits. However, the de novo identification of taste peptides derived from animal, plant, or microbial sources remains a time-consuming and resource-intensive process, significantly impeding their widespread application in the food industry. Here, we present TastePepAI, a comprehensive artificial intelligence framework for customized taste peptide design and safety assessment. As the key element of this framework, a loss-supervised adaptive variational autoencoder (LA-VAE) is implemented to efficiently optimize the latent representation of sequences during training and facilitate the generation of target peptides with desired taste profiles. Notably, our model incorporates a novel taste-avoidance mechanism, allowing for selective flavor exclusion. Subsequently, our in-house developed toxicity prediction algorithm (SpepToxPred) is integrated into the framework to rigorously evaluate the safety of generated peptides. Using this integrated platform, we successfully identified 73 peptides exhibiting sweet, salty, and umami tastes, significantly expanding the current repertoire of taste peptides. This work demonstrates the potential of TastePepAI in accelerating taste peptide discovery for food applications and provides a versatile framework adaptable to broader peptide engineering challenges.
Submitted 12 February, 2025;
originally announced February 2025.
-
Multi-Class Segmentation of Aortic Branches and Zones in Computed Tomography Angiography: The AortaSeg24 Challenge
Authors:
Muhammad Imran,
Jonathan R. Krebs,
Vishal Balaji Sivaraman,
Teng Zhang,
Amarjeet Kumar,
Walker R. Ueland,
Michael J. Fassler,
Jinlong Huang,
Xiao Sun,
Lisheng Wang,
Pengcheng Shi,
Maximilian Rokuss,
Michael Baumgartner,
Yannick Kirchhof,
Klaus H. Maier-Hein,
Fabian Isensee,
Shuolin Liu,
Bing Han,
Bong Thanh Nguyen,
Dong-jin Shin,
Park Ji-Woo,
Mathew Choi,
Kwang-Hyun Uhm,
Sung-Jea Ko,
Chanwoong Lee
, et al. (38 additional authors not shown)
Abstract:
Multi-class segmentation of the aorta in computed tomography angiography (CTA) scans is essential for diagnosing and planning complex endovascular treatments for patients with aortic dissections. However, existing methods reduce aortic segmentation to a binary problem, limiting their ability to measure diameters across different branches and zones. Furthermore, no open-source dataset is currently available to support the development of multi-class aortic segmentation methods. To address this gap, we organized the AortaSeg24 MICCAI Challenge, introducing the first dataset of 100 CTA volumes annotated for 23 clinically relevant aortic branches and zones. This dataset was designed to facilitate both model development and validation. The challenge attracted 121 teams worldwide, with participants leveraging state-of-the-art frameworks such as nnU-Net and exploring novel techniques, including cascaded models, data augmentation strategies, and custom loss functions. We evaluated the submitted algorithms using the Dice Similarity Coefficient (DSC) and Normalized Surface Distance (NSD), highlighting the approaches adopted by the top five performing teams. This paper presents the challenge design, dataset details, evaluation metrics, and an in-depth analysis of the top-performing algorithms. The annotated dataset, evaluation code, and implementations of the leading methods are publicly available to support further research. All resources can be accessed at https://aortaseg24.grand-challenge.org.
Submitted 7 February, 2025;
originally announced February 2025.
-
Error Distribution Smoothing: Advancing Low-Dimensional Imbalanced Regression
Authors:
Donghe Chen,
Jiaxuan Yue,
Tengjie Zheng,
Lanxuan Wang,
Lin Cheng
Abstract:
In real-world regression tasks, datasets frequently exhibit imbalanced distributions, characterized by a scarcity of data in high-complexity regions and an abundance in low-complexity areas. This imbalance presents significant challenges for existing classification methods with clear class boundaries, while highlighting a scarcity of approaches specifically designed for imbalanced regression problems. To better address these issues, we introduce a novel concept of Imbalanced Regression, which takes into account both the complexity of the problem and the density of data points, extending beyond traditional definitions that focus only on data density. Furthermore, we propose Error Distribution Smoothing (EDS) as a solution to tackle imbalanced regression, effectively selecting a representative subset from the dataset to reduce redundancy while maintaining balance and representativeness. Through several experiments, EDS has shown its effectiveness, and the related code and dataset can be accessed at https://anonymous.4open.science/r/Error-Distribution-Smoothing-762F.
Submitted 4 February, 2025;
originally announced February 2025.
-
The interaction between rough vortex patch and boundary layer
Authors:
Jingchi Huang,
Chao Wang,
Jingchao Yue,
Zhifei Zhang
Abstract:
In this paper, we study the asymptotic behavior of the solution of the Navier-Stokes equations in the half plane in the high Reynolds number regime, when the initial vorticity belongs to the Yudovich class and is supported away from the boundary. We prove the $L^p$ ($2\leq p< \infty$) convergence from the Navier-Stokes equations to the Euler equations. The key point is to introduce a good functional framework to control the interaction between rough vortex patch and boundary layer.
Submitted 4 December, 2024; v1 submitted 4 December, 2024;
originally announced December 2024.
-
GloFinder: AI-empowered QuPath Plugin for WSI-level Glomerular Detection, Visualization, and Curation
Authors:
Jialin Yue,
Tianyuan Yao,
Ruining Deng,
Siqi Lu,
Junlin Guo,
Quan Liu,
Mengmeng Yin,
Juming Xiong,
Haichun Yang,
Yuankai Huo
Abstract:
Artificial intelligence (AI) has demonstrated significant success in automating the detection of glomeruli, the key functional units of the kidney, from whole slide images (WSIs) in kidney pathology. However, existing open-source tools are often distributed as source code or Docker containers, requiring advanced programming skills that hinder accessibility for non-programmers, such as clinicians. Additionally, current models are typically trained on a single dataset and lack flexibility in adjusting confidence levels for predictions. To overcome these challenges, we introduce GloFinder, a QuPath plugin designed for single-click automated glomeruli detection across entire WSIs with online editing through the graphical user interface (GUI). GloFinder employs CircleNet, an anchor-free detection framework utilizing circle representations for precise object localization, with models trained on approximately 160,000 manually annotated glomeruli. To further enhance accuracy, the plugin incorporates Weighted Circle Fusion (WCF), an ensemble method that combines confidence scores from multiple CircleNet models to produce refined predictions, achieving superior performance in glomerular detection. GloFinder enables direct visualization and editing of results in QuPath, facilitating seamless interaction for clinicians and providing a powerful tool for nephropathology research and clinical practice.
Submitted 27 November, 2024;
originally announced November 2024.
-
Dual-Representation Interaction Driven Image Quality Assessment with Restoration Assistance
Authors:
Jingtong Yue,
Xin Lin,
Zijiu Yang,
Chao Ren
Abstract:
No-Reference Image Quality Assessment for distorted images has always been a challenging problem due to image content variance and distortion diversity. Previous IQA models mostly encode explicit single-quality features of synthetic images to obtain quality-aware representations for quality score prediction. However, performance decreases when facing real-world distortion and restored images from restoration models. The reason is that they do not consider the degradation factors of the low-quality images adequately. To address this issue, we first introduce the DRI method to obtain degradation vectors and quality vectors of images, which separately model the degradation and quality information of low-quality images. After that, we add the restoration network to provide the MOS score predictor with degradation information. Then, we design the Representation-based Semantic Loss (RS Loss) to assist in enhancing effective interaction between representations. Extensive experimental results demonstrate that the proposed method performs favorably against existing state-of-the-art models on both synthetic and real-world datasets.
Submitted 26 November, 2024;
originally announced November 2024.
-
Cross-organ Deployment of EOS Detection AI without Retraining: Feasibility and Limitation
Authors:
Yifei Wu,
Juming Xiong,
Tianyuan Yao,
Ruining Deng,
Junlin Guo,
Jialin Yue,
Naweed Chowdhury,
Yuankai Huo
Abstract:
Chronic rhinosinusitis (CRS) is characterized by persistent inflammation in the paranasal sinuses, leading to typical symptoms of nasal congestion, facial pressure, olfactory dysfunction, and discolored nasal drainage, which can significantly impact quality-of-life. Eosinophils (Eos), a crucial component in the mucosal immune response, have been linked to disease severity in CRS. The diagnosis of eosinophilic CRS typically uses a threshold of 10-20 eos per high-power field (HPF). However, manually counting Eos in histological samples is laborious and time-intensive, making the use of AI-driven methods for automated evaluations highly desirable. Interestingly, eosinophils are predominantly located in the gastrointestinal (GI) tract, which has prompted the release of numerous deep learning models trained on GI data. This study leverages a CircleSnake model initially trained on upper-GI data to segment Eos cells in whole slide images (WSIs) of nasal tissues. It aims to determine the extent to which Eos segmentation models developed for the GI tract can be adapted to nasal applications without retraining. The experimental results show promising accuracy in some WSIs, although, unsurprisingly, the performance varies across cases. This paper details these performance outcomes, delves into the reasons for such variations, and aims to provide insights that could guide future development of deep learning models for eosinophilic CRS.
Submitted 24 November, 2024;
originally announced November 2024.
-
Bidirectional Optimization onto Thermoelectric Performance via Hydrostatic-Pressure in Chalcopyrite AgXTe2 (X=In, Ga)
Authors:
Siqi Guo,
Jincheng Yue,
Jiongzhi Zheng,
Hui Zhang,
Ning Wang,
Junda Li,
Yanhui Liu,
Tian Cui
Abstract:
Pressure tuning has emerged as a powerful strategy for manipulating the thermoelectric properties of materials by inducing structural and electronic modifications. Herein, we systematically investigate the transport properties and thermoelectric performance concerning lattice distortions induced by hydrostatic pressure in Ag-based chalcopyrite AgXTe2 (X=In, Ga). The findings reveal that the lattice distortion in AgXTe2 exhibits distinct behaviors under lattice compression, diverging from trends observed at ambient pressure. Importantly, the hydrostatic pressure breaks the phenomenally negative correlation between thermal conductivity and lattice distortion. Pressure-induced softening of low-frequency acoustic phonons broadens the low-energy phonon spectrum, enhancing interactions between acoustic and optical phonons. Such broadening substantially increases the number of available three-phonon scattering channels, resulting in a marked reduction in thermal conductivity. Meanwhile, we establish a macroscopic connection between metavalent bonding and anharmonicity, providing an indirect explanation for lattice anharmonicity through pressure-driven transferred charge. Additionally, the applied pressure achieves a notable net increase in the power factor despite the strong coupling of electrical transport parameters, which underscores the potential for bidirectional optimization of transport properties in AgXTe2. As a result, the maximum ZT value of AgInTe2 is nearly doubled, demonstrating that pressure modulation is a powerful strategy for enhancing thermoelectric performance. Our work not only establishes the link between pressure, lattice dynamics, and thermoelectric properties within chalcopyrite AgXTe2, but also inspires the exploration of pressure-related optimization strategies for conventional thermoelectric materials.
Submitted 1 November, 2024;
originally announced November 2024.
-
Airborne Biomarker Localization Engine (ABLE) for Open Air Point-of-Care Detection
Authors:
Jingcheng Ma,
Megan Laune,
Pengju Li,
Jing Lu,
Jiping Yue,
Yueyue Yu,
Jessica Cleary,
Kaitlyn Oliphant,
Zachary Kessler,
Erika C. Claud,
Bozhi Tian
Abstract:
Unlike biomarkers in biofluids, airborne biomarkers are dilute and difficult to trace. Detecting diverse airborne biomarkers with sufficient sensitivity typically relies on bulky and expensive equipment like mass spectrometers that remain inaccessible to the general population. Here, we introduce Airborne Biomarker Localization Engine (ABLE), a simple, affordable, and portable platform that can detect volatile, non-volatile, molecular, and particulate biomarkers in about 15 minutes. ABLE significantly improves gas detection limits by converting dilute gases into droplets by water condensation, producing concentrated aqueous samples that are easy to test. Fundamental studies of multiphase condensation revealed unexpected stability in condensate-trapped biomarkers, making ABLE a reliable, accessible, and high-performance system for open-air-based biosensing applications such as non-contact infant healthcare, pathogen detection in public spaces, and food safety.
Submitted 18 October, 2024;
originally announced October 2024.
-
MLLM as Retriever: Interactively Learning Multimodal Retrieval for Embodied Agents
Authors:
Junpeng Yue,
Xinrun Xu,
Börje F. Karlsson,
Zongqing Lu
Abstract:
MLLM agents demonstrate potential for complex embodied tasks by retrieving multimodal task-relevant trajectory data. However, current retrieval methods primarily focus on surface-level similarities of textual or visual cues in trajectories, neglecting their effectiveness for the specific task at hand. To address this issue, we propose a novel method, MLLM As ReTriever (MART), which enhances the performance of embodied agents by utilizing interaction data to fine-tune an MLLM retriever based on preference learning, such that the retriever fully considers the effectiveness of trajectories and prioritizes them for unseen tasks. We also introduce Trajectory Abstraction, a mechanism that leverages MLLMs' summarization capabilities to represent trajectories with fewer tokens while preserving key information, enabling agents to better comprehend milestones in the trajectory. Experimental results across various environments demonstrate our method significantly improves task success rates in unseen scenes compared to baseline methods. This work presents a new paradigm for multimodal retrieval in embodied agents, by fine-tuning a general-purpose MLLM as the retriever to assess trajectory effectiveness. All the code for benchmark tasks, simulator modifications, and the MLLM retriever is available at https://github.com/PKU-RL/MART.
Submitted 22 May, 2025; v1 submitted 4 October, 2024;
originally announced October 2024.
-
Spanning weakly even trees of graphs
Authors:
Jiangdong Ai,
M. N. Ellingham,
Zhipeng Gao,
Yixuan Huang,
Xiangzhou Liu,
Songling Shan,
Simon Špacapan,
Jun Yue
Abstract:
Let $G$ be a graph (with multiple edges allowed) and let $T$ be a tree in $G$. We say that $T$ is $\textit{even}$ if every leaf of $T$ belongs to the same part of the bipartition of $T$, and that $T$ is $\textit{weakly even}$ if every leaf of $T$ that has maximum degree in $G$ belongs to the same part of the bipartition of $T$. We confirm two recent conjectures of Jackson and Yoshimoto by showing that every connected graph that is not a regular bipartite graph has a spanning weakly even tree.
Submitted 17 October, 2024; v1 submitted 23 September, 2024;
originally announced September 2024.
-
Efficient Cross-layer Thermal Transport with Atypical Glassy-like Phenomena in Crystalline CsCu$_4$Se$_3$
Authors:
Jincheng Yue,
Yanhui Liu,
Jiongzhi Zheng
Abstract:
Understanding lattice dynamics and thermal transport in crystalline compounds with intrinsically low lattice thermal conductivity ($κ_L$) is crucial in condensed matter physics. In this work, we investigate the lattice thermal conductivity of crystalline CsCu$_4$Se$_3$ by coupling first-principles anharmonic lattice dynamics with a unified theory of thermal transport. We consider the effects of both cubic and quartic anharmonicity on phonon scattering rates and energy shifts, as well as the diagonal and off-diagonal terms of heat flux operators. Our results reveal that the vibrational properties of CsCu$_4$Se$_3$ are characterized by strong anharmonicity and wave-like phonon tunneling. In particular, the strong anharmonic scattering induced by Cu- and Cs-dominated phonon modes plays a non-negligible role in suppressing particle-like propagation. Moreover, the coherence-driven conductivity dominates the total thermal conductivity along the $z$-axis, leading to an anomalous, wide-temperature-range (100-700 K) glassy-like thermal transport. Importantly, the significant coherence contribution, resulting from the coupling of distinct vibrational eigenstates, facilitates effective thermal transport across layers, sharply contrasting with traditional layered materials. As a result, the non-monotonic temperature dependence of coherences' thermal conductivity results from the combined effects of anharmonic scattering rates and anharmonic phonon renormalization. Our work not only reveals the significant contributions from the off-diagonal terms of heat flux operators in crystalline CsCu$_4$Se$_3$, but also explains the non-monotonic relationship between wave-like thermal conductivity and anharmonic scattering, providing insights into the microscopic mechanisms driving anomalous heat transport.
Submitted 14 November, 2024; v1 submitted 14 September, 2024;
originally announced September 2024.
-
Advancing Depth Anything Model for Unsupervised Monocular Depth Estimation in Endoscopy
Authors:
Bojian Li,
Bo Liu,
Xinning Yao,
Jinghua Yue,
Fugen Zhou
Abstract:
Depth estimation is a cornerstone of 3D reconstruction and plays a vital role in minimally invasive endoscopic surgeries. However, most current depth estimation networks rely on traditional convolutional neural networks, which are limited in their ability to capture global information. Foundation models offer a promising approach to enhance depth estimation, but those models currently available are primarily trained on natural images, leading to suboptimal performance when applied to endoscopic images. In this work, we introduce a novel fine-tuning strategy for the Depth Anything Model and integrate it with an intrinsic-based unsupervised monocular depth estimation framework. Our approach includes a low-rank adaptation technique based on random vectors, which improves the model's adaptability to different scales. Additionally, we propose a residual block built on depthwise separable convolution to compensate for the transformer's limited ability to capture local features. Our experimental results on the SCARED dataset and Hamlyn dataset show that our method achieves state-of-the-art performance while minimizing the number of trainable parameters. Applying this method in minimally invasive endoscopic surgery can enhance surgeons' spatial awareness, thereby improving the precision and safety of the procedures.
Submitted 5 March, 2025; v1 submitted 11 September, 2024;
originally announced September 2024.
-
Ultrafast symmetry control in photoexcited quantum dots
Authors:
Burak Guzelturk,
Joshua Portner,
Justin Ondry,
Samira Ghanbarzadeh,
Mia Tarantola,
Ahhyun Jeong,
Thomas Field,
Alicia M. Chandler,
Eliza Wieman,
Thomas R. Hopper,
Nicolas E. Watkins,
Jin Yue,
Xinxin Cheng,
Ming-Fu Lin,
Duan Luo,
Patrick L. Kramer,
Xiaozhe Shen,
Alexander H. Reid,
Olaf Borkiewicz,
Uta Ruett,
Xiaoyi Zhang,
Aaron M. Lindenberg,
Jihong Ma,
Richard Schaller,
Dmitri V. Talapin
, et al. (1 additional authors not shown)
Abstract:
Symmetry control is essential for realizing unconventional properties, such as ferroelectricity, nonlinear optical responses, and complex topological order, thus it holds promise for the design of emerging quantum and photonic systems. Nevertheless, fast and reversible control of symmetry in materials remains a challenge, especially for nanoscale systems. Here, we unveil reversible symmetry changes in colloidal lead chalcogenide quantum dots on picosecond timescales. Using a combination of ultrafast electron diffraction and total X-ray scattering, in conjunction with atomic-scale structural modeling and first-principles calculations, we reveal that symmetry-broken lead sulfide quantum dots restore to a centrosymmetric phase upon photoexcitation. The symmetry restoration is driven by photoexcited electronic carriers, which suppress lead off-centering for about 100 ps. Furthermore, the change in symmetry is closely correlated with the electronic properties as shown by transient optical measurements. Overall, this study elucidates reversible symmetry changes in colloidal quantum dots, and more broadly defines a new methodology to optically control symmetry in nanoscale systems on ultrafast timescales.
Submitted 27 August, 2024;
originally announced August 2024.
-
gWaveNet: Classification of Gravity Waves from Noisy Satellite Data using Custom Kernel Integrated Deep Learning Method
Authors:
Seraj Al Mahmud Mostafa,
Omar Faruque,
Chenxi Wang,
Jia Yue,
Sanjay Purushotham,
Jianwu Wang
Abstract:
Atmospheric gravity waves occur in the Earth's atmosphere, caused by an interplay between gravity and buoyancy forces. These waves have profound impacts on various aspects of the atmosphere, including the patterns of precipitation, cloud formation, ozone distribution, aerosols, and pollutant dispersion. Therefore, understanding gravity waves is essential to comprehend and monitor changes in a wide range of atmospheric behaviors. Limited studies have been conducted to identify gravity waves from satellite data using machine learning techniques. In particular, identification without applying noise removal techniques remains an underexplored area of research. This study presents a novel kernel design aimed at identifying gravity waves within satellite images. The proposed kernel is seamlessly integrated into a deep convolutional neural network, denoted as gWaveNet. Our proposed model exhibits impressive proficiency in detecting images containing gravity waves from noisy satellite data without any feature engineering. The empirical results show our model outperforms related approaches by achieving over 98% training accuracy and over 94% test accuracy, which is known to be the best result for gravity wave detection up to the time of this work. We open sourced our code at https://rb.gy/qn68ku.
Submitted 26 August, 2024;
originally announced August 2024.
-
Re-boosting Self-Collaboration Parallel Prompt GAN for Unsupervised Image Restoration
Authors:
Xin Lin,
Yuyan Zhou,
Jingtong Yue,
Chao Ren,
Kelvin C. K. Chan,
Lu Qi,
Ming-Hsuan Yang
Abstract:
Unsupervised restoration approaches based on generative adversarial networks (GANs) offer a promising solution without requiring paired datasets. Yet, these GAN-based approaches struggle to surpass the performance of conventional unsupervised GAN-based frameworks without significantly modifying model structures or increasing the computational complexity. To address these issues, we propose a self-collaboration (SC) strategy for existing restoration models. This strategy utilizes information from the previous stage as feedback to guide subsequent stages, achieving significant performance improvement without increasing the framework's inference complexity. The SC strategy comprises a prompt learning (PL) module and a restorer ($Res$). It iteratively replaces the previous less powerful fixed restorer $\overline{Res}$ in the PL module with a more powerful $Res$. The enhanced PL module generates better pseudo-degraded/clean image pairs, leading to a more powerful $Res$ for the next iteration. Our SC can significantly improve the $Res$'s performance by over 1.5 dB without adding extra parameters or computational complexity during inference. Meanwhile, existing self-ensemble (SE) and our SC strategies enhance the performance of pre-trained restorers from different perspectives. As SE increases computational complexity during inference, we propose a re-boosting module to the SC (Reb-SC) to improve the SC strategy further by incorporating SE into SC without increasing inference time. This approach further enhances the restorer's performance by approximately 0.3 dB. Extensive experimental results on restoration tasks demonstrate that the proposed model performs favorably against existing state-of-the-art unsupervised restoration methods. Source code and trained models are publicly available at: https://github.com/linxin0/RSCP2GAN.
△ Less
Submitted 17 August, 2024;
originally announced August 2024.
-
A short note on spanning even trees
Authors:
Jiangdong Ai,
Zhipeng Gao,
Xiangzhou Liu,
Jun Yue
Abstract:
We call a tree $T$ \emph{even} if every pair of its leaves is joined by a path of even length. Jackson and Yoshimoto [J. Graph Theory, 2024] conjectured that every $r$-regular nonbipartite connected graph $G$ has a spanning even tree. They verified this conjecture for the case when $G$ has a $2$-factor. In this paper, we prove that the conjecture holds when $r$ is odd, thereby resolving the only remaining unsolved case for this conjecture.
Submitted 10 September, 2024; v1 submitted 13 August, 2024;
originally announced August 2024.
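The definition of an even tree in the abstract above can be checked mechanically. The following is a minimal sketch, not taken from the paper: the function name `is_even_tree` and the adjacency-dictionary input format are assumptions made for illustration. It relies on the fact that, in a tree, the parity of the path between two leaves equals the XOR of their distance parities from any fixed vertex, so it suffices to check that all leaves lie at even distance from one chosen leaf.

```python
from collections import deque


def is_even_tree(adj):
    """Return True if every pair of leaves is joined by a path of even length.

    adj is a dict mapping each vertex to a list of its neighbours and is
    assumed (not verified) to describe a tree.
    """
    # Leaves are the vertices of degree 1.
    leaves = [v for v, nbrs in adj.items() if len(nbrs) == 1]
    if len(leaves) < 2:
        return True  # no pair of leaves to check

    # Breadth-first search from one leaf, recording distance parity.
    start = leaves[0]
    parity = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in parity:
                parity[w] = parity[u] ^ 1
                queue.append(w)

    # All leaves must be at even distance from the starting leaf.
    return all(parity[leaf] == 0 for leaf in leaves)


# A star with three leaves is even; a path on four vertices is not.
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
path4 = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
```

Finding a spanning even tree in a regular graph, which is what the conjecture concerns, is a different and harder problem; this sketch only verifies the property for a given tree.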
-
Egocentric Vision Language Planning
Authors:
Zhirui Fang,
Ming Yang,
Weishuai Zeng,
Boyu Li,
Junpeng Yue,
Ziluo Ding,
Xiu Li,
Zongqing Lu
Abstract:
We explore leveraging large multi-modal models (LMMs) and text2image models to build a more general embodied agent. LMMs excel in planning long-horizon tasks over symbolic abstractions but struggle with grounding in the physical world, often failing to accurately identify object positions in images. A bridge is needed to connect LMMs to the physical world. The paper proposes a novel approach, egocentric vision language planning (EgoPlan), to handle long-horizon tasks from an egocentric perspective in varying household scenarios. This model leverages a diffusion model to simulate the fundamental dynamics between states and actions, integrating techniques like style transfer and optical flow to enhance generalization across different environmental dynamics. The LMM serves as a planner, breaking down instructions into sub-goals and selecting actions based on their alignment with these sub-goals, thus enabling more generalized and effective decision-making. Experiments show that EgoPlan improves long-horizon task success rates from the egocentric view compared to baselines across household scenarios.
Submitted 11 August, 2024;
originally announced August 2024.
-
Computational Graph Representation of Equations System Constructors in Hierarchical Circuit Simulation
Authors:
Zichao Long,
Lin Li,
Lei Han,
Xianglong Meng,
Chongjun Ding,
Ruiyan Li,
Wu Jiang,
Fuchen Ding,
Jiaqing Yue,
Zhichao Li,
Yisheng Hu,
Ding Li,
Heng Liao
Abstract:
Equations system constructors of hierarchical circuits play a central role in device modeling, nonlinear equations solving, and circuit design automation. However, existing constructors present limitations in applications to different extents. For example, the costs of developing and reusing device models -- especially coarse-grained equivalent models of circuit modules -- remain high while parameter sensitivity analysis is complex and inefficient. Inspired by differentiable programming and leveraging the ecosystem benefits of open-source software, we propose an equations system constructor using the computational graph representation, along with its JSON format netlist, to address these limitations. This representation allows for runtime dependencies between signals and subcircuit/device parameters. The proposed method streamlines the model development process and facilitates end-to-end computation of gradients of equations remainders with respect to parameters. This paper discusses in detail the overarching concept of hierarchical subcircuit/device decomposition and nested invocation by drawing parallels to functions in programming languages, and introduces rules for parameters passing and gradient propagation across hierarchical circuit modules. The presented numerical examples, including (1) an uncoupled CMOS model representation using "equivalent circuit decomposition+dynamic parameters" and (2) operational amplifier (OpAmp) auto device sizing, have demonstrated that the proposed method supports circuit simulation and design and particularly subcircuit modeling with improved efficiency, simplicity, and decoupling compared to existing techniques.
Submitted 4 July, 2024;
originally announced July 2024.
-
Weighted Circle Fusion: Ensembling Circle Representation from Different Object Detection Results
Authors:
Jialin Yue,
Tianyuan Yao,
Ruining Deng,
Quan Liu,
Juming Xiong,
Junlin Guo,
Haichun Yang,
Yuankai Huo
Abstract:
Recently, the use of circle representation has emerged as a method to improve the identification of spherical objects (such as glomeruli, cells, and nuclei) in medical imaging studies. In traditional bounding box-based object detection, combining results from multiple models improves accuracy, especially when real-time processing isn't crucial. Unfortunately, this widely adopted strategy is not readily available for combining circle representations. In this paper, we propose Weighted Circle Fusion (WCF), a simple approach for merging predictions from various circle detection models. Our method leverages confidence scores associated with each proposed bounding circle to generate averaged circles. We evaluate our method on a proprietary dataset for glomerular detection in whole slide imaging (WSI) and find a performance gain of 5% compared to existing ensemble methods. Additionally, we assess the efficiency of two annotation methods, fully manual annotation and a human-in-the-loop (HITL) approach, in labeling 200,000 glomeruli. The HITL approach, which integrates machine learning detection with human verification, demonstrated remarkable improvements in annotation efficiency. The Weighted Circle Fusion technique not only enhances object detection precision but also notably reduces false detections, presenting a promising direction for future research and application in pathological image analysis. The source code has been made publicly available at https://github.com/hrlblab/WeightedCircleFusion
Submitted 27 November, 2024; v1 submitted 27 June, 2024;
originally announced June 2024.
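The abstract above describes generating averaged circles from the confidence scores of several detections. The following is a minimal sketch of that idea only; it is not the authors' implementation. The function name `fuse_circles`, the `(x, y, r, score)` tuple layout, and the choice to report the mean score as the fused confidence are all assumptions, and the step that decides which circles belong to the same object is omitted.

```python
def fuse_circles(circles):
    """Fuse circles assumed to cover the same object into one circle.

    circles is a non-empty list of (x, y, r, score) tuples with score > 0.
    Centre and radius are averaged with the scores as weights; the fused
    score is the plain mean of the input scores (an illustrative choice).
    """
    if not circles:
        raise ValueError("need at least one circle")
    total = sum(score for _, _, _, score in circles)
    if total <= 0:
        raise ValueError("scores must sum to a positive value")

    x = sum(cx * score for cx, _, _, score in circles) / total
    y = sum(cy * score for _, cy, _, score in circles) / total
    r = sum(cr * score for _, _, cr, score in circles) / total
    return x, y, r, total / len(circles)


# Two equally confident detections average to the midpoint circle.
fused_equal = fuse_circles([(0.0, 0.0, 1.0, 1.0), (2.0, 0.0, 3.0, 1.0)])
# A more confident detection pulls the fused circle toward itself.
fused_weighted = fuse_circles([(0.0, 0.0, 1.0, 3.0), (4.0, 0.0, 5.0, 1.0)])
```

Weighting by confidence means a high-scoring detection dominates the fused geometry, which is the usual motivation for weighted fusion over plain averaging.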
-
Can We Trust Embodied Agents? Exploring Backdoor Attacks against Embodied LLM-based Decision-Making Systems
Authors:
Ruochen Jiao,
Shaoyuan Xie,
Justin Yue,
Takami Sato,
Lixu Wang,
Yixuan Wang,
Qi Alfred Chen,
Qi Zhu
Abstract:
Large Language Models (LLMs) have shown significant promise in real-world decision-making tasks for embodied artificial intelligence, especially when fine-tuned to leverage their inherent common sense and reasoning abilities while being tailored to specific applications. However, this fine-tuning process introduces considerable safety and security vulnerabilities, especially in safety-critical cyber-physical systems. In this work, we propose the first comprehensive framework for Backdoor Attacks against LLM-based Decision-making systems (BALD) in embodied AI, systematically exploring the attack surfaces and trigger mechanisms. Specifically, we propose three distinct attack mechanisms: word injection, scenario manipulation, and knowledge injection, targeting various components in the LLM-based decision-making pipeline. We perform extensive experiments on representative LLMs (GPT-3.5, LLaMA2, PaLM2) in autonomous driving and home robot tasks, demonstrating the effectiveness and stealthiness of our backdoor triggers across various attack channels, with cases like vehicles accelerating toward obstacles and robots placing knives on beds. Our word and knowledge injection attacks achieve nearly 100% success rate across multiple models and datasets while requiring only limited access to the system. Our scenario manipulation attack yields success rates exceeding 65%, reaching up to 90%, and does not require any runtime system intrusion. We also assess the robustness of these attacks against defenses, revealing their resilience. Our findings highlight critical security vulnerabilities in embodied LLM systems and emphasize the urgent need for safeguarding these systems to mitigate potential risks.
Submitted 30 April, 2025; v1 submitted 27 May, 2024;
originally announced May 2024.
-
Modeling of Nitric Oxide Infrared radiative flux in lower thermosphere: a machine learning perspective
Authors:
Dayakrishna Nailwal,
MV Sunil Krishna,
Alok Kumar Ranjan,
Jia Yue
Abstract:
Nitric Oxide (NO) significantly impacts energy distribution and chemical processes in the mesosphere and lower thermosphere (MLT). During geomagnetic storms, a substantial influx of energy in the thermosphere leads to an increase in NO infrared emissions. Accurately predicting the radiative flux of Nitric Oxide is crucial for understanding the thermospheric energy budget, particularly during extreme space weather events. With advancements in computational techniques, machine learning (ML) has become a highly effective tool for space weather forecasting. This effort becomes even more worthwhile considering the availability of two decades of continuous NO infrared emission measurements by TIMED/SABER along with several other key thermospheric variables. We present the development of an ML-based predictive model for Nitric Oxide Infrared Radiative Flux (NOIRF). Various ML algorithms have been tested for better predictive ability, and an optimized model (NOEMLM) has been developed for the study of NOIRF. This model is able to extract the underlying relationships between the input features and effectively predict the NOIRF. The NOEMLM predictions agree very well with SABER observations during quiet times as well as geomagnetic storms. Compared with the existing TIEGCM model, NOEMLM performs very well, especially during extreme space weather conditions. The results of this study suggest that utilizing geomagnetic and space weather indices with ML/AI can serve as superior parameters for studying the upper atmosphere, as compared to focusing on specific species having complex chemical processes and associated uncertainties in constituents. ML techniques can effectively carry out the analysis with greater ease than traditional chemical studies.
Submitted 30 May, 2024;
originally announced May 2024.
-
TauAD: MRI-free Tau Anomaly Detection in PET Imaging via Conditioned Diffusion Models
Authors:
Lujia Zhong,
Shuo Huang,
Jiaxin Yue,
Jianwei Zhang,
Zhiwei Deng,
Wenhao Chi,
Yonggang Shi
Abstract:
The emergence of tau PET imaging over the last decade has enabled Alzheimer's disease (AD) researchers to examine tau pathology in vivo and more effectively characterize the disease trajectories of AD. Current tau PET analysis methods, however, typically perform inferences on large cortical ROIs and are limited in the detection of localized tau pathology that varies across subjects. Furthermore, a high-resolution MRI is required to carry out conventional tau PET analysis, which is not commonly acquired in clinical practices and may not be acquired for many elderly patients with dementia due to strong motion artifacts, claustrophobia, or certain metal implants. In this work, we propose a novel conditional diffusion model to perform MRI-free anomaly detection from tau PET imaging data. By including individualized conditions and two complementary loss maps from pseudo-healthy and pseudo-unhealthy reconstructions, our model computes an anomaly map across the entire brain area that allows simply training a support vector machine (SVM) for classifying disease severity. We train our model on ADNI subjects (n=534) and evaluate its performance on a separate dataset from the preclinical subjects of the A4 clinical trial (n=447). We demonstrate that our method outperforms baseline generative models and the conventional Z-score-based method in anomaly localization without mis-detecting off-target bindings in sub-cortical and out-of-brain areas. By classifying the A4 subjects according to their anomaly map using the SVM trained on ADNI data, we show that our method can successfully group preclinical subjects with significantly different cognitive functions, which further demonstrates the effectiveness of our method in capturing biologically relevant anomaly in tau PET imaging.
Submitted 21 May, 2024;
originally announced May 2024.
-
MOSS: Motion-based 3D Clothed Human Synthesis from Monocular Video
Authors:
Hongsheng Wang,
Xiang Cai,
Xi Sun,
Jinhong Yue,
Zhanyun Tang,
Shengyu Zhang,
Feng Lin,
Fei Wu
Abstract:
Single-view clothed human reconstruction holds a central position in virtual reality applications, especially in contexts involving intricate human motions. It presents notable challenges in achieving realistic clothing deformation. Current methodologies often overlook the influence of motion on surface deformation, resulting in surfaces lacking the constraints imposed by global motion. To overcome these limitations, we introduce an innovative framework, Motion-Based 3D Clothed Humans Synthesis (MOSS), which employs kinematic information to achieve motion-aware Gaussian split on the human surface. Our framework consists of two modules: Kinematic Gaussian Locating Splatting (KGAS) and Surface Deformation Detector (UID). KGAS incorporates matrix-Fisher distribution to propagate global motion across the body surface. The density and rotation factors of this distribution explicitly control the Gaussians, thereby enhancing the realism of the reconstructed surface. Additionally, to address local occlusions in single-view, based on KGAS, UID identifies significant surfaces, and geometric reconstruction is performed to compensate for these deformations. Experimental results demonstrate that MOSS achieves state-of-the-art visual quality in 3D clothed human synthesis from monocular videos. Notably, we improve the Human NeRF and the Gaussian Splatting by 33.94% and 16.75% in LPIPS* respectively. Codes are available at https://wanghongsheng01.github.io/MOSS/.
Submitted 21 June, 2024; v1 submitted 21 May, 2024;
originally announced May 2024.
-
MoVL:Exploring Fusion Strategies for the Domain-Adaptive Application of Pretrained Models in Medical Imaging Tasks
Authors:
Haijiang Tian,
Jingkun Yue,
Xiaohong Liu,
Guoxing Yang,
Zeyu Jiang,
Guangyu Wang
Abstract:
Medical images are often more difficult to acquire than natural images because of the specialized equipment and technology required, which leads to fewer medical image datasets. It is therefore hard to train a strong pretrained medical vision model. How to make the best use of a natural pretrained vision model and adapt it to the medical domain remains an open question. For image classification, a popular method is linear probe (LP). However, LP only considers the output after feature extraction, while a gap exists between input medical images and the natural pretrained vision model. We introduce visual prompting (VP) to fill this gap and analyze strategies for coupling LP and VP. We design a joint learning loss function containing a categorisation loss and a discrepancy loss, which describes the variance between prompted and plain images, and name this joint training strategy MoVL (Mixture of Visual Prompting and Linear Probe). We experiment on 4 medical image classification datasets with two mainstream architectures, ResNet and CLIP. Results show that, without changing the parameters or architecture of the backbone model and with fewer parameters, MoVL has the potential to reach full finetune (FF) accuracy (on four medical datasets, an average of 90.91% for MoVL and 91.13% for FF). On an out-of-distribution medical dataset, our method (90.33%) can outperform FF (85.15%) with an absolute lead of 5.18%.
Submitted 12 May, 2024;
originally announced May 2024.
-
Hierarchical Characterization of Thermoelectric Performance in Copper-Based Chalcogenide CsCu$_3$S$_2$: Unveiling the role of Anharmonic Lattice Dynamics
Authors:
Jincheng Yue,
Jiongzhi Zheng,
Junda Li,
Xingchen Shen,
Wenling Ren,
Yanhui Liu,
Tian Cui
Abstract:
We explicitly consider both phonon energy shifts and broadening arising from both cubic and quartic anharmonicities, as well as diagonal/non-diagonal terms of heat flux operators in thermal conductivity. Our findings show that the strong anharmonicity of CsCu$_3$S$_2$ primarily arises from the presence of $p$-$d$ anti-bonding hybridization between Cu and S atoms, coupled with the random oscillations of Cs atoms. Notably, the competition between phonon hardening described by the loop diagram and softening induced by the bubble diagram significantly influences particle-like propagation, predominantly reflected in group velocity and energy-conservation rule. Additionally, the electrical transport properties are determined by employing the precise momentum relaxation-time approximation (MRTA). At high temperatures, the thermoelectric performance of $p$-type CsCu$_3$S$_2$ reaches its optimum theoretical value of 0.94 along the in-plane direction based on advanced phonon renormalization theory. In striking contrast, the harmonic approximation theory significantly overestimates the thermoelectric efficiency at the same temperatures, rendering it an impractical expectation. Conversely, the first-order renormalization approach leads to a serious underestimation of the thermoelectric properties due to the over-correction of phonon energy. Our study not only reveals the pivotal role of anharmonic lattice dynamics in accurately assessing thermoelectric properties but also underscores the potential thermoelectric applications for novel copper-based chalcogenides.
Submitted 6 September, 2024; v1 submitted 8 May, 2024;
originally announced May 2024.
-
The Ninth NTIRE 2024 Efficient Super-Resolution Challenge Report
Authors:
Bin Ren,
Yawei Li,
Nancy Mehta,
Radu Timofte,
Hongyuan Yu,
Cheng Wan,
Yuxin Hong,
Bingnan Han,
Zhuoyuan Wu,
Yajun Zou,
Yuqing Liu,
Jizhe Li,
Keji He,
Chao Fan,
Heng Zhang,
Xiaolin Zhang,
Xuanwu Yin,
Kunlong Zuo,
Bohao Liao,
Peizhe Xia,
Long Peng,
Zhibo Du,
Xin Di,
Wangkai Li,
Yang Wang
, et al. (109 additional authors not shown)
Abstract:
This paper provides a comprehensive review of the NTIRE 2024 challenge, focusing on efficient single-image super-resolution (ESR) solutions and their outcomes. The task of this challenge is to super-resolve an input image with a magnification factor of x4 based on pairs of low and corresponding high-resolution images. The primary objective is to develop networks that optimize various aspects such as runtime, parameters, and FLOPs, while still maintaining a peak signal-to-noise ratio (PSNR) of approximately 26.90 dB on the DIV2K_LSDIR_valid dataset and 26.99 dB on the DIV2K_LSDIR_test dataset. In addition, this challenge has 4 tracks including the main track (overall performance), sub-track 1 (runtime), sub-track 2 (FLOPs), and sub-track 3 (parameters). In the main track, all three metrics (i.e., runtime, FLOPs, and parameter count) were considered. The ranking of the main track is calculated based on a weighted sum of the scores of all other sub-tracks. In sub-track 1, the practical runtime performance of the submissions was evaluated, and the corresponding score was used to determine the ranking. In sub-track 2, the number of FLOPs was considered. The score calculated based on the corresponding FLOPs was used to determine the ranking. In sub-track 3, the number of parameters was considered. The score calculated based on the corresponding parameters was used to determine the ranking. RLFN is set as the baseline for efficiency measurement. The challenge had 262 registered participants, and 34 teams made valid submissions. The submissions gauge the state of the art in efficient single-image super-resolution. To facilitate the reproducibility of the challenge and enable other researchers to build upon these findings, the code and the pre-trained model of validated solutions are made publicly available at https://github.com/Amazingren/NTIRE2024_ESR/.
Submitted 25 June, 2024; v1 submitted 16 April, 2024;
originally announced April 2024.
-
Diffusion Models Meet Remote Sensing: Principles, Methods, and Perspectives
Authors:
Yidan Liu,
Jun Yue,
Shaobo Xia,
Pedram Ghamisi,
Weiying Xie,
Leyuan Fang
Abstract:
As a newly emerging advance in deep generative models, diffusion models have achieved state-of-the-art results in many fields, including computer vision, natural language processing, and molecule design. The remote sensing (RS) community has also noticed the powerful ability of diffusion models and quickly applied them to a variety of tasks for image processing. Given the rapid increase in research on diffusion models in the field of RS, it is necessary to conduct a comprehensive review of existing diffusion model-based RS papers, to help researchers recognize the potential of diffusion models and provide some directions for further exploration. Specifically, this article first introduces the theoretical background of diffusion models, and then systematically reviews the applications of diffusion models in RS, including image generation, enhancement, and interpretation. Finally, the limitations of existing RS diffusion models and worthy research directions for further exploration are discussed and summarized.
Submitted 11 November, 2024; v1 submitted 13 April, 2024;
originally announced April 2024.
-
Stability and noncentered PT symmetry of real topological phases
Authors:
S. J. Yue,
Qing Liu,
Shengyuan A. Yang,
Y. X. Zhao
Abstract:
Real topological phases protected by the spacetime inversion (PT) symmetry are a current research focus. The basis is that the PT symmetry endows a real structure in momentum space, which leads to $\mathbb{Z}_2$ topological classifications in 1D and 2D. Here, we provide solutions to two outstanding problems in the diagnosis of real topology. First, based on the stable equivalence in K-theory, we clarify that the 2D topological invariant remains well defined in the presence of nontrivial 1D invariant, and we develop a general numerical approach for its evaluation, which was hitherto unavailable. Second, under the unit-cell convention, noncentered PT symmetries assume momentum dependence, which violates the presumption in previous methods for computing the topological invariants. We clarify the classifications for this case and formulate the invariants by introducing a twisted Wilson-loop operator for both 1D and 2D. A simple model on a rectangular lattice is constructed to demonstrate our theory, which can be readily realized using artificial crystals.
Submitted 16 April, 2024; v1 submitted 11 April, 2024;
originally announced April 2024.
-
Constraints on the Blazar-Boosted Dark Matter from the CDEX-10 Experiment
Authors:
R. Xu,
L. T. Yang,
Q. Yue,
K. J. Kang,
Y. J. Li,
H. P. An,
Greeshma C.,
J. P. Chang,
Y. H. Chen,
J. P. Cheng,
W. H. Dai,
Z. Deng,
C. H. Fang,
X. P. Geng,
H. Gong,
Q. J. Guo,
T. Guo,
X. Y. Guo,
L. He,
S. M. He,
J. W. Hu,
H. X. Huang,
T. C. Huang,
L. Jiang,
S. Karmakar
, et al. (59 additional authors not shown)
Abstract:
We report new constraints on light dark matter (DM) boosted by blazars using the 205.4 kg day data from the CDEX-10 experiment located at the China Jinping Underground Laboratory. Two representative blazars, TXS 0506+56 and BL Lacertae are studied. The results derived from TXS 0506+56 exclude DM-nucleon elastic scattering cross sections from $4.6\times 10^{-33}\ \rm cm^2$ to $1\times10^{-26}\ \rm cm^2$ for DM masses between 10 keV and 1 GeV, and the results derived from BL Lacertae exclude DM-nucleon elastic scattering cross sections from $2.4\times 10^{-34}\ \rm cm^2$ to $1\times10^{-26}\ \rm cm^2$ for the same range of DM masses. The constraints correspond to the best sensitivities among solid-state detector experiments in the sub-MeV mass range.
Submitted 29 March, 2024;
originally announced March 2024.
-
Probing Dark Matter Particles from Evaporating Primordial Black Holes via Electron Scattering in the CDEX-10 Experiment
Authors:
Z. H. Zhang,
L. T. Yang,
Q. Yue,
K. J. Kang,
Y. J. Li,
H. P. An,
Greeshma C.,
J. P. Chang,
Y. H. Chen,
J. P. Cheng,
W. H. Dai,
Z. Deng,
C. H. Fang,
X. P. Geng,
H. Gong,
Q. J. Guo,
T. Guo,
X. Y. Guo,
L. He,
S. M. He,
J. W. Hu,
H. X. Huang,
T. C. Huang,
L. Jiang,
S. Karmakar
, et al. (59 additional authors not shown)
Abstract:
Dark matter (DM) is a major constituent of the Universe. However, no definite evidence of DM particles (denoted as "$χ$") has been found in DM direct detection (DD) experiments to date. There is a novel concept of detecting $χ$ from evaporating primordial black holes (PBHs). We search for $χ$ emitted from PBHs by investigating their interaction with target electrons. The examined PBH masses range from 1$\times$10$^{15}$ to 7$\times$10$^{16}$ g under the current limits of PBH abundance $f_{PBH}$. Using 205.4 kg$\cdot$day data obtained from the CDEX-10 experiment conducted in the China Jinping Underground Laboratory, we exclude the $χ$--electron ($χ$--$e$) elastic-scattering cross section $σ_{χe} \sim 5\times10^{-29}$ cm$^2$ for $χ$ with a mass $m_χ\lesssim$ 0.1 keV from our results. With the higher radiation background but lower energy threshold (160 eV), CDEX-10 fills a part of the gap in the previous work. If ($m_χ$, $σ_{χe}$) can be determined in the future, DD experiments are expected to impose strong constraints on $f_{PBH}$ for large $M_{PBH}$s.
Submitted 22 September, 2024; v1 submitted 29 March, 2024;
originally announced March 2024.
-
Human Motion Prediction under Unexpected Perturbation
Authors:
Jiangbei Yue,
Baiyi Li,
Julien Pettré,
Armin Seyfried,
He Wang
Abstract:
We investigate a new task in human motion prediction, which is predicting motions under unexpected physical perturbation potentially involving multiple people. Compared with existing research, this task involves predicting less controlled, unpremeditated and pure reactive motions in response to external impact and how such motions can propagate through people. It brings new challenges such as data…
▽ More
We investigate a new task in human motion prediction, which is predicting motions under unexpected physical perturbation potentially involving multiple people. Compared with existing research, this task involves predicting less controlled, unpremeditated and pure reactive motions in response to external impact and how such motions can propagate through people. It brings new challenges such as data scarcity and predicting complex interactions. To this end, we propose a new method capitalizing differential physics and deep neural networks, leading to an explicit Latent Differential Physics (LDP) model. Through experiments, we demonstrate that LDP has high data efficiency, outstanding prediction accuracy, strong generalizability and good explainability. Since there is no similar research, a comprehensive comparison with 11 adapted baselines from several relevant domains is conducted, showing LDP outperforming existing research both quantitatively and qualitatively, improving prediction accuracy by as much as 70%, and demonstrating significantly stronger generalization.
Submitted 23 March, 2024;
originally announced March 2024.
-
Enabling Generalized Zero-shot Learning Towards Unseen Domains by Intrinsic Learning from Redundant LLM Semantics
Authors:
Jiaqi Yue,
Chunhui Zhao,
Jiancheng Zhao,
Biao Huang
Abstract:
Generalized zero-shot learning (GZSL) focuses on recognizing seen and unseen classes under the domain shift problem, where data of unseen classes may be misclassified as seen classes. However, existing GZSL is still limited to seen domains. In the current work, we study cross-domain GZSL (CDGZSL), which addresses GZSL towards unseen domains. Unlike existing GZSL methods, CDGZSL constructs a common feature space across domains and acquires the corresponding intrinsic semantics shared among domains to transfer from seen to unseen domains. Considering the information asymmetry problem caused by redundant class semantics annotated with large language models (LLMs), we present Meta Domain Alignment Semantic Refinement (MDASR). Technically, MDASR consists of two parts: inter-class similarity alignment, which eliminates the non-intrinsic semantics not shared across all domains under the guidance of inter-class feature relationships, and unseen-class meta generation, which preserves intrinsic semantics to maintain connectivity between seen and unseen classes by simulating feature generation. MDASR effectively aligns the redundant semantic space with the common feature space, mitigating the information asymmetry in CDGZSL. The effectiveness of MDASR is demonstrated on two datasets, Office-Home and Mini-DomainNet, and we have shared the LLM-based semantics for these datasets as a benchmark.
Submitted 10 March, 2025; v1 submitted 21 March, 2024;
originally announced March 2024.
-
Learning to better see the unseen: Broad-Deep Mixed Anti-Forgetting Framework for Incremental Zero-Shot Fault Diagnosis
Authors:
Jiancheng Zhao,
Jiaqi Yue,
Chunhui Zhao
Abstract:
Zero-shot fault diagnosis (ZSFD) is capable of identifying unseen faults via predicting fault attributes labeled by human experts. We first recognize the demand of ZSFD to deal with continuous changes in industrial processes, i.e., the model's ability to adapt to new fault categories and attributes while avoiding forgetting the diagnosis ability learned previously. To overcome the issue that the existing ZSFD paradigm cannot learn from evolving streams of training data in industrial scenarios, the incremental ZSFD (IZSFD) paradigm is proposed for the first time, which incorporates category increment and attribute increment for both traditional ZSFD and generalized ZSFD paradigms. To achieve IZSFD, we present a broad-deep mixed anti-forgetting framework (BDMAFF) that aims to learn from new fault categories and attributes. To tackle the issue of forgetting, BDMAFF effectively accumulates previously acquired knowledge from two perspectives: features and attribute prototypes. The feature memory is established through a deep generative model that employs anti-forgetting training strategies, ensuring the generation quality of historical categories is supervised and maintained. The diagnosis model SEEs the UNSEEN faults with the help of generated samples from the generative model. The attribute prototype memory is established through a diagnosis model inspired by the broad learning system. Unlike traditional incremental learning algorithms, BDMAFF introduces a memory-driven iterative update strategy for the diagnosis model, which allows the model to learn new faults and attributes without requiring the storage of all historical training samples. The effectiveness of the proposed method is verified on a real hydraulic system and the Tennessee-Eastman benchmark process.
Submitted 17 March, 2024;
originally announced March 2024.
-
Cradle: Empowering Foundation Agents Towards General Computer Control
Authors:
Weihao Tan,
Wentao Zhang,
Xinrun Xu,
Haochong Xia,
Ziluo Ding,
Boyu Li,
Bohan Zhou,
Junpeng Yue,
Jiechuan Jiang,
Yewen Li,
Ruyi An,
Molei Qin,
Chuqiao Zong,
Longtao Zheng,
Yujie Wu,
Xiaoqiang Chai,
Yifei Bi,
Tianbao Xie,
Pengjie Gu,
Xiyun Li,
Ceyao Zhang,
Long Tian,
Chaojie Wang,
Xinrun Wang,
Börje F. Karlsson
, et al. (3 additional authors not shown)
Abstract:
Despite the success in specific scenarios, existing foundation agents still struggle to generalize across various virtual scenarios, mainly due to the dramatically different encapsulations of environments with manually designed observation and action spaces. To handle this issue, we propose the General Computer Control (GCC) setting to restrict foundation agents to interact with software through the most unified and standardized interface, i.e., using screenshots as input and keyboard and mouse actions as output. We introduce Cradle, a modular and flexible LMM-powered framework, as a preliminary attempt towards GCC. Enhanced by six key modules, Cradle can understand input screenshots and output executable code for low-level keyboard and mouse control after high-level planning, so that Cradle can interact with any software and complete long-horizon complex tasks without relying on any built-in APIs. Experimental results show that Cradle exhibits remarkable generalizability and impressive performance across four previously unexplored commercial video games, five software applications, and a comprehensive benchmark, OSWorld. Cradle is the first to enable foundation agents to follow the main storyline and complete 40-minute-long real missions in the complex AAA game Red Dead Redemption 2 (RDR2). Cradle can also create a city of a thousand people in Cities: Skylines, farm and harvest parsnips in Stardew Valley, and trade and bargain with a maximal weekly total profit of 87% in Dealer's Life 2. Cradle can not only operate daily software, like Chrome, Outlook, and Feishu, but also edit images and videos using Meitu and CapCut. Cradle greatly extends the reach of foundation agents by enabling the easy conversion of any software, especially complex games, into benchmarks to evaluate agents' various abilities and facilitate further data collection, thus paving the way for generalist agents.
Submitted 2 July, 2024; v1 submitted 5 March, 2024;
originally announced March 2024.