-
Physics-Guided Dual Implicit Neural Representations for Source Separation
Authors:
Yuan Ni,
Zhantao Chen,
Alexander N. Petsch,
Edmund Xu,
Cheng Peng,
Alexander I. Kolesnikov,
Sugata Chowdhury,
Arun Bansil,
Jana B. Thayer,
Joshua J. Turner
Abstract:
Efficient data analysis is a significant challenge for most advanced experimental and observational techniques, because the collected signals often include unwanted contributions, such as background and signal distortions, that can obscure the physically relevant information of interest. To address this, we have developed a self-supervised machine-learning approach for source separation using a dual implicit neural representation framework that jointly trains two neural networks: one for approximating distortions of the physical signal of interest and the other for learning the effective background contribution. Our method learns directly from the raw data by minimizing a reconstruction-based loss function without requiring labeled data or pre-defined dictionaries. We demonstrate the effectiveness of our framework by considering a challenging case study involving large-scale simulated as well as experimental momentum-energy-dependent inelastic neutron scattering data in a four-dimensional parameter space, characterized by heterogeneous background contributions and unknown distortions to the target signal. The method is found to successfully separate physically meaningful signals from a complex or structured background even when the signal characteristics vary across all four dimensions of the parameter space. An analytical approach that informs the choice of the regularization parameter is presented. Our method offers a versatile framework for addressing source separation problems across diverse domains, ranging from superimposed signals in astronomical measurements to structural features in biomedical image reconstructions.
Submitted 7 July, 2025;
originally announced July 2025.
-
USVTrack: USV-Based 4D Radar-Camera Tracking Dataset for Autonomous Driving in Inland Waterways
Authors:
Shanliang Yao,
Runwei Guan,
Yi Ni,
Sen Xu,
Yong Yue,
Xiaohui Zhu,
Ryan Wen Liu
Abstract:
Object tracking in inland waterways plays a crucial role in safe and cost-effective applications, including waterborne transportation, sightseeing tours, environmental monitoring and surface rescue. Our Unmanned Surface Vehicle (USV), equipped with a 4D radar, a monocular camera, a GPS, and an IMU, delivers robust tracking capabilities in complex waterborne environments. By leveraging these sensors, our USV collected comprehensive object tracking data, which we present as USVTrack, the first 4D radar-camera tracking dataset tailored for autonomous driving in new generation waterborne transportation systems. Our USVTrack dataset presents rich scenarios, featuring diverse waterways, varying times of day, and multiple weather and lighting conditions. Moreover, we present a simple but effective radar-camera matching method, termed RCM, which can be plugged into popular two-stage association trackers. Experimental results utilizing RCM demonstrate the effectiveness of the radar-camera matching in improving object tracking accuracy and reliability for autonomous driving in waterborne environments. The USVTrack dataset is publicly available at https://usvtrack.github.io.
Submitted 23 June, 2025;
originally announced June 2025.
-
Revisiting Sampling Strategies for Molecular Generation
Authors:
Yuyan Ni,
Shikun Feng,
Wei-Ying Ma,
Zhi-Ming Ma,
Yanyan Lan
Abstract:
Sampling strategies in diffusion models are critical to molecular generation yet remain relatively underexplored. In this work, we investigate a broad spectrum of sampling methods beyond conventional defaults and reveal that sampling choice substantially affects molecular generation performance. In particular, we identify a maximally stochastic sampling (StoMax), a simple yet underexplored strategy, as consistently outperforming default sampling methods for generative models DDPM and BFN. Our findings highlight the pivotal role of sampling design and suggest promising directions for advancing molecular generation through principled and more expressive sampling approaches.
Submitted 19 June, 2025;
originally announced June 2025.
-
I$^2$S-TFCKD: Intra-Inter Set Knowledge Distillation with Time-Frequency Calibration for Speech Enhancement
Authors:
Jiaming Cheng,
Ruiyu Liang,
Chao Xu,
Ye Ni,
Wei Zhou,
Björn W. Schuller,
Xiaoshuai Hao
Abstract:
In recent years, complexity compression of neural network (NN)-based speech enhancement (SE) models has gradually attracted the attention of researchers, especially in scenarios with limited hardware resources or strict latency requirements. The main difficulties and challenges lie in achieving a balance between complexity and performance according to the characteristics of the task. In this paper, we propose an intra-inter set knowledge distillation (KD) framework with time-frequency calibration (I$^2$S-TFCKD) for SE. Different from previous distillation strategies for SE, the proposed framework fully utilizes the time-frequency differential information of speech while promoting global knowledge flow. Firstly, we propose a multi-layer interactive distillation based on dual-stream time-frequency cross-calibration, which calculates the teacher-student similarity calibration weights in the time and frequency domains respectively and performs cross-weighting, thus enabling refined allocation of distillation contributions across different layers according to speech characteristics. Secondly, we construct a collaborative distillation paradigm for intra-set and inter-set correlations. Within a correlated set, multi-layer teacher-student features are pairwise matched for calibrated distillation. Subsequently, we generate representative features from each correlated set through residual fusion to form the fused feature set that enables inter-set knowledge interaction. The proposed distillation strategy is applied to the dual-path dilated convolutional recurrent network (DPDCRN) that ranked first in the SE track of the L3DAS23 challenge. Objective evaluations demonstrate that the proposed KD strategy consistently and effectively improves the performance of the low-complexity student model and outperforms other distillation schemes.
Submitted 16 June, 2025;
originally announced June 2025.
-
Conditional Diffusion Model-Driven Generative Channels for Double RIS-Aided Wireless Systems
Authors:
Yiyang Ni,
Qi Zhang,
Guangji Chen,
Yan Cai,
Jun Li,
Shi Jin
Abstract:
With the development of the upcoming sixth-generation networks (6G), reconfigurable intelligent surfaces (RISs) have gained significant attention due to their ability to reconfigure wireless channels via smart reflections. However, traditional channel state information (CSI) acquisition techniques for double-RIS systems face challenges (e.g., high pilot overhead or multipath interference). This paper proposes a new channel generation method for double-RIS communication systems based on a conditional diffusion model (CDM). The CDM is trained on synthetic channel data to capture channel characteristics. It addresses the limitations of traditional CSI generation methods, such as insufficient model understanding capability and poor environmental adaptability. We provide a detailed analysis of the diffusion process for channel generation and validate it through simulations. The simulation results demonstrate that the proposed CDM-based method outperforms traditional channel acquisition methods in terms of normalized mean squared error (NMSE). This method offers a new paradigm for channel acquisition in double-RIS systems, which is expected to improve the quality of channel acquisition with low pilot overhead.
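The comparison above is stated in NMSE. As a hedged illustration only, a common definition for channel estimates is NMSE = ||H - Ĥ||² / ||H||², usually reported in dB; the sketch below pools squared errors over all entries, which is an assumption, and the channel shape and noise level are hypothetical rather than taken from the paper.

```python
import numpy as np


def nmse_db(h_true: np.ndarray, h_est: np.ndarray) -> float:
    """Normalized MSE in dB: 10*log10(||H - H_est||^2 / ||H||^2).

    Squared errors are pooled over every entry, so the inputs may be a single
    channel matrix or a whole batch; this pooled definition is an assumption.
    """
    err = np.sum(np.abs(h_true - h_est) ** 2)
    ref = np.sum(np.abs(h_true) ** 2)
    return float(10.0 * np.log10(err / ref))


rng = np.random.default_rng(0)
shape = (4, 8)  # hypothetical channel dimensions
# Unit-power complex Gaussian channel and an estimate with ~1% error power.
h = (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)
noise = (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)
h_hat = h + 0.1 * noise
print(f"NMSE = {nmse_db(h, h_hat):.1f} dB")  # roughly -20 dB at this noise level
```

Because the error is normalized by the channel power, NMSE is comparable across channels of different gain, which is why it is a natural figure of merit for CSI acquisition.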
Submitted 14 June, 2025;
originally announced June 2025.
-
Biases in stellar masses of JWST high-z quasar host galaxies caused by quasar subtraction
Authors:
Sabrina Berger,
Madeline A. Marshall,
J. Stuart B. Wyithe,
Tiziana di Matteo,
Yueying Ni,
Stephen M. Wilkins,
Minghao Yue
Abstract:
JWST has enabled a new era of understanding high-z galaxy and black hole evolution with more than 30 high-z quasar host galaxy detections. Many of these observations imply galaxies with black holes that are overmassive compared to their low-z counterparts. However, the bright quasar point source removal may cause significant biases in these stellar mass measurements. We develop a simulation-based inference method to disentangle the quasar host galaxy stellar mass measurements from observational biases during the point source removal. We use the BlueTides simulation to generate mock images and perform point source removal on thousands of simulated high-z quasar host galaxies, constructing corrected host magnitude posteriors. We find that JWST photometry tends to either correctly recover or modestly misestimate host magnitudes, with a maximum magnitude underestimate of 0.21 mag. With our corrected magnitude posteriors, we perform SED fitting on each quasar host galaxy and compare the stellar mass measurement before and after the correction. We find that stellar mass estimates are generally robust, or overestimated by $\leq0.3$ dex. We also find that the stellar masses of a subset of hosts (J1120+0641, J0844-0132, J0911+0152, and J1146-0005) remain unconstrained, as key photometric bands provide only flux upper limits. Understanding these biases is essential to uncovering the evolutionary pathways of high-z quasars with their hosts.
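To translate the quoted offsets into linear factors: with m = -2.5 log10(F) + const, a magnitude offset Δm corresponds to a flux ratio of 10^(0.4 Δm), and an offset of Δ dex is a factor of 10^Δ. The short sketch below simply evaluates these standard conversions for the 0.21 mag and 0.3 dex figures in the abstract; it is not part of the paper's inference pipeline.

```python
def mag_to_flux_ratio(delta_mag: float) -> float:
    """Flux ratio implied by a magnitude difference (m = -2.5 log10 F + const)."""
    return 10 ** (0.4 * delta_mag)


def dex_to_factor(delta_dex: float) -> float:
    """Linear factor implied by a difference of delta_dex in log10 space."""
    return 10 ** delta_dex


# Maximum host-magnitude offset and stellar-mass bias bound quoted in the abstract.
print(f"0.21 mag -> flux factor {mag_to_flux_ratio(0.21):.2f}")  # 1.21
print(f"0.3 dex  -> mass factor {dex_to_factor(0.3):.2f}")       # 2.00
```

So the largest magnitude misestimate corresponds to about a 21% flux error, while a 0.3 dex stellar-mass overestimate is roughly a factor of two.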
Submitted 13 June, 2025;
originally announced June 2025.
-
A Review of Cloud Computing in Seismology
Authors:
Yiyu Ni,
Marine A. Denolle,
Jannes Munchmeyer,
Yinzhi Wang,
Kuan-Fu Feng,
Carlos Garcia Jurado Suarez,
Amanda M. Thomas,
Chad Trabant,
Alex Hamilton,
David Mencin
Abstract:
Seismology has entered the petabyte era, driven by decades of continuous recordings from broadband networks, the increase in nodal seismic experiments, and the recent emergence of Distributed Acoustic Sensing (DAS). This review explains how commercial clouds (AWS, Google Cloud, and Azure), by providing object storage, elastic compute, and managed databases, enable researchers to "bring the code to the data," thereby overcoming the bandwidth and capacity limitations of traditional HPC solutions. After reviewing the literature on cloud concepts and their research applications in seismology, we illustrate the capacities of cloud-native workflows using two canonical end-to-end demonstrations: 1) ambient noise seismology and cross-correlation, and 2) earthquake detection, discrimination, and phase picking. Both workflows used S3 for streaming I/O and DocumentDB for provenance, demonstrating that cloud throughput can rival on-premises HPC at comparable costs, scanning 100 TB to 1.3 PB of seismic data in a few hours to days of processing. The review also discusses research and education initiatives, the reproducibility benefits of containers, and cost pitfalls (e.g., egress and I/O fees) of energy-intensive seismological research computing. While designing cloud pipelines remains non-trivial, partnerships with research software engineers enable converting domain code into scalable, automated, and environmentally conscious solutions for next-generation seismology.
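The core operation of the ambient-noise workflow is cross-correlating pairs of continuous records and reading off the lag of the correlation peak. The NumPy sketch below shows only a frequency-domain cross-correlation of two synthetic traces; it omits the spectral whitening, temporal normalization, windowing, and stacking that production pipelines (including, presumably, the one described in the review) apply, and the traces and 25-sample delay are hypothetical.

```python
import numpy as np


def xcorr_fft(a: np.ndarray, b: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Linear cross-correlation c[k] = sum_n a[n + k] * b[n], computed via FFT.

    Returns (lags, c) with lags from -(len(b) - 1) to len(a) - 1, matching
    np.correlate(a, b, mode="full") for real inputs.
    """
    n_full = len(a) + len(b) - 1
    nfft = 1 << (n_full - 1).bit_length()  # next power of two >= n_full
    # Zero-padding to nfft >= n_full prevents circular wrap-around.
    spec = np.fft.rfft(a, nfft) * np.conj(np.fft.rfft(b, nfft))
    circ = np.fft.irfft(spec, nfft)
    # Negative lags occupy the tail of the circular result; move them to the front.
    c = np.concatenate((circ[nfft - (len(b) - 1):], circ[: len(a)]))
    lags = np.arange(-(len(b) - 1), len(a))
    return lags, c


rng = np.random.default_rng(1)
b = rng.standard_normal(1000)                # synthetic noise at "station B"
a = np.concatenate((np.zeros(25), b[:-25]))  # "station A": same signal, 25 samples later
lags, c = xcorr_fft(a, b)
print(lags[np.argmax(c)])  # 25
```

Working in the frequency domain costs O(N log N) per pair instead of O(N²), which matters when the number of station pairs and the length of continuous records are both large.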
Submitted 12 June, 2025;
originally announced June 2025.
-
Noise Consistency Regularization for Improved Subject-Driven Image Synthesis
Authors:
Yao Ni,
Song Wen,
Piotr Koniusz,
Anoop Cherian
Abstract:
Fine-tuning Stable Diffusion enables subject-driven image synthesis by adapting the model to generate images containing specific subjects. However, existing fine-tuning methods suffer from two key issues: underfitting, where the model fails to reliably capture subject identity, and overfitting, where it memorizes the subject image and reduces background diversity. To address these challenges, we propose two auxiliary consistency losses for diffusion fine-tuning. First, a prior consistency regularization loss ensures that the predicted diffusion noise for prior (non-subject) images remains consistent with that of the pretrained model, improving fidelity. Second, a subject consistency regularization loss enhances the fine-tuned model's robustness to latent codes modulated by multiplicative noise, helping to preserve subject identity while improving diversity. Our experimental results demonstrate that incorporating these losses into fine-tuning not only preserves subject identity but also enhances image diversity, outperforming DreamBooth in terms of CLIP scores, background variation, and overall visual quality.
Submitted 6 June, 2025;
originally announced June 2025.
-
VisCoder: Fine-Tuning LLMs for Executable Python Visualization Code Generation
Authors:
Yuansheng Ni,
Ping Nie,
Kai Zou,
Xiang Yue,
Wenhu Chen
Abstract:
Large language models (LLMs) often struggle with visualization tasks such as plotting diagrams and charts, where success depends on both code correctness and visual semantics. Existing instruction-tuning datasets lack execution-grounded supervision and offer limited support for iterative code correction, resulting in fragile and unreliable plot generation. We present VisCode-200K, a large-scale instruction tuning dataset for Python-based visualization and self-correction. It contains over 200K examples from two sources: (1) validated plotting code from open-source repositories, paired with natural language instructions and rendered plots; and (2) 45K multi-turn correction dialogues from Code-Feedback, enabling models to revise faulty code using runtime feedback. We fine-tune Qwen2.5-Coder-Instruct on VisCode-200K to create VisCoder, and evaluate it on PandasPlotBench. VisCoder significantly outperforms strong open-source baselines and approaches the performance of proprietary models like GPT-4o-mini. We further adopt a self-debug evaluation protocol to assess iterative repair, demonstrating the benefits of feedback-driven learning for executable, visually accurate code generation.
Submitted 4 June, 2025;
originally announced June 2025.
-
All-sky search for individual Primordial Black Hole bursts with LHAASO
Authors:
Zhen Cao,
F. Aharonian,
Y. X. Bai,
Y. W. Bao,
D. Bastieri,
X. J. Bi,
Y. J. Bi,
W. Bian,
A. V. Bukevich,
C. M. Cai,
W. Y. Cao,
Zhe Cao,
J. Chang,
J. F. Chang,
A. M. Chen,
E. S. Chen,
G. H. Chen,
H. X. Chen,
Liang Chen,
Long Chen,
M. J. Chen,
M. L. Chen,
Q. H. Chen,
S. Chen,
S. H. Chen
, et al. (293 additional authors not shown)
Abstract:
Primordial Black Holes~(PBHs) are hypothetical black holes with a wide range of masses that formed in the early universe. As a result, they may play an important cosmological role and provide a unique probe of the early universe. A PBH with an initial mass of approximately $10^{15}$~g is expected to explode today in a final burst of Hawking radiation. In this work, we conduct an all-sky search for individual PBH burst events using the data collected from March 2021 to July 2024 by the Water Cherenkov Detector Array of the Large High Altitude Air Shower Observatory (LHAASO). Three PBH burst durations, 10~s, 20~s, and 100~s, are searched, with no significant PBH bursts observed. The upper limit on the local PBH burst rate density is set to be as low as 181~pc$^{-3}$~yr$^{-1}$ at 99$\%$ confidence level, representing the most stringent limit achieved to date.
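To see where a number of this kind comes from: when no burst is detected, a classical Poisson upper limit on the expected count is divided by the effective search volume and live time. The sketch below, assuming zero background and a simple Neyman construction, computes only the count limit (for zero events at 99% CL it equals ln(100) ≈ 4.61); the paper's actual statistical treatment and effective-volume calculation are not given in the abstract, and the `rate_density_ul` helper is hypothetical.

```python
import math


def poisson_cdf(n: int, mu: float) -> float:
    """P(N <= n) for a Poisson random variable with mean mu."""
    return sum(math.exp(-mu) * mu**k / math.factorial(k) for k in range(n + 1))


def poisson_upper_limit(n_obs: int, cl: float = 0.99) -> float:
    """Classical upper limit on a Poisson mean with zero background.

    Returns the mean mu at which observing n_obs or fewer events has
    probability 1 - cl, found by bisection (the CDF decreases in mu).
    """
    alpha = 1.0 - cl
    lo, hi = 0.0, 10.0 + 2.0 * n_obs
    while poisson_cdf(n_obs, hi) > alpha:  # widen the bracket if needed
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if poisson_cdf(n_obs, mid) > alpha:
            lo = mid  # seeing <= n_obs is still too likely: the limit is higher
        else:
            hi = mid
    return 0.5 * (lo + hi)


def rate_density_ul(n_ul: float, v_eff_pc3: float, t_yr: float) -> float:
    """Hypothetical conversion to a rate density in pc^-3 yr^-1."""
    return n_ul / (v_eff_pc3 * t_yr)


n_ul = poisson_upper_limit(0, cl=0.99)
print(f"{n_ul:.2f}")  # 4.61
```

The effective volume depends on the burst duration and detector sensitivity, which is one reason separate limits are quoted for the 10 s, 20 s, and 100 s searches.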
Submitted 2 June, 2025; v1 submitted 30 May, 2025;
originally announced May 2025.
-
Attention-Enhanced Prompt Decision Transformers for UAV-Assisted Communications with AoI
Authors:
Chi Lu,
Yiyang Ni,
Zhe Wang,
Xiaoli Shi,
Jun Li,
Shi Jin
Abstract:
Decision Transformer (DT) has recently demonstrated strong generalizability in dynamic resource allocation within unmanned aerial vehicle (UAV) networks, compared to conventional deep reinforcement learning (DRL). However, its performance is hindered by zero-padding for varying state dimensions, an inability to manage long-term energy constraints, and challenges in acquiring expert samples for few-shot fine-tuning in new scenarios. To overcome these limitations, we propose an attention-enhanced prompt Decision Transformer (APDT) framework to optimize trajectory planning and user scheduling, aiming to minimize the average age of information (AoI) under a long-term energy constraint in UAV-assisted Internet of Things (IoT) networks. Specifically, we enhance the conventional DT framework by incorporating an attention mechanism to accommodate varying numbers of terrestrial users, introducing a prompt mechanism based on short trajectory demonstrations for rapid adaptation to new scenarios, and designing a token-assisted method to address the UAV's long-term energy constraint. The APDT framework is first pre-trained on offline datasets and then efficiently generalized to new scenarios. Simulations demonstrate that APDT converges twice as fast and reduces average AoI by $8\%$ compared to conventional DT.
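The objective being minimized is the average AoI. As a hedged sketch of that metric under a common discrete-time model (assumed here, not taken from the paper): every user's age grows by one each slot and resets to 1 in the slot its update is collected, and the average is taken over users and slots. The schedules in the example are hypothetical.

```python
def average_aoi(schedule: list[int], num_users: int) -> float:
    """Time-averaged AoI over all users for a slot-by-slot schedule.

    schedule[t] is the index of the user whose update is collected in slot t,
    or -1 if none. Assumed model: all ages start at 1 before the first slot,
    grow by 1 per slot, and reset to 1 in the slot an update is delivered.
    """
    ages = [1] * num_users
    total = 0
    for served in schedule:
        for u in range(num_users):
            ages[u] = 1 if u == served else ages[u] + 1
        total += sum(ages)
    return total / (len(schedule) * num_users)


# Round-robin over two users keeps ages low; always serving user 0 starves user 1.
print(average_aoi([0, 1, 0, 1], 2))  # 1.5
print(average_aoi([0, 0, 0, 0], 2))  # 2.25
```

In the UAV setting the scheduler must also trade these AoI gains against flight energy, which is what the long-term energy constraint captures.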
Submitted 28 May, 2025;
originally announced May 2025.
-
Embed Progressive Implicit Preference in Unified Space for Deep Collaborative Filtering
Authors:
Zhongjin Zhang,
Yu Liang,
Cong Fu,
Yuxuan Zhu,
Kun Wang,
Yabo Ni,
Anxiang Zeng,
Jiazhi Xia
Abstract:
Embedding-based collaborative filtering, often coupled with nearest neighbor search, is widely deployed in large-scale recommender systems for personalized content selection. Modern systems leverage multiple implicit feedback signals (e.g., clicks, add to cart, purchases) to model user preferences comprehensively. However, prevailing approaches adopt a feedback-wise modeling paradigm, which (1) fails to capture the structured progression of user engagement across different feedback types and (2) embeds feedback-specific information into disjoint spaces, making representations incommensurable, increasing system complexity, and leading to suboptimal retrieval performance. A promising alternative is Ordinal Logistic Regression (OLR), which explicitly models discrete ordered relations. However, existing OLR-based recommendation models mainly focus on explicit feedback (e.g., movie ratings) and struggle with implicit, correlated feedback, where ordering is vague and non-linear. Moreover, standard OLR lacks flexibility in handling feedback-dependent covariates, resulting in suboptimal performance in real-world systems. To address these limitations, we propose Generalized Neural Ordinal Logistic Regression (GNOLR), which encodes multiple feature-feedback dependencies into a unified, structured embedding space and enforces feedback-specific dependency learning through a nested optimization framework. Thus, GNOLR enhances predictive accuracy, captures the progression of user engagement, and simplifies the retrieval process. We establish a theoretical comparison with existing paradigms, demonstrating how GNOLR avoids disjoint spaces while maintaining effectiveness. Extensive experiments on ten real-world datasets show that GNOLR significantly outperforms state-of-the-art methods in efficiency and adaptability.
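For reference, the *standard* OLR that GNOLR generalizes is a cumulative-logit (proportional-odds) model: P(y ≤ k | x) = σ(θ_k − s(x)) with increasing thresholds θ_k, and each level's probability is the difference of consecutive cumulative values. The sketch below implements only that textbook model, not GNOLR; the four engagement levels (none < click < add to cart < purchase), the thresholds, and the scores are illustrative assumptions.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def olr_probs(score: float, thresholds: list[float]) -> list[float]:
    """Class probabilities under a standard cumulative-logit ordinal model.

    P(y <= k) = sigmoid(thresholds[k] - score) for the first K-1 levels, with
    strictly increasing thresholds; P(y = k) is the difference of consecutive
    cumulative probabilities, so the K outputs sum to 1.
    """
    if any(b <= a for a, b in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    cdf = [sigmoid(t - score) for t in thresholds] + [1.0]
    return [cdf[0]] + [cdf[k] - cdf[k - 1] for k in range(1, len(cdf))]


# Hypothetical levels: 0 = no action, 1 = click, 2 = add to cart, 3 = purchase.
th = [0.0, 1.5, 3.0]
print([round(p, 3) for p in olr_probs(1.0, th)])  # [0.269, 0.354, 0.258, 0.119]
```

A single user-item score thus drives all engagement levels at once, which is the sense in which an ordinal model keeps the feedback types in one shared space instead of separate per-feedback embeddings.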
Submitted 28 May, 2025; v1 submitted 27 May, 2025;
originally announced May 2025.
-
The Properties of Little Red Dot Galaxies in the ASTRID Simulation
Authors:
Patrick LaChance,
Rupert A. C. Croft,
Tiziana Di Matteo,
Yihao Zhou,
Fabio Pacucci,
Yueying Ni,
Nianyi Chen,
Simeon Bird
Abstract:
We present simulated counterparts of the ``Little Red Dot'' (LRD) galaxies observed with JWST, using the large cosmological hydrodynamic simulation, ASTRID. We create mock observations of the galaxies ($5 \leq z \leq 8$) in ASTRID, and find seventeen that fit the color and size criteria of LRDs. These LRDs are galaxies with high stellar masses ($\rm log(M_*/M_{\odot}) \geq 9.7$), and massive black holes ($\rm log(M_{BH}/M_{\odot}) \geq 6.8$). The host galaxies are dense, with stellar half mass radii ($\rm 325\,pc \leq r_{{\rm half},*} \leq 620\,pc$), and dust attenuation in the F444W band above 1.25. Their star formation has been recently quenched. They host relatively bright, dust-obscured AGN that contribute significantly to the rest-frame optical red slope but have relatively low luminosity in the rest-frame ultraviolet, where the host galaxy's stars dominate. These LRDs are in an evolutionary phase of miniquenching that is the result of AGN feedback from their massive black holes. The LRDs in ASTRID are bright, with F444W magnitudes of $23.5-25.5$. The less massive and fainter galaxies in ASTRID lack the dust concentration necessary to produce the red slope of an LRD, though this could be an effect of limited resolution. Most of the black holes with the highest Eddington ratios are not LRDs, because their host galaxies have typical dust levels and relatively high star formation rates accompanying their highly accreting black holes, which makes their spectra too flat.
Submitted 26 May, 2025;
originally announced May 2025.
-
A Global-scale Database of Seismic Phases from Cloud-based Picking at Petabyte Scale
Authors:
Yiyu Ni,
Marine A. Denolle,
Amanda M. Thomas,
Alex Hamilton,
Jannes Münchmeyer,
Yinzhi Wang,
Loïc Bachelot,
Chad Trabant,
David Mencin
Abstract:
We present the first global-scale database of 4.3 billion P- and S-wave picks extracted from 1.3 PB of continuous seismic data via a cloud-native workflow. Using cloud computing services on Amazon Web Services, we launched ~145,000 containerized jobs on continuous records from 47,354 stations spanning 2002-2025, completing in under three days. Phase arrivals were identified with a deep learning model, PhaseNet, through SeisBench, an open-source Python ecosystem for deep learning in seismology. To visualize and gain a global understanding of these picks, we present preliminary analyses of pick time series that reveal Omori-law aftershock decay, seasonal variations linked to noise levels, and dense regional coverage that will enhance earthquake catalogs and machine-learning datasets. We provide all picks in a publicly queryable database, offering a powerful resource for researchers studying seismicity around the world. This report provides insights into the database and the underlying workflow, demonstrating the feasibility of petabyte-scale seismic data mining on the cloud and of providing intelligent data products to the community in an automated manner.
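The "Omori-law aftershock decay" visible in the pick time series refers to the modified Omori (Omori-Utsu) law, n(t) = K / (c + t)^p, for the aftershock rate at time t after a mainshock. The sketch below evaluates the rate and its closed-form integral (the expected count in [0, t]); the parameter values are hypothetical, and nothing here reproduces the paper's fits.

```python
import math


def omori_rate(t: float, k: float, c: float, p: float) -> float:
    """Modified Omori (Omori-Utsu) aftershock rate n(t) = K / (c + t)^p."""
    return k / (c + t) ** p


def omori_count(t: float, k: float, c: float, p: float) -> float:
    """Expected number of aftershocks in [0, t], the integral of omori_rate."""
    if abs(p - 1.0) < 1e-12:
        return k * math.log((c + t) / c)
    return k * (c ** (1.0 - p) - (c + t) ** (1.0 - p)) / (p - 1.0)


# Hypothetical parameters (t in days): K = 100, c = 0.05 day, p = 1.1.
k, c, p = 100.0, 0.05, 1.1
print(f"rate at day 1 / rate at day 10: {omori_rate(1, k, c, p) / omori_rate(10, k, c, p):.1f}")
print(f"expected aftershocks in 30 days: {omori_count(30.0, k, c, p):.0f}")
```

With p close to 1 the rate falls roughly as 1/t, so most aftershocks, and hence most picks, cluster in the first days after a large event.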
Submitted 24 May, 2025;
originally announced May 2025.
-
PhyX: Does Your Model Have the "Wits" for Physical Reasoning?
Authors:
Hui Shen,
Taiqiang Wu,
Qi Han,
Yunta Hsieh,
Jizhou Wang,
Yuyue Zhang,
Yuxin Cheng,
Zijian Hao,
Yuansheng Ni,
Xin Wang,
Zhongwei Wan,
Kai Zhang,
Wendong Xu,
Jing Xiong,
Ping Luo,
Wenhu Chen,
Chaofan Tao,
Zhuoqing Mao,
Ngai Wong
Abstract:
Existing benchmarks fail to capture a crucial aspect of intelligence: physical reasoning, the integrated ability to combine domain knowledge, symbolic reasoning, and understanding of real-world constraints. To address this gap, we introduce PhyX: the first large-scale benchmark designed to assess models' capacity for physics-grounded reasoning in visual scenarios. PhyX includes 3K meticulously curated multimodal questions spanning 6 reasoning types across 25 sub-domains and 6 core physics domains: thermodynamics, electromagnetism, mechanics, modern physics, optics, and wave & acoustics. In our comprehensive evaluation, even state-of-the-art models struggle significantly with physical reasoning. GPT-4o, Claude3.7-Sonnet, and GPT-o4-mini achieve only 32.5%, 42.2%, and 45.8% accuracy, respectively, with performance gaps exceeding 29% compared to human experts. Our analysis exposes critical limitations in current models: over-reliance on memorized disciplinary knowledge, excessive dependence on mathematical formulations, and surface-level visual pattern matching rather than genuine physical understanding. We provide in-depth analysis through fine-grained statistics, detailed case studies, and multiple evaluation paradigms to thoroughly examine physical reasoning capabilities. To ensure reproducibility, we implement a compatible evaluation protocol based on widely-used toolkits such as VLMEvalKit, enabling one-click evaluation. More details are available on our project page: https://phyx-bench.github.io/.
Submitted 29 May, 2025; v1 submitted 21 May, 2025;
originally announced May 2025.
-
First Identification and Precise Spectral Measurement of the Proton Component in the Cosmic-Ray `Knee'
Authors:
The LHAASO Collaboration,
Zhen Cao,
F. Aharonian,
Y. X. Bai,
Y. W. Bao,
D. Bastieri,
X. J. Bi,
Y. J. Bi,
W. Bian,
A. V. Bukevich,
C. M. Cai,
W. Y. Cao,
Zhe Cao,
J. Chang,
J. F. Chang,
A. M. Chen,
E. S. Chen,
G. H. Chen,
H. X. Chen,
Liang Chen,
Long Chen,
M. J. Chen,
M. L. Chen,
Q. H. Chen,
S. Chen
, et al. (292 additional authors not shown)
Abstract:
We report the first high-purity identification of cosmic-ray (CR) protons and a precise measurement of their energy spectrum from 0.15 to 12 PeV using the Large High Altitude Air Shower Observatory (LHAASO). Abundant event statistics, combined with the simultaneous detection of electrons/photons, muons, and Cherenkov light in air showers, enable spectroscopic measurements with statistical and systematic accuracy comparable to satellite data at lower energies. The proton spectrum shows significant hardening relative to low-energy extrapolations, culminating at 3 PeV, followed by sharp softening. This distinct spectral structure, closely aligned with the knee in the all-particle spectrum, points to the emergence of a new CR component at PeV energies, likely linked to the dozens of PeVatrons recently discovered by LHAASO, and offers crucial clues to the origin of Galactic cosmic rays.
Submitted 20 May, 2025;
originally announced May 2025.
-
Parallel Layer Normalization for Universal Approximation
Authors:
Yunhao Ni,
Yuhe Liu,
Wenxin Sun,
Yitong Tang,
Yuxin Guo,
Peilin Feng,
Wenjun Wu,
Lei Huang
Abstract:
The universal approximation theorem (UAT) is a fundamental theory for deep neural networks (DNNs), demonstrating their capacity to approximate any continuous function on a compact domain to arbitrary accuracy. Existing analyses and proofs of the UAT consider traditional networks built only from linear layers and nonlinear activation functions, omitting the normalization layers that are commonly employed to improve the training of modern networks. This paper is the first to study the UAT for DNNs with normalization layers. We theoretically prove that an infinitely wide network, composed solely of parallel layer normalization (PLN) and linear layers, has universal approximation capacity. Additionally, we investigate the minimum number of neurons that a single-hidden-layer network requires to approximate $L$-Lipschitz continuous functions. We also compare the approximation capacity of PLN with that of traditional activation functions theoretically. Unlike traditional activation functions, PLN can act as both an activation function and a normalization layer in deep neural networks at the same time. Finally, we find that replacing LN with PLN in transformer architectures can improve performance, which reveals the potential of PLN in neural architectures.
Submitted 19 May, 2025;
originally announced May 2025.
-
Truncated Gaussian copula principal component analysis with application to pediatric acute lymphoblastic leukemia patients' gut microbiome
Authors:
Lei Wang,
Yang Ni,
Irina Gaynanova
Abstract:
Increasing epidemiologic evidence suggests that the diversity and composition of the gut microbiome can predict infection risk in cancer patients. Infections remain a major cause of morbidity and mortality during chemotherapy. Analyzing microbiome data to identify associations with infection pathogenesis for proactive treatment has become a critical research focus. However, the high-dimensional nature of the data necessitates the use of dimension-reduction methods to facilitate inference and interpretation. Traditional dimension reduction methods, which assume Gaussianity, perform poorly with skewed and zero-inflated microbiome data. To address these challenges, we propose a semiparametric principal component analysis (PCA) method based on a truncated latent Gaussian copula model that accommodates both skewness and zero inflation. Simulation studies demonstrate that the proposed method outperforms existing approaches by providing more accurate estimates of scores and loadings across various copula transformation settings. We apply our method, along with competing approaches, to gut microbiome data from pediatric patients with acute lymphoblastic leukemia. The principal scores derived from the proposed method reveal the strongest associations between pre-chemotherapy microbiome composition and adverse events during subsequent chemotherapy, offering valuable insights for improving patient outcomes.
Submitted 18 May, 2025;
originally announced May 2025.
-
Teach2Eval: An Indirect Evaluation Method for LLM by Judging How It Teaches
Authors:
Yuhang Zhou,
Xutian Chen,
Yixin Cao,
Yuchen Ni,
Yu He,
Siyu Tian,
Xiang Liu,
Jian Zhang,
Chuanjun Ji,
Guangnan Ye,
Xipeng Qiu
Abstract:
Recent progress in large language models (LLMs) has outpaced the development of effective evaluation methods. Traditional benchmarks rely on task-specific metrics and static datasets, which often suffer from fairness issues, limited scalability, and contamination risks. In this paper, we introduce Teach2Eval, an indirect evaluation framework inspired by the Feynman Technique. Instead of directly testing LLMs on predefined tasks, our method evaluates multiple abilities of a model by measuring how effectively it teaches weaker student models to perform tasks. By converting open-ended tasks into standardized multiple-choice questions (MCQs) through teacher-generated feedback, Teach2Eval enables scalable, automated, and multi-dimensional assessment. Our approach not only avoids data leakage and memorization but also captures a broad range of cognitive abilities that are orthogonal to current benchmarks. Experimental results across 26 leading LLMs show strong alignment with existing human and model-based dynamic rankings, while offering additional interpretability for training guidance.
Submitted 18 May, 2025;
originally announced May 2025.
-
A Survey of Real-time Scheduling on Accelerator-based Heterogeneous Architecture for Time Critical Applications
Authors:
An Zou,
Yuankai Xu,
Yinchen Ni,
Jintao Chen,
Yehan Ma,
Jing Li,
Christopher Gill,
Xuan Zhang,
Yier Jin
Abstract:
Accelerator-based heterogeneous architectures, such as CPU-GPU, CPU-TPU, and CPU-FPGA systems, are widely adopted to support computationally intensive artificial intelligence (AI) algorithms. When deployed in real-time applications, such as robotics and autonomous vehicles, these architectures must meet stringent timing constraints. To summarize the progress made on this problem, this article presents a comprehensive survey of real-time scheduling techniques for accelerator-based heterogeneous platforms. It highlights key advancements from the past ten years, showing how proposed solutions have evolved to address the distinct challenges and requirements of these systems.
This survey begins with an overview of the hardware characteristics and common task execution models used in accelerator-based heterogeneous systems. It then categorizes the reviewed works based on soft and hard deadline constraints. For soft real-time approaches, we cover real-time scheduling methods supported by hardware vendors and strategies focusing on timing-critical scheduling, energy efficiency, and thermal-aware scheduling. For hard real-time approaches, we first examine support from processor vendors. We then discuss scheduling techniques that guarantee hard deadlines (with strict response time analysis). After reviewing general soft and hard real-time scheduling methods, we explore application- or scenario-driven real-time scheduling techniques for accelerator-enabled heterogeneous computing platforms. Finally, the article concludes with a discussion of open issues and challenges within this research area.
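Hard real-time guarantees of the kind surveyed here are typically established by response-time analysis. As a minimal sketch of the classical baseline only, the function below iterates the uniprocessor fixed-priority recurrence $R_i = C_i + \sum_{j \in hp(i)} \lceil R_i / T_j \rceil C_j$; it assumes independent, fully preemptive periodic tasks and ignores accelerator-specific effects (non-preemptive kernels, data transfers, shared memory contention) that the surveyed works address. The function name and task format are illustrative assumptions.

```python
import math
from typing import List, Optional, Tuple


def response_time(tasks: List[Tuple[int, int]], i: int, deadline: int) -> Optional[int]:
    """Worst-case response time of task i under fixed-priority preemptive scheduling.

    tasks: (C, T) pairs sorted by priority, index 0 = highest priority,
           where C is the worst-case execution time and T the period.
    Returns None if the response time would exceed the given deadline.
    """
    c_i, _ = tasks[i]
    r = c_i
    while True:
        # Interference from every higher-priority task released within the window r
        interference = sum(math.ceil(r / t_j) * c_j for c_j, t_j in tasks[:i])
        r_next = c_i + interference
        if r_next > deadline:
            return None  # deadline miss: task i is unschedulable
        if r_next == r:
            return r  # fixed point reached
        r = r_next


# Hypothetical task set with implicit deadlines (D = T)
tasks = [(1, 4), (2, 6), (3, 13)]
print([response_time(tasks, i, tasks[i][1]) for i in range(len(tasks))])
```

The recurrence is monotone, so the loop either converges to a fixed point or crosses the deadline; accelerator-aware analyses extend the interference term rather than replacing this structure.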
Submitted 17 May, 2025;
originally announced May 2025.
-
Dependence of the intensity of the nonwave component of EUV waves on coronal magnetic field configuration
Authors:
Yuwei Li,
J. H. Guo,
Y. W. Ni,
Z. Y. Zhang,
P. F. Chen
Abstract:
Context. Mounting evidence has shown that EUV waves consist of a fast-mode magnetohydrodynamic (MHD) wave (or shock wave) followed by a slower nonwave component, as predicted by the magnetic field-line stretching model. However, not all observed events display both wavefronts; the slower nonwave component in particular is often absent. Even when the slower nonwave component is present, its intensity distribution often exhibits strong anisotropy.
Aims. This study aims to uncover the conditions under which the slower nonwave component of EUV waves forms.
Methods. We analyzed the EUV wave event of 8 March 2019 and compared the EUV wave intensity map with the extrapolated coronal potential magnetic field. We also performed a data-inspired MHD simulation.
Results. Two types of EUV waves are identified, and the slower nonwave component exhibits strong anisotropy. By reconstructing the 3D coronal magnetic field, we found that the slower nonwave component of EUV waves is more pronounced in regions where the magnetic field lines are backward-inclined, a result reproduced by our MHD simulations.
Conclusions. The anisotropy of the slower nonwave component of EUV waves is strongly related to the magnetic configuration, with backward-inclined field lines favoring its appearance. The more forward-inclined the field lines, the weaker such wavelike fronts become.
Submitted 16 May, 2025;
originally announced May 2025.
-
Enabling Group Fairness in Graph Unlearning via Bi-level Debiasing
Authors:
Yezi Liu,
Prathyush Poduval,
Wenjun Huang,
Yang Ni,
Hanning Chen,
Mohsen Imani
Abstract:
Graph unlearning is a crucial approach for protecting user privacy by erasing the influence of user data on trained graph models. Recent graph unlearning methods have primarily focused on maintaining model prediction performance while removing user information. However, we observe that when user information is deleted from the model, the prediction distribution across different sensitive groups often changes. Furthermore, graph models have been shown to be prone to amplifying biases, making the study of fairness in graph unlearning particularly important. This raises the question: does graph unlearning actually introduce bias? Our findings indicate that the predictions of post-unlearning models become highly correlated with sensitive attributes, confirming that the graph unlearning process introduces bias. To address this issue, we propose a fair graph unlearning method, FGU. To guarantee privacy, FGU trains shard models on partitioned subgraphs, unlearns the requested data from the corresponding subgraphs, and retrains the shard models on the modified subgraphs. To ensure fairness, FGU employs a bi-level debiasing process: it first enables shard-level fairness by incorporating a fairness regularizer into shard model retraining, and then achieves global-level fairness by aligning all shard models to minimize global disparity. Our experiments demonstrate that FGU achieves superior fairness while maintaining privacy and accuracy. Additionally, FGU is robust to diverse unlearning requests, maintaining fairness and utility across various data distributions.
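The privacy side of FGU follows the shard-then-retrain pattern: a deletion only triggers retraining of the shard that held the data. A minimal sketch of that bookkeeping, assuming each node belongs to exactly one shard and a caller-supplied `train` function; the class and its methods are hypothetical, and FGU's fairness regularizer and global alignment step are deliberately omitted.

```python
from typing import Callable, Dict, Set


class ShardedUnlearner:
    """Hypothetical shard bookkeeping for exact unlearning by partial retraining."""

    def __init__(self, partition: Dict[int, int], train: Callable[[Set[int]], object]):
        self.partition = dict(partition)  # node id -> shard id
        self.train = train
        self.shards: Dict[int, Set[int]] = {}
        for node, shard in self.partition.items():
            self.shards.setdefault(shard, set()).add(node)
        # One model per shard, trained only on that shard's nodes
        self.models = {s: train(nodes) for s, nodes in self.shards.items()}

    def unlearn(self, node: int) -> int:
        """Remove a node and retrain only the shard that contained it."""
        shard = self.partition.pop(node)
        self.shards[shard].discard(node)
        self.models[shard] = self.train(self.shards[shard])
        return shard
```

Because the deleted node never influenced the other shard models, only one retraining is needed per request; in a graph setting the partitioning must also cut edges, which is where the information loss that FGU compensates for arises.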
Submitted 14 May, 2025;
originally announced May 2025.
-
The THESAN-ZOOM project: Star formation efficiency from giant molecular clouds to galactic scale in high-redshift starbursts
Authors:
Zihao Wang,
Xuejian Shen,
Mark Vogelsberger,
Hui Li,
Rahul Kannan,
Ewald Puchwein,
Aaron Smith,
Josh Borrow,
Enrico Garaldi,
Laura Keating,
Oliver Zier,
William McClymont,
Sandro Tacchella,
Yang Ni,
Lars Hernquist
Abstract:
Star formation in galaxies is inherently complex, involving the interplay of physical processes over a hierarchy of spatial scales. In this work, we investigate the connection between global (galaxy-scale) and local (cloud-scale) star formation efficiencies (SFEs) at high redshifts ($z\gtrsim 3$), using the state-of-the-art cosmological zoom-in simulation suite THESAN-ZOOM. We find that the galaxy-scale average SFE, $\langle \epsilon^{\rm gal}_{\rm ff} \rangle$, scales with $M_{\rm halo}^{1/3}\,(1+z)^{1/2} \sim V_{\rm vir}$, consistent with expectations from feedback-regulated models. On cloud scales, we identify giant molecular clouds (GMCs) in a broad sample of high-redshift starbursts spanning a wide range of halo masses and redshifts. Star formation in these systems is predominantly hosted by filamentary GMCs embedded in a dense and highly turbulent interstellar medium (ISM). GMCs exhibit remarkably universal properties, including mass function, size, turbulence, and surface density, regardless of the environment in which they are identified. The global gas depletion time (and the Kennicutt-Schmidt relation) is determined by the GMC mass fraction in the ISM, while the cloud-scale SFE shows little variation. In particular, we find a nearly constant gas surface density of $\Sigma_{\rm GMC} \approx 70\,{\rm M}_{\odot}\,{\rm pc}^{-2}$ across different host galaxies. Nevertheless, we identify two regimes where phases with high SFE can arise. First, stars may form efficiently in the shock fronts generated by feedback from a preceding starburst. Second, the increasing background dark matter surface density with redshift may contribute to the gravitational potential of clouds at $z \gtrsim 8$ and confine them in high-SFE phases over extended periods.
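The equivalence $M_{\rm halo}^{1/3}(1+z)^{1/2} \sim V_{\rm vir}$ follows from standard virial relations. As a sketch, assuming a fixed halo overdensity $\Delta$ relative to the critical density and a matter-dominated background at these redshifts, so that $\rho_{\rm crit}(z) \propto (1+z)^3$:

$$
R_{\rm vir} = \left(\frac{3\,M_{\rm halo}}{4\pi\,\Delta\,\rho_{\rm crit}(z)}\right)^{1/3} \propto M_{\rm halo}^{1/3}\,(1+z)^{-1},
\qquad
V_{\rm vir} = \sqrt{\frac{G\,M_{\rm halo}}{R_{\rm vir}}} \propto \frac{M_{\rm halo}^{1/2}}{M_{\rm halo}^{1/6}\,(1+z)^{-1/2}} = M_{\rm halo}^{1/3}\,(1+z)^{1/2}.
$$

At fixed halo mass, haloes at higher redshift are therefore more compact and have deeper potential wells, which is consistent with the feedback-regulated expectation that the average SFE rises with $V_{\rm vir}$.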
Submitted 8 May, 2025;
originally announced May 2025.
-
Pangu Ultra MoE: How to Train Your Big MoE on Ascend NPUs
Authors:
Yehui Tang,
Yichun Yin,
Yaoyuan Wang,
Hang Zhou,
Yu Pan,
Wei Guo,
Ziyang Zhang,
Miao Rang,
Fangcheng Liu,
Naifu Zhang,
Binghan Li,
Yonghan Dong,
Xiaojun Meng,
Yasheng Wang,
Dong Li,
Yin Li,
Dandan Tu,
Can Chen,
Youliang Yan,
Fisher Yu,
Ruiming Tang,
Yunhe Wang,
Botian Huang,
Bo Wang,
Boxiao Liu
, et al. (49 additional authors not shown)
Abstract:
Sparse large language models (LLMs) with Mixture of Experts (MoE) and close to a trillion parameters dominate the landscape of the most capable language models. However, their massive scale poses significant challenges for the underlying software and hardware systems. In this paper, we aim to uncover a recipe for harnessing such scale on Ascend NPUs. The key goals are to make better use of computing resources under dynamic sparse model structures and to realize the expected performance gains on actual hardware. To select model configurations suitable for Ascend NPUs without repeatedly running expensive experiments, we use simulation to compare the trade-offs among various model hyperparameters. This study led to Pangu Ultra MoE, a sparse LLM with 718 billion parameters, on which we conducted experiments to verify the simulation results. On the system side, we examine Expert Parallelism to optimize communication between NPU devices and reduce synchronization overhead. We also optimize memory efficiency within the devices to further reduce parameter and activation management overhead. In the end, we achieve a model FLOPs utilization (MFU) of 30.0% when training Pangu Ultra MoE on 6K Ascend NPUs, with performance comparable to that of DeepSeek R1, and demonstrate that the Ascend system can support all training stages of state-of-the-art language models. Extensive experiments indicate that our recipe can lead to efficient training of large-scale sparse language models with MoE. We also study the behaviors of such models for future reference.
Submitted 7 May, 2025;
originally announced May 2025.
-
High-redshift Millennium and Astrid galaxies in effective field theory at the field level
Authors:
James M. Sullivan,
Carolina Cuesta-Lazaro,
Mikhail M. Ivanov,
Yueying Ni,
Sownak Bose,
Boryana Hadzhiyska,
César Hernández-Aguayo,
Lars Hernquist,
Rahul Kannan
Abstract:
Effective Field Theory (EFT) modeling is expected to be a useful tool in the era of future higher-redshift galaxy surveys such as DESI-II and Spec-S5 due to its robust description of various large-scale structure tracers. However, large values of the EFT bias parameters of higher-redshift galaxies could jeopardize the convergence of the perturbative expansion. In this paper, we measure the bias parameters and other EFT coefficients from samples of two types of star-forming galaxies in the state-of-the-art MillenniumTNG and Astrid hydrodynamical simulations. Our measurements are based on the field-level EFT forward model, which allows precise EFT parameter measurements by virtue of cosmic variance cancellation. Specifically, we consider approximately representative samples of Lyman-break galaxies (LBGs) and Lyman-alpha emitters (LAEs) that are consistent with the observed (angular) clustering and number density of these galaxies at $z=3$. Reproducing the linear biases and number densities observed in existing LAE and LBG data, we find quadratic bias parameters that are roughly consistent with those predicted by the halo model coupled with a simple halo occupation distribution model. We also find that the non-perturbative velocity contributions (Fingers of God) for LBGs are similar in size to those in the familiar case of Luminous Red Galaxies. However, these contributions are quite small for LAEs despite their large satellite fractions of up to $\sim 30\%$. Our results indicate that the effective momentum reach $k_{\rm{Max}}$ at $z=3$ for LAEs (LBGs) will be in the range $0.3-0.6 ~h\rm{Mpc}^{-1}$ ($0.2-0.8~h\rm{Mpc}^{-1}$), suggesting that EFT will perform well for high-redshift galaxy clustering. This work provides the first step toward obtaining realistic simulation-based priors on EFT parameters for LAEs and LBGs.
Submitted 6 May, 2025;
originally announced May 2025.
-
Retrieval Augmented Learning: A Retrial-based Large Language Model Self-Supervised Learning and Autonomous Knowledge Generation
Authors:
Zongyuan Li,
Pengfei Li,
Runnan Qi,
Yanan Ni,
Lumin Jiang,
Hui Wu,
Xuebo Zhang,
Kuihua Huang,
Xian Guo
Abstract:
The lack of domain-specific data in the pre-training of Large Language Models (LLMs) severely limits LLM-based decision systems in specialized applications, while post-training a model for such scenarios requires significant computational resources. In this paper, we present Retrial-Augmented Learning (RAL), a reward-free self-supervised learning framework for LLMs that operates without model training. By developing Retrieval-Augmented Generation (RAG) into a module for organizing intermediate data, we realize a three-stage autonomous knowledge generation process: proposing a hypothesis, validating the hypothesis, and generating knowledge. The method is evaluated in the LLM-PySC2 environment, a representative decision-making platform that combines sufficient complexity with domain-specific knowledge requirements. Experiments demonstrate that the proposed method effectively reduces hallucination by generating and utilizing validated knowledge, and improves decision-making performance at extremely low cost. Meanwhile, the approach shows promise in out-of-distribution (OOD) tasks, robustness, and transferability, making it a low-cost yet effective solution for decision-making problems and autonomous knowledge generation.
Submitted 2 May, 2025;
originally announced May 2025.
-
Hurdle Network Model With Latent Dynamic Shrinkage For Enhanced Edge Prediction in Zero-Inflated Directed Network Time Series
Authors:
Sandipan Pramanik,
Raymond Robertson,
Yang Ni
Abstract:
This article aims to model international trade relationships among 29 countries in the apparel industry between 1994 and 2013. Bilateral trade flows can be represented as a directed network, where nodes correspond to countries and directed edges indicate trade flows (i.e., whether one country exported to another in a given year). Node-specific (e.g., GDP) and edge-specific (e.g., labor provision) covariates are also available. The study focuses on two key challenges: (1) capturing multiple forms of temporal and network dependence, as well as dependence on covariates; and (2) accounting for potential trade volume as an important but partially observed edge-specific covariate, which is only available for country pairs that engaged in trade.
To address these challenges, we introduce the dynamic hurdle network model (Hurdle-Net) for zero-inflated directed network time series that incorporates several novel features. First, it represents the time series as a paired binary and continuous time series and utilizes a hurdle model that effectively handles sparsity in edge occurrence. Second, the model captures evolving network dependencies using node-specific latent variables governed by a dynamic shrinkage process. Third, it leverages a shared latent structure across the binary and continuous components, reflecting the fact that both networks involve the same nodes. Finally, the model employs a generalized logistic link function to relate edge occurrence to edge weight, allowing for a parsimonious and coherent hierarchical Bayesian framework that jointly models both network components. Compared to static or independent models, Hurdle-Net provides improved model selection, estimation, and prediction performance for analyzing international trade patterns. Its effectiveness is demonstrated through simulation studies and an application to bilateral trade flow data.
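A hurdle model treats each edge as a binary occurrence plus, only when the edge occurs, a positive weight, so zero trade and trade volume enter the likelihood separately. As a minimal sketch of that factorization (not the Hurdle-Net likelihood: it assumes a plain logistic link for occurrence and a log-normal distribution for positive volume, with hypothetical linear predictors `eta` and `mu`):

```python
import math


def hurdle_loglik(y: float, eta: float, mu: float, sigma: float) -> float:
    """Log-likelihood of one edge value y >= 0 under a logistic / log-normal hurdle model.

    eta:   linear predictor for P(edge occurs) through a logistic link (hypothetical)
    mu:    mean of log(y) given that the edge occurs (hypothetical)
    sigma: standard deviation of log(y) given that the edge occurs
    """
    p = 1.0 / (1.0 + math.exp(-eta))  # probability of crossing the hurdle
    if y == 0.0:
        # No trade flow: only the binary component contributes
        return math.log(1.0 - p)
    # Positive flow: occurrence probability times the log-normal density of y
    z = (math.log(y) - mu) / sigma
    log_density = -0.5 * z * z - math.log(y * sigma * math.sqrt(2.0 * math.pi))
    return math.log(p) + log_density
```

Summing this over all country pairs and years gives a joint log-likelihood. Hurdle-Net goes further by sharing latent node variables between `eta` and `mu`, governing them with a dynamic shrinkage process, and linking occurrence to weight through a generalized logistic function, none of which this sketch attempts.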
Submitted 29 April, 2025;
originally announced April 2025.
-
Stitching Inner Product and Euclidean Metrics for Topology-aware Maximum Inner Product Search
Authors:
Tingyang Chen,
Cong Fu,
Xiangyu Ke,
Yunjun Gao,
Yabo Ni,
Anxiang Zeng
Abstract:
Maximum Inner Product Search (MIPS) is a fundamental challenge in machine learning and information retrieval, particularly in high-dimensional data applications. Existing approaches to MIPS either rely solely on Inner Product (IP) similarity, which suffers from local optima and redundant computations, or reduce the MIPS problem to Nearest Neighbor Search under the Euclidean metric via space projection, which leads to topology destruction and information loss. Despite the divergence between the two paradigms, we argue that there is no inherent binary opposition between IP and Euclidean metrics. By stitching IP and Euclidean metrics together in the design of indexing and search algorithms, we can significantly enhance MIPS performance. Specifically, this paper explores the theoretical and empirical connections between these two metrics from the MIPS perspective. Our investigation, grounded in graph-based search, reveals that different indexing and search strategies offer distinct advantages for MIPS, depending on the underlying data topology. Building on these insights, we introduce a novel graph-based index called Metric-Amphibious Graph (MAG) and a corresponding search algorithm, Adaptive Navigation with Metric Switch (ANMS). To facilitate parameter tuning for optimal performance, we identify three statistical indicators that capture essential data topology properties and correlate strongly with the optimal parameter settings. Extensive experiments on 12 real-world datasets demonstrate that MAG outperforms existing state-of-the-art methods, achieving up to 4x search speedup while maintaining adaptability and scalability.
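For context on the space-projection paradigm mentioned above, the sketch below shows the classical augmentation used in earlier MIPS work (it is not MAG or ANMS). Each database vector $x$ receives an extra coordinate $\sqrt{M^2 - \lVert x \rVert^2}$, where $M$ is the largest norm, and each query receives a 0, so that $\lVert q' - x' \rVert^2 = \lVert q \rVert^2 + M^2 - 2\,q \cdot x$ and the Euclidean nearest neighbor of $q'$ is the inner-product maximizer. The helper names are illustrative.

```python
import math
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def augment_database(data: List[Vector]) -> List[Vector]:
    """Append sqrt(M^2 - ||x||^2) so every augmented vector has norm M."""
    max_norm_sq = max(dot(x, x) for x in data)
    return [x + [math.sqrt(max(max_norm_sq - dot(x, x), 0.0))] for x in data]


def augment_query(q: Vector) -> Vector:
    """Queries get a zero in the extra coordinate."""
    return q + [0.0]


def mips_bruteforce(data: List[Vector], q: Vector) -> int:
    """Index of the database vector with the largest inner product with q."""
    return max(range(len(data)), key=lambda i: dot(data[i], q))


def euclidean_nn(data: List[Vector], q: Vector) -> int:
    """Index of the database vector closest to q in Euclidean distance."""
    return min(range(len(data)), key=lambda i: sum((a - b) ** 2 for a, b in zip(data[i], q)))


data = [[1.0, 0.0], [0.0, 3.0], [2.0, 2.0]]
q = [1.0, 0.5]
print(mips_bruteforce(data, q))                               # 2
print(euclidean_nn(data, q))                                  # 0: raw Euclidean search disagrees
print(euclidean_nn(augment_database(data), augment_query(q)))  # 2: augmentation restores the MIPS answer
```

The extra dimension fixes the ranking but changes the geometry that a graph index is built on, which is the kind of topology distortion the abstract attributes to projection-based methods.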
Submitted 21 April, 2025;
originally announced April 2025.
-
Teach Me How to Denoise: A Universal Framework for Denoising Multi-modal Recommender Systems via Guided Calibration
Authors:
Hongji Li,
Hanwen Du,
Youhua Li,
Junchen Fu,
Chunxiao Li,
Ziyi Zhuang,
Jiakang Li,
Yongxin Ni
Abstract:
The surge in multimedia content has led to the development of Multi-Modal Recommender Systems (MMRecs), which use diverse modalities such as text, images, videos, and audio for more personalized recommendations. However, MMRecs struggle with noisy data caused by misalignment among modal content and the gap between modal semantics and recommendation semantics. Traditional denoising methods are inadequate due to the complexity of multi-modal data. To address this, we propose a universal guided in-sync distillation denoising framework for multi-modal recommendation (GUIDER), designed to improve MMRecs by denoising user feedback. Specifically, GUIDER uses a re-calibration strategy to identify clean and noisy interactions from modal content. It incorporates a Denoising Bayesian Personalized Ranking (DBPR) loss function to handle implicit user feedback. Finally, it applies a denoising knowledge distillation objective based on Optimal Transport distance to guide the alignment from modality representations to recommendation semantics. GUIDER can be seamlessly integrated into existing MMRecs methods as a plug-and-play solution. Experimental results on four public datasets demonstrate its effectiveness and generalizability. Our source code is available at https://github.com/Neon-Jing/Guider
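GUIDER's DBPR loss builds on Bayesian Personalized Ranking (BPR). As a reference point, the sketch below implements the standard BPR loss, $-\ln \sigma(s_{ui} - s_{uj})$ averaged over (user, positive item, negative item) triples. The abstract does not specify how DBPR denoises feedback, so the optional per-triple `weights` argument is a hypothetical hook for down-weighting interactions judged noisy, not the paper's formulation.

```python
import math
from typing import Optional, Sequence


def bpr_loss(pos_scores: Sequence[float], neg_scores: Sequence[float],
             weights: Optional[Sequence[float]] = None) -> float:
    """Mean (optionally weighted) BPR loss: -log(sigmoid(s_pos - s_neg)) per triple."""
    if weights is None:
        weights = [1.0] * len(pos_scores)
    total = 0.0
    for s_pos, s_neg, w in zip(pos_scores, neg_scores, weights):
        diff = s_pos - s_neg
        # -log(sigmoid(d)) == log(1 + exp(-d)); branch keeps exp() from overflowing
        if diff >= 0:
            term = math.log1p(math.exp(-diff))
        else:
            term = -diff + math.log1p(math.exp(diff))
        total += w * term
    return total / sum(weights)
```

The loss shrinks as observed items are scored above sampled negatives; a weight of zero removes a triple entirely, which is the simplest way a denoising scheme could exclude interactions it classifies as noisy.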
Submitted 19 April, 2025;
originally announced April 2025.
-
Bayesian Density-Density Regression with Application to Cell-Cell Communications
Authors:
Khai Nguyen,
Yang Ni,
Peter Mueller
Abstract:
We introduce a scalable framework for regressing multivariate distributions onto multivariate distributions, motivated by the application of inferring cell-cell communication from population-scale single-cell data. The observed data consist of pairs of multivariate distributions for ligands from one cell type and corresponding receptors from another. For each ordered pair $e=(l,r)$ of cell types $(l \neq r)$ and each sample $i = 1, \ldots, n$, we observe a pair of distributions $(F_{ei}, G_{ei})$ of gene expressions for ligands and receptors of cell types $l$ and $r$, respectively. The aim is to set up a regression of receptor distributions $G_{ei}$ given ligand distributions $F_{ei}$. A key challenge is that these distributions reside in distinct spaces of differing dimensions. We formulate the regression of multivariate densities on multivariate densities using a generalized Bayes framework with the sliced Wasserstein distance between fitted and observed distributions. Finally, we use inference under such regressions to define a directed graph for cell-cell communications.
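The sliced Wasserstein distance used in the generalized Bayes loss can be estimated by Monte Carlo: project both distributions onto random unit directions, where the one-dimensional Wasserstein distance reduces to matching sorted values. A minimal sketch for two equal-size empirical samples living in the same space (as fitted and observed receptor distributions do); the number of projections, $p = 2$, and the function name are assumptions, not the paper's settings.

```python
import math
import random
from typing import Sequence


def sliced_wasserstein(x: Sequence[Sequence[float]], y: Sequence[Sequence[float]],
                       n_projections: int = 200, p: int = 2, seed: int = 0) -> float:
    """Monte Carlo sliced p-Wasserstein distance between two equal-size point clouds."""
    assert len(x) == len(y) and len(x) > 0
    dim = len(x[0])
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_projections):
        # Direction uniform on the unit sphere: normalize a standard Gaussian vector
        theta = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(t * t for t in theta))
        theta = [t / norm for t in theta]
        # 1-D optimal transport between equal-size samples pairs sorted values
        px = sorted(sum(a * t for a, t in zip(point, theta)) for point in x)
        py = sorted(sum(b * t for b, t in zip(point, theta)) for point in y)
        total += sum(abs(u - v) ** p for u, v in zip(px, py)) / len(px)
    return (total / n_projections) ** (1.0 / p)
```

Each projection costs only a sort, which is what makes the distance attractive as a loss between multivariate distributions. For a pure translation by a vector $s$ in $d$ dimensions with $p = 2$, the estimate approaches $\lVert s \rVert / \sqrt{d}$ as the number of projections grows.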
Submitted 16 April, 2025;
originally announced April 2025.
-
LVLM_CSP: Accelerating Large Vision Language Models via Clustering, Scattering, and Pruning for Reasoning Segmentation
Authors:
Hanning Chen,
Yang Ni,
Wenjun Huang,
Hyunwoo Oh,
Yezi Liu,
Tamoghno Das,
Mohsen Imani
Abstract:
Large Vision Language Models (LVLMs) have been widely adopted to guide vision foundation models in performing reasoning segmentation tasks, achieving impressive performance. However, the substantial computational overhead associated with LVLMs presents a new challenge. The primary source of this computational cost is the processing of hundreds of image tokens. Therefore, an effective strategy to mitigate such overhead is to reduce the number of image tokens, a process known as image token pruning. Previous studies on image token pruning for LVLMs have primarily focused on high-level visual understanding tasks, such as visual question answering and image captioning. In contrast, guiding vision foundation models to generate accurate visual masks based on textual queries demands precise semantic and spatial reasoning capabilities. Consequently, pruning methods must carefully control individual image tokens throughout the LVLM reasoning process. Our empirical analysis reveals that existing methods struggle to balance reductions in computational overhead against the need to maintain high segmentation accuracy. In this work, we propose LVLM_CSP, a novel training-free visual token pruning method specifically designed for LVLM-based reasoning segmentation tasks. LVLM_CSP consists of three stages: clustering, scattering, and pruning. First, the LVLM performs coarse-grained visual reasoning using a subset of selected image tokens. Next, fine-grained reasoning is conducted, and finally, most visual tokens are pruned in the last stage. Extensive experiments demonstrate that LVLM_CSP achieves a 65% reduction in image token inference FLOPs with virtually no accuracy degradation, and a 70% reduction with only a minor 1% drop in accuracy on the 7B LVLM.
Submitted 15 April, 2025;
originally announced April 2025.
-
CROSSAN: Towards Efficient and Effective Adaptation of Multiple Multimodal Foundation Models for Sequential Recommendation
Authors:
Junchen Fu,
Yongxin Ni,
Joemon M. Jose,
Ioannis Arapakis,
Kaiwen Zheng,
Youhua Li,
Xuri Ge
Abstract:
Multimodal Foundation Models (MFMs) excel at representing diverse raw modalities (e.g., text, images, audio, videos, etc.). As recommender systems increasingly incorporate these modalities, leveraging MFMs to generate better representations has great potential. However, their application in sequential recommendation remains largely unexplored. This is primarily because mainstream adaptation methods, such as Fine-Tuning and even Parameter-Efficient Fine-Tuning (PEFT) techniques (e.g., Adapter and LoRA), incur high computational costs, especially when integrating multiple modality encoders, thus hindering research progress. As a result, it remains unclear whether we can efficiently and effectively adapt multiple (>2) MFMs for the sequential recommendation task.
To address this, we propose a plug-and-play Cross-modal Side Adapter Network (CROSSAN). Leveraging the fully decoupled side adapter-based paradigm, CROSSAN achieves high efficiency while enabling cross-modal learning across diverse modalities. To optimize the final stage of multimodal fusion across diverse modalities, we adopt the Mixture of Modality Expert Fusion (MOMEF) mechanism. CROSSAN achieves superior performance on the public datasets for adapting four foundation models with raw modalities. Performance consistently improves as more MFMs are adapted. We will release our code and datasets to facilitate future research.
Submitted 14 April, 2025;
originally announced April 2025.
-
ContrastiveGaussian: High-Fidelity 3D Generation with Contrastive Learning and Gaussian Splatting
Authors:
Junbang Liu,
Enpei Huang,
Dongxing Mao,
Hui Zhang,
Xinyuan Song,
Yongxin Ni
Abstract:
Creating 3D content from single-view images is a challenging problem that has attracted considerable attention in recent years. Current approaches typically utilize score distillation sampling (SDS) from pre-trained 2D diffusion models to generate multi-view 3D representations. Although some methods have made notable progress by balancing generation speed and model quality, their performance is often limited by the visual inconsistencies of the diffusion model outputs. In this work, we propose ContrastiveGaussian, which integrates contrastive learning into the generative process. By using a perceptual loss, we effectively differentiate between positive and negative samples, leveraging the visual inconsistencies to improve 3D generation quality. To further enhance sample differentiation and improve contrastive learning, we incorporate a super-resolution model and introduce a Quantity-Aware Triplet Loss to address varying sample distributions during training. Our experiments demonstrate that our approach achieves superior texture fidelity and improved geometric consistency.
Submitted 10 April, 2025;
originally announced April 2025.
-
Pangu Ultra: Pushing the Limits of Dense Large Language Models on Ascend NPUs
Authors:
Yichun Yin,
Wenyong Huang,
Kaikai Song,
Yehui Tang,
Xueyu Wu,
Wei Guo,
Peng Guo,
Yaoyuan Wang,
Xiaojun Meng,
Yasheng Wang,
Dong Li,
Can Chen,
Dandan Tu,
Yin Li,
Fisher Yu,
Ruiming Tang,
Yunhe Wang,
Baojun Wang,
Bin Wang,
Bo Wang,
Boxiao Liu,
Changzheng Zhang,
Duyu Tang,
Fei Mi,
Hui Jin
, et al. (27 additional authors not shown)
Abstract:
We present Pangu Ultra, a Large Language Model (LLM) with 135 billion parameters and dense Transformer modules trained on Ascend Neural Processing Units (NPUs). Although the field of LLMs has witnessed unprecedented advances in scale and capability in recent years, training such a large-scale model still involves significant optimization and system challenges. To stabilize the training process, we propose depth-scaled sandwich normalization, which effectively eliminates loss spikes during the training of deep models. We pre-train our model on 13.2 trillion diverse and high-quality tokens and further enhance its reasoning capabilities during post-training. To perform such large-scale training efficiently, we utilize 8,192 Ascend NPUs with a series of system optimizations. Evaluations on multiple diverse benchmarks indicate that Pangu Ultra significantly advances the state-of-the-art capabilities of dense LLMs such as Llama 405B and Mistral Large 2, and even achieves competitive results with DeepSeek-R1, whose sparse model structure contains many more parameters. Our exploration demonstrates that Ascend NPUs are capable of efficiently and effectively training dense models with more than 100 billion parameters. Our model and system will be available to our commercial customers.
Submitted 11 April, 2025; v1 submitted 10 April, 2025;
originally announced April 2025.
-
Video-Bench: Human-Aligned Video Generation Benchmark
Authors:
Hui Han,
Siyuan Li,
Jiaqi Chen,
Yiwen Yuan,
Yuling Wu,
Chak Tou Leong,
Hanwen Du,
Junchen Fu,
Youhua Li,
Jie Zhang,
Chi Zhang,
Li-jia Li,
Yongxin Ni
Abstract:
Video generation assessment is essential for ensuring that generative models produce visually realistic, high-quality videos while aligning with human expectations. Current video generation benchmarks fall into two main categories: traditional benchmarks, which use metrics and embeddings to evaluate generated video quality across multiple dimensions but often lack alignment with human judgments; and large language model (LLM)-based benchmarks, which, although capable of human-like reasoning, are constrained by a limited understanding of video quality metrics and cross-modal consistency. To address these challenges and establish a benchmark that better aligns with human preferences, this paper introduces Video-Bench, a comprehensive benchmark featuring a rich prompt suite and extensive evaluation dimensions. This benchmark represents the first attempt to systematically leverage multimodal LLMs (MLLMs) across all dimensions relevant to video generation assessment in generative models. By incorporating few-shot scoring and chain-of-query techniques, Video-Bench provides a structured, scalable approach to generated video evaluation. Experiments on advanced models including Sora demonstrate that Video-Bench achieves superior alignment with human preferences across all dimensions. Moreover, in instances where our framework's assessments diverge from human evaluations, it consistently offers more objective and accurate insights, suggesting an even greater potential advantage over traditional human judgment.
Submitted 29 April, 2025; v1 submitted 7 April, 2025;
originally announced April 2025.
-
Large-scale surveys of the quasar proximity effect
Authors:
Rupert A. C. Croft,
Patrick Shaw,
Ann-Marsha Alexis,
Nianyi Chen,
Yihao Zhou,
Tiziana Di Matteo,
Simeon Bird,
Patrick Lachance,
Yueying Ni
Abstract:
The UV radiation from high-redshift quasars causes a local deficit in the neutral hydrogen absorption (Lyman-alpha forest) in their spectra, known as the proximity effect. Measurements from small samples of tens to hundreds of quasars have been used to constrain the global intensity of the UV background radiation (UVBG), but so far the power of large-scale surveys such as the Sloan Digital Sky Survey and the Dark Energy Spectroscopic Instrument (DESI) survey has not been used to investigate the UV background in more detail. We develop a cold dark matter (CDM)-based halo model of the quasar proximity effect, which accounts by construction for the fact that quasars reside in overdense regions. We test this model on quasar Lyman-alpha spectra from the ASTRID cosmological hydrodynamic simulation, which includes self-consistent formation of quasar black holes and the intergalactic medium surrounding them. Fitting the model to individual quasar spectra, we constrain two parameters: r_eq (the radius at which the local quasar radiation intensity equals the background) and the quasar bias b_q (related to host halo mass). We find that r_eq can be recovered in an unbiased fashion with a statistical uncertainty of 25-50% from a single quasar spectrum. Applying such fitting to samples of millions of spectra from, e.g., DESI would allow the UVBG intensity and its evolution with redshift to be measured with high precision. We use another, larger-scale, lower-resolution simulation (Uchuu) to test how such a large sample of proximity effect measurements could be used to probe the spatial fluctuations in the intergalactic radiation field. We find that the large-scale structure of the UV radiation intensity could be mapped and its power spectrum measured on 100-1000 Mpc/h scales. This could allow the large-scale radiation field to join the density field as a dataset for constraining cosmology and the sources of radiation.
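For intuition about the parameter r_eq, the classical textbook proximity-effect scaling can be sketched as follows. It assumes the quasar's ionizing intensity falls off as the inverse square of distance, and that the gas is optically thin and in photoionization equilibrium, so the Lyman-alpha optical depth scales inversely with the total photoionization rate. This ignores the quasar host overdensity that the halo model above is designed to capture; the function names are hypothetical.

```python
import math


def proximity_ratio(r, r_eq):
    """omega(r) = J_quasar / J_background, assuming inverse-square dilution of quasar light.

    By definition omega(r_eq) = 1; r and r_eq must share the same units.
    """
    return (r_eq / r) ** 2


def proximity_transmission(tau_background, r, r_eq):
    """Transmitted Lyman-alpha flux at distance r from the quasar.

    In photoionization equilibrium the neutral fraction scales as 1 / Gamma_total,
    so the background optical depth is reduced by the factor 1 + omega(r).
    """
    tau = tau_background / (1.0 + proximity_ratio(r, r_eq))
    return math.exp(-tau)


# At r = r_eq the optical depth is halved; far from the quasar it tends to tau_background.
print(proximity_transmission(1.0, 10.0, 10.0))
```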
Submitted 4 April, 2025;
originally announced April 2025.
-
Gravitational Waves from Massive Black Hole Mergers in ASTRID: Predictions for LISA
Authors:
Bonny Y. Wang,
Yihao Zhou,
William Chen,
Nianyi Chen,
Tiziana Di Matteo,
Rupert Croft,
Simeon Bird,
Yueying Ni
Abstract:
We use the ASTRID cosmological simulation to forecast massive black hole (MBH) mergers detectable by LISA down to $z=0$. ASTRID directly models MBH dynamical friction, allowing realistic tracking of their trajectories. It also incorporates relatively low-mass MBH seeds down to $5\times 10^{4}\mathrm{M}_{\odot}$, providing a more complete picture of LISA MBH mergers. We find that LISA MBH mergers initially have high eccentricities, peaking around $e_0 = 0.8$ across all redshifts. Accounting for this boosts the event rate from 5.6/yr (if circular orbits are assumed) to 10.5/yr. This enhancement is largely due to additional inspiral sources that will coalesce only after LISA's observation, which constitute 46% of detected events. This underscores the importance of LISA's sensitivity to the early inspiral phase, especially for eccentric binaries that emit gravitational waves across a wider frequency band. Most LISA events in ASTRID arise from $M_{\mathrm{BH}} \sim 10^{5-6}~\mathrm{M}_{\odot}$, low-redshift ($z<2$), and low mass-ratio ($q\sim 0.01-0.1$) mergers. Accounting for eccentricity broadens the detectable MBH mass range up to $10^9~\mathrm{M}_{\odot}$ and shifts the peak of detectable mergers to a lower redshift, $z_{\rm peak} = 0.8$. This implies that the most massive LISA events may also be PTA sources. We predict that LISA events occur in various galaxy environments, including many low-mass satellite galaxies. The EM counterparts of most LISA sources have AGN luminosities $L_{\rm bol} > 10^{42}$ erg/s, although only $1\%$ exceed $10^{44}$ erg/s. The brightest AGN are those associated with the rare LISA/PTA events with $M_{\rm BH} > 10^{8}~\mathrm{M}_{\odot}$.
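The eccentricity effect described above can be made more concrete with the standard Peters and Mathews (1963) enhancement factor f(e). This factor gives the gravitational-wave luminosity of an eccentric binary relative to a circular one with the same semi-major axis. It is a textbook relation added here for illustration, not a quantity reported by the ASTRID analysis, and the function name is hypothetical.

```python
def peters_enhancement(e):
    """Peters & Mathews (1963) enhancement factor f(e).

    Ratio of the orbit-averaged GW luminosity of a binary with eccentricity e
    to that of a circular binary with the same semi-major axis.
    """
    if not 0.0 <= e < 1.0:
        raise ValueError("eccentricity must satisfy 0 <= e < 1")
    numerator = 1.0 + (73.0 / 24.0) * e**2 + (37.0 / 96.0) * e**4
    return numerator / (1.0 - e**2) ** 3.5


for e in (0.0, 0.5, 0.8):
    print(e, peters_enhancement(e))
```

At $e = 0.8$, close to the eccentricity peak reported above, f(e) is roughly 111. Much of this extra power is radiated in higher harmonics of the orbital frequency, which is consistent with the wider frequency band mentioned in the abstract.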
Submitted 26 April, 2025; v1 submitted 31 March, 2025;
originally announced March 2025.
-
An Empirical Study of Rust-Specific Bugs in the rustc Compiler
Authors:
Zixi Liu,
Yang Feng,
Yunbo Ni,
Shaohua Li,
Xizhe Yin,
Qingkai Shi,
Baowen Xu,
Zhendong Su
Abstract:
Rust is gaining popularity for its well-known memory safety guarantees and high performance, distinguishing it from C/C++ and JVM-based languages. Its compiler, rustc, enforces these guarantees through specialized mechanisms such as trait solving, borrow checking, and specific optimizations. However, Rust's unique language mechanisms introduce complexity to its compiler, leading to Rust-specific compiler bugs that are less common in traditional compilers. With Rust's increasing adoption in safety-critical domains, understanding these language mechanisms and their impact on compiler bugs is essential for improving the reliability of both rustc and Rust programs. Yet, we still lack a large-scale, detailed, and in-depth study of Rust-specific bugs in rustc.
To bridge this gap, this work conducts a comprehensive and systematic study of Rust-specific bugs in rustc, with a particular focus on the components that support its unique language features. Our analysis examines issues and fixes reported between 2022 and 2024, with a manual review of 301 valid issues. We categorize these bugs based on their causes, symptoms, affected compilation stages, and test case characteristics. Additionally, we evaluate existing rustc testing tools to assess their effectiveness and limitations. Our key findings include: (1) rustc bugs primarily arise from Rust's type system and lifetime model, with frequent errors in the High-Level Intermediate Representation (HIR) and Mid-Level Intermediate Representation (MIR) modules due to complex checkers and optimizations; (2) bug-revealing test cases often involve unstable features, advanced trait usages, lifetime annotations, standard APIs, and specific optimization levels; (3) while both valid and invalid programs can trigger bugs, existing testing tools struggle to detect non-crash errors, underscoring the need for further advancements in rustc testing.
Submitted 31 March, 2025;
originally announced March 2025.
-
Infant Core-collapse Supernovae with Circumstellar Interactions from KMTNet I: Luminous Transitional Case of KSP-SN-2022c
Authors:
Nan Jiang,
Dae-Sik Moon,
Yuan Qi Ni,
Maria R. Drout,
Hong Soo Park,
Santiago González-Gaitán,
Sang Chul Kim,
Youngdae Lee,
Ernest Chang
Abstract:
We present $BVi$ multi-band high-cadence observations of the Type II supernova (SN) KSP-SN-2022c in a star-forming galaxy at $z \simeq 0.041$, from its infant to its nebular phase. Fitting the early light curve with a single power law is consistent with a first detection roughly 15 minutes after shock breakout. The SN light curves feature a rapid rise and decline around a luminous peak ($V \simeq -18.41$ mag), together with a short plateau. The short plateau and rapid post-peak decline place the SN within a small group of transitional events between the Type II-P and II-L subtypes. Both (i) its broad, asymmetric H profiles with large emission-to-absorption ratios and (ii) its near-peak luminosity, which exceeds the predictions of SN shock-cooling models, point to circumstellar interaction in this SN. The early colour evolution exhibits a short-lived blueward motion in $B-V$ within the first few days and continuous reddening in $V-i$, inconsistent with simple blackbody heating. Our simulations of the SN light curves estimate a progenitor mass of 13 $M_\odot$ and radius of 680 $R_\odot$, together with 0.73 $M_\odot$ of circumstellar material (CSM), to account for the excess luminosity and rapid post-peak decline. We discuss the origins of its short plateau and early colour evolution in the context of partial envelope stripping of the progenitor star and a delayed SN shock breakout near the edge of the CSM, respectively, as indicated by our simulations. We establish a correlation between post-peak decline rates and CSM mass in Type II SNe, highlighting that CSM interactions play a major role in shaping the post-peak evolution of transitional types.
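A single power-law fit to the early rise, F(t) = A (t - t0)^n for t > t0, can be sketched as follows. For each trial onset epoch t0 on a grid, a linear least-squares fit is done in log-log space, and the t0 with the smallest residual is kept. This is a minimal, unweighted sketch that assumes positive fluxes, no photometric uncertainties, and a user-supplied grid. The function name is hypothetical, and the paper's actual fitting procedure is not described here.

```python
import math


def fit_power_law_rise(times, fluxes, t0_grid):
    """Fit F(t) = A * (t - t0)**n by scanning t0 and doing log-log least squares.

    Returns (t0, A, n, sse) for the grid value of t0 with the smallest summed
    squared residual in log-flux, or None if no grid value precedes the data.
    All fluxes must be positive.
    """
    ys = [math.log(f) for f in fluxes]
    y_mean = sum(ys) / len(ys)
    best = None
    for t0 in t0_grid:
        if t0 >= min(times):
            continue  # the onset epoch must precede every data point
        xs = [math.log(t - t0) for t in times]
        x_mean = sum(xs) / len(xs)
        sxx = sum((x - x_mean) ** 2 for x in xs)
        if sxx == 0:
            continue
        # Ordinary least-squares slope (power-law index n) and intercept (log A).
        n = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / sxx
        log_a = y_mean - n * x_mean
        sse = sum((y - (log_a + n * x)) ** 2 for x, y in zip(xs, ys))
        if best is None or sse < best[3]:
            best = (t0, math.exp(log_a), n, sse)
    return best


# Synthetic example: A = 2, n = 2, onset at t0 = 0.5.
times = [1.0, 2.0, 3.0, 4.0, 5.0]
fluxes = [2.0 * (t - 0.5) ** 2 for t in times]
print(fit_power_law_rise(times, fluxes, [i * 0.1 for i in range(-10, 10)]))
```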
Submitted 29 March, 2025;
originally announced March 2025.
-
Cosmos-Reason1: From Physical Common Sense To Embodied Reasoning
Authors:
NVIDIA,
Alisson Azzolini,
Junjie Bai,
Hannah Brandon,
Jiaxin Cao,
Prithvijit Chattopadhyay,
Huayu Chen,
Jinju Chu,
Yin Cui,
Jenna Diamond,
Yifan Ding,
Liang Feng,
Francesco Ferroni,
Rama Govindaraju,
Jinwei Gu,
Siddharth Gururani,
Imad El Hanafi,
Zekun Hao,
Jacob Huffman,
Jingyi Jin,
Brendan Johnson,
Rizwan Khan,
George Kurian,
Elena Lantz
, et al. (29 additional authors not shown)
Abstract:
Physical AI systems need to perceive, understand, and perform complex actions in the physical world. In this paper, we present the Cosmos-Reason1 models that can understand the physical world and generate appropriate embodied decisions (e.g., next step action) in natural language through long chain-of-thought reasoning processes. We begin by defining key capabilities for Physical AI reasoning, with a focus on physical common sense and embodied reasoning. To represent physical common sense, we use a hierarchical ontology that captures fundamental knowledge about space, time, and physics. For embodied reasoning, we rely on a two-dimensional ontology that generalizes across different physical embodiments. Building on these capabilities, we develop two multimodal large language models, Cosmos-Reason1-7B and Cosmos-Reason1-56B. We curate data and train our models in two stages: Physical AI supervised fine-tuning (SFT) and Physical AI reinforcement learning (RL). To evaluate our models, we build comprehensive benchmarks for physical common sense and embodied reasoning according to our ontologies. Evaluation results show that Physical AI SFT and RL bring significant improvements. To facilitate the development of Physical AI, we make our code and pre-trained models available under the NVIDIA Open Model License at https://github.com/nvidia-cosmos/cosmos-reason1.
Submitted 19 May, 2025; v1 submitted 18 March, 2025;
originally announced March 2025.
-
A High-Speed Time-Optimal Trajectory Generation Strategy via a Two-layer Planning Model
Authors:
Haotian Tan,
Yuan-Hua Ni
Abstract:
Motion planning and trajectory generation are crucial technologies in various domains, including the control of unmanned aerial vehicles, manipulators, and rockets. However, optimization-based real-time motion planning becomes increasingly challenging because of the problem's potential non-convexity and the inherent limitations of nonlinear programming algorithms. Highly nonlinear dynamics, obstacle-avoidance constraints, and non-convex inputs can exacerbate these difficulties. To enhance robustness and reduce the computational burden, this paper proposes a two-layer trajectory generation algorithm for intelligent ground vehicles based on convex optimization, aiming to provide real-time guarantees for trajectory optimization and to improve the computation speed of motion prediction. Our approach breaks the original problem down into small horizon-based subproblems with fixed final times, referred to as planning cycles. Each planning cycle is then solved incrementally within a series of restricted convex sets constructed by customized search algorithms. We rigorously establish these advantages through mathematical analysis under moderate assumptions and comprehensive experimental validation. For linear vehicle models, comparative experiments with general sequential convex programming algorithms demonstrate the superior performance of our method, particularly its computational efficiency in dynamic maps and the reduced final time.
Submitted 6 April, 2025; v1 submitted 14 March, 2025;
originally announced March 2025.
-
Safety Control of Impulsive Systems with Control Barrier Functions and Adaptive Gains
Authors:
Zihan Liu,
Yuan-Hua Ni
Abstract:
This paper addresses the safety challenges in impulsive systems, where abrupt state jumps introduce significant complexities into system dynamics. A unified framework is proposed by integrating Quadratic Programming (QP), Control Barrier Functions (CBFs), and adaptive gain mechanisms to ensure system safety during impulsive events. The CBFs are constructed to enforce safety constraints by capturing both the system's continuous dynamics and the effects of impulsive state transitions. An adaptive gain mechanism dynamically adjusts control inputs based on the magnitudes of the impulses and the system's proximity to safety boundaries, maintaining safety during instantaneous state jumps. A tailored QP formulation incorporates CBF constraints and adaptive gain adjustments, optimizing control inputs while ensuring compliance with safety-critical requirements. Theoretical analysis establishes the boundedness, continuity, and feasibility of the adaptive gain and the overall framework. The effectiveness of the method is demonstrated through simulations on a robotic manipulator, showcasing its practical applicability to impulsive systems with state jumps.
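The basic CBF-QP idea can be illustrated on a much simpler, non-impulsive case. The sketch below assumes a single-integrator system dx/dt = u, a circular safe set h(x) = radius² - ||x||² ≥ 0, and a fixed gain alpha. With only one affine constraint, the QP reduces to a projection onto a half-space, so no solver is needed. It does not model state jumps or the adaptive gains described above, and all names are hypothetical.

```python
def cbf_safety_filter(x, u_ref, radius, alpha):
    """Minimally modify u_ref so a single integrator (dx/dt = u) stays in ||x|| <= radius.

    Barrier: h(x) = radius**2 - ||x||**2, safe set {h >= 0}.
    CBF condition: dh/dt + alpha * h >= 0, i.e. a . u >= b with a = -2x, b = -alpha * h.
    The QP  min ||u - u_ref||^2  s.t.  a . u >= b  is solved in closed form by
    projecting u_ref onto the half-space (no numerical QP solver is used).
    """
    h = radius**2 - sum(xi * xi for xi in x)
    a = [-2.0 * xi for xi in x]
    b = -alpha * h
    slack = sum(ai * ui for ai, ui in zip(a, u_ref)) - b
    a_norm_sq = sum(ai * ai for ai in a)
    if slack >= 0 or a_norm_sq == 0:
        return list(u_ref)  # the nominal input already satisfies the CBF condition
    step = -slack / a_norm_sq  # smallest correction along a that makes a . u == b
    return [ui + step * ai for ui, ai in zip(u_ref, a)]


# Near the boundary, an outward nominal input is scaled back; an inward one passes through.
print(cbf_safety_filter([0.9, 0.0], [1.0, 0.0], radius=1.0, alpha=1.0))
print(cbf_safety_filter([0.9, 0.0], [-1.0, 0.0], radius=1.0, alpha=1.0))
```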
Submitted 9 April, 2025; v1 submitted 13 March, 2025;
originally announced March 2025.
-
Spiritus: An AI-Assisted Tool for Creating 2D Characters and Animations
Authors:
Qirui Sun,
Yunyi Ni,
Teli Yuan,
Jingjing Zhang,
Fan Yang,
Zhihao Yao,
Haipeng Mi
Abstract:
This research presents Spiritus, an AI-assisted tool designed to streamline the creation of 2D character animations while enhancing creative flexibility. By integrating natural language processing and diffusion models, Spiritus lets users efficiently transform natural language descriptions into personalized 2D characters and animations. The system employs automated segmentation, layered costume techniques, and dynamic mesh-skeleton binding to support flexible adaptation of complex costumes and additional components. Spiritus further achieves real-time animation generation and efficient reuse of animation resources between characters by integrating BVH data and motion diffusion models. Experimental results demonstrate Spiritus's effectiveness in reducing technical barriers, enhancing creative freedom, and supporting resource universality. Future work will focus on optimizing the user experience and further exploring the system's potential for human-computer collaboration.
Submitted 12 March, 2025;
originally announced March 2025.
-
Maximum Inner Product is Query-Scaled Nearest Neighbor
Authors:
Tingyang Chen,
Cong Fu,
Kun Wang,
Xiangyu Ke,
Yunjun Gao,
Wenchao Zhou,
Yabo Ni,
Anxiang Zeng
Abstract:
Maximum Inner Product Search (MIPS) for high-dimensional vectors is pivotal across databases, information retrieval, and artificial intelligence. Existing methods either reduce MIPS to Nearest Neighbor Search (NNS) while suffering from harmful vector space transformations, or attempt to tackle MIPS directly but struggle to mitigate redundant computations due to the absence of the triangle inequality. This paper presents a novel theoretical framework that equates MIPS with NNS without requiring space transformation, thereby allowing us to leverage advanced graph-based indices for NNS and efficient edge pruning strategies, significantly reducing unnecessary computations. Building on the strong baseline established by our theoretical analysis, we identify and address two persistent challenges to further refine our method: we introduce the Proximity Graph with Spherical Pathway (PSP), designed to mitigate the issue of MIPS solutions clustering around large-norm vectors, and Adaptive Early Termination (AET), which efficiently curtails excessive exploration once an accuracy bottleneck is reached. Extensive experiments reveal the superiority of our method over existing state-of-the-art techniques in search efficiency, scalability, and practical applicability. Compared with state-of-the-art graph-based methods, it achieves an average 35% speed-up in query processing and a 3x reduction in index size. Notably, our approach has been validated and deployed in the search engines of Shopee, a well-known online shopping platform. Our code and an industrial-scale dataset for offline evaluation will also be released to address the absence of e-commerce data in public benchmarks.
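The general identity behind the title can be checked with brute force. For a query q, a data vector x, and a scale λ > 0, ||λq - x||² = λ²||q||² - 2λ<q, x> + ||x||². The first term is the same for every x, so the nearest neighbour of λq maximizes <q, x> - ||x||²/(2λ), which for a large enough λ coincides with the maximum inner product (assuming a unique maximizer). The code below only verifies this on toy data. It is not the paper's graph index, PSP, or AET, and its function names are hypothetical.

```python
def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mips_brute_force(query, data):
    """Index of the data vector with the largest inner product with the query."""
    return max(range(len(data)), key=lambda i: inner_product(query, data[i]))


def scaled_query_nn(query, data, scale):
    """Index of the Euclidean nearest neighbour of scale * query."""
    q = [scale * x for x in query]
    return min(range(len(data)), key=lambda i: squared_distance(q, data[i]))


data = [[1.0, 0.0], [3.0, 3.0], [0.0, 2.0]]
query = [1.0, 1.0]
print(mips_brute_force(query, data))        # inner products are 1, 6, 2
print(scaled_query_nn(query, data, 1.0))    # the unscaled NN differs from the MIPS answer
print(scaled_query_nn(query, data, 100.0))  # a large scale recovers the MIPS answer
```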
Submitted 9 March, 2025;
originally announced March 2025.
-
DependEval: Benchmarking LLMs for Repository Dependency Understanding
Authors:
Junjia Du,
Yadi Liu,
Hongcheng Guo,
Jiawei Wang,
Haojian Huang,
Yunyi Ni,
Zhoujun Li
Abstract:
While large language models (LLMs) have shown considerable promise in code generation, real-world software development demands advanced repository-level reasoning. This includes understanding dependencies and project structures and managing multi-file changes. However, the ability of LLMs to effectively comprehend and handle complex code repositories has yet to be fully explored. To address these challenges, we introduce DependEval, a hierarchical benchmark designed to evaluate repository dependency understanding. The benchmark is based on 15,576 repositories collected from real-world websites. It evaluates models on three core tasks (Dependency Recognition, Repository Construction, and Multi-file Editing) across 8 programming languages from actual code repositories. Our evaluation of over 25 LLMs reveals substantial performance gaps and provides valuable insights into repository-level code understanding.
Submitted 9 March, 2025;
originally announced March 2025.
-
PacketCLIP: Multi-Modal Embedding of Network Traffic and Language for Cybersecurity Reasoning
Authors:
Ryozo Masukawa,
Sanggeon Yun,
Sungheon Jeong,
Wenjun Huang,
Yang Ni,
Ian Bryant,
Nathaniel D. Bastian,
Mohsen Imani
Abstract:
Traffic classification is vital for cybersecurity, yet encrypted traffic poses significant challenges. We present PacketCLIP, a multi-modal framework combining packet data with natural language semantics through contrastive pretraining and hierarchical Graph Neural Network (GNN) reasoning. PacketCLIP integrates semantic reasoning with efficient classification, enabling robust detection of anomalies in encrypted network flows. By aligning textual descriptions with packet behaviors, it offers enhanced interpretability, scalability, and practical applicability across diverse security scenarios. PacketCLIP achieves a 95% mean AUC, outperforms baselines by 11.6%, and reduces model size by 92%, making it ideal for real-time anomaly detection. By bridging advanced machine learning techniques and practical cybersecurity needs, PacketCLIP provides a foundation for scalable, efficient, and interpretable solutions to tackle encrypted traffic classification and network intrusion detection challenges in resource-constrained environments.
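PacketCLIP's exact objective is not given here. Assuming it follows the CLIP-style symmetric contrastive (InfoNCE) loss that its name suggests, the pretraining signal can be sketched as follows. Matched packet/text pairs share an index, embeddings are L2-normalized, and the temperature is a hypothetical default. The hierarchical GNN encoder is omitted, and all names are hypothetical.

```python
import math


def _normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def clip_style_loss(packet_embs, text_embs, temperature=0.07):
    """Symmetric InfoNCE loss: the i-th packet embedding should match the i-th text embedding."""
    p = [_normalize(v) for v in packet_embs]
    t = [_normalize(v) for v in text_embs]
    n = len(p)
    # Cosine-similarity logits, scaled by the temperature.
    logits = [[sum(a * b for a, b in zip(p[i], t[j])) / temperature for j in range(n)]
              for i in range(n)]

    def mean_cross_entropy(matrix):
        total = 0.0
        for i, row in enumerate(matrix):
            m = max(row)  # subtract the max for a numerically stable log-sum-exp
            log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
            total += log_sum_exp - row[i]  # the correct class is the diagonal entry
        return total / len(matrix)

    transposed = [list(col) for col in zip(*logits)]
    # Average the packet-to-text and text-to-packet directions.
    return 0.5 * (mean_cross_entropy(logits) + mean_cross_entropy(transposed))


packets = [[1.0, 0.0], [0.0, 1.0]]
print(clip_style_loss(packets, [[1.0, 0.0], [0.0, 1.0]]))  # aligned pairs: near 0
print(clip_style_loss(packets, [[0.0, 1.0], [1.0, 0.0]]))  # swapped pairs: large
```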
Submitted 5 March, 2025;
originally announced March 2025.
-
Straight-Line Diffusion Model for Efficient 3D Molecular Generation
Authors:
Yuyan Ni,
Shikun Feng,
Haohan Chi,
Bowen Zheng,
Huan-ang Gao,
Wei-Ying Ma,
Zhi-Ming Ma,
Yanyan Lan
Abstract:
Diffusion-based models have shown great promise in molecular generation but often require a large number of sampling steps to generate valid samples. In this paper, we introduce a novel Straight-Line Diffusion Model (SLDM) to tackle this problem, by formulating the diffusion process to follow a linear trajectory. The proposed process aligns well with the noise sensitivity characteristic of molecular structures and uniformly distributes reconstruction effort across the generative process, thus enhancing learning efficiency and efficacy. Consequently, SLDM achieves state-of-the-art performance on 3D molecule generation benchmarks, delivering a 100-fold improvement in sampling efficiency.
Submitted 9 June, 2025; v1 submitted 4 March, 2025;
originally announced March 2025.
-
The Birth of a Major Coronal Mass Ejection with Intricate Magnetic Structure from Multiple Active Regions
Authors:
Jinhan Guo,
Y. W. Ni,
B. Schmieder,
Y. Guo,
C. Xia,
P. Devi,
R. Chandra,
S. Poedts,
R. Joshi,
Y. H. Zhou,
H. T. Li,
P. F. Chen
Abstract:
Coronal mass ejections (CMEs) are eruptions of magnetised plasma from the Sun and are considered the main driver of adverse space weather events. Hence, understanding their formation process, particularly their magnetic topology, is critical for accurate space weather prediction. Here, based on imaging observations and a three-dimensional (3D) data-constrained thermodynamic magnetohydrodynamic (MHD) simulation in spherical coordinates, we show the birth of a CME with an intricate magnetic structure from multiple active regions (ARs) due to 3D magnetic reconnection. This reconnection is observed as a coronal jet between active regions, accompanied by the backflow of filament material along the jet spine after the passage of the eruptive filament. The jet connects two dimming regions within different active regions and serves as an observational proxy of 3D magnetic reconnection between the CME flux rope and the null-point magnetic field lines crossing active regions. Subsequently, the thermodynamic data-constrained MHD simulation successfully reproduces the observed jet and the reconnection process in which the flux ropes take part, leading to a CME flux rope with a complex magnetic structure distinct from its progenitor. The generality of this scenario is then validated by data-inspired MHD simulations in a simple multipolar magnetic configuration. This work demonstrates the role of multiple active regions in forming CMEs with intricate magnetic structures. On the one hand, a non-coherent flux rope, in which not all twisted magnetic field lines wind around one common axis, forms naturally. On the other hand, our findings suggest that the topology of a real CME flux rope may not be determined solely by a single active region, particularly during periods of solar maximum.
Submitted 25 February, 2025;
originally announced February 2025.
-
Ultra-high-energy $γ$-ray emission associated with the tail of a bow-shock pulsar wind nebula
Authors:
Zhen Cao,
F. Aharonian,
Y. X. Bai,
Y. W. Bao,
D. Bastieri,
X. J. Bi,
Y. J. Bi,
W. Bian,
A. V. Bukevich,
C. M. Cai,
W. Y. Cao,
Zhe Cao,
J. Chang,
J. F. Chang,
A. M. Chen,
E. S. Chen,
H. X. Chen,
Liang Chen,
Long Chen,
M. J. Chen,
M. L. Chen,
Q. H. Chen,
S. Chen,
S. H. Chen,
S. Z. Chen
, et al. (274 additional authors not shown)
Abstract:
In this study, we present a comprehensive analysis of an unidentified point-like ultra-high-energy (UHE) $γ$-ray source, designated 1LHAASO J1740+0948u, situated in the vicinity of the middle-aged pulsar PSR J1740+1000. The detection significance reached 17.1$σ$ (9.4$σ$) above 25$\,$TeV (100$\,$TeV). The source energy spectrum extended up to 300$\,$TeV and was well fitted by a log-parabola function with $N_0 = (1.93\pm0.23) \times 10^{-16}\,\rm{TeV^{-1}\,cm^{-2}\,s^{-1}}$, $α = 2.14\pm0.27$, and $β = 1.20\pm0.41$ at $E_0 = 30\,$TeV. The associated pulsar, PSR J1740+1000, resides at a high Galactic latitude and powers a bow-shock pulsar wind nebula (BSPWN) with an extended X-ray tail. The best-fit position of the $γ$-ray source appeared to be shifted by $0.2^{\circ}$ with respect to the pulsar position. As (i) currently identified pulsar halos do not show such offsets and (ii) the centroid of the $γ$-ray emission lies approximately along the extension of the X-ray tail, we speculate that the UHE $γ$-ray emission may originate from re-accelerated electron/positron pairs that are advected away in the bow-shock tail.
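The log-parabola fit quoted above can be evaluated directly. The sketch below assumes the common form $dN/dE = N_0 (E/E_0)^{-\alpha - \beta \ln(E/E_0)}$ with a natural logarithm; the abstract does not state the logarithm base, and some analyses use $\log_{10}$ instead, which rescales $\beta$. The helper names `log_parabola` and `local_index` are hypothetical, and only the central fit values are used (uncertainties are ignored).

```python
import math


def log_parabola(E, N0, alpha, beta, E0=30.0):
    """Differential flux dN/dE at energy E (TeV); hypothetical helper.

    ASSUMPTION: natural-log convention,
    dN/dE = N0 * (E/E0) ** (-alpha - beta * ln(E/E0)).
    """
    x = E / E0
    return N0 * x ** (-alpha - beta * math.log(x))


def local_index(E, alpha, beta, E0=30.0):
    """Local photon index -d ln(dN/dE) / d ln E = alpha + 2*beta*ln(E/E0)."""
    return alpha + 2.0 * beta * math.log(E / E0)


# Central values from the fit; N0 in TeV^-1 cm^-2 s^-1
N0, ALPHA, BETA = 1.93e-16, 2.14, 1.20
for E in (30.0, 100.0, 300.0):
    flux = log_parabola(E, N0, ALPHA, BETA)
    print(f"{E:5.0f} TeV: dN/dE = {flux:.3e}, index = {local_index(E, ALPHA, BETA):.2f}")
```

At $E = E_0$ the flux equals $N_0$ and the local index equals $\alpha$; because $\beta > 0$, the index grows with energy, so the spectrum softens toward the 300 TeV end.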
Submitted 24 February, 2025; v1 submitted 21 February, 2025;
originally announced February 2025.
-
SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines
Authors:
M-A-P Team,
Xinrun Du,
Yifan Yao,
Kaijing Ma,
Bingli Wang,
Tianyu Zheng,
King Zhu,
Minghao Liu,
Yiming Liang,
Xiaolong Jin,
Zhenlin Wei,
Chujie Zheng,
Kaixin Deng,
Shawn Gavin,
Shian Jia,
Sichao Jiang,
Yiyan Liao,
Rui Li,
Qinrui Li,
Sirun Li,
Yizhi Li,
Yunwen Li,
David Ma,
Yuansheng Ni,
Haoran Que
, et al. (72 additional authors not shown)
Abstract:
Large language models (LLMs) have demonstrated remarkable proficiency in mainstream academic disciplines such as mathematics, physics, and computer science. However, human knowledge encompasses over 200 specialized disciplines, far exceeding the scope of existing benchmarks. The capabilities of LLMs in many of these specialized fields, particularly in light industry, agriculture, and service-oriented disciplines, remain inadequately evaluated. To address this gap, we present SuperGPQA, a comprehensive benchmark that evaluates graduate-level knowledge and reasoning capabilities across 285 disciplines. Our benchmark employs a novel Human-LLM collaborative filtering mechanism to eliminate trivial or ambiguous questions through iterative refinement based on both LLM responses and expert feedback. Our experimental results reveal significant room for improvement in the performance of current state-of-the-art LLMs across diverse knowledge domains (e.g., the reasoning-focused model DeepSeek-R1 achieved the highest accuracy of 61.82% on SuperGPQA), highlighting the considerable gap between current model capabilities and artificial general intelligence. Additionally, we present comprehensive insights from our management of a large-scale annotation process, involving over 80 expert annotators and an interactive Human-LLM collaborative system, offering valuable methodological guidance for future research initiatives of comparable scope.
Submitted 28 March, 2025; v1 submitted 20 February, 2025;
originally announced February 2025.