Search | arXiv e-print repository

arXiv:2503.12765 [pdf]

Stabilization Analysis and Mode Recognition of Kerosene Supersonic Combustion: A Deep Learning Approach Based on Res-CNN-beta-VAE

Authors: Weiming Xu, Tao Yang, Chang Liu, Kun Wu, Peng Zhang

Abstract: The scramjet engine is a key propulsion system for hypersonic vehicles, leveraging supersonic airflow to achieve high specific impulse, making it a promising technology for aerospace applications. Understanding and controlling the complex interactions between fuel injection, turbulent combustion, and aerodynamic effects of compressible flows are crucial for ensuring stable combustion in scramjet engines. However, identifying stable modes in scramjet combustors is often challenging due to limited experimental measurement techniques and the extremely complex spatiotemporal evolution of supersonic turbulent combustion. This work introduces an innovative deep learning framework that combines dimensionality reduction via the Residual Convolutional Neural Network-beta-Variational Autoencoder (Res-CNN-beta-VAE) model with unsupervised clustering (K-means) to identify and analyze dynamical combustion modes in a supersonic combustor. By mapping high-dimensional combustion snapshots to a reduced three-dimensional latent space, the Res-CNN-beta-VAE model captures the essential temporal and spatial features of flame behaviors and enables the observation of transitions between combustion states. By analyzing the standard deviation of latent variable trajectories, we introduce a novel method for objectively distinguishing between dynamic transitions, which provides a scalable and expert-independent alternative to traditional classification methods. In addition, the unsupervised K-means clustering approach effectively identifies the complex interplay between the cavity and jet-wake stabilization mechanisms, offering new insights into the system's behavior across different gas-to-liquid mass flow ratios (GLRs).

Submitted 16 March, 2025; originally announced March 2025.

Comments: 10 pages, 6 figures
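The two latent-space analysis steps described in the Res-CNN-beta-VAE abstract can be sketched as follows. This is not the authors' code. It assumes the 3D latent codes have already been produced by a trained encoder, which is not shown. The window length and threshold are hypothetical parameters. The sketch uses plain Lloyd's K-means written in numpy.

```python
import numpy as np


def rolling_std(z: np.ndarray, window: int) -> np.ndarray:
    """Per-dimension standard deviation of a latent trajectory over a sliding window.

    z has shape (T, d); the result has shape (T - window + 1, d).
    """
    windows = np.lib.stride_tricks.sliding_window_view(z, window, axis=0)  # (T-w+1, d, w)
    return windows.std(axis=-1)


def flag_transitions(z: np.ndarray, window: int, threshold: float) -> np.ndarray:
    """Mark windows whose total latent spread exceeds a (hypothetical) threshold."""
    return rolling_std(z, window).sum(axis=1) > threshold


def kmeans(x: np.ndarray, k: int, n_iter: int = 100, seed: int = 0):
    """Plain Lloyd's K-means on latent vectors x of shape (N, d)."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each latent point to its nearest center.
        dists = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        # Recompute centers; keep the old center if a cluster becomes empty.
        new_centers = np.array([
            x[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers
```

Windows flagged by `flag_transitions` are candidate intervals where the combustion state moves through latent space. The K-means labels group snapshots into candidate modes, such as cavity-dominated versus jet-wake-dominated stabilization.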

arXiv:2503.12485 [pdf, other]

Cross-Modal Consistency Learning for Sign Language Recognition

Authors: Kepeng Wu, Zecheng Li, Hezhen Hu, Wengang Zhou, Houqiang Li

Abstract: Pre-training has been proven to be effective in boosting the performance of Isolated Sign Language Recognition (ISLR). Existing pre-training methods focus solely on compact pose data, which eliminates background perturbation but inevitably suffers from insufficient semantic cues compared to raw RGB videos. Nevertheless, learning representations directly from RGB videos remains challenging due to the presence of sign-independent visual features. To address this dilemma, we propose a Cross-modal Consistency Learning framework (CCL-SLR), which leverages the cross-modal consistency between RGB and pose modalities based on self-supervised pre-training. First, CCL-SLR employs contrastive learning for instance discrimination within and across modalities. Through single-modal and cross-modal contrastive learning, CCL-SLR gradually aligns the feature spaces of the RGB and pose modalities, thereby extracting consistent sign representations. Second, we further introduce Motion-Preserving Masking (MPM) and Semantic Positive Mining (SPM) techniques to improve cross-modal consistency from the perspectives of data augmentation and sample similarity, respectively. Extensive experiments on four ISLR benchmarks show that CCL-SLR achieves impressive performance, demonstrating its effectiveness. The code will be released to the public.

Submitted 21 March, 2025; v1 submitted 16 March, 2025; originally announced March 2025.
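The cross-modal contrastive step in the CCL-SLR abstract is commonly implemented with an InfoNCE-style loss. The abstract does not name the exact loss, so the sketch below assumes a symmetric InfoNCE. In it, the RGB and pose embeddings of the same sample form the positive pair, and all other samples in the batch act as negatives. The temperature value is a hypothetical default.

```python
import numpy as np


def info_nce(a: np.ndarray, b: np.ndarray, temperature: float = 0.1) -> float:
    """InfoNCE loss where row i of `a` should match row i of `b` (shape (N, d) each)."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    logits = a @ b.T / temperature                      # (N, N) cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))          # positives lie on the diagonal


def symmetric_cross_modal_loss(rgb: np.ndarray, pose: np.ndarray,
                               temperature: float = 0.1) -> float:
    """Average of RGB->pose and pose->RGB InfoNCE terms."""
    return 0.5 * (info_nce(rgb, pose, temperature) + info_nce(pose, rgb, temperature))
```

Minimizing this loss pulls the two modalities' embeddings of the same sign together. This is one concrete way to "align the feature spaces". A single-modal term would apply the same function to two augmented views within one modality.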

arXiv:2503.12473 [pdf, other]

Oscillation-eliminating central DG schemes for hyperbolic conservation laws

Authors: Manting Peng, Kailiang Wu, Caiyou Yuan

Abstract: This paper proposes and analyzes a class of essentially non-oscillatory central discontinuous Galerkin (CDG) methods for general hyperbolic conservation laws. First, we introduce a novel compact, non-oscillatory stabilization mechanism that effectively suppresses spurious oscillations while preserving the high-order accuracy of CDG methods. Unlike existing limiter-based approaches that rely on large stencils or problem-specific parameters for oscillation control, our dual damping mechanism is inspired by CDG-based numerical dissipation and leverages overlapping solutions within the CDG framework, significantly enhancing stability while maintaining compactness. Our approach is free of problem-dependent parameters and complex characteristic decomposition, making it efficient and robust. Second, we provide a rigorous stability and optimal error analysis for fully discrete Runge-Kutta (RK) CDG schemes, addressing a gap in the theoretical understanding of these methods. Specifically, we establish the approximate skew-symmetry and weak boundedness of the CDG discretization. These results enable us to rigorously derive fully discrete error estimates for our oscillation-eliminating CDG (OECDG) method, a challenging task due to its nonlinear nature, even for linear advection equations. Building on this framework, we reformulate nonlinear oscillation-eliminating CDG schemes as linear RK CDG schemes with a nonlinear source term, extending error estimates beyond the linear case to schemes with nonlinear oscillation control. While existing error analyses for DG or CDG schemes have largely been restricted to linear cases without nonlinear oscillation-control techniques, our analysis represents an important theoretical advancement. Experiments validate the theoretical findings and demonstrate the effectiveness of the OECDG method.

Submitted 16 March, 2025; originally announced March 2025.

Comments: 26 pages

arXiv:2503.12465 [pdf, other]

On Local Minimum Entropy Principle of High-Order Schemes for Relativistic Euler Equations

Authors: Shumo Cui, Kailiang Wu, Linfeng Xu

Abstract: This paper establishes the minimum entropy principle (MEP) for the relativistic Euler equations with a broad class of equations of state (EOSs) and addresses the challenge of preserving the local version of the discovered MEP in high-order numerical schemes. At the continuous level, we identify a family of entropy pairs for the relativistic Euler equations and provide a rigorous analysis proving the strict convexity of entropy under a necessary and sufficient condition. At the numerical level, we develop a rigorous framework for designing provably entropy-preserving high-order schemes that ensure both physical admissibility and the discovered MEP. The relativistic effects, coupled with the abstract and general EOS formulation, introduce significant challenges not encountered in the nonrelativistic case or with the ideal EOS. In particular, entropy is a highly nonlinear and implicit function of the conservative variables, making it particularly difficult to enforce entropy preservation. To address these challenges, we establish a series of auxiliary results via highly technical inequalities. Another key innovation is the use of geometric quasi-linearization (GQL), which reformulates the nonlinear constraints into equivalent linear ones by introducing additional free parameters. These advancements form the foundation of our entropy-preserving analysis. We propose novel, robust, locally entropy-preserving high-order frameworks. A central challenge is accurately estimating the local minimum of entropy, particularly in the presence of shock waves at unknown locations. To address this, we introduce two new approaches for estimating local lower bounds of specific entropy, which prove effective for both smooth and discontinuous problems. Numerical experiments demonstrate that our entropy-preserving methods maintain high-order accuracy while effectively suppressing spurious oscillations.

Submitted 16 March, 2025; originally announced March 2025.

Comments: 43 pages

arXiv:2503.11625 [pdf, other]

Neutrinos as a new tool to characterise the Milky Way Centre

Authors: Paul C. W. Lai, Beatrice Crudele, Matteo Agostini, Hayden P. H. Ng, Ellis R. Owen, Nishta Varma, Kinwah Wu

Abstract: The Central Molecular Zone (CMZ), a star-forming region rich in molecular clouds located within hundreds of parsecs of the centre of our Galaxy, converts gas into stars less efficiently than anticipated. A key challenge in refining star-formation models is the lack of precise mapping of these dense molecular hydrogen clouds, where traditional tracers often yield inconsistent results due to environmental limitations. We demonstrate how, in the near future, neutrinos will emerge as a robust mass tracer thanks to advancements in neutrino telescopes. Since neutrinos are produced alongside gamma rays when cosmic rays interact with molecular clouds, they offer a complementary, systematics-independent measurement of the gas density. In an optimistic case where most gamma-ray emission from the Galactic Centre region originates from pion decays, we expect several tens of muon neutrinos to be detected in about two decades by KM3NeT, Baikal-GVD, and P-ONE combined, which will enable a better determination of the baryonic content in the Galactic Centre region. The CMZ will serve as a testbed to calibrate conventional tracers against neutrinos, ultimately improving gas measurements in distant galaxies, where neutrinos are undetectable but traditional tracers remain available.

Submitted 14 March, 2025; originally announced March 2025.

Comments: 6 pages, 4 figures

arXiv:2503.07588 [pdf, other]

When Large Vision-Language Model Meets Large Remote Sensing Imagery: Coarse-to-Fine Text-Guided Token Pruning

Authors: Junwei Luo, Yingying Zhang, Xue Yang, Kang Wu, Qi Zhu, Lei Liang, Jingdong Chen, Yansheng Li

Abstract: Efficient vision-language understanding of large Remote Sensing Images (RSIs) is meaningful but challenging. Current Large Vision-Language Models (LVLMs) typically employ a limited number of pre-defined grids to process images, leading to information loss when handling gigapixel RSIs. Conversely, using an unlimited number of grids significantly increases computational costs. To preserve image details while reducing computational complexity, we propose a text-guided token pruning method with Dynamic Image Pyramid (DIP) integration. Our method introduces: (i) a Region Focus Module (RFM) that leverages text-aware region localization to identify critical vision tokens, and (ii) a coarse-to-fine image tile selection and vision token pruning strategy based on DIP, which is guided by RFM outputs and avoids directly processing the entire large image. Additionally, existing benchmarks for evaluating LVLMs' perception ability on large RSIs suffer from limited question diversity and constrained image sizes. We construct a new benchmark named LRS-VQA, which contains 7,333 QA pairs across 8 categories, with image lengths of up to 27,328 pixels. Our method outperforms existing high-resolution strategies on four datasets using the same data. Moreover, compared to existing token reduction methods, our approach demonstrates higher efficiency under high-resolution settings. Dataset and code are available at https://github.com/VisionXLab/LRS-VQA.

Submitted 25 March, 2025; v1 submitted 10 March, 2025; originally announced March 2025.

Comments: 12 pages, 6 figures, 7 tables
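Text-guided token pruning, as described in the LRS-VQA abstract, keeps only the vision tokens most relevant to the question. The sketch below is a generic illustration and not the paper's RFM or DIP. It assumes relevance is scored by cosine similarity between each vision token and a single text embedding. The `keep_ratio` parameter is hypothetical.

```python
import numpy as np


def prune_tokens(vision_tokens: np.ndarray, text_embedding: np.ndarray,
                 keep_ratio: float):
    """Keep the top-k vision tokens by cosine similarity to a text embedding.

    vision_tokens: (N, d); text_embedding: (d,). Returns (kept_tokens, kept_indices),
    with indices in their original order so spatial ordering is preserved.
    """
    v = vision_tokens / np.linalg.norm(vision_tokens, axis=1, keepdims=True)
    t = text_embedding / np.linalg.norm(text_embedding)
    scores = v @ t                                   # relevance of each token to the text
    k = max(1, int(round(keep_ratio * len(vision_tokens))))
    keep = np.sort(np.argsort(scores)[::-1][:k])     # top-k, then restore original order
    return vision_tokens[keep], keep
```

A coarse-to-fine variant would apply the same scoring first to low-resolution tiles. Only the selected tiles would then be expanded at higher resolution, so the full gigapixel image is never tokenized at once.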

arXiv:2503.07527 [pdf, other]

Real-Time Load Estimation for Load-lifting Exoskeletons Using Insole Pressure Sensors and Machine Learning

Authors: Kaida Wu, Peihao Xiang, Chaohao Lin, Lixuan Chen, Ou Bai

Abstract: This paper presents a novel method for real-time lifting-load estimation to enhance the control strategies of upper-limb assistive exoskeletons. By leveraging cost-effective insole pressure sensors, the proposed system extracts differential pressure data that minimizes disturbances from variations in body weight and sensor placement. Two modeling approaches are explored: a channel-based method that employs traditional regression techniques, namely Elastic Net, Support Vector Regression (SVR), and Multi-Layer Perceptron (MLP), and a map-based method that uses transfer learning with a pre-trained MobileNetV2 model. The experiment is at a preliminary stage: it covers loads from 2 kg to 10 kg in 0.5 kg increments, with data collected from three subjects. In the channel-based method, the average Weighted Mean Absolute Percentage Error (WMAPE) across the three subjects was 13.46% for SVR, with the MLP performing similarly. In the map-based method, using data from one subject, the fully fine-tuned MobileNetV2 model reached a WMAPE of 9.74%. The results indicate that the integration of insole sensor technology with advanced machine learning models provides an effective solution for dynamic load estimation, potentially reducing the risks of over- and under-compensation in exoskeleton control.

Submitted 10 March, 2025; originally announced March 2025.
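The exoskeleton abstract reports errors as WMAPE but does not define it. The sketch below assumes the common definition: the total absolute error divided by the total absolute true value, expressed as a percentage. It is an illustration of that metric, not the authors' evaluation code.

```python
from typing import Sequence


def wmape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Weighted Mean Absolute Percentage Error, in percent.

    WMAPE = 100 * sum(|y - y_hat|) / sum(|y|). Unlike plain MAPE, each error is
    effectively weighted by the magnitude of the true load, so small loads do not
    dominate the average.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    denom = sum(abs(t) for t in y_true)
    if denom == 0:
        raise ValueError("WMAPE is undefined when all true values are zero")
    num = sum(abs(t - p) for t, p in zip(y_true, y_pred))
    return 100.0 * num / denom


# Example with hypothetical loads in kg: errors 0.5 + 0.5 + 1.0 over a 16 kg total.
print(wmape([2.0, 4.0, 10.0], [2.5, 3.5, 9.0]))  # 12.5
```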

arXiv:2503.06129 [pdf, other]

Viewport-Unaware Blind Omnidirectional Image Quality Assessment: A Flexible and Effective Paradigm

Authors: Jiebin Yan, Kangcheng Wu, Junjie Chen, Ziwen Tan, Yuming Fang

Abstract: Most existing blind omnidirectional image quality assessment (BOIQA) models rely on viewport generation by modeling user viewing behavior, or on transforming omnidirectional images (OIs) into different formats; however, these methods are either computationally expensive or less scalable. To address these issues, we present a flexible and effective paradigm that is viewport-unaware and can be easily adapted to 2D plane image quality assessment (2D-IQA). Specifically, the proposed BOIQA model includes an adaptive prior-equator sampling module that extracts a patch sequence from the equirectangular projection (ERP) image in a resolution-agnostic manner, a progressive deformation-unaware feature fusion module that captures patch-wise quality degradation in a deformation-immune way, and a local-to-global quality aggregation module that adaptively maps local perception to global quality. Extensive experiments across four OIQA databases (including uniformly and non-uniformly distorted OIs) demonstrate that the proposed model achieves competitive performance with low complexity against other state-of-the-art models, and we also verify its adaptability to 2D-IQA.

Submitted 8 March, 2025; originally announced March 2025.

arXiv:2503.06109 [pdf, other]

A Digital Twin-Driven Recommendation System for Adaptive Campus Course Timetabling

Authors: Keshu Wu, Xinyue Ye, Suphanut Jamonnak, Xin Feng

Abstract: Efficient and adaptive course timetabling for large, dynamic university campuses remains a significant challenge due to the complex interplay of hard and soft constraints. Traditional static optimization methods often fail to accommodate real-time disruptions, evolving user preferences, and the nuanced spatial-temporal relationships inherent in campus environments. This paper reconceptualizes the timetabling problem as a recommendation-based task and leverages the Texas A&M Campus Digital Twin as a dynamic data platform. Our proposed framework integrates collaborative and content-based filtering techniques with iterative feedback mechanisms, thereby generating a ranked set of adaptive timetable recommendations. A composite scoring function, incorporating metrics for classroom occupancy, travel distance, travel time, and vertical transitions, enables the framework to systematically balance resource utilization with user-centric factors. Extensive experiments using real-world data from Texas A&M University demonstrate that our approach effectively reduces travel inefficiencies, optimizes classroom utilization, and enhances overall user satisfaction. By coupling a recommendation-oriented paradigm with a digital twin environment, this study offers a robust and scalable blueprint for intelligent campus planning and resource allocation, with potential applications in broader urban contexts.

Submitted 8 March, 2025; originally announced March 2025.
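The timetabling abstract mentions a composite scoring function over classroom occupancy, travel distance, travel time, and vertical transitions, without giving its form. The sketch below assumes a weighted sum. Each cost term is normalized by its maximum across candidates, occupancy is rewarded, and the cost terms are penalized. The weights and the `CandidateMetrics` type are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class CandidateMetrics:
    """Hypothetical summary of one candidate timetable."""
    name: str
    occupancy: float           # fraction of seats used, 0..1 (higher is better)
    travel_distance_m: float   # total walking distance between consecutive classes
    travel_time_min: float     # total walking time between consecutive classes
    vertical_transitions: int  # number of floor changes


# Hypothetical weights; the abstract does not specify the actual weighting.
WEIGHTS = {"occupancy": 0.4, "distance": 0.2, "time": 0.2, "vertical": 0.2}


def _max_or_one(values: List[float]) -> float:
    """Normalizer that avoids division by zero when every value is 0."""
    m = max(values)
    return m if m > 0 else 1.0


def rank_candidates(cands: List[CandidateMetrics]) -> List[Tuple[str, float]]:
    """Score candidates (higher is better) and return them best-first."""
    if not cands:
        return []
    d_max = _max_or_one([c.travel_distance_m for c in cands])
    t_max = _max_or_one([c.travel_time_min for c in cands])
    v_max = _max_or_one([float(c.vertical_transitions) for c in cands])
    scored = []
    for c in cands:
        score = (WEIGHTS["occupancy"] * c.occupancy
                 - WEIGHTS["distance"] * c.travel_distance_m / d_max
                 - WEIGHTS["time"] * c.travel_time_min / t_max
                 - WEIGHTS["vertical"] * c.vertical_transitions / v_max)
        scored.append((c.name, score))
    return sorted(scored, key=lambda s: s[1], reverse=True)
```

Normalizing each cost by the candidate-set maximum keeps terms with different units (metres, minutes, floor counts) on a comparable scale before weighting.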

arXiv:2503.05077 [pdf, other]

Adaptive-LIO: Enhancing Robustness and Precision through Environmental Adaptation in LiDAR Inertial Odometry

Authors: Chengwei Zhao, Kun Hu, Jie Xu, Lijun Zhao, Baiwen Han, Kaidi Wu, Maoshan Tian, Shenghai Yuan

Abstract: Emerging Internet of Things (IoT) applications, such as driverless cars, have a growing demand for high-precision positioning and navigation. LiDAR inertial odometry has become increasingly prevalent in robotics and autonomous driving. However, many current SLAM systems lack sufficient adaptability to various scenarios. Challenges include decreased point cloud accuracy with longer frame intervals under the constant velocity assumption, coupling of erroneous IMU information when IMU saturation occurs, and decreased localization accuracy due to the use of fixed-resolution maps during indoor-outdoor scene transitions. To address these issues, we propose a loosely coupled adaptive LiDAR-Inertial-Odometry named Adaptive-LIO, which incorporates adaptive segmentation to enhance mapping accuracy, adapts the motion modality through IMU saturation and fault detection, and adjusts map resolution adaptively using multi-resolution voxel maps based on the distance from the LiDAR center. Our proposed method has been tested in various challenging scenarios, demonstrating the effectiveness of the improvements we introduce. The code is open-source on GitHub: https://github.com/chengwei0427/adaptive_lio

Submitted 6 March, 2025; originally announced March 2025.
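Adaptive-LIO's abstract says map resolution depends on the distance from the LiDAR center. A minimal sketch of that idea follows, assuming a few fixed range bands, each with its own voxel edge length. The band limits and sizes are hypothetical and are not the values used by Adaptive-LIO. For simplicity, points are taken in the sensor frame.

```python
import math
from typing import Dict, Iterable, List, Tuple

Point = Tuple[float, float, float]

# Hypothetical range bands: (upper range bound in metres, voxel edge length in metres).
RESOLUTION_BANDS = [(10.0, 0.1), (30.0, 0.25), (math.inf, 0.5)]


def voxel_size_for(p: Point) -> float:
    """Pick a voxel edge length from the point's distance to the LiDAR origin."""
    r = math.sqrt(p[0] ** 2 + p[1] ** 2 + p[2] ** 2)
    for max_range, size in RESOLUTION_BANDS:
        if r < max_range:
            return size
    return RESOLUTION_BANDS[-1][1]


def voxel_key(p: Point) -> Tuple[float, int, int, int]:
    """Voxel index at the range-dependent resolution (size included to avoid collisions)."""
    size = voxel_size_for(p)
    return (size, math.floor(p[0] / size), math.floor(p[1] / size), math.floor(p[2] / size))


def downsample(points: Iterable[Point]) -> List[Point]:
    """Keep the first point that falls into each multi-resolution voxel."""
    kept: Dict[Tuple[float, int, int, int], Point] = {}
    for p in points:
        kept.setdefault(voxel_key(p), p)
    return list(kept.values())
```

Fine voxels near the sensor keep detail in narrow indoor spaces. Coarser voxels at long range bound the map size in open outdoor scenes.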

arXiv:2503.04138 [pdf, other]

Mixed Likelihood Variational Gaussian Processes

Authors: Kaiwen Wu, Craig Sanders, Benjamin Letham, Phillip Guan

Abstract: Gaussian processes (GPs) are powerful models for human-in-the-loop experiments due to their flexibility and well-calibrated uncertainty. However, GPs modeling human responses typically ignore auxiliary information, including a priori domain expertise and non-task performance information such as user confidence ratings. We propose mixed likelihood variational GPs to leverage auxiliary information, which combine multiple likelihoods in a single evidence lower bound to model multiple types of data. We demonstrate the benefits of mixing likelihoods in three real-world experiments with human participants. First, we use mixed likelihood training to impose prior knowledge constraints in GP classifiers, which accelerates active learning in a visual perception task where users are asked to identify geometric errors resulting from camera position errors in virtual reality. Second, we show that leveraging Likert scale confidence ratings by mixed likelihood training improves model fitting for haptic perception of surface roughness. Lastly, we show that Likert scale confidence ratings improve human preference learning in robot gait optimization. The modeling performance improvements found using our framework across this diverse set of applications illustrate the benefits of incorporating auxiliary information into active learning and preference learning by using mixed likelihoods to jointly model multiple inputs.

Submitted 6 March, 2025; originally announced March 2025.

Comments: 16 pages
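The mixed-likelihood GP abstract describes combining multiple likelihoods in a single evidence lower bound. In a standard sparse variational GP with inducing variables $\mathbf{u}$ and variational distribution $q(\mathbf{u})$, one generic way to write this is shown below. It assumes two data types that share one latent function $f$: task responses $y_i$ at inputs $\mathbf{x}_i$, and confidence ratings $c_j$ at inputs $\mathbf{x}'_j$. The symbols and the likelihoods $p_1, p_2$ are illustrative assumptions (for instance, Bernoulli for binary responses and ordinal for Likert ratings). This is not necessarily the paper's exact objective.

$$
\mathcal{L}(q) = \sum_{i=1}^{N_1} \mathbb{E}_{q(f(\mathbf{x}_i))}\Big[\log p_1\big(y_i \mid f(\mathbf{x}_i)\big)\Big] + \sum_{j=1}^{N_2} \mathbb{E}_{q(f(\mathbf{x}'_j))}\Big[\log p_2\big(c_j \mid f(\mathbf{x}'_j)\big)\Big] - \mathrm{KL}\big(q(\mathbf{u}) \,\|\, p(\mathbf{u})\big),
\qquad q(f(\mathbf{x})) = \int p(f(\mathbf{x}) \mid \mathbf{u})\, q(\mathbf{u})\, d\mathbf{u}.
$$

Both data sources share $f$ and a single KL term. As a result, each expected log-likelihood sum contributes evidence about the same latent function, which is how auxiliary data can sharpen the model of the primary task.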

arXiv:2503.02252 [pdf, other]

Real-Time Burst-Mode Digital Signal Processing for Passive Optical Networks

Authors: Ji Zhou, Kainan Wu, Haide Wang, Jinyang Yang, Weiping Liu, Junwen Zhang, Changyuan Yu, Xiangjun Xin, Liangchuan Li

Abstract: Driven by ever-increasing capacity demands, the 50G passive optical network (PON) is gradually maturing. One of the main challenges for the 50G PON is implementing burst-mode digital signal processing (BM-DSP) for the burst upstream signal. In this paper, we demonstrate real-time BM-DSP for burst reception of a 25-Gbit/s on-off keying signal to meet the asymmetric-mode 50G PON demand. The real-time BM-DSP includes BM frequency-domain timing recovery and a BM frequency-domain equalizer, both of which converge quickly using the designed 42-ns preamble. Meanwhile, simplified implementations of the fast Fourier transform, minimum-mean-square-error, and decision-directed least-mean-square-error algorithms reduce DSP resource usage by 28.57%, enabling the real-time BM-DSP to be loaded into a field-programmable gate array with limited DSP resources. The real-time implementation of BM-DSP can guide the design of application-specific integrated circuits for 50G PON.

Submitted 3 March, 2025; originally announced March 2025.

Comments: This manuscript has been submitted to Journal of Optical Communications and Networking

arXiv:2503.02239 [pdf, other]

V2X-LLM: Enhancing V2X Integration and Understanding in Connected Vehicle Corridors

Authors: Keshu Wu, Pei Li, Yang Zhou, Rui Gan, Junwei You, Yang Cheng, Jingwen Zhu, Steven T. Parker, Bin Ran, David A. Noyce, Zhengzhong Tu

Abstract: The advancement of Connected and Automated Vehicles (CAVs) and Vehicle-to-Everything (V2X) communication offers significant potential for enhancing transportation safety, mobility, and sustainability. However, the integration and analysis of diverse and voluminous V2X data, including Basic Safety Messages (BSMs) and Signal Phase and Timing (SPaT) data, present substantial challenges, especially on Connected Vehicle Corridors. These challenges include managing large data volumes, ensuring real-time data integration, and understanding complex traffic scenarios. Although Connected Vehicle Corridor projects have developed an advanced CAV data pipeline that enables real-time communication between vehicles, infrastructure, and other road users for managing connected vehicle and roadside unit (RSU) data, significant hurdles in data comprehension and in real-time scenario analysis and reasoning persist. To address these issues, we introduce the V2X-LLM framework, a novel enhancement to the existing connected vehicle (CV) data pipeline. V2X-LLM leverages Large Language Models (LLMs) to improve the understanding and real-time analysis of V2X data. The framework includes four key tasks: Scenario Explanation, offering detailed narratives of traffic conditions; V2X Data Description, detailing vehicle and infrastructure statuses; State Prediction, forecasting future traffic states; and Navigation Advisory, providing optimized routing instructions. By integrating LLM-driven reasoning with V2X data within the data pipeline, the V2X-LLM framework offers real-time feedback and decision support for traffic management. This integration enhances the accuracy of traffic analysis, safety, and traffic optimization. Demonstrations in a real-world urban corridor highlight the framework's potential to advance intelligent transportation systems.

Submitted 3 March, 2025; originally announced March 2025.

arXiv:2503.01535 [pdf]

doi 10.1103/PhysRevLett.134.086202

Flat bands and temperature-driven phase transition in quasi-one-dimensional zigzag chains

Authors: Jisong Gao, Haijun Cao, Xuegao Hu, Hui Zhou, Zhihao Cai, Qiaoxiao Zhao, Dong Li, Zhicheng Gao, Shin-ichiro Ideta, Kenya Shimada, Peng Cheng, Lan Chen, Kehui Wu, Sheng Meng, Baojie Feng

Abstract: Flat-band materials have garnered extensive attention due to their captivating properties associated with strong correlation effects. While flat bands have been discovered in several types of 2D materials, their existence in 1D systems remains elusive. Here, we propose a 1D frustrated lattice, specifically the 1D zigzag lattice, as a platform for hosting flat bands. This lattice can be experimentally realized by growing CuTe chains on Cu(111). The presence of flat bands was confirmed by tight-binding model analysis, first-principles calculations, and angle-resolved photoemission spectroscopy measurements. In addition, we discovered a temperature-driven phase transition at approximately 250 K. Detailed analyses demonstrate that the system exhibits Tomonaga-Luttinger liquid behavior, accompanied by spin-charge separation effects. Our work unveils new prospects for investigating strongly correlated electron behaviors and topological properties in the 1D limit.

Submitted 3 March, 2025; originally announced March 2025.

Journal ref: Physical Review Letters 134, 086202 (2025)

arXiv:2502.20801 [pdf, other]

Quantifying Bias due to non-Gaussian Foregrounds in an Optimal Reconstruction of CMB Lensing and Temperature Power Spectra

Authors: M. Doohan, M. Millea, S. Raghunathan, F. Ge, L. Knox, K. Prabhu, C. L. Reichardt, W. L. K. Wu

Abstract: We estimate the magnitude of the bias that non-Gaussian extragalactic foregrounds introduce into the optimal reconstruction of the cosmic microwave background (CMB) lensing potential and temperature power spectra. The reconstruction is performed using a Bayesian inference method known as the marginal unbiased score expansion (MUSE). We apply MUSE to a minimum variance combination of multifrequency maps drawn from the publicly available Agora simulations of the lensed CMB and correlated extragalactic foreground emission. Taking noise levels appropriate to two years of data with the SPT-3G instrument on the South Pole Telescope, we find no statistically significant bias in the MUSE reconstruction when limited to angular multipoles $\ell \leq 3000$. We find a $4.7\sigma$ bias in the recovered lensing potential power spectrum when smaller-scale modes ($\ell \leq 3500$) are included. This work is a first step toward understanding the impact of extragalactic foregrounds on optimal reconstructions of CMB temperature and lensing potential power spectra.

Submitted 2 March, 2025; v1 submitted 28 February, 2025; originally announced February 2025.

Comments: Submitted to JCAP

arXiv:2502.20768 [pdf, ps, other]

Convex inequalities in Hilbert $C^*$-modules

Authors: Kangjian Wu, Jia Li, Qingxiang Xu

Abstract: The Hölder-McCarty inequalities were originally derived in the Hilbert space case and have been generalized via a convex inequality. The main purpose of this paper is to extend this convex inequality to the Hilbert $C^*$-module case and, in addition, to investigate the Hölder-McCarty inequalities in the Hilbert $C^*$-module case.
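For orientation, the classical Hilbert-space form of the Hölder-McCarty inequality states that for a positive operator $A$ and a unit vector $x$, $\langle Ax,x\rangle^r \le \langle A^r x,x\rangle$ when $r \ge 1$, with the inequality reversed when $0 < r \le 1$. The sketch below checks this numerically for a random positive semidefinite matrix; it illustrates only the finite-dimensional Hilbert-space statement, not the paper's $C^*$-module extension, and the helper names are hypothetical.

```python
import numpy as np

def psd_power(a: np.ndarray, r: float) -> np.ndarray:
    """Return A**r for a Hermitian positive semidefinite matrix A via its eigendecomposition."""
    vals, vecs = np.linalg.eigh(a)
    vals = np.clip(vals, 0.0, None)  # discard tiny negative round-off eigenvalues
    return (vecs * vals**r) @ vecs.conj().T

def mccarty_gap(a: np.ndarray, x: np.ndarray, r: float) -> float:
    """Return <A^r x, x> - <A x, x>^r for the normalized vector x.

    McCarty: the gap is >= 0 for r >= 1 and <= 0 for 0 < r <= 1.
    """
    x = x / np.linalg.norm(x)
    lhs = np.vdot(x, a @ x).real ** r
    rhs = np.vdot(x, psd_power(a, r) @ x).real
    return rhs - lhs

rng = np.random.default_rng(0)
b = rng.standard_normal((4, 4))
a = b @ b.T                     # positive semidefinite test matrix
x = rng.standard_normal(4)
print(mccarty_gap(a, x, 2.0) >= -1e-9)   # True (r >= 1)
print(mccarty_gap(a, x, 0.5) <= 1e-9)    # True (0 < r <= 1, inequality reversed)
```

For $r = 2$ the inequality reduces to Cauchy-Schwarz, $\langle Ax,x\rangle^2 \le \|Ax\|^2$, which is a quick sanity check on the sign convention.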

Submitted 28 February, 2025; originally announced February 2025.

arXiv:2502.20259 [pdf, ps, other]

Notes on the numerical radius for adjointable operators on Hilbert $C^*$-modules

Authors: J. Li, K. Wu, Q. Xu

Abstract: Given a Hilbert module $H$ over a $C^*$-algebra, let $\mathcal{L}(H)$ be the set of all adjointable operators on $H$. For each $T\in\mathcal{L}(H)$, its numerical radius is defined by $w(T)=\sup\big\{\|\langle Tx, x \rangle\|: x\in H, \|x\|=1\big\}$. It is proved that $w(T)=\|T\|$ whenever $T$ is normal. Examples are constructed to show that there exist a Hilbert module $H$ over a certain $C^*$-algebra and operators $T_1,T_2\in \mathcal{L}(H)$ with $T_1^2=0$ such that $w(T_1)\ne \frac12 \|T_1\|$ and $\sup\limits_{θ\in [0,2π]}\|\mbox{Re}(e^{iθ}T_2)\|<w(T_2)$. In addition, a new characterization of the spatial numerical radius is given, and it is proved that $w\big(π(T)\big)\le w(T)$ for every faithful representation $(π, X)$ of $\mathcal{L}(H)$ and every $T\in\mathcal{L}(H)$. Some inequalities are derived based on the newly obtained results.
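As a point of comparison, on Hilbert spaces (for example, complex matrices) the numerical radius satisfies $w(T)=\max_{θ}\|\mbox{Re}(e^{iθ}T)\|$ with $\mbox{Re}(S)=(S+S^*)/2$, and $w(T)=\frac12\|T\|$ whenever $T^2=0$; the examples in the abstract show that both facts can fail for adjointable operators on Hilbert $C^*$-modules. The following minimal sketch evaluates the Hilbert-space formula on a finite grid of angles (the grid size and function name are assumptions for illustration).

```python
import numpy as np

def numerical_radius(t: np.ndarray, num_angles: int = 3600) -> float:
    """Approximate w(T) for a complex matrix T as max_theta ||Re(e^{i theta} T)||.

    The identity holds on Hilbert spaces; sampling theta on a grid gives a
    lower bound that converges to w(T) as num_angles grows.
    """
    best = 0.0
    for theta in np.linspace(0.0, 2.0 * np.pi, num_angles, endpoint=False):
        s = np.exp(1j * theta) * t
        re_part = (s + s.conj().T) / 2                  # Hermitian part Re(e^{i theta} T)
        best = max(best, np.linalg.norm(re_part, 2))    # spectral norm
    return float(best)

nilpotent = np.array([[0, 1], [0, 0]], dtype=complex)   # T^2 = 0, ||T|| = 1
normal = np.diag([1.0, -2.0]).astype(complex)           # normal, ||T|| = 2
print(round(numerical_radius(nilpotent), 6))  # 0.5, i.e. ||T||/2
print(round(numerical_radius(normal), 6))     # 2.0, i.e. ||T||
```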

Submitted 27 February, 2025; originally announced February 2025.

arXiv:2502.19823 [pdf, other]

GraphSparseNet: a Novel Method for Large Scale Traffic Flow Prediction

Authors: Weiyang Kong, Kaiqi Wu, Sen Zhang, Yubao Liu

Abstract: Traffic flow forecasting is a critical spatio-temporal data mining task with wide-ranging applications in intelligent route planning and dynamic traffic management. Recent advancements in deep learning, particularly Graph Neural Networks (GNNs), have significantly improved forecast accuracy by capturing complex spatio-temporal dynamics. However, the scalability of GNNs remains a challenge because their model complexity grows exponentially as the number of nodes in the graph increases. Existing methods to address this issue, including sparsification, decomposition, and kernel-based approaches, either do not fully resolve the complexity issue or risk compromising predictive accuracy. This paper introduces GraphSparseNet (GSNet), a novel framework designed to improve both the scalability and accuracy of GNN-based traffic forecasting models. GraphSparseNet consists of two core modules: the Feature Extractor and the Relational Compressor. These modules operate with linear time and space complexity, reducing the overall computational complexity of the model to a linear scale. Extensive experiments on multiple real-world datasets demonstrate that GraphSparseNet reduces training time by 3.51x compared to state-of-the-art linear models while maintaining high predictive performance.

Submitted 13 May, 2025; v1 submitted 27 February, 2025; originally announced February 2025.

Comments: Accepted by VLDB 2025

arXiv:2502.17772 [pdf, other]

An Improved Privacy and Utility Analysis of Differentially Private SGD with Bounded Domain and Smooth Losses

Authors: Hao Liang, Wanrong Zhang, Xinlei He, Kaishun Wu, Hong Xing

Abstract: Differentially Private Stochastic Gradient Descent (DPSGD) is widely used to protect sensitive data during the training of machine learning models, but its privacy guarantees often come at the cost of model performance, largely because privacy loss is inherently difficult to quantify accurately. While recent efforts have strengthened privacy guarantees by focusing solely on the final output and on bounded-domain cases, they still impose restrictive assumptions, such as convexity and other parameter limitations, and often lack a thorough utility analysis. In this paper, we provide a rigorous privacy and utility characterization of DPSGD for smooth loss functions in both bounded and unbounded domains. We track the privacy loss over multiple iterations by exploiting the noisy smooth-reduction property and establish the utility analysis by leveraging the non-expansiveness of the projection and properties of clipped SGD. In particular, we show that for DPSGD with a bounded domain, (i) the privacy loss can still converge without the convexity assumption, and (ii) a smaller bounded diameter can improve both privacy and utility simultaneously under certain conditions. Numerical results validate our theoretical findings.
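To make the "bounded domain" setting concrete, here is a minimal sketch of one projected DPSGD step in the usual form: per-example gradient clipping, Gaussian noise scaled by the clipping norm, a gradient step, and a projection back onto the domain. It assumes the bounded domain is a Euclidean ball of radius `radius` (diameter `2 * radius`); the function and parameter names are hypothetical, and this is not presented as the paper's exact algorithm.

```python
import numpy as np

def clip_rows(grads: np.ndarray, clip_norm: float) -> np.ndarray:
    """Scale each per-example gradient (one row) so its L2 norm is at most clip_norm."""
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    return grads * np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))

def project_to_ball(w: np.ndarray, radius: float) -> np.ndarray:
    """Euclidean projection onto {w : ||w|| <= radius} (a non-expansive map)."""
    norm = np.linalg.norm(w)
    return w if norm <= radius else w * (radius / norm)

def dpsgd_step(w, per_example_grads, lr, clip_norm, noise_multiplier, radius, rng):
    """One projected DPSGD update on a batch of per-example gradients."""
    clipped = clip_rows(per_example_grads, clip_norm)
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=w.shape)
    noisy_grad = (clipped.sum(axis=0) + noise) / clipped.shape[0]
    return project_to_ball(w - lr * noisy_grad, radius)

# Toy least-squares problem: loss_i(w) = 0.5 * (x_i . w - y_i)^2
rng = np.random.default_rng(0)
x = rng.standard_normal((32, 5))
y = x @ np.ones(5)
w = np.zeros(5)
for _ in range(200):
    residual = x @ w - y
    per_example = residual[:, None] * x   # per-example gradients
    w = dpsgd_step(w, per_example, lr=0.5, clip_norm=1.0,
                   noise_multiplier=1.0, radius=2.0, rng=rng)
print(np.linalg.norm(w) <= 2.0 + 1e-9)   # True: iterates stay in the bounded domain
```

The projection is what makes the domain diameter an explicit knob; the privacy accounting itself (how the loss converges over iterations) is the subject of the paper and is not reproduced here.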

Submitted 28 February, 2025; v1 submitted 24 February, 2025; originally announced February 2025.

Comments: 18 pages, 2 figures, submitted for possible publication

arXiv:2502.15466 [pdf, other]

Mitigating Data Scarcity in Time Series Analysis: A Foundation Model with Series-Symbol Data Generation

Authors: Wenxuan Wang, Kai Wu, Yujian Betterest Li, Dan Wang, Xiaoyu Zhang, Jing Liu

Abstract: Foundation models for time series analysis (TSA) have attracted significant attention. However, challenges such as data scarcity and data imbalance continue to hinder their development. To address this, we consider modeling complex systems through symbolic expressions that serve as semantic descriptors of time series. Building on this concept, we introduce a series-symbol (S2) dual-modality data generation mechanism, enabling the unrestricted creation of high-quality time series data paired with corresponding symbolic representations. Leveraging the S2 dataset, we develop SymTime, a pre-trained foundation model for TSA. When fine-tuned on downstream tasks, SymTime demonstrates competitive performance across five major TSA tasks, rivaling foundation models pre-trained on real-world datasets. This approach underscores the potential of dual-modality data generation and pretraining mechanisms for overcoming data scarcity and enhancing task performance.

Submitted 21 February, 2025; originally announced February 2025.

arXiv:2502.14907 [pdf, other]

GneissWeb: Preparing High Quality Data for LLMs at Scale

Authors: Hajar Emami Gohari, Swanand Ravindra Kadhe, Syed Yousaf Shah, Constantin Adam, Abdulhamid Adebayo, Praneet Adusumilli, Farhan Ahmed, Nathalie Baracaldo Angel, Santosh Borse, Yuan-Chi Chang, Xuan-Hong Dang, Nirmit Desai, Ravital Eres, Ran Iwamoto, Alexei Karve, Yan Koyfman, Wei-Han Lee, Changchang Liu, Boris Lublinsky, Takuyo Ohko, Pablo Pesce, Maroun Touma, Shiqiang Wang, Shalisha Witherspoon, Herbert Woisetschlager, David Wood, et al. (6 additional authors not shown)

Abstract: Data quantity and quality play a vital role in determining the performance of Large Language Models (LLMs). High-quality data, in particular, can significantly boost an LLM's ability to generalize across a wide range of downstream tasks. The large pre-training datasets of leading LLMs remain inaccessible to the public, whereas many open datasets are small (less than 5 trillion tokens), limiting their suitability for training large models. In this paper, we introduce GneissWeb, a large dataset of around 10 trillion tokens that meets the data quality and quantity requirements of training LLMs. The GneissWeb recipe that produced the dataset consists of sharded exact substring deduplication and a judiciously constructed ensemble of quality filters. GneissWeb achieves a favorable trade-off between data quality and quantity, producing models that outperform models trained on state-of-the-art open large datasets (5+ trillion tokens). We show that models trained on the GneissWeb dataset outperform those trained on FineWeb-V1.1.0 by 2.73 percentage points in average score on a set of 11 commonly used benchmarks (both zero-shot and few-shot) for pre-training dataset evaluation. When the evaluation set is extended to 20 benchmarks (both zero-shot and few-shot), models trained on GneissWeb still achieve a 1.75 percentage-point advantage over those trained on FineWeb-V1.1.0.

Submitted 18 February, 2025; originally announced February 2025.

arXiv:2502.14162 [pdf, other]

Untangling New Physics in Single Resonant Top Quarks

Authors: Krish Wu, Brandon Sun, Nitish Polishetty, Justin Kline, Max Fieg, Daniel Whiteson

Abstract: Collisions of particles at the energy frontier can reveal new particles and forces via localized excesses. However, the initial observation may be consistent with a large variety of theoretical models, especially in sectors with new top quark partners, which feature a rich set of possible underlying interactions. We explore the power of the LHC dataset to distinguish between models of a singly produced heavy top-like quark that interacts with the Standard Model through an electromagnetic form factor. We study the decay of the heavy top to a top quark and a virtual photon, which produces a pair of fermions, propose a technique to disentangle the models, and calculate the expected statistical significance for distinguishing between various hypotheses.

Submitted 19 February, 2025; originally announced February 2025.

arXiv:2502.12671 [pdf, other]

Baichuan-M1: Pushing the Medical Capability of Large Language Models

Authors: Bingning Wang, Haizhou Zhao, Huozhi Zhou, Liang Song, Mingyu Xu, Wei Cheng, Xiangrong Zeng, Yupeng Zhang, Yuqi Huo, Zecheng Wang, Zhengyun Zhao, Da Pan, Fei Kou, Fei Li, Fuzhong Chen, Guosheng Dong, Han Liu, Hongda Zhang, Jin He, Jinjie Yang, Kangxi Wu, Kegeng Wu, Lei Su, Linlin Niu, Linzhuang Sun , et al. (17 additional authors not shown)

Abstract: The current generation of large language models (LLMs) is typically designed for broad, general-purpose applications, while domain-specific LLMs, especially in vertical fields like medicine, remain relatively scarce. In particular, the development of highly efficient and practical LLMs for the medical domain is challenging due to the complexity of medical knowledge and the limited availability of high-quality data. To bridge this gap, we introduce Baichuan-M1, a series of large language models specifically optimized for medical applications. Unlike traditional approaches that simply continue pretraining on existing models or apply post-training to a general base model, Baichuan-M1 is trained from scratch with a dedicated focus on enhancing medical capabilities. Our model is trained on 20 trillion tokens and incorporates a range of effective training methods that strike a balance between general capabilities and medical expertise. As a result, Baichuan-M1 not only performs strongly across general domains such as mathematics and coding but also excels in specialized medical fields. We have open-sourced Baichuan-M1-14B, a mini version of our model, which can be accessed through the following links.

Submitted 5 March, 2025; v1 submitted 18 February, 2025; originally announced February 2025.

Comments: 33 pages, technical report

arXiv:2502.10785 [pdf, other]

REGNav: Room Expert Guided Image-Goal Navigation

Authors: Pengna Li, Kangyi Wu, Jingwen Fu, Sanping Zhou

Abstract: Image-goal navigation aims to steer an agent towards the goal location specified by an image. Most prior methods tackle this task by learning a navigation policy that extracts visual features of the goal and observation images, compares their similarity, and predicts actions. However, if the agent is in a different room from the goal image, it is extremely challenging to identify their similarity and infer the likely goal location, which may result in the agent wandering around. Intuitively, when humans carry out this task, they may roughly compare the current observation with the goal image and form an approximate sense of whether they are in the same room before acting. Inspired by this intuition, we imitate this human behaviour and propose a Room Expert Guided Image-Goal Navigation model (REGNav) that equips the agent with the ability to analyze whether the goal and observation images were taken in the same room. Specifically, we first pre-train a room expert with an unsupervised learning technique on self-collected, unlabelled room images. The expert extracts the hidden room style information of the goal and observation images and predicts whether they belong to the same room. In addition, two different fusion approaches are explored to efficiently guide the agent's navigation with the room relation knowledge. Extensive experiments show that REGNav surpasses prior state-of-the-art methods on three popular benchmarks.

Submitted 15 February, 2025; originally announced February 2025.

Comments: Accepted by AAAI 2025 Oral

arXiv:2502.10250 [pdf, other]

VisCon-100K: Leveraging Contextual Web Data for Fine-tuning Vision Language Models

Authors: Gokul Karthik Kumar, Iheb Chaabane, Kebin Wu

Abstract: Vision-language models (VLMs) excel in various visual benchmarks but are often constrained by the lack of high-quality visual fine-tuning data. To address this challenge, we introduce VisCon-100K, a novel dataset derived from interleaved image-text web documents. Our approach transforms 45K web documents from the OBELICS dataset into 100K image conversation samples. We use GPT-4V to generate image-contextual captions and the OpenChat 3.5 model to convert these captions into diverse free-form and multiple-choice question-answer pairs. Fine-tuning on this dataset considerably improves VLM performance across multiple benchmarks. Unlike methods that focus solely on fine-grained visual content, our approach leverages the accompanying web context, yielding superior results. We also find that a 'leaky modality mix', where conversation samples contain questions answerable from both the image and its contextual caption, outperforms non-leaky combinations of captions and Q&A pairs. The VisCon-100K dataset shows strong performance with two popular VLM approaches: a text-only large language model (LLM) aligned with a vision encoder using image caption data (ShareGPT4V-7b) and a multimodally pretrained LLM (IDEFICS2-8b) using interleaved image-text data. In addition to releasing the VisCon-100K dataset, we provide a contextual captioner trained on this dataset, facilitating scalable fine-tuning data generation for future research and open-source applications.
Using the same pipeline, but substituting our trained contextual captioner for GPT-4V, we also release the larger VisCon-1M dataset.

Submitted 24 February, 2025; v1 submitted 14 February, 2025; originally announced February 2025.

Comments: Accepted at PAKDD 2025

arXiv:2502.10040 [pdf, other]

Diffusion Trajectory-guided Policy for Long-horizon Robot Manipulation

Authors: Shichao Fan, Quantao Yang, Yajie Liu, Kun Wu, Zhengping Che, Qingjie Liu, Min Wan

Abstract: Recently, Vision-Language-Action (VLA) models have advanced robot imitation learning, but high data collection costs and limited demonstrations hinder generalization, and current imitation learning methods struggle in out-of-distribution scenarios, especially for long-horizon tasks. A key challenge is mitigating compounding errors in imitation learning, which lead to cascading failures over extended trajectories. To address these challenges, we propose the Diffusion Trajectory-guided Policy (DTP) framework, which generates 2D trajectories with a diffusion model to guide policy learning for long-horizon tasks. By leveraging task-relevant trajectories, DTP provides trajectory-level guidance that reduces error accumulation. Our two-stage approach first trains a generative vision-language model to create diffusion-based trajectories, then refines the imitation policy using them. Experiments on the CALVIN benchmark show that DTP outperforms state-of-the-art baselines by 25% in success rate, starting from scratch without external pretraining. Moreover, DTP significantly improves real-world robot performance.

Submitted 14 February, 2025; originally announced February 2025.

arXiv:2502.09657 [pdf]

Integrating Spatiotemporal Vision Transformer into Digital Twins for High-Resolution Heat Stress Forecasting in Campus Environments

Authors: Wenjing Gong, Xinyue Ye, Keshu Wu, Suphanut Jamonnak, Wenyu Zhang, Yifan Yang, Xiao Huang

Abstract: Extreme heat events exacerbated by climate change pose significant challenges to urban resilience and planning. This study introduces a climate-responsive digital twin framework integrating the Spatiotemporal Vision Transformer (ST-ViT) model to enhance heat stress forecasting and decision-making. Using a Texas campus as a testbed, we synthesized high-resolution physical model simulations with spatial and meteorological data to develop fine-scale human thermal predictions. The ST-ViT-powered digital twin enables efficient, data-driven insights for planners, policymakers, and campus stakeholders, supporting targeted heat mitigation strategies and advancing climate-adaptive urban design.

Submitted 12 February, 2025; originally announced February 2025.

arXiv:2502.09212 [pdf, other]

doi 10.4204/EPTCS.416.5

LP-LM: No Hallucinations in Question Answering with Logic Programming

Authors: Katherine Wu, Yanhong A. Liu

Abstract: Large language models (LLMs) are able to generate human-like responses to user queries. However, LLMs exhibit inherent limitations, most notably hallucination. This paper introduces LP-LM, a system that grounds answers to questions in known facts contained in a knowledge base (KB), facilitated through semantic parsing in Prolog, and always produces reliable answers. LP-LM generates the most probable constituency parse tree, along with a corresponding Prolog term, for an input question via Prolog definite clause grammar (DCG) parsing. The term is then executed against a KB of natural language sentences, also represented as Prolog terms, for question answering. By leveraging DCG and tabling, LP-LM runs in time linear in the size of the input sentences for sufficiently many grammar rules. In experiments comparing the accuracy of LP-LM with current well-known LLMs, we show that LLMs hallucinate on even simple questions, unlike LP-LM.

Submitted 13 February, 2025; originally announced February 2025.

Comments: In Proceedings ICLP 2024, arXiv:2502.08453

Journal ref: EPTCS 416, 2025, pp. 69-77

arXiv:2502.07527 [pdf, ps, other]

Nature Language Model: Deciphering the Language of Nature for Scientific Discovery

Authors: Yingce Xia, Peiran Jin, Shufang Xie, Liang He, Chuan Cao, Renqian Luo, Guoqing Liu, Yue Wang, Zequn Liu, Yuan-Jyue Chen, Zekun Guo, Yeqi Bai, Pan Deng, Yaosen Min, Ziheng Lu, Hongxia Hao, Han Yang, Jielan Li, Chang Liu, Jia Zhang, Jianwei Zhu, Ran Bi, Kehan Wu, Wei Zhang, Kaiyuan Gao , et al. (21 additional authors not shown)

Abstract: Foundation models have revolutionized natural language processing and artificial intelligence, significantly enhancing how machines comprehend and generate human languages. Inspired by the success of these foundation models, researchers have developed foundation models for individual scientific domains, including small molecules, materials, proteins, DNA, RNA and even cells. However, these models are typically trained in isolation, lacking the ability to integrate across different scientific domains. Recognizing that entities within these domains can all be represented as sequences, which together form the "language of nature", we introduce Nature Language Model (NatureLM), a sequence-based science foundation model designed for scientific discovery. Pre-trained with data from multiple scientific domains, NatureLM offers a unified, versatile model that enables various applications including: (i) generating and optimizing small molecules, proteins, RNA, and materials using text instructions; (ii) cross-domain generation/design, such as protein-to-molecule and protein-to-RNA generation; and (iii) top performance across different domains, matching or surpassing state-of-the-art specialist models. NatureLM offers a promising generalist approach for various scientific tasks, including drug discovery (hit generation/optimization, ADMET optimization, synthesis), novel material design, and the development of therapeutic proteins or nucleotides.
We have developed NatureLM models in different sizes (1 billion, 8 billion, and 46.7 billion parameters) and observed a clear improvement in performance as the model size increases.

Submitted 20 June, 2025; v1 submitted 11 February, 2025; originally announced February 2025.

Comments: 95 pages

arXiv:2502.04300 [pdf, other]

CMB-S4: Foreground-Cleaning Pipeline Comparison for Measuring Primordial Gravitational Waves

Authors: Federico Bianchini, Dominic Beck, W. L. Kimmy Wu, Zeeshan Ahmed, Sebastian Belkner, Julien Carron, Brandon S. Hensley, Clement L. Pryke, Caterina Umilta

Abstract: We compare multiple foreground-cleaning pipelines for estimating the tensor-to-scalar ratio, $r$, using simulated maps of the planned CMB-S4 experiment within the context of the South Pole Deep Patch. To evaluate robustness, we analyze bias and uncertainty on $r$ across various foreground suites using map-based simulations. The foreground-cleaning methods include: a parametric maximum-likelihood approach applied to auto- and cross-power spectra between frequency maps; a map-based parametric maximum-likelihood method; and a harmonic-space internal linear combination using frequency maps. We summarize the conceptual basis of each method to highlight their similarities and differences. To better probe the impact of foreground residuals, we implement an iterative internal delensing step, leveraging a map-based pipeline to generate a lensing $B$-mode template from the Large Aperture Telescope frequency maps. Our results show that the performance of the three approaches is comparable for simple and intermediate-complexity foregrounds, with $σ(r)$ ranging from $3\times 10^{-4}$ to $5\times 10^{-4}$. However, biases at the $1$-$2σ$ level appear when analyzing more complex forms of foreground emission. By extending the baseline pipelines to marginalize over foreground residuals, we demonstrate that contamination can be reduced to within statistical uncertainties, albeit with a pipeline-dependent impact on $σ(r)$, which translates to a detection significance between 2 and 4$σ$ for an input value of $r = 0.003$.
These findings suggest varying levels of maturity among the tested pipelines, with the auto- and cross-spectra-based approach demonstrating the best stability and overall performance. Moreover, given the extremely low noise levels, mutual validation of independent foreground-cleaning pipelines is essential to ensure the robustness of any potential detection.
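Of the three methods, the internal linear combination (ILC) is the simplest to write down: for each multipole bin, it combines the frequency channels with weights $w = C^{-1}e / (e^{T} C^{-1} e)$, where $C$ is the channel-channel covariance and $e$ is the CMB response (a vector of ones in thermodynamic units). This keeps unit CMB response while minimizing the variance of the combined map. The sketch below applies this textbook formula to a single toy covariance; the foreground scaling and noise level are invented, and the CMB-S4 pipeline's binning, deprojection, and delensing steps are not reproduced.

```python
import numpy as np

def ilc_weights(cov: np.ndarray) -> np.ndarray:
    """ILC weights w = C^{-1} e / (e^T C^{-1} e) for a frequency-frequency covariance C."""
    e = np.ones(cov.shape[0])        # unit CMB response in every channel (assumed units)
    cinv_e = np.linalg.solve(cov, e)
    return cinv_e / (e @ cinv_e)

# Toy 3-channel covariance at one multipole bin: CMB + one foreground + white noise
e = np.ones(3)
f = np.array([1.0, 2.0, 4.0])        # hypothetical foreground frequency scaling
cov = np.outer(e, e) + 10.0 * np.outer(f, f) + 0.01 * np.eye(3)
w = ilc_weights(cov)
print(round(w.sum(), 10))            # 1.0: unit response to the CMB is preserved
print(abs(w @ f) < 0.05)             # True: the toy foreground is strongly suppressed
```

Because the ILC only uses the empirical covariance, residual foregrounds with a spectral shape that varies across the sky are not removed exactly, which is one reason the abstract's more complex foreground suites can leave biases on $r$.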

Submitted 6 February, 2025; originally announced February 2025.

Comments: 25 pages, 14 figures

arXiv:2502.03763 [pdf, other]

Systolic Sparse Tensor Slices: FPGA Building Blocks for Sparse and Dense AI Acceleration

Authors: Endri Taka, Ning-Chi Huang, Chi-Chih Chang, Kai-Chiang Wu, Aman Arora, Diana Marculescu

Abstract: FPGA architectures have recently been enhanced to meet the substantial computational demands of modern deep neural networks (DNNs). To this end, both FPGA vendors and academic researchers have proposed in-fabric blocks that perform efficient tensor computations. However, these blocks are primarily optimized for dense computation, while most DNNs exhibit sparsity. To address this limitation, we propose incorporating structured sparsity support into FPGA architectures. We architect 2D systolic in-fabric blocks, named systolic sparse tensor (SST) slices, that support multiple degrees of sparsity to efficiently accelerate a wide variety of DNNs. SSTs support dense operation, 2:4 (50%) and 1:4 (75%) sparsity, as well as a new 1:3 (66.7%) sparsity level to further increase flexibility. When evaluated on general matrix multiplication (GEMM) accelerators, which are at the heart of most current DNN accelerators, our sparse SST-based designs attain up to 5x higher FPGA frequency and 10.9x lower area compared to traditional FPGAs. Moreover, evaluation of the proposed SSTs on state-of-the-art sparse ViT and CNN models shows up to 3.52x speedup with a minimal area increase of up to 13.3%, compared to dense in-fabric acceleration.
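The N:M notation means that in every group of M consecutive weights at most N are nonzero, so 2:4, 1:4 and 1:3 give 50%, 75% and 66.7% zeros respectively. The sketch below produces such a pattern in software by keeping the N largest-magnitude weights per group; magnitude-based selection and the function name are assumptions for illustration, since the paper concerns hardware blocks that exploit the pattern rather than how it is obtained.

```python
import numpy as np

def nm_prune(weights: np.ndarray, n: int, m: int) -> np.ndarray:
    """Keep the n largest-magnitude entries in each group of m consecutive values (last axis)."""
    if weights.shape[-1] % m != 0:
        raise ValueError("last dimension must be divisible by m")
    groups = weights.reshape(-1, m)
    # Column indices of the (m - n) smallest magnitudes in each group
    drop = np.argsort(np.abs(groups), axis=1)[:, : m - n]
    pruned = groups.copy()
    np.put_along_axis(pruned, drop, 0.0, axis=1)
    return pruned.reshape(weights.shape)

rng = np.random.default_rng(0)
w = rng.standard_normal((2, 12))      # 12 is divisible by both 3 and 4
for n, m in [(2, 4), (1, 4), (1, 3)]:
    sparse = nm_prune(w, n, m)
    print(f"{n}:{m}", round(float(np.mean(sparse == 0)), 3))
# 2:4 0.5
# 1:4 0.75
# 1:3 0.667
```

The fixed count per group is what lets a systolic block index the surviving weights with a small, constant amount of metadata.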

Submitted 5 February, 2025; originally announced February 2025.

Comments: Accepted as full paper at FPGA 2025

arXiv:2502.00477 [pdf]

An Inorganic Liquid Crystalline Dispersion with 2D Ferroelectric Moieties

Authors: Ziyang Huang, Zehao Zhang, Rongjie Zhang, Baofu Ding, Liu Yang, Keyou Wu, Youan Xu, Gaokuo Zhong, Chuanlai Ren, Jiarong Liu, Yugan Hao, Menghao Wu, Teng Ma, Bilu Liu

Abstract: Electro-optical-effect-based liquid crystal devices have been extensively used in optical modulation techniques, in which the Kerr coefficient reflects the sensitivity of the liquid crystals and determines the strength of the device's operational electric field. The Peterlin-Stuart theory and the O'Konski model jointly indicate that a giant Kerr coefficient could be obtained in a material with both a large geometrical anisotropy and an intrinsic polarization, but such a material has not yet been reported. Here we reveal a ferroelectric effect in a monolayer of the two-dimensional mineral vermiculite. A large geometrical anisotropy factor and a large inherent electric dipole together raise the record value of the Kerr coefficient by an order of magnitude, to $3.0\times 10^{-4}$ m V$^{-2}$. This finding enables an ultra-low operational electric field of $10^2$-$10^4$ V m$^{-1}$ and the fabrication of electro-optical devices with an inch-level electrode separation, which was previously impractical. Because the material offers high ultraviolet stability (<1% decay after 1000 hours of ultraviolet exposure), scalability, and energy efficiency, prototype display billboards have been fabricated for outdoor interactive scenes. The work provides new insights into both liquid crystal optics and two-dimensional ferroelectrics.

Submitted 1 February, 2025; originally announced February 2025.

Comments: 26 pages, 3 figures. Published in National Science Review 2024, 11 (5), nwae108

Journal ref: National Science Review, 2024
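For arXiv:2502.00477 above, a back-of-envelope sketch of what a Kerr coefficient of $3.0\times 10^{-4}$ m V$^{-2}$ implies. It assumes the textbook Kerr law $\Delta n = \lambda B E^2$ (induced birefringence grows with the square of the field) and a 550 nm probe wavelength; both are assumptions, not values from the paper, and the quadratic law only holds at low fields before the dispersion's orientation saturates.

```python
import math


def kerr_birefringence(wavelength_m: float, kerr_b: float, field_v_per_m: float) -> float:
    """Induced birefringence from the textbook Kerr law: dn = lambda * B * E**2.

    Assumes the low-field (quadratic) regime; real colloidal dispersions saturate.
    """
    return wavelength_m * kerr_b * field_v_per_m ** 2


def field_for_birefringence(wavelength_m: float, kerr_b: float, target_dn: float) -> float:
    """Invert the Kerr law: field (V/m) needed to reach a target birefringence."""
    return math.sqrt(target_dn / (wavelength_m * kerr_b))


if __name__ == "__main__":
    lam = 550e-9  # assumed probe wavelength (green light), metres
    b = 3.0e-4    # Kerr coefficient quoted in the abstract, m V^-2
    for e in (1e2, 1e3, 1e4):  # operational field range quoted in the abstract
        print(f"E = {e:.0e} V/m -> dn = {kerr_birefringence(lam, b, e):.3e}")
```

Under these assumptions the quoted field range maps to $\Delta n$ of roughly $1.7\times10^{-6}$ to $1.7\times10^{-2}$; at the top of that range the quadratic law is unlikely to remain valid.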

arXiv:2501.19160 [pdf, other]

RMDM: Radio Map Diffusion Model with Physics Informed

Authors: Haozhe Jia, Wenshuo Chen, Zhihui Huang, Hongru Xiao, Nanqian Jia, Keming Wu, Songning Lai, Yutao Yue

Abstract: With the rapid development of wireless communication technology, the efficient utilization of spectrum resources, optimization of communication quality, and intelligent communication have become critical. Radio map reconstruction is essential for enabling advanced applications, yet challenges such as complex signal propagation and sparse data hinder accurate reconstruction. To address these issues, we propose the **Radio Map Diffusion Model (RMDM)**, a physics-informed framework that integrates **Physics-Informed Neural Networks (PINNs)** to incorporate constraints like the **Helmholtz equation**. RMDM employs a dual U-Net architecture: the first ensures physical consistency by minimizing PDE residuals, boundary conditions, and source constraints, while the second refines predictions via diffusion-based denoising. By leveraging physical laws, RMDM significantly enhances accuracy, robustness, and generalization. Experiments demonstrate that RMDM outperforms state-of-the-art methods, achieving **NMSE of 0.0031** and **RMSE of 0.0125** under the Static RM (SRM) setting, and **NMSE of 0.0047** and **RMSE of 0.0146** under the Dynamic RM (DRM) setting. These results establish a novel paradigm for integrating physics-informed and data-driven approaches in radio map reconstruction, particularly under sparse data conditions.

Submitted 19 March, 2025; v1 submitted 31 January, 2025; originally announced January 2025.
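For arXiv:2501.19160 above, a minimal sketch of the kind of quantities involved: a 5-point finite-difference residual of the Helmholtz equation $\nabla^2 u + k^2 u + f = 0$ that a PINN-style branch could penalize, plus NMSE and RMSE. The sign convention for the source $f$, the NMSE normalisation (error energy over ground-truth energy), and the function names are assumptions for illustration, not the RMDM code.

```python
import numpy as np


def helmholtz_residual(u: np.ndarray, k: float, f: np.ndarray, h: float) -> np.ndarray:
    """Interior residual of  lap(u) + k^2 u + f = 0  with the 5-point Laplacian.

    u, f: 2D arrays on a uniform grid with spacing h. Returns shape (ny-2, nx-2);
    a physics loss would typically take the mean square of this array.
    """
    lap = (u[2:, 1:-1] + u[:-2, 1:-1] + u[1:-1, 2:] + u[1:-1, :-2]
           - 4.0 * u[1:-1, 1:-1]) / h ** 2
    return lap + k ** 2 * u[1:-1, 1:-1] + f[1:-1, 1:-1]


def nmse(pred: np.ndarray, target: np.ndarray) -> float:
    """Normalised MSE: squared-error energy divided by target energy (assumed convention)."""
    return float(np.sum((pred - target) ** 2) / np.sum(target ** 2))


def rmse(pred: np.ndarray, target: np.ndarray) -> float:
    """Root-mean-square error over all grid cells."""
    return float(np.sqrt(np.mean((pred - target) ** 2)))


if __name__ == "__main__":
    n = 64
    x = np.linspace(0.0, 1.0, n)
    h = 1.0 / (n - 1)
    X, Y = np.meshgrid(x, x)
    u = np.sin(np.pi * X) * np.sin(np.pi * Y)
    # Pick k^2 equal to minus the discrete Laplacian eigenvalue, so the source-free residual vanishes.
    k2 = 2.0 * (2.0 - 2.0 * np.cos(np.pi * h)) / h ** 2
    r = helmholtz_residual(u, np.sqrt(k2), np.zeros_like(u), h)
    print("max |residual| =", np.abs(r).max())
```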

arXiv:2501.18232 [pdf, other]

Free-T2M: Frequency Enhanced Text-to-Motion Diffusion Model With Consistency Loss

Authors: Wenshuo Chen, Haozhe Jia, Songning Lai, Keming Wu, Hongru Xiao, Lijie Hu, Yutao Yue

Abstract: Rapid progress in text-to-motion generation has been largely driven by diffusion models. However, existing methods focus solely on temporal modeling, thereby overlooking frequency-domain analysis. We identify two key phases in motion denoising: the **semantic planning stage** and the **fine-grained improving stage**. To address these phases effectively, we propose **Fre**quency **e**nhanced **t**ext-**to**-**m**otion diffusion model (**Free-T2M**), incorporating stage-specific consistency losses that enhance the robustness of static features and improve fine-grained accuracy. Extensive experiments demonstrate the effectiveness of our method. Specifically, on StableMoFusion, our method reduces the FID from **0.189** to **0.051**, establishing a new SOTA performance within the diffusion architecture. These findings highlight the importance of incorporating frequency-domain insights into text-to-motion generation for more precise and robust results.

Submitted 30 January, 2025; originally announced January 2025.
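For arXiv:2501.18232 above, a sketch of one way to compare motion sequences in the frequency domain: take a real FFT along the time axis and compute an MSE over only the lowest `keep_bins` coefficients, so that high-frequency jitter does not affect the coarse "planning" content. The function name, the cutoff parameter, and the normalisation are hypothetical; this is not the paper's consistency loss.

```python
import numpy as np


def low_frequency_loss(pred: np.ndarray, target: np.ndarray, keep_bins: int) -> float:
    """MSE between the lowest `keep_bins` rFFT coefficients of two motion sequences.

    pred, target: arrays of shape (T, D), frames x pose features. The FFT runs
    along time; coefficients are divided by T so the loss is length-independent.
    """
    P = np.fft.rfft(pred, axis=0)[:keep_bins] / pred.shape[0]
    Q = np.fft.rfft(target, axis=0)[:keep_bins] / target.shape[0]
    return float(np.mean(np.abs(P - Q) ** 2))


if __name__ == "__main__":
    t = np.arange(64)
    slow = np.sin(2 * np.pi * t / 64)[:, None]    # one slow oscillation: "semantic" content
    jitter = 0.5 * np.cos(np.pi * t)[:, None]     # alternating +-0.5: Nyquist-frequency noise
    print("low-frequency loss:", low_frequency_loss(slow + jitter, slow, keep_bins=4))
    print("plain MSE:", float(np.mean(jitter ** 2)))
```

The jitter lives entirely in the highest FFT bin, so the low-frequency loss stays near zero while the plain MSE is 0.25.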

arXiv:2501.15368 [pdf, other]

Baichuan-Omni-1.5 Technical Report

Authors: Yadong Li, Jun Liu, Tao Zhang, Tao Zhang, Song Chen, Tianpeng Li, Zehuan Li, Lijun Liu, Lingfeng Ming, Guosheng Dong, Da Pan, Chong Li, Yuanbo Fang, Dongdong Kuang, Mingrui Wang, Chenglin Zhu, Youwei Zhang, Hongyu Guo, Fengyu Zhang, Yuran Wang, Bowen Ding, Wei Song, Xu Li, Yuqi Huo, Zheng Liang, et al. (68 additional authors not shown)

Abstract: We introduce Baichuan-Omni-1.5, an omni-modal model that not only has omni-modal understanding capabilities but also provides end-to-end audio generation capabilities. To achieve fluent and high-quality interaction across modalities without compromising the capabilities of any modality, we prioritized optimizing three key aspects. First, we establish a comprehensive data cleaning and synthesis pipeline for multimodal data, obtaining about 500B high-quality data (text, audio, and vision). Second, an audio-tokenizer (Baichuan-Audio-Tokenizer) has been designed to capture both semantic and acoustic information from audio, enabling seamless integration and enhanced compatibility with MLLM. Lastly, we designed a multi-stage training strategy that progressively integrates multimodal alignment and multitask fine-tuning, ensuring effective synergy across all modalities. Baichuan-Omni-1.5 leads contemporary models (including GPT4o-mini and MiniCPM-o 2.6) in terms of comprehensive omni-modal capabilities. Notably, it achieves results comparable to leading models such as Qwen2-VL-72B across various multimodal medical benchmarks.

Submitted 25 January, 2025; originally announced January 2025.

arXiv:2501.15187 [pdf, other]

Uni-Sign: Toward Unified Sign Language Understanding at Scale

Authors: Zecheng Li, Wengang Zhou, Weichao Zhao, Kepeng Wu, Hezhen Hu, Houqiang Li

Abstract: Sign language pre-training has gained increasing attention for its ability to enhance performance across various sign language understanding (SLU) tasks. However, existing methods often suffer from a gap between pre-training and fine-tuning, leading to suboptimal results. To address this, we propose Uni-Sign, a unified pre-training framework that eliminates the gap between pre-training and downstream SLU tasks through a large-scale generative pre-training strategy and a novel fine-tuning paradigm. First, we introduce CSL-News, a large-scale Chinese Sign Language (CSL) dataset containing 1,985 hours of video paired with textual annotations, which enables effective large-scale pre-training. Second, Uni-Sign unifies SLU tasks by treating downstream tasks as a single sign language translation (SLT) task during fine-tuning, ensuring seamless knowledge transfer between pre-training and fine-tuning. Furthermore, we incorporate a prior-guided fusion (PGF) module and a score-aware sampling strategy to efficiently fuse pose and RGB information, addressing keypoint inaccuracies and improving computational efficiency. Extensive experiments across multiple SLU benchmarks demonstrate that Uni-Sign achieves state-of-the-art performance across multiple downstream SLU tasks. Dataset and code are available at github.com/ZechengLi19/Uni-Sign.

Submitted 13 March, 2025; v1 submitted 25 January, 2025; originally announced January 2025.

Comments: Accepted by ICLR 2025

arXiv:2501.14449 [pdf, ps, other]

Classification of $\mathrm{GL}_{n}(\mathbb{C})$-Representations Distinguished by $\mathrm{GL}_n(\mathbb{R})$

Authors: Basudev Pattanayak, Kaidi Wu, Hongfeng Zhang

Abstract: This paper provides a complete classification of $\mathrm{GL}_n(\mathbb{R})$-distinguished irreducible representations of $\mathrm{GL}_n(\mathbb{C})$ when the representations are either generic or unitary. Additionally, for each such $\mathrm{GL}_n(\mathbb{R})$-distinguished representation, we explicitly construct the associated period and prove its non-vanishing on the distinguished minimal $K$-type. Furthermore, we offer some applications to the branching problem using theta correspondence.

Submitted 10 March, 2025; v1 submitted 24 January, 2025; originally announced January 2025.

Comments: We add some new results and modify some proofs

MSC Class: 22E50; 11F70

arXiv:2501.11818 [pdf, other]

Group-Agent Reinforcement Learning with Heterogeneous Agents

Authors: Kaiyue Wu, Xiao-Jun Zeng, Tingting Mu

Abstract: Group-agent reinforcement learning (GARL) is a newly arising learning scenario, where multiple reinforcement learning agents study together in a group, sharing knowledge in an asynchronous fashion. The goal is to improve the learning performance of each individual agent. Under a more general heterogeneous setting where different agents learn using different algorithms, we advance GARL by designing novel and effective group-learning mechanisms. They guide the agents on whether and how to learn from action choices from the others, and allow the agents to adopt available policy and value function models sent by another agent if they perform better. We have conducted extensive experiments on a total of 43 different Atari 2600 games to demonstrate the superior performance of the proposed method. After the group learning, among the 129 agents examined, 96% are able to achieve a learning speed-up, and 72% are able to learn over 100 times faster. Also, around 41% of those agents have achieved a higher accumulated reward score by learning in less than 5% of the time steps required by a single agent when learning on its own.

Submitted 15 February, 2025; v1 submitted 20 January, 2025; originally announced January 2025.
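For arXiv:2501.11818 above, a minimal sketch of the "adopt a peer's model if it performs better" idea. It assumes each agent tracks a single scalar performance score (for example a recent average episode return) and that the policy/value models can be swapped as opaque objects; the `Agent` class and its fields are hypothetical, and the paper's actual mechanisms are richer (they also decide whether and how to learn from peers' action choices).

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass
class Agent:
    name: str
    policy: Any                   # stand-in for a policy/value model pair (hypothetical)
    score: float = float("-inf")  # assumed metric, e.g. recent average episode return

    def receive(self, sender: "Agent") -> bool:
        """Adopt the sender's model if its reported score beats ours.

        The copied score is only a provisional estimate; in practice the adopted
        model would be re-evaluated in this agent's own environment.
        """
        if sender.score > self.score:
            self.policy = sender.policy
            self.score = sender.score
            return True
        return False


def share_round(agents: List[Agent]) -> int:
    """One asynchronous-style sharing round: every agent hears from every other.

    Returns the number of adoptions; results depend on iteration order.
    """
    adoptions = 0
    for receiver in agents:
        for sender in agents:
            if sender is not receiver and receiver.receive(sender):
                adoptions += 1
    return adoptions
```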

arXiv:2501.11196 [pdf, other]

Enhancing Brain Tumor Segmentation Using Channel Attention and Transfer learning

Authors: Majid Behzadpour, Ebrahim Azizi, Kai Wu, Bengie L. Ortiz

Abstract: Accurate and efficient segmentation of brain tumors is critical for diagnosis, treatment planning, and monitoring in clinical practice. In this study, we present an enhanced ResUNet architecture for automatic brain tumor segmentation, integrating an EfficientNetB0 encoder, a channel attention mechanism, and an Atrous Spatial Pyramid Pooling (ASPP) module. The EfficientNetB0 encoder leverages pre-trained features to improve feature extraction efficiency, while the channel attention mechanism enhances the model's focus on tumor-relevant features. ASPP enables multiscale contextual learning, crucial for handling tumors of varying sizes and shapes. The proposed model was evaluated on two benchmark datasets: TCGA LGG and BraTS 2020. Experimental results demonstrate that our method consistently outperforms the baseline ResUNet and its EfficientNet variant, achieving Dice coefficients of 0.903 and 0.851 and HD95 scores of 9.43 and 3.54 for whole tumor and tumor core regions on the BraTS 2020 dataset, respectively. Compared with state-of-the-art methods, our approach shows competitive performance, particularly in whole tumor and tumor core segmentation. These results indicate that combining a powerful encoder with attention mechanisms and ASPP can significantly enhance brain tumor segmentation performance. The proposed approach holds promise for further optimization and application in other medical image segmentation tasks.

Submitted 19 January, 2025; originally announced January 2025.

Comments: 13 pages, 1 figure
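For arXiv:2501.11196 above, a sketch of the two reported metrics on binary masks. Dice follows the standard $2|A\cap B|/(|A|+|B|)$ definition. The HD95 here is a simplified variant, assumed for illustration: it brute-forces distances over all foreground voxels in index units and takes the larger of the two directed 95th percentiles, whereas common evaluation toolkits use surface voxels and physical voxel spacing, so values can differ from published ones.

```python
import numpy as np


def dice(pred: np.ndarray, truth: np.ndarray) -> float:
    """Dice = 2|A & B| / (|A| + |B|) for binary masks; 1.0 if both masks are empty."""
    a, b = np.asarray(pred).astype(bool), np.asarray(truth).astype(bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        return 1.0
    return float(2.0 * np.logical_and(a, b).sum() / denom)


def hd95(pred: np.ndarray, truth: np.ndarray) -> float:
    """Simplified 95th-percentile Hausdorff distance between foreground voxel sets.

    Brute force, O(|A| * |B|) memory, distances in voxel-index units.
    """
    a = np.argwhere(np.asarray(pred).astype(bool))
    b = np.argwhere(np.asarray(truth).astype(bool))
    if len(a) == 0 or len(b) == 0:
        raise ValueError("HD95 is undefined when either mask is empty")
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)  # pairwise distances
    # Directed distances: each point to its nearest neighbour in the other set.
    return float(max(np.percentile(d.min(axis=1), 95), np.percentile(d.min(axis=0), 95)))
```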

arXiv:2501.08377 [pdf, other]

Complete Hamiltonian Framework of Relativistic Hierarchical Triple Systems: Capabilities and Limitations of Secular Perturbation Theory

Authors: Kaye Jiale Li, Kinwah Wu, Ziri Younsi, Tjonnie G. F. Li

Abstract: Relativistic secular perturbation theory has ignited significant interest in uncovering intricate cross-term effects, especially the interplay between 1PN and quadrupole terms. While most existing studies rely on the Lagrangian planetary perturbation method for computing cross terms, a comprehensive Hamiltonian framework for the field has been missing. In this work, we introduce a framework based on the von Zeipel transformation, utilizing two sequential canonical transformations to systematically compute cross terms to arbitrary orders. Our results reveal secular cross terms up to quadrupole-squared order, showcasing remarkable consistency with both the Lagrangian method [1] and the effective-field-theory approach [2]. We present leading-order periodic cross terms arising from the interactions between 1PN and quadrupole, and present estimates of higher-order cross terms. It is demonstrated that this method not only accurately predicts the long-term evolution of hierarchical systems but also captures fast oscillations observed in N-body simulations. We identify and validate resonances caused by quadrupole-squared effects, highlighting both consistencies and discrepancies when compared to N-body simulations. These discrepancies underscore the importance of mean-motion resonances, a factor overlooked in current secular perturbation frameworks. Finally, we provide a comprehensive review of the subtleties and limitations inherent to secular perturbation theory, paving the way for future research and advancements in this field.

Submitted 14 January, 2025; originally announced January 2025.

Comments: 28 pages, 9 figures, comments welcome

arXiv:2501.08286 [pdf, other]

VINGS-Mono: Visual-Inertial Gaussian Splatting Monocular SLAM in Large Scenes

Authors: Ke Wu, Zicheng Zhang, Muer Tie, Ziqing Ai, Zhongxue Gan, Wenchao Ding

Abstract: VINGS-Mono is a monocular (inertial) Gaussian Splatting (GS) SLAM framework designed for large scenes. The framework comprises four main components: VIO Front End, 2D Gaussian Map, NVS Loop Closure, and Dynamic Eraser. In the VIO Front End, RGB frames are processed through dense bundle adjustment and uncertainty estimation to extract scene geometry and poses. Based on this output, the mapping module incrementally constructs and maintains a 2D Gaussian map. Key components of the 2D Gaussian Map include a Sample-based Rasterizer, Score Manager, and Pose Refinement, which collectively improve mapping speed and localization accuracy. This enables the SLAM system to handle large-scale urban environments with up to 50 million Gaussian ellipsoids. To ensure global consistency in large-scale scenes, we design a Loop Closure module, which innovatively leverages the Novel View Synthesis (NVS) capabilities of Gaussian Splatting for loop closure detection and correction of the Gaussian map. Additionally, we propose a Dynamic Eraser to address the inevitable presence of dynamic objects in real-world outdoor scenes. Extensive evaluations in indoor and outdoor environments demonstrate that our approach achieves localization performance on par with Visual-Inertial Odometry while surpassing recent GS/NeRF SLAM methods. It also significantly outperforms all existing methods in terms of mapping and rendering quality.
Furthermore, we developed a mobile app and verified that our framework can generate high-quality Gaussian maps in real time using only a smartphone camera and a low-frequency IMU sensor. To the best of our knowledge, VINGS-Mono is the first monocular Gaussian SLAM method capable of operating in outdoor environments and supporting kilometer-scale large scenes.

Submitted 14 January, 2025; originally announced January 2025.

arXiv:2501.08166 [pdf, other]

Asymptotic-Preserving Neural Networks based on Even-odd Decomposition for Multiscale Gray Radiative Transfer Equations

Authors: Keke Wu, Xizhe Xie, Wengu Chen, Han Wang, Zheng Ma

Abstract: We present a novel Asymptotic-Preserving Neural Network (APNN) approach utilizing even-odd decomposition to tackle the nonlinear gray radiative transfer equations (GRTEs). Our AP loss remains uniformly stable as the Knudsen number becomes small, ensuring that the neural network solution converges uniformly to the macro solution. This APNN method alleviates the rigorous conservation requirements while simultaneously incorporating an auxiliary deep neural network, distinguishing it from the APNN method based on micro-macro decomposition for GRTEs. Several numerical problems are examined to demonstrate the effectiveness of our proposed APNN technique.

Submitted 14 January, 2025; originally announced January 2025.
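For arXiv:2501.08166 above, a sketch of the even-odd (parity) split that such methods build on, written for samples of an angular intensity $I(\mu)$ on a grid symmetric about $\mu = 0$: $r = \tfrac12(I(\mu)+I(-\mu))$, $j = (I(\mu)-I(-\mu))/(2\varepsilon)$, so that $I = r + \varepsilon j$. The $\varepsilon$-rescaling of the odd part follows the classical parity formulation for kinetic equations; how the paper parameterises $r$ and $j$ with networks is not shown and is not assumed here.

```python
import numpy as np


def even_odd_split(intensity: np.ndarray, eps: float):
    """Split I(mu), sampled on a grid symmetric about mu = 0, into parities r and j.

    r(mu) = (I(mu) + I(-mu)) / 2          even part
    j(mu) = (I(mu) - I(-mu)) / (2 * eps)  odd part, rescaled by eps
    Reversing the array maps mu -> -mu because the grid is symmetric.
    """
    mirrored = intensity[::-1]
    r = 0.5 * (intensity + mirrored)
    j = (intensity - mirrored) / (2.0 * eps)
    return r, j


if __name__ == "__main__":
    mu = np.linspace(-1.0, 1.0, 9)
    intensity = 1.0 + 0.3 * mu + mu ** 2
    r, j = even_odd_split(intensity, eps=1e-3)
    print("reconstruction error:", np.abs(r + 1e-3 * j - intensity).max())
```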

arXiv:2501.07490 [pdf, other]

Rationalisation of multiple square roots in Feynman integrals

Authors: Georgios Papathanasiou, Stefan Weinzierl, Konglong Wu, Yang Zhang

Abstract: Feynman integrals are very often computed from their differential equations. It is not uncommon that the $\varepsilon$-factorised differential equation contains only dlog-forms with algebraic arguments, where the algebraic part is given by (multiple) square roots. It is well-known that if all square roots are simultaneously rationalisable, the Feynman integrals can be expressed in terms of multiple polylogarithms. This is a sufficient, but not a necessary criterion. In this paper we investigate weaker requirements. We discuss under which conditions we may use different rationalisations in different parts of the calculation. In particular we show that we may use different rationalisations if they correspond to different parameterisations of the same integration path. We present a non-trivial example -- the one-loop pentagon function with three adjacent massive external legs involving seven square roots -- where this technique can be used to express the result in terms of multiple polylogarithms.

Submitted 2 April, 2025; v1 submitted 13 January, 2025; originally announced January 2025.

Comments: 31 pages, v2: version to be published
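For arXiv:2501.07490 above, a standard single-root illustration of what "rationalising" a square root means (not the seven-root pentagon example from the paper): the root $\sqrt{x(x+4)}$, which appears for instance in the equal-mass one-loop bubble with $x = -s/m^2$, becomes rational under the substitution $x = (1-t)^2/t$, and the associated one-form turns into a dlog with a rational argument.

```latex
\begin{align*}
x &= \frac{(1-t)^2}{t}, \qquad 0<t<1,\\
x+4 &= \frac{(1-t)^2+4t}{t} = \frac{(1+t)^2}{t},\\
\sqrt{x(x+4)} &= \frac{(1-t)(1+t)}{t} = \frac{1-t^2}{t},\\
\frac{dx}{\sqrt{x(x+4)}} &= -\frac{(1-t)(1+t)}{t^2}\,dt\cdot\frac{t}{(1-t)(1+t)} = -\frac{dt}{t} = -\,d\log t.
\end{align*}
```

Likewise $d\log x = 2\,d\log(1-t) - d\log t$, so every letter becomes a dlog of a rational function of $t$, which is what allows the result to be written in multiple polylogarithms.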

arXiv:2501.06890 [pdf, other]

Measurements of the Temperature and E-mode Polarization of the Cosmic Microwave Background from the Full 500-square-degree SPTpol Dataset

Authors: T. -L. Chou, P. A. R. Ade, A. J. Anderson, J. E. Austermann, L. Balkenhol, J. A. Beall, A. N. Bender, B. A. Benson, F. Bianchini, L. E. Bleem, J. E. Carlstrom, C. L. Chang, P. Chaubal, H. C. Chiang, R. Citron, C. Corbett Moran, T. M. Crawford, A. T. Crites, T. de Haan, M. A. Dobbs, D. Dutcher, W. Everett, J. Gallicchio, E. M. George, N. Gupta, et al. (37 additional authors not shown)

Abstract: Using the full four-year SPTpol 500 deg$^2$ dataset in both the 95 GHz and 150 GHz frequency bands, we present measurements of the temperature and $E$-mode polarization of the cosmic microwave background (CMB), as well as the $E$-mode polarization auto-power spectrum ($EE$) and temperature-$E$-mode cross-power spectrum ($TE$) in the angular multipole range $50<\ell<8000$. We find the SPTpol dataset to be self-consistent, passing several internal consistency tests based on maps, frequency bands, bandpowers, and cosmological parameters. The full SPTpol dataset is well-fit by the $\Lambda$CDM model, for which we find $H_0=70.48\pm2.16$ km s$^{-1}$ Mpc$^{-1}$ and $\Omega_m=0.271\pm0.026$, when using only the SPTpol data and a Planck-based prior on the optical depth to reionization. The $\Lambda$CDM parameter constraints are consistent across the 95 GHz-only, 150 GHz-only, $TE$-only, and $EE$-only data splits. Between the $\ell<1000$ and $\ell>1000$ data splits, the $\Lambda$CDM parameter constraints are borderline consistent at the $\sim2\sigma$ level. This consistency improves when including a parameter $A_L$, the degree of lensing of the CMB inferred from the smearing of acoustic peaks. When marginalized over $A_L$, the $\Lambda$CDM parameter constraints from SPTpol are consistent with those from Planck. The power spectra presented here are the most sensitive measurements of the lensed CMB damping tail to date for roughly $\ell > 1700$ in $TE$ and $\ell > 2000$ in $EE$.

Submitted 12 January, 2025; originally announced January 2025.

arXiv:2501.06419 [pdf, other]

Holographic Entanglement Entropy as a Probe of Dynamical Criticality in Scalarizing Black Holes

Authors: Yi Li, Ke-tai Wu, Chong-Ye Chen, Chao Niu, Cheng-Yong Zhang, Peng Liu

Abstract: We demonstrate that holographic entanglement entropy (HEE) serves as a powerful diagnostic tool for both static and dynamical critical phenomena in the Einstein-Born-Infeld-Scalar (EBIS) model. While HEE is well-known for capturing static phase transitions, we reveal its novel ability to probe dynamical criticality, particularly the "flip" phenomenon, a sign inversion in the scalar field at a critical point. Near the flip, HEE exhibits relaxation dynamics that closely mirror those of the scalar field, with both relaxation times scaling logarithmically with the distance from the critical point. This intimate connection between the relaxation of HEE and the scalar field highlights HEE as a sensitive probe of dynamical critical phenomena. Our findings provide new insights into the interplay between quantum information and gravitational dynamics, offering a deeper understanding of critical behavior in strongly coupled systems.

Submitted 10 January, 2025; originally announced January 2025.

Comments: 31 pages, 13 figures

arXiv:2501.04841 [pdf, other]

Blockchain-Based Secure Vehicle Auction System with Smart Contracts

Authors: Ka Wai Wu

Abstract: The problem of a single point of failure in centralized systems poses a great challenge to the stability of such systems. Meanwhile, the tamperability of data within centralized systems makes users reluctant to trust and use centralized applications in many scenarios, including the financial and business sectors. Blockchain, as a new decentralized technology, addresses these issues effectively. As a typical decentralized system, blockchain can be utilized to build a data-sharing model. Users in a blockchain do not need to trust other users; instead, they trust that the majority of miner nodes are honest. Smart contracts enable developers to write distributed programs based on blockchain systems, ensuring that all code is immutable and secure. In this paper, we analyze the security of blockchain technology to illustrate its advantages and justify its use. Furthermore, we design a new system for storing and trading vehicle information based on the Ethereum blockchain and smart contract technology. Specifically, our system allows users to upload vehicle information and auction vehicles to transfer ownership. Our application provides great convenience to buyers and owners, while the use of smart contracts enhances the security and privacy of the system.

Submitted 19 April, 2025; v1 submitted 8 January, 2025; originally announced January 2025.
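For arXiv:2501.04841 above, a plain-Python model of the rules an auction contract of this kind would typically enforce: only the registered owner may open or close an auction, bids must strictly increase, and closing transfers ownership to the highest bidder. It is a hypothetical sketch in Python, not the paper's Ethereum contract; payment escrow and refunds, which a real contract must handle, are deliberately omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class VehicleAuction:
    """Toy ascending (English) auction whose settlement updates an ownership ledger."""
    owners: Dict[str, str] = field(default_factory=dict)  # vehicle id -> current owner
    vehicle: Optional[str] = None                          # vehicle currently on auction
    highest_bidder: Optional[str] = None
    highest_bid: int = 0

    def register(self, vehicle_id: str, owner: str) -> None:
        if vehicle_id in self.owners:
            raise ValueError("vehicle already registered")
        self.owners[vehicle_id] = owner

    def open(self, vehicle_id: str, caller: str) -> None:
        if self.owners.get(vehicle_id) != caller:
            raise PermissionError("only the owner can auction this vehicle")
        if self.vehicle is not None:
            raise RuntimeError("an auction is already running")
        self.vehicle, self.highest_bidder, self.highest_bid = vehicle_id, None, 0

    def bid(self, bidder: str, amount: int) -> None:
        if self.vehicle is None:
            raise RuntimeError("no open auction")
        if bidder == self.owners[self.vehicle]:
            raise PermissionError("the owner cannot bid on their own vehicle")
        if amount <= self.highest_bid:
            raise ValueError("bid must exceed the current highest bid")
        self.highest_bidder, self.highest_bid = bidder, amount

    def close(self, caller: str) -> Optional[str]:
        """End the auction; if anyone bid, the highest bidder becomes the owner."""
        if self.vehicle is None or self.owners[self.vehicle] != caller:
            raise PermissionError("only the current owner can close an open auction")
        winner = self.highest_bidder
        if winner is not None:
            self.owners[self.vehicle] = winner  # ownership transfer
        self.vehicle = None
        return winner
```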

arXiv:2501.03574 [pdf, ps, other]

A Liouville theorem for supercritical Fujita equation and its applications

Authors: Kelei Wang, Juncheng Wei, Ke Wu

Abstract: We prove a Liouville theorem for ancient solutions to the supercritical Fujita equation \[\partial_tu-\Delta u=|u|^{p-1}u, \quad -\infty <t<0, \quad p>\frac{n+2}{n-2},\] which states that if $u$ is close to the ODE solution $u_0(t):=(p-1)^{-\frac{1}{p-1}}(-t)^{-\frac{1}{p-1}}$ at large scales, then it is an ODE solution (i.e. it depends only on $t$). This implies a stability property for ODE blow-ups in this problem. As an application of these results, we show that for a suitable weak solution, its singular set at the end time can be decomposed into two parts: one part is relatively open and $(n-1)$-rectifiable, and it is characterized by the property that tangent functions at these points are the two constants $\pm(p-1)^{-\frac{1}{p-1}}$; the other part is relatively closed and its Hausdorff dimension is not larger than $n-\left[2\frac{p+1}{p-1}\right]-1$.

Submitted 7 January, 2025; originally announced January 2025.

MSC Class: 35K58; 35B44; 35B45
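For arXiv:2501.03574 above, a short check that the quoted $u_0$ is indeed a solution of the Fujita equation: it is spatially constant, so $\Delta u_0 = 0$, and it solves the ODE $u' = u^p$. The last line shows that in self-similar scaling $u_0$ is the constant $(p-1)^{-\frac{1}{p-1}}$; since the equation is odd in $u$, $-u_0$ is a solution too, which matches the two constants $\pm(p-1)^{-\frac{1}{p-1}}$ in the abstract.

```latex
\begin{align*}
u_0(t) &= (p-1)^{-\frac{1}{p-1}}(-t)^{-\frac{1}{p-1}} = \bigl((p-1)(-t)\bigr)^{-\frac{1}{p-1}}, \qquad -\infty<t<0,\\
\partial_t u_0 &= -\frac{1}{p-1}\bigl((p-1)(-t)\bigr)^{-\frac{1}{p-1}-1}\cdot\bigl(-(p-1)\bigr) = \bigl((p-1)(-t)\bigr)^{-\frac{p}{p-1}} = u_0^{\,p},\\
\Delta u_0 &= 0 \quad\text{and}\quad u_0>0 \;\Longrightarrow\; \partial_t u_0-\Delta u_0 = |u_0|^{p-1}u_0,\\
(-t)^{\frac{1}{p-1}}\,u_0(t) &\equiv (p-1)^{-\frac{1}{p-1}}.
\end{align*}
```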

arXiv:2501.03141 [pdf, ps, other]

Foundations of Platform-Assisted Auctions

Authors: Hao Chung, Ke Wu, Elaine Shi

Abstract: Today, many auctions are carried out with the help of intermediary platforms like Google and eBay. We refer to such auctions as platform-assisted auctions. Traditionally, the auction theory literature mainly focuses on designing auctions that incentivize the buyers to bid truthfully, assuming that the platform always faithfully implements the auction. In practice, however, the platforms have been found to manipulate the auctions to earn more profit, resulting in high-profile anti-trust lawsuits. We propose a new model for studying platform-assisted auctions in the permissionless setting. We explore whether it is possible to design a dream auction in this new model, such that honest behavior is the utility-maximizing strategy for each individual buyer, the platform, the seller, as well as platform-seller or platform-buyer coalitions. Through a collection of feasibility and infeasibility results, we carefully characterize the mathematical landscape of platform-assisted auctions. We show how cryptography can aid the design of an efficient platform-assisted auction with dream properties. Although a line of work has also used MPC or the blockchain to remove the reliance on a trusted auctioneer, our work is distinct in nature in several dimensions. First, we initiate a systematic exploration of the game-theoretic implications when the service providers are strategic and can collude with sellers or buyers. Second, we observe that the full simulation paradigm is too stringent and leads to high asymptotic costs.
Specifically, because every player has a different private outcome in an auction protocol, running any generic MPC protocol among the players would incur at least $n^2$ total cost. We propose a new notion of simulation called utility-dominated emulation. Under this new notion, we show how to design efficient auction protocols with quasilinear efficiency.

Submitted 6 January, 2025; originally announced January 2025.

Comments: To be submitted

MSC Class: 94A60; 91A40 ACM Class: J.4.1; J.4.3
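For arXiv:2501.03141 above, a sketch of the textbook sealed-bid second-price (Vickrey) auction, the classic example of an auction in which truthful bidding is a dominant strategy for buyers, provided the auctioneer runs it honestly. The demo also shows why that proviso matters: a platform that injects a fake "shill" bid just below the top bid raises the winner's price. This illustrates the trust assumption the abstract questions; it is not the paper's protocol.

```python
from typing import Dict, Optional, Tuple


def second_price_auction(bids: Dict[str, float]) -> Tuple[Optional[str], float]:
    """Sealed-bid second-price (Vickrey) auction for a single item.

    The highest bidder wins and pays the second-highest bid (0 if unopposed).
    Ties are broken by bidder name so the outcome is deterministic.
    """
    if not bids:
        return None, 0.0
    ranked = sorted(bids.items(), key=lambda kv: (-kv[1], kv[0]))
    winner = ranked[0][0]
    price = ranked[1][1] if len(ranked) > 1 else 0.0
    return winner, price


if __name__ == "__main__":
    honest = {"alice": 10.0, "bob": 7.0, "carol": 4.0}
    print("honest platform:", second_price_auction(honest))
    # A dishonest platform inserts a fake bid just below the winner's bid.
    print("with shill bid: ", second_price_auction({**honest, "shill": 9.9}))
```

With the honest run, alice (value 10) pays 7 whether she bids 8, 10, or 100, which is the truthfulness property; the shill bid silently raises her price to 9.9.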

arXiv:2501.02914 [pdf]

Nonrelativistic spin-splitting multiferroic antiferromagnet and compensated ferrimagnet with zero net magnetization

Authors: Jianting Dong, Kun Wu, Meng Zhu, Fanxing Zheng, Xinlu Li, Jia Zhang

Abstract: Spin-splitting antiferromagnets with spin-polarized band structures in momentum space have garnered intensive research attention due to their zero net magnetic moments, ultrafast spin dynamics like conventional antiferromagnets, and spin-polarized transport properties akin to ferromagnets, making them promising candidates for antiferromagnetic spintronics. However, unlike spin-torque switching of ferromagnets by electric current, efficient electric control of spin-splitting antiferromagnetic order remains challenging. In this work, we identify prototypes of multiferroic spin-splitting antiferromagnets, including BiFeO3 and Fe2Mo3O8, and the compensated ferrimagnet GaFeO3, with ferroelectric polarization as well as spin-polarized electronic structures. We establish design principles for spin-splitting multiferroic antiferromagnets and compensated ferrimagnets, elucidating the band symmetry features in the Brillouin zone. We demonstrate that the spin polarization in spin-splitting magnets, despite zero net magnetic moment, can be switched by ferroelectric polarization, providing an efficient means of controlling the antiferromagnetic order. Our work may inspire future development of novel multiferroic functional magnets with zero magnetic moments and pave the way for their applications in magnetoelectric spintronic devices.

Submitted 6 January, 2025; originally announced January 2025.

arXiv:2501.02497 [pdf, ps, other]

A Survey of Test-Time Compute: From Intuitive Inference to Deliberate Reasoning

Authors: Yixin Ji, Juntao Li, Yang Xiang, Hai Ye, Kaixin Wu, Kai Yao, Jia Xu, Linjian Mo, Min Zhang

Abstract: The remarkable performance of the o1 model in complex reasoning demonstrates that test-time compute scaling can further unlock the model's potential, enabling powerful System-2 thinking. However, there is still a lack of comprehensive surveys for test-time compute scaling. We trace the concept of test-time compute back to System-1 models. In System-1 models, test-time compute addresses distribution shifts and improves robustness and generalization through parameter updating, input modification, representation editing, and output calibration. In System-2 models, it enhances the model's reasoning ability to solve complex problems through repeated sampling, self-correction, and tree search. We organize this survey according to the trend of System-1 to System-2 thinking, highlighting the key role of test-time compute in the transition from System-1 models to weak System-2 models, and then to strong System-2 models. We also point out advanced topics and future directions.

Submitted 29 June, 2025; v1 submitted 5 January, 2025; originally announced January 2025.

Comments: Work in progress
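For arXiv:2501.02497 above, a sketch of the simplest form of "repeated sampling": draw several answers from a stochastic model and return the majority answer with its vote share (often called self-consistency). The `sample` callable is a hypothetical stand-in for one temperature-sampled model call returning a final answer string; the noisy sampler in the demo is synthetic.

```python
import random
from collections import Counter
from typing import Callable, List, Tuple


def majority_vote(sample: Callable[[], str], n: int) -> Tuple[str, float]:
    """Draw n candidate answers and return the most frequent one with its vote share.

    `sample` is a hypothetical placeholder for one stochastic model call.
    Ties go to the answer seen first (Counter preserves insertion order).
    """
    if n <= 0:
        raise ValueError("n must be positive")
    answers: List[str] = [sample() for _ in range(n)]
    answer, count = Counter(answers).most_common(1)[0]
    return answer, count / n


if __name__ == "__main__":
    rng = random.Random(0)

    def noisy() -> str:
        # Synthetic sampler: correct 60% of the time, otherwise one of four distractors.
        return "42" if rng.random() < 0.6 else rng.choice(["41", "43", "24", "7"])

    print("1 sample:  ", majority_vote(noisy, 1))
    print("25 samples:", majority_vote(noisy, 25))
```

Because wrong answers are spread over several distractors while the correct one is concentrated, the vote tends to recover it as `n` grows; that concentration is an assumption of the demo, not a general guarantee.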

Showing 101–150 of 1,591 results for author: Wu, K