-
Introduction to the China Space Station Telescope (CSST)
Authors:
CSST Collaboration,
Yan Gong,
Haitao Miao,
Hu Zhan,
Zhao-Yu Li,
Jinyi Shangguan,
Haining Li,
Chao Liu,
Xuefei Chen,
Haibo Yuan,
Jilin Zhou,
Hui-Gen Liu,
Cong Yu,
Jianghui Ji,
Zhaoxiang Qi,
Jiacheng Liu,
Zigao Dai,
Xiaofeng Wang,
Zhenya Zheng,
Lei Hao,
Jiangpei Dou,
Yiping Ao,
Zhenhui Lin,
Kun Zhang,
Wei Wang
, et al. (88 additional authors not shown)
Abstract:
The China Space Station Telescope (CSST) is a next-generation Stage-IV sky survey telescope, distinguished by its large field of view (FoV), high image quality, and multi-band observation capabilities. It can simultaneously conduct precise measurements of the Universe by performing multi-color photometric imaging and slitless spectroscopic surveys. The CSST is equipped with five scientific instruments, i.e. Multi-band Imaging and Slitless Spectroscopy Survey Camera (SC), Multi-Channel Imager (MCI), Integral Field Spectrograph (IFS), Cool Planet Imaging Coronagraph (CPI-C), and THz Spectrometer (TS). Using these instruments, the CSST is expected to make significant contributions and discoveries across various astronomical fields, including cosmology, galaxy and active galactic nuclei (AGN), the Milky Way and nearby galaxies, stars, exoplanets, Solar System objects, astrometry, and transients and variable sources. This review aims to provide a comprehensive overview of the CSST instruments, observational capabilities, data products, and scientific potential.
Submitted 6 July, 2025;
originally announced July 2025.
-
Maximal transitivity of the cactus group on standard Young tableaux
Authors:
Sophia Liao,
Leonid Rybnikov
Abstract:
The action of the cactus group $C_n$ on Young tableaux of a given shape $λ$ goes back to Berenstein and Kirillov and arises naturally in the study of crystal bases and quantum integrable systems. We show that this action is $2$-transitive on standard Young tableaux of the shape $λ$ if and only if $λ$ is not self-transpose and not a single hook. Moreover, we show that in these cases, the image of the cactus group in the permutation group of standard Young tableaux is either the whole permutation group or the alternating group. As an application, this implies that the Galois group of solutions to the Bethe ansatz in the Gaudin model attached to the Lie group $GL_d$ is, in many cases, at least the alternating group. This also extends the results of Sottile and White on the multiple transitivity of the Galois group of Schubert calculus problems in Grassmannians to many new cases.
Submitted 19 June, 2025;
originally announced June 2025.
-
Symbolic Regression-Enhanced Dynamic Wake Meandering: Fast and Physically Consistent Wind-Turbine Wake Modeling
Authors:
Ding Wang,
Dachuan Feng,
Kangcheng Zhou,
Yuntian Chen,
Shijun Liao,
Shiyi Chen
Abstract:
Accurately modeling wind turbine wakes is essential for optimizing wind farm performance but remains a persistent challenge. While the dynamic wake meandering (DWM) model captures unsteady wake behavior, it suffers from near-wake inaccuracies due to empirical closures. We propose a Symbolic Regression-enhanced DWM (SRDWM) framework that achieves equation-level closure by embedding symbolic expressions for volumetric forcing and boundary terms explicitly into governing equations. These physically consistent expressions are discovered from LES data using symbolic regression guided by a hierarchical, domain-informed decomposition strategy. A revised wake-added turbulence formulation is further introduced to enhance turbulence intensity predictions. Extensive validation across varying inflows shows that SRDWM accurately reproduces both mean wake characteristics and turbulent dynamics, achieving full spatiotemporal resolution with over three orders of magnitude speedup compared to LES. The results highlight symbolic regression as a bridge between data and physics, enabling interpretable and generalizable modeling.
Submitted 17 June, 2025;
originally announced June 2025.
-
LSM-2: Learning from Incomplete Wearable Sensor Data
Authors:
Maxwell A. Xu,
Girish Narayanswamy,
Kumar Ayush,
Dimitris Spathis,
Shun Liao,
Shyam A. Tailor,
Ahmed Metwally,
A. Ali Heydari,
Yuwei Zhang,
Jake Garrison,
Samy Abdel-Ghaffar,
Xuhai Xu,
Ken Gu,
Jacob Sunshine,
Ming-Zher Poh,
Yun Liu,
Tim Althoff,
Shrikanth Narayanan,
Pushmeet Kohli,
Mark Malhotra,
Shwetak Patel,
Yuzhe Yang,
James M. Rehg,
Xin Liu,
Daniel McDuff
Abstract:
Foundation models, a cornerstone of recent advancements in machine learning, have predominantly thrived on complete and well-structured data. Wearable sensor data frequently suffers from significant missingness, posing a substantial challenge for self-supervised learning (SSL) models that typically assume complete data inputs. This paper introduces the second generation of Large Sensor Model (LSM-2) with Adaptive and Inherited Masking (AIM), a novel SSL approach that learns robust representations directly from incomplete data without requiring explicit imputation. AIM's core novelty lies in its use of learnable mask tokens to model both existing ("inherited") and artificially introduced missingness, enabling it to robustly handle fragmented real-world data during inference. Pre-trained on an extensive dataset of 40M hours of day-long multimodal sensor data, our LSM-2 with AIM achieves the best performance across a diverse range of tasks, including classification, regression and generative modeling. Furthermore, LSM-2 with AIM exhibits superior scaling performance, and critically, maintains high performance even under targeted missingness scenarios, reflecting clinically coherent patterns, such as the diagnostic value of nighttime biosignals for hypertension prediction. This makes AIM a more reliable choice for real-world wearable data applications.
Submitted 5 June, 2025;
originally announced June 2025.
-
A Be star-black hole binary with a wide orbit from LAMOST time-domain survey
Authors:
Qian-Yu An,
Yang Huang,
Wei-Min Gu,
Yong Shao,
Zhi-Xiang Zhang,
Tuan Yi,
B. D. Lailey,
T. A. A. Sigut,
Kyle Akira Rocha,
Meng Sun,
Seth Gossage,
Shi-Jie Gao,
Shan-Shan Weng,
Song Wang,
Bowen Zhang,
Xinlin Zhao,
Senyu Qi,
Shilong Liao,
Jianghui Ji,
Junfeng Wang,
Jianfeng Wu,
Mouyuan Sun,
Xiang-Dong Li,
Jifeng Liu
Abstract:
Binary systems consisting of an early type star and a black hole (BH) are crucial for understanding various astrophysical phenomena, particularly the origins of detected gravitational wave sources. Be binary systems are expected to represent a key evolutionary stage in hosting BHs. However, while hundreds of Be X-ray binaries are known, the only confirmed BH candidate in a Be binary remains highly controversial. We report the discovery of ALS 8814, a Be star-BH binary with a moderately eccentric ($e = 0.23$) and wide orbit ($P = 176.6$ days), revealed by the radial velocity (RV) measurement of the visible Be star. Our analysis, combining flux-calibrated spectra in the Balmer discontinuity region and spectral template matching, yields a mass of $11.2^{+1.4}_{-1.2}$ $M_\odot$ for the Be star. The minimum mass of the unseen companion, assuming an edge-on inclination ($i = 90^{\circ}$), is $9.8\pm 0.7\,M_\odot$. We rule out the presence of non-degenerate companions in ALS 8814, indicating that it can only be a BH. This discovery represents a robust case of a Be-BH binary, identified purely through precise RV measurements from a single set of lines. The extremely low peculiar velocity of ALS 8814 suggests that the BH is formed via a direct core-collapse with a negligible natal kick, implying an almost perfect alignment between the Be star's spin and the orbital plane. In this context, the binary's inclination angle is estimated to be 22$^{\circ}$-49$^{\circ}$ by analyzing the shallow double-peaked profile of the H$α$ emission line. This inclination range corresponds to a BH mass estimate between $15\,M_\odot$ and $58\,M_\odot$. As the only unambiguous Be-BH binary system known to date, ALS 8814 provides valuable constraints on the BH formation in a binary system with a high-mass companion.
Submitted 29 May, 2025;
originally announced May 2025.
-
QwenLong-CPRS: Towards $\infty$-LLMs with Dynamic Context Optimization
Authors:
Weizhou Shen,
Chenliang Li,
Fanqi Wan,
Shengyi Liao,
Shaopeng Lai,
Bo Zhang,
Yingcheng Shi,
Yuning Wu,
Gang Fu,
Zhansheng Li,
Bin Yang,
Ji Zhang,
Fei Huang,
Jingren Zhou,
Ming Yan
Abstract:
This technical report presents QwenLong-CPRS, a context compression framework designed for explicit long-context optimization, addressing prohibitive computation overhead during the prefill stage and the "lost in the middle" performance degradation of large language models (LLMs) during long sequence processing. Implemented through a novel dynamic context optimization mechanism, QwenLong-CPRS enables multi-granularity context compression guided by natural language instructions, achieving both efficiency gains and improved performance.
Evolved from the Qwen architecture series, QwenLong-CPRS introduces four key innovations: (1) Natural language-guided dynamic optimization, (2) Bidirectional reasoning layers for enhanced boundary awareness, (3) Token critic mechanisms with language modeling heads, and (4) Window-parallel inference.
Comprehensive evaluations across five benchmarks (4K-2M word contexts) demonstrate QwenLong-CPRS's threefold effectiveness: (1) Consistent superiority over other context management methods like RAG and sparse attention in both accuracy and efficiency. (2) Architecture-agnostic integration with all flagship LLMs, including GPT-4o, Gemini2.0-pro, Claude3.7-sonnet, DeepSeek-v3, and Qwen2.5-max, achieves 21.59$\times$ context compression alongside 19.15-point average performance gains. (3) Deployed with Qwen2.5-32B-Instruct, QwenLong-CPRS surpasses leading proprietary LLMs by 4.85 and 10.88 points on Ruler-128K and InfiniteBench, establishing new SOTA performance.
Submitted 27 May, 2025; v1 submitted 23 May, 2025;
originally announced May 2025.
-
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning
Authors:
Fanqi Wan,
Weizhou Shen,
Shengyi Liao,
Yingcheng Shi,
Chenliang Li,
Ziyi Yang,
Ji Zhang,
Fei Huang,
Jingren Zhou,
Ming Yan
Abstract:
Recent large reasoning models (LRMs) have demonstrated strong reasoning capabilities through reinforcement learning (RL). These improvements have primarily been observed within the short-context reasoning tasks. In contrast, extending LRMs to effectively process and reason on long-context inputs via RL remains a critical unsolved challenge. To bridge this gap, we first formalize the paradigm of long-context reasoning RL, and identify key challenges in suboptimal training efficiency and unstable optimization process. To address these issues, we propose QwenLong-L1, a framework that adapts short-context LRMs to long-context scenarios via progressive context scaling. Specifically, we utilize a warm-up supervised fine-tuning (SFT) stage to establish a robust initial policy, followed by a curriculum-guided phased RL technique to stabilize the policy evolution, and enhanced with a difficulty-aware retrospective sampling strategy to incentivize the policy exploration. Experiments on seven long-context document question-answering benchmarks demonstrate that QwenLong-L1-32B outperforms flagship LRMs like OpenAI-o3-mini and Qwen3-235B-A22B, achieving performance on par with Claude-3.7-Sonnet-Thinking, demonstrating leading performance among state-of-the-art LRMs. This work advances the development of practical long-context LRMs capable of robust reasoning across information-intensive environments.
Submitted 27 May, 2025; v1 submitted 23 May, 2025;
originally announced May 2025.
-
On the Input-Output Monotonicity of Voltage Dynamics of Power System with Grid-Forming Converters
Authors:
Zhenyao Li,
Shengwen Liao,
Qian Zhang,
Xuechun Zhang,
Deqiang Gan
Abstract:
Integration of renewable resources is profoundly reshaping the dynamics of modern power systems. This study shows that the voltage dynamics of power systems with multiple grid-forming (GFM) converters often enjoys a desirable property called input-output monotonicity. A systematic approach for computing the derivatives of the voltage subsystem is presented first, which provides insight into the structural characteristics of these models. Next, the sign pattern of the trajectory Jacobian matrix associated with the voltage subsystem is analyzed and revealed. The analysis indicates that the voltage dynamics of power systems often exhibits the so-called input-output monotonicity property. The theoretical results are then validated through several simulation examples, underscoring their practical implications.
Submitted 19 May, 2025;
originally announced May 2025.
-
First Light And Reionization Epoch Simulations (FLARES) -- XIX: Supermassive black hole mergers in the early Universe and their environmental dependence
Authors:
Shihong Liao,
Dimitrios Irodotou,
Maxwell G. A. Maltz,
Christopher C. Lovell,
Zhen Jiang,
Sophie L. Newman,
Aswin P. Vijayan,
Paurush Punyasheel,
William J. Roper,
Louise T. C. Seeyave,
Sonja Soininen,
Peter A. Thomas,
Stephen M. Wilkins
Abstract:
The upcoming space-based gravitational wave (GW) observatory, LISA, is expected to detect GW signals from supermassive black hole (SMBH) mergers occurring at high redshifts. However, understanding the origin and growth of SMBHs in the early Universe remains an open problem in astrophysics. In this work, we utilize the First Light And Reionization Epoch Simulations (FLARES), a suite of cosmological hydrodynamical zoom-in simulations, to study SMBH mergers at $5 \lesssim z \lesssim 10$ across a wide range of environments. Most mergers in FLARES involve secondary SMBHs near the seed mass ($m_{seed} \approx 1.5 \times 10^{5} M_{\odot}$) while primary SMBHs span up to $10^{9} M_{\odot}$, resulting in mass ratios from $q \sim 10^{-4}$ to $1$, with a peak at $q \sim 1$. The number of mergers increases rapidly towards lower redshifts, and the comoving total number density scales with overdensity as $n_{merger} = 10^{-3.80} (1 + δ)^{4.56}$. Denser regions host more massive mergers, with higher merger redshifts and lower mass ratios. Within the FLARES redshift range, LISA is expected to detect mergers with $10^{5} \lesssim M_{tot} / M_{\odot} \lesssim 10^{8}$ and $q \gtrsim 10^{-2}$, corresponding to a detection rate of 0.030 $yr^{-1}$ for events with signal-to-noise ratio $SNR \geq 10$. Our study demonstrates the sensitivity of GW predictions at high redshifts to SMBH seed models and merger time delays, highlighting the need for improved modeling in future cosmological simulations to maximize LISA's scientific return.
Submitted 18 May, 2025;
originally announced May 2025.
-
SoLoPO: Unlocking Long-Context Capabilities in LLMs via Short-to-Long Preference Optimization
Authors:
Huashan Sun,
Shengyi Liao,
Yansen Han,
Yu Bai,
Yang Gao,
Cheng Fu,
Weizhou Shen,
Fanqi Wan,
Ming Yan,
Ji Zhang,
Fei Huang
Abstract:
Despite advances in pretraining with extended context lengths, large language models (LLMs) still face challenges in effectively utilizing real-world long-context information, primarily due to insufficient long-context alignment caused by data quality issues, training inefficiencies, and the lack of well-designed optimization objectives. To address these limitations, we propose a framework named $\textbf{S}$h$\textbf{o}$rt-to-$\textbf{Lo}$ng $\textbf{P}$reference $\textbf{O}$ptimization ($\textbf{SoLoPO}$), decoupling long-context preference optimization (PO) into two components: short-context PO and short-to-long reward alignment (SoLo-RA), supported by both theoretical and empirical evidence. Specifically, short-context PO leverages preference pairs sampled from short contexts to enhance the model's contextual knowledge utilization ability. Meanwhile, SoLo-RA explicitly encourages reward score consistency utilization for the responses when conditioned on both short and long contexts that contain identical task-relevant information. This facilitates transferring the model's ability to handle short contexts into long-context scenarios. SoLoPO is compatible with mainstream preference optimization algorithms, while substantially improving the efficiency of data construction and training processes. Experimental results show that SoLoPO enhances all these algorithms with respect to stronger length and domain generalization abilities across various long-context benchmarks, while achieving notable improvements in both computational and memory efficiency.
Submitted 16 May, 2025;
originally announced May 2025.
-
Block Circulant Adapter for Large Language Models
Authors:
Xinyu Ding,
Meiqi Wang,
Siyu Liao,
Zhongfeng Wang
Abstract:
Fine-tuning large language models (LLMs) is difficult due to their huge model size. Recent Fourier domain-based methods show potential for reducing fine-tuning costs. We propose a block circulant matrix-based fine-tuning method with a stable training heuristic to leverage the properties of circulant matrices and one-dimensional Fourier transforms to reduce storage and computation costs. Experiments show that our method uses $14\times$ fewer parameters than VeRA, $16\times$ fewer than LoRA, and $32\times$ fewer FLOPs than FourierFT, while maintaining comparable or better task performance. Our approach offers a promising frequency-domain route to fine-tuning large models on downstream tasks.
Submitted 1 May, 2025;
originally announced May 2025.
-
Parameter-Efficient Fine-Tuning with Circulant and Diagonal Vectors
Authors:
Xinyu Ding,
Lexuan Chen,
Siyu Liao,
Zhongfeng Wang
Abstract:
Foundation models have achieved tremendous success in different domains. However, their huge computation and storage complexity makes these models difficult to fine-tune and less applicable in practice. A recent study shows that training in the Fourier domain can be an effective fine-tuning method in terms of both model performance and number of trainable parameters. In this work, we propose to further reduce the complexity by factorizing through the product of interleaved circulant and diagonal matrices. In addition, we address the case of non-square fine-tuning weights by partitioning the circulant matrix into blocks. Our method avoids constructing the weight change matrix and uses a 1D fast Fourier transform (FFT) instead of a 2D FFT. Experimental results show that our method achieves similar or better performance across various tasks with far fewer floating-point operations (FLOPs) and trainable parameters.
Submitted 1 May, 2025;
originally announced May 2025.
-
Ultra-diffuse galaxies in the EAGLE simulation
Authors:
Haonan Zheng,
Shihong Liao,
Liang Gao,
Fangzhou Jiang
Abstract:
We use the highest-resolution EAGLE simulation, Recal-L025N0752, to study the properties and formation of ultra-diffuse galaxies (UDGs). We identify 181 UDGs and find their properties closely match observations. The total masses of EAGLE UDGs range from ${\sim}5\times 10^{8}~M_{\odot}$ to ${\sim}2\times 10^{11}~M_{\odot}$, indicating that they are dwarf galaxies rather than failed $L_\star$ galaxies. EAGLE UDGs are not a distinct population, but rather a subset of dwarf galaxies, as their properties generally form a continuous distribution with those of normal dwarf galaxies. Unlike in previous studies, the extended sizes of field UDGs in EAGLE are not driven by high halo spin or by supernova-induced stellar expansion, but instead largely arise from high spins in their star-forming gas and thus the newly formed stars at large radii. This might be attributed to galactic fountains, by which star-forming gas is launched to large halo-centric distances and acquires additional angular momentum through interactions with the circumgalactic medium. For satellite UDGs, ${\sim} 60 \%$ of them were already UDGs before falling into the host galaxy, while the remaining ${\sim} 40\%$ were normal galaxies prior to infall and subsequently transformed into UDGs due to tidal effects after infall.
Submitted 21 April, 2025; v1 submitted 21 April, 2025;
originally announced April 2025.
-
Physical significance of artificial numerical noise in direct numerical simulation of turbulence
Authors:
Shijun Liao,
Shijie Qin
Abstract:
Using clean numerical simulation (CNS) in which artificial numerical noise is negligible over a finite, sufficiently long interval of time, we provide evidence, for the first time, that artificial numerical noise in direct numerical simulation (DNS) of turbulence is approximately equivalent to thermal fluctuation and/or stochastic environmental noise. This confers physical significance on the artificial numerical noise of DNS of the Navier-Stokes equations. As a result, DNS on a fine mesh should correspond to turbulence under small internal/external physical disturbance, whereas DNS on a sparse mesh corresponds to turbulent flow under large physical disturbance. The key point is that all of them have physical meanings and so are correct in terms of their deterministic physics, even if their statistics are quite different. This is illustrated herein. Our paper provides a positive viewpoint regarding the presence of artificial numerical noise in DNS.
Submitted 4 April, 2025;
originally announced April 2025.
-
Effects of appendages on the turbulence and flow noise of a submarine model using high-order scheme
Authors:
Peng Jiang,
Shijun Liao,
Ling Liu,
Bin Xie
Abstract:
This study employs high-fidelity numerical simulations to investigate the influence of appendages on the turbulent flow dynamics and far-field acoustic radiation of the SUBOFF submarine model at a Reynolds number of $Re = 1.2\times 10^{7}$. Utilizing a third-order numerical scheme combined with wall-modeled large eddy simulation (WMLES) and the Ffowcs Williams-Hawkings (FW-H) acoustic analogy, the hydrodynamic and acoustic behaviors of an appended SUBOFF configuration are compared to those of a bare hull. A computational grid of 103 million cells resolves the intricate flow interactions, while 648 hydrophones positioned 500 diameters from the model capture far-field acoustic signatures. Key results reveal that appendages significantly amplify hydrodynamic and acoustic disturbances. Flow separations and vortex shedding at appendage junctions elevate pressure-induced drag contributions, contrasting the viscous-dominated drag of the bare hull. The sail-hull interaction intensifies local surface pressure fluctuations, increasing power spectral density (PSD) amplitudes by up to an order of magnitude. In the far field, the appended SUBOFF generates sound pressure levels approximately 20 dB higher than the bare hull, with distinct dipole directivity patterns and peak noise levels (85.10 dB) observed on the central plane. Appendages also disrupt wake symmetry, introducing complex vortical structures such as horseshoe and necklace vortices. These findings demonstrate the critical influence of appendages on hydrodynamic and acoustic behavior, filling a gap in turbulence noise research for complex underwater geometries and providing a vital foundation for the noise reduction optimization of advanced underwater vehicles.
Submitted 25 March, 2025;
originally announced March 2025.
-
Free-Space Twin-Field Quantum Key Distribution
Authors:
Yu-Huai Li,
Ting Zeng,
Min-Yan Wang,
Cong Jiang,
Jin Lin,
Hao-Bin Fu,
Xin-Yang Zheng,
Jiu-Peng Chen,
Zeng-Sen Lin,
Cheng-Lin Li,
Jian-Yu Guan,
Yang Li,
Qi Shen,
Hao Li,
Lixing You,
Zhen Wang,
Fei Zhou,
Juan Yin,
Sheng-Kai Liao,
Ji-Gang Ren,
Xiang-Bin Wang,
Yuan Cao,
Qiang Zhang,
Cheng-Zhi Peng,
Jian-Wei Pan
Abstract:
Twin-field quantum key distribution (TF-QKD) elevates the secure key rate from a linear to a square-root dependence on channel loss while preserving measurement-device-independent security. This protocol is uniquely positioned to enable global-scale quantum networks, even under extreme channel loss. While fiber-based TF-QKD implementations have advanced rapidly since its proposal, free-space realizations have remained elusive due to atmospheric turbulence-induced phase distortions. Here, we report the first experimental demonstration of free-space TF-QKD over 14.2 km urban atmospheric channels, surpassing the effective atmospheric thickness -- a critical threshold for satellite compatibility. We achieve a secret key rate exceeding the repeaterless capacity bound, a milestone for practical quantum communication. Our approach eliminates the need for an auxiliary channel to stabilize a closed interferometer, instead leveraging open-channel time and phase control of optical pulses. This work represents a pivotal advance toward satellite-based global quantum networks, combining high-speed key distribution with inherent resistance to real-world channel fluctuations.
Submitted 22 March, 2025;
originally announced March 2025.
-
Precoder Learning by Leveraging Unitary Equivariance Property
Authors:
Yilun Ge,
Shuyao Liao,
Shengqian Han,
Chenyang Yang
Abstract:
Incorporating mathematical properties of a wireless policy to be learned into the design of deep neural networks (DNNs) is effective for enhancing learning efficiency. The multi-user precoding policy in a multi-antenna system, which is the mapping from channel matrix to precoding matrix, possesses a permutation equivariance property, which has been harnessed to design the parameter sharing structure of the weight matrix of DNNs. In this paper, we study a stronger property than permutation equivariance, namely unitary equivariance, for precoder learning. We first show that a DNN with unitary equivariance designed by further introducing parameter sharing into a permutation equivariant DNN is unable to learn the optimal precoder. We proceed to develop a novel non-linear weighting process satisfying unitary equivariance and then construct a joint unitary and permutation equivariant DNN. Simulation results demonstrate that the proposed DNN not only outperforms existing learning methods in learning performance and generalizability but also reduces training complexity.
Submitted 12 March, 2025;
originally announced March 2025.
-
HELM: Human-Preferred Exploration with Language Models
Authors:
Shuhao Liao,
Xuxin Lv,
Yuhong Cao,
Jeric Lew,
Wenjun Wu,
Guillaume Sartoretti
Abstract:
In autonomous exploration tasks, robots are required to explore and map unknown environments while efficiently planning in dynamic and uncertain conditions. Given the significant variability of environments, human operators often have specific preference requirements for exploration, such as prioritizing certain areas or optimizing for different aspects of efficiency. However, existing methods struggle to accommodate these human preferences adaptively, often requiring extensive parameter tuning or network retraining. With the recent advancements in Large Language Models (LLMs), which have been widely applied to text-based planning and complex reasoning, their potential for enhancing autonomous exploration is becoming increasingly promising. Motivated by this, we propose an LLM-based human-preferred exploration framework that seamlessly integrates a mobile robot system with LLMs. By leveraging the reasoning and adaptability of LLMs, our approach enables intuitive and flexible preference control through natural language while maintaining a task success rate comparable to state-of-the-art traditional methods. Experimental results demonstrate that our framework effectively bridges the gap between human intent and policy preference in autonomous exploration, offering a more user-friendly and adaptable solution for real-world robotic applications.
Submitted 10 March, 2025;
originally announced March 2025.
-
Passive Heart Rate Monitoring During Smartphone Use in Everyday Life
Authors:
Shun Liao,
Paolo Di Achille,
Jiang Wu,
Silviu Borac,
Jonathan Wang,
Xin Liu,
Eric Teasley,
Lawrence Cai,
Yuzhe Yang,
Yun Liu,
Daniel McDuff,
Hao-Wei Su,
Brent Winslow,
Anupam Pathak,
Shwetak Patel,
James A. Taylor,
Jameson K. Rogers,
Ming-Zher Poh
Abstract:
Resting heart rate (RHR) is an important biomarker of cardiovascular health and mortality, but tracking it longitudinally generally requires a wearable device, limiting its availability. We present PHRM, a deep learning system for passive heart rate (HR) and RHR measurements during everyday smartphone use, using facial video-based photoplethysmography. Our system was developed using 225,773 videos from 495 participants and validated on 185,970 videos from 205 participants in laboratory and free-living conditions, representing the largest validation study of its kind. Compared to reference electrocardiogram, PHRM achieved a mean absolute percentage error (MAPE) < 10% for HR measurements across three skin tone groups of light, medium and dark pigmentation; MAPE for each skin tone group was non-inferior versus the others. Daily RHR measured by PHRM had a mean absolute error < 5 bpm compared to a wearable HR tracker, and was associated with known risk factors. These results highlight the potential of smartphones to enable passive and equitable heart health monitoring.
Submitted 21 March, 2025; v1 submitted 4 March, 2025;
originally announced March 2025.
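The two error metrics quoted in the abstract above, mean absolute percentage error (MAPE) and mean absolute error (MAE), follow their standard definitions. A minimal sketch is given below; the function names and the sample heart-rate values are hypothetical and are not data from the study.

```python
def mape(predicted, reference):
    """Mean absolute percentage error, in percent, relative to the reference."""
    terms = [abs(p - r) / abs(r) for p, r in zip(predicted, reference)]
    return 100.0 * sum(terms) / len(terms)


def mae(predicted, reference):
    """Mean absolute error, in the same units as the inputs (e.g. bpm)."""
    terms = [abs(p - r) for p, r in zip(predicted, reference)]
    return sum(terms) / len(terms)


# Hypothetical heart-rate readings in beats per minute.
camera_hr = [66.0, 72.0, 58.0]
ecg_hr = [60.0, 80.0, 58.0]
print("MAPE (%):", mape(camera_hr, ecg_hr))
print("MAE (bpm):", mae(camera_hr, ecg_hr))
```

MAPE normalises each error by the reference value, so it is comparable across resting and elevated heart rates, whereas MAE stays in bpm.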
-
PAR-AdvGAN: Improving Adversarial Attack Capability with Progressive Auto-Regression AdvGAN
Authors:
Jiayu Zhang,
Zhiyu Zhu,
Xinyi Wang,
Silin Liao,
Zhibo Jin,
Flora D. Salim,
Huaming Chen
Abstract:
Deep neural networks have demonstrated remarkable performance across various domains. However, they are vulnerable to adversarial examples, which can lead to erroneous predictions. Generative Adversarial Networks (GANs) can leverage generator and discriminator models to quickly produce high-quality adversarial examples. Since both modules train in a competitive and simultaneous manner, GAN-based algorithms like AdvGAN can generate adversarial examples with better transferability compared to traditional methods. However, the generation of perturbations is usually limited to a single iteration, preventing these examples from fully exploiting the potential of the methods. To tackle this issue, we introduce a novel approach named Progressive Auto-Regression AdvGAN (PAR-AdvGAN). It incorporates an auto-regressive iteration mechanism within a progressive generation network to craft adversarial examples with enhanced attack capability. We thoroughly evaluate our PAR-AdvGAN method with a large-scale experiment, demonstrating its superior performance over various state-of-the-art black-box adversarial attacks, as well as the original AdvGAN. Moreover, PAR-AdvGAN significantly accelerates adversarial example generation, achieving speeds of up to 335.5 frames per second on the Inception-v3 model and outperforming the gradient-based transferable attack algorithms. Our code is available at: https://anonymous.4open.science/r/PAR-01BF/
Submitted 16 February, 2025;
originally announced February 2025.
-
Analysis of the Gaia Data Release 3 parallax bias at bright magnitudes
Authors:
Ye Ding,
Shilong Liao,
Shangyu Wen,
Zhaoxiang Qi
Abstract:
The combination of visual and spectroscopic orbits in binary systems enables precise distance measurements without additional assumptions, making them ideal for examining the parallax zero-point offset (PZPO) at bright magnitudes (G < 13) in Gaia. We compiled 249 orbital parallaxes from 246 binary systems and used Markov Chain Monte Carlo (MCMC) simulations to exclude binaries where orbital motion significantly impacts parallaxes. After removing systems with substantial parallax errors, large discrepancies between orbital and Gaia parallaxes, and selecting systems with orbital periods under 100 days, a final sample of 44 binaries was retained. The weighted mean PZPO for this sample is -38.9 $\pm$ 10.3 $μ$as, compared to -58.0 $\pm$ 10.1 $μ$as for the remaining systems, suggesting that orbital motion significantly affects parallax measurements. These formal uncertainties of the PZPO appear to be underestimated by a factor of approximately 2.0. For bright stars with independent trigonometric parallaxes from VLBI and HST, the weighted mean PZPOs are -14.8 $\pm$ 10.6 and -31.9 $\pm$ 14.1 $μ$as, respectively. Stars with $G \leq 8$ exhibit a more pronounced parallax bias, with some targets showing unusually large deviations, likely due to systematic calibration errors in Gaia for bright stars. The orbital parallaxes dataset compiled in this work serves as a vital resource for validating parallaxes in future Gaia data releases.
Submitted 13 February, 2025; v1 submitted 11 February, 2025;
originally announced February 2025.
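The abstract above reports weighted mean offsets with formal uncertainties. A minimal sketch of the standard inverse-variance weighted mean is shown below, together with the square root of the reduced chi-square, which is one common diagnostic for underestimated formal errors. This is not necessarily the procedure the authors used, and the function names and sample offsets are hypothetical.

```python
import math


def weighted_mean(values, sigmas):
    """Inverse-variance weighted mean and its formal 1-sigma uncertainty."""
    weights = [1.0 / s ** 2 for s in sigmas]
    wsum = sum(weights)
    mean = sum(w * v for w, v in zip(weights, values)) / wsum
    return mean, 1.0 / math.sqrt(wsum)


def error_inflation(values, sigmas):
    """sqrt(reduced chi-square) about the weighted mean.

    Values well above 1 suggest the quoted sigmas are too small.
    Assumption: this is a generic diagnostic, not the paper's own method.
    """
    mean, _ = weighted_mean(values, sigmas)
    chi2 = sum(((v - mean) / s) ** 2 for v, s in zip(values, sigmas))
    return math.sqrt(chi2 / (len(values) - 1))


# Hypothetical per-system parallax offsets and uncertainties, in micro-arcseconds.
offsets = [-30.0, -55.0, -20.0, -48.0]
errors = [20.0, 25.0, 15.0, 30.0]
mean, err = weighted_mean(offsets, errors)
print(f"weighted mean = {mean:.1f} +/- {err:.1f} uas, inflation = {error_inflation(offsets, errors):.2f}")
```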
-
SIGMA: Sheaf-Informed Geometric Multi-Agent Pathfinding
Authors:
Shuhao Liao,
Weihang Xia,
Yuhong Cao,
Weiheng Dai,
Chengyang He,
Wenjun Wu,
Guillaume Sartoretti
Abstract:
The Multi-Agent Path Finding (MAPF) problem aims to determine the shortest and collision-free paths for multiple agents in a known, potentially obstacle-ridden environment. It is the core challenge for robotic deployments in large-scale logistics and transportation. Decentralized learning-based approaches have shown great potential for addressing the MAPF problems, offering more reactive and scalable solutions. However, existing learning-based MAPF methods usually rely on agents making decisions based on a limited field of view (FOV), resulting in short-sighted policies and inefficient cooperation in complex scenarios. There, a critical challenge is to achieve consensus on potential movements between agents based on limited observations and communications. To tackle this challenge, we introduce a new framework that applies sheaf theory to decentralized deep reinforcement learning, enabling agents to learn geometric cross-dependencies between each other through local consensus and utilize them for tightly cooperative decision-making. In particular, sheaf theory provides a mathematical proof of conditions for achieving global consensus through local observation. Inspired by this, we incorporate a neural network to approximately model the consensus in latent space based on sheaf theory and train it through self-supervised learning. During the task, in addition to normal features for MAPF as in previous works, each agent distributedly reasons about a learned consensus feature, leading to efficient cooperation on pathfinding and collision avoidance. As a result, our proposed method demonstrates significant improvements over state-of-the-art learning-based MAPF planners, especially in relatively large and complex scenarios, demonstrating its superiority over baselines in various simulations and real-world robot experiments.
Submitted 10 February, 2025;
originally announced February 2025.
-
Overcoming the surface paradox: Buried perovskite quantum dots in wide-bandgap perovskite thin films
Authors:
Hao Zhang,
Altaf Pasha,
Isaac Metcalf,
Jianlin Zhou,
Mathias Staunstrup,
Yunxuan Zhu,
Shusen Liao,
Ken Ssennyimba,
Jia-Shiang Chen,
Surya Prakash Reddy,
Simon Thébaud,
Jin Hou,
Xinting Shuai,
Faiz Mandani,
Siraj Sidhik,
Matthew R. Jones,
Xuedan Ma,
R Geetha Balakrishna,
Sandhya Susarla,
David S. Ginger,
Claudine Katan,
Mercouri G. Kanatzidis,
Moungi G. Bawendi,
Douglas Natelson,
Philippe Tamarat
, et al. (3 additional authors not shown)
Abstract:
Colloidal perovskite quantum dots (PQDs) are an exciting platform for on-demand quantum, and classical optoelectronic and photonic devices. However, their potential success is limited by the extreme sensitivity and low stability arising from their weak intrinsic lattice bond energy and complex surface chemistry. Here we report a novel platform of buried perovskite quantum dots (b-PQDs) in a three-dimensional perovskite thin-film, fabricated using one-step, flash annealing, which overcomes surface related instabilities in colloidal perovskite dots. The b-PQDs demonstrate ultrabright and stable single-dot emission, with resolution-limited linewidths below 130 μeV, photon-antibunching (g^2(0)=0.1), no blinking, suppressed spectral diffusion, and high photon count rates of 10^4/s, consistent with unity quantum yield. The ultrasharp linewidth resolves exciton fine-structures (dark and triplet excitons) and their dynamics under a magnetic field. Additionally, b-PQDs can be electrically driven to emit single photons with 1 meV linewidth and photon-antibunching (g^2(0)=0.4). These results pave the way for on-chip, low-cost single-photon sources for next generation quantum optical communication and sensing.
Submitted 10 January, 2025;
originally announced January 2025.
-
Phase diagram of Rydberg atoms in a two-leg rectangular ladder
Authors:
Shu-Ao Liao,
Jin Zhang,
Li-Ping Yang
Abstract:
Using the density matrix renormalization group algorithm, we map the ground-state phase diagram of a two-leg Rydberg ladder array with lattice spacings $a_x=2a_y$. We identify various density wave phases that spontaneously break the translational symmetry or the top-bottom reflection symmetry within the ladder. By increasing the laser detuning from zero, where the system is in a disordered phase that preserves all symmetries, we observe density wave orders with spontaneous breaking of the translational $\mathbb{Z}_p$ symmetries at intermediate detuning values, while the reflection symmetry is preserved. These orders exhibit nonzero bond orders with positive expectation values on every $p$th rung, thus labeled as $\mathbb{Z}_p^+$ phases. At larger detuning values, another spontaneous breaking of the reflection symmetry, which disrupts the bond orders on the rungs, occurs via an Ising phase transition. In these phases, either the top or the bottom site is occupied in a staggered way on every $p$th rung, breaking the translational $\mathbb{Z}_{2p}$ symmetry, thus labeled as $\mathbb{Z}_{2p}$ phases. We locate and characterize the 3-state Potts point and Ashkin-Teller point along the commensurate lines, as well as the direct chiral phase transitions between the disordered phase and the $\mathbb{Z}_p^+$ ($p = 3, 4$) phases. Critical exponents $ν$ and $z$ are calculated for both conformal and chiral phase transition points. We finally identify two types of floating phases in the phase diagram: one characterized by a quasi-long-range incommensurate bond-order wave, and the other by a quasi-long-range incommensurate wave of density differences in the rungs. Our work motivates further applications of Rydberg atom arrays in quantum simulation.
Submitted 15 December, 2024;
originally announced December 2024.
-
Measuring the Hubble constant through the galaxy pairwise peculiar velocity
Authors:
Wangzheng Zhang,
Ming-chung Chu,
Shihong Liao,
Shek Yeung,
Hui-Jie Hu
Abstract:
The Hubble constant $H_0$, the current expansion rate of the universe, is one of the most important parameters in cosmology. The cosmic expansion regulates the mutually approaching motion of a pair of celestial objects due to their gravity. Therefore, the mean pairwise peculiar velocity of celestial objects, which quantifies their relative motion, is sensitive to both $H_0$ and the dimensionless total matter density $Ω_m$. Based on this, using the Cosmicflows-4 data, we measured $H_0$ for the first time via the galaxy pairwise velocity in the nonlinear and quasi-linear range. Our results yield $H_0=75.5\pm1.4$ km s$^{-1}$ Mpc$^{-1}$ and $Ω_m=0.311^{+0.029}_{-0.028}$. The uncertainties of $H_0$ and $Ω_m$ can be improved to around 0.6% and 2%, respectively, if the statistical errors become negligible in the future.
Submitted 5 December, 2024;
originally announced December 2024.
-
Bench-CoE: a Framework for Collaboration of Experts from Benchmark
Authors:
Yuanshuai Wang,
Xingjian Zhang,
Jinkun Zhao,
Siwei Wen,
Peilin Feng,
Shuhao Liao,
Lei Huang,
Wenjun Wu
Abstract:
Large Language Models (LLMs) are key technologies driving intelligent systems to handle multiple tasks. To meet the demands of various tasks, an increasing number of LLM-driven experts with diverse capabilities have been developed, accompanied by corresponding benchmarks to evaluate their performance. This paper proposes the Bench-CoE framework, which enables Collaboration of Experts (CoE) by effectively leveraging benchmark evaluations to achieve optimal performance across various tasks. Bench-CoE includes a set of expert models, a router for assigning tasks to corresponding experts, and a benchmark dataset for training the router. Moreover, we formulate Query-Level and Subject-Level approaches based on our framework, and analyze the merits and drawbacks of these two approaches. Finally, we conduct a series of experiments with varying data distributions on both language and multimodal tasks to validate that our proposed Bench-CoE outperforms any single model in terms of overall performance. We hope this method serves as a baseline for further research in this area. The code is available at https://github.com/ZhangXJ199/Bench-CoE.
Submitted 5 December, 2024;
originally announced December 2024.
-
NSI-IBP: A General Numerical Singular Integral Method via Integration by Parts
Authors:
Shaolin Liao
Abstract:
A general framework for the Numerical Singular Integrals (NSI) method based on Integration By Parts (IBP), or NSI-IBP, has been developed for integrals involving singular and nearly singular integrands. Through a general integration by parts formula and by choosing some analytically integrable function to approximate the original integrand, various well-known integration by parts methods can be derived. Rigorous mathematical derivations have been performed to transform the original singular or nearly singular integrals into non-singular integrals that can be computed efficiently, along with the boundary values added. More importantly, the NSI-IBP method works well even when the exact form of the singular integrand is not known. Criteria on how to choose the appropriate function with a known analytical integral that closely approximates the original integrand have been outlined and explained. A numerical recipe has been presented for applying the proposed NSI-IBP. Numerical experiments have been carried out on various singular integrals such as the power-law decaying integrand, the logarithmic function, and their hybrid products. It is shown that relative accuracies up to $10^{-15}$ can be achieved, even when the exact singular function is not known. Finally, the nearly singular integrals involving the scalar Green's function have been evaluated for both electrostatics applications and Computational Electromagnetics (CEM) applications.
Submitted 1 December, 2024;
originally announced December 2024.
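The core idea in the abstract above, trading a singular integrand for a boundary term plus a non-singular integral, can be seen in a textbook case. The sketch below applies ordinary integration by parts to the integral of x^(-1/2) e^x over [0, 1], with u = e^x and dv = x^(-1/2) dx, so that v = 2 sqrt(x). It is a generic illustration under that choice, not the NSI-IBP algorithm itself; the test integrand, the midpoint rule, and all function names are assumptions made for this example.

```python
import math


def midpoint(f, a, b, n=20000):
    """Composite midpoint rule; never evaluates f at the endpoints."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


def singular_integral_via_ibp():
    """Integral of x**-0.5 * exp(x) on [0, 1] after integration by parts.

    I = [2*sqrt(x)*exp(x)] from 0 to 1  -  integral of 2*sqrt(x)*exp(x) dx
    The remaining integrand is bounded on [0, 1].
    """
    boundary = 2.0 * math.e
    remainder = midpoint(lambda x: 2.0 * math.sqrt(x) * math.exp(x), 0.0, 1.0)
    return boundary - remainder


def series_reference(terms=30):
    """Reference value from term-by-term integration of the exp series."""
    return sum(1.0 / (math.factorial(k) * (k + 0.5)) for k in range(terms))


print("IBP + midpoint:", singular_integral_via_ibp())
print("series value  :", series_reference())
```

After the transformation the integrand is bounded, so a simple quadrature rule converges without any special treatment of the endpoint at x = 0.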
-
Efficient Bitcoin Address Classification Using Quantum-Inspired Feature Selection
Authors:
Ming-Fong Sie,
Yen-Jui Chang,
Chien-Lung Lin,
Ching-Ray Chang,
Shih-Wei Liao
Abstract:
Over 900 million Bitcoin transactions have been recorded, posing considerable challenges for machine learning in terms of computation time and maintaining prediction accuracy. We propose an innovative approach using quantum-inspired algorithms implemented with Simulated Annealing and Quantum Annealing to address the challenge of local minima in solution spaces. This method efficiently identifies key features linked to mixer addresses, significantly reducing model training time. By categorizing Bitcoin addresses into six classes: exchanges, faucets, gambling, marketplaces, mixers, and mining pools, and applying supervised learning methods, our results demonstrate that feature selection with SA reduced training time by 30.3% compared to using all features in a random forest model while maintaining a 91% F1-score for mixer addresses. This highlights the potential of quantum-inspired algorithms to swiftly and accurately identify high-risk Bitcoin addresses based on transaction features.
Submitted 22 November, 2024;
originally announced November 2024.
-
Consistency Regularization for Complementary Clothing Recommendations
Authors:
Shuiying Liao,
P. Y. Mok,
Li Li
Abstract:
This paper reports on the development of a Consistency Regularized model for Bayesian Personalized Ranking (CR-BPR), addressing the drawbacks in existing complementary clothing recommendation methods, namely limited consistency and biased learning caused by the diverse feature scales of multi-modal data. Compared to other product types, fashion preferences are inherently subjective and more personal, and fashion is often presented not as an individual clothing product but together with other complementary product(s) in a well-coordinated fashion outfit. Current complementary-product recommendation studies primarily focus on user preference and product matching; this study further emphasizes the consistency observed in user-product interactions as well as product-product interactions, in the specific context of clothing matching. Most traditional approaches underplay the impact of existing wardrobe items on future matching choices, resulting in less effective preference prediction models. Moreover, many models based on multi-modal information overlook the limitations arising from the various feature scales involved. To address these gaps, the CR-BPR model integrates collaborative filtering techniques to incorporate both user preference and product matching modeling, with a unique focus on consistency regularization for each aspect. Additionally, the incorporation of a feature scaling process further addresses the imbalances caused by different feature scales, ensuring that the model can effectively handle multi-modal data without being skewed by any particular type of feature. The effectiveness of the CR-BPR model was validated through detailed analysis involving two benchmark datasets. The results confirmed that the proposed approach significantly outperforms existing models.
Submitted 19 November, 2024;
originally announced November 2024.
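The model in the abstract above builds on Bayesian Personalized Ranking (BPR). A minimal sketch of the standard BPR pairwise objective, the mean of -ln sigmoid(score_pos - score_neg), is given below for orientation. It covers only the generic BPR term; the paper's consistency regularization and feature scaling are not specified in the abstract and are not reproduced here. The function name and sample scores are hypothetical.

```python
import math


def bpr_loss(pos_scores, neg_scores):
    """Mean BPR pairwise loss: -ln sigmoid(pos - neg) over score pairs."""
    total = 0.0
    for pos, neg in zip(pos_scores, neg_scores):
        diff = pos - neg
        # -ln sigmoid(d) = ln(1 + exp(-d)); branch to avoid overflow in exp.
        if diff >= 0:
            total += math.log1p(math.exp(-diff))
        else:
            total += -diff + math.log1p(math.exp(diff))
    return total / len(pos_scores)


# Hypothetical preference scores for preferred vs. non-preferred items.
print(bpr_loss([2.0, 0.5], [1.0, 1.5]))
```

The loss shrinks as preferred items are scored further above non-preferred ones, which is what makes it a ranking objective and not a rating-prediction objective.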
-
DULAG: A DUal and Lensed AGN candidate catalog with GMP method
Authors:
Qiqi Wu,
M. Scialpi,
Shilong Liao,
F. Mannucci,
Zhaoxiang Qi
Abstract:
Context. A series of studies have demonstrated that the Gaia multipeak method (GMP) is a very efficient technique to select active galactic nucleus (AGN) pair candidates. The number of candidates is determined by the size of the input AGN catalogs, usually limited to spectroscopically-confirmed objects. Aims. The objective of this work is to compile a larger and highly reliable catalog of GMP pair candidates extracted from the six million objects of the Gaia AGN catalog, the majority of which lack spectroscopic information. Methods. In order to ascertain the differences in the properties of GMP pair candidates compared to normal AGN, we conducted an investigation utilising samples of GMP AGN. These differences were employed to establish the optimal selection criteria, which ultimately led to the identification of a highly reliable candidate catalog. Results. We found significant differences in astrometry and multi-band colour distribution between normal AGN and GMP pair candidates. A DUal and Lensed AGN candidate catalog with GMP method (DULAG) comprising 5,286 sources was ultimately compiled, accompanied by a highly reliable Golden sample of 1,867 sources. A total of 37 sources in the Golden sample have been identified as dual AGN or lensed AGN. For the majority of sources in the Golden sample, we provide reference redshifts and find three close AGN pair candidates among them.
Submitted 8 November, 2024;
originally announced November 2024.
-
StoryAgent: Customized Storytelling Video Generation via Multi-Agent Collaboration
Authors:
Panwen Hu,
Jin Jiang,
Jianqi Chen,
Mingfei Han,
Shengcai Liao,
Xiaojun Chang,
Xiaodan Liang
Abstract:
The advent of AI-Generated Content (AIGC) has spurred research into automated video generation to streamline conventional processes. However, automating storytelling video production, particularly for customized narratives, remains challenging due to the complexity of maintaining subject consistency across shots. While existing approaches like Mora and AesopAgent integrate multiple agents for Story-to-Video (S2V) generation, they fall short in preserving protagonist consistency and supporting Customized Storytelling Video Generation (CSVG). To address these limitations, we propose StoryAgent, a multi-agent framework designed for CSVG. StoryAgent decomposes CSVG into distinct subtasks assigned to specialized agents, mirroring the professional production process. Notably, our framework includes agents for story design, storyboard generation, video creation, agent coordination, and result evaluation. Leveraging the strengths of different models, StoryAgent enhances control over the generation process, significantly improving character consistency. Specifically, we introduce a customized Image-to-Video (I2V) method, LoRA-BE, to enhance intra-shot temporal consistency, while a novel storyboard generation pipeline is proposed to maintain subject consistency across shots. Extensive experiments demonstrate the effectiveness of our approach in synthesizing highly consistent storytelling videos, outperforming state-of-the-art methods. Our contributions include the introduction of StoryAgent, a versatile framework for video generation tasks, and novel techniques for preserving protagonist consistency.
Submitted 11 November, 2024; v1 submitted 7 November, 2024;
originally announced November 2024.
-
Fish-Speech: Leveraging Large Language Models for Advanced Multilingual Text-to-Speech Synthesis
Authors:
Shijia Liao,
Yuxuan Wang,
Tianyu Li,
Yifan Cheng,
Ruoyi Zhang,
Rongzhi Zhou,
Yijin Xing
Abstract:
Text-to-Speech (TTS) systems face ongoing challenges in processing complex linguistic features, handling polyphonic expressions, and producing natural-sounding multilingual speech - capabilities that are crucial for future AI applications. In this paper, we present Fish-Speech, a novel framework that implements a serial fast-slow Dual Autoregressive (Dual-AR) architecture to enhance the stability of Grouped Finite Scalar Vector Quantization (GFSQ) in sequence generation tasks. This architecture improves codebook processing efficiency while maintaining high-fidelity outputs, making it particularly effective for AI interactions and voice cloning.
Fish-Speech leverages Large Language Models (LLMs) for linguistic feature extraction, eliminating the need for traditional grapheme-to-phoneme (G2P) conversion and thereby streamlining the synthesis pipeline and enhancing multilingual support. Additionally, we developed FF-GAN through GFSQ to achieve superior compression ratios and near 100% codebook utilization.
Our approach addresses key limitations of current TTS systems while providing a foundation for more sophisticated, context-aware speech synthesis. Experimental results show that Fish-Speech significantly outperforms baseline models in handling complex linguistic scenarios and voice cloning tasks, demonstrating its potential to advance TTS technology in AI applications. The implementation is open source at https://github.com/fishaudio/fish-speech.
Submitted 9 November, 2024; v1 submitted 2 November, 2024;
originally announced November 2024.
-
First Light and Reionisation Epoch Simulations (FLARES) XVII: Learning the galaxy-halo connection at high redshifts
Authors:
Maxwell G. A. Maltz,
Peter A. Thomas,
Christopher C. Lovell,
William J. Roper,
Aswin P. Vijayan,
Dimitrios Irodotou,
Shihong Liao,
Louise T. C. Seeyave,
Stephen M. Wilkins
Abstract:
Understanding the galaxy-halo relationship is not only key for elucidating the interplay between baryonic and dark matter, it is essential for creating large mock galaxy catalogues from N-body simulations. High-resolution hydrodynamical simulations are limited to small volumes by their large computational demands, hindering their use for comparisons with wide-field observational surveys. We overcome this limitation by using the First Light and Reionisation Epoch Simulations (FLARES), a suite of high-resolution (M_gas = 1.8 x 10^6 M_Sun) zoom simulations drawn from a large, (3.2 cGpc)^3 box. We use an extremely randomised trees machine learning approach to model the relationship between galaxies and their subhaloes in a wide range of environments. This allows us to build mock catalogues with dynamic ranges that surpass those obtainable through periodic simulations. The low cost of the zoom simulations facilitates multiple runs of the same regions, differing only in the random number seed of the subgrid models; changing this seed introduces a butterfly effect, leading to random differences in the properties of matching galaxies. This randomness cannot be learnt by a deterministic machine learning model, but by sampling the noise and adding it post-facto to our predictions, we are able to recover the distributions of the galaxy properties we predict (stellar mass, star formation rate, metallicity, and size) remarkably well. We also explore the resolution-dependence of our models' performances and find minimal depreciation down to particle resolutions of order M_DM ~ 10^8 M_Sun, enabling the future application of our models to large dark matter-only boxes.
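The idea of adding sampled noise to deterministic predictions after the fact can be illustrated with a toy sketch. This is not the FLARES pipeline: the linear relation, the Gaussian scatter, and the helper name `add_sampled_scatter` are all assumptions made for illustration, and the residual pool stands in for the seed-to-seed differences described in the abstract.

```python
import random
import statistics


def add_sampled_scatter(predictions, residual_pool, rng):
    """Add one randomly drawn residual to each deterministic prediction."""
    return [p + rng.choice(residual_pool) for p in predictions]


rng = random.Random(0)

# Toy "truth": a linear relation plus intrinsic scatter (assumed, not from the paper).
xs = [rng.uniform(0.0, 1.0) for _ in range(5000)]
truth = [2.0 * x + rng.gauss(0.0, 0.3) for x in xs]

# Stand-in for a deterministic model: it recovers the mean relation only.
deterministic = [2.0 * x for x in xs]

# Empirical noise sample, here taken from the first 1000 objects.
residual_pool = [t - d for t, d in zip(truth[:1000], deterministic[:1000])]

# Post-hoc correction: the scatter is restored by sampling from the pool.
noisy = add_sampled_scatter(deterministic, residual_pool, rng)

print(statistics.pstdev(truth), statistics.pstdev(deterministic), statistics.pstdev(noisy))
```

In this toy setting the deterministic predictions are too narrowly distributed, and adding sampled residuals brings the spread of the predictions back towards that of the target distribution.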
Submitted 31 October, 2024;
originally announced October 2024.
-
Resolution Enhancement of Under-sampled Photoacoustic Microscopy Images using Implicit Neural Representations
Authors:
Youshen Xiao,
Sheng Liao,
Xuanyang Tian,
Fan Zhang,
Xinlong Dong,
Yunhui Jiang,
Xiyu Chen,
Ruixi Sun,
Yuyao Zhang,
Fei Gao
Abstract:
Acoustic-Resolution Photoacoustic Microscopy (AR-PAM) is promising for subcutaneous vascular imaging, but its spatial resolution is constrained by the Point Spread Function (PSF). Traditional deconvolution methods like Richardson-Lucy and model-based deconvolution use the PSF to improve resolution. However, accurately measuring the PSF is difficult, leading to reliance on less accurate blind deconvolution techniques. Additionally, AR-PAM suffers from long scanning times, which can be reduced via down-sampling, but this necessitates effective image recovery from under-sampled data, a task where traditional interpolation methods fall short, particularly at high under-sampling rates. To address these challenges, we propose an approach based on Implicit Neural Representations (INR). This method learns a continuous mapping from spatial coordinates to initial acoustic pressure, overcoming the limitations of discrete imaging and enhancing AR-PAM's resolution. By treating the PSF as a learnable parameter within the INR framework, our technique mitigates inaccuracies associated with PSF estimation. We evaluated our method on simulated vascular data, showing significant improvements in Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index (SSIM) over conventional methods. Qualitative enhancements were also observed in leaf vein and in vivo mouse brain microvasculature images. When applied to a custom AR-PAM system, experiments with pencil lead demonstrated that our method delivers sharper, higher-resolution results, indicating its potential to advance photoacoustic microscopy.
Submitted 14 October, 2024;
originally announced October 2024.
-
Noise-expansion cascade: an origin of randomness of turbulence
Authors:
Shijun Liao,
Shijie Qin
Abstract:
Randomness is one of the most important characteristics of turbulence, but its origin remains an open question. By means of a "thought experiment" via several clean numerical experiments based on the Navier-Stokes equations for two-dimensional turbulent Kolmogorov flow, we reveal a new phenomenon, which we call the "noise-expansion cascade", whereby all micro-level noises/disturbances at different orders of magnitude in the initial condition of the Navier-Stokes equations enlarge consistently, one by one like an inverse cascade, to the macro-level. More importantly, each noise/disturbance input may greatly change the macro-level characteristics and statistics of the resulting turbulence, clearly indicating that micro-level noise/disturbance might have great influence on macro-level characteristics and statistics of turbulence. In addition, the noise-expansion cascade closely connects the randomness of micro-level noise/disturbance and the macro-level disorder of turbulence, thus revealing an origin of randomness of turbulence. This also strongly suggests that unavoidable thermal fluctuations must be considered when simulating turbulence, even if such fluctuations are several orders of magnitude smaller than other external environmental disturbances. Hopefully, the "noise-expansion cascade" as a fundamental property of the NS equations could greatly deepen our understanding of turbulence, and also be helpful for attacking the fourth millennium problem posed by the Clay Mathematics Institute in 2000.
Submitted 4 April, 2025; v1 submitted 18 October, 2024;
originally announced October 2024.
-
Identifying supermassive black hole recoil in elliptical galaxies
Authors:
Alexander Rawlings,
Atte Keitaanranta,
Max Mattero,
Sonja Soininen,
Ruby J. Wright,
Noa Kallioinen,
Shihong Liao,
Antti Rantala,
Peter H. Johansson,
Thorsten Naab,
Dimitrios Irodotou
Abstract:
We study stellar core growth in simulations of merging massive ($M_\star>10^{11}\,\mathrm{M}_\odot$) elliptical galaxies by a supermassive black hole (SMBH) displaced by gravitational wave induced recoil velocity. With controlled, dense sampling of the SMBH recoil velocity, we find the core radius originally formed by SMBH binary scouring can grow by a factor of 2-3 when the recoil velocity exceeds $\sim50$ per cent of the central escape velocity, and the mass deficit grows by up to a factor of $\sim4$. Using Bayesian inference we predict the distribution of stellar core sizes formed through this process to peak at $\sim1\,\mathrm{kpc}$. An orbital decomposition of stellar particles within the core reveals that radial orbits dominate over tube orbits when the recoil velocity exceeds the velocity dispersion of the core, whereas tube orbits dominate for the lowest recoil kicks. A change in orbital structure is reflected in the anisotropy parameter, with a central tangential bias present only for recoil velocities less than the local stellar velocity dispersion. Emulating current integral field unit observations of the stellar line-of-sight velocity distribution, we uncover a distinct signature in the Gauss-Hermite symmetric deviation coefficient $h_4$ that uniquely constrains the core size due to binary scouring. This signature is insensitive to the later evolution of the stellar mass distribution due to SMBH recoil. Our results provide a novel method to estimate the SMBH recoil magnitude from observations of local elliptical galaxies, and imply that these galaxies primarily experienced recoil velocities less than the stellar velocity dispersion of the core.
Submitted 26 February, 2025; v1 submitted 17 October, 2024;
originally announced October 2024.
-
Scaling Wearable Foundation Models
Authors:
Girish Narayanswamy,
Xin Liu,
Kumar Ayush,
Yuzhe Yang,
Xuhai Xu,
Shun Liao,
Jake Garrison,
Shyam Tailor,
Jake Sunshine,
Yun Liu,
Tim Althoff,
Shrikanth Narayanan,
Pushmeet Kohli,
Jiening Zhan,
Mark Malhotra,
Shwetak Patel,
Samy Abdel-Ghaffar,
Daniel McDuff
Abstract:
Wearable sensors have become ubiquitous thanks to a variety of health tracking features. The resulting continuous and longitudinal measurements from everyday life generate large volumes of data; however, making sense of these observations for scientific and actionable insights is non-trivial. Inspired by the empirical success of generative modeling, where large neural networks learn powerful representations from vast amounts of text, image, video, or audio data, we investigate the scaling properties of sensor foundation models across compute, data, and model size. Using a dataset of up to 40 million hours of in-situ heart rate, heart rate variability, electrodermal activity, accelerometer, skin temperature, and altimeter per-minute data from over 165,000 people, we create LSM, a multimodal foundation model built on the largest wearable-signals dataset with the most extensive range of sensor modalities to date. Our results establish the scaling laws of LSM for tasks such as imputation, interpolation and extrapolation, both across time and sensor modalities. Moreover, we highlight how LSM enables sample-efficient downstream learning for tasks like exercise and activity recognition.
Submitted 17 October, 2024;
originally announced October 2024.
-
Scalable Multi-Domain Adaptation of Language Models using Modular Experts
Authors:
Peter Schafhalter,
Shun Liao,
Yanqi Zhou,
Chih-Kuan Yeh,
Arun Kandoor,
James Laudon
Abstract:
Domain-specific adaptation is critical to maximizing the performance of pre-trained language models (PLMs) on one or multiple targeted tasks, especially under resource-constrained use cases, such as edge devices. However, existing methods often struggle to balance domain-specific performance, retention of general knowledge, and efficiency for training and inference. To address these challenges, we propose Modular Domain Experts (MoDE). MoDE is a mixture-of-experts architecture that augments a general PLM with modular, domain-specialized experts. These experts are trained independently and composed together via a lightweight training process. In contrast to standard low-rank adaptation methods, each MoDE expert consists of several transformer layers which scale better with more training examples and larger parameter counts. Our evaluation demonstrates that MoDE achieves comparable target performance to full parameter fine-tuning while achieving 1.65% better retention performance. Moreover, MoDE's architecture enables flexible sharding configurations and improves training speeds by up to 38% over state-of-the-art distributed training configurations.
Submitted 24 October, 2024; v1 submitted 14 October, 2024;
originally announced October 2024.
-
Analysis of Gaia Data Release 3 Parallax bias in the Galactic plane
Authors:
Ye Ding,
Shilong Liao,
Qiqi Wu,
Zhaoxiang Qi,
Zhenghong Tang
Abstract:
Systematic errors are inevitable in the published Gaia astrometric data. Lindegren et al. (L21) proposed a global recipe to correct for the GEDR3 parallax zero point offset, which did not consider the Galactic plane. The applicability of their correction model to the Galactic plane remains uncertain. We independently investigate the sample dependence of the L21 correction and its applicability to the Galactic plane. We collect various samples, including quasars, binaries, and sources with parallaxes from other surveys or methods, to validate the L21 correction, especially in the Galactic plane. We conclude that the L21 correction exhibits sample dependence and does not apply effectively to the Galactic plane. We present a new parallax bias correction for the Galactic plane, offering improvements over the existing L21 correction. The difference between the L21 correction and this work can reach 0.01 mas within certain ranges of magnitude and colour. This work provides an additional recipe for users of Gaia parallaxes, especially for sources located near the Galactic plane.
Submitted 23 September, 2024;
originally announced September 2024.
-
Prospects for detecting cosmic filaments in Lyman-alpha emission across redshifts $z=2-5$
Authors:
Yizhou Liu,
Liang Gao,
Shihong Liao,
Kai Zhu
Abstract:
The standard $\rm Λ$CDM cosmological model predicts that a large amount of diffuse neutral hydrogen is distributed in cosmic filaments, which could be mapped through Lyman-alpha (Ly$α$) emission observations. We use the hydrodynamical simulation Illustris-TNG50 to investigate the evolution of surface brightness and detectability of neutral hydrogen in cosmic filaments across redshifts $z=2-5$. While the HI column density of cosmic filaments decreases with redshift, due to the rising temperature with cosmic time in filaments, the surface brightness of Ly$α$ emission in filaments is brighter at lower redshifts, suggesting that the detection of cosmic filaments is more feasible at lower redshifts. However, most of the Ly$α$ emission from cosmic filaments is around $10^{-21}$ $\rm erg\ s^{-1}cm^{-2}arcsec^{-2}$, making it extremely challenging to detect with current observational instruments. We further generate mock images using the Multi-Unit Spectroscopic Explorer (MUSE) spectrograph installed on the Very Large Telescope (VLT) and a MUSE-like spectrograph on the upcoming Extremely Large Telescope (ELT). Our findings indicate that while the VLT can only detect filamentary structures made of dense gas in galactic centers, the ELT is expected to reveal much finer filamentary structures from diffuse neutral hydrogen outside of galaxies. Compared to the VLT, both the number density and the longest length of filaments are greatly boosted with the ELT. Hence the forthcoming ELT is highly promising to provide a clearer view of cosmic filaments in Ly$α$ emission.
Submitted 3 April, 2025; v1 submitted 17 September, 2024;
originally announced September 2024.
-
Dynamic Fraud Detection: Integrating Reinforcement Learning into Graph Neural Networks
Authors:
Yuxin Dong,
Jianhua Yao,
Jiajing Wang,
Yingbin Liang,
Shuhan Liao,
Minheng Xiao
Abstract:
Financial fraud refers to the act of obtaining financial benefits through dishonest means. Such behavior not only disrupts the order of the financial market but also harms economic and social development and breeds other illegal and criminal activities. With the popularization of the internet and online payment methods, many fraudulent activities and money laundering behaviors in life have shifted from offline to online, posing a great challenge to regulatory authorities. How to efficiently detect these financial fraud activities has become an urgent issue that needs to be resolved. Graph neural networks are a type of deep learning model that can utilize the interactive relationships within graph structures, and they have been widely applied in the field of fraud detection. However, there are still some issues. First, fraudulent activities only account for a very small part of transaction transfers, leading to an inevitable problem of label imbalance in fraud detection. At the same time, fraudsters often disguise their behavior, which can have a negative impact on the final prediction results. In addition, existing research has overlooked the importance of balancing neighbor information and central node information. For example, when the central node has too many neighbors, the features of the central node itself are often neglected. Finally, fraud activities and patterns are constantly changing over time, so considering the dynamic evolution of graph edge relationships is also very important.
Submitted 15 September, 2024;
originally announced September 2024.
-
Realistic and Efficient Face Swapping: A Unified Approach with Diffusion Models
Authors:
Sanoojan Baliah,
Qinliang Lin,
Shengcai Liao,
Xiaodan Liang,
Muhammad Haris Khan
Abstract:
Despite promising progress in the face swapping task, realistic swapped images remain elusive, often marred by artifacts, particularly in scenarios involving high pose variation, color differences, and occlusion. To address these issues, we propose a novel approach that better harnesses diffusion models for face-swapping by making the following core contributions. (a) We propose to re-frame the face-swapping task as a self-supervised, train-time inpainting problem, enhancing the identity transfer while blending with the target image. (b) We introduce a multi-step Denoising Diffusion Implicit Model (DDIM) sampling during training, reinforcing identity and perceptual similarities. (c) We introduce CLIP feature disentanglement to extract pose, expression, and lighting information from the target image, improving fidelity. (d) We introduce a mask shuffling technique during inpainting training, which allows us to create a so-called universal model for swapping, with an additional feature of head swapping. Our model can swap hair and even accessories, beyond traditional face swapping. Unlike prior works reliant on multiple off-the-shelf models, ours is a relatively unified approach and so it is resilient to errors in other off-the-shelf models. Extensive experiments on FFHQ and CelebA datasets validate the efficacy and robustness of our approach, showcasing high-fidelity, realistic face-swapping with minimal inference time. Our code is available at https://github.com/Sanoojan/REFace.
Submitted 11 September, 2024;
originally announced September 2024.
-
Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders
Authors:
Min Shi,
Fuxiao Liu,
Shihao Wang,
Shijia Liao,
Subhashree Radhakrishnan,
Yilin Zhao,
De-An Huang,
Hongxu Yin,
Karan Sapra,
Yaser Yacoob,
Humphrey Shi,
Bryan Catanzaro,
Andrew Tao,
Jan Kautz,
Zhiding Yu,
Guilin Liu
Abstract:
The ability to accurately interpret complex visual information is a crucial topic of multimodal large language models (MLLMs). Recent work indicates that enhanced visual perception significantly reduces hallucinations and improves performance on resolution-sensitive tasks, such as optical character recognition and document analysis. A number of recent MLLMs achieve this goal using a mixture of vision encoders. Despite their success, there is a lack of systematic comparisons and detailed ablation studies addressing critical aspects, such as expert selection and the integration of multiple vision experts. This study provides an extensive exploration of the design space for MLLMs using a mixture of vision encoders and resolutions. Our findings reveal several underlying principles common to various existing strategies, leading to a streamlined yet effective design approach. We discover that simply concatenating visual tokens from a set of complementary vision encoders is as effective as more complex mixing architectures or strategies. We additionally introduce Pre-Alignment to bridge the gap between vision-focused encoders and language tokens, enhancing model coherence. The resulting family of MLLMs, Eagle, surpasses other leading open-source models on major MLLM benchmarks.
Submitted 2 March, 2025; v1 submitted 28 August, 2024;
originally announced August 2024.
-
Microsatellite-based real-time quantum key distribution
Authors:
Yang Li,
Wen-Qi Cai,
Ji-Gang Ren,
Chao-Ze Wang,
Meng Yang,
Liang Zhang,
Hui-Ying Wu,
Liang Chang,
Jin-Cai Wu,
Biao Jin,
Hua-Jian Xue,
Xue-Jiao Li,
Hui Liu,
Guang-Wen Yu,
Xue-Ying Tao,
Ting Chen,
Chong-Fei Liu,
Wen-Bin Luo,
Jie Zhou,
Hai-Lin Yong,
Yu-Huai Li,
Feng-Zhi Li,
Cong Jiang,
Hao-Ze Chen,
Chao Wu
, et al. (16 additional authors not shown)
Abstract:
A quantum network provides an infrastructure connecting quantum devices with revolutionary computing, sensing, and communication capabilities. As the best-known application of a quantum network, quantum key distribution (QKD) shares secure keys guaranteed by the laws of quantum mechanics. A quantum satellite constellation offers a solution to facilitate the quantum network on a global scale. The Micius satellite has verified the feasibility of satellite quantum communications; however, scaling up quantum satellite constellations is challenging, requiring small lightweight satellites, portable ground stations and real-time secure key exchange. Here we tackle these challenges and report the development of a quantum microsatellite capable of performing space-to-ground QKD using portable ground stations. The quantum microsatellite features a payload weighing approximately 23 kg, while the portable ground station weighs about 100 kg. These weights represent reductions by more than an order and two orders of magnitude, respectively, compared to the Micius satellite. Additionally, we multiplex bidirectional satellite-ground optical communication with quantum communication, enabling key distillation and secure communication in real-time. Using the microsatellite and the portable ground stations, we demonstrate satellite-based QKD with multiple ground stations and achieve the sharing of up to 0.59 million bits of secure keys during a single satellite pass. The compact quantum payload can be readily assembled on existing space stations or small satellites, paving the way for a satellite-constellation-based quantum and classical network for widespread real-life applications.
Submitted 20 August, 2024;
originally announced August 2024.
-
Apostle--Auriga: Effects of stellar feedback subgrid models on the evolution of angular momentum in disc galaxies
Authors:
Hang Yang,
Shihong Liao,
Azadeh Fattahi,
Carlos S. Frenk,
Liang Gao,
Qi Guo,
Shi Shao,
Lan Wang,
Ruby J. Wright,
Guangquan Zeng
Abstract:
Utilizing the Apostle--Auriga simulations, which start from the same zoom-in initial conditions of Local Group-like systems but run with different galaxy formation subgrid models and hydrodynamic solvers, we study the impact of stellar feedback models on the evolution of angular momentum in disc galaxies. At $z = 0$, Auriga disc galaxies tend to exhibit higher specific angular momenta compared to their cross-matched Apostle counterparts. By tracing the evolution history of the Lagrangian mass tracers of the in-situ star particles in the $z = 0$ galaxies, we find that the specific angular momentum distributions of the gas tracers from the two simulations at the halo accretion time are relatively similar. The present-day angular momentum difference is mainly driven by the physical processes occurring inside dark matter haloes, especially galactic fountains. Due to the different subgrid implementations of stellar feedback processes, Auriga galaxies contain a high fraction of gas that has gone through a recycled fountain (${\sim} 65$ per cent), which could acquire angular momentum through mixing with the high angular momentum circumgalactic medium (CGM). In Apostle, however, the fraction of gas that has undergone the recycled fountain process is significantly lower (down to ${\sim} 20$ per cent for Milky Way-sized galaxies) and the angular momentum acquisition from the CGM is marginal. As a result, the present-day Auriga galaxies overall have higher specific angular momenta.
Submitted 19 October, 2024; v1 submitted 19 August, 2024;
originally announced August 2024.
-
Assembly History and Internal Structure of Cluster Cold Dark Matter Haloes
Authors:
Qingxiang Chen,
Shihong Liao,
Jie Wang,
Liang Gao
Abstract:
We use the Phoenix simulations to study the mass assembly history and internal structures of cluster dark matter haloes ($M_{200} \gtrsim 5\times 10^{14} h^{-1}{\rm M}_\odot$). We confirm that cluster haloes grow inside-out, similar to galactic haloes. Major merger events dominate the growth of the internal region and minor mergers/diffuse accretion shape the outskirts. However, compared to galactic haloes, cluster haloes tend to have a younger and more actively evolving inner region. On average, the majority of mass (> 80%) in the inner region ($R< 0.1 r_{200}$) of Phoenix haloes is accreted after $z = 3$, while for galactic haloes, most mass in the central region has already been accreted before $z=6$. The density profiles of cluster haloes are less stable than those of galactic haloes over different radii. The enclosed mass within $50$ or $150$ kpc of all Phoenix haloes evolves substantially in the past ${\sim} 7$ Gyr, while galactic haloes remained stable during the same period. We suggest that the relatively younger and more active state explains the various observations of cluster haloes, especially in central regions.
Submitted 9 August, 2024;
originally announced August 2024.
-
Text-Guided Video Masked Autoencoder
Authors:
David Fan,
Jue Wang,
Shuai Liao,
Zhikang Zhang,
Vimal Bhat,
Xinyu Li
Abstract:
Recent video masked autoencoder (MAE) works have designed improved masking algorithms focused on saliency. These works leverage visual cues such as motion to mask the most salient regions. However, the robustness of such visual cues depends on how often input videos match underlying assumptions. On the other hand, natural language description is an information dense representation of video that implicitly captures saliency without requiring modality-specific assumptions, and has not been explored yet for video MAE. To this end, we introduce a novel text-guided masking algorithm (TGM) that masks the video regions with highest correspondence to paired captions. Without leveraging any explicit visual cues for saliency, our TGM is competitive with state-of-the-art masking algorithms such as motion-guided masking. To further benefit from the semantics of natural language for masked reconstruction, we next introduce a unified framework for joint MAE and masked video-text contrastive learning. We show that across existing masking algorithms, unifying MAE and masked video-text contrastive learning improves downstream performance compared to pure MAE on a variety of video recognition tasks, especially for linear probe. Within this unified framework, our TGM achieves the best relative performance on five action recognition and one egocentric datasets, highlighting the complementary nature of natural language for masked video modeling.
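A rough sketch of masking the regions with the highest correspondence to a caption follows. It is not the authors' implementation: the function name `text_guided_mask`, the flat list of per-patch similarity scores, and the 50% mask ratio are assumptions for illustration, and how the similarities are computed is left out entirely.

```python
def text_guided_mask(similarities, mask_ratio):
    """Return a boolean mask that hides the patches most similar to the caption.

    similarities: one patch-to-caption score per video patch (assumed precomputed).
    mask_ratio:   fraction of patches to mask, between 0 and 1.
    """
    n_mask = int(round(len(similarities) * mask_ratio))
    # Rank patch indices from highest to lowest similarity.
    ranked = sorted(range(len(similarities)), key=lambda i: similarities[i], reverse=True)
    masked = set(ranked[:n_mask])
    return [i in masked for i in range(len(similarities))]


# Hypothetical scores for six patches.
scores = [0.1, 0.8, 0.3, 0.9, 0.2, 0.7]
mask = text_guided_mask(scores, mask_ratio=0.5)
print(mask)  # [False, True, False, True, False, True]
```

The masked patches (True) would be the ones the autoencoder is asked to reconstruct, so the reconstruction target concentrates on the regions the caption describes.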
Submitted 1 August, 2024;
originally announced August 2024.
-
Evidence of electron interaction with an unidentified bosonic mode in superconductor CsCa$_2$Fe$_4$As$_4$F$_2$
Authors:
Peng Li,
Sen Liao,
Zhicheng Wang,
Huaxun Li,
Shiwu Su,
Jiakang Zhang,
Ziyuan Chen,
Zhicheng Jiang,
Zhengtai Liu,
Lexian Yang,
Linwei Huai,
Junfeng He,
Shengtao Cui,
Zhe Sun,
Yajun Yan,
Guanghan Cao,
Dawei Shen,
Juan Jiang,
Donglai Feng
Abstract:
A kink structure in the band dispersion usually indicates a certain electron-boson interaction, which is crucial in understanding the pairing in unconventional superconductors. Here we report evidence of a kink structure in the Fe-based superconductor CsCa$_2$Fe$_4$As$_4$F$_2$ using angle-resolved photoemission spectroscopy. The kink shows orbital-selective and momentum-dependent behavior: it is located 15 meV below the Fermi level along the Gamma-M direction in the band with dxz orbital character and vanishes when approaching the Gamma-X direction, correlated with a slight decrease of the superconducting gap. Most importantly, this kink structure disappears when the superconducting gap closes, indicating that the corresponding bosonic mode (9 meV) is closely related to superconductivity. However, the origin of this mode remains unidentified, since it cannot be related to phonons or the spin resonance mode (15 meV) observed by inelastic neutron scattering. The behavior of this mode is rather unique and challenges our present understanding of the superconducting pairing mechanism of the bilayer FeAs-based superconductors.
Submitted 1 August, 2024;
originally announced August 2024.
-
RDP: Ranked Differential Privacy for Facial Feature Protection in Multiscale Sparsified Subspace
Authors:
Lu Ou,
Shaolin Liao,
Shihui Gao,
Guandong Huang,
Zheng Qi
Abstract:
With the widespread sharing of personal face images in applications' public databases, face recognition systems face a real threat of being breached by potential adversaries who are able to access users' face images and use them to intrude into the face recognition systems. In this paper, we propose a novel privacy protection method in the multiscale sparsified feature subspaces to protect sensitive facial features, by taking care of the influence or weight ranked feature coefficients on the privacy budget, named "Ranked Differential Privacy (RDP)". After the multiscale feature decomposition, lightweight Laplacian noise is added to the dimension-reduced sparsified feature coefficients according to the geometric superposition method. Then, we rigorously prove that the RDP satisfies Differential Privacy. After that, the nonlinear Lagrange Multiplier (LM) method is formulated for the constrained optimization problem of maximizing the utility (the visualization quality) of face images protected with sanitizing noise, under a given facial-feature privacy budget. Then, two methods are proposed to solve the nonlinear LM problem and obtain the optimal noise scale parameters: 1) the analytical Normalization Approximation (NA) method with an identical average noise scale parameter for real-time online applications; and 2) the LM optimization Gradient Descent (LMGD) numerical method to obtain the nonlinear solution through iterative updating for more accurate offline applications. Experimental results on two real-world datasets show that our proposed RDP outperforms other state-of-the-art methods: at a privacy budget of 0.2, the PSNR (Peak Signal-to-Noise Ratio) of the RDP is about 10 dB higher than (10 times as high as) the highest PSNR of all compared methods.
Submitted 1 August, 2024;
originally announced August 2024.
-
Utility of High-Order Scheme for Unsteady Flow Simulations: Comparison with Second-Order Tool
Authors:
Peng Jiang,
Yichen Huang,
Yong Cao,
Shijun Liao,
Bin Xie
Abstract:
The objective of this work is to investigate the utility and effectiveness of the high-order scheme for simulating unsteady turbulent flows. To this end, the studies were conducted from two perspectives: (i) the ability of different numerical schemes to handle turbulence problems on the same set of meshes; and (ii) the accuracy and stability of higher-order schemes for solving turbulence statistics for different mesh types (hexahedral, tetrahedral, and polyhedral cells). The simulations employ the third-order scheme for spatial discretization of the governing equations, while a widely used second-order solver, namely pisoFoam, was employed for comparison. This study considers the canonical cases of the Taylor-Green vortex (TGV) problem at Re=100 and 1600 and flow past a sphere at Re=10000 to address the aforementioned two key issues. For the TGV case, the high-order model significantly improves the numerical accuracy and convergence rates and reduces the numerical dissipation to nearly 1/10 of that of pisoFoam. In the latter case, the high-order scheme with large-eddy simulation (LES) accurately predicts the vortex structures and the flow instability, regardless of grid type. However, pisoFoam is found to be sensitive to mesh types, which results in numerous non-physical structures in the flow field due to numerical noise rather than flow physics, particularly for tetrahedral cells. Furthermore, for the typical low- and high-order flow statistics, the numerical results predicted by the present model show better agreement with the reference data and have less dependence on the type of grids compared with the conventional scheme. In addition, the energy spectrum obtained by the high-order solver accurately captures the Kelvin-Helmholtz (K-H) instability and the vortex shedding frequency, while these important features are less pronounced with the traditional low-order model.
Submitted 29 July, 2024;
originally announced July 2024.