Search | arXiv e-print repository

Spark-TTS: An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens

Authors: Xinsheng Wang, Mingqi Jiang, Ziyang Ma, Ziyu Zhang, Songxiang Liu, Linqin Li, Zheng Liang, Qixi Zheng, Rui Wang, Xiaoqin Feng, Weizhen Bian, Zhen Ye, Sitong Cheng, Ruibin Yuan, Zhixian Zhao, Xinfa Zhu, Jiahao Pan, Liumeng Xue, Pengcheng Zhu, Yunlin Chen, Zhifei Li, Xie Chen, Lei Xie, Yike Guo, Wei Xue

Abstract: Recent advancements in large language models (LLMs) have driven significant progress in zero-shot text-to-speech (TTS) synthesis. However, existing foundation models rely on multi-stage processing or complex architectures for predicting multiple codebooks, limiting efficiency and integration flexibility. To overcome these challenges, we introduce Spark-TTS, a novel system powered by BiCodec, a single-stream speech codec that decomposes speech into two complementary token types: low-bitrate semantic tokens for linguistic content and fixed-length global tokens for speaker attributes. This disentangled representation, combined with the Qwen2.5 LLM and a chain-of-thought (CoT) generation approach, enables both coarse-grained control (e.g., gender, speaking style) and fine-grained adjustments (e.g., precise pitch values, speaking rate). To facilitate research in controllable TTS, we introduce VoxBox, a meticulously curated 100,000-hour dataset with comprehensive attribute annotations. Extensive experiments demonstrate that Spark-TTS not only achieves state-of-the-art zero-shot voice cloning but also generates highly customizable voices that surpass the limitations of reference-based synthesis. Source code, pre-trained models, and audio samples are available at https://github.com/SparkAudio/Spark-TTS.

Submitted 3 March, 2025; originally announced March 2025.

Comments: Submitted to ACL 2025

arXiv:2502.17879 [pdf]

Dual Classification Head Self-training Network for Cross-scene Hyperspectral Image Classification

Authors: Rong Liu, Junye Liang, Jiaqi Yang, Jiang He, Peng Zhu

Abstract: Due to the difficulty of obtaining labeled data for hyperspectral images (HSIs), cross-scene classification has emerged as a widely adopted approach in the remote sensing community. It involves training a model using labeled data from a source domain (SD) and unlabeled data from a target domain (TD), followed by inference on the TD. However, variations in the reflectance spectrum of the same object between the SD and the TD, as well as differences in the feature distribution of the same land cover class, pose significant challenges to the performance of cross-scene classification. To address this issue, we propose a dual classification head self-training network (DHSNet). This method aligns class-wise features across domains, ensuring that the trained classifier can accurately classify TD data of different classes. We introduce a dual classification head self-training strategy for the first time in the cross-scene HSI classification field. The proposed approach mitigates the domain gap while preventing the accumulation of incorrect pseudo-labels in the model. Additionally, we incorporate a novel central feature attention mechanism to enhance the model's capacity to learn scene-invariant features across domains. Experimental results on three cross-scene HSI datasets demonstrate that the proposed DHSNet significantly outperforms other state-of-the-art approaches. The code for DHSNet will be available at https://github.com/liurongwhm.

Submitted 25 February, 2025; originally announced February 2025.

arXiv:2502.17141 [pdf, other]

The Starburst Acceleration of High-Velocity Clouds in the Galactic Center

Authors: Mengfei Zhang, Miao Li, Peixin Zhu

Abstract: High-velocity clouds (HVCs) in the Galactic center have garnered significant attention due to their mysterious formation, potentially linked to starburst events or supermassive black hole activity in the region. However, it remains challenging to explain the observed column density and velocity distribution of HVCs. The discovery of high-velocity molecular clouds (HVMCs), which are denser and more massive, adds to this complexity. To address this, we conduct three-dimensional numerical simulations to explore the origin and magneto-hydrodynamic evolution of HVCs in the context of a starburst in the Galactic center. By incorporating magnetic fields and an initial tangential velocity for the clouds, our simulation results align with the observed properties of HVCs, supporting the notion that these clouds can originate from a starburst process. In addition, ~5% of the total mass of the initial clouds can survive after 3.5 Myr; as a result, the subsequent star formation will be more efficient than in a feedback process that destroys all cool clouds.

Submitted 24 February, 2025; originally announced February 2025.

Comments: 15 pages, 6 figures, accepted by ApJ

arXiv:2502.16236 [pdf, ps, other]

doi 10.1063/5.0267186

Imaging the photochemical dynamics of cyclobutanone with MeV ultrafast electron diffraction

Authors: Tianyu Wang, Hui Jiang, Cheng Jin, Xiao Zou, Pengfei Zhu, Tao Jiang, Feng He, Dao Xiang

Abstract: We study the photoinduced chemical dynamics of cyclobutanone upon excitation at 200 nm to the 3s Rydberg state using MeV ultrafast electron diffraction (UED). We observe both the elastic scattering signal, which contains information about the structural dynamics, and the inelastic scattering signal, which encodes information about the electronic state. Our results suggest a sub-picosecond timescale for the photodissociation dynamics, and an excited state lifetime of about 230 femtoseconds. The dissociation is found to be dominated by the C3 channel, where cyclopropane and CO are produced. The branching ratio of the C3 channel to the C2 channel, where ethene and ketene are produced, is estimated to be approximately 5:3. Our data suggest that the C3 and C2 channels account for approximately 80% of the photoproducts, with the remaining 20% exhibiting ring-opened structures. It is found that the timescale associated with the dissociation process in the C2 channel is shorter than that in the C3 channel. Leveraging the enhanced temporal resolution of MeV UED, our results provide a real-time mapping of the nuclear wavepacket dynamics, capturing the complete photochemical dynamics from the S2 minimum through the S1/S0 conical intersection, and finally to the dissociation. Our experimental results provide new insights into the Norrish Type I reaction and can be used to benchmark non-adiabatic dynamics simulations.

Submitted 22 February, 2025; originally announced February 2025.

Journal ref: J. Chem. Phys. 162, 184201 (2025)
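
The product fractions quoted in the abstract above follow from simple arithmetic. The short Python sketch below works it through, assuming that the 5:3 branching ratio applies within the roughly 80% of photoproducts attributed to the C3 and C2 channels. The function name `channel_fractions` and its parameters are hypothetical and are not taken from the paper.

```python
from fractions import Fraction


def channel_fractions(ratio_c3, ratio_c2, dissociated_share):
    """Split the dissociated share between two channels by their ratio.

    ratio_c3, ratio_c2: integer parts of the branching ratio (assumed 5 and 3).
    dissociated_share: fraction of photoproducts in the C3 + C2 channels
                       (assumed 4/5, i.e. about 80%).
    Returns (c3, c2, ring_opened) as exact fractions of all photoproducts.
    """
    total = ratio_c3 + ratio_c2
    c3 = dissociated_share * Fraction(ratio_c3, total)
    c2 = dissociated_share * Fraction(ratio_c2, total)
    ring_opened = 1 - dissociated_share
    return c3, c2, ring_opened


c3, c2, ring_opened = channel_fractions(5, 3, Fraction(4, 5))
print(float(c3), float(c2), float(ring_opened))
```

Under these assumptions the C3 channel would account for about half of all photoproducts, the C2 channel for about 30%, and ring-opened structures for the remaining 20%.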

arXiv:2502.14506 [pdf, other]

Enhanced dynamo drive for the sawtooth relaxation process due to non-uniform resistivity distribution in a reversed field pinch

Authors: Wentan Yan, Ping Zhu, Hong Li, Wandong Liu, Bing Luo, Haolong Li

Abstract: In this work, we use the three-dimensional resistive MHD code NIMROD to investigate the impact of resistivity inhomogeneity on the sawtooth process of a reversed field pinch (RFP) plasma. The simulation employs a non-uniform resistivity profile similar to experiments, which monotonically increases from the core to the edge as the temperature decreases. The resistivity inhomogeneity introduces an additional electric field in the plasma, which accelerates the inward diffusion of magnetic flux and changes the self-sustained reversal state, and hence significantly enhances the dynamo effect and the sawtooth process in the RFP plasma.

Submitted 20 February, 2025; originally announced February 2025.

arXiv:2502.14332 [pdf, other]

A Collaborative Jade Recognition System for Mobile Devices Based on Lightweight and Large Models

Authors: Zhenyu Wang, Wenjia Li, Pengyu Zhu

Abstract: With the widespread adoption and development of mobile devices, vision-based recognition applications have become a hot topic in research. Jade, as an important cultural heritage and artistic item, has significant applications in fields such as jewelry identification and cultural relic preservation. However, existing jade recognition systems still face challenges in mobile implementation, such as limited computing resources, real-time requirements, and accuracy issues. To address these challenges, this paper proposes a jade recognition system based on size model collaboration, aiming to achieve efficient and accurate jade identification using mobile devices such as smartphones. First, we design a size model based on multi-scale image processing, extracting key visual information by analyzing jade's dimensions, shapes, and surface textures. Then, a collaborative multi-model classification framework is built by combining deep learning and traditional computer vision algorithms. This framework can effectively select and adjust models based on different jade characteristics, providing high-accuracy results across various environments and devices. Experimental results show that the proposed system can provide high recognition accuracy and fast processing time on mobile devices, while consuming relatively low computational resources. The system not only holds great application potential but also provides new ideas and technical support for the intelligent development of jade identification.

Submitted 20 February, 2025; originally announced February 2025.

arXiv:2502.13546 [pdf, other]

Power dependence of density limit due to plasma-wall interaction in a burning plasma

Authors: Jiaxing Liu, Ping Zhu, Dominique Franck Escande

Abstract: The density limit is one of the major obstacles to achieving the desired fusion performance in tokamaks. However, the underlying physics mechanism for its recently observed power dependence in experiments has not been well understood or predicted in theory. In this work, power-dependent scalings of the density limit are obtained based on the plasma-wall self-organization (PWSO) model [D.F. Escande 2022 NF], and these scalings are able to match the power dependence of density limits in multiple tokamak devices. The key factors influencing the power dependence are found to be the plasma-wall sputtering and the particle confinement time. The effects of non-sputtered impurities and fusion products are further evaluated. This PWSO density limit model is then extended to the burning plasma regime and used to predict the conditions for entering burning plasma.

Submitted 19 February, 2025; originally announced February 2025.

Comments: 17 pages, 7 figures

arXiv:2502.12575 [pdf, other]

DemonAgent: Dynamically Encrypted Multi-Backdoor Implantation Attack on LLM-based Agent

Authors: Pengyu Zhu, Zhenhong Zhou, Yuanhe Zhang, Shilinlu Yan, Kun Wang, Sen Su

Abstract: As LLM-based agents become increasingly prevalent, backdoors can be implanted into agents through user queries or environment feedback, raising critical concerns regarding safety vulnerabilities. However, backdoor attacks are typically detectable by safety audits that analyze the reasoning process of agents. To this end, we propose a novel backdoor implantation strategy called Dynamically Encrypted Multi-Backdoor Implantation Attack. Specifically, we introduce dynamic encryption, which maps the backdoor into benign content, effectively circumventing safety audits. To enhance stealthiness, we further decompose the backdoor into multiple sub-backdoor fragments. With these techniques, backdoors can largely bypass safety audits. Additionally, we present AgentBackdoorEval, a dataset designed for the comprehensive evaluation of agent backdoor attacks. Experimental results across multiple datasets demonstrate that our method achieves an attack success rate nearing 100% while maintaining a detection rate of 0%, illustrating its effectiveness in evading safety audits. Our findings highlight the limitations of existing safety mechanisms in detecting advanced attacks, underscoring the urgent need for more robust defenses against backdoor threats. Code and data are available at https://github.com/whfeLingYu/DemonAgent.

Submitted 18 February, 2025; originally announced February 2025.

arXiv:2502.11370 [pdf, other]

HI-GVF: Shared Control based on Human-Influenced Guiding Vector Fields for Human-multi-robot Cooperation

Authors: Pengming Zhu, Zongtan Zhou, Weijia Yao, Wei Dai, Zhiwen Zeng, Huimin Lu

Abstract: Human-multi-robot shared control leverages human decision-making and robotic autonomy to enhance human-robot collaboration. While widely studied, existing systems often adopt a leader-follower model, limiting robot autonomy to some extent. Besides, a human is required to directly participate in the motion control of robots through teleoperation, which significantly burdens the operator. To alleviate these two issues, we propose a layered shared control computing framework using human-influenced guiding vector fields (HI-GVF) for human-robot collaboration. HI-GVF guides the multi-robot system along a desired path specified by the human. Then, an intention field is designed to merge the human and robot intentions, accelerating the propagation of the human intention within the multi-robot system. Moreover, we give the stability analysis of the proposed model and use collision avoidance based on safety barrier certificates to fine-tune the velocity. Finally, taking a firefighting task as an example scenario, we conduct simulations and experiments using multiple human-robot interfaces (brain-computer interface, myoelectric wristband, eye-tracking), and the results demonstrate that our proposed approach boosts the effectiveness and performance of the task.

Submitted 16 February, 2025; originally announced February 2025.

arXiv:2502.05170 [pdf, other]

Observation of a dynamic magneto-chiral instability in photoexcited tellurium

Authors: Yijing Huang, Nick Abboud, Yinchuan Lv, Penghao Zhu, Azel Murzabekova, Changjun Lee, Emma A. Pappas, Dominic Petruzzi, Jason Y. Yan, Dipanjan Chauduri, Peter Abbamonte, Daniel P. Shoemaker, Rafael M. Fernandes, Jorge Noronha, Fahad Mahmood

Abstract: In a system of charged chiral fermions driven out of equilibrium, an electric current parallel to the magnetic field can generate a dynamic instability by which electromagnetic waves become amplified. Whether a similar instability can occur in chiral solid-state systems remains an open question. Using time-domain terahertz (THz) emission spectroscopy, we detect signatures of what we dub a "dynamic magneto-chiral instability" in elemental tellurium, a structurally chiral crystal. Upon transient photoexcitation in a moderate external magnetic field, tellurium emits THz radiation consisting of coherent modes that amplify over time. An explanation for this amplification is proposed using a theoretical model based on a dynamic instability of electromagnetic waves interacting with infrared-active oscillators of impurity acceptor states in tellurium to form an amplifying polariton. Our work not only uncovers the presence of a magneto-chiral instability but also highlights its promise for THz-wave amplification in chiral materials.

Submitted 7 February, 2025; originally announced February 2025.

Comments: Supplementary Information (SI) available as a PDF in the TeX source

arXiv:2502.02690 [pdf, ps, other]

Controllable Video Generation with Provable Disentanglement

Authors: Yifan Shen, Peiyuan Zhu, Zijian Li, Shaoan Xie, Zeyu Tang, Namrata Deka, Zongfang Liu, Guangyi Chen, Kun Zhang

Abstract: Controllable video generation remains a significant challenge, despite recent advances in generating high-quality and consistent videos. Most existing methods for controlling video generation treat the video as a whole, neglecting intricate fine-grained spatiotemporal relationships, which limits both control precision and efficiency. In this paper, we propose Controllable Video Generative Adversarial Networks (CoVoGAN) to disentangle the video concepts, thus facilitating efficient and independent control over individual concepts. Specifically, following the minimal change principle, we first disentangle static and dynamic latent variables. We then leverage the sufficient change property to achieve component-wise identifiability of dynamic latent variables, enabling disentangled control of video generation. To establish the theoretical foundation, we provide a rigorous analysis demonstrating the identifiability of our approach. Building on these theoretical insights, we design a Temporal Transition Module to disentangle latent dynamics. To enforce the minimal change principle and sufficient change property, we minimize the dimensionality of latent dynamic variables and impose temporal conditional independence. To validate our approach, we integrate this module as a plug-in for GANs. Extensive qualitative and quantitative experiments on various video generation benchmarks demonstrate that our method significantly improves generation quality and controllability across diverse real-world scenarios.

Submitted 24 June, 2025; v1 submitted 4 February, 2025; originally announced February 2025.

arXiv:2501.16767 [pdf, other]

Target-driven Self-Distillation for Partial Observed Trajectories Forecasting

Authors: Pengfei Zhu, Peng Shu, Mengshi Qi, Liang Liu, Huadong Ma

Abstract: Accurate prediction of future trajectories of traffic agents is essential for ensuring safe autonomous driving. However, partially observed trajectories can significantly degrade the performance of even state-of-the-art models. Previous approaches often rely on knowledge distillation to transfer features from fully observed trajectories to partially observed ones. This involves first training a fully observed model and then using a distillation process to create the final model. While effective, these approaches require multi-stage training, making the training process very expensive. Moreover, knowledge distillation can lead to a performance degradation of the model. In this paper, we introduce a Target-driven Self-Distillation method (TSD) for motion forecasting. Our method leverages accurately predicted targets to guide the model in making predictions under partial observation conditions. By employing self-distillation, the model learns from the feature distributions of both fully observed and partially observed trajectories during a single end-to-end training process. This enhances the model's ability to predict motion accurately in both fully observed and partially observed scenarios. We evaluate our method on multiple datasets and state-of-the-art motion forecasting models. Extensive experimental results demonstrate that our approach achieves significant performance improvements in both settings. To facilitate further research, we will release our code and model checkpoints.

Submitted 28 January, 2025; originally announced January 2025.

arXiv:2501.16459 [pdf, other]

doi 10.3847/1538-4357/adaf1c

SQuIGG$\vec{L}$E: Observational Evidence of Low Ongoing Star Formation Rates in Gas-Rich Post-Starburst Galaxies

Authors: Pengpei Zhu, Katherine A. Suess, Mariska Kriek, David J. Setton, Rachel Bezanson, Vincenzo Donofrio, Robert Feldmann, Andy D. Goulding, Jenny E. Greene, Desika Narayanan, Justin Spilker

Abstract: ALMA observations have shown that candidate "post-starburst" galaxies (PSBs) at z$\sim$0.6 can retain significant molecular gas reservoirs. These results would imply that -- unlike many model predictions -- galaxies can shut down their star formation before their cold gas reservoirs are depleted. However, these studies inferred star formation rates (SFRs) either from [O II] line fluxes or from spectral energy distribution modeling, and could have missed large dust-obscured contributions to the SFRs. In this study, we present Keck/NIRES observations of 13 massive ($\mathrm{M_*}\gtrsim \times 10^{11} \,\, \mathrm{M_\odot}$) PSBs, which allow us to estimate $\mathrm{Hα}$ SFRs in these gas-rich post-starburst galaxies. We confirm the previously inferred low SFRs for the majority of the sample: 11/13 targets show clear $\mathrm{Hα}$ absorption, with minimal infilling indicating dust-corrected SFRs of $<4.1 \,\mathrm{M_\odot\, yr^{-1}}$. These SFRs are notably low given the large $\mathrm{H_2}$ reservoirs ($\sim 1-5 \times 10^{10} \,\, \mathrm{M_\odot}$) present in 5/13 of these galaxies, placing them significantly offset from the Kennicutt-Schmidt relation for star-forming galaxies. The [N II]/H$α$ ratios of all 13 PSBs imply contributions from non-star-forming ionization mechanisms (e.g., AGN, shocks, or hot evolved stars) to their $\mathrm{Hα}$ emission, suggesting that even these low ongoing SFRs may be overestimated. These low $\mathrm{Hα}$ SFRs, dust-corrected using A$_v$ estimates from SED fitting, confirm that these galaxies are very likely quiescent and, thus, that galaxies can quench before their cold gas reservoirs are fully depleted.

Submitted 27 January, 2025; originally announced January 2025.

Comments: 12 pages, 4 figures, 1 table. Accepted by the Astrophysical Journal (ApJ)

Journal ref: ApJ 981 60 (2025)

arXiv:2501.15045 [pdf, other]

Towards Robust Unsupervised Attention Prediction in Autonomous Driving

Authors: Mengshi Qi, Xiaoyang Bi, Pengfei Zhu, Huadong Ma

Abstract: Robustly predicting attention regions of interest for self-driving systems is crucial for driving safety but presents significant challenges due to the labor-intensive nature of obtaining large-scale attention labels and the domain gap between self-driving scenarios and natural scenes. These challenges are further exacerbated by complex traffic environments, including camera corruption under adverse weather, noise interference, and central bias from long-tail distributions. To address these issues, we propose a robust unsupervised attention prediction method. An Uncertainty Mining Branch refines predictions by analyzing commonalities and differences across multiple pre-trained models on natural scenes, while a Knowledge Embedding Block bridges the domain gap by incorporating driving knowledge to adaptively enhance pseudo-labels. Additionally, we introduce RoboMixup, a novel data augmentation method that improves robustness against corruption through soft attention and dynamic augmentation, and mitigates central bias by integrating random cropping into Mixup as a regularizer. To systematically evaluate robustness in self-driving attention prediction, we introduce the DriverAttention-C benchmark, comprising over 100k frames across three subsets: BDD-A-C, DR(eye)VE-C, and DADA-2000-C. Our method achieves performance equivalent to or surpassing fully supervised state-of-the-art approaches on three public datasets and the proposed robustness benchmark, reducing relative corruption degradation by 58.8% and 52.8%, and improving central bias robustness by 12.4% and 11.4% in KLD and CC metrics, respectively. Code and data are available at https://github.com/zaplm/DriverAttention.

Submitted 28 January, 2025; v1 submitted 24 January, 2025; originally announced January 2025.

arXiv:2501.12648 [pdf]

doi 10.1103/PhysRevB.111.L041403

Engineering nonlinear Hall effect in bilayer graphene/black phosphorus heterostructures

Authors: Xing-Guo Ye, Zhen-Tao Zhang, Peng-Fei Zhu, Wen-Zheng Xu, An-Qi Wang, Zhi-Min Liao

Abstract: Two-dimensional van der Waals materials offer a highly tunable platform for generating emergent quantum phenomena through symmetry breaking. Stacking-induced symmetry breaking at interfaces provides an effective method to modulate their electronic properties for functional devices. Here, we strategically stack bilayer graphene with black phosphorus, a low-symmetry semiconductor, to break the symmetries and induce the nonlinear Hall effect (NLHE) that can persist up to room temperature. Intriguingly, we find that the NLHE undergoes sign reversals as the electrical displacement field is varied at fixed carrier density. The scaling analysis reveals that the sign reversal of the NLHE arises from both the Berry curvature dipole (BCD) and extrinsic scatterings. The displacement field-induced sign reversal of the BCD indicates asymmetric distributions of Berry curvature hot spots across different Fermi pockets in bilayer graphene. Our findings suggest that symmetry engineering of van der Waals heterostructures is promising for room-temperature applications based on nonlinear quantum devices, such as high-frequency rectifiers and wireless charging.

Submitted 22 January, 2025; originally announced January 2025.

Journal ref: Phys. Rev. B 111, L041403 (2025)

arXiv:2501.04085 [pdf, other]

The Cosmic Evolution Early Release Science Survey (CEERS)

Authors: Steven L. Finkelstein, Micaela B. Bagley, Pablo Arrabal Haro, Mark Dickinson, Henry C. Ferguson, Jeyhan S. Kartaltepe, Dale D. Kocevski, Anton M. Koekemoer, Jennifer M. Lotz, Casey Papovich, Pablo G. Perez-Gonzalez, Nor Pirzkal, Rachel S. Somerville, Jonathan R. Trump, Guang Yang, L. Y. Aaron Yung, Adriano Fontana, Andrea Grazian, Norman A. Grogin, Lisa J. Kewley, Allison Kirkpatrick, Rebecca L. Larson, Laura Pentericci, Swara Ravindranath, Stephen M. Wilkins, et al. (74 additional authors not shown)

Abstract: We present the Cosmic Evolution Early Release Science (CEERS) Survey, a 77.2 hour Director's Discretionary Early Release Science Program. CEERS demonstrates, tests, and validates efficient extragalactic surveys using coordinated, overlapping parallel observations with the JWST instrument suite, including NIRCam and MIRI imaging, NIRSpec low (R~100) and medium (R~1000) resolution spectroscopy, and NIRCam slitless grism (R~1500) spectroscopy. CEERS targets the Hubble Space Telescope-observed region of the Extended Groth Strip (EGS) field, supported by a rich set of multiwavelength data. CEERS facilitated immediate community science in both of the extragalactic core JWST science drivers "First Light" and "Galaxy Assembly," including: 1) The discovery and characterization of large samples of galaxies at z >~ 10 from ~90 arcmin^2 of NIRCam imaging, constraining their abundance and physical nature; 2) Deep spectra of >1000 galaxies, including dozens of galaxies at 6<z<10, enabling redshift measurements and constraints on the physical conditions of star-formation and black hole growth via line diagnostics; 3) Quantifying the first bulge, bar and disk structures at z>3; and 4) Characterizing galaxy mid-IR emission with MIRI to study dust-obscured star-formation and supermassive black hole growth at z~1-3. As a legacy product for the community, the CEERS team has provided several data releases, accompanied by detailed notes on the data reduction procedures and notebooks to aid in reproducibility. In addition to an overview of the survey and quality of the data, we provide science highlights from the first two years with CEERS data.

Submitted 7 January, 2025; originally announced January 2025.

Comments: 38 pages, 13 figures, 6 tables

arXiv:2501.01862 [pdf, other]

doi 10.3847/1538-4357/ad93c9

Glitches and glitching clusters in rotation-powered pulsars

Authors: Pei-Xin Zhu, Xiao-Ping Zheng

Abstract: The study of pulsar glitch phenomena serves as a valuable probe into the dynamic properties of matter under extreme high-density conditions, offering insights into the physics within neutron stars. Providing theoretical explanations for the diverse manifestations observed in different pulsars has proven to be a formidable challenge. By analyzing the distribution of glitch sizes and waiting times, along with the evolution of cumulative glitch sizes over time, we have uncovered a long-term clustering phenomenon for pulsar glitches. This perspective allows us to approach the distinct glitch representations in various pulsars from a unified standpoint, connecting the same periodicity of observational data to the randomness. Without relying on specific physical models, we used the coefficient of variation to numerically determine optimal clustering numbers and clustering periods for sample pulsars. Our analysis of 27 pulsars has revealed a clear linear relationship between the glitch cluster period and characteristic age. Of interest, the cumulative distribution functions of cluster sizes and interval times have the same patterns, which can be synchronously fitted by Gaussian processes. These results may indicate novel understandings of glitches and the resulting processes.

Submitted 6 January, 2025; v1 submitted 3 January, 2025; originally announced January 2025.

Journal ref: The Astrophysical Journal, 978:49 (13pp), 2025 January 01

arXiv:2501.01240 [pdf, other]

Asymmetric Reinforcing against Multi-modal Representation Bias

Authors: Xiyuan Gao, Bing Cao, Pengfei Zhu, Nannan Wang, Qinghua Hu

Abstract: The strength of multimodal learning lies in its ability to integrate information from various sources, providing rich and comprehensive insights. However, in real-world scenarios, multimodal systems often face the challenge of dynamic modality contributions: the dominance of different modalities may change with the environment, leading to suboptimal performance in multimodal learning. Current methods mainly enhance weak modalities to balance multimodal representation bias, which inevitably optimizes from a partial-modality perspective and easily degrades performance on dominant modalities. To address this problem, we propose an Asymmetric Reinforcing method against Multimodal representation bias (ARM). Our ARM dynamically reinforces the weak modalities while maintaining the ability to represent dominant modalities through conditional mutual information. Moreover, we provide an in-depth analysis showing that optimizing only certain modalities could cause information loss and prevent leveraging the full advantages of multimodal data. By exploring the dominance and narrowing the contribution gaps between modalities, we have significantly improved the performance of multimodal learning, making notable progress in mitigating imbalanced multimodal learning.

Submitted 2 January, 2025; originally announced January 2025.

Comments: Accepted by AAAI 2025

arXiv:2412.20003 [pdf, other]

doi 10.1007/JHEP03(2025)207

Revisiting CMSSM with Non-Universal Gaugino Masses under Current Constraints

Authors: Yabo Dong, Kun Wang, Hailong Yuan, Jingya Zhu, Pengxuan Zhu

Abstract: To address the longstanding tension between the Constrained Minimal Supersymmetric Standard Model (CMSSM) and recent experimental data, we investigate non-universal gaugino masses within an SU(5) Grand Unified Theory (GUT) framework, focusing on the $\tilde{g}$-SUGRA scenario where $\lvert M_{3} \rvert \gg \lvert M_{1} \rvert, \lvert M_{2} \rvert$. This hierarchy enables a heavier gluino, thereby evading current experimental bounds on supersymmetric particles. Our analysis reveals that precise Higgs measurements place stringent constraints on the model, requiring $\tan\beta \gtrsim 5$ and $M_{0} \gtrsim 20 \, \tan\beta \, \text{GeV}$. Although the $\tilde{g}$-SUGRA scenario can help reconcile the persistent $(g-2)_\mu$ anomaly, the Higgs constraints significantly restrict its parameter space, making a large contribution to $(g-2)_\mu$ challenging. We also assess the discovery prospects in upcoming dark matter direct detection experiments, including PandaX-xT (200 t.y.), LZ (projected), and XENONnT (20 t.y.), which may not fully cover the viable parameter space. In contrast, future collider experiments, such as the High-Luminosity LHC at $3\,\mathrm{ab}^{-1}$ and $\mathrm{CLIC}_{1500}$ at $2.5\,\mathrm{ab}^{-1}$, can comprehensively probe the remaining regions. These findings highlight $\tilde{g}$-SUGRA as a promising solution to the CMSSM tension and offer clear, testable predictions for upcoming collider searches.

Submitted 31 March, 2025; v1 submitted 27 December, 2024; originally announced December 2024.

Comments: 27 pages, 7 figures, 2 tables. Accepted by JHEP

Journal ref: JHEP 03 (2025) 207

arXiv:2412.19743 [pdf, other]

Flavor Physics at CEPC: a General Perspective

Authors: Xiaocong Ai, Wolfgang Altmannshofer, Peter Athron, Xiaozhi Bai, Lorenzo Calibbi, Lu Cao, Yuzhi Che, Chunhui Chen, Ji-Yuan Chen, Long Chen, Mingshui Chen, Shanzhen Chen, Xuan Chen, Shan Cheng, Cheng-Wei Chiang, Andreas Crivellin, Hanhua Cui, Olivier Deschamps, Sébastien Descotes-Genon, Xiaokang Du, Shuangshi Fang, Yu Gao, Li-Sheng Geng, Pablo Goldenzweig, Jiayin Gu, et al. (116 additional authors not shown)

Abstract: We discuss the landscape of flavor physics at the Circular Electron-Positron Collider (CEPC), based on the nominal luminosity outlined in its Technical Design Report. The CEPC is designed to operate in multiple modes to address a variety of tasks. At the $Z$ pole, the expected production of 4 Tera $Z$ bosons will provide unique and highly precise measurements of $Z$ boson couplings, while the substantial number of boosted heavy-flavored quarks and leptons produced in clean $Z$ decays will facilitate investigations into their flavor physics with unprecedented precision. We investigate the prospects of measuring various physics benchmarks and discuss their implications for particle theories and phenomenological models. Our studies indicate that, with its highlighted advantages and anticipated excellent detector performance, the CEPC can explore beauty and $τ$ physics in ways that are superior to or complementary with the Belle II and Large-Hadron-Collider-beauty experiments, potentially enabling the detection of new physics at energy scales of 10 TeV and above. This potential also extends to the observation of yet-to-be-discovered rare and exotic processes, as well as testing fundamental principles such as lepton flavor universality, lepton and baryon number conservation, etc., making the CEPC a vibrant platform for flavor physics research. The $WW$ threshold scan, Higgs-factory operation and top-pair productions of the CEPC further enhance its merits in this regard, especially for measuring the Cabibbo-Kobayashi-Maskawa matrix elements, and Flavor-Changing-Neutral-Current physics of Higgs boson and top quarks. We outline the requirements for detector performance and considerations for future development to achieve the anticipated scientific goals.

Submitted 31 December, 2024; v1 submitted 27 December, 2024; originally announced December 2024.

arXiv:2412.19015 [pdf, other]

Imperceptible Adversarial Attacks on Point Clouds Guided by Point-to-Surface Field

Authors: Keke Tang, Weiyao Ke, Weilong Peng, Xiaofei Wang, Ziyong Du, Zhize Wu, Peican Zhu, Zhihong Tian

Abstract: Adversarial attacks on point clouds are crucial for assessing and improving the adversarial robustness of 3D deep learning models. Traditional solutions strictly limit point displacement during attacks, making it challenging to balance imperceptibility with adversarial effectiveness. In this paper, we attribute the inadequate imperceptibility of adversarial attacks on point clouds to deviations from the underlying surface. To address this, we introduce a novel point-to-surface (P2S) field that adjusts adversarial perturbation directions by dragging points back to their original underlying surface. Specifically, we use a denoising network to learn the gradient field of the logarithmic density function encoding the shape's surface, and apply a distance-aware adjustment to perturbation directions during attacks, thereby enhancing imperceptibility. Extensive experiments show that adversarial attacks guided by our P2S field are more imperceptible, outperforming state-of-the-art methods.

Submitted 25 December, 2024; originally announced December 2024.

Comments: Accepted by ICASSP 2025

MSC Class: 68T07

arXiv:2412.18922 [pdf]

Numerical solutions of resistive finite-pressure magnetohydrodynamic equilibria for non-axisymmetric toroidal plasmas

Authors: Jian Zhang, Ping Zhu, Chris C. Hegna

Abstract: A hybrid spectral/finite-element code is developed to numerically solve the resistive finite-pressure magnetohydrodynamic equilibria without the necessity of postulating nested magnetic flux surfaces in the non-axisymmetric toroidal systems. The adopted approach integrates a hyperbolic parallel damping equation for pressure updating, along with a dynamic resistive relaxation for magnetic field. To address the nonaxisymmetry in toroidal geometry, a pseudo flux mapping is employed to relate the axisymmetric computational domain to the physical domain. On the computational mesh, an isoparametric C1-continuous triangular element is utilized to discretize the poloidal plane, which is complemented with a Fourier decomposition in the toroidal direction. The versatility of the code is demonstrated through its application to several different non-axisymmetric toroidal systems, including the inherently three-dimensional equilibria in stellarators, the helical-core equilibrium states in tokamak plasmas, and the quasi-single-helicity states in a reversed-field pinch.

Submitted 25 December, 2024; originally announced December 2024.

arXiv:2412.18365 [pdf, other]

Hypergraph Attacks via Injecting Homogeneous Nodes into Elite Hyperedges

Authors: Meixia He, Peican Zhu, Keke Tang, Yangming Guo

Abstract: Recent studies have shown that Hypergraph Neural Networks (HGNNs) are vulnerable to adversarial attacks. Existing approaches focus on hypergraph modification attacks guided by gradients, overlooking node spanning in the hypergraph and the group identity of hyperedges, thereby resulting in limited attack performance and detectable attacks. In this manuscript, we present a novel framework, i.e., Hypergraph Attacks via Injecting Homogeneous Nodes into Elite Hyperedges (IE-Attack), to tackle these challenges. Initially, utilizing the node spanning in the hypergraph, we propose the elite hyperedges sampler to identify hyperedges to be injected. Subsequently, a node generator utilizing Kernel Density Estimation (KDE) is proposed to generate the homogeneous node with the group identity of hyperedges. Finally, by injecting the homogeneous node into elite hyperedges, IE-Attack improves the attack performance and enhances the imperceptibility of attacks. Extensive experiments are conducted on five authentic datasets to validate the effectiveness of IE-Attack and the corresponding superiority to state-of-the-art methods.

Submitted 24 December, 2024; originally announced December 2024.

Comments: 9 pages, The 39th Annual AAAI Conference on Artificial Intelligence(2025)

arXiv:2412.18361 [pdf, ps, other]

On a generalized Monge-Ampère equation on closed almost Kähler surfaces

Authors: Ken Wang, Zuyi Zhang, Tao Zheng, Peng Zhu

Abstract: We show the existence and uniqueness of solutions to a generalized Monge-Ampère equation on closed almost Kähler surfaces, where the equation depends only on the underlying almost Kähler structure. As an application, we prove Donaldson's conjecture for tamed almost complex 4-manifolds.

Submitted 2 May, 2025; v1 submitted 24 December, 2024; originally announced December 2024.

MSC Class: 53D35; 53C56; 53C65; 32Q60

arXiv:2412.12149

MHSA: A Multi-scale Hypergraph Network for Mild Cognitive Impairment Detection via Synchronous and Attentive Fusion

Authors: Manman Yuan, Weiming Jia, Xiong Luo, Jiazhen Ye, Peican Zhu, Junlin Li

Abstract: The precise detection of mild cognitive impairment (MCI) is of significant importance in preventing the deterioration of patients in a timely manner. Although hypergraphs have enhanced performance by learning and analyzing brain networks, they often only depend on vector distances between features at a single scale to infer interactions. In this paper, we deal with a more arduous challenge, hypergraph modelling with synchronization between brain regions, and design a novel framework, i.e., A Multi-scale Hypergraph Network for MCI Detection via Synchronous and Attentive Fusion (MHSA), to tackle this challenge. Specifically, our approach employs the Phase-Locking Value (PLV) to calculate the phase synchronization relationship in the spectrum domain of regions of interest (ROIs) and designs a multi-scale feature fusion mechanism to integrate dynamic connectivity features of functional magnetic resonance imaging (fMRI) from both the temporal and spectrum domains. To evaluate and optimize the direct contribution of each ROI to phase synchronization in the temporal domain, we construct a dynamic adjustment strategy for the PLV coefficients, and the dynamic hypergraph is modelled based on a comprehensive temporal-spectrum fusion matrix. Experiments on the real-world dataset indicate the effectiveness of our strategy. The code is available at https://github.com/Jia-Weiming/MHSA.

Submitted 11 January, 2025; v1 submitted 10 December, 2024; originally announced December 2024.

Comments: The submission was made prematurely and will be resubmitted after further development

arXiv:2412.10087 [pdf, other]

Consensus-Based Dynamic Task Allocation for Multi-Robot System Considering Payloads Consumption

Authors: Xuekai Qiu, Pengming Zhu, Yiming Hu, Zhiwen Zeng, Huimin Lu

Abstract: This paper presents a consensus-based payload algorithm (CBPA) to deal with the decrease of robots' capabilities in multi-robot task allocation. During the execution of complex tasks, robots' capabilities can decrease as payloads are consumed, so that the robot coalition may no longer meet the tasks' requirements in real time. The proposed CBPA is an enhanced version of the consensus-based bundle algorithm (CBBA) and comprises two core phases: the payload bundle construction and consensus phases. In the payload bundle construction phase, CBPA introduces a payload assignment matrix to track the payloads carried by the robots and the demands of multi-robot tasks in real time. Then, robots share their respective payload assignment matrices in the consensus phase. These two phases are iterated to dynamically adjust the number of robots performing multi-robot tasks and the number of tasks each robot performs, and to obtain conflict-free results that ensure the robot coalition meets the demand and completes all tasks as quickly as possible. A physical experiment shows that CBPA is appropriate in complex and dynamic scenarios where robots need to collaborate and task requirements are tightly coupled to the robots' payloads. Numerical experiments show that CBPA has higher total task gains than CBBA.

Submitted 13 December, 2024; originally announced December 2024.

arXiv:2412.02938 [pdf]

doi 10.1103/PhysRevB.110.L201407

Nonlinear spin and orbital Edelstein effect in WTe2

Authors: Xing-Guo Ye, Peng-Fei Zhu, Wen-Zheng Xu, Tong-Yang Zhao, Zhi-Min Liao

Abstract: In materials with spin-momentum locked spin textures, such as Rashba states and topological surface states, the current-induced shift of the Fermi contour in the k space leads to spin polarization, known as the Edelstein effect, which depends linearly on the applied current. However, its nonlinear counterpart has not yet been discovered. Here, we report the observation of the nonlinear Edelstein effect in few-layer WTe2. Under a current bias, an out-of-plane magnetization is induced in WTe2, which is electrically probed using an Fe3GeTe2 electrode, a van der Waals ferromagnet with perpendicular magnetic anisotropy. Notably, with an applied ac at frequency ω, an induced magnetization with second-harmonic response at frequency 2ω is observed, and its magnitude demonstrates a quadratic dependence on the applied current, characteristic of the nonlinear Edelstein effect. This phenomenon is well explained by the current-induced orbital magnetization via the Berry connection polarizability tensors in WTe2. The orbital degree of freedom plays the primary role in the observed nonlinear Edelstein effect, that is, the nonlinear orbital Edelstein effect. This can, in turn, give rise to a nonlinear spin Edelstein effect through spin-orbit coupling.

Submitted 3 December, 2024; originally announced December 2024.

Comments: 29 pages

Journal ref: Physical Review B 110, L201407 (2024)

arXiv:2412.02491 [pdf]

doi 10.1103/PhysRevB.110.L100409

Facilitating field-free perpendicular magnetization switching with a Berry curvature dipole in a Weyl semimetal

Authors: Dong Li, Xing-Yu Liu, Xing-Guo Ye, Zhen-Cun Pan, Wen-Zheng Xu, Peng-Fei Zhu, An-Qi Wang, Kenji Watanabe, Takashi Taniguchi, Zhi-Min Liao

Abstract: We report the synergy between orbital and spin-orbit torques in WTe2/Fe3GeTe2 heterostructures characterized by a Berry curvature dipole. By applying a current along the a axis in WTe2, we detect an out-of-plane magnetization in the system, which we attribute to nonequilibrium orbital magnetization linked to the Berry curvature dipole based on first-principles calculations, manifesting as the orbital Edelstein effect. This effect generates orbital torques that enable field-free perpendicular magnetization switching. Furthermore, by applying a relatively small current along the a axis and a pulsed current along the b axis in WTe2, we demonstrate controllable field-free magnetization switching of the adjacent Fe3GeTe2 layer, independently manipulating the orbital and spin-orbit torques. Our findings not only enhance the understanding of the collaborative dynamics between these torques but also suggest potential applications in magnetoresistive random-access memory.

Submitted 3 December, 2024; originally announced December 2024.

Comments: 29 pages

Journal ref: Physical Review B 110, L100409 (2024)

arXiv:2412.00701 [pdf]

doi 10.3390/ma17225460

Superconductivity at Pd/Bi$_2$Se$_3$ Interfaces Due to Self-Formed PdBiSe Interlayers

Authors: Kaixuan Fan, Ze Hua, Siyao Gu, Peng Zhu, Guangtong Liu, Hechen Ren, Ruiwen Shao, Zhiwei Wang, Li Lu, Fan Yang

Abstract: Understanding the physical and chemical processes at the interface of metals and topological insulators is crucial for developing the next generation of topological quantum devices. Here we report the discovery of robust superconductivity in Pd/Bi$_2$Se$_3$ bilayers fabricated by sputtering Pd on the surface of Bi$_2$Se$_3$. Through transmission electron microscopy measurements, we identify that the observed interfacial superconductivity originates from the diffusion of Pd into Bi$_2$Se$_3$. In the diffusion region, Pd chemically reacts with Bi$_2$Se$_3$ and forms a layer of PdBiSe, a known superconductor with a bulk transition temperature of 1.5 K. Our work provides a method for introducing superconductivity into Bi$_2$Se$_3$, laying the foundation for developing sophisticated Bi$_2$Se$_3$-based topological devices.

Submitted 1 December, 2024; originally announced December 2024.

Journal ref: Materials 2024, 17(22), 5460

arXiv:2411.13056 [pdf, other]

Efficient Masked AutoEncoder for Video Object Counting and A Large-Scale Benchmark

Authors: Bing Cao, Quanhao Lu, Jiekang Feng, Qilong Wang, Qinghua Hu, Pengfei Zhu

Abstract: The dynamic imbalance of the fore-background is a major challenge in video object counting, which is usually caused by the sparsity of target objects. This remains understudied in existing works and often leads to severe under-/over-prediction errors. To tackle this issue in video object counting, we propose a density-embedded Efficient Masked Autoencoder Counting (E-MAC) framework in this paper. To empower the model's representation ability on density regression, we develop a new $\mathtt{D}$ensity-$\mathtt{E}$mbedded $\mathtt{M}$asked m$\mathtt{O}$deling ($\mathtt{DEMO}$) method, which first takes the density map as an auxiliary modality to perform multimodal self-representation learning for image and density map. Although $\mathtt{DEMO}$ contributes to effective cross-modal regression guidance, it also brings in redundant background information, making it difficult to focus on the foreground regions. To handle this dilemma, we propose an efficient spatial adaptive masking derived from density maps to boost efficiency. Meanwhile, we employ an optical flow-based temporal collaborative fusion strategy to effectively capture the dynamic variations across frames, aligning features to derive multi-frame density residuals. The counting accuracy of the current frame is boosted by harnessing the information from adjacent frames. In addition, considering that most existing datasets are limited to human-centric scenarios, we first propose a large video bird counting dataset, DroneBird, in natural scenarios for migratory bird protection. Extensive experiments on three crowd datasets and our DroneBird validate our superiority against the counterparts. The code and dataset are available.

Submitted 6 March, 2025; v1 submitted 20 November, 2024; originally announced November 2024.

Comments: ICLR25

arXiv:2411.07174 [pdf, other]

Bilayer construction for mixed state phenomena with strong, weak symmetries and symmetry breakings

Authors: Shuangyuan Lu, Penghao Zhu, Yuan-Ming Lu

Abstract: We introduce the bilayer construction, as a specific purification scheme for a general mixed state, where each mixed state has a one-to-one correspondence with a bilayer pure state with two constraints: non-negativity of the bilayer wavefunction; and the presence of an anti-unitary layer-exchange symmetry T. Different from the Choi-Jamiołkowski isomorphism, any mixed state can be realized as the monolayer reduced density matrix of a bilayer pure state, and its physical properties can be experimentally realized and detected in non-magnetic bilayer 2D materials with a layer-exchange mirror symmetry. We study a variety of mixed state phenomena in the bilayer construction: (1) strong and weak symmetries, their explicit and spontaneous breakings in mixed states can be understood as usual Landau-type symmetry breakings in the bilayer pure state, and their criteria can be derived accordingly; (2) decoherence of a pure state by local errors can be mapped to quantum quench dynamics of the bilayer pure states; (3) mixed symmetry protected topological (SPT) states and mixed state topological orders can be classified, characterized and realized as pure state SPTs and topological orders in the bilayer. We further study examples of strong-to-weak spontaneous symmetry breaking (SWSSB) and their critical scalings at the SWSSB transition in the bilayer construction.

Submitted 3 December, 2024; v1 submitted 11 November, 2024; originally announced November 2024.

Comments: 18 pages, 1 figure, 1 table, added discussions on the definition of mixed state phases, added details for the examples discussed in section V B

arXiv:2411.04697 [pdf, other]

doi 10.24963/ijcai.2024/146

Dynamic Brightness Adaptation for Robust Multi-modal Image Fusion

Authors: Yiming Sun, Bing Cao, Pengfei Zhu, Qinghua Hu

Abstract: Infrared and visible image fusion aims to integrate modality strengths for visually enhanced, informative images. Visible imaging in real-world scenarios is susceptible to dynamic environmental brightness fluctuations, leading to texture degradation. Existing fusion methods lack robustness against such brightness perturbations, significantly compromising the visual fidelity of the fused imagery. To address this challenge, we propose the Brightness Adaptive multimodal dynamic fusion framework (BA-Fusion), which achieves robust image fusion despite dynamic brightness fluctuations. Specifically, we introduce a Brightness Adaptive Gate (BAG) module, which is designed to dynamically select features from brightness-related channels for normalization, while preserving brightness-independent structural information within the source images. Furthermore, we propose a brightness consistency loss function to optimize the BAG module. The entire framework is tuned via alternating training strategies. Extensive experiments validate that our method surpasses state-of-the-art methods in preserving multi-modal image information and visual fidelity, while exhibiting remarkable robustness across varying brightness levels. Our code is available at https://github.com/SunYM2020/BA-Fusion.

Submitted 7 November, 2024; originally announced November 2024.

Comments: Accepted by IJCAI 2024

ACM Class: I.4.9

Journal ref: Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, Main Track, Pages 1317-1325, 2024

arXiv:2411.04103 [pdf, other]

Theoretical Diagnostics for Narrow Line Regions of Active Galactic Nuclei

Authors: Peixin Zhu, Lisa J. Kewley, Ralph Sutherland

Abstract: Gas metallicity, ionization parameter, and gas pressure can affect the observed ratios of specific strong emission lines within galaxies. While the theoretical strong lines diagnostics for gas metallicity, ionization parameters, and gas pressure in star-forming regions are well-established, theoretical diagnostics for active galactic nuclei (AGNs) narrow line regions are still lacking. In Zhu et al. (2023), we presented a new AGN model that provides the best predictions for observations spanning the UV, optical, and infrared wavelengths. This paper presents a suite of theoretical diagnostics for the gas metallicity, ionization parameter, gas pressure, and the peak energy in AGN ionizing radiation field $E_{peak}$ for AGN narrow-line regions spanning the UV and optical wavelengths. We investigate the model dependency on the ionization parameter, gas pressure, $E_{peak}$, and the nitrogen scaling relation and make recommendations on metallicity diagnostics that are most robust against these parameters. We test our new AGN metallicity diagnostics using optical galaxy spectra from Sloan Digital Sky Survey DR16. These tests show that the metallicities measured from different diagnostics in this paper are consistent within $\sim0.3$ dex. We compare consistent HII and AGN diagnostics and demonstrate that HII and AGN diagnostics should not be used interchangeably. With a wide wavelength coverage, we anticipate that these AGN diagnostics will enable new metallicity studies of galaxies dominated by AGN.

Submitted 6 November, 2024; originally announced November 2024.

Comments: 33 pages, 22 figures, 7 tables, Accepted for publication in ApJ

arXiv:2411.01573 [pdf, other]

Conditional Controllable Image Fusion

Authors: Bing Cao, Xingxin Xu, Pengfei Zhu, Qilong Wang, Qinghua Hu

Abstract: Image fusion aims to integrate complementary information from multiple input images acquired through various sources to synthesize a new fused image. Existing methods usually employ distinct constraint designs tailored to specific scenes, forming fixed fusion paradigms. However, this data-driven fusion approach is challenging to deploy in varying scenarios, especially in rapidly changing environments. To address this issue, we propose a conditional controllable fusion (CCF) framework for general image fusion tasks without specific training. Due to the dynamic differences of different samples, our CCF employs specific fusion constraints for each individual in practice. Given the powerful generative capabilities of the denoising diffusion model, we first inject the specific constraints into the pre-trained DDPM as adaptive fusion conditions. The appropriate conditions are dynamically selected to ensure the fusion process remains responsive to the specific requirements in each reverse diffusion stage. Thus, CCF enables conditionally calibrating the fused images step by step. Extensive experiments validate our effectiveness in general fusion tasks across diverse scenarios against the competing methods without additional training.

Submitted 3 November, 2024; originally announced November 2024.

Comments: Accepted by NeurIPS 2024

arXiv:2410.20787 [pdf, ps, other]

Impurity radiation seeding of neoclassical tearing mode growth

Authors: Shiyong Zeng, Ping Zhu, Eric C. Howell

Abstract: The physics of neoclassical tearing mode (NTM) is of great concern to the tokamak plasma stability and performance, especially in the burning plasma regime. Whereas a great deal about the different seeding mechanisms have been understood, and in many situations the seed event can be clearly identified, the potential seeding process of NTM due to the resistive tearing instability driven by the impurity radiation cooling still needs more studies. Recent NIMROD simulations have demonstrated that the local impurity radiation cooling can drive the seed island growth and trigger the subsequent onset of neoclassical tearing mode instability. The seed island is mainly driven by the local helical perturbation of the diamagnetic current induced by the perturbed pressure gradient as a result of the impurity radiative cooling on the rational surface. A heuristic closure for the neoclassical viscosity is adopted, and the seed island is further driven by the perturbed bootstrap current induced from the neoclassical electron viscous stress in the extended Ohm's law. The growth rate of the NTM in simulations is found proportional to the electron neoclassical viscosity, and a theoretical neoclassical driving term is adopted to account for the nonlinear neoclassical island growth in the simulations.

Submitted 28 October, 2024; originally announced October 2024.

Comments: 21 pages, 12 figures

MSC Class: 76W05 (Primary)

ACM Class: J.2

arXiv:2410.20679 [pdf, other]

MCI-GRU: Stock Prediction Model Based on Multi-Head Cross-Attention and Improved GRU

Authors: Peng Zhu, Yuante Li, Yifan Hu, Sheng Xiang, Qinyuan Liu, Dawei Cheng, Yuqi Liang

Abstract: As financial markets grow increasingly complex in the big data era, accurate stock prediction has become more critical. Traditional time series models, such as GRUs, have been widely used but often struggle to capture the intricate nonlinear dynamics of markets, particularly in the flexible selection and effective utilization of key historical information. Recently, methods like Graph Neural Networks and Reinforcement Learning have shown promise in stock prediction but require high data quality and quantity, and they tend to exhibit instability when dealing with data sparsity and noise. Moreover, the training and inference processes for these models are typically complex and computationally expensive, limiting their broad deployment in practical applications. Existing approaches also generally struggle to capture unobservable latent market states effectively, such as market sentiment and expectations, microstructural factors, and participant behavior patterns, leading to an inadequate understanding of market dynamics and subsequently impact prediction accuracy. To address these challenges, this paper proposes a stock prediction model, MCI-GRU, based on a multi-head cross-attention mechanism and an improved GRU. First, we enhance the GRU model by replacing the reset gate with an attention mechanism, thereby increasing the model's flexibility in selecting and utilizing historical information.
Second, we design a multi-head cross-attention mechanism for learning unobservable latent market state representations, which are further enriched through interactions with both temporal features and cross-sectional features. Finally, extensive experiments on four main stock markets show that the proposed method outperforms SOTA techniques across multiple metrics. Additionally, its successful application in real-world fund management operations confirms its effectiveness and practicality.

Submitted 28 March, 2025; v1 submitted 25 September, 2024; originally announced October 2024.

arXiv:2410.20374 [pdf, other]

A CT-guided Control Framework of a Robotic Flexible Endoscope for the Diagnosis of the Maxillary Sinusitis

Authors: Puchen Zhu, Huayu Zhang, Xin Ma, Xiaoyin Zheng, Xuchen Wang, Kwok Wai Samuel Au

Abstract: Flexible endoscopes are commonly adopted in narrow and confined anatomical cavities due to their higher reachability and dexterity. However, prolonged and unintuitive manipulation of these endoscopes leads to an increased workload on surgeons and risks of collision. To address these challenges, this paper proposes a CT-guided control framework for the diagnosis of maxillary sinusitis by using a robotic flexible endoscope. In the CT-guided control framework, a feasible path to the target position in the maxillary sinus cavity for the robotic flexible endoscope is designed. Besides, an optimal control scheme is proposed to autonomously control the robotic flexible endoscope to follow the feasible path. This greatly improves the efficiency and reduces the workload for surgeons. Several experiments were conducted based on a widely utilized sinus phantom, and the results showed that the robotic flexible endoscope can accurately and autonomously follow the feasible path and reach the target position in the maxillary sinus cavity. The results also verified the feasibility of the CT-guided control framework, which contributes an effective approach to early diagnosis of sinusitis in the future.

Submitted 27 October, 2024; originally announced October 2024.

arXiv:2410.16647 [pdf, other]

GE2E-KWS: Generalized End-to-End Training and Evaluation for Zero-shot Keyword Spotting

Authors: Pai Zhu, Jacob W. Bartel, Dhruuv Agarwal, Kurt Partridge, Hyun Jin Park, Quan Wang

Abstract: We propose GE2E-KWS -- a generalized end-to-end training and evaluation framework for customized keyword spotting. Specifically, enrollment utterances are separated and grouped by keywords from the training batch and their embedding centroids are compared to all other test utterance embeddings to compute the loss. This simulates runtime enrollment and verification stages, and improves convergence stability and training speed by optimizing matrix operations compared to SOTA triplet loss approaches. To benchmark different models reliably, we propose an evaluation process that mimics the production environment and compute metrics that directly measure keyword matching accuracy. Trained with GE2E loss, our 419KB quantized conformer model beats a 7.5GB ASR encoder by 23.6% relative AUC, and beats a same size triplet loss model by 60.7% AUC. Our KWS models are natively streamable with low memory footprints, and designed to continuously run on-device with no retraining needed for new keywords (zero-shot).

Submitted 21 October, 2024; originally announced October 2024.

Comments: 8 pages, 6 figures, 2 tables. The paper is accepted in IEEE Spoken Language Technology (SLT) 2024

arXiv:2410.13573 [pdf, other]

SPF-EMPC Planner: A real-time multi-robot trajectory planner for complex environments with uncertainties

Authors: Peng Liu, Pengming Zhu, Zhiwen Zeng, Xuekai Qiu, Yu Wang, Huimin Lu

Abstract: In practical applications, the unpredictable movement of obstacles and the imprecise state observation of robots introduce significant uncertainties for the swarm of robots, especially in cluster environments. However, existing methods are difficult to realize safe navigation, considering uncertainties, complex environmental structures, and robot swarms. This paper introduces an extended state model predictive control planner with a safe probability field to address the multi-robot navigation problem in complex, dynamic, and uncertain environments. Initially, the safe probability field offers an innovative approach to model the uncertainty of external dynamic obstacles, combining it with an unconstrained optimization method to generate safe trajectories for multi-robot online. Subsequently, the extended state model predictive controller can accurately track these generated trajectories while considering the robots' inherent model constraints and state uncertainty, thus ensuring the practical feasibility of the planned trajectories. Simulation experiments show a success rate four times higher than that of state-of-the-art algorithms. Physical experiments demonstrate the method's ability to operate in real-time, enabling safe navigation for multi-robot in uncertain environments.

Submitted 17 October, 2024; originally announced October 2024.

arXiv:2410.10079 [pdf, other]

doi 10.1103/PhysRevLett.134.153801

Giant non-reciprocity and gyration through modulation-induced Hatano-Nelson coupling in integrated photonics

Authors: Ogulcan E. Orsel, Jiho Noh, Penghao Zhu, Jieun Yim, Taylor L. Hughes, Ronny Thomale, Gaurav Bahl

Abstract: Asymmetric energy exchange interactions, also known as Hatano-Nelson type couplings, enable the study of non-Hermitian physics and associated phenomena like the non-Hermitian skin effect and exceptional points (EP). Since these interactions are by definition non-reciprocal, there have been very few options for real-space implementations in integrated photonics. In this work, we show that real-space asymmetric couplings are readily achievable in integrated photonic systems through time-domain dynamic modulation. We experimentally study this concept using a two-resonator photonic molecule produced in a lithium niobate on insulator platform that is electro-optically modulated by rf stimuli. We demonstrate the dynamic tuning of the Hatano-Nelson coupling between the resonators, surpassing the asymmetry that has been achieved in previous work, to reach an EP for the first time. We are additionally able to flip the relative sign of the couplings for opposite directions by going past the EP. Using this capability, we show that the through-chain transport can be configured to exhibit both giant (60 dB) optical contrast as well as photonic gyration or non-reciprocal pi phase contrast.

Submitted 13 October, 2024; originally announced October 2024.

Journal ref: Phys. Rev. Lett. 134, 153801 (2025)

arXiv:2410.05652 [pdf, other]

Performance Analysis of Local Partial MMSE Precoding Based User-Centric Cell-Free Massive MIMO Systems and Deployment Optimization

Authors: Peng Jiang, Jiafei Fu, Pengcheng Zhu, Yan Wang, Jiangzhou Wang, Xiaohu You

Abstract: Cell-free massive multiple-input multiple-output (MIMO) systems, leveraging tight cooperation among wireless access points, exhibit remarkable signal enhancement and interference suppression capabilities, demonstrating significant performance advantages over traditional cellular networks. This paper investigates the performance and deployment optimization of a user-centric scalable cell-free massive MIMO system with imperfect channel information over correlated Rayleigh fading channels. Based on the large-dimensional random matrix theory, this paper presents the deterministic equivalent of the ergodic sum rate for this system when applying the local partial minimum mean square error (LP-MMSE) precoding method, along with its derivative with respect to the channel correlation matrix. Furthermore, utilizing the derivative of the ergodic sum rate, this paper designs a Barzilai-Borwein based gradient descent method to improve system deployment. Simulation experiments demonstrate that under various parameter settings and large-scale antenna configurations, the deterministic equivalent of the ergodic sum rate accurately approximates the Monte Carlo ergodic sum rate of the system. Furthermore, the deployment optimization algorithm effectively enhances the ergodic sum rate of this system by optimizing the positions of access points.

Submitted 7 October, 2024; originally announced October 2024.

Comments: 14 pages, 8 figures

arXiv:2410.02510 [pdf, other]

SwarmCVT: Centroidal Voronoi Tessellation-Based Path Planning for Very-Large-Scale Robotics

Authors: James Gao, Jacob Lee, Yuting Zhou, Yunze Hu, Chang Liu, Pingping Zhu

Abstract: Swarm robotics, or very large-scale robotics (VLSR), has many meaningful applications for complicated tasks. However, the complexity of motion control and energy costs stack up quickly as the number of robots increases. In addressing this problem, our previous studies have formulated various methods employing macroscopic and microscopic approaches. These methods enable microscopic robots to adhere to a reference Gaussian mixture model (GMM) distribution observed at the macroscopic scale. As a result, optimizing the macroscopic level will result in an optimal overall result. However, all these methods require systematic and global generation of Gaussian components (GCs) within obstacle-free areas to construct the GMM trajectories. This work utilizes centroidal Voronoi tessellation to generate GCs methodically. Consequently, it demonstrates performance improvement while also ensuring consistency and reliability.

Submitted 3 October, 2024; originally announced October 2024.

Comments: Submitted to American Control Conference (ACC) 2025

arXiv:2409.15782 [pdf, other]

M-Vec: Matryoshka Speaker Embeddings with Flexible Dimensions

Authors: Shuai Wang, Pengcheng Zhu, Haizhou Li

Abstract: Fixed-dimensional speaker embeddings have become the dominant approach in speaker modeling, typically spanning hundreds to thousands of dimensions. These dimensions are hyperparameters that are not specifically picked, nor are they hierarchically ordered in terms of importance. In large-scale speaker representation databases, reducing the dimensionality of embeddings can significantly lower storage and computational costs. However, directly training low-dimensional representations often yields suboptimal performance. In this paper, we introduce the Matryoshka speaker embedding, a method that allows dynamic extraction of sub-dimensions from the embedding while maintaining performance. Our approach is validated on the VoxCeleb dataset, demonstrating that it can achieve extremely low-dimensional embeddings, such as 8 dimensions, while preserving high speaker verification performance.

Submitted 24 September, 2024; originally announced September 2024.

Comments: ICSR 2024, Shenzhen

arXiv:2409.12884 [pdf, other]

Hypersphere Secure Sketch Revisited: Probabilistic Linear Regression Attack on IronMask in Multiple Usage

Authors: Pengxu Zhu, Lei Wang

Abstract: Protection of biometric templates is a critical and urgent area of focus. IronMask demonstrates outstanding recognition performance while protecting facial templates against existing known attacks. In high-level, IronMask can be conceptualized as a fuzzy commitment scheme building on the hypersphere directly. We devise an attack on IronMask targeting on the security notion of renewability. Our attack, termed as Probabilistic Linear Regression Attack, utilizes the linearity of underlying used error correcting code. This attack is the first algorithm to successfully recover the original template when getting multiple protected templates in acceptable time and requirement of storage. We implement experiments on IronMask applied to protect ArcFace that well verify the validity of our attacks. Furthermore, we carry out experiments in noisy environments and confirm that our attacks are still applicable. Finally, we put forward two strategies to mitigate this type of attacks.

Submitted 19 September, 2024; originally announced September 2024.

arXiv:2409.09352 [pdf, other]

MacST: Multi-Accent Speech Synthesis via Text Transliteration for Accent Conversion

Authors: Sho Inoue, Shuai Wang, Wanxing Wang, Pengcheng Zhu, Mengxiao Bi, Haizhou Li

Abstract: In accented voice conversion or accent conversion, we seek to convert the accent in speech from one another while preserving speaker identity and semantic content. In this study, we formulate a novel method for creating multi-accented speech samples, thus pairs of accented speech samples by the same speaker, through text transliteration for training accent conversion systems. We begin by generating transliterated text with Large Language Models (LLMs), which is then fed into multilingual TTS models to synthesize accented English speech. As a reference system, we built a sequence-to-sequence model on the synthetic parallel corpus for accent conversion. We validated the proposed method for both native and non-native English speakers. Subjective and objective evaluations further validate our dataset's effectiveness in accent conversion studies.

Submitted 10 January, 2025; v1 submitted 14 September, 2024; originally announced September 2024.

Comments: This is accepted to IEEE ICASSP 2025; Project page with Speech Demo: https://github.com/shinshoji01/MacST-project-page

arXiv:2409.09351 [pdf, other]

E1 TTS: Simple and Fast Non-Autoregressive TTS

Authors: Zhijun Liu, Shuai Wang, Pengcheng Zhu, Mengxiao Bi, Haizhou Li

Abstract: This paper introduces Easy One-Step Text-to-Speech (E1 TTS), an efficient non-autoregressive zero-shot text-to-speech system based on denoising diffusion pretraining and distribution matching distillation. The training of E1 TTS is straightforward; it does not require explicit monotonic alignment between the text and audio pairs. The inference of E1 TTS is efficient, requiring only one neural network evaluation for each utterance. Despite its sampling efficiency, E1 TTS achieves naturalness and speaker similarity comparable to various strong baseline models. Audio samples are available at http://e1tts.github.io/ .

Submitted 14 September, 2024; originally announced September 2024.

arXiv:2409.08282 [pdf, other]

LSR-IGRU: Stock Trend Prediction Based on Long Short-Term Relationships and Improved GRU

Authors: Peng Zhu, Yuante Li, Yifan Hu, Qinyuan Liu, Dawei Cheng, Yuqi Liang

Abstract: Stock price prediction is a challenging problem in the field of finance and receives widespread attention. In recent years, with the rapid development of technologies such as deep learning and graph neural networks, more research methods have begun to focus on exploring the interrelationships between stocks. However, existing methods mostly focus on the short-term dynamic relationships of stocks and directly integrating relationship information with temporal information. They often overlook the complex nonlinear dynamic characteristics and potential higher-order interaction relationships among stocks in the stock market. Therefore, we propose a stock price trend prediction model named LSR-IGRU in this paper, which is based on long short-term stock relationships and an improved GRU input. Firstly, we construct a long short-term relationship matrix between stocks, where secondary industry information is employed for the first time to capture long-term relationships of stocks, and overnight price information is utilized to establish short-term relationships. Next, we improve the inputs of the GRU model at each step, enabling the model to more effectively integrate temporal information and long short-term relationship information, thereby significantly improving the accuracy of predicting stock trend changes. Finally, through extensive experiments on multiple datasets from stock markets in China and the United States, we validate the superiority of the proposed LSR-IGRU model over the current state-of-the-art baseline models.
We also apply the proposed model to the algorithmic trading system of a financial company, achieving significantly higher cumulative portfolio returns compared to other baseline methods. Our sources are released at https://github.com/ZP1481616577/Baselines_LSR-IGRU.

Submitted 11 May, 2025; v1 submitted 25 August, 2024; originally announced September 2024.

arXiv:2409.01111 [pdf, other]

A Novel Massive Random Access in Cell-Free Massive MIMO Systems for High-Speed Mobility with OTFS Modulation

Authors: Yanfeng Hu, Dongming Wang, Xinjiang Xia, Jiamin Li, Pengcheng Zhu, Xiaohu You

Abstract: In the research of next-generation wireless communication technologies, orthogonal time frequency space (OTFS) modulation is emerging as a promising technique for high-speed mobile environments due to its superior efficiency and robustness in doubly selective channels. Additionally, the cell-free architecture, which eliminates the issues associated with cell boundaries, offers broader coverage for radio access networks. By combining cell-free network architecture with OTFS modulation, the system may meet the demands of massive random access required by machine-type communication devices in high-speed scenarios. This paper explores a massive random access scheme based on OTFS modulation within a cell-free architecture. A transceiver model for uplink OTFS signals involving multiple access points (APs) is developed, where channel estimation with fractional channel parameters is approximated as a block sparse matrix recovery problem. Building on existing superimposed and embedded preamble schemes, a hybrid preamble scheme is proposed. This scheme leverages superimposed and embedded preambles to respectively achieve rough and accurate active user equipment (UEs) detection (AUD), as well as precise channel estimation, under the condition of supporting a large number of access UEs.
Moreover, this study introduces a generalized approximate message passing and pattern coupling sparse Bayesian learning with Laplacian prior (GAMP-PCSBL-La) algorithm, which effectively captures block sparse features after discrete cosine transform (DCT), delivering precise estimation results with reduced computational complexity. Simulation results demonstrate that the proposed scheme is effective and provides superior performance compared to other existing schemes.

Submitted 27 April, 2025; v1 submitted 2 September, 2024; originally announced September 2024.

arXiv:2408.14533 [pdf, other]

Obstruction to Broken Symmetries in Topological Flat Bands

Authors: Penghao Zhu, Shi Feng, Yuan-Ming Lu

Abstract: Motivated by the abundance of symmetry breaking states in magic-angle twisted bilayer graphene and other two-dimensional materials, we study superconducting (SC) and charge orders in two-dimensional topological flat bands in the strong correlation regime. By relating the half-filled 2D topological flat bands to the surface states of 3D topological insulators in symmetry class AIII, we reveal the topological obstruction to the formation of gapped SC and inter-valley charge orders without intrinsic topological orders, in the presence of the anti-unitary particle-hole symmetry at half filling. This is a generalization of the Li-Haldane arguments for nodal superconductivity to strongly interacting electrons. In contrast to the $\mathbb{Z}$-valued obstruction derived from the non-interacting band topology, the topological obstruction of interacting electrons in half-filled flat bands has a $\mathbb{Z}_{8}$ classification, depending on the charge (valley) Chern number of the superconducting (inter-valley charge) orders. This is demonstrated by an interacting Hamiltonian for half-filled flat bands with a net Chern number $C=4$, where superconductivity and $\mathbb{Z}_2$ topological order coexist in a gapped ground state with particle-hole symmetry.

Submitted 12 September, 2024; v1 submitted 26 August, 2024; originally announced August 2024.

Comments: 5+9 pages, 2+1 figures; We added a short discussion about materials in v2

arXiv:2408.14165 [pdf, ps, other]

Formation of quasi-single helicity state from a paramagnetic pinch in KTX regime

Authors: Bing Luo, Ping Zhu, Wentan Yan, Hong Li, Wandong Liu

Abstract: The formation of quasi-single helicity (QSH) state from a paramagnetic pinch in the KTX-RFP regime has been observed in recent NIMROD simulations. The quasi-single helicity state has a dominant helical component of the magnetic field that is known to improve the RFP confinement. For the initial paramagnetic pinch, linear calculations indicate that the tearing mode growth rate decreases with the plasma $β$. The initial QSH state arises from the dominant linear instability of the initial force-free paramagnetic pinch. The plasma's self-organization towards the second QSH state after the relaxation of the initial QSH state is found to depend on $β$. Specifically, when $β<4\%$, the plasma relaxes to an MH state; when $4\% \leq β\leq 8\%$, the plasma first transitions from a double axis (DAx) to a single helical axis (SHAx) state, and eventually return to the DAx state. The existence of such an optimal $β$ regime that is beneficial to the formation and maintenance of the QSH state, suggests an experimental scheme for the QSH formation based on $β$ tuning and control.

Submitted 26 August, 2024; originally announced August 2024.

Showing 51–100 of 505 results for author: Zhu, P