-
All-sky search for individual Primordial Black Hole bursts with LHAASO
Authors:
Zhen Cao,
F. Aharonian,
Y. X. Bai,
Y. W. Bao,
D. Bastieri,
X. J. Bi,
Y. J. Bi,
W. Bian,
A. V. Bukevich,
C. M. Cai,
W. Y. Cao,
Zhe Cao,
J. Chang,
J. F. Chang,
A. M. Chen,
E. S. Chen,
G. H. Chen,
H. X. Chen,
Liang Chen,
Long Chen,
M. J. Chen,
M. L. Chen,
Q. H. Chen,
S. Chen,
S. H. Chen
, et al. (293 additional authors not shown)
Abstract:
Primordial Black Holes (PBHs) are hypothetical black holes with a wide range of masses that formed in the early universe. As a result, they may play an important cosmological role and provide a unique probe of the early universe. A PBH with an initial mass of approximately $10^{15}$ g is expected to explode today in a final burst of Hawking radiation. In this work, we conduct an all-sky search for individual PBH burst events using the data collected from March 2021 to July 2024 by the Water Cherenkov Detector Array of the Large High Altitude Air Shower Observatory (LHAASO). Three PBH burst durations, 10 s, 20 s, and 100 s, are searched, with no significant PBH bursts observed. The upper limit on the local PBH burst rate density is set to be as low as 181 pc$^{-3}$ yr$^{-1}$ at 99$\%$ confidence level, representing the most stringent limit achieved to date.
Submitted 2 June, 2025; v1 submitted 30 May, 2025;
originally announced May 2025.
-
MIRAGE: Assessing Hallucination in Multimodal Reasoning Chains of MLLM
Authors:
Bowen Dong,
Minheng Ni,
Zitong Huang,
Guanglei Yang,
Wangmeng Zuo,
Lei Zhang
Abstract:
Multimodal hallucination in multimodal large language models (MLLMs) restricts the correctness of MLLMs. However, multimodal hallucinations are multi-sourced and arise from diverse causes. Existing benchmarks fail to adequately distinguish between perception-induced hallucinations and reasoning-induced hallucinations. This failure constitutes a significant issue and hinders the diagnosis of multimodal reasoning failures within MLLMs. To address this, we propose the MIRAGE benchmark, which isolates reasoning hallucinations by constructing questions where input images are correctly perceived by MLLMs yet reasoning errors persist. MIRAGE introduces multi-granular evaluation metrics: accuracy, factuality, and LLMs hallucination score for hallucination quantification. Our analysis reveals that (1) the model scale, data scale, and training stages significantly affect the degree of logical, fabrication, and factual hallucinations; (2) current MLLMs show no effective improvement on spatial hallucinations caused by misinterpreted spatial relationships, indicating their limited visual reasoning capabilities; and (3) question types correlate with distinct hallucination patterns, highlighting targeted challenges and potential mitigation strategies. To address these challenges, we propose {\method}, a method that combines curriculum reinforcement fine-tuning to encourage models to generate logic-consistent reasoning chains by stepwise reducing learning difficulty, and collaborative hint inference to reduce reasoning complexity. {\method} establishes a baseline on MIRAGE, and reduces the logical hallucinations in original base models.
Submitted 2 June, 2025; v1 submitted 30 May, 2025;
originally announced May 2025.
-
Point-RFT: Improving Multimodal Reasoning with Visually Grounded Reinforcement Finetuning
Authors:
Minheng Ni,
Zhengyuan Yang,
Linjie Li,
Chung-Ching Lin,
Kevin Lin,
Wangmeng Zuo,
Lijuan Wang
Abstract:
Recent advances in large language models have significantly improved textual reasoning through the effective use of Chain-of-Thought (CoT) and reinforcement learning. However, extending these successes to vision-language tasks remains challenging due to inherent limitations in text-only CoT, such as visual hallucinations and insufficient multimodal integration. In this paper, we introduce Point-RFT, a multimodal reasoning framework explicitly designed to leverage visually grounded CoT reasoning for visual document understanding. Our approach consists of two stages: First, we conduct format finetuning using a curated dataset of 71K diverse visual reasoning problems, each annotated with detailed, step-by-step rationales explicitly grounded to corresponding visual elements. Second, we employ reinforcement finetuning targeting visual document understanding. On ChartQA, our approach improves accuracy from 70.88% (format-finetuned baseline) to 90.04%, surpassing the 83.92% accuracy achieved by reinforcement finetuning relying solely on text-based CoT. The result shows that our grounded CoT is more effective for multimodal reasoning compared with the text-only CoT. Moreover, Point-RFT exhibits superior generalization capability across several out-of-domain visual document reasoning benchmarks, including CharXiv, PlotQA, IconQA, TabMWP, etc., and highlights its potential in complex real-world scenarios.
Submitted 26 May, 2025;
originally announced May 2025.
-
First Identification and Precise Spectral Measurement of the Proton Component in the Cosmic-Ray `Knee'
Authors:
The LHAASO Collaboration,
Zhen Cao,
F. Aharonian,
Y. X. Bai,
Y. W. Bao,
D. Bastieri,
X. J. Bi,
Y. J. Bi,
W. Bian,
A. V. Bukevich,
C. M. Cai,
W. Y. Cao,
Zhe Cao,
J. Chang,
J. F. Chang,
A. M. Chen,
E. S. Chen,
G. H. Chen,
H. X. Chen,
Liang Chen,
Long Chen,
M. J. Chen,
M. L. Chen,
Q. H. Chen,
S. Chen
, et al. (292 additional authors not shown)
Abstract:
We report the first high-purity identification of cosmic-ray (CR) protons and a precise measurement of their energy spectrum from 0.15 to 12 PeV using the Large High Altitude Air Shower Observatory (LHAASO). Abundant event statistics, combined with the simultaneous detection of electrons/photons, muons, and Cherenkov light in air showers, enable spectroscopic measurements with statistical and systematic accuracy comparable to satellite data at lower energies. The proton spectrum shows significant hardening relative to low-energy extrapolations, culminating at 3 PeV, followed by sharp softening. This distinct spectral structure - closely aligned with the knee in the all-particle spectrum - points to the emergence of a new CR component at PeV energies, likely linked to the dozens of PeVatrons recently discovered by LHAASO, and offers crucial clues to the origin of Galactic cosmic rays.
Submitted 20 May, 2025;
originally announced May 2025.
-
Experimental investigation of a novel liquid metal plasma facing component with pre-filled microstructures
Authors:
Yi-Jun Wang,
Kai-Lun Li,
Rui-Zhi Chen,
Yue-Bin Hu,
Juan-Cheng Yang,
Ming-Jiu Ni,
Zhao-Hui Yao
Abstract:
Among plasma facing components (PFCs) for nuclear fusion, liquid metal PFCs with a stable free surface flow on the PFC surface are considered a promising alternative. However, due to the poor wettability of liquid metal on most solid substrates and complex magnetohydrodynamic (MHD) effects, realizing a stable free surface flow on the PFC surface is challenging. In the present study, using 3D printing, we developed a novel liquid metal PFC surface with MIcrostructures pre-FIlled by Liquid Metal (MIFILM) to realize a stable free liquid metal surface flow. The experimental results demonstrate that, owing to the MIFILM, the apparent contact angle (ACA) of the liquid metal changes from 140$^{\circ}$ to approximately 20$^{\circ}$, indicating a transition from hydrophobic to hydrophilic. When the liquid metal flows on the MIFILM substrate, it spreads completely over the surface with a stable and orderly free surface, even at a low flow rate. Moreover, the liquid metal exhibits sustained spreading on the MIFILM substrate under a strong transverse magnetic field (up to 1.6 T). Results indicate that the magnetic field induces limited MHD drag but also accelerates the flow via two-dimensional effects. When the Stuart number $N<1$, the flow accelerates and the film thickness decreases. For $N>1$, both flow velocity and film thickness gradually stabilize. Therefore, the novel MIFILM offers a good choice for liquid metal PFC substrates in nuclear fusion.
Submitted 13 May, 2025;
originally announced May 2025.
-
QuantBench: Benchmarking AI Methods for Quantitative Investment
Authors:
Saizhuo Wang,
Hao Kong,
Jiadong Guo,
Fengrui Hua,
Yiyan Qi,
Wanyun Zhou,
Jiahao Zheng,
Xinyu Wang,
Lionel M. Ni,
Jian Guo
Abstract:
The field of artificial intelligence (AI) in quantitative investment has seen significant advancements, yet it lacks a standardized benchmark aligned with industry practices. This gap hinders research progress and limits the practical application of academic innovations. We present QuantBench, an industrial-grade benchmark platform designed to address this critical need. QuantBench offers three key strengths: (1) standardization that aligns with quantitative investment industry practices, (2) flexibility to integrate various AI algorithms, and (3) full-pipeline coverage of the entire quantitative investment process. Our empirical studies using QuantBench reveal some critical research directions, including the need for continual learning to address distribution shifts, improved methods for modeling relational financial data, and more robust approaches to mitigate overfitting in low signal-to-noise environments. By providing a common ground for evaluation and fostering collaboration between researchers and practitioners, QuantBench aims to accelerate progress in AI for quantitative investment, similar to the impact of benchmark platforms in computer vision and natural language processing.
Submitted 24 April, 2025;
originally announced April 2025.
-
The Dance of Atoms-De Novo Protein Design with Diffusion Model
Authors:
Yujie Qin,
Ming He,
Changyong Yu,
Ming Ni,
Xian Liu,
Xiaochen Bo
Abstract:
The de novo design of proteins refers to creating proteins with specific structures and functions that do not naturally exist. In recent years, the accumulation of high-quality protein structure and sequence data and technological advancements have paved the way for the successful application of generative artificial intelligence (AI) models in protein design. These models have surpassed traditional approaches that rely on fragments and bioinformatics. They have significantly enhanced the success rate of de novo protein design, and reduced experimental costs, leading to breakthroughs in the field. Among various generative AI models, diffusion models have yielded the most promising results in protein design. In the past two to three years, more than ten protein design models based on diffusion models have emerged. Among them, the representative model, RFDiffusion, has demonstrated success rates in 25 protein design tasks that far exceed those of traditional methods, and other AI-based approaches like RFjoint and hallucination. This review will systematically examine the application of diffusion models in generating protein backbones and sequences. We will explore the strengths and limitations of different models, summarize successful cases of protein design using diffusion models, and discuss future development directions.
Submitted 23 April, 2025;
originally announced April 2025.
-
Theoretical analysis for non-linear effects of magnetic fields on unsteady boundary layer flows
Authors:
Jing-Yu Fu,
Ming-Jiu Ni,
Nian-Mei Zhang
Abstract:
This study investigates unsteady boundary layer phenomena in electrically conducting fluids subjected to static magnetic fields. Using a semi-explicit similarity transformation method, the momentum equation associated with the Stokes stream function is solved. The nonlinear closed analytical solutions for both stagnation flow and converging flow are derived. The results demonstrate that the boundary layer structure incorporates similar shock and solitary wave components which are promoted by Lorentz force. Under extreme magnetic fields, the flow exhibits sine and cosine wave patterns, which are motivated by the strong Lorentz force. An in-depth asymptotic analysis establishes the square root scaling laws that quantify the growth of friction and flux with increasing magnetic field strength. The boundary layer thickness scales inversely with the Hartmann number, a consequence of dominant Lorentz force, which differs from the conclusion of duct flow (Hunt 1965). These findings elucidate the physical mechanisms governing the nonlinear coupling between magnetic fields and the dynamics of the boundary layer.
Submitted 9 April, 2025;
originally announced April 2025.
-
Measurement of LLM's Philosophies of Human Nature
Authors:
Minheng Ni,
Ennan Wu,
Zidong Gong,
Zhengyuan Yang,
Linjie Li,
Chung-Ching Lin,
Kevin Lin,
Lijuan Wang,
Wangmeng Zuo
Abstract:
The widespread application of artificial intelligence (AI) in various tasks, along with frequent reports of conflicts or violations involving AI, has sparked societal concerns about interactions with AI systems. Based on Wrightsman's Philosophies of Human Nature Scale (PHNS), a scale empirically validated over decades to effectively assess individuals' attitudes toward human nature, we design the standardized psychological scale specifically targeting large language models (LLM), named the Machine-based Philosophies of Human Nature Scale (M-PHNS). By evaluating LLMs' attitudes toward human nature across six dimensions, we reveal that current LLMs exhibit a systemic lack of trust in humans, and there is a significant negative correlation between the model's intelligence level and its trust in humans. Furthermore, we propose a mental loop learning framework, which enables LLM to continuously optimize its value system during virtual interactions by constructing moral scenarios, thereby improving its attitude toward human nature. Experiments demonstrate that mental loop learning significantly enhances their trust in humans compared to persona or instruction prompts. This finding highlights the potential of human-based psychological assessments for LLM, which can not only diagnose cognitive biases but also provide a potential solution for ethical learning in artificial intelligence. We release the M-PHNS evaluation code and data at https://github.com/kodenii/M-PHNS.
Submitted 3 April, 2025;
originally announced April 2025.
-
Diffusion Model-Based Size Variable Virtual Try-On Technology and Evaluation Method
Authors:
Shufang Zhang,
Hang Qian,
Minxue Ni,
Yaxuan Li,
Wenxin Ding,
Jun Liu
Abstract:
With the rapid development of e-commerce, virtual try-on technology has become an essential tool to satisfy consumers' personalized clothing preferences. Diffusion-based virtual try-on systems aim to naturally align garments with target individuals, generating realistic and detailed try-on images. However, existing methods overlook the importance of garment size variations in meeting personalized consumer needs. To address this, we propose a novel virtual try-on method named SV-VTON, which introduces garment sizing concepts into virtual try-on tasks. The SV-VTON method first generates refined masks for multiple garment sizes, then integrates these masks with garment images at varying proportions, enabling virtual try-on simulations across different sizes. In addition, we developed a specialized size evaluation module to quantitatively assess the accuracy of size variations. This module calculates differences between generated size increments and international sizing standards, providing objective measurements of size accuracy. To further validate SV-VTON's generalization capability across different models, we conducted experiments on multiple SOTA Diffusion models. The results demonstrate that SV-VTON consistently achieves precise multi-size virtual try-on across various SOTA models, and validates the effectiveness and rationality of the proposed method, significantly fulfilling users' personalized multi-size virtual try-on requirements.
Submitted 1 April, 2025;
originally announced April 2025.
-
From Deep Learning to LLMs: A survey of AI in Quantitative Investment
Authors:
Bokai Cao,
Saizhuo Wang,
Xinyi Lin,
Xiaojun Wu,
Haohan Zhang,
Lionel M. Ni,
Jian Guo
Abstract:
Quantitative investment (quant) is an emerging, technology-driven approach in asset management, increasingly shaped by advancements in artificial intelligence. Recent advances in deep learning and large language models (LLMs) for quant finance have improved predictive modeling and enabled agent-based automation, suggesting a potential paradigm shift in this field. In this survey, taking alpha strategy as a representative example, we explore how AI contributes to the quantitative investment pipeline. We first examine the early stage of quant research, centered on human-crafted features and traditional statistical models with an established alpha pipeline. We then discuss the rise of deep learning, which enabled scalable modeling across the entire pipeline from data processing to order execution. Building on this, we highlight the emerging role of LLMs in extending AI beyond prediction, empowering autonomous agents to process unstructured data, generate alphas, and support self-iterative workflows.
Submitted 27 March, 2025;
originally announced March 2025.
-
PLM: Efficient Peripheral Language Models Hardware-Co-Designed for Ubiquitous Computing
Authors:
Cheng Deng,
Luoyang Sun,
Jiwen Jiang,
Yongcheng Zeng,
Xinjian Wu,
Wenxin Zhao,
Qingfa Xiao,
Jiachuan Wang,
Haoyang Li,
Lei Chen,
Lionel M. Ni,
Haifeng Zhang,
Jun Wang
Abstract:
While scaling laws have been continuously validated in large language models (LLMs) with increasing model parameters, the inherent tension between the inference demands of LLMs and the limited resources of edge devices poses a critical challenge to the development of edge intelligence. Recently, numerous small language models have emerged, aiming to distill the capabilities of LLMs into smaller footprints. However, these models often retain the fundamental architectural principles of their larger counterparts, still imposing considerable strain on the storage and bandwidth capacities of edge devices. In this paper, we introduce the PLM, a Peripheral Language Model, developed through a co-design process that jointly optimizes model architecture and edge system constraints. The PLM utilizes a Multi-head Latent Attention mechanism and employs the squared ReLU activation function to encourage sparsity, thereby reducing peak memory footprint during inference. During training, we collect and reorganize open-source datasets, implement a multi-phase training strategy, and empirically investigate the Warmup-Stable-Decay-Constant (WSDC) learning rate scheduler. Additionally, we incorporate Reinforcement Learning from Human Feedback (RLHF) by adopting the ARIES preference learning approach. Following a two-phase SFT process, this method yields performance gains of 2% in general tasks, 9% in the GSM8K task, and 11% in coding tasks. In addition to its novel architecture, evaluation results demonstrate that PLM outperforms existing small language models trained on publicly available data while maintaining the lowest number of activated parameters. Furthermore, deployment across various edge devices, including consumer-grade GPUs, mobile phones, and Raspberry Pis, validates PLM's suitability for peripheral applications. The PLM series models are publicly available at https://github.com/plm-team/PLM.
Submitted 19 March, 2025; v1 submitted 15 March, 2025;
originally announced March 2025.
-
Probing Globular Cluster with MeerKAT and FAST: A Pulsar Polarization Census
Authors:
Lei Zhang,
Federico Abbate,
Di Li,
Andrea Possenti,
Matthew Bailes,
Alessandro Ridolfi,
Paulo C. C. Freire,
Scott M. Ransom,
Yong-Kun Zhang,
Meng Guo,
Meng-Meng Ni,
Jia-Le Hu,
Yi Feng,
Pei Wang,
Jie Zhang,
Qi-Jun Zhi
Abstract:
Only one globular cluster (GC), 47 Tuc, has been found to contain intracluster medium, with an electron density 100 times higher than that of the ISM in its vicinity. The characteristics of this intracluster medium are closely related to GC evolution and the compact objects within. However, significant knowledge gaps remain regarding the ionized gas content of GCs, particularly in Galactic halo clusters. We carried out a polarization census of GC pulsars using MeerKAT and FAST. This first combined effort of observations from these two major radio telescopes resulted in high signal-to-noise ratio, full polarization pulse profiles for 43 pulsars in 8 GCs, doubling the number of rotation measures (RMs) known in these clusters. The accuracy of dispersion measures (DMs) was improved by a factor of 8 compared to previous publications. No intracluster medium was found, and at least two halo GCs showed more stringent upper limits on electron density than that detected in 47 Tuc. The surprising barrenness of GCs suggests effective gas removal mechanisms, such as strong winds from millisecond pulsars and/or ionizing radiation from post-AGB stars and young white dwarfs.
Submitted 11 March, 2025;
originally announced March 2025.
-
Ultra-high-energy $γ$-ray emission associated with the tail of a bow-shock pulsar wind nebula
Authors:
Zhen Cao,
F. Aharonian,
Y. X. Bai,
Y. W. Bao,
D. Bastieri,
X. J. Bi,
Y. J. Bi,
W. Bian,
A. V. Bukevich,
C. M. Cai,
W. Y. Cao,
Zhe Cao,
J. Chang,
J. F. Chang,
A. M. Chen,
E. S. Chen,
H. X. Chen,
Liang Chen,
Long Chen,
M. J. Chen,
M. L. Chen,
Q. H. Chen,
S. Chen,
S. H. Chen,
S. Z. Chen
, et al. (274 additional authors not shown)
Abstract:
In this study, we present a comprehensive analysis of an unidentified point-like ultra-high-energy (UHE) $γ$-ray source, designated as 1LHAASO J1740+0948u, situated in the vicinity of the middle-aged pulsar PSR J1740+1000. The detection significance reached 17.1$σ$ (9.4$σ$) above 25$\,$TeV (100$\,$TeV). The source energy spectrum extended up to 300$\,$TeV, which was well fitted by a log-parabola function with $N_0 = (1.93\pm0.23) \times 10^{-16} \rm{TeV^{-1}\,cm^{-2}\,s^{-2}}$, $α= 2.14\pm0.27$, and $β= 1.20\pm0.41$ at $E_0 = 30\,$TeV. The associated pulsar, PSR J1740+1000, resides at a high galactic latitude and powers a bow-shock pulsar wind nebula (BSPWN) with an extended X-ray tail. The best-fit position of the gamma-ray source appeared to be shifted by $0.2^{\circ}$ with respect to the pulsar position. As the (i) currently identified pulsar halos do not demonstrate such offsets, and (ii) centroid of the gamma-ray emission is approximately located at the extension of the X-ray tail, we speculate that the UHE $γ$-ray emission may originate from re-accelerated electron/positron pairs that are advected away in the bow-shock tail.
Submitted 24 February, 2025; v1 submitted 21 February, 2025;
originally announced February 2025.
-
Control Barrier Function-Based Quadratic Programming for Safe Operation of Tethered UAVs
Authors:
Samuel O. Folorunsho,
Maggi Ni,
William R. Norris
Abstract:
Consider an unmanned aerial vehicle (UAV) physically connected to the ground station with a tether operating in a space, tasked with performing precise maneuvers while constrained by the physical limitation of its tether, which prevents it from flying beyond a maximum allowable length. Violating this tether constraint could lead to system failure or operational hazards, making it essential to enforce safety constraints dynamically while ensuring the drone can track desired trajectories accurately. This paper presents a Control Barrier Function Quadratic Programming Framework (CBF-QP) for ensuring the safe and efficient operation of tethered unmanned aerial vehicles (TUAVs). The framework leverages nominal backstepping control to achieve trajectory tracking, augmented with control barrier functions to ensure compliance with the tether constraint. In this proposed method, the tether constraint is directly embedded in the control design and therefore guarantees the TUAV remains within a predefined operational region defined by the maximum tether length while achieving precise trajectory tracking. The effectiveness of the proposed framework is validated through simulations involving set-point tracking, dynamic trajectory following, and disturbances such as incorrect user inputs. The results demonstrate that the TUAV respects the tether constraint $\|x(t)\| \le L_{\max}$, with tracking errors converging to zero and the control input remaining bounded.
Submitted 12 February, 2025;
originally announced February 2025.
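Illustrative sketch (not the authors' implementation): the abstract above describes filtering a nominal tracking controller through a control barrier function so that the tether-length constraint holds. The Python snippet below shows the idea under strong simplifying assumptions that are not from the paper: the vehicle is modeled as a single integrator (position rate equals the commanded velocity), the barrier is taken as h(x) = L_max^2 - ||x||^2, and the nominal backstepping controller is replaced by a constant nominal command. The names `cbf_qp_filter`, `simulate`, and `gamma` are hypothetical. With a single affine constraint, the quadratic program has a closed-form solution, so no QP solver is needed.

```python
import numpy as np


def cbf_qp_filter(x, u_nom, l_max, gamma=1.0):
    """Closed-form solution of the single-constraint CBF-QP

        min_u ||u - u_nom||^2   s.t.   grad_h(x) . u + gamma * h(x) >= 0

    with h(x) = l_max^2 - ||x||^2 and ASSUMED dynamics x_dot = u
    (a single integrator, chosen only for illustration).
    """
    x = np.asarray(x, dtype=float)
    u_nom = np.asarray(u_nom, dtype=float)
    h = l_max ** 2 - x @ x           # barrier value, >= 0 inside the tether sphere
    a = -2.0 * x                     # gradient of h with respect to x
    slack = a @ u_nom + gamma * h    # constraint value at the nominal input
    if slack >= 0.0:
        return u_nom                 # nominal input is already safe
    # Project u_nom onto the half-space {u : a . u + gamma * h >= 0}.
    return u_nom - slack * a / (a @ a)


def simulate(x0, u_nom, l_max, dt=0.01, steps=2000):
    """Forward-Euler rollout; returns the final state and the largest ||x|| seen."""
    x = np.asarray(x0, dtype=float)
    max_norm = np.linalg.norm(x)
    for _ in range(steps):
        x = x + dt * cbf_qp_filter(x, u_nom, l_max)
        max_norm = max(max_norm, np.linalg.norm(x))
    return x, max_norm


if __name__ == "__main__":
    x_final, max_norm = simulate([0.0, 0.0, 0.0], [1.0, 0.0, 0.0], l_max=1.0)
    print("final position:", x_final, "max distance:", max_norm)
```

In this toy setting the filter leaves the nominal command untouched well inside the tether sphere and removes only the outward radial component near the boundary. A real tethered UAV has higher-order dynamics, so the barrier condition would need the corresponding higher-order treatment rather than this first-order form.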
-
Broadband $γ$-ray spectrum of supernova remnant Cassiopeia A
Authors:
Zhen Cao,
F. Aharonian,
Y. X. Bai,
Y. W. Bao,
D. Bastieri,
X. J. Bi,
Y. J. Bi,
W. Bian,
A. V. Bukevich,
C. M. Cai,
W. Y. Cao,
Zhe Cao,
J. Chang,
J. F. Chang,
A. M. Chen,
E. S. Chen,
H. X. Chen,
Liang Chen,
Long Chen,
M. J. Chen,
M. L. Chen,
Q. H. Chen,
S. Chen,
S. H. Chen,
S. Z. Chen
, et al. (293 additional authors not shown)
Abstract:
The core-collapse supernova remnant (SNR) Cassiopeia A (Cas A) is one of the brightest galactic radio sources with an angular radius of $\sim$ 2.5 $\arcmin$. Although no extension of this source has been detected in the $γ$-ray band, using more than 1000 days of LHAASO data above $\sim 0.8$ TeV, we find that its spectrum is significantly softer than those obtained with Imaging Air Cherenkov Telescopes (IACTs) and its flux near $\sim 1$ TeV is about two times higher. In combination with analyses of more than 16 years of \textit{Fermi}-LAT data covering $0.1 \, \mathrm{GeV} - 1 \, \mathrm{TeV}$, we find that the spectrum above 30 GeV deviates significantly from a single power-law, and is best described by a smoothly broken power-law with a spectral index of $1.90 \pm 0.15_\mathrm{stat}$ ($3.41 \pm 0.19_\mathrm{stat}$) below (above) a break energy of $0.63 \pm 0.21_\mathrm{stat} \, \mathrm{TeV}$. Given differences in the angular resolution of LHAASO-WCDA and IACTs, TeV $γ$-ray emission detected with LHAASO may have a significant contribution from regions surrounding the SNR illuminated by particles accelerated earlier, which, however, are treated as background by IACTs. Detailed modelling can be used to constrain acceleration processes of TeV particles in the early stage of SNR evolution.
Submitted 7 February, 2025;
originally announced February 2025.
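For reference, one common parameterization of a smoothly broken power law is sketched below. The functional form and the smoothness parameter $s$ are assumptions for illustration; the abstract does not state which form was fitted, and only the two indices and the break energy are taken from it.

```latex
\frac{dN}{dE} = N_0 \left(\frac{E}{E_b}\right)^{-\Gamma_1}
\left[1 + \left(\frac{E}{E_b}\right)^{(\Gamma_2-\Gamma_1)/s}\right]^{-s},
\qquad \Gamma_1 = 1.90,\quad \Gamma_2 = 3.41,\quad E_b = 0.63~\mathrm{TeV}
```

In this form the spectrum behaves as $E^{-\Gamma_1}$ well below $E_b$ and as $E^{-\Gamma_2}$ well above it, with $s > 0$ controlling how sharp the transition is.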
-
Asymptotic solution for three-dimensional reaction-diffusion-advection equation with periodic boundary conditions
Authors:
Aleksei Liubavin,
Mingkang Ni,
Ye Zhang,
Dmitrii Chaikovskii
Abstract:
In this study, we investigate the dynamics of moving fronts in three-dimensional spaces, which form as a result of in-situ combustion during oil production. This phenomenon is also observed in other contexts, such as various autowave models and the propagation of acoustic waves. Our analysis involves a singularly perturbed reaction-diffusion-advection type initial-boundary value problem of a general form. We employ methods from asymptotic theory to develop an approximate smooth solution with an internal layer. Using local coordinates, we focus on the transition layer, where the solution undergoes rapid changes. Once the location of the transition layer is established, we can describe the solution across the full domain of the problem. Numerical examples are provided, demonstrating the high accuracy of the asymptotic method in predicting the behaviors of moving fronts.
Submitted 4 February, 2025;
originally announced February 2025.
-
Taming Teacher Forcing for Masked Autoregressive Video Generation
Authors:
Deyu Zhou,
Quan Sun,
Yuang Peng,
Kun Yan,
Runpei Dong,
Duomin Wang,
Zheng Ge,
Nan Duan,
Xiangyu Zhang,
Lionel M. Ni,
Heung-Yeung Shum
Abstract:
We introduce MAGI, a hybrid video generation framework that combines masked modeling for intra-frame generation with causal modeling for next-frame generation. Our key innovation, Complete Teacher Forcing (CTF), conditions masked frames on complete observation frames rather than masked ones (namely Masked Teacher Forcing, MTF), enabling a smooth transition from token-level (patch-level) to frame-level autoregressive generation. CTF significantly outperforms MTF, achieving a +23% improvement in FVD scores on first-frame conditioned video prediction. To address issues like exposure bias, we employ targeted training strategies, setting a new benchmark in autoregressive video generation. Experiments show that MAGI can generate long, coherent video sequences exceeding 100 frames, even when trained on as few as 16 frames, highlighting its potential for scalable, high-quality video generation.
Submitted 21 January, 2025;
originally announced January 2025.
-
Cross-Entropy Attacks to Language Models via Rare Event Simulation
Authors:
Mingze Ni,
Yongshun Gong,
Wei Liu
Abstract:
Black-box textual adversarial attacks are challenging due to the lack of model information and the discrete, non-differentiable nature of text. Existing methods often lack versatility for attacking different models, suffer from limited attacking performance due to the inefficient optimization with word saliency ranking, and frequently sacrifice semantic integrity to achieve better attack outcomes. This paper introduces a novel approach to textual adversarial attacks, which we call Cross-Entropy Attacks (CEA), that uses Cross-Entropy optimization to address the above issues. Our CEA approach defines adversarial objectives for both soft-label and hard-label settings and employs CE optimization to identify optimal replacements. Through extensive experiments on document classification and language translation problems, we demonstrate that our attack method excels in terms of attacking performance, imperceptibility, and sentence quality.
Submitted 20 January, 2025;
originally announced January 2025.
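As a rough illustration of the cross-entropy optimization idea named in the abstract, the sketch below runs the generic cross-entropy method over sequences of discrete choices. It is not the authors' CEA implementation: the function `cross_entropy_search`, the toy objective `toy_score`, the hidden `TARGET`, and all hyperparameters are hypothetical, chosen only to show the sample, select-elite, refit loop.

```python
import random


def cross_entropy_search(score, n_positions, n_choices, n_samples=200,
                         elite_frac=0.1, smoothing=0.7, n_iters=30, seed=0):
    """Generic cross-entropy method over sequences of categorical choices."""
    rng = random.Random(seed)
    # probs[i][c]: probability of picking choice c at position i (start uniform).
    probs = [[1.0 / n_choices] * n_choices for _ in range(n_positions)]
    n_elite = max(1, int(elite_frac * n_samples))
    best, best_score = None, float("-inf")
    for _ in range(n_iters):
        # Sample candidate sequences from the current distribution.
        samples = [
            [rng.choices(range(n_choices), weights=probs[i])[0]
             for i in range(n_positions)]
            for _ in range(n_samples)
        ]
        # Keep the highest-scoring fraction as the elite set.
        samples.sort(key=score, reverse=True)
        elite = samples[:n_elite]
        if score(elite[0]) > best_score:
            best, best_score = elite[0], score(elite[0])
        # Refit each position's distribution to the elite set, with smoothing.
        for i in range(n_positions):
            counts = [0] * n_choices
            for s in elite:
                counts[s[i]] += 1
            for c in range(n_choices):
                probs[i][c] = (smoothing * counts[c] / n_elite
                               + (1 - smoothing) * probs[i][c])
    return best, best_score


# Hypothetical black-box objective: number of positions matching a hidden target.
TARGET = [2, 0, 1, 3, 1, 0]


def toy_score(candidate):
    return sum(int(a == b) for a, b in zip(candidate, TARGET))


best, best_score = cross_entropy_search(toy_score, n_positions=6, n_choices=4)
```

In an attack setting, each position would correspond to a word slot and each choice to a candidate replacement, with the score coming from queries to the target model; those details are assumptions here and are not described in the abstract.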
-
Ultralow-temperature heat transport evidence for residual density of states in the superconducting state of CsV3Sb5
Authors:
C. C. Zhao,
L. S. Wang,
W. Xia,
Q. W. Yin,
H. B. Deng,
G. W. Liu,
J. J. Liu,
X. Zhang,
J. M. Ni,
Y. Y. Huang,
C. P. Tu,
Z. C. Tao,
Z. J. Tu,
C. S. Gong,
Z. W. Wang,
H. C. Lei,
Y. F. Guo,
X. F. Yang,
J. X. Yin,
S. Y. Li
Abstract:
The V-based kagome superconductors $A$V$_3$Sb$_5$ ($A$ = K, Rb, and Cs) host a charge density wave (CDW) and a topologically nontrivial band structure, thereby providing a great platform to study the interplay of superconductivity (SC), CDW, frustration, and topology. Here, we report ultralow-temperature thermal conductivity measurements on CsV$_3$Sb$_5$ and Ta-doped Cs(V$_{0.86}$Ta$_{0.14}$)$_3$Sb$_5$ and scanning tunneling microscopy (STM) measurements on CsV$_3$Sb$_5$. The finite residual linear term of thermal conductivity at zero magnetic field suggests the existence of a residual density of states (DOS) in the superconducting state of CsV$_3$Sb$_5$. This is supported by the observation of non-zero conductance at zero bias in the STM spectrum at an electronic temperature of 90 mK. However, in Cs(V$_{0.86}$Ta$_{0.14}$)$_3$Sb$_5$, which does not have CDW order, there is no evidence for residual DOS. These results show the importance of CDW order for the residual DOS, and a nodal $s$-wave gap or residual Fermi arc may be the origin of the residual DOS in such an unusual multiband kagome superconductor, CsV$_3$Sb$_5$.
Submitted 24 December, 2024;
originally announced December 2024.
-
Nonlinear control and stability analysis of a unified Tethered UAV-winder system
Authors:
Samuel Folorunsho,
Maggie Ni,
William Norris
Abstract:
This paper presents the development of a comprehensive dynamics and stabilizing control architecture for Tethered Unmanned Aerial Vehicle (TUAV) systems. The proposed architecture integrates both onboard and ground-based controllers, employing nonlinear backstepping control techniques to achieve asymptotic stability of the TUAV's equilibrium. The onboard controllers are responsible for the position and attitude control of the TUAV, while the ground controllers regulate the winder mechanism to maintain the desired tether length, ensuring it retains its catenary form. Simulation results demonstrate the ability of the TUAV system to accurately track linear and circular trajectories, ensuring robust performance under various operational scenarios. The code and movies demonstrating the performance of the system can be found at https://github.com/sof-danny/TUAV_system_control.
Submitted 12 December, 2024;
originally announced December 2024.
-
Don't Let Your Robot be Harmful: Responsible Robotic Manipulation via Safety-as-Policy
Authors:
Minheng Ni,
Lei Zhang,
Zihan Chen,
Kaixin Bai,
Zhaopeng Chen,
Jianwei Zhang,
Lei Zhang,
Wangmeng Zuo
Abstract:
Unthinking execution of human instructions in robotic manipulation can lead to severe safety risks, such as poisonings, fires, and even explosions. In this paper, we present responsible robotic manipulation, which requires robots to consider potential hazards in the real-world environment while completing instructions and performing complex operations safely and efficiently. However, such scenarios in the real world are variable and risky for training. To address this challenge, we propose Safety-as-policy, which includes (i) a world model to automatically generate scenarios containing safety risks and conduct virtual interactions, and (ii) a mental model to infer consequences with reflections and gradually develop the cognition of safety, allowing robots to accomplish tasks while avoiding dangers. Additionally, we create the SafeBox synthetic dataset, which includes one hundred responsible robotic manipulation tasks with different safety risk scenarios and instructions, effectively reducing the risks associated with real-world experiments. Experiments demonstrate that Safety-as-policy can avoid risks and efficiently complete tasks in both synthetic-dataset and real-world experiments, significantly outperforming baseline methods. Our SafeBox dataset shows consistent evaluation results with real-world scenarios, serving as a safe and effective benchmark for future research.
Submitted 31 May, 2025; v1 submitted 27 November, 2024;
originally announced November 2024.
-
Deceiving Question-Answering Models: A Hybrid Word-Level Adversarial Approach
Authors:
Jiyao Li,
Mingze Ni,
Yongshun Gong,
Wei Liu
Abstract:
Deep learning underpins most of the currently advanced natural language processing (NLP) tasks such as textual classification, neural machine translation (NMT), abstractive summarization and question-answering (QA). However, the robustness of the models, particularly QA models, against adversarial attacks is a critical concern that remains insufficiently explored. This paper introduces QA-Attack (Question Answering Attack), a novel word-level adversarial strategy that fools QA models. Our attention-based attack exploits the customized attention mechanism and deletion ranking strategy to identify and target specific words within contextual passages. It creates deceptive inputs by carefully choosing and substituting synonyms, preserving grammatical integrity while misleading the model to produce incorrect responses. Our approach demonstrates versatility across various question types, particularly when dealing with long textual inputs. Extensive experiments on multiple benchmark datasets demonstrate that QA-Attack successfully deceives baseline QA models and surpasses existing adversarial techniques in terms of success rate, semantic change, BLEU score, fluency, and grammar error rate.
Submitted 12 November, 2024;
originally announced November 2024.
-
OpenR: An Open Source Framework for Advanced Reasoning with Large Language Models
Authors:
Jun Wang,
Meng Fang,
Ziyu Wan,
Muning Wen,
Jiachen Zhu,
Anjie Liu,
Ziqin Gong,
Yan Song,
Lei Chen,
Lionel M. Ni,
Linyi Yang,
Ying Wen,
Weinan Zhang
Abstract:
In this technical report, we introduce OpenR, an open-source framework designed to integrate key components for enhancing the reasoning capabilities of large language models (LLMs). OpenR unifies data acquisition, reinforcement learning training (both online and offline), and non-autoregressive decoding into a cohesive software platform. Our goal is to establish an open-source platform and community to accelerate the development of LLM reasoning. Inspired by the success of OpenAI's o1 model, which demonstrated improved reasoning abilities through step-by-step reasoning and reinforcement learning, OpenR integrates test-time compute, reinforcement learning, and process supervision to improve reasoning in LLMs. Our work is the first to provide an open-source framework that explores the core techniques of OpenAI's o1 model with reinforcement learning, achieving advanced reasoning capabilities beyond traditional autoregressive methods. We demonstrate the efficacy of OpenR by evaluating it on the MATH dataset, utilising publicly available data and search methods. Our initial experiments confirm substantial gains, with relative improvements in reasoning and performance driven by test-time computation and reinforcement learning through process reward models. The OpenR framework, including code, models, and datasets, is accessible at https://openreasoner.github.io.
Submitted 12 October, 2024;
originally announced October 2024.
-
Visual-O1: Understanding Ambiguous Instructions via Multi-modal Multi-turn Chain-of-thoughts Reasoning
Authors:
Minheng Ni,
Yutao Fan,
Lei Zhang,
Wangmeng Zuo
Abstract:
As large-scale models evolve, language instructions are increasingly utilized in multi-modal tasks. Due to human language habits, these instructions often contain ambiguities in real-world scenarios, necessitating the integration of visual context or common sense for accurate interpretation. However, even highly intelligent large models exhibit significant performance limitations on ambiguous instructions, where weak reasoning abilities of disambiguation can lead to catastrophic errors. To address this issue, this paper proposes Visual-O1, a multi-modal multi-turn chain-of-thought reasoning framework. It simulates human multi-modal multi-turn reasoning, providing instantial experience for highly intelligent models or empirical experience for generally intelligent models to understand ambiguous instructions. Unlike traditional methods that require models to possess high intelligence to understand long texts or perform lengthy complex reasoning, our framework does not significantly increase computational overhead and is more general and effective, even for generally intelligent models. Experiments show that our method not only significantly enhances the performance of models of different intelligence levels on ambiguous instructions but also improves their performance on general datasets. Our work highlights the potential of artificial intelligence to work like humans in real-world scenarios with uncertainty and ambiguity. We will release our data and code.
Submitted 4 October, 2024;
originally announced October 2024.
-
Three-dimensional simulation of film boiling on a horizontal surface with magnetic field
Authors:
Hao-Tao Gu,
Kirti Chandra Sahu,
Jie Zhang,
Ming-Jiu Ni
Abstract:
This study conducts a numerical investigation into the three-dimensional film boiling of liquid under the influence of external magnetic fields. The numerical method incorporates a sharp phase-change model based on the volume-of-fluid approach to track the liquid-vapor interface. Additionally, a consistent and conservative scheme is employed to calculate the induced current densities and electromagnetic forces. We investigate the magnetohydrodynamic effects on film boiling, particularly examining the pattern transition of the vapor bubble and the evolution of heat transfer characteristics, exposed to either a vertical or horizontal magnetic field. In single-mode scenarios, film boiling under a vertical magnetic field displays an isotropic flow structure, forming a columnar vapor jet at higher magnetic field intensities. In contrast, horizontal magnetic fields result in anisotropic flow, creating a two-dimensional vapor sheet as the magnetic strength increases. In multi-mode scenarios, the patterns observed in single-mode film boiling persist, with the interaction of vapor bubbles introducing additional complexity to the magnetohydrodynamic flow. More importantly, our comprehensive analysis reveals how and why distinct boiling effects are generated by various orientations of magnetic fields, which induce directional electromagnetic forces to suppress flow vortices within the cross-sectional plane.
Submitted 1 October, 2024;
originally announced October 2024.
-
Development of the Multichannel Pulsed Ultrasonic Doppler Velocimeter for the measurement of liquid metal flow
Authors:
Ding-Yi Pan,
Yi-Fei Huang,
Ze Lyu,
Juan-Cheng Yang,
Ming-Jiu Ni
Abstract:
In the present study, taking advantage of ultrasonic techniques, we developed a Multichannel Pulsed Ultrasonic Doppler Velocimetry (MPUDV) system to measure the 2D2C velocity fields of liquid metal flow. Owing to the specially designed ultrasonic host and post-processing scheme, the MPUDV system can reach a high spatiotemporal resolution of 50 Hz and 3 mm. A flow loop containing a cavity test section, which produces a classical recirculating flow, was built to validate the accuracy of MPUDV in velocity field measurement. In the initial phase of the study, water with tracer particles was selected as the working liquid so that the velocity field could also be measured by the well-developed Particle Image Velocimetry (PIV). A comparison of the data obtained from the PIV and MPUDV methods revealed less than 3% difference in the 2D2C velocity field between the two techniques during simultaneous measurements of the same flow field. This finding strongly demonstrates the reliability of the MPUDV method developed in this paper. Moreover, the ternary alloy GaInSn was selected as the working liquid in the flow loop to validate the efficacy of the MPUDV in measuring 2D2C velocity fields. A series of tests were conducted in the cavity at varying Reynolds numbers, ranging from 9103 to 24123. The measurements demonstrated that the MPUDV could accurately measure the flow structures characterized by a central primary circulation eddy and two secondary eddies in the opaque liquid metal. Furthermore, it was found that the vortex center of the primary circulating eddy and the size of the secondary eddies undergo significant alterations with varying Reynolds numbers, indicating the influence of inertial force on the flow characteristics in the recirculating flow. It is therefore demonstrated that the current MPUDV methodology is applicable for measuring a 2D2C velocity field in opaque liquid metal flows.
Submitted 4 September, 2024;
originally announced September 2024.
-
Dreaming is All You Need
Authors:
Mingze Ni,
Wei Liu
Abstract:
In classification tasks, achieving a harmonious balance between exploration and precision is of paramount importance. To this end, this research introduces two novel deep learning models, SleepNet and DreamNet, to strike this balance. SleepNet seamlessly integrates supervised learning with unsupervised "sleep" stages using pre-trained encoder models. Dedicated neurons within SleepNet are embedded in these unsupervised features, forming intermittent "sleep" blocks that facilitate exploratory learning. Building upon the foundation of SleepNet, DreamNet employs full encoder-decoder frameworks to reconstruct the hidden states, mimicking the human "dreaming" process. This reconstruction process enables further exploration and refinement of the learned representations. Moreover, the principal ideas of our SleepNet and DreamNet are generic and can be applied to both computer vision and natural language processing downstream tasks. Through extensive empirical evaluations on diverse image and text datasets, SleepNet and DreamNet have demonstrated superior performance compared to state-of-the-art models, showcasing the strengths of unsupervised exploration and supervised precision afforded by our innovative approaches.
Submitted 15 September, 2024; v1 submitted 3 September, 2024;
originally announced September 2024.
-
AutoDirector: Online Auto-scheduling Agents for Multi-sensory Composition
Authors:
Minheng Ni,
Chenfei Wu,
Huaying Yuan,
Zhengyuan Yang,
Ming Gong,
Lijuan Wang,
Zicheng Liu,
Wangmeng Zuo,
Nan Duan
Abstract:
With the advancement of generative models, the synthesis of different sensory elements such as music, visuals, and speech has achieved significant realism. However, the approach to generating multi-sensory outputs has not been fully explored, limiting its application to high-value scenarios such as directing a film. Developing a movie director agent faces two major challenges: (1) Lack of parallelism and online scheduling with production steps: In the production of multi-sensory films, there are complex dependencies between different sensory elements, and the production time for each element varies. (2) Diverse needs and clear communication demands with users: Users often cannot clearly express their needs until they see a draft, which requires human-computer interaction and iteration to continually adjust and optimize the film content based on user feedback. To address these issues, we introduce AutoDirector, an interactive multi-sensory composition framework that supports long shots, special effects, music scoring, dubbing, and lip-syncing. This framework improves the efficiency of multi-sensory film production through automatic scheduling and supports the modification and improvement of interactive tasks to meet user needs. AutoDirector not only expands the application scope of human-machine collaboration but also demonstrates the potential of AI in collaborating with humans in the role of a film director to complete multi-sensory films.
Submitted 21 August, 2024;
originally announced August 2024.
-
Are Large Language Models Possible to Conduct Cognitive Behavioral Therapy?
Authors:
Hao Shen,
Zihan Li,
Minqiang Yang,
Minghui Ni,
Yongfeng Tao,
Zhengyang Yu,
Weihao Zheng,
Chen Xu,
Bin Hu
Abstract:
In contemporary society, the issue of psychological health has become increasingly prominent, characterized by the diversification, complexity, and universality of mental disorders. Cognitive Behavioral Therapy (CBT), currently the most influential and clinically effective psychological treatment method with no side effects, has limited coverage and poor quality in most countries. In recent years, research on the recognition and intervention of emotional disorders using large language models (LLMs) has been validated, providing new possibilities for psychological assistance therapy. However, can LLMs truly conduct cognitive behavioral therapy? Many concerns have been raised by mental health experts regarding the use of LLMs for therapy. Seeking to answer this question, we collected a real CBT corpus from online video websites, and designed and applied a targeted automatic evaluation framework involving the evaluation of emotion tendency of generated text, structured dialogue pattern and proactive inquiry ability. For emotion tendency, we calculate the emotion tendency score of the CBT dialogue text generated by each model. For structured dialogue pattern, we use a diverse range of automatic evaluation metrics to compare speaking style, the ability to maintain consistency of topic and the use of technology in CBT between different models. As for inquiring to guide the patient, we utilize the PQA (Proactive Questioning Ability) metric. We also evaluated the CBT ability of the LLM after integrating a CBT knowledge base to explore how introducing additional knowledge helps enhance the model's CBT counseling ability. Four LLM variants with excellent performance on natural language processing are evaluated, and the experimental results show the great potential of LLMs in the psychological counseling realm, especially after combining with other technological means.
Submitted 24 July, 2024;
originally announced July 2024.
-
A diverse set of two-qubit gates for spin qubits in semiconductor quantum dots
Authors:
Ming Ni,
Rong-Long Ma,
Zhen-Zhen Kong,
Ning Chu,
Sheng-Kai Zhu,
Chu Wang,
Ao-Ran Li,
Wei-Zhu Liao,
Gang Cao,
Gui-Lei Wang,
Guang-Can Guo,
Xuedong Hu,
Hai-Ou Li,
Guo-Ping Guo
Abstract:
To realize large-scale quantum information processes, an ideal scheme for two-qubit operations should enable diverse operations with given hardware and physical interaction. However, for spin qubits in semiconductor quantum dots, the common two-qubit operations, including CPhase gates, SWAP gates, and CROT gates, are realized with distinct parameter regions and control waveforms, posing challenges for their simultaneous implementation. Here, taking advantage of the inherent Heisenberg interaction between spin qubits, we propose and verify a fast composite two-qubit gate scheme to extend the available two-qubit gate types as well as reduce the requirements for device properties. Apart from the formerly proposed CPhase (controlled-phase) gates and SWAP gates, theoretical results indicate that the iSWAP-family gate and Fermionic simulation (fSim) gate set are additionally available for spin qubits. Meanwhile, our gate scheme limits the parameter requirements of all essential two-qubit gates to a common $J \sim \Delta E_Z$ region, facilitating their simultaneous realization. Furthermore, we present the preliminary experimental demonstration of the composite gate scheme, observing an excellent match between the measured and simulated results. With this versatile composite gate scheme, broad-spectrum two-qubit operations allow us to efficiently utilize the hardware and the underlying physics resources, helping accelerate and broaden the scope of the upcoming noisy intermediate-scale quantum (NISQ) computing.
Submitted 29 April, 2024;
originally announced April 2024.
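For reference, the fSim gate named in the abstract is commonly written as the following two-parameter unitary in the basis $|00\rangle, |01\rangle, |10\rangle, |11\rangle$. This is the convention widely used in the superconducting-qubit literature and is assumed here for illustration; the abstract does not specify the authors' convention.

```latex
\mathrm{fSim}(\theta,\phi) =
\begin{pmatrix}
1 & 0 & 0 & 0 \\
0 & \cos\theta & -i\sin\theta & 0 \\
0 & -i\sin\theta & \cos\theta & 0 \\
0 & 0 & 0 & e^{-i\phi}
\end{pmatrix}
```

Under this convention, $\theta = 0,\ \phi = \pi$ gives the CZ gate (a CPhase gate with phase $\pi$), and $\theta = \pi/2,\ \phi = 0$ gives an iSWAP-type gate (iSWAP up to the sign of the off-diagonal terms).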
-
Extension of a Pattern Recognition Validation Approach for Noisy Boson Sampling
Authors:
Yang Ji,
Yongzheng Wu,
Shi Wang,
Jie Hou,
Meiling Chen,
Ming Ni
Abstract:
Boson sampling is one of the main quantum computation models to demonstrate the quantum computational advantage. However, this aim may be hard to realize considering two main kinds of noise, namely photon distinguishability and photon loss. Inspired by the Bayesian validation extended to evaluate whether distinguishability is too high to demonstrate this advantage, the pattern recognition validation is extended for boson sampling, considering both distinguishability and loss. Based on clusters constructed with the k-means++ method, where parameters are carefully adjusted to optimize the extended validation performance, the distribution of characteristic values changes nearly monotonically with indistinguishability, especially when photons are close to being indistinguishable. However, this regularity may be suppressed by photon loss. The intrinsic data structure of output events is analyzed by calculating probability distributions and mean 2-norm distances of the sorted outputs. An approximation algorithm is also used to show how the data structure changes with noise.
Submitted 19 August, 2024; v1 submitted 23 April, 2024;
originally announced April 2024.
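The "K means++" method named in the abstract (usually written k-means++) is a seeding rule for k-means clustering. A minimal sketch of that seeding step is given below, assuming points are plain tuples of floats; the function names and the toy data are hypothetical, and the paper's own clustering parameters are not given in the abstract.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_pp_init(points, k, seed=0):
    """Pick k initial centers with k-means++ seeding.

    Assumes `points` contains at least k distinct points.
    """
    rng = random.Random(seed)
    # First center: uniformly at random.
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Weight each point by its squared distance to the nearest chosen center,
        # so far-away points are more likely to become the next center.
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        centers.append(rng.choices(points, weights=d2)[0])
    return centers


# Toy data (hypothetical): two loose groups of 2D points.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers = kmeans_pp_init(points, k=2)
```

The returned centers would then be passed to ordinary k-means (Lloyd) iterations; already-chosen points get zero weight, so the same point is not picked twice.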
-
Responsible Visual Editing
Authors:
Minheng Ni,
Yeli Shen,
Lei Zhang,
Wangmeng Zuo
Abstract:
With recent advancements in visual synthesis, there is a growing risk of encountering images with detrimental effects, such as hate, discrimination, or privacy violations. The research on transforming harmful images into responsible ones remains unexplored. In this paper, we formulate a new task, responsible visual editing, which entails modifying specific concepts within an image to render it more responsible while minimizing changes. However, the concept that needs to be edited is often abstract, making it challenging to locate what needs to be modified and plan how to modify it. To tackle these challenges, we propose a Cognitive Editor (CoEditor) that harnesses the large multimodal model through a two-stage cognitive process: (1) a perceptual cognitive process to focus on what needs to be modified and (2) a behavioral cognitive process to strategize how to modify. To mitigate the negative implications of harmful images on research, we create a transparent and public dataset, AltBear, which expresses harmful information using teddy bears instead of humans. Experiments demonstrate that CoEditor can effectively comprehend abstract concepts within complex scenes and significantly surpass the performance of baseline models for responsible visual editing. We find that the AltBear dataset corresponds well to the harmful content found in real images, offering a consistent experimental evaluation, thereby providing a safer benchmark for future research. Moreover, CoEditor also shows great results in general editing. We release our code and dataset at https://github.com/kodenii/Responsible-Visual-Editing.
Submitted 8 April, 2024;
originally announced April 2024.
-
Reversible Jump Attack to Textual Classifiers with Modification Reduction
Authors:
Mingze Ni,
Zhensu Sun,
Wei Liu
Abstract:
Recent studies on adversarial examples expose vulnerabilities of natural language processing (NLP) models. Existing techniques for generating adversarial examples are typically driven by deterministic hierarchical rules that are agnostic to the optimal adversarial examples, a strategy that often results in adversarial samples with a suboptimal balance between magnitudes of changes and attack successes. To this end, in this research we propose two algorithms, Reversible Jump Attack (RJA) and Metropolis-Hasting Modification Reduction (MMR), to generate highly effective adversarial examples and to improve the imperceptibility of the examples, respectively. RJA utilizes a novel randomization mechanism to enlarge the search space and efficiently adapts to a number of perturbed words for adversarial examples. With these generated adversarial examples, MMR applies the Metropolis-Hasting sampler to enhance the imperceptibility of adversarial examples. Extensive experiments demonstrate that RJA-MMR outperforms current state-of-the-art methods in attack performance, imperceptibility, fluency and grammar correctness.
Submitted 21 March, 2024;
originally announced March 2024.
-
AICAttack: Adversarial Image Captioning Attack with Attention-Based Optimization
Authors:
Jiyao Li,
Mingze Ni,
Yifei Dong,
Tianqing Zhu,
Wei Liu
Abstract:
Recent advances in deep learning research have shown remarkable achievements across many tasks in computer vision (CV) and natural language processing (NLP). At the intersection of CV and NLP is the problem of image captioning, where the related models' robustness against adversarial attacks has not been well studied. This paper presents a novel adversarial attack strategy, AICAttack (Attention-based Image Captioning Attack), designed to attack image captioning models through subtle perturbations on images. Operating within a black-box attack scenario, our algorithm requires no access to the target model's architecture, parameters, or gradient information. We introduce an attention-based candidate selection mechanism that identifies the optimal pixels to attack, followed by a customised differential evolution method to optimise the perturbations of pixels' RGB values. We demonstrate AICAttack's effectiveness through extensive experiments on benchmark datasets against multiple victim models. The experimental results demonstrate that our method outperforms current leading-edge techniques by achieving consistently higher attack success rates.
Submitted 11 December, 2024; v1 submitted 19 February, 2024;
originally announced February 2024.
-
QuantAgent: Seeking Holy Grail in Trading by Self-Improving Large Language Model
Authors:
Saizhuo Wang,
Hang Yuan,
Lionel M. Ni,
Jian Guo
Abstract:
Autonomous agents based on Large Language Models (LLMs) that devise plans and tackle real-world challenges have gained prominence. However, tailoring these agents for specialized domains like quantitative investment remains a formidable task. The core challenge involves efficiently building and integrating a domain-specific knowledge base for the agent's learning process. This paper introduces a principled framework to address this challenge, comprising a two-layer loop. In the inner loop, the agent refines its responses by drawing from its knowledge base, while in the outer loop, these responses are tested in real-world scenarios to automatically enhance the knowledge base with new insights. We demonstrate that our approach enables the agent to progressively approximate optimal behavior with provable efficiency. Furthermore, we instantiate this framework through an autonomous agent for mining trading signals named QuantAgent. Empirical results showcase QuantAgent's capability in uncovering viable financial signals and enhancing the accuracy of financial forecasts.
Submitted 6 February, 2024;
originally announced February 2024.
-
StrokeNUWA: Tokenizing Strokes for Vector Graphic Synthesis
Authors:
Zecheng Tang,
Chenfei Wu,
Zekai Zhang,
Mingheng Ni,
Shengming Yin,
Yu Liu,
Zhengyuan Yang,
Lijuan Wang,
Zicheng Liu,
Juntao Li,
Nan Duan
Abstract:
To leverage LLMs for visual synthesis, traditional methods convert raster image information into discrete grid tokens through specialized visual modules, while disrupting the model's ability to capture the true semantic representation of visual scenes. This paper posits that an alternative representation of images, vector graphics, can effectively surmount this limitation by enabling a more natural and semantically coherent segmentation of the image information. Thus, we introduce StrokeNUWA, a pioneering work exploring a better visual representation, "stroke tokens", on vector graphics, which is inherently rich in visual semantics, naturally compatible with LLMs, and highly compressed. Equipped with stroke tokens, StrokeNUWA can significantly surpass traditional LLM-based and optimization-based methods across various metrics in the vector graphic generation task. Besides, StrokeNUWA achieves up to a 94x speedup in inference over the speed of prior methods with an exceptional SVG code compression ratio of 6.9%.
Submitted 30 January, 2024;
originally announced January 2024.
-
A Two-stage Personalized Virtual Try-on Framework with Shape Control and Texture Guidance
Authors:
Shufang Zhang,
Minxue Ni,
Lei Wang,
Wenxin Ding,
Shuai Chen,
Yuhong Liu
Abstract:
The diffusion model has a strong ability to generate wild images. However, with text guidance alone the model can generate only inaccurate images, which makes it very challenging to directly apply the text-guided generative model to virtual try-on scenarios. Taking images as guiding conditions of the diffusion model, this paper proposes a brand new personalized virtual try-on model (PE-VITON), which uses two stages (shape control and texture guidance) to decouple the clothing attributes. Specifically, the proposed model adaptively matches the clothing to human body parts through the Shape Control Module (SCM) to mitigate the misalignment of the clothing and the human body parts. The semantic information of the input clothing is parsed by the Texture Guided Module (TGM), and the corresponding texture is generated by directional guidance. Therefore, this model can effectively solve the problems of weak reduction of clothing folds, poor generation effect under complex human posture, blurred edges of clothing, and unclear texture styles in traditional try-on methods. Meanwhile, the model can automatically enhance the generated clothing folds and textures according to the human posture, and improve the authenticity of virtual try-on. In this paper, qualitative and quantitative experiments are carried out on high-resolution paired and unpaired datasets; the results show that the proposed model outperforms the state-of-the-art model.
Submitted 24 December, 2023;
originally announced December 2023.
-
Discovery and Timing of Millisecond Pulsars in the Globular Cluster M5 (NGC 5904) with FAST and Arecibo
Authors:
Lei Zhang,
Paulo C. C. Freire,
Alessandro Ridolfi,
Zhichen Pan,
Jiaqi Zhao,
Craig O. Heinke,
Jianxing Chen,
Mario Cadelano,
Cristina Pallanca,
Xian Hou,
Xiaoting Fu,
Shi Dai,
Erbil Gugercinoglu,
Meng Guo,
Jason Hessels,
Jiale Hu,
Guodong Li,
Mengmeng Ni,
Jingshan Pan,
Scott M. Ransom,
Qitong Ruan,
Ingrid Stairs,
Chao-Wei Tsai,
Pei Wang,
Long Wang
, et al. (7 additional authors not shown)
Abstract:
We report on a comprehensive multi-wavelength study of the pulsars in the globular cluster (GC) M5, including the discovery of M5G, a new compact non-eclipsing "black widow" pulsar. Thanks to the analysis of 34 years of radio data taken with the FAST and Arecibo telescopes, we obtained new phase-connected timing solutions for four pulsars in the clusters and improved those of the other three known pulsars. These have resulted in, among other things: a) much improved proper motions for five pulsars, with transverse velocities that are smaller than their respective escape velocities; b) 3-sigma and 1.5-sigma detections of Shapiro delays in M5F and M5D, respectively; c) greatly improved measurement of the periastron advance in M5B, whose value of 0.01361(6) implies that M5B is still likely to be a heavy neutron star. The binary pulsars M5D, E and F are confirmed to be in low-eccentricity binary systems, the low-mass companions of which are newly identified to be He white dwarfs using Hubble Space Telescope data. Four pulsars are also found to be associated with X-ray sources. Similarly to the eclipsing pulsar M5C, M5G shows little or no non-thermal X-ray emission, indicative of weak synchrotron radiation produced by intra-binary shocks. All the seven pulsars known in M5 have short spin periods and five are in binary systems with low orbital eccentricities. These characteristics differ from the overall GC pulsar population, but confirm the expectations for the pulsar population in a cluster with a small rate of stellar encounters per binary system.
Submitted 10 December, 2023;
originally announced December 2023.
-
A novel control strategy to neutralize heat source within solid oxide electrolysis cell (SOEC) under variable solar power conditions
Authors:
Zhaojian Liang,
Shanlin Chen,
Meng Ni,
Jingyi Wang,
Mengying Li
Abstract:
The integration of a solid oxide electrolysis cell (SOEC) with a photovoltaic (PV) system presents a viable method for storing variable solar energy through the production of green hydrogen. To ensure the SOEC's safety and longevity amidst dramatic fluctuations in solar power, control strategies are needed to limit the temperature gradients and rates of temperature change within the SOEC. Recognizing that the reactant supply influences the current, a novel control strategy is developed to modulate heat generation in the SOEC by adjusting the fuel flow rate. The effectiveness of this strategy is assessed through numerical simulations conducted on a coupled PV-SOEC system using actual solar irradiance data, recorded at two-second intervals, to account for rapid changes in solar exposure. The results indicate that conventional control strategies, which increase airflow rates, are inadequate in effectively suppressing the rate of temperature variation in scenarios of drastic solar power changes. In contrast, our proposed strategy demonstrates successful management of the SOEC's heat generation, thereby reducing the temperature gradient and rate of variation within the SOEC to below 5 K/cm and 1 K/min, respectively.
Submitted 4 February, 2024; v1 submitted 8 December, 2023;
originally announced December 2023.
-
LpiCT: A logic security analysis framework for protocols
Authors:
Fusheng Wu,
Jinhui Liu,
Yanbing Li,
Mingtao Ni
Abstract:
The pi calculus is a basic theory of mobile communication based on the notion of interaction. Aimed at analyzing and modelling the behavior of communication processes in communicating and mobile systems, it is widely applied to the security analysis of the design and implementation of cryptographic protocols. But the pi calculus does not provide perfect logic security analysis, so the logic flaws in the design and the implementation of a cryptographic protocol cannot be discovered in time. The aim is to analyze whether there are logic flaws in the design and the implementation of a cryptographic protocol, so as to ensure the security of the cryptographic protocol when it is encoded into software and implemented. This paper introduces logic rules and proofs, binary trees and the KMP algorithm, and proposes a new extension of the pi calculus theory, a logic security analysis framework and an algorithm. This paper presents the logic security proof and analysis of the interactional implementation process of the TLS 1.3 protocol. Empirical results show that the new extension theory, the logic security analysis framework and the algorithm can effectively analyze whether there are logic flaws in the design and the implementation of a cryptographic protocol. The security of cryptographic protocols depends not only on cryptographic primitives, but also on the coding of cryptographic protocols and the environment in which they are implemented. The security analysis framework of cryptographic protocol implementation proposed in this paper can ensure the security of protocol implementation.
Submitted 1 November, 2023;
originally announced December 2023.
-
A SWAP Gate for Spin Qubits in Silicon
Authors:
Ming Ni,
Rong-Long Ma,
Zhen-Zhen Kong,
Xiao Xue,
Sheng-Kai Zhu,
Chu Wang,
Ao-Ran Li,
Ning Chu,
Wei-Zhu Liao,
Gang Cao,
Gui-Lei Wang,
Guang-Can Guo,
Xuedong Hu,
Hong-Wen Jiang,
Hai-Ou Li,
Guo-Ping Guo
Abstract:
With one- and two-qubit gate fidelities approaching the fault-tolerance threshold for spin qubits in silicon, how to scale up the architecture and make large arrays of spin qubits become the more pressing challenges. In a scaled-up structure, qubit-to-qubit connectivity has crucial impact on gate counts of quantum error correction and general quantum algorithms. In our toolbox of quantum gates for spin qubits, SWAP gate is quite versatile: it can help solve the connectivity problem by realizing both short- and long-range spin state transfer, and act as a basic two-qubit gate, which can reduce quantum circuit depth when combined with other two-qubit gates. However, for spin qubits in silicon quantum dots, high fidelity SWAP gates have not been demonstrated due to the requirements of large circuit bandwidth and a highly adjustable ratio between the strength of the exchange coupling J and the Zeeman energy difference Delta E_z. Here we demonstrate a fast SWAP gate with a duration of ~25 ns based on quantum dots in isotopically enriched silicon, with a highly adjustable ratio between J and Delta E_z, for over two orders of magnitude in our device. We are also able to calibrate the single-qubit local phases during the SWAP gate by incorporating single-qubit gates in our circuit. By independently reading out the qubits, we probe the anti-correlations between the two spins, estimate the operation fidelity and analyze the dominant error sources for our SWAP gate. These results pave the way for high fidelity SWAP gates, and processes based on them, such as quantum communication on chip and quantum simulation by engineering the Heisenberg Hamiltonian in silicon.
Submitted 10 October, 2023;
originally announced October 2023.
-
Single spin qubit geometric gate in a silicon quantum dot
Authors:
Rong-Long Ma,
Ao-Ran Li,
Chu Wang,
Zhen-Zhen Kong,
Wei-Zhu Liao,
Ming Ni,
Sheng-Kai Zhu,
Ning Chu,
Cheng-Xian Zhang,
Di Liu,
Gang Cao,
Gui-Lei Wang,
Hai-Ou Li,
Guo-Ping Guo
Abstract:
Preserving qubit coherence and maintaining high-fidelity qubit control under complex noise environment is an enduring challenge for scalable quantum computing. Here we demonstrate an addressable fault-tolerant single spin qubit with an average control fidelity of 99.12% via randomized benchmarking on a silicon quantum dot device with an integrated micromagnet. Its dephasing time T2* is 1.025 us and can be enlarged to 264 us by using the Hahn echo technique, reflecting strong low-frequency noise in our system. To break through the noise limitation, we introduce geometric quantum computing to obtain high control fidelity by exploiting its noise-resilient feature. However, the control fidelities of the geometric quantum gates are lower than 99%. According to our simulation, the noise-resilient feature of geometric quantum gates is masked by the heating effect. With further optimization to alleviate the heating effect, geometric quantum computing can be a potential approach to reproducibly achieving high-fidelity qubit control in a complex noise environment.
Submitted 10 October, 2023;
originally announced October 2023.
-
Singlet-triplet-state readout in silicon-metal-oxide-semiconductor double quantum dots
Authors:
Rong-Long Ma,
Sheng-Kai Zhu,
Zhen-Zhen Kong,
Tai-Ping Sun,
Ming Ni,
Yu-Chen Zhou,
Yuan Zhou,
Gang Luo,
Gang Cao,
Gui-Lei Wang,
Hai-Ou Li,
Guo-Ping Guo
Abstract:
High-fidelity singlet-triplet state readout is essential for large-scale quantum computing. However, the widely used threshold method of comparing a mean value with the fixed threshold will limit the judgment accuracy, especially for the relaxed triplet state, under the restriction of relaxation time and signal-to-noise ratio. Here, we achieve an enhanced latching readout based on Pauli spin blockade in a Si-MOS double quantum dot device and demonstrate an average singlet-triplet state readout fidelity of 97.59% by the threshold method. We reveal the inherent deficiency of the threshold method for the relaxed triplet state classification and introduce machine learning as a relaxation-independent readout method to reduce the misjudgment. The readout fidelity for classifying the simulated single-shot traces can be improved to 99.67% by machine learning method, better than the threshold method of 97.54% which is consistent with the experimental result. This work indicates that machine learning method can be a strong potential candidate for alleviating the restrictions of stably achieving high-fidelity and high-accuracy singlet-triplet state readout in large-scale quantum computing.
Submitted 18 September, 2023;
originally announced September 2023.
-
Correcting on-chip distortion of control pulses with silicon spin qubits
Authors:
Ming Ni,
Rong-Long Ma,
Zhen-Zhen Kong,
Ning Chu,
Wei-Zhu Liao,
Sheng-Kai Zhu,
Chu Wang,
Gang Luo,
Di Liu,
Gang Cao,
Gui-Lei Wang,
Hai-Ou Li,
Guo-Ping Guo
Abstract:
Pulse distortion, as one of the coherent error sources, hinders the characterization and control of qubits. In the semiconductor quantum dot system, the distortions on measurement pulses and control pulses disturb the experimental results, while no effective calibration procedure has yet been reported. Here, we demonstrate two different calibration methods to calibrate and correct the distortion using the two-qubit system as a detector. The two calibration methods have different correction accuracy and complexity. One is the coarse predistortion (CPD) method, with which the distortion is partly relieved. The other method is the all predistortion (APD) method, with which we measure the transfer function and significantly improve the exchange oscillation homogeneity. The two methods use the exchange oscillation homogeneity as the metric and are appropriate for any qubit that oscillates with a diabatic pulse. With the APD procedure, an arbitrary control waveform can be accurately delivered to the device, which is essential for characterizing qubits and improving gate fidelity.
Submitted 18 September, 2023;
originally announced September 2023.
-
Ref-Diff: Zero-shot Referring Image Segmentation with Generative Models
Authors:
Minheng Ni,
Yabo Zhang,
Kailai Feng,
Xiaoming Li,
Yiwen Guo,
Wangmeng Zuo
Abstract:
Zero-shot referring image segmentation is a challenging task because it aims to find an instance segmentation mask based on the given referring descriptions, without training on this type of paired data. Current zero-shot methods mainly focus on using pre-trained discriminative models (e.g., CLIP). However, we have observed that generative models (e.g., Stable Diffusion) have potentially understood the relationships between various visual elements and text descriptions, which are rarely investigated in this task. In this work, we introduce a novel Referring Diffusional segmentor (Ref-Diff) for this task, which leverages the fine-grained multi-modal information from generative models. We demonstrate that without a proposal generator, a generative model alone can achieve comparable performance to existing SOTA weakly-supervised models. When we combine both generative and discriminative models, our Ref-Diff outperforms these competing methods by a significant margin. This indicates that generative models are also beneficial for this task and can complement discriminative models for better referring segmentation. Our code is publicly available at https://github.com/kodenii/Ref-Diff.
Submitted 1 September, 2023; v1 submitted 31 August, 2023;
originally announced August 2023.
-
ORES: Open-vocabulary Responsible Visual Synthesis
Authors:
Minheng Ni,
Chenfei Wu,
Xiaodong Wang,
Shengming Yin,
Lijuan Wang,
Zicheng Liu,
Nan Duan
Abstract:
Avoiding synthesizing specific visual concepts is an essential challenge in responsible visual synthesis. However, the visual concept that needs to be avoided for responsible visual synthesis tends to be diverse, depending on the region, context, and usage scenarios. In this work, we formalize a new task, Open-vocabulary Responsible Visual Synthesis (ORES), where the synthesis model is able to avoid forbidden visual concepts while allowing users to input any desired content. To address this problem, we present a Two-stage Intervention (TIN) framework. By introducing 1) rewriting with learnable instruction through a large-scale language model (LLM) and 2) synthesizing with prompt intervention on a diffusion synthesis model, it can effectively synthesize images avoiding any concepts but following the user's query as much as possible. To evaluate on ORES, we provide a publicly available dataset, baseline models, and benchmark. Experimental results demonstrate the effectiveness of our method in reducing risks of image generation. Our work highlights the potential of LLMs in responsible visual synthesis. Our code and dataset are publicly available.
Submitted 26 August, 2023;
originally announced August 2023.
-
Alpha-GPT: Human-AI Interactive Alpha Mining for Quantitative Investment
Authors:
Saizhuo Wang,
Hang Yuan,
Leon Zhou,
Lionel M. Ni,
Heung-Yeung Shum,
Jian Guo
Abstract:
One of the most important tasks in quantitative investment research is mining new alphas (effective trading signals or factors). Traditional alpha mining methods, either hand-crafted factor synthesizing or algorithmic factor mining (e.g., search with genetic programming), have inherent limitations, especially in implementing the ideas of quants. In this work, we propose a new alpha mining paradigm by introducing human-AI interaction, and a novel prompt engineering algorithmic framework to implement this paradigm by leveraging the power of large language models. Moreover, we develop Alpha-GPT, a new interactive alpha mining system framework that provides a heuristic way to "understand" the ideas of quant researchers and outputs creative, insightful, and effective alphas. We demonstrate the effectiveness and advantage of Alpha-GPT via a number of alpha mining experiments.
Submitted 31 July, 2023;
originally announced August 2023.
-
Flow states and heat transport in liquid metal convection
Authors:
Lei Ren,
Xin Tao,
Lu Zhang,
Ming-Jiu Ni,
Ke-Qing Xia,
Yi-Chao Xie
Abstract:
We present an experimental study of Rayleigh-Bénard convection using liquid metal alloy gallium-indium-tin as the working fluid with a Prandtl number of $Pr=0.029$. The flow state and the heat transport were measured in a Rayleigh number range of $1.2\times10^{4} \le Ra \le 1.3\times10^{7}$. The temperature fluctuation at the cell centre is used as a proxy for the flow state. It is found that, as $Ra$ increases from the lower end of the parameter range, the flow evolves from a convection state to an oscillation state, a chaotic state, and finally a turbulent state for $Ra>10^5$. The study suggests that the large-scale circulation in the turbulent state is a residual of the cell structures near the onset of convection, which is in contrast with the case of $Pr\sim1$, where the cell structure is replaced by high-order flow modes transiently before the emergence of the large-scale circulation in the turbulent state. The evolution of the flow state is also reflected by the heat transport characterised by the Nusselt number $Nu$ and the probability density function (PDF) of the temperature fluctuation at the cell centre. It is found that the effective local heat transport scaling exponent $γ$, i.e., $Nu\sim Ra^γ$, changes continuously from $γ=0.49$ at $Ra\sim 10^4$ to $γ=0.25$ for $Ra>10^6$. Meanwhile, the PDF at the cell centre gradually evolves from a Gaussian-like shape before the transition to turbulence to an exponential-like shape in the turbulent state. For $Ra>10^6$, the flow shows self-similar behaviour, which is revealed by the universal shape of the PDF of the temperature fluctuation at the cell centre and a $Nu=0.19Ra^{0.25}$ scaling for the heat transport.
Submitted 29 July, 2023;
originally announced July 2023.
-
Think-on-Graph: Deep and Responsible Reasoning of Large Language Model on Knowledge Graph
Authors:
Jiashuo Sun,
Chengjin Xu,
Lumingyuan Tang,
Saizhuo Wang,
Chen Lin,
Yeyun Gong,
Lionel M. Ni,
Heung-Yeung Shum,
Jian Guo
Abstract:
Although large language models (LLMs) have achieved significant success in various tasks, they often struggle with hallucination problems, especially in scenarios requiring deep and responsible reasoning. These issues could be partially addressed by introducing external knowledge graphs (KGs) into LLM reasoning. In this paper, we propose a new LLM-KG integration paradigm, ``$\hbox{LLM}\otimes\hbox{KG}$'', which treats the LLM as an agent that interactively explores related entities and relations on KGs and performs reasoning based on the retrieved knowledge. We further implement this paradigm by introducing a new approach called Think-on-Graph (ToG), in which the LLM agent iteratively executes beam search on the KG, discovers the most promising reasoning paths, and returns the most likely reasoning results. We use a number of well-designed experiments to examine and illustrate the following advantages of ToG: 1) compared with LLMs, ToG has better deep reasoning power; 2) ToG offers knowledge traceability and knowledge correctability by leveraging LLM reasoning and expert feedback; 3) ToG provides a flexible plug-and-play framework for different LLMs, KGs and prompting strategies without any additional training cost; 4) the performance of ToG with small LLMs can exceed that of large LLMs such as GPT-4 in certain scenarios, which reduces the cost of LLM deployment and application. As a training-free method with lower computational cost and better generality, ToG achieves overall SOTA in 6 out of 9 datasets where most previous SOTAs rely on additional training.
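The beam search over a knowledge graph mentioned in the abstract can be pictured with a toy sketch. Everything here is an assumption made for illustration: the graph, the entity and relation names, the `beam_search_paths` function, and especially `keyword_score`, a hand-written stand-in for the LLM that would rank candidate paths in ToG. It shows generic beam search over triples, not the authors' implementation.

```python
from typing import Callable, Dict, List, Tuple

Triple = Tuple[str, str, str]   # (head entity, relation, tail entity)
Path = List[Triple]

def beam_search_paths(
    kg: Dict[str, List[Tuple[str, str]]],
    start: str,
    score_path: Callable[[Path], float],
    width: int = 2,
    depth: int = 2,
) -> List[Path]:
    """Keep the `width` best-scoring paths at each of `depth` expansion steps."""
    beams: List[Path] = [[]]            # the empty path sits at `start`
    for _ in range(depth):
        candidates: List[Path] = []
        for path in beams:
            tail = path[-1][2] if path else start
            for relation, neighbor in kg.get(tail, []):
                candidates.append(path + [(tail, relation, neighbor)])
        if not candidates:              # nothing left to expand
            break
        # Prune: highest score first, ties keep insertion order.
        candidates.sort(key=score_path, reverse=True)
        beams = candidates[:width]
    return beams

# Hypothetical toy graph: entity -> list of (relation, neighbor).
toy_kg = {
    "city_x": [("located_in", "country_y"), ("twinned_with", "city_z")],
    "country_y": [("has_leader", "person_p"), ("part_of", "region_r")],
    "city_z": [("located_in", "country_w")],
}

# Stand-in scorer (assumption): counts relations deemed relevant to the
# question "who leads the country that city_x is in?".
RELEVANT = {"located_in", "has_leader"}

def keyword_score(path: Path) -> float:
    return float(sum(1 for _, relation, _ in path if relation in RELEVANT))

beams = beam_search_paths(toy_kg, "city_x", keyword_score, width=2, depth=2)
best_path = beams[0]
print(best_path)
```

In this sketch the pruning step is where an LLM would be consulted; swapping `keyword_score` for a model call changes the ranking but not the search loop.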
Submitted 24 March, 2024; v1 submitted 14 July, 2023;
originally announced July 2023.