Global resistive MHD accretion flows around spinning AGNs: impact of resistivity on MAD state

Authors: Ramiz Aktar, Kuo-Chuan Pan, Toru Okuda

Abstract: In this study, we investigate the effect of resistivity on the dynamics of global magnetohydrodynamic accretion flows (Res-MHD) around a spinning supermassive black hole. We perform a comparative study of 2D and 3D resistive models around black holes. We examine accretion flow dynamics considering globally uniform resistivity values, ranging from $\sim 0$ to 0.1. During the simulation time of $t \lesssim 1000~t_g$, we find that the mass accretion rate is comparable for both the 2D and 3D models. However, as the flow becomes increasingly turbulent, non-axisymmetric effects begin to dominate, resulting in significant differences in the mass accretion rates between the 3D and 2D models. All the resistive models in a highly magnetized flow belong to the Magnetically Arrested Disk (MAD) state. We propose an efficient and physically motivated approach to examine the magnetic state by estimating the spatial average plasma beta parameter across the computational domain. We find that when the average plasma beta is close to or below unity $(\beta_{\text{ave}} \lesssim 1)$, the accretion flow enters the MAD state. Additionally, we find that high-resistivity flow reduces magnetorotational instability (MRI) turbulence in the accretion flow, while the turbulence structures remain qualitatively similar in low-resistivity flows. Moreover, we observe indications of plasmoid formations in low-resistivity flow compared to high-resistivity flow. Furthermore, we do not find a clear relationship between the variability of the accretion rate, magnetic flux, and resistivity. Lastly, our findings suggest that low-resistivity models produce higher power jets than those with higher resistivity.

Submitted 28 May, 2025; originally announced May 2025.

Comments: 17 pages, 9 figures, 2 tables; Accepted for publication in ApJ

arXiv:2505.19917

Robust self-testing and certified randomness based on chained Bell inequality

Authors: Rajdeep Paul, Sneha Munshi, Alok Kumar Pan

Abstract: Self-testing is the strongest certification procedure that uniquely characterizes the physical system based on the observed statistics, without any knowledge of the inner workings of the devices. The optimal quantum violation of a Bell inequality enables such a device-independent (DI) self-testing of the source and the measurement devices. In this work, we demonstrate the DI self-testing based on the chained Bell inequality. We devise an elegant sum-of-squares (SOS) technique enabling dimension-independent optimization of the quantum violation. Our approach enables the derivation of the state along with the relationship between the local observables directly from the optimization condition. This improves the previous methods of self-testing by deriving the state from the self-testing relation instead of assuming that the state is restricted to a two-qubit system. One significant aspect is the robustness of such self-testing in real experimental situations involving noise and imperfection, leading to deviation from the optimal quantum violation. We provide an analytical technique for robust self-testing in the presence of noise. As an application of our scheme, we demonstrate the generation of two-bit DI randomness and analyze the robustness of such randomness.

Submitted 26 May, 2025; originally announced May 2025.

arXiv:2505.19581

Self-testing in a constrained prepare-measure scenario sans assuming quantum dimension

Authors: Ritesh K. Singh, Souradeep Sasmal, S. Nautiyal, A. K. Pan

Abstract: We present a device-independent (DI) self-testing protocol in a constrained prepare-measure scenario, based on the $n$-bit parity-oblivious multiplexing (POM) task. In this scenario, a parity-oblivious constraint is imposed on the preparations, allowing us to define a classical bound derived from a preparation noncontextual ontological model. We derive the optimal quantum success probability in the POM task without assuming the dimension of the quantum system, an essential step towards DI self-testing, which has hitherto not been demonstrated in the prepare-measure scenario. We demonstrate that the optimal quantum value exceeds the preparation noncontextual bound and, as a result, this establishes DI self-testing of the preparations and the measurement devices. Furthermore, by explicitly constructing the required unitaries, we show that the optimal preparations and measurements in an unknown but finite-dimensional Hilbert space, responsible for the observed input-output correlations, can be mapped, via a unitary, onto a known finite-dimensional quantum system. Our results thus pave the way for scalable, single-system-based DI certification protocols in the prepare-measure scenario.

Submitted 26 May, 2025; originally announced May 2025.

arXiv:2505.07907

An entropy for Boolean independence

Authors: Kewei Pan

Abstract: In this article, we aim to define a Boolean entropy notion parallel to the framework of free entropy proposed by Voiculescu. Motivated by the work of Lenczewski and the work of Cébron & Gilliers, we mainly investigated two random matrix models (the Gaussian Symmetric Block model and the Conditioned GUE model), in which asymptotic Boolean independence appears. We showed a large deviation principle for both models. As a result, the two rate functions coincide up to scaling and are minimized by the Rademacher distribution. Therefore, we refer to the logarithmic integral in the rate function as Boolean entropy. Finally, we proved this logarithmic integral is maximized by the Rademacher distribution and monotone along the Boolean Central Limit Theorem.

Submitted 12 May, 2025; originally announced May 2025.

arXiv:2505.07538

Selftok: Discrete Visual Tokens of Autoregression, by Diffusion, and for Reasoning

Authors: Bohan Wang, Zhongqi Yue, Fengda Zhang, Shuo Chen, Li'an Bi, Junzhe Zhang, Xue Song, Kennard Yanting Chan, Jiachun Pan, Weijia Wu, Mingze Zhou, Wang Lin, Kaihang Pan, Saining Zhang, Liyu Jia, Wentao Hu, Wei Zhao, Hanwang Zhang

Abstract: We completely discard the conventional spatial prior in image representation and introduce a novel discrete visual tokenizer: Self-consistency Tokenizer (Selftok). At its design core, we compose an autoregressive (AR) prior -- mirroring the causal structure of language -- into visual tokens by using the reverse diffusion process of image generation. The AR property makes Selftok fundamentally distinct from traditional spatial tokens in the following two key ways: (1) Selftok offers an elegant and minimalist approach to unify diffusion and AR for vision-language models (VLMs): By representing images with Selftok tokens, we can train a VLM using a purely discrete autoregressive architecture -- like that in LLMs -- without requiring additional modules or training objectives. (2) We theoretically show that the AR prior satisfies the Bellman equation, whereas the spatial prior does not. Therefore, Selftok supports reinforcement learning (RL) for visual generation with effectiveness comparable to that achieved in LLMs. Besides the AR property, Selftok is also a SoTA tokenizer that achieves a favorable trade-off between high-quality reconstruction and compression rate. We use Selftok to build a pure AR VLM for both visual comprehension and generation tasks. Impressively, without using any text-image training pairs, a simple policy gradient RL working in the visual tokens can significantly boost the visual generation benchmark, surpassing all the existing models by a large margin. Therefore, we believe that Selftok effectively addresses the long-standing challenge that visual tokens cannot support effective RL. When combined with the well-established strengths of RL in LLMs, this brings us one step closer to realizing a truly multimodal LLM. Project Page: https://selftok-team.github.io/report/.

Submitted 27 May, 2025; v1 submitted 12 May, 2025; originally announced May 2025.

arXiv:2505.07062

Seed1.5-VL Technical Report

Authors: Dong Guo, Faming Wu, Feida Zhu, Fuxing Leng, Guang Shi, Haobin Chen, Haoqi Fan, Jian Wang, Jianyu Jiang, Jiawei Wang, Jingji Chen, Jingjia Huang, Kang Lei, Liping Yuan, Lishu Luo, Pengfei Liu, Qinghao Ye, Rui Qian, Shen Yan, Shixiong Zhao, Shuai Peng, Shuangye Li, Sihang Yuan, Sijin Wu, Tianheng Cheng, et al. (172 additional authors not shown)

Abstract: We present Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning. Seed1.5-VL is composed of a 532M-parameter vision encoder and a Mixture-of-Experts (MoE) LLM of 20B active parameters. Despite its relatively compact architecture, it delivers strong performance across a wide spectrum of public VLM benchmarks and internal evaluation suites, achieving state-of-the-art performance on 38 out of 60 public benchmarks. Moreover, in agent-centric tasks such as GUI control and gameplay, Seed1.5-VL outperforms leading multimodal systems, including OpenAI CUA and Claude 3.7. Beyond visual and video understanding, it also demonstrates strong reasoning abilities, making it particularly effective for multimodal reasoning challenges such as visual puzzles. We believe these capabilities will empower broader applications across diverse tasks. In this report, we mainly provide a comprehensive review of our experiences in building Seed1.5-VL across model design, data construction, and training at various stages, hoping that this report can inspire further research. Seed1.5-VL is now accessible at https://www.volcengine.com/ (Volcano Engine Model ID: doubao-1-5-thinking-vision-pro-250428)

Submitted 11 May, 2025; originally announced May 2025.

arXiv:2505.04620

On Path to Multimodal Generalist: General-Level and General-Bench

Authors: Hao Fei, Yuan Zhou, Juncheng Li, Xiangtai Li, Qingshan Xu, Bobo Li, Shengqiong Wu, Yaoting Wang, Junbao Zhou, Jiahao Meng, Qingyu Shi, Zhiyuan Zhou, Liangtao Shi, Minghe Gao, Daoan Zhang, Zhiqi Ge, Weiming Wu, Siliang Tang, Kaihang Pan, Yaobo Ye, Haobo Yuan, Tao Zhang, Tianjie Ju, Zixiang Meng, Shilin Xu, et al. (7 additional authors not shown)

Abstract: The Multimodal Large Language Model (MLLM) is currently experiencing rapid growth, driven by the advanced capabilities of LLMs. Unlike earlier specialists, existing MLLMs are evolving towards a Multimodal Generalist paradigm. Initially limited to understanding multiple modalities, these models have advanced to not only comprehend but also generate across modalities. Their capabilities have expanded from coarse-grained to fine-grained multimodal understanding and from supporting limited modalities to arbitrary ones. While many benchmarks exist to assess MLLMs, a critical question arises: Can we simply assume that higher performance across tasks indicates a stronger MLLM capability, bringing us closer to human-level AI? We argue that the answer is not as straightforward as it seems. This project introduces General-Level, an evaluation framework that defines 5-scale levels of MLLM performance and generality, offering a methodology to compare MLLMs and gauge the progress of existing systems towards more robust multimodal generalists and, ultimately, towards AGI. At the core of the framework is the concept of Synergy, which measures whether models maintain consistent capabilities across comprehension and generation, and across multiple modalities. To support this evaluation, we present General-Bench, which encompasses a broader spectrum of skills, modalities, formats, and capabilities, including over 700 tasks and 325,800 instances. The evaluation results that involve over 100 existing state-of-the-art MLLMs uncover the capability rankings of generalists, highlighting the challenges in reaching genuine AI. We expect this project to pave the way for future research on next-generation multimodal foundation models, providing a robust infrastructure to accelerate the realization of AGI. Project page: https://generalist.top/

Submitted 7 May, 2025; originally announced May 2025.

Comments: ICML'25, 305 pages, 115 tables, 177 figures, project page: https://generalist.top/

arXiv:2505.03872

The eROSITA Final Equatorial Depth Survey (eFEDS): SDSS spectroscopic observations of X-ray sources

Authors: Catarina Aydar, Andrea Merloni, Tom Dwelly, Johan Comparat, Mara Salvato, Johannes Buchner, Marcella Brusa, Teng Liu, Julien Wolf, Scott F. Anderson, Carolina P. Andonie, Franz Erik Bauer, Michael R. Blanton, William Nielsen Brandt, Yaherlyn Díaz, Lorena Hernandez-García, Dong-Woo Kim, Takamitsu Miyaji, Sean Morrison, Blessing Musiimenta, Castalia Alenka Negrete, Qingling Ni, Claudio Ricci, Donald P. Schneider, Axel Schwope, et al. (23 additional authors not shown)

Abstract: We present one of the largest uniform optical spectroscopic surveys of X-ray selected sources to date that were observed as a pilot study for the Black Hole Mapper (BHM) survey. The BHM program of the Sloan Digital Sky Survey (SDSS)-V is designed to provide optical spectra for hundreds of thousands of X-ray selected sources from the SRG/eROSITA all-sky survey. This significantly improves our ability to classify and characterise the physical properties of large statistical populations of X-ray emitting objects. Our sample consists of 13079 sources in the eROSITA eFEDS performance verification field, 12011 of which provide reliable redshifts from 0<z<5.8. The vast majority of these objects were detected as point-like sources (X-ray flux limit F(0.5-2 keV)>6.5x10^-15 erg/s/cm^2) and were observed for about 20 years with fibre-fed SDSS spectrographs. After including all available redshift information for the eFEDS sources from the dedicated SDSS-V plate programme and archival data, we visually inspected the SDSS optical spectra to verify the reliability of these redshift measurements and the performance of the SDSS pipeline. The visual inspection allowed us to recover reliable redshifts (for 99% of the spectra with a signal-to-noise ratio of >2) and to assign classes to the sources, and we confirm that the vast majority of our sample consists of active galactic nuclei (AGNs). Only ~3% of the eFEDS/SDSS sources are Galactic objects. We also show the diversity of the optical spectra of the X-ray selected AGNs and provide spectral stacks with a high signal-to-noise ratio in various sub-samples with different redshift and optical broad-band colours. Our AGN sample contains optical spectra of (broad-line) quasars, narrow-line galaxies, and optically passive galaxies. It is considerably diverse in its colours and in its levels of nuclear obscuration.

Submitted 8 May, 2025; v1 submitted 6 May, 2025; originally announced May 2025.

arXiv:2504.17462

Measuring short-range correlations and quasi-elastic cross sections in A(e,e') at x>1 and modest Q$^2$

Authors: Y. P. Zhang, Z. H. Ye, D. Nguyen, P. Aguilera, Z. Ahmed, H. Albataineh, K. Allada, B. Anderson, D. Anez, K. Aniol, J. Annand, J. Arrington, T. Averett, H. Baghdasaryan, X. Bai, A. Beck, S. Beck, V. Bellini, F. Benmokhtar, A. Camsonne, C. Chen, J. -P. Chen, K. Chirapatpimol, E. Cisbani, S. Covrig Dusa, et al. (74 additional authors not shown)

Abstract: We present results from the Jefferson Lab E08-014 experiment, investigating short-range correlations (SRC) through measurements of absolute inclusive quasi-elastic cross sections and their ratios. This study utilized 3.356 GeV electrons scattered off targets including $^2$H, $^3$He, $^4$He, $^{12}$C, $^{40}$Ca, and $^{48}$Ca, at modest momentum transfers ($1.3 < Q^2 \leq 2$ GeV$^2$). Kinematics were selected to enhance the cross-section contribution from high-momentum nucleons originating from the strongly interacting, short-distance components of two-nucleon SRCs (2N-SRCs), known to exhibit a universal structure across both light and heavy nuclei. We analyzed the A/$^2$H ratio within the region dominated by 2N-SRCs to characterize the nuclear dependence of SRC contributions across various nuclei. Additionally, the A/$^3$He ratio was examined at kinematics sensitive to nucleons with even higher momentum, aiming to identify signals indicative of three-nucleon SRCs (3N-SRCs). The traditional analysis method in the expected 3N-SRC region ($x > 2$) did not yield a clear plateau; instead, the data diverged from the predicted 3N-SRC behavior as momentum transfer increased. However, when analyzed in terms of the struck nucleon's light-cone momentum, the data exhibited the opposite trend, progressively approaching the predicted 3N-SRC plateau. These observations suggest that future measurements at higher energies may facilitate a definitive isolation and identification of 3N-SRCs.

Submitted 24 April, 2025; originally announced April 2025.

arXiv:2504.15932

Reasoning Physical Video Generation with Diffusion Timestep Tokens via Reinforcement Learning

Authors: Wang Lin, Liyu Jia, Wentao Hu, Kaihang Pan, Zhongqi Yue, Wei Zhao, Jingyuan Chen, Fei Wu, Hanwang Zhang

Abstract: Despite recent progress in video generation, producing videos that adhere to physical laws remains a significant challenge. Traditional diffusion-based methods struggle to extrapolate to unseen physical conditions (e.g., velocity) due to their reliance on data-driven approximations. To address this, we propose to integrate symbolic reasoning and reinforcement learning to enforce physical consistency in video generation. We first introduce the Diffusion Timestep Tokenizer (DDT), which learns discrete, recursive visual tokens by recovering visual attributes lost during the diffusion process. The recursive visual tokens enable symbolic reasoning by a large language model. Based on it, we propose the Phys-AR framework, which consists of two stages: The first stage uses supervised fine-tuning to transfer symbolic knowledge, while the second stage applies reinforcement learning to optimize the model's reasoning abilities through reward functions based on physical conditions. Our approach allows the model to dynamically adjust and improve the physical properties of generated videos, ensuring adherence to physical laws. Experimental results demonstrate that Phys-AR can generate videos that are physically consistent.

Submitted 22 April, 2025; originally announced April 2025.

arXiv:2504.14666

Generative Multimodal Pretraining with Discrete Diffusion Timestep Tokens

Authors: Kaihang Pan, Wang Lin, Zhongqi Yue, Tenglong Ao, Liyu Jia, Wei Zhao, Juncheng Li, Siliang Tang, Hanwang Zhang

Abstract: Recent endeavors in Multimodal Large Language Models (MLLMs) aim to unify visual comprehension and generation by combining LLM and diffusion models, the state-of-the-art in each task, respectively. Existing approaches rely on spatial visual tokens, where image patches are encoded and arranged according to a spatial order (e.g., raster scan). However, we show that spatial tokens lack the recursive structure inherent to languages, hence form an impossible language for LLM to master. In this paper, we build a proper visual language by leveraging diffusion timesteps to learn discrete, recursive visual tokens. Our proposed tokens recursively compensate for the progressive attribute loss in noisy images as timesteps increase, enabling the diffusion model to reconstruct the original image at any timestep. This approach allows us to effectively integrate the strengths of LLMs in autoregressive reasoning and diffusion models in precise image generation, achieving seamless multimodal comprehension and generation within a unified framework. Extensive experiments show that we achieve superior performance for multimodal comprehension and generation simultaneously compared with other MLLMs. Project Page: https://DDT-LLaMA.github.io/.

Submitted 20 April, 2025; originally announced April 2025.

Comments: Accepted by CVPR 2025 (Oral)

arXiv:2504.01260

The Social Life of Industrial Arms: How Arousal and Attention Shape Human-Robot Interaction

Authors: Roy El-Helou, Matthew K. X. J Pan

Abstract: This study explores how human perceptions of a non-anthropomorphic robotic manipulator are shaped by two key dimensions of behaviour: arousal, defined as the robot's movement energy and expressiveness, and attention, defined as the robot's capacity to selectively orient toward and engage with a user. We introduce a novel control architecture that integrates a gaze-like attention engine with an arousal-modulated motion system to generate socially meaningful behaviours. In a user study, we find that robots exhibiting high attention -- actively directing their focus toward users -- are perceived as warmer and more competent, intentional, and lifelike. In contrast, high arousal -- characterized by fast, expansive, and energetic motions -- increases perceptions of discomfort and disturbance. Importantly, a combination of focused attention and moderate arousal yields the highest ratings of trust and sociability, while excessive arousal diminishes social engagement. These findings offer design insights for endowing non-humanoid robots with expressive, intuitive behaviours that support more natural human-robot interaction.

Submitted 1 April, 2025; originally announced April 2025.

Comments: 7 pages, 3 figures, 1 table

arXiv:2503.13575

Analytic Subspace Routing: How Recursive Least Squares Works in Continual Learning of Large Language Model

Authors: Kai Tong, Kang Pan, Xiao Zhang, Erli Meng, Run He, Yawen Cui, Nuoyan Guo, Huiping Zhuang

Abstract: Large Language Models (LLMs) possess encompassing capabilities that can process diverse language-related tasks. However, finetuning LLMs diminishes these general skills, and continual finetuning further causes severe degradation of accumulated knowledge. Recently, Continual Learning (CL) in LLMs has emerged, which aims to continually adapt LLMs to new tasks while maintaining previously learned knowledge and inheriting general skills. Existing techniques either leverage previous data to replay, leading to extra computational costs, or utilize a single parameter-efficient module to learn the downstream task, constraining new knowledge absorption with interference between different tasks. To address these issues, this paper proposes Analytic Subspace Routing (ASR). For each task, we isolate the learning within a subspace of deep layers' features via low-rank adaptation, eliminating knowledge interference between different tasks. Additionally, we propose an analytic routing mechanism to properly utilize knowledge learned in different subspaces. Our approach employs Recursive Least Squares to train a multi-task router model, allowing the router to dynamically adapt to incoming data without requiring access to historical data. Also, the router effectively assigns the current task to an appropriate subspace and has a non-forgetting property of previously learned tasks with a solid theoretical guarantee. Experimental results demonstrate that our method achieves near-perfect retention of prior knowledge while seamlessly integrating new information, effectively overcoming the core limitations of existing methods. Our code will be released after acceptance.

Submitted 17 March, 2025; originally announced March 2025.

Comments: 11 pages, 4 figures

arXiv:2503.12912

Pose as a Modality: A Psychology-Inspired Network for Personality Recognition with a New Multimodal Dataset

Authors: Bin Tang, Keqi Pan, Miao Zheng, Ning Zhou, Jialu Sui, Dandan Zhu, Cheng-Long Deng, Shu-Guang Kuai

Abstract: In recent years, predicting Big Five personality traits from multimodal data has received significant attention in artificial intelligence (AI). However, existing computational models often fail to achieve satisfactory performance. Psychological research has shown a strong correlation between pose and personality traits, yet previous research has largely ignored pose data in computational models. To address this gap, we develop a novel multimodal dataset that incorporates full-body pose data. The dataset includes video recordings of 287 participants completing a virtual interview with 36 questions, along with self-reported Big Five personality scores as labels. To effectively utilize this multimodal data, we introduce the Psychology-Inspired Network (PINet), which consists of three key modules: Multimodal Feature Awareness (MFA), Multimodal Feature Interaction (MFI), and Psychology-Informed Modality Correlation Loss (PIMC Loss). The MFA module leverages the Vision Mamba Block to capture comprehensive visual features related to personality, while the MFI module efficiently fuses the multimodal features. The PIMC Loss, grounded in psychological theory, guides the model to emphasize different modalities for different personality dimensions. Experimental results show that the PINet outperforms several state-of-the-art baseline models. Furthermore, the three modules of PINet contribute almost equally to the model's overall performance. Incorporating pose data significantly enhances the model's performance, with the pose modality ranking mid-level in importance among the five modalities. These findings address the existing gap in personality-related datasets that lack full-body pose data and provide a new approach for improving the accuracy of personality prediction models, highlighting the importance of integrating psychological insights into AI frameworks.

Submitted 17 March, 2025; originally announced March 2025.

Comments: 9 pages, 6 figures, AAAI 2025 Oral

arXiv:2503.11408

A Neural Network Architecture Based on Attention Gate Mechanism for 3D Magnetotelluric Forward Modeling

Authors: Xin Zhong, Weiwei Ling, Kejia Pan, Pinxia Wu, Jiajing Zhang, Zhiliang Zhan, Wenbo Xiao

Abstract: Traditional three-dimensional magnetotelluric (MT) numerical forward modeling methods, such as the finite element method (FEM) and finite volume method (FVM), suffer from high computational costs and low efficiency due to limitations in mesh refinement and computational resources. We propose a novel neural network architecture named MTAGU-Net, which integrates an attention gating mechanism for 3D MT forward modeling. Specifically, a dual-path attention gating module is designed based on forward response data images and embedded in the skip connections between the encoder and decoder. This module enables the fusion of critical anomaly information from shallow feature maps during the decoding of deep feature maps, significantly enhancing the network's capability to extract features from anomalous regions. Furthermore, we introduce a synthetic model generation method utilizing 3D Gaussian random field (GRF), which accurately replicates the electrical structures of real-world geological scenarios with high fidelity. Numerical experiments demonstrate that MTAGU-Net outperforms conventional 3D U-Net in terms of convergence stability and prediction accuracy, with the structural similarity index (SSIM) of the forward response data consistently exceeding 0.98. Moreover, the network can accurately predict forward response data on previously unseen datasets, demonstrating its strong generalization ability and validating the feasibility and effectiveness of this method in practical applications.

Submitted 14 March, 2025; originally announced March 2025.

Comments: 12 pages, 16 figures

arXiv:2503.09534 [pdf, other]

Contextuality sans incompatibility in the simplest scenario: Communication supremacy of a qubit

Authors: Partha Patra, Sumit Mukherjee, A. K. Pan

Abstract: Conventional wisdom asserts that measurement incompatibility is necessary for revealing the non-locality and contextuality. In contrast, a recent work [Phys. Rev. Lett. 130, 230201 (2023)] demonstrates the generalized contextuality without measurement incompatibility by using a five-outcome qubit measurement. In this paper, we introduce a two-party prepare-measure communication game involving specific constraints on preparations, and we demonstrate contextuality sans incompatibility in the simplest measurement scenario, requiring only a three-outcome qubit measurement. This is in contrast to the aforementioned five-outcome qubit measurement, which can be simulated by an appropriate convex mixture of five three-outcome incompatible qubit measurements. Furthermore, we illustrate that our result has a prominent implication in information theory. Our communication game can be perceived as a constrained Holevo scenario, as operational restrictions are imposed on preparations. We show that the maximum success probability of the game by using a qubit surpasses that attainable by a c-bit, even when shared randomness is a free resource. Consequently, this finding exemplifies the supremacy of a qubit over a c-bit within a constrained Holevo framework. Thus, alongside offering fresh insights into quantum foundations, our results pave a novel pathway for exploring the efficacy of a qubit in information processing tasks.

Submitted 12 March, 2025; originally announced March 2025.

Comments: 12 pages, 2 figures, initial submission, comments are welcome

arXiv:2503.05726 [pdf]

The computation of average kernel with Gauss-Laguerre quadrature for double integrals

Authors: Kejun Pan, Mingliang Xie

Abstract: The use of the average kernel method based on the Laplace transformation can significantly simplify the procedure for obtaining an approximate analytical solution of the Smoluchowski equation. However, this method also has its own shortcomings, one of which is the higher computational complexity of the binary Laplace transformation for a nonlinear collision kernel. In this study, a universal algorithm based on the Gauss-Laguerre quadrature for treating the double integral is developed to obtain the pre-exponential factor of the average kernel easily and quickly. Furthermore, the corresponding truncation error estimate is also provided.

Submitted 17 February, 2025; originally announced March 2025.

Comments: arXiv admin note: substantial text overlap with arXiv:2502.13377, arXiv:2502.13378

arXiv:2502.13378 [pdf]

A universal preprocessing algorithm of average kernel method with Gauss-Laguerre quadrature for double integrals

Authors: Kejun Pan, Mingliang Xie

Abstract: To address the computational challenges posed by nonlinear collision kernels in the Smoluchowski equation, this study proposes a universal preprocessing algorithm for the average kernel method based on the Gauss-Laguerre quadrature for double integrals. With this algorithm, the numerical code accurately and efficiently determines the pre-exponential factor of the average kernel. Additionally, the exact pre-exponential factors of the four fundamental average kernels and their associated truncation error estimations were analyzed. The results demonstrate the reasonability and reliability of the preprocessing algorithm.

Submitted 18 February, 2025; originally announced February 2025.
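
The Gauss-Laguerre treatment of a double integral mentioned in the abstract above can be illustrated with a minimal sketch. It is not the authors' algorithm: it only shows the generic tensor-product idea for integrals of the form $\int_0^\infty\!\int_0^\infty e^{-(x+y)} f(x,y)\,dx\,dy$, using the standard two-point rule so that it stays self-contained. The function name `gauss_laguerre_double` and the integrand `f` are hypothetical.

```python
import math

# Two-point Gauss-Laguerre rule for the weight e^{-x} on [0, inf):
# nodes 2 -/+ sqrt(2) with weights (2 +/- sqrt(2)) / 4.
# The rule is exact for polynomials of degree <= 3 in one variable.
NODES = (2.0 - math.sqrt(2.0), 2.0 + math.sqrt(2.0))
WEIGHTS = ((2.0 + math.sqrt(2.0)) / 4.0, (2.0 - math.sqrt(2.0)) / 4.0)


def gauss_laguerre_double(f):
    """Approximate the integral of e^{-(x+y)} * f(x, y) over x >= 0, y >= 0.

    Hypothetical helper: applies the one-dimensional rule in each
    variable (a tensor-product rule), so the double integral becomes
    a double sum over nodes and weights.
    """
    total = 0.0
    for xi, wi in zip(NODES, WEIGHTS):
        for yj, wj in zip(NODES, WEIGHTS):
            total += wi * wj * f(xi, yj)
    return total


# Example integrand (an assumption for illustration): f(x, y) = x * y.
approx = gauss_laguerre_double(lambda x, y: x * y)
```

A higher-order rule would replace the hard-coded nodes and weights, for example with those returned by `numpy.polynomial.laguerre.laggauss`; the double sum itself is unchanged.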

arXiv:2502.12699 [pdf]

The evolution of nanoparticles due to Brownian coagulation in the temporal mixing layer with AK-iDNS over a long time

Authors: Kejun Pan, Mingliang Xie

Abstract: In this article, the evolution of nanoparticles in a two-dimensional temporal mixing layer over a long time is investigated. The flow field is calculated with direct numerical simulation (DNS), while the particle field is simulated using the average kernel method coupled with iterative direct numerical simulation (AK-iDNS). Under moderate Reynolds number, the flow field undergoes processes of vortex emergence, entrainment, rolling and pairing, merging, and dissipation. Due to the small Stokes number of nanoparticles, the particles closely follow the flow field. Meanwhile, the particles undergo coagulation under the influence of Brownian motion. This article discusses the evolution of nanoparticles under the combined effect of advection, diffusion and coagulation. Under the influence of vortices or large-scale coherent structures, the spatial distribution of particle moments is similar to the structure of the flow field. Diffusion and coagulation both have a significant impact on the amplitude of the distribution of particle moments. However, diffusion has little impact on the mean distribution, while coagulation has a much greater impact on it. As the flow field evolves, the temporal mixing layer degenerates into Couette flow. The particles exhibit asymptotic behavior similar to that of the 0-dimensional problem.

Submitted 18 February, 2025; originally announced February 2025.

arXiv:2502.09413 [pdf, ps, other]

Analysis of harmonic average method for interface problems with discontinuous solutions and fluxes

Authors: Kejia Pan, Hengrui Xu, Zhilin Li

Abstract: The harmonic average method has been widely utilized to deal with heterogeneous coefficients in solving differential equations. One remarkable advantage of the harmonic averaging method is that no derivative of the coefficient is needed. Furthermore, the coefficient matrix of the finite difference equations is an M-matrix, which guarantees the stability of the algorithm. It has been numerically observed but not theoretically proved that the method produces second order pointwise accuracy when the solution and flux are continuous even if the coefficient has finite discontinuities for which the method is inconsistent ($O(1)$ in the local truncation errors). It has been believed that there are some fortunate error cancellations. The harmonic average method does not converge when the solution or the flux has finite discontinuities. In this paper, we not only rigorously prove the second order convergence of the harmonic averaging method for the one-dimensional interface problem when the coefficient has finite discontinuities and the solution and the flux are continuous, but also propose an {\em improved harmonic average method} that is also second order accurate (in the $L^{\infty}$ norm), which allows discontinuous solutions and fluxes along with the discontinuous coefficients. The key in the convergence proof is the construction of the Green's function. The proof shows how the error cancellations occur in a subtle way. Numerical experiments in both 1D and 2D confirmed the theoretical proof of the improved harmonic average method.

Submitted 13 February, 2025; v1 submitted 13 February, 2025; originally announced February 2025.
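
As a rough illustration of why harmonic averaging needs no derivative of the coefficient, the sketch below computes interface coefficients from nodal values only. It assumes the common two-point form $\beta_{i+1/2} = 2\beta_i\beta_{i+1}/(\beta_i+\beta_{i+1})$ on a uniform grid, which may differ from the exact definition used in the paper. The names `harmonic_average` and `interface_coefficients` are hypothetical, and this is the classical method, not the improved one the abstract proposes.

```python
def harmonic_average(beta_left, beta_right):
    """Two-point harmonic average of a positive coefficient.

    Assumed form: 2 * a * b / (a + b). Only coefficient values are
    used, never a derivative of the coefficient.
    """
    return 2.0 * beta_left * beta_right / (beta_left + beta_right)


def interface_coefficients(beta):
    """Harmonic averages at the midpoints between neighbouring grid nodes.

    `beta` holds the (possibly discontinuous) coefficient sampled at
    the grid nodes; the result has one entry per cell interface.
    """
    return [harmonic_average(a, b) for a, b in zip(beta, beta[1:])]


# Example: the coefficient jumps from 1 to 3 between the last two nodes.
faces = interface_coefficients([1.0, 1.0, 3.0])
```

In a finite difference scheme for $-(\beta u')' = f$, these interface values would multiply the differences $u_{i+1}-u_i$; where the coefficient is constant the average reduces to the coefficient itself.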

arXiv:2502.08547 [pdf, other]

Representation Learning to Advance Multi-institutional Studies with Electronic Health Record Data

Authors: Doudou Zhou, Han Tong, Linshanshan Wang, Suqi Liu, Xin Xiong, Ziming Gan, Romain Griffier, Boris Hejblum, Yun-Chung Liu, Chuan Hong, Clara-Lea Bonzel, Tianrun Cai, Kevin Pan, Yuk-Lam Ho, Lauren Costa, Vidul A. Panickan, J. Michael Gaziano, Kenneth Mandl, Vianney Jouhet, Rodolphe Thiebaut, Zongqi Xia, Kelly Cho, Katherine Liao, Tianxi Cai

Abstract: The adoption of EHRs has expanded opportunities to leverage data-driven algorithms in clinical care and research. A major bottleneck in effectively conducting multi-institutional EHR studies is the data heterogeneity across systems with numerous codes that either do not exist or represent different clinical concepts across institutions. The need for data privacy further limits the feasibility of including multi-institutional patient-level data required to study similarities and differences across patient subgroups. To address these challenges, we developed the GAME algorithm. Tested and validated across 7 institutions and 2 languages, GAME integrates data in several levels: (1) at the institutional level with knowledge graphs to establish relationships between codes and existing knowledge sources, providing the medical context for standard codes and their relationship to each other; (2) between institutions, leveraging language models to determine the relationships between institution-specific codes with established standard codes; and (3) quantifying the strength of the relationships between codes using a graph attention network. Jointly trained embeddings are created using transfer and federated learning to preserve data privacy. In this study, we demonstrate the applicability of GAME in selecting relevant features as inputs for AI-driven algorithms in a range of conditions, e.g., heart failure, rheumatoid arthritis. We then highlight the application of GAME harmonized multi-institutional EHR data in a study of Alzheimer's disease outcomes and suicide risk among patients with mental health disorders, without sharing patient-level data outside individual institutions.

Submitted 12 February, 2025; originally announced February 2025.

arXiv:2501.07044 [pdf, other]

Protego: Detecting Adversarial Examples for Vision Transformers via Intrinsic Capabilities

Authors: Jialin Wu, Kaikai Pan, Yanjiao Chen, Jiangyi Deng, Shengyuan Pang, Wenyuan Xu

Abstract: Transformer models have excelled in natural language tasks, prompting the vision community to explore their implementation in computer vision problems. However, these models are still influenced by adversarial examples. In this paper, we investigate the attack capabilities of six common adversarial attacks on three pretrained ViT models to reveal the vulnerability of ViT models. To understand and analyse the bias in neural network decisions when the input is adversarial, we use two visualisation techniques, attention rollout and grad attention rollout. To protect ViT models from adversarial attacks, we propose Protego, a detection framework that leverages the transformer's intrinsic capabilities to detect adversarial examples of ViT models. Nonetheless, this is challenging due to a diversity of attack strategies that may be adopted by adversaries. Inspired by the attention mechanism, we know that the token of prediction contains all the information from the input sample. Additionally, the attention region for adversarial examples differs from that of normal examples. Given these points, we can train a detector that outperforms existing detection methods in identifying adversarial examples. Our experiments have demonstrated the high effectiveness of our detection method. For these six adversarial attack methods, our detector's AUC scores all exceed 0.95. Protego may advance investigations in metaverse security.

Submitted 12 January, 2025; originally announced January 2025.

Comments: Accepted by IEEE MetaCom 2024

arXiv:2501.01495 [pdf, other]

doi 10.3847/1538-4357/adb3a0

Search for continuous gravitational waves from known pulsars in the first part of the fourth LIGO-Virgo-KAGRA observing run

Authors: The LIGO Scientific Collaboration, the Virgo Collaboration, the KAGRA Collaboration, A. G. Abac, R. Abbott, I. Abouelfettouh, F. Acernese, K. Ackley, S. Adhicary, N. Adhikari, R. X. Adhikari, V. K. Adkins, D. Agarwal, M. Agathos, M. Aghaei Abchouyeh, O. D. Aguiar, I. Aguilar, L. Aiello, A. Ain, P. Ajith, T. Akutsu, S. Albanesi, R. A. Alfaidi, A. Al-Jodah, C. Alléné, et al. (1794 additional authors not shown)

Abstract: Continuous gravitational wave (CW) emission from neutron stars carries information about their internal structure and equation of state, and it can provide tests of General Relativity. We present a search for CWs from a set of 45 known pulsars in the first part of the fourth LIGO--Virgo--KAGRA observing run, known as O4a. We conducted a targeted search for each pulsar using three independent analysis methods considering the single-harmonic and the dual-harmonic emission models. We find no evidence of a CW signal in O4a data for either model and set upper limits on the signal amplitude and on the ellipticity, which quantifies the asymmetry in the neutron star mass distribution. For the single-harmonic emission model, 29 targets have the upper limit on the amplitude below the theoretical spin-down limit. The lowest upper limit on the amplitude is $6.4\!\times\!10^{-27}$ for the young energetic pulsar J0537-6910, while the lowest constraint on the ellipticity is $8.8\!\times\!10^{-9}$ for the bright nearby millisecond pulsar J0437-4715. Additionally, for a subset of 16 targets we performed a narrowband search that is more robust regarding the emission model, with no evidence of a signal. We also found no evidence of non-standard polarizations as predicted by the Brans-Dicke theory.

Submitted 2 January, 2025; originally announced January 2025.

Comments: main paper: 12 pages, 6 figures, 4 tables

Report number: LIGO-P2400315

Journal ref: Astrophys.J. 983 (2025) 2, 99

arXiv:2412.10342 [pdf, other]

Iris: Breaking GUI Complexity with Adaptive Focus and Self-Refining

Authors: Zhiqi Ge, Juncheng Li, Xinglei Pang, Minghe Gao, Kaihang Pan, Wang Lin, Hao Fei, Wenqiao Zhang, Siliang Tang, Yueting Zhuang

Abstract: Digital agents are increasingly employed to automate tasks in interactive digital environments such as web pages, software applications, and operating systems. While text-based agents built on Large Language Models (LLMs) often require frequent updates due to platform-specific APIs, visual agents leveraging Multimodal Large Language Models (MLLMs) offer enhanced adaptability by interacting directly with Graphical User Interfaces (GUIs). However, these agents face significant challenges in visual perception, particularly when handling high-resolution, visually complex digital environments. This paper introduces Iris, a foundational visual agent that addresses these challenges through two key innovations: Information-Sensitive Cropping (ISC) and Self-Refining Dual Learning (SRDL). ISC dynamically identifies and prioritizes visually dense regions using an edge detection algorithm, enabling efficient processing by allocating more computational resources to areas with higher information density. SRDL enhances the agent's ability to handle complex tasks by leveraging a dual-learning loop, where improvements in referring (describing UI elements) reinforce grounding (locating elements) and vice versa, all without requiring additional annotated data. Empirical evaluations demonstrate that Iris achieves state-of-the-art performance across multiple benchmarks with only 850K GUI annotations, outperforming methods using 10x more training data. These improvements further translate to significant gains in both web and OS agent downstream tasks.

Submitted 3 February, 2025; v1 submitted 13 December, 2024; originally announced December 2024.

arXiv:2412.04825 [pdf, other]

Universal Hamming Weight Preserving Variational Quantum Ansatz

Authors: Ge Yan, Kaisen Pan, Ruocheng Wang, Mengfei Ran, Hongxu Chen, Xunuo Wang, Junchi Yan

Abstract: Understanding the mathematical properties of variational quantum ansätze is crucial for determining quantum advantage in Variational Quantum Eigensolvers (VQEs). A deeper understanding of ansätze not only enriches theoretical discussions but also facilitates the design of more efficient and robust frameworks for near-term applications. In this work, we address the challenge of balancing expressivity and trainability by utilizing a Hamming Weight Preserving (HWP) ansatz that confines quantum state evolution to a symmetry-preserving subspace. We rigorously establish the necessary and sufficient conditions for subspace universality of HWP ansätze, along with a comprehensive analysis of the trainability. These theoretical advances are validated via the accurate approximation of arbitrary unitary matrices in the HWP subspace. Furthermore, the practical utility of the HWP ansatz is substantiated for solving ground-state properties of Fermionic systems, achieving energy errors below $1\times 10^{-10}$Ha. This work highlights the critical role of symmetry-preserving ansätze in VQE research, offering insights that extend beyond supremacy debates and paving the way for more reliable and efficient quantum algorithms in the near term.

Submitted 6 December, 2024; originally announced December 2024.

arXiv:2412.00161 [pdf, other]

STEP: Enhancing Video-LLMs' Compositional Reasoning by Spatio-Temporal Graph-guided Self-Training

Authors: Haiyi Qiu, Minghe Gao, Long Qian, Kaihang Pan, Qifan Yu, Juncheng Li, Wenjie Wang, Siliang Tang, Yueting Zhuang, Tat-Seng Chua

Abstract: Video Large Language Models (Video-LLMs) have recently shown strong performance in basic video understanding tasks, such as captioning and coarse-grained question answering, but struggle with compositional reasoning that requires multi-step spatio-temporal inference across object relations, interactions, and events. The hurdles to enhancing this capability include extensive manual labor, the lack of spatio-temporal compositionality in existing data and the absence of explicit reasoning supervision. In this paper, we propose STEP, a novel graph-guided self-training method that enables Video-LLMs to generate reasoning-rich fine-tuning data from any raw videos to improve themselves. Specifically, we first induce a Spatio-Temporal Scene Graph (STSG) representation of diverse videos to capture fine-grained, multi-granular video semantics. Then, the STSGs guide the derivation of multi-step reasoning Question-Answer (QA) data with Chain-of-Thought (CoT) rationales. Both answers and rationales are integrated as the training objective, aiming to enhance the model's reasoning abilities by supervision over explicit reasoning steps. Experimental results demonstrate the effectiveness of STEP across models of varying scales, with a significant 21.3\% improvement in tasks requiring three or more reasoning steps. Furthermore, it achieves superior performance with a minimal amount of self-generated rationale-enriched training samples in both compositional reasoning and comprehensive understanding benchmarks, highlighting the broad applicability and vast potential.

Submitted 30 March, 2025; v1 submitted 29 November, 2024; originally announced December 2024.

arXiv:2411.15738 [pdf, other]

AnyEdit: Mastering Unified High-Quality Image Editing for Any Idea

Authors: Qifan Yu, Wei Chow, Zhongqi Yue, Kaihang Pan, Yang Wu, Xiaoyang Wan, Juncheng Li, Siliang Tang, Hanwang Zhang, Yueting Zhuang

Abstract: Instruction-based image editing aims to modify specific image elements with natural language instructions. However, current models in this domain often struggle to accurately execute complex user instructions, as they are trained on low-quality data with limited editing types. We present AnyEdit, a comprehensive multi-modal instruction editing dataset, comprising 2.5 million high-quality editing pairs spanning over 20 editing types and five domains. We ensure the diversity and quality of the AnyEdit collection through three aspects: initial data diversity, adaptive editing process, and automated selection of editing results. Using the dataset, we further train a novel AnyEdit Stable Diffusion with task-aware routing and learnable task embedding for unified image editing. Comprehensive experiments on three benchmark datasets show that AnyEdit consistently boosts the performance of diffusion-based editing models. This presents prospects for developing instruction-driven image editing models that support human creativity.

Submitted 29 March, 2025; v1 submitted 24 November, 2024; originally announced November 2024.

Comments: Accepted by CVPR 2025

arXiv:2411.01879 [pdf, ps, other]

On the Equivalence of Synchronous Coordination Game and Asynchronous Coordination Design

Authors: Xinnian Kazusa Pan

Abstract: This paper establishes the equivalence between synchronous and asynchronous coordination mechanisms in dynamic games with strategic complementarities and common interests. Synchronous coordination, characterized by simultaneous commitments, and asynchronous coordination, defined by sequential action timing, are both prevalent in economic contexts such as crowdfunding and fund management. We introduce Monotone Subgame Perfect Nash Equilibrium, MSPNE, to analyze least favorable equilibrium outcomes. We provide a recursive characterization for synchronous coordination and a graph-theoretic representation for asynchronous coordination, demonstrating their equivalence in terms of the greatest implementable outcome. Our results show that the structure of commitment, whether simultaneous or sequential, does not affect the achievable welfare outcome under certain conditions. Additionally, we discuss computational aspects, highlighting the general NP-Hardness of the problem but identifying a significant class of games that are computationally tractable. These findings offer valuable insights for the optimal design of coordination mechanisms.

Submitted 4 November, 2024; originally announced November 2024.

arXiv:2411.01178 [pdf, other]

LLM4PR: Improving Post-Ranking in Search Engine with Large Language Models

Authors: Yang Yan, Yihao Wang, Chi Zhang, Wenyuan Hou, Kang Pan, Xingkai Ren, Zelun Wu, Zhixin Zhai, Enyun Yu, Wenwu Ou, Yang Song

Abstract: Alongside the rapid development of Large Language Models (LLMs), there has been a notable increase in efforts to integrate LLM techniques in information retrieval (IR) and search engines (SE). Recently, an additional post-ranking stage has been suggested in SE to enhance user satisfaction in practical applications. Nevertheless, research dedicated to enhancing the post-ranking stage through LLMs remains largely unexplored. In this study, we introduce a novel paradigm named Large Language Models for Post-Ranking in search engine (LLM4PR), which leverages the capabilities of LLMs to accomplish the post-ranking task in SE. Concretely, a Query-Instructed Adapter (QIA) module is designed to derive the user/item representation vectors by incorporating their heterogeneous features. A feature adaptation step is further introduced to align the semantics of user/item representations with the LLM. Finally, the LLM4PR integrates a learning to post-rank step, leveraging both a main task and an auxiliary task to fine-tune the model to adapt to the post-ranking task. Experimental studies demonstrate that the proposed framework leads to significant improvements and exhibits state-of-the-art performance compared with other alternatives.

Submitted 2 November, 2024; originally announced November 2024.

arXiv:2411.00304 [pdf, other]

Unified Generative and Discriminative Training for Multi-modal Large Language Models

Authors: Wei Chow, Juncheng Li, Qifan Yu, Kaihang Pan, Hao Fei, Zhiqi Ge, Shuai Yang, Siliang Tang, Hanwang Zhang, Qianru Sun

Abstract: In recent times, Vision-Language Models (VLMs) have been trained under two predominant paradigms. Generative training has enabled Multimodal Large Language Models (MLLMs) to tackle various complex tasks, yet issues such as hallucinations and weak object discrimination persist. Discriminative training, exemplified by models like CLIP, excels in zero-shot image-text classification and retrieval, yet struggles with complex scenarios requiring fine-grained semantic differentiation. This paper addresses these challenges by proposing a unified approach that integrates the strengths of both paradigms. Considering interleaved image-text sequences as the general format of input samples, we introduce a structure-induced training strategy that imposes semantic relationships between input samples and the MLLM's hidden state. This approach enhances the MLLM's ability to capture global semantics and distinguish fine-grained semantics. By leveraging dynamic sequence alignment within the Dynamic Time Warping framework and integrating a novel kernel for fine-grained semantic differentiation, our method effectively balances generative and discriminative tasks. Extensive experiments demonstrate the effectiveness of our approach, achieving state-of-the-art results in multiple generative tasks, especially those requiring cognitive and discrimination abilities. Additionally, our method surpasses discriminative benchmarks in interleaved and fine-grained retrieval tasks. By employing a retrieval-augmented generation strategy, our approach further enhances performance in some generative tasks within one model, offering a promising direction for future research in vision-language modeling.

Submitted 31 October, 2024; originally announced November 2024.

arXiv:2410.17021 [pdf, other]

SG-FSM: A Self-Guiding Zero-Shot Prompting Paradigm for Multi-Hop Question Answering Based on Finite State Machine

Authors: Xiaochen Wang, Junqing He, Liang Chen, Reza Haf Zhe Yang, Yiru Wang, Xiangdi Meng, Kunhao Pan, Zhifang Sui

Abstract: Large Language Models with chain-of-thought prompting, such as OpenAI-o1, have shown impressive capabilities in natural language inference tasks. However, Multi-hop Question Answering (MHQA) remains challenging for many existing models due to issues like hallucination, error propagation, and limited context length. To address these challenges and enhance LLMs' performance on MHQA, we propose the Self-Guiding prompting Finite State Machine (SG-FSM), designed to strengthen multi-hop reasoning abilities. Unlike traditional chain-of-thought methods, SG-FSM tackles MHQA by iteratively breaking down complex questions into sub-questions, correcting itself to improve accuracy. It processes one sub-question at a time, dynamically deciding the next step based on the current context and results, functioning much like an automaton. Experiments across various benchmarks demonstrate the effectiveness of our approach, outperforming strong baselines on challenging datasets such as Musique. SG-FSM reduces hallucination, enabling recovery of the correct final answer despite intermediate errors. It also improves adherence to specified output formats, simplifying evaluation significantly.

Submitted 22 October, 2024; originally announced October 2024.

arXiv:2410.16565 [pdf, other]

doi 10.3847/1538-4357/adc681

Search for gravitational waves emitted from SN 2023ixf

Authors: The LIGO Scientific Collaboration, the Virgo Collaboration, the KAGRA Collaboration, A. G. Abac, R. Abbott, I. Abouelfettouh, F. Acernese, K. Ackley, S. Adhicary, N. Adhikari, R. X. Adhikari, V. K. Adkins, D. Agarwal, M. Agathos, M. Aghaei Abchouyeh, O. D. Aguiar, I. Aguilar, L. Aiello, A. Ain, T. Akutsu, S. Albanesi, R. A. Alfaidi, A. Al-Jodah, C. Alléné, A. Allocca, et al. (1758 additional authors not shown)

Abstract: We present the results of a search for gravitational-wave transients associated with core-collapse supernova SN 2023ixf, which was observed in the galaxy Messier 101 via optical emission on 2023 May 19th, during the LIGO-Virgo-KAGRA 15th Engineering Run. We define a five-day on-source window during which an accompanying gravitational-wave signal may have occurred. No gravitational waves have been identified in data when at least two gravitational-wave observatories were operating, which covered $\sim 14\%$ of this five-day window. We report the search detection efficiency for various possible gravitational-wave emission models. Considering the distance to M101 (6.7 Mpc), we derive constraints on the gravitational-wave emission mechanism of core-collapse supernovae across a broad frequency spectrum, ranging from 50 Hz to 2 kHz where we assume the gravitational-wave emission occurred when coincident data are available in the on-source window. Considering an ellipsoid model for a rotating proto-neutron star, our search is sensitive to gravitational-wave energy $1 \times 10^{-4} M_{\odot} c^2$ and luminosity $2.6 \times 10^{-4} M_{\odot} c^2/s$ for a source emitting at 82 Hz. These constraints are around an order of magnitude more stringent than those obtained so far with gravitational-wave data. The constraint on the ellipticity of the proto-neutron star that is formed is as low as 1.08, at frequencies above 1200 Hz, surpassing past results.

Submitted 11 March, 2025; v1 submitted 21 October, 2024; originally announced October 2024.

Comments: Main paper: 6 pages, 4 figures and 1 table. Total with appendices: 20 pages, 4 figures, and 1 table

Report number: LIGO-P2400125

Journal ref: ApJ 985 183 (2025)

arXiv:2410.09151 [pdf, other]

doi 10.3847/1538-4357/ad8de0

A search using GEO600 for gravitational waves coincident with fast radio bursts from SGR 1935+2154

Authors: The LIGO Scientific Collaboration, the Virgo Collaboration, the KAGRA Collaboration, A. G. Abac, R. Abbott, I. Abouelfettouh, F. Acernese, K. Ackley, S. Adhicary, N. Adhikari, R. X. Adhikari, V. K. Adkins, D. Agarwal, M. Agathos, M. Aghaei Abchouyeh, O. D. Aguiar, I. Aguilar, L. Aiello, A. Ain, P. Ajith, T. Akutsu, S. Albanesi, R. A. Alfaidi, A. Al-Jodah, C. Alléné, et al. (1758 additional authors not shown)

Abstract: The magnetar SGR 1935+2154 is the only known Galactic source of fast radio bursts (FRBs). FRBs from SGR 1935+2154 were first detected by CHIME/FRB and STARE2 in 2020 April, after the conclusion of the LIGO, Virgo, and KAGRA Collaborations' O3 observing run. Here we analyze four periods of gravitational wave (GW) data from the GEO600 detector coincident with four periods of FRB activity detected by CHIME/FRB, as well as X-ray glitches and X-ray bursts detected by NICER and NuSTAR close to the time of one of the FRBs. We do not detect any significant GW emission from any of the events. Instead, using a short-duration GW search (for bursts $\leq$ 1 s) we derive 50\% (90\%) upper limits of $10^{48}$ ($10^{49}$) erg for GWs at 300 Hz and $10^{49}$ ($10^{50}$) erg at 2 kHz, and constrain the GW-to-radio energy ratio to $\leq 10^{14} - 10^{16}$. We also derive upper limits from a long-duration search for bursts with durations between 1 and 10 s. These represent the strictest upper limits on concurrent GW emission from FRBs.

Submitted 21 May, 2025; v1 submitted 11 October, 2024; originally announced October 2024.

Comments: 15 pages of text including references, 4 figures, 5 tables

Report number: LIGO-P2400192

Journal ref: ApJ 977 255 (2024)

arXiv:2410.07634 [pdf, other]

Bipartite and Euclidean Gallai-Ramsey Theory

Authors: Isabel McGuigan, Katherine Pan

Abstract: In this paper, we investigate the following Gallai-Ramsey question: how large must a complete bipartite graph $K_{n_1, n_2}$ be before any coloring of its edges with $r$ colors contains either a monochromatic copy of $G = K_{s,t}$ or a rainbow copy of $H = K_{s,t}$? We demonstrate that the answer is linear in $r$, and provide more precise bounds for the specific case $s = 2$. Furthermore, we also consider the following Euclidean Gallai-Ramsey question: given a configuration $H$ in Euclidean space, what is the smallest $n$ such that any $r$-coloring of $n$-dimensional Euclidean space contains a monochromatic or rainbow configuration congruent to $H$? Through a natural translation between edge colorings of the complete bipartite graph $K_{n_1,n_2}$ and colorings of a subset of $(n_1+n_2)$-dimensional Euclidean space, we prove new upper bounds on $n$ for some configurations which can be expressed as Cartesian products of simplices.

Submitted 10 October, 2024; originally announced October 2024.

Comments: 18 pages, 3 figures

arXiv:2409.19872 [pdf, other]

Towards Unified Multimodal Editing with Enhanced Knowledge Collaboration

Authors: Kaihang Pan, Zhaoyu Fan, Juncheng Li, Qifan Yu, Hao Fei, Siliang Tang, Richang Hong, Hanwang Zhang, Qianru Sun

Abstract: The swift advancement in Multimodal LLMs (MLLMs) also presents significant challenges for effective knowledge editing. Current methods, including intrinsic knowledge editing and external knowledge resorting, each possess strengths and weaknesses, struggling to balance the desired properties of reliability, generality, and locality when applied to MLLMs. In this paper, we propose UniKE, a novel multimodal editing method that establishes a unified perspective and paradigm for intrinsic knowledge editing and external knowledge resorting. Both types of knowledge are conceptualized as vectorized key-value memories, with the corresponding editing processes resembling the assimilation and accommodation phases of human cognition, conducted at the same semantic levels. Within such a unified framework, we further promote knowledge collaboration by disentangling the knowledge representations into the semantic and truthfulness spaces. Extensive experiments validate the effectiveness of our method, which ensures that the post-edit MLLM simultaneously maintains excellent reliability, generality, and locality. The code for UniKE is available at https://github.com/beepkh/UniKE.

Submitted 30 October, 2024; v1 submitted 29 September, 2024; originally announced September 2024.

Comments: Accepted by NeurIPS 2024 (Spotlight)

arXiv:2409.19419 [pdf, other]

Device-independent full network nonlocality for arbitrary-party and unbounded-input scenario

Authors: Sneha Munshi, A. K. Pan

Abstract: The nonlocality arising in a multi-party network involving multiple independent sources radically differs from the standard multipartite Bell nonlocality involving a single source. The notion of the full network nonlocality (FNN) (Phys. Rev. Lett. 128, 010403 (2022)) characterizes the quantum correlations that cannot be reproduced by a local-nonlocal model featuring one local source and the rest of nonlocal no-signaling sources. However, the demonstration of FNN was limited to bilocal and trilocal star-shaped network scenarios involving three or two dichotomic measurements for edge parties. In this paper, we first demonstrate that a large class of prevailing network inequalities does not exhibit FNN. We then introduce an elegant set of arbitrary-party and unbounded-input network inequalities in star-shaped and linear-chain networks whose optimal quantum violation exhibits FNN, certifying that the nonlocality is genuinely distributed to the entire network. In contrast to existing demonstrations of FNN that inevitably require fixed-input and four-output elegant joint measurements for the central party, our generalized inequalities are more experimentally friendly, requiring only two-output measurements. Moreover, our derivation of optimal quantum violation is fully analytic and does not assume the dimension of the quantum system, thereby showcasing its potential for device-independent self-testing.

Submitted 28 September, 2024; originally announced September 2024.

arXiv:2409.17873 [pdf, other]

doi 10.14722/ndss.2025.23691

ReThink: Reveal the Threat of Electromagnetic Interference on Power Inverters

Authors: Fengchen Yang, Zihao Dan, Kaikai Pan, Chen Yan, Xiaoyu Ji, Wenyuan Xu

Abstract: With the boom of renewable energy sources (RES), the number of power inverters proliferates. Power inverters are the key electronic devices that transform the direct current (DC) power from RES to the alternating current (AC) power on the grids, and their security can affect the stable operation of RES and even power grids. This paper analyzes the security of photovoltaic (PV) inverters from the aspects of internal sensors since they serve as the foundation for safe power conversion. We discover that both the embedded current sensors and voltage sensors are vulnerable to electromagnetic interference (EMI) of 1 GHz or higher, despite electromagnetic compatibility (EMC) countermeasures. Such vulnerabilities can lead to incorrect measurements and deceive the control algorithms, and we design ReThink that could produce three types of consequences on PV inverters by emitting carefully crafted EMI, i.e., Denial of Service (DoS), damaging inverters physically or damping the power output. We successfully validate these consequences on 5 off-the-shelf PV inverters, and even in a real-world microgrid, by transmitting EMI signals at a distance of 100-150 cm and a total power within 20 W. Our work aims to raise awareness of the security of power electronic devices of RES, as they represent an emerging Cyber-Physical attack surface to the future RES-dominated grid. Finally, to cope with such threats, we provide hardware and software-based countermeasures.

Submitted 26 September, 2024; originally announced September 2024.

Comments: Accepted by NDSS Symposium 2025. Please cite this paper as "Fengchen Yang, Zihao Dan, Kaikai Pan, Chen Yan, Xiaoyu Ji, Wenyuan Xu. ReThink: Reveal the Threat of Electromagnetic Interference on Power Inverters. In the Network and Distributed System Security Symposium 2025 (NDSS 2025)."

arXiv:2409.12229 [pdf, other]

The SDSS-V Black Hole Mapper Reverberation Mapping Project: A Kinematically Variable Broad-Line Region and Consequences for Masses of Luminous Quasars

Authors: Logan B. Fries, Jonathan R. Trump, Keith Horne, Megan C. Davis, Catherine J. Grier, Yue Shen, Scott F. Anderson, Tom Dwelly, Y. Homayouni, Sean Morrison, Jessie C. Runnoe, Benny Trakhtenbrot, Roberto J. Assef, Dmitry Bizyaev, W. N. Brandt, Peter Breiding, Joel Browstein, Priyanka Chakraborty, P. B. Hall, Anton M. Koekemoer, Héctor J. Ibarra-Medel, Mary Loli Martínez-Aldama, C. Alenka Negrete, Kaike Pan, Claudio Ricci, et al. (5 additional authors not shown)

Abstract: We present a velocity-resolved reverberation mapping analysis of the hypervariable quasar RM160 (SDSS J141041.25+531849.0) at z = 0.359 with 153 spectroscopic epochs of data representing a ten-year baseline (2013-2023). We split the baseline into two regimes based on the 3x flux increase in the light curve: a 'low state' phase during the years 2013-2019 and a 'high state' phase during the years 2022-2023. The velocity-resolved lag profiles (VRLP) indicate that gas with different kinematics dominates the line emission in different states. The Hβ VRLP begins with a signature of inflow onto the BLR in the 'low state', while in the 'high state' it is flatter with less signature of inflow. The Hα VRLP begins consistent with a virialized BLR in the 'low state', while in the 'high state' it shows a signature of inflow. The differences in the kinematics between the Balmer lines and between the 'low state' and the 'high state' suggest complex BLR dynamics. We find that the BLR radius and velocity (both FWHM and σ) do not obey a constant virial product throughout the monitoring period. We find that BLR lags and continuum luminosity are correlated, consistent with rapid response of the BLR gas to the illuminating continuum. The BLR kinematic profile changes in unpredictable ways that are not related to continuum changes and reverberation lag. Our observations indicate that non-virial kinematics can significantly contribute to observed line profiles, suggesting caution for black-hole mass estimation in luminous and highly varying quasars like RM160.

Submitted 18 September, 2024; originally announced September 2024.

Comments: 23 pages, 17 figures

arXiv:2409.09664 [pdf, ps, other]

Ring operads and symmetric bimonoidal categories

Authors: Kailin Pan

Abstract: We generalize the classical operad pair theory to a new model for $E_\infty$ ring spaces, which we call ring operad theory, and establish a connection with the classical operad pair theory, allowing the classical multiplicative infinite loop machine to be applied to algebras over any $E_\infty$ ring operad. As an application, we show that classifying spaces of symmetric bimonoidal categories are directly homeomorphic to certain $E_\infty$ ring spaces in the ring operad sense. Consequently, this provides an alternative construction from symmetric bimonoidal categories to classical $E_\infty$ ring spaces. We also present a comparison between this construction and the classical approach.

Submitted 15 September, 2024; originally announced September 2024.

Comments: 42 pages

arXiv:2409.01064 [pdf, ps, other]

Fourth-order compact finite difference schemes for solving biharmonic equations with Dirichlet boundary conditions

Authors: Kejia Pan, Jin Li, Zhilin Li, Kang Fu

Abstract: In this study, we propose a genuine fourth-order compact finite difference scheme for solving biharmonic equations with Dirichlet boundary conditions in both two and three dimensions. In the 2D case, we build upon the high-order compact (HOC) schemes for flux-type boundary conditions originally developed by Zhilin Li and Kejia Pan [SIAM J. Sci. Comput., 45 (2023), pp. A646-A674] to construct a high-order compact discretization for coupled boundary conditions. For the 3D case, we modify the carefully designed undetermined coefficient methods of Li and Pan to derive the finite difference approximations of coupled boundary conditions. The resultant FD discretization maintains the global fourth-order convergence and compactness. Unlike the very popular Stephenson method, the number of unknowns does not increase with dimensions. Besides, it is noteworthy that the condition number of the coefficient matrix increases at a rate of $O(h^{-2})$ in both 2D and 3D. We also validate the performance of the proposed genuine HOC methods through nontrivial examples.

Submitted 2 September, 2024; originally announced September 2024.

Comments: 14 pages, 2 figures

arXiv:2408.10732 [pdf, other]

Robust self-testing of the $m-$partite maximally entangled state and observables

Authors: Ritesh K. Singh, Souradeep Sasmal, A. K. Pan

Abstract: As quantum technologies continue to advance rapidly, the device-independent testing of the functioning of a quantum device has become increasingly important. Self-testing, a correlation-based protocol, enables such certification of a promised quantum state as well as measurements performed on it without requiring knowledge of the device's internal workings. This approach typically relies on achieving the optimal quantum violation of a suitable Bell inequality. Self-testing has been extensively investigated in the context of bipartite Bell experiments. However, its extension to multipartite scenarios remains largely unexplored, owing to the intricate nature of multipartite quantum correlations. In this work, we propose a simple and efficient self-testing protocol that certifies the state and observables based on the optimal quantum violation of the Svetlichny inequality involving an arbitrary number of parties, each with two inputs. Our method leverages an elegant sum-of-squares approach to derive the optimal quantum value of the Svetlichny functional without assuming the dimension of the quantum system. This enables the self-testing of the $m-$partite maximally entangled state and local anti-commuting observables for each party. Moreover, we develop a swap circuit isometry to assess the proximity of reference states and measurements to their ideal counterparts in the presence of noise and imperfections in real experiments, thereby demonstrating the robustness of our self-testing protocol. Finally, we illustrate how our self-testing protocol facilitates the generation of certified genuine randomness from correlations that enable the optimal violation of the Svetlichny inequality.

Submitted 20 August, 2024; originally announced August 2024.

Comments: 29 pages, 3 figures

arXiv:2408.10363 [pdf, other]

doi 10.1103/PhysRevA.110.012444

Self-testing of multiple unsharpness parameters through sequential violations of non-contextual inequality

Authors: Rajdeep Paul, Souradeep Sasmal, A. K. Pan

Abstract: The self-testing protocols refer to novel device-independent certification schemes wherein the devices are uncharacterised, and the dimension of the system remains unspecified. The optimal quantum violation of a Bell inequality facilitates such self-testing. In this work, we put forth a protocol for self-testing of noisy quantum instruments, specifically, the unsharpness parameter of smeared projective measurements in any arbitrary dimension. Our protocol hinges on the sequential quantum violations of a bipartite Bell-type preparation non-contextual inequality, involving three measurement settings per party. First, we demonstrate that at most three sequential independent Bobs manifest simultaneous preparation contextuality with a single Alice through the violation of this inequality. Subsequently, we show that the sub-optimal sequential quantum violations of the non-contextual inequality form an optimal set, eventually enabling the self-testing of shared state, local measurements and unsharpness parameters of one party. Notably, we derive the optimal set of quantum violations without specifying the dimension of the quantum system, thereby circumventing the constraint that may arise due to Naimark's theorem. Furthermore, we extend our investigation to quantify the degree of incompatible measurements pertaining to the sequential observers, exploring how variations in the degree of incompatibility impact the values of unsharpness parameters necessary for sequential quantum violation.

Submitted 19 August, 2024; originally announced August 2024.

Comments: 21 pages, 3 figures

Journal ref: Phys. Rev. A 110, 012444 (2024)

arXiv:2408.10350 [pdf, other]

doi 10.1103/PhysRevA.110.052432

Towards Necessary and sufficient state condition for violation of a multi-settings Bell inequality

Authors: Swapnil Bhowmick, Som Kanjilal, A. K. Pan, Souradeep Sasmal

Abstract: High-dimensional quantum entanglement and the advancements in its experimental realization provide a playground for fundamental research and eventually lead to quantum technological developments. The Horodecki criterion determines whether a state violates the Clauser-Horne-Shimony-Holt (CHSH) inequality for a two-qubit entangled state, solely from the state parameters. However, it remains a challenging task to formulate similar necessary and sufficient criteria for a high-dimensional entangled state for the violation of a suitable Bell inequality. Here, we develop a Horodecki-like criterion based on the state parameters of arbitrary two-qudit states to violate a two-outcome Bell inequality involving $2^{n-1}$ and $n$ measurement settings for Alice and Bob, respectively. This inequality reduces to the well-known CHSH and Gisin's elegant Bell inequalities for $n=2$ and $n=3$, respectively. While the proposed criterion is sufficient to violate the Bell inequality, it becomes necessary as well for the following cases: (i) $m$ copies of Bell diagonal states for arbitrary $n$, (ii) non-decomposable states whose correlation matrix is diagonalized by local unitaries, and (iii) any arbitrary two-qubit state when $n=3$, where the maximal value of the Bell functional is achieved with Bob's measurements being pairwise anticommuting. For any state, we derive the constraints on Alice's measurements in achieving the maximum quantum violation for this inequality.

Submitted 19 August, 2024; originally announced August 2024.

Comments: 18 pages, 3 figures

arXiv:2407.12867 [pdf, other]

Swift-BAT GUANO follow-up of gravitational-wave triggers in the third LIGO-Virgo-KAGRA observing run

Authors: Gayathri Raman, Samuele Ronchini, James Delaunay, Aaron Tohuvavohu, Jamie A. Kennea, Tyler Parsotan, Elena Ambrosi, Maria Grazia Bernardini, Sergio Campana, Giancarlo Cusumano, Antonino D'Ai, Paolo D'Avanzo, Valerio D'Elia, Massimiliano De Pasquale, Simone Dichiara, Phil Evans, Dieter Hartmann, Paul Kuin, Andrea Melandri, Paul O'Brien, Julian P. Osborne, Kim Page, David M. Palmer, Boris Sbarufatti, Gianpiero Tagliaferri, et al. (1797 additional authors not shown)

Abstract: We present results from a search for X-ray/gamma-ray counterparts of gravitational-wave (GW) candidates from the third observing run (O3) of the LIGO-Virgo-KAGRA (LVK) network using the Swift Burst Alert Telescope (Swift-BAT). The search includes 636 GW candidates received in low latency, 86 of which have been confirmed by the offline analysis and included in the third cumulative Gravitational-Wave Transient Catalog (GWTC-3). Targeted searches were carried out on the entire GW sample using the maximum-likelihood NITRATES pipeline on the BAT data made available via the GUANO infrastructure. We do not detect any significant electromagnetic emission that is temporally and spatially coincident with any of the GW candidates. We report flux upper limits in the 15-350 keV band as a function of sky position for all the catalog candidates. For GW candidates where the Swift-BAT false alarm rate is less than 10$^{-3}$ Hz, we compute the GW-BAT joint false alarm rate. Finally, the derived Swift-BAT upper limits are used to infer constraints on the putative electromagnetic emission associated with binary black hole mergers.

Submitted 27 March, 2025; v1 submitted 13 July, 2024; originally announced July 2024.

Comments: Update to version accepted for publication in ApJ. 50 pages, 10 figures, 4 tables

Journal ref: ApJ, Volume 980, 2025, 207

arXiv:2407.07470 [pdf]

Observation of Klein bottle quadrupole topological insulators in electric circuits

Authors: Xizhou Shen, Keyu Pan, Xiumei Wang, Xingping Zhou

Abstract: The Klein bottle Benalcazar-Bernevig-Hughes (BBH) insulator phase plays a pivotal role in understanding higher-order topological phases. The insulator phase is characterized by a unique feature: a nonsymmorphic glide symmetry that exists within momentum space, rather than real space. This characteristic transforms the Brillouin zone's fundamental domain into a Klein bottle. Here, we report an observation of a Klein bottle topoelectrical model under gauge fields. To provide a comprehensive understanding of the different corner distributions of odd and even unit cells, we present theoretical calculations and demonstrate that the symmetry properties significantly affect the topological nature. These theoretical predictions are confirmed by experimental results, which demonstrate the practical feasibility of such topological configurations in electronic circuits. Our work establishes a vital connection between the realms of condensed matter physics and circuit systems, thereby paving a pathway for investigating exotic condensed matter physics.

Submitted 19 July, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

arXiv:2407.02964 [pdf, other]

FSM: A Finite State Machine Based Zero-Shot Prompting Paradigm for Multi-Hop Question Answering

Authors: Xiaochen Wang, Junqing He, Zhe Yang, Yiru Wang, Xiangdi Meng, Kunhao Pan, Zhifang Sui

Abstract: Large Language Models (LLMs) with chain-of-thought (COT) prompting have demonstrated impressive abilities on simple natural language inference tasks. However, they tend to perform poorly on Multi-hop Question Answering (MHQA) tasks due to several challenges, including hallucination, error propagation and limited context length. We propose a prompting method, Finite State Machine (FSM), to enhance the reasoning capabilities of LLMs for complex tasks in addition to improved effectiveness and trustworthiness. Different from COT methods, FSM addresses MHQA by iteratively decomposing a question into multi-turn sub-questions, and self-correcting in time, improving the accuracy of answers in each step. Specifically, FSM addresses one sub-question at a time and decides on the next step based on its current result and state, in an automaton-like format. Experiments on benchmarks show the effectiveness of our method. Although our method performs on par with the baseline on relatively simpler datasets, it excels on challenging datasets like Musique. Moreover, this approach mitigates the hallucination phenomenon, wherein the correct final answer can be recovered despite errors in intermediate reasoning. Furthermore, our method improves LLMs' ability to follow specified output format requirements, significantly reducing the difficulty of answer interpretation and the need for reformatting.

Submitted 3 July, 2024; originally announced July 2024.

arXiv:2406.19593 [pdf, other]

SK-VQA: Synthetic Knowledge Generation at Scale for Training Context-Augmented Multimodal LLMs

Authors: Xin Su, Man Luo, Kris W Pan, Tien Pei Chou, Vasudev Lal, Phillip Howard

Abstract: Synthetic data generation has gained significant attention recently for its utility in training large vision and language models. However, the application of synthetic data to the training of multimodal context-augmented generation systems has been relatively unexplored. This gap in existing work is important because existing vision and language models (VLMs) are not trained specifically for context-augmented generation. Resources for adapting such models are therefore crucial for enabling their use in retrieval-augmented generation (RAG) settings, where a retriever is used to gather relevant information that is then subsequently provided to a generative model via context augmentation. To address this challenging problem, we generate SK-VQA: a large synthetic multimodal dataset containing over 2 million question-answer pairs which require external knowledge to determine the final answer. Our dataset is both larger and significantly more diverse than existing resources of its kind, possessing over 11x more unique questions and containing images from a greater variety of sources than previously-proposed datasets. Through extensive experiments, we demonstrate that our synthetic dataset can not only serve as a challenging benchmark, but is also highly effective for adapting existing generative multimodal models for context-augmented generation.

Submitted 27 June, 2024; originally announced June 2024.

arXiv:2406.18070 [pdf, other]

EgoVideo: Exploring Egocentric Foundation Model and Downstream Adaptation

Authors: Baoqi Pei, Guo Chen, Jilan Xu, Yuping He, Yicheng Liu, Kanghua Pan, Yifei Huang, Yali Wang, Tong Lu, Limin Wang, Yu Qiao

Abstract: In this report, we present our solutions to the EgoVis Challenges in CVPR 2024, including five tracks in the Ego4D challenge and three tracks in the EPIC-Kitchens challenge. Building upon the video-language two-tower model and leveraging our meticulously organized egocentric video data, we introduce a novel foundation model called EgoVideo. This model is specifically designed to cater to the unique characteristics of egocentric videos and provides strong support for our competition submissions. In the Ego4D challenges, we tackle various tasks including Natural Language Queries, Step Grounding, Moment Queries, Short-term Object Interaction Anticipation, and Long-term Action Anticipation. In addition, we also participate in the EPIC-Kitchens challenge, where we engage in the Action Recognition, Multiple Instance Retrieval, and Domain Adaptation for Action Recognition tracks. By adapting EgoVideo to these diverse tasks, we showcase its versatility and effectiveness in different egocentric video analysis scenarios, demonstrating the powerful representation ability of EgoVideo as an egocentric foundation model. Our codebase and pretrained models are publicly available at https://github.com/OpenGVLab/EgoVideo.

Submitted 30 June, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

Comments: Champion solutions in the EgoVis CVPR 2024 workshop

arXiv:2406.17555 [pdf, ps, other]

A response to commenter Ke Lan's comment on our paper published in Nature Communications (2023)14:5782 by J. Yan et al

Authors: Ji Yan, Jiwei Li, X. T. He, Lifeng Wang, Yaohua Chen, Feng Wang, Xiaoying Han, Kaiqiang Pan, Juxi Liang, Yulong Li, Zanyang Guan, Xiangming Liu, Xingsen Che, Zhongjing Chen, Xing Zhang, Yan Xu, Bin Li, Minging He, Hongbo Cai, Liang. Hao, Zhanjun Liu, Chunyang Zheng, Zhensheng Dai, Zhengfeng Fan, Bin Qiao, et al. (4 additional authors not shown)

Abstract: A response to commenter Ke Lan's comment on our paper published in Nature Communications (2023)14:5782 by J. Yan et al

Submitted 25 June, 2024; originally announced June 2024.

arXiv:2406.16095 [pdf, other]

doi 10.1088/1367-2630/ad96d8

Constrained Measurement Incompatibility from Generalised Contextuality of Steered Preparation

Authors: Sumit Mukherjee, A. K. Pan

Abstract: In a bipartite Bell scenario involving two local measurements per party and two outcomes per measurement, the measurement incompatibility in one wing is both necessary and sufficient to reveal the nonlocality. However, such a one-to-one correspondence fails when one of the observers performs more than two measurements. In such a scenario, the measurement incompatibility is necessary but not sufficient to reveal the nonlocality. In this work, within the formalism of general probabilistic theory (GPT), we demonstrate that unlike the nonlocality, the incompatibility of N arbitrary measurements in one wing is both necessary and sufficient for revealing the generalised contextuality for the sub-system in the other wing. Further, we formulate a novel form of inequality for any GPT that is necessary for N-wise compatibility of N arbitrary observables. Moreover, we argue that any theory that violates the proposed inequality possesses a degree of incompatibility that can be quantified through the amount of violation. Finally, we claim that it is the generalised contextuality that provides a restriction to the allowed degree of measurement incompatibility of any viable theory of nature and thereby super-selects the quantum theory.

Submitted 15 December, 2024; v1 submitted 23 June, 2024; originally announced June 2024.

Comments: Final version, after modifications

Journal ref: New J. Phys. 2024

Showing 1–50 of 542 results for author: Pan, K