Search | arXiv e-print repository

Decentralized Min-Max Optimization with Gradient Tracking

Abstract: This paper presents a novel distributed formulation of the min-max optimization problem. Such a formulation enables enhanced flexibility among agents when optimizing their maximization variables. To address the problem, we propose two distributed gradient methods over networks, termed Distributed Gradient Tracking Ascent (DGTA) and Distributed Stochastic Gradient Tracking Ascent (DSGTA). We demonstrate that DGTA achieves an iteration complexity of $\mathcal{O}(κ^2\varepsilon^{-2})$, and DSGTA attains a sample complexity of $\mathcal{O}(κ^3\varepsilon^{-4})$ for nonconvex strongly concave (NC-SC) objective functions. Both results match those of their centralized counterparts up to constant factors related to the communication network. Numerical experiments further demonstrate the superior empirical performance of the proposed algorithms compared to existing methods.

Submitted 15 May, 2025; originally announced May 2025.

arXiv:2505.10322 [pdf, other]

Asynchronous Decentralized SGD under Non-Convexity: A Block-Coordinate Descent Framework

Authors: Yijie Zhou, Shi Pu

Abstract: Decentralized optimization has become vital for leveraging distributed data without central control, enhancing scalability and privacy. However, practical deployments face fundamental challenges due to heterogeneous computation speeds and unpredictable communication delays. This paper introduces a refined model of Asynchronous Decentralized Stochastic Gradient Descent (ADSGD) under practical assumptions of bounded computation and communication times. To understand the convergence of ADSGD, we first analyze Asynchronous Stochastic Block Coordinate Descent (ASBCD) as a tool, and then show that ADSGD converges under computation-delay-independent step sizes. The convergence result is established without assuming bounded data heterogeneity. Empirical experiments reveal that ADSGD outperforms existing methods in wall-clock convergence time across various scenarios. With its simplicity, efficiency in memory and communication, and resilience to communication and computation delays, ADSGD is well-suited for real-world decentralized learning tasks.

Submitted 15 May, 2025; originally announced May 2025.

arXiv:2503.17489 [pdf, other]

Judge Anything: MLLM as a Judge Across Any Modality

Authors: Shu Pu, Yaochen Wang, Dongping Chen, Yuhang Chen, Guohao Wang, Qi Qin, Zhongyi Zhang, Zhiyuan Zhang, Zetong Zhou, Shuang Gong, Yi Gui, Yao Wan, Philip S. Yu

Abstract: Evaluating generative foundation models on open-ended multimodal understanding (MMU) and generation (MMG) tasks across diverse modalities (e.g., images, audio, video) poses significant challenges due to the complexity of cross-modal interactions. To this end, the idea of utilizing Multimodal LLMs (MLLMs) as automated judges has emerged, with encouraging results in assessing vision-language understanding tasks. Moving further, this paper extends MLLM-as-a-Judge across modalities in a unified manner by introducing two benchmarks, TaskAnything and JudgeAnything, to respectively evaluate the overall performance and judging capabilities of MLLMs across any-to-any modality tasks. Specifically, TaskAnything evaluates the MMU and MMG capabilities across 15 any-to-any modality categories, employing 1,500 queries curated from well-established benchmarks. Furthermore, JudgeAnything evaluates the judging capabilities of 5 advanced MLLMs (e.g., GPT-4o and Gemini-2.0-Flash) from the perspectives of Pair Comparison and Score Evaluation, providing a standardized testbed that incorporates human judgments and detailed rubrics. Our extensive experiments reveal that while these MLLMs show promise in assessing MMU (i.e., achieving an average of 66.55% in Pair Comparison setting and 42.79% in Score Evaluation setting), they encounter significant challenges with MMG tasks (i.e., averaging only 53.37% in Pair Comparison setting and 30.05% in Score Evaluation setting), exposing cross-modality biases and hallucination issues. To address this, we present OmniArena, an automated platform for evaluating omni-models and multimodal reward models. Our work highlights the need for fairer evaluation protocols and stronger alignment with human preferences. The source code and dataset are publicly available at: https://urrealhero.github.io/judgeanythingweb/.

Submitted 21 March, 2025; originally announced March 2025.

arXiv:2503.16123 [pdf, other]

Distributed Learning over Arbitrary Topology: Linear Speed-Up with Polynomial Transient Time

Authors: Runze You, Shi Pu

Abstract: We study a distributed learning problem in which $n$ agents, each with potentially heterogeneous local data, collaboratively minimize the sum of their local cost functions via peer-to-peer communication. We propose a novel algorithm, Spanning Tree Push-Pull (STPP), which employs two spanning trees extracted from a general communication graph to distribute both model parameters and stochastic gradients. Unlike prior approaches that rely heavily on spectral gap properties, STPP leverages a more flexible topological characterization, enabling robust information flow and efficient updates. Theoretically, we prove that STPP achieves linear speedup and polynomial transient iteration complexity, up to $O(n^7)$ for smooth nonconvex objectives and $\tilde{O}(n^3)$ for smooth strongly convex objectives, under arbitrary network topologies. Moreover, compared with the existing methods, STPP achieves faster convergence rates on sparse and non-regular topologies (e.g., directed ring) and reduces communication overhead on dense networks (e.g., static exponential graph). These results significantly advance the state of the art, especially when $n$ is large. Numerical experiments further demonstrate the strong performance of STPP and confirm the practical relevance of its theoretical convergence rates across various common graph architectures. Our code is available at https://anonymous.4open.science/r/SpanningTreePushPull-5D3E.

Submitted 20 March, 2025; originally announced March 2025.

arXiv:2503.13320 [pdf, other]

Radiative corrections on vortical spin polarization in hot QCD matter

Authors: Shuo Fang, Shi Pu, Di-Lun Yang

Abstract: We investigate the radiative corrections on spin polarization of relativistic fermions induced by vortical fields in thermal-equilibrium QCD matter at weak coupling. Such corrections stem from the self-energy gradients in quantum kinetic theory, which are further obtained by a more systematic and general approach through the Keldysh equation. By applying the hard-thermal-loop approximation, we obtain new corrections upon the spin-polarization spectrum and also the axial-charge current in connection to the axial/chiral vortical effect for massive quarks up to the leading order of the QCD coupling. Further influence on spin alignment of vector mesons from similar effects is also analyzed.

Submitted 17 March, 2025; originally announced March 2025.

Comments: 14 pages

arXiv:2501.07424 [pdf]

Photonic antiferromagnetic topological insulator with a single surface Dirac cone

Authors: Fujia Chen, Ning Han, Songyang Pu, Rui Zhao, Li Zhang, Qiaolu Chen, Yuze Hu, Mingyu Tong, Wenhao Li, Junyao Wu, Yudong Ren, Xinrui Li, Wenyan Yin, Hongsheng Chen, Rui-Xing Zhang, Yihao Yang

Abstract: Antiferromagnetism, characterized by magnetic moments aligned in alternating directions with a vanished ensemble average, has garnered renewed interest for its potential applications in spintronics and axion dynamics. The synergy between antiferromagnetism and topology can lead to the emergence of an exotic topological phase unique to certain magnetic order, termed antiferromagnetic topological insulators (AF TIs). A hallmark signature of AF TIs is the presence of a single surface Dirac cone--a feature typically associated with strong three-dimensional (3D) topological insulators--only on certain symmetry-preserving crystal terminations. However, the direct observation of this phenomenon poses a significant challenge. Here, we have theoretically and experimentally discovered a 3D photonic AF TI hosting a single surface Dirac cone protected by the combined symmetry of time reversal and half-lattice translation. Conceptually, our setup can be viewed as a z-directional stack of two-dimensional Chern insulators, with adjacent layers oppositely magnetized to form a 3D type-A AF configuration. By measuring both bulk and surface states, we have directly observed the symmetry-protected gapless single-Dirac-cone surface state, which shows remarkable robustness against random magnetic disorders. Our work constitutes the first realization of photonic AF TIs and photonic analogs of strong topological insulators, opening a new chapter for exploring novel topological photonic devices and phenomena that incorporate additional magnetic degrees of freedom.

Submitted 13 January, 2025; originally announced January 2025.

Comments: 13 pages, 4 figures

arXiv:2501.03390 [pdf, ps, other]

State-of-the-art Methods for Pseudo-Boolean Solving with SCIP

Authors: Gioni Mexi, Dominik Kamp, Yuji Shinano, Shanwen Pu, Alexander Hoen, Ksenia Bestuzheva, Christopher Hojny, Matthias Walter, Marc E. Pfetsch, Sebastian Pokutta, Thorsten Koch

Abstract: The Pseudo-Boolean problem deals with linear or polynomial constraints with integer coefficients over Boolean variables. The objective lies in optimizing a linear objective function, or finding a feasible solution, or finding a solution that satisfies as many constraints as possible. In the 2024 Pseudo-Boolean competition, solvers incorporating the SCIP framework won five out of six categories it was competing in. From a total of 1,207 instances, SCIP successfully solved 759, while its parallel version FiberSCIP solved 776. Based on the results from the competition, we further enhanced SCIP's Pseudo-Boolean capabilities. This article discusses the results and presents the winning algorithmic ideas.

Submitted 8 January, 2025; v1 submitted 6 January, 2025; originally announced January 2025.

arXiv:2412.19400 [pdf, ps, other]

Spin alignment of vector mesons in local equilibrium by Zubarev's approach

Authors: Shi-Zheng Yang, Xin-Qing Xie, Shi Pu, Jian-Hua Gao, Qun Wang

Abstract: We compute the $00$ element of the spin density matrix, denoted as $ρ_{00}$ and called the spin alignment, up to the second order of the gradient expansion in local equilibrium by Zubarev's approach. In the first order, we obtain $ρ_{00}=1/3$, meaning that the contributions from thermal vorticity and shear stress tensor are vanishing. The non-vanishing contributions to $ρ_{00}-1/3$ appear in the second order of gradients in the Belinfante and canonical cases. We also discuss the properties of the spin density matrix under the time reversal transformation. The effective transport coefficient for the spin alignment induced by the thermal shear stress tensor is T-odd in the first order, implying that the first order effect is dissipative.

Submitted 26 December, 2024; originally announced December 2024.

Comments: 20 pages

arXiv:2412.13054 [pdf, other]

Distributed Normal Map-based Stochastic Proximal Gradient Methods over Networks

Authors: Kun Huang, Shi Pu, Angelia Nedić

Abstract: Consider $n$ agents connected over a network that collaborate to minimize the average of their local cost functions combined with a common nonsmooth function. This paper introduces a unified algorithmic framework for solving such a problem through distributed stochastic proximal gradient methods, leveraging the normal map update scheme. Within this framework, we propose two new algorithms, termed Normal Map-based Distributed Stochastic Gradient Tracking (norM-DSGT) and Normal Map-based Exact Diffusion (norM-ED), to solve the distributed composite optimization problem over a connected network. We demonstrate that both methods can asymptotically achieve comparable convergence rates to the centralized stochastic proximal gradient descent method under a general variance condition on the stochastic gradients. Additionally, the number of iterations required for norM-ED to achieve such a rate (i.e., the transient time) behaves as $\mathcal{O}(n^{3}/(1-λ)^2)$ for minimizing composite objective functions, matching the performance of the non-proximal ED algorithm. Here $1-λ$ denotes the spectral gap of the mixing matrix related to the underlying network topology. To our knowledge, such a convergence result is state-of-the-art for the considered composite problem. Under the same condition, norM-DSGT enjoys a transient time of $\mathcal{O}(\max\{n^3/(1-λ)^2, n/(1-λ)^4\})$ and behaves more stably than norM-ED under decaying stepsizes for solving the tested problems.

Submitted 26 December, 2024; v1 submitted 17 December, 2024; originally announced December 2024.

Comments: 34 pages, 5 figures
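
Illustrative note (not part of the listing): the two transient-time expressions quoted in the abstract above can be compared numerically once a network is fixed. The sketch below is only a rough illustration under stated assumptions: it assumes an undirected ring in which every agent averages itself and its two neighbours with weight 1/3, takes $λ$ to be the second-largest eigenvalue magnitude of that mixing matrix, and drops all constants hidden by the $\mathcal{O}(\cdot)$ notation. The function names are hypothetical and do not come from the paper.

```python
import math


def ring_spectral_gap(n: int) -> float:
    """Return 1 - lambda for an assumed n-agent ring mixing matrix (n >= 3).

    Assumption (not from the paper): each agent averages itself and its two
    neighbours with weight 1/3. That matrix is circulant, so its eigenvalues
    are 1/3 + (2/3) * cos(2*pi*k/n) for k = 0, ..., n-1. Lambda is taken as
    the largest magnitude among the eigenvalues with k != 0.
    """
    if n < 3:
        raise ValueError("the ring needs at least 3 agents")
    lam = max(
        abs(1 / 3 + (2 / 3) * math.cos(2 * math.pi * k / n))
        for k in range(1, n)
    )
    return 1.0 - lam


def transient_norm_ed(n: int, gap: float) -> float:
    """Evaluate n^3 / (1 - lambda)^2, ignoring big-O constants."""
    return n**3 / gap**2


def transient_norm_dsgt(n: int, gap: float) -> float:
    """Evaluate max(n^3 / (1 - lambda)^2, n / (1 - lambda)^4), ignoring constants."""
    return max(n**3 / gap**2, n / gap**4)


if __name__ == "__main__":
    for n in (4, 16, 100):
        gap = ring_spectral_gap(n)
        print(n, gap, transient_norm_ed(n, gap), transient_norm_dsgt(n, gap))
```

On a ring the spectral gap shrinks as $n$ grows, so for large $n$ the $n/(1-λ)^4$ term dominates the norM-DSGT expression, while on small or well-connected networks the two expressions coincide.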

arXiv:2412.02320 [pdf, other]

Simulating Composite Fermion Excitons by Density Functional Theory and Monte Carlo on a Disk

Authors: Yi Yang, Songyang Pu, Yayun Hu, Zi-Xiang Hu

Abstract: The Kohn-Sham density functional method for the fractional quantum Hall (FQH) effect has recently been developed by mapping the strongly interacting electrons into an auxiliary system of weakly interacting composite fermions (CFs) that experience a density-dependent effective magnetic field. This approach has been successfully applied to explore the edge reconstruction, fractional charge and fractional braiding statistics of quasiparticle excitations. In this work, we investigate composite fermion excitons in the bulk of the disk geometry. By varying the separation of the quasiparticle-quasihole pairs and calculating their energy, we compare the dispersion of the magnetoroton mode with results from other numerical methods, such as exact diagonalization (ED) and Monte Carlo (MC) simulation. Furthermore, through an evaluation of the spectral function, we identify chiral "graviton" excitations: a spin $-2$ mode for the particle-like Laughlin state and a spin $2$ mode for the hole-like Laughlin state. This method can be extended to construct neutral collective excitations for other fractional quantum Hall states in disk geometry.

Submitted 3 December, 2024; originally announced December 2024.

Comments: 11 pages, 6 figures

arXiv:2412.00659 [pdf, other]

Linear Convergence Analysis of Single-loop Algorithm for Bilevel Optimization via Small-gain Theorem

Authors: Jianhui Li, Shi Pu, Jianqi Chen, Junfeng Wu

Abstract: Bilevel optimization has gained considerable attention due to its broad applicability across various fields. While several studies have investigated the convergence rates in the strongly-convex-strongly-convex (SC-SC) setting, no prior work has proven that a single-loop algorithm can achieve linear convergence. This paper employs a small-gain theorem in robust control theory to demonstrate that a single-loop algorithm based on the implicit function theorem attains a linear convergence rate of $\mathcal{O}(ρ^{k})$, where $ρ\in(0,1)$ is specified in Theorem 3. Specifically, we model the algorithm as a dynamical system by identifying its two interconnected components: the controller (the gradient or approximate gradient functions) and the plant (the update rule of variables). We prove that each component exhibits a bounded gain and that, with carefully designed step sizes, their cascade accommodates a product gain strictly less than one. Consequently, the overall algorithm can be proven to achieve a linear convergence rate, as guaranteed by the small-gain theorem. The gradient boundedness assumption adopted in the single-loop algorithm (\cite{hong2023two, chen2022single}) is replaced with a gradient Lipschitz assumption in Assumption 2.2. To the best of our knowledge, this work is the first known result on linear convergence for a single-loop algorithm.

Submitted 30 November, 2024; originally announced December 2024.

arXiv:2411.17285 [pdf, other]

A solvable model for spin polarizations with flow-momentum correspondence

Authors: Anum Arslan, Wen-Bo Dong, Guo-Liang Ma, Shi Pu, Qun Wang

Abstract: We present an analytically solvable model based on the blast-wave picture of heavy-ion collisions with flow-momentum correspondence. It can describe the key features of spin polarizations in heavy-ion collisions. With the analytical solution, we can clearly show that the spin polarization with respect to the reaction plane is governed by the directed flow, while the spin polarization along the beam direction is governed by the ellipticity in flow and in transverse emission area. There is a symmetry between the contribution from the vorticity and from the shear stress tensor due to the flow-momentum correspondence. The solution can be improved systematically by perturbation method.

Submitted 26 November, 2024; originally announced November 2024.

Comments: RevTex 4, 12 pages, 8 figures, 2 tables

arXiv:2411.17188 [pdf, other]

Interleaved Scene Graphs for Interleaved Text-and-Image Generation Assessment

Authors: Dongping Chen, Ruoxi Chen, Shu Pu, Zhaoyi Liu, Yanru Wu, Caixi Chen, Benlin Liu, Yue Huang, Yao Wan, Pan Zhou, Ranjay Krishna

Abstract: Many real-world user queries (e.g. "How to make egg fried rice?") could benefit from systems capable of generating responses with both textual steps and accompanying images, similar to a cookbook. Models designed to generate interleaved text and images face challenges in ensuring consistency within and across these modalities. To address these challenges, we present ISG, a comprehensive evaluation framework for interleaved text-and-image generation. ISG leverages a scene graph structure to capture relationships between text and image blocks, evaluating responses on four levels of granularity: holistic, structural, block-level, and image-specific. This multi-tiered evaluation allows for a nuanced assessment of consistency, coherence, and accuracy, and provides interpretable question-answer feedback. In conjunction with ISG, we introduce a benchmark, ISG-Bench, encompassing 1,150 samples across 8 categories and 21 subcategories. This benchmark dataset includes complex language-vision dependencies and golden answers to evaluate models effectively on vision-centric tasks such as style transfer, a challenging area for current models. Using ISG-Bench, we demonstrate that recent unified vision-language models perform poorly on generating interleaved content. While compositional approaches that combine separate language and image models show a 111% improvement over unified models at the holistic level, their performance remains suboptimal at both block and image levels. To facilitate future work, we develop ISG-Agent, a baseline agent employing a "plan-execute-refine" pipeline to invoke tools, achieving a 122% performance improvement.

Submitted 24 March, 2025; v1 submitted 26 November, 2024; originally announced November 2024.

Comments: Accepted by ICLR 2025 as Spotlight. Project homepage: https://interleave-eval.github.io/

arXiv:2411.12591 [pdf, other]

Thinking Before Looking: Improving Multimodal LLM Reasoning via Mitigating Visual Hallucination

Authors: Haojie Zheng, Tianyang Xu, Hanchi Sun, Shu Pu, Ruoxi Chen, Lichao Sun

Abstract: Multimodal large language models (MLLMs) have advanced the integration of visual and linguistic modalities, establishing themselves as the dominant paradigm for visual-language tasks. Current approaches like chain of thought (CoT) reasoning have augmented the cognitive capabilities of large language models (LLMs), yet their adaptation to MLLMs is hindered by heightened risks of hallucination in cross-modality comprehension. In this paper, we find that the thinking while looking paradigm in current multimodal CoT approaches--where reasoning chains are generated alongside visual input--fails to mitigate hallucinations caused by misleading images. To address these limitations, we propose the Visual Inference Chain (VIC) framework, a novel approach that constructs reasoning chains using textual context alone before introducing visual input, effectively reducing cross-modal biases and enhancing multimodal reasoning accuracy. Comprehensive evaluations demonstrate that VIC significantly improves zero-shot performance across various vision-related tasks, mitigating hallucinations while refining the reasoning capabilities of MLLMs. Our code repository can be found at https://github.com/Terry-Xu-666/visual_inference_chain.

Submitted 15 November, 2024; originally announced November 2024.

arXiv:2410.13491 [pdf, other]

Progenitor diversity in the accreted stellar halos of Milky Way-like galaxies

Authors: Sy-Yun Pu, Andrew P. Cooper, Robert J. J. Grand, Facundo A. Gómez, Antonela Monachesi

Abstract: Ongoing large stellar spectroscopic surveys of the Milky Way seek to reconstruct the major events in the assembly history of the Galaxy. Chemical and kinematic observations can be used to separate the contributions of different progenitor galaxies to the present-day stellar halo. Here we compute the number of progenitors that contribute to the accreted stellar halos of simulated Milky Way-like galaxies as a function of radius (the radial diversity) in three suites of models: Bullock & Johnston, Aquarius and Auriga. We show that there are significant differences between the predictions of these three models, beyond the halo-to-halo scatter expected in $Λ$CDM. Predictions of diversity from numerical simulations are sensitive to model-dependent assumptions regarding the efficiency of star formation in dwarf galaxies. We compare, at face value, to current constraints on the radial diversity of the Milky Way's accreted halo. These constraints imply that the halo of our Galaxy is dominated by $\sim2$ progenitors in the range $8-45\,\mathrm{kpc}$, in contrast to averages of $7$ progenitors in the Bullock & Johnston models, $3.5$ in Aquarius and $4.2$ in Auriga over the same region. We additionally find that the models with radial diversity most similar to that of the Milky Way are predominantly those with ongoing merger events. The Milky Way therefore appears unusual in having an accreted stellar halo dominated by a small number of progenitors accreted at very early times.

Submitted 25 February, 2025; v1 submitted 17 October, 2024; originally announced October 2024.

Comments: 20 pages, 10 figures, published in ApJ. This paper describes the first public release of the particle-tagging stellar halo data from Cooper et al. (2010), see https://github.com/nthu-ga/aquarius-halos

arXiv:2410.02524 [pdf, other]

Constraining cosmology with N-body simulations for future spectroscopic galaxy surveys at $2\leq z\leq 3$

Authors: Sy-Yun Pu, Teppei Okumura, Chian-Chou Chen, Takahiro Nishimichi, Kazuyuki Akitsu

Abstract: Determining the spatial curvature ($Ω_k$) independent of cosmic microwave background observations plays a key role in revealing the physics of the early universe. The Hubble tension is one of the most serious issues in modern cosmology. We investigate halo catalogs identified from $N$-body simulations at $z=2$ and 3, mimicking high-redshift galaxy surveys. We measure redshift-space correlation functions of halos from the two snapshots. We detect clear features of baryon acoustic oscillations and redshift-space distortions. We find that we can obtain a few percent constraints on both the geometric distances and growth of structure in the distant universe in future surveys. By taking into account the information of the underlying matter power spectrum, we demonstrate that we can also constrain the Hubble constant $H_0$ to within a few percent, as well as the spatial curvature to $|Ω_k|\lesssim 0.1$, by observing galaxies with a number density of $\bar{n}_{\rm g}\simeq 10^{-4} (~h^3{\rm ~Mpc}^{-3})$. Our analysis provides a timely forecast for the upcoming spectroscopic surveys, which target emission line galaxy or dusty star-forming galaxy samples.

Submitted 28 October, 2024; v1 submitted 3 October, 2024; originally announced October 2024.

Comments: 8 pages, 4 figures, 3 tables; references and results updated; typos corrected

arXiv:2409.18971 [pdf, other]

Early Joint Learning of Emotion Information Makes MultiModal Model Understand You Better

Authors: Mengying Ge, Mingyang Li, Dongkai Tang, Pengbo Li, Kuo Liu, Shuhao Deng, Songbai Pu, Long Liu, Yang Song, Tao Zhang

Abstract: In this paper, we present our solutions for emotion recognition in the sub-challenges of Multimodal Emotion Recognition Challenge (MER2024). To mitigate the modal competition issue between audio and text, we adopt an early fusion strategy based on a large language model, where joint training of audio and text is conducted initially. The joint Audio-Text modal feature is then late-fused with other unimodal features. In order to solve the problems of data insufficiency and class imbalance, we use multiple turns of multi-model voting for data mining. Moreover, to enhance the quality of audio features, we employ speech source separation to preprocess audios. Our model ranks 2nd in both MER2024-SEMI and MER2024-NOISE, validating our method's effectiveness.

Submitted 12 September, 2024; originally announced September 2024.

arXiv:2409.00456 [pdf, ps, other]

Corrections from space-time dependent electromagnetic fields to Wigner functions and spin polarization

Authors: Shi-Zheng Yang, Jian-Hua Gao, Shi Pu

Abstract: We have derived the Wigner equations at global equilibrium with constant vorticity but space-time dependent electromagnetic fields up to second order in semiclassical expansion. We obtain the new second-order contributions to the charge currents and energy-momentum tensor from the varying electromagnetic fields. We also compute the new corrections to the spin polarization pseudo-vector from both constant and varying electromagnetic fields. We also find that the space-time dependent electromagnetic field provides a tighter constraint on the solutions of Wigner functions in global equilibrium compared with constant electromagnetic field.

Submitted 23 September, 2024; v1 submitted 31 August, 2024; originally announced September 2024.

Comments: 20 pages, typos corrected, Sec.VI reorganized

arXiv:2408.09877 [pdf, other]

doi 10.1103/PhysRevD.111.034015

Collisional corrections to spin polarization from quantum kinetic theory using Chapman-Enskog expansion

Authors: Shuo Fang, Shi Pu

Abstract: We have investigated the collisional corrections to the spin polarization pseudo-vector, $δ\mathcal{P}^μ$, using quantum kinetic theory in Chapman-Enskog expansion. We derive the spin Boltzmann equation incorporating Møller scattering process. We further consider two distinct scenarios using hard thermal loop approximations for simplification. In scenario (I), the vector charge distribution function is treated as off-equilibrium under the validity domain of gradient expansion. Remarkably, the polarization induced by gradients of thermal chemical potential and shear viscous tensors are modified, but $δ\mathcal{P}^μ$ in this scenario does not depend on the coupling constant. In scenario (II), the vector charge distribution function is assumed to be in local thermal equilibrium. Then collisional corrections $δ\mathcal{P}^μ$ in this scenario are at $\mathcal{O}(\hbar^{2}\partial^{2})$. Additionally, we evaluate the $δ\mathcal{P}^μ$ using relaxation time approach for comparative analysis. Our results establish the theoretical framework necessary for the future numerical investigations on the interaction corrections to spin polarization.

Submitted 10 March, 2025; v1 submitted 19 August, 2024; originally announced August 2024.

Comments: 32 pages, 1 figure; version accepted for publication of PRD

Journal ref: Phys.Rev.D 111 (2025) 3, 034015

arXiv:2408.04296 [pdf, other]

Spin polarization of $Λ$ hyperons along beam direction in p+Pb collisions at $\sqrt{s_{NN}}=8.16$ TeV using hydrodynamic approaches

Authors: Cong Yi, Xiang-Yu Wu, Jie Zhu, Shi Pu, Guang-You Qin

Abstract: We have implemented the 3+1 dimensional CLVisc hydrodynamics model with TRENTO-3D initial conditions to investigate the spin polarization of $Λ$ hyperons along the beam direction in p+Pb collisions at $\sqrt{s_{NN}} = 8.16$ TeV. Following our previous theoretical framework based on quantum kinetic theory, we consider three different scenarios: $Λ$ equilibrium, $s$ quark equilibrium, and iso-thermal equilibrium scenarios. We have computed the second Fourier sine coefficients of spin polarization along the beam direction, denoted as $\left\langle P_{z} \sin 2(φ_{p} - Ψ_{2}) \right\rangle$, with $φ_{p} - Ψ_{2}$ being the azimuthal angle relative to the second-order event plane $Ψ_{2}$, as functions of multiplicity, transverse momentum and pseudo-rapidity in the three scenarios. Additionally, we have also computed the spin polarization along the beam direction, $P_{z}$, as a function of the azimuthal angle. We find that the spin polarization induced by thermal vorticity always provides an opposite contribution compared to the shear-induced polarization in p+Pb collisions. The total spin polarization computed by the current hydrodynamic model disagrees with the data measured by LHC-CMS experiments. Our findings imply that other non-flow effects may play a crucial role in p+Pb collisions.

Submitted 30 August, 2024; v1 submitted 8 August, 2024; originally announced August 2024.

Comments: 8 pages, 5 figures and 1 table. A new figure for local polarization as a function of pseudo-rapidity is added. submitted to PRC

arXiv:2408.03781 [pdf, other]

Late-time asymptotic solutions, attractor, and focusing behavior of spin hydrodynamics

Authors: Dong-Lin Wang, Li Yan, Shi Pu

Abstract: We have investigated the late-time asymptotic solutions, attractor, and focusing behavior of minimal causal spin hydrodynamics in Bjorken expansion. Using the method of dominant balance, we derive the late-time asymptotic solutions of the evolution equation for spin density and identify the specific conditions necessary for the spin density to exhibit a power-law decay. We then analyze both the late-time and early-time attractors for the decay rate of spin density. Additionally, we report the focusing behavior in spin hydrodynamics, which has not been found in conventional relativistic hydrodynamics in Bjorken expansion. Our findings suggest that spin density can be treated as a conventional hydrodynamic variable at late times under certain conditions.

Submitted 7 August, 2024; originally announced August 2024.

Comments: 34 pages, 6 figures

arXiv:2408.01727 [pdf, other]

A Robust Compressed Push-Pull Method for Decentralized Nonconvex Optimization

Authors: Yiwei Liao, Zhuorui Li, Shi Pu, Tsung-Hui Chang

Abstract: In the modern paradigm of multi-agent networks, communication has become one of the main bottlenecks for decentralized optimization, where a large number of agents are involved in minimizing the average of the local cost functions. In this paper, we propose a robust compressed push-pull algorithm (RCPP) that combines gradient tracking with communication compression. In particular, RCPP is robust under a much more general class of compression operators that allow both relative and absolute compression errors, in contrast to the existing works which can handle either one of them or assume convex problems. We show that RCPP enjoys sublinear convergence rate for smooth and possibly nonconvex objective functions over general directed networks. Moreover, under the additional Polyak-Łojasiewicz condition, linear convergence rate can be achieved for RCPP. Numerical examples verify the theoretical findings and demonstrate the efficiency, flexibility, and robustness of the proposed algorithm.

Submitted 3 August, 2024; originally announced August 2024.

Comments: arXiv admin note: substantial text overlap with arXiv:2303.07091

arXiv:2407.16978 [pdf, other]

doi 10.1103/PhysRevB.110.024427

Understanding the Ising zigzag antiferromagnetism of FePS3 and FePSe3 monolayers

Authors: Ke Yang, Yueyue Ning, Yuxuan Zhou, Di Lu, Yaozhenghang Ma, Lu Liu, Shengli Pu, Hua Wu

Abstract: This study investigates the spin-orbital states of FePS3 and FePSe3 monolayers and the origin of their Ising zigzag AFM, using DFT, crystal field level diagrams, superexchange analyses, and parallel tempering MC simulations. Our calculations show that under the trigonal elongation of the FeS6 (FeSe6) octahedra, the $e_g^π$ doublet of the Fe 3d crystal field levels lies lower than the $a_{1g}$ singlet by about 108 meV (123 meV), which is much larger than the strength of Fe 3d SOC. Then, the half-filled minority-spin $e_g^π$ doublet of the high-spin Fe$^{2+}$ ions ($d^{5\uparrow,1\downarrow}$) splits by the SOC into the lower $L_{z+}$ and higher $L_{z-}$ states. The spin-orbital ground state $d^{5\uparrow}$$L_{z+}^{1\downarrow}$ formally with $S_z$ = 2 and $L_z$ = 1 gives the large z-axis spin/orbital moments of 3.51/0.76 $μ_{B}$ (3.41/0.67 $μ_{B}$) for FePS$_3$ (FePSe$_3$) monolayer, and both the moments are reduced by the strong (stronger) Fe 3d hybridizations with S 3p (Se 4p) states. As a result, FePS3 (FePSe3) monolayer has a huge perpendicular single-ion anisotropy energy of 19.4 meV (14.9 meV), giving an Ising-type magnetism. Moreover, via the maximally localized Wannier functions, we find that the first nearest neighboring (1NN) Fe-Fe pair has large hopping parameters in between some specific orbitals, and so does the 3NN Fe-Fe pair. In contrast, the 2NN Fe-Fe pair has much smaller hopping parameters and the 4NN Fe-Fe pair has negligibly small ones. Then, a combination of those hopping parameters and the superexchange picture can readily explain the computed strong 1NN ferromagnetic coupling and the strong 3NN antiferromagnetic one but the relatively much smaller 2NN antiferromagnetic coupling. Furthermore, our PTMC simulations give TN of 119 K for FePS3 monolayer and also predict for FePSe3 monolayer the same magnetic structure with a close or even higher TN.

Submitted 23 July, 2024; originally announced July 2024.

Comments: 14 pages, 9 figures, 3 tables

Journal ref: Phys. Rev. B 110, 024427 (2024)

arXiv:2407.11119 [pdf, other]

doi 10.1103/PhysRevB.111.115119

Entanglement scaling and charge fluctuations in a Fermi liquid of composite fermions

Authors: Cristian Voinea, Songyang Pu, Ajit C. Balram, Zlatko Papić

Abstract: The composite fermion Fermi liquid (CFL) state at $ν=1/2$ filling of a Landau level is a paradigmatic non-Fermi liquid borne out purely by Coulomb interactions. But in what ways is this exotic state of matter different from a Fermi liquid? The CFL entanglement entropy was indeed found to exhibit a significant enhancement compared to free electrons [Shao et al., Phys. Rev. Lett. 114, 206402 (2015)], which was subsequently ruled out as a finite-size effect by the study of a lattice CFL analog [Mishmash and Motrunich, Phys. Rev. B 94, 081110 (2016)]. Moreover, the enhancement was not observed in a quasi-one-dimensional limit of the Coulomb ground state at $ν=1/2$ [Geraedts et al., Science 352, 197 (2016)]. Here, we revisit the problem of entanglement scaling in the CFL state realized in a two-dimensional electron gas. Using Monte Carlo evaluation of the second Rényi entropy $S_2$ for the CFL variational wave function, we show that the entanglement enhancement is present not only at $ν=1/2$ but also at $ν=1/4$, as well as in bosonic CFL states at $ν=1$ and $ν=1/3$ fillings. In all cases, we find the scaling of $S_2$ with subsystem size to be enhanced compared to the non-interacting case, and insensitive to the choice of geometry and projection to the lowest Landau level. We also demonstrate that, for CFL states, the variance of the particle number in a subsystem obeys area-law scaling with a universal subleading corner contribution, in stark contrast with free fermions. Our results establish the enhanced entanglement scaling and suppressed charge fluctuations as fingerprints of non-Fermi-liquid correlations in CFL states.

Submitted 28 March, 2025; v1 submitted 15 July, 2024; originally announced July 2024.

Comments: 13 pages, 9 figures; changed format, added data

Journal ref: Phys. Rev. B 111, 115119 (2025)

arXiv:2407.06091 [pdf, other]

Light nuclei photoproduction in relativistic heavy ion ultraperipheral collisions

Authors: Jin-Yu Hu, Shuo Lin, Shi Pu, Qun Wang

Abstract: We have investigated light nuclei pair photoproduction in relativistic heavy ion ultraperipheral collisions. As a first attempt, we employ our previously developed quantum electrodynamics model, which incorporates a wave-packet description of initial nuclei, to compute the cross section for proton-antiproton pair photoproduction. The effective vertex for the photon and proton interaction is chosen based on studies of two-photon exchange effects in hadron physics. We present the transverse momentum, invariant mass, and azimuthal angle distributions of proton-antiproton pairs at $\sqrt{s_{NN}}=200$ GeV in Au+Au ultraperipheral collisions. We observe a $\cos(2φ)$ modulation and an almost negligible $\cos(4φ)$ modulation in the azimuthal angle distribution. Our study helps us better understand the matter generated by light.

Submitted 8 July, 2024; originally announced July 2024.

Comments: 6 pages, 3 figures

arXiv:2406.19605 [pdf, other]

A Customized Augmented Lagrangian Method for Block-Structured Integer Programming

Authors: Rui Wang, Chuwen Zhang, Shanwen Pu, Jianjun Gao, Zaiwen Wen

Abstract: Integer programming with block structures has received considerable attention recently and is widely used in many practical applications such as train timetabling and vehicle routing problems. It is known to be NP-hard due to the presence of integer variables. We define a novel augmented Lagrangian function by directly penalizing the inequality constraints and establish the strong duality between the primal problem and the augmented Lagrangian dual problem. Then, a customized augmented Lagrangian method is proposed to address the block-structures. In particular, the minimization of the augmented Lagrangian function is decomposed into multiple subproblems by decoupling the linking constraints and these subproblems can be efficiently solved using the block coordinate descent method. We also establish the convergence property of the proposed method. To make the algorithm more practical, we further introduce several refinement techniques to identify high-quality feasible solutions. Numerical experiments on a few interesting scenarios show that our proposed algorithm often achieves a satisfactory solution and is quite effective.

Submitted 27 June, 2024; originally announced June 2024.

arXiv:2406.11410 [pdf, other]

HARE: HumAn pRiors, a key to small language model Efficiency

Authors: Lingyun Zhang, Bin jin, Gaojian Ge, Lunhui Liu, Xuewen Shen, Mingyong Wu, Houqian Zhang, Yongneng Jiang, Shiqi Chen, Shi Pu

Abstract: Human priors play a crucial role in efficiently utilizing data in deep learning. However, with the development of large language models (LLMs), there is an increasing emphasis on scaling both model size and data volume, which often diminishes the importance of human priors in data construction. Influenced by these trends, existing Small Language Models (SLMs) mainly rely on web-scraped large-scale training data, neglecting the proper incorporation of human priors. This oversight limits the training efficiency of language models in resource-constrained settings. In this paper, we propose a principle to leverage human priors for data construction. This principle emphasizes achieving high-performance SLMs by training on a concise dataset that accommodates both semantic diversity and data quality consistency, while avoiding benchmark data leakage. Following this principle, we train an SLM named HARE-1.1B. Extensive experiments on large-scale benchmark datasets demonstrate that HARE-1.1B performs favorably against state-of-the-art SLMs, validating the effectiveness of the proposed principle. Additionally, this provides new insights into efficient language model training in resource-constrained environments from the view of human priors.

Submitted 18 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

arXiv:2405.16491 [pdf, other]

doi 10.1103/PhysRevD.111.074020

Nuclear deformation effects in photoproduction of $ρ$ mesons in ultraperipheral isobaric collisions

Authors: Shuo Lin, Jin-Yu Hu, Hao-Jie Xu, Shi Pu, Qun Wang

Abstract: We have investigated the $ρ^{0}$ meson photoproduction in ultraperipheral isobaric collisions between $_{44}^{96}\textrm{Ru}+_{44}^{96}\textrm{Ru}$ and $_{40}^{96}\textrm{Zr}+_{40}^{96}\textrm{Zr}$ at $\sqrt{s_{NN}}=200$ GeV, employing the dipole model with the equivalent photon approximation. By implementing the Woods-Saxon distribution to represent the nuclear mass density, which is derived from density functional theory with an inclusion of nuclear deformation effects, we have calculated the transverse momentum $q_{T}$ spectra in isobaric collisions. We observe the characteristic dip behavior in these spectra, indicative of diffraction phenomena in high-energy physics. We notice that the deformation effects cause a nearly linear increase with $q_{T}^{2}$ for $q_{T}^{2}\lesssim0.015$ $\textrm{GeV}^{2}$, aligning with experimental observations. We offer a simple explanation for the observed behavior in these spectra by introducing the effective width of the nuclei in the thickness function. We also extend our discussion on the $ρ^{0}$ meson photoproduction with the targets $^{63}\textrm{Cu}$, $^{197}\textrm{Au}$, and $^{238}\textrm{U}$.

Submitted 19 April, 2025; v1 submitted 26 May, 2024; originally announced May 2024.

Comments: 9 pages, 5 figures

arXiv:2405.03105 [pdf, ps, other]

Thermodynamic stability in relativistic viscous and spin hydrodynamics

Authors: Xiang Ren, Chen Yang, Dong-Lin Wang, Shi Pu

Abstract: We have applied thermodynamic stability analysis to derive the stability and causality conditions for conventional relativistic viscous hydrodynamics and spin hydrodynamics. We obtain the thermodynamic stability conditions for second-order relativistic hydrodynamics with shear and bulk viscous tensors, finding them identical to those derived from linear mode analysis. We then derive the thermodynamic stability conditions for minimal causal extended second-order spin hydrodynamics in canonical form, both with and without viscous tensors. Without viscous tensors, the constraints from thermodynamic stability exactly match those from linear mode analysis. In the presence of viscous tensors, the thermodynamic stability imposes more stringent constraints than those obtained from linear mode analysis. Our results suggest that conditions derived from thermodynamic stability analysis can guarantee both causality and stability in linear mode analysis.

Submitted 11 August, 2024; v1 submitted 5 May, 2024; originally announced May 2024.

Comments: 30 pages; published version

Journal ref: Phys.Rev.D 110 (2024) 3, 034010

arXiv:2404.05454 [pdf, other]

B-ary Tree Push-Pull Method is Provably Efficient for Distributed Learning on Heterogeneous Data

Authors: Runze You, Shi Pu

Abstract: This paper considers the distributed learning problem where a group of agents cooperatively minimizes the summation of their local cost functions based on peer-to-peer communication. Particularly, we propose a highly efficient algorithm, termed "B-ary Tree Push-Pull" (BTPP), that employs two B-ary spanning trees for distributing the information related to the parameters and stochastic gradients across the network. The simple method is efficient in communication since each agent interacts with at most $(B+1)$ neighbors per iteration. More importantly, BTPP achieves linear speedup for smooth nonconvex and strongly convex objective functions with only $\tilde{O}(n)$ and $\tilde{O}(1)$ transient iterations, respectively, significantly outperforming the state-of-the-art results to the best of our knowledge. Our code is available at https://github.com/ryou98/BTPP.

Submitted 21 November, 2024; v1 submitted 8 April, 2024; originally announced April 2024.

arXiv:2403.05172 [pdf, other]

Learning Expressive And Generalizable Motion Features For Face Forgery Detection

Authors: Jingyi Zhang, Peng Zhang, Jingjing Wang, Di Xie, Shiliang Pu

Abstract: Previous face forgery detection methods mainly focus on appearance features, which may be easily attacked by sophisticated manipulation. Considering the majority of current face manipulation methods generate fake faces based on a single frame, which do not take frame consistency and coordination into consideration, artifacts on frame sequences are more effective for face forgery detection. However, current sequence-based face forgery detection methods use general video classification networks directly, which discard the special and discriminative motion information for face manipulation detection. To this end, we propose an effective sequence-based forgery detection framework based on an existing video classification method. To make the motion features more expressive for manipulation detection, we propose an alternative motion consistency block instead of the original motion features module. To make the learned features more generalizable, we propose an auxiliary anomaly detection block. With these two specially designed improvements, we make a general video classification network achieve promising results on three popular face forgery datasets.

Submitted 8 March, 2024; originally announced March 2024.

Comments: Accepted to ICASSP 2023

arXiv:2403.05117 [pdf, other]

Arbitrary-Scale Point Cloud Upsampling by Voxel-Based Network with Latent Geometric-Consistent Learning

Authors: Hang Du, Xuejun Yan, Jingjing Wang, Di Xie, Shiliang Pu

Abstract: Recently, arbitrary-scale point cloud upsampling mechanism became increasingly popular due to its efficiency and convenience for practical applications. To achieve this, most previous approaches formulate it as a problem of surface approximation and employ point-based networks to learn surface representations. However, learning surfaces from sparse point clouds is more challenging, and thus they often suffer from the low-fidelity geometry approximation. To address it, we propose an arbitrary-scale Point cloud Upsampling framework using Voxel-based Network (PU-VoxelNet). Thanks to the completeness and regularity inherited from the voxel representation, voxel-based networks are capable of providing predefined grid space to approximate 3D surface, and an arbitrary number of points can be reconstructed according to the predicted density distribution within each grid cell. However, we investigate the inaccurate grid sampling caused by imprecise density predictions. To address this issue, a density-guided grid resampling method is developed to generate high-fidelity points while effectively avoiding sampling outliers. Further, to improve the fine-grained details, we present an auxiliary training supervision to enforce the latent geometric consistency among local surface patches. Extensive experiments indicate the proposed approach outperforms the state-of-the-art approaches not only in terms of fixed upsampling rates but also for arbitrary-scale upsampling.

Submitted 8 March, 2024; originally announced March 2024.

Comments: Accepted to AAAI 2024. The source code is available at https://github.com/hikvision-research/3DVision

arXiv:2403.02806 [pdf, other]

Atacama Large Aperture Submillimeter Telescope (AtLAST) Science: Surveying the distant Universe

Authors: Eelco van Kampen, Tom Bakx, Carlos De Breuck, Chian-Chou Chen, Helmut Dannerbauer, Benjamin Magnelli, Francisco Miguel Montenegro-Montes, Teppei Okumura, Sy-Yun Pu, Matus Rybak, Amelie Saintonge, Claudia Cicone, Evanthia Hatziminaoglou, Juliette Hilhorst, Pamela Klaassen, Minju Lee, Christopher C. Lovell, Andreas Lundgren, Luca Di Mascolo, Tony Mroczkowski, Laura Sommovigo, Mark Booth, Martin A. Cordiner, Rob Ivison, Doug Johnstone , et al. (5 additional authors not shown)

Abstract: During the most active period of star formation in galaxies, which occurs in the redshift range 1<z<3, strong bursts of star formation result in significant quantities of dust, which obscures new stars being formed as their UV/optical light is absorbed and then re-emitted in the infrared, which redshifts into the mm/sub-mm bands for these early times. To get a complete picture of the high-z galaxy population, we need to survey a large patch of the sky in the sub-mm with sufficient angular resolution to resolve all galaxies, but we also need the depth to fully sample their cosmic evolution, and therefore obtain their redshifts using direct mm spectroscopy with a very wide frequency coverage. This requires a large single-dish sub-mm telescope with fast mapping speeds at high sensitivity and angular resolution, a large bandwidth with good spectral resolution and multiplex spectroscopic capabilities. The proposed 50-m Atacama Large Aperture Submillimeter Telescope (AtLAST) will deliver these specifications. We discuss how AtLAST allows us to study the whole population of high-z galaxies, including the dusty star-forming ones which can only be detected and studied in the sub-mm, and obtain a wealth of information for each of these up to z~7: gas content, cooling budget, star formation rate, dust mass, and dust temperature. We present worked examples of surveys that AtLAST can perform, both deep and wide, and also focused on galaxies in proto-clusters. In addition we show how such surveys with AtLAST can measure the growth rate and the Hubble constant with high accuracy, and demonstrate the power of the line-intensity mapping method in the mm/sub-mm wavebands to constrain the cosmic expansion history at high redshifts, as good examples of what can uniquely be done by AtLAST in this research field.

Submitted 5 March, 2024; originally announced March 2024.

Comments: 17 pages, 10 figures, submitted to Open Research Europe as part of the AtLAST collection

arXiv:2403.00258 [pdf, ps, other]

"Lossless" Compression of Deep Neural Networks: A High-dimensional Neural Tangent Kernel Approach

Authors: Lingyu Gu, Yongqi Du, Yuan Zhang, Di Xie, Shiliang Pu, Robert C. Qiu, Zhenyu Liao

Abstract: Modern deep neural networks (DNNs) are extremely powerful; however, this comes at the price of increased depth and having more parameters per layer, making their training and inference more computationally challenging. In an attempt to address this key limitation, efforts have been devoted to the compression (e.g., sparsification and/or quantization) of these large-scale machine learning models, so that they can be deployed on low-power IoT devices. In this paper, building upon recent advances in neural tangent kernel (NTK) and random matrix theory (RMT), we provide a novel compression approach to wide and fully-connected deep neural nets. Specifically, we demonstrate that in the high-dimensional regime where the number of data points $n$ and their dimension $p$ are both large, and under a Gaussian mixture model for the data, there exists asymptotic spectral equivalence between the NTK matrices for a large family of DNN models. This theoretical result enables "lossless" compression of a given DNN to be performed, in the sense that the compressed network yields asymptotically the same NTK as the original (dense and unquantized) network, with its weights and activations taking values only in $\{ 0, \pm 1 \}$ up to a scaling. Experiments on both synthetic and real-world data are conducted to support the advantages of the proposed compression scheme, with code available at https://github.com/Model-Compression/Lossless_Compression.

Submitted 29 February, 2024; originally announced March 2024.

Comments: 32 pages, 4 figures, and 2 tables. Fixing typos in Theorems 1 and 2 from NeurIPS 2022 proceeding (https://proceedings.neurips.cc/paper_files/paper/2022/hash/185087ea328b4f03ea8fd0c8aa96f747-Abstract-Conference.html)

arXiv:2402.18627 [pdf, other]

Topologically protected emergent Fermi surface in an Abrikosov vortex lattice

Authors: Songyang Pu, Jay D. Sau, Rui-Xing Zhang

Abstract: We show that a three-dimensional (3D) fully gapped type-II superconductor can feature emergent in-gap Fermi surfaces of Caroli-de Gennes-Matricon (CdGM) quasiparticles in the presence of an Abrikosov vortex lattice. In particular, these CdGM Fermi surfaces manifest in the emergent 3D band structure enabled by the intervortex tunneling physics, and their stability is guaranteed by a $\mathbb{Z}_2$ topological index. By developing an effective analytical theory, we find that each vortex line carrying a 1D nodal dispersion is a sufficient condition for the vortex lattice to form CdGM Fermi surfaces. Following this prediction, in-gap CdGM Fermi surfaces are numerically confirmed in a microscopic vortex-lattice simulation of a superconducting Dirac semimetal with an $s$-wave spin-singlet pairing, which is directly applicable to a large class of type-II superconductors such as LiFeAs. Remarkably, the CdGM Fermi surfaces persist even when the normal state is deformed to a doped insulator of trivial band topology. Our work establishes the vortex lattice as a new experimentally feasible control knob for emergent topological phenomena in conventional superconductors.

Submitted 17 March, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

Comments: 6 + 9 pages, 3 + 6 figures

arXiv:2402.17294 [pdf, ps, other]

Advancing Continuous Distribution Generation: An Exponentiated Odds Ratio Generator Approach

Authors: Xinyu Chen, Yuanqi Xie, Achraf Cohen, Shusen Pu

Abstract: This paper presents a new methodology for generating continuous statistical distributions, integrating the exponentiated odds ratio within the framework of survival analysis. This new method enhances the flexibility and adaptability of distribution models to effectively address the complexities inherent in contemporary datasets. The core of this advancement is illustrated by introducing a particular subfamily, the "Type-2 Gumbel Weibull-G Family of Distributions." We provide a comprehensive analysis of the mathematical properties of these distributions, encompassing statistical properties such as density functions, moments, hazard rate and quantile functions, Rényi entropy, order statistics, and the concept of stochastic ordering. To establish the robustness of our approach, we apply five distinct methods for parameter estimation. The practical applicability of the Type-2 Gumbel Weibull-G distributions is further supported through the analysis of three real-world datasets. These empirical applications illustrate the exceptional statistical precision of our distributions compared to existing models, thereby reinforcing their significant value in both theoretical and practical statistical applications.

Submitted 27 February, 2024; originally announced February 2024.

MSC Class: 62E99; 60E05

arXiv:2402.09714 [pdf, other]

An Accelerated Distributed Stochastic Gradient Method with Momentum

Authors: Kun Huang, Shi Pu, Angelia Nedić

Abstract: In this paper, we introduce an accelerated distributed stochastic gradient method with momentum for solving the distributed optimization problem, where a group of $n$ agents collaboratively minimize the average of the local objective functions over a connected network. The method, termed "Distributed Stochastic Momentum Tracking (DSMT)", is a single-loop algorithm that utilizes the momentum tracking technique as well as the Loopless Chebyshev Acceleration (LCA) method. We show that DSMT can asymptotically achieve convergence rates comparable to those of the centralized stochastic gradient descent (SGD) method under a general variance condition regarding the stochastic gradients. Moreover, the number of iterations (transient times) required for DSMT to achieve such rates behaves as $\mathcal{O}(n^{5/3}/(1-λ))$ for minimizing general smooth objective functions, and $\mathcal{O}(\sqrt{n/(1-λ)})$ under the Polyak-Łojasiewicz (PL) condition. Here, the term $1-λ$ denotes the spectral gap of the mixing matrix related to the underlying network topology. Notably, the obtained results do not rely on multiple inter-node communications or stochastic gradient accumulation per iteration, and, to the best of our knowledge, the transient times are the shortest under this setting.

Submitted 26 March, 2025; v1 submitted 15 February, 2024; originally announced February 2024.

Comments: 45 pages, 5 figures

arXiv:2402.04540 [pdf, other]

Spin polarization in relativistic heavy-ion collisions

Authors: Francesco Becattini, Matteo Buzzegoli, Takafumi Niida, Shi Pu, Ai-Hong Tang, Qun Wang

Abstract: Polarization has opened a new physics chapter in relativistic heavy-ion collisions. Since the first prediction and experimental observation of global spin polarization, a lot of progress has been made in understanding its features, at both the experimental and theoretical levels. In this paper, we give an overview of the recent advances in this field. The covered topics include a review of measurements of global and local spin polarization of hyperons and the global spin alignment of vector mesons. We account for the basic theoretical framework to describe spin polarization in a relativistic fluid such as the Quark Gluon Plasma, including statistical quantum field theory and local thermodynamic equilibrium, spin hydrodynamics, relativistic kinetic theory with spin and coalescence models.

Submitted 6 February, 2024; originally announced February 2024.

Comments: RevTeX 4, 41 pages, 12 figures, review article as a book chapter for QGP6

arXiv:2402.03672 [pdf, other]

The spin alignment of rho mesons in a pion gas

Authors: Yi-Liang Yin, Wen-Bo Dong, Jin-Yi Pang, Shi Pu, Qun Wang

Abstract: We study the spin alignment of neutral rho mesons in a pion gas using spin kinetic or Boltzmann equations. The $ρππ$ coupling is given by the chiral effective theory. The collision terms at the leading and next-to-leading order in spin Boltzmann equations are derived. The evolution of the spin density matrix of the neutral rho meson is simulated with different initial conditions. The numerical results show that the interaction of pions and neutral rho mesons creates very small spin alignment in the central rapidity region if there is no rho meson in the system at the initial time. Such a small spin alignment in the central rapidity region will decay rapidly toward zero at later times. If there are rho mesons with a sizable spin alignment at the initial time, the spin alignment will also decrease rapidly. We also consider the effect on $ρ_{00}$ from the elliptic flow of pions in the blast wave model. With vanishing spin alignment at the initial time, the deviation of $ρ_{00}$ from 1/3 is positive but very small.

Submitted 5 February, 2024; originally announced February 2024.

Comments: RevTex 4, 17 pages, 12 figures

arXiv:2401.17352 [pdf, other]

doi 10.1103/PhysRevLett.132.236503

Microscopic Model for Fractional Quantum Hall Nematics

Authors: Songyang Pu, Ajit C. Balram, Joseph Taylor, Eduardo Fradkin, Zlatko Papić

Abstract: Geometric fluctuations of the density mode in a fractional quantum Hall (FQH) state can give rise to a nematic FQH phase, a topological state with a spontaneously broken rotational symmetry. While experiments on FQH states in the second Landau level have reported signatures of putative FQH nematics in anisotropic transport, a realistic model for this state has been lacking. We show that the standard model of particles in the lowest Landau level interacting via the Coulomb potential realizes the FQH nematic transition, which is reached by a progressive reduction of the strength of the shortest-range Haldane pseudopotential. Using exact diagonalization and variational wave functions, we demonstrate that the FQH nematic transition occurs when the system's neutral gap closes in the long-wavelength limit while the charge gap remains open. We confirm the symmetry-breaking nature of the transition by demonstrating the existence of a "circular moat" potential in the manifold of states with broken rotational symmetry, while its geometric character is revealed through the strong fluctuations of the nematic susceptibility and Hall viscosity.

Submitted 9 June, 2024; v1 submitted 30 January, 2024; originally announced January 2024.

Comments: Four figures in main text with supplementary information

Journal ref: Phys. Rev. Lett. 132, 236503 (2024)

arXiv:2401.09703 [pdf, other]

Fast Updating Truncated SVD for Representation Learning with Sparse Matrices

Authors: Haoran Deng, Yang Yang, Jiahe Li, Cheng Chen, Weihao Jiang, Shiliang Pu

Abstract: Updating a truncated Singular Value Decomposition (SVD) is crucial in representation learning, especially when dealing with large-scale data matrices that continuously evolve in practical scenarios. Aligning SVD-based models with fast-paced updates becomes increasingly important. Existing methods for updating truncated SVDs employ Rayleigh-Ritz projection procedures, where projection matrices are augmented based on original singular vectors. However, these methods suffer from inefficiency due to the densification of the update matrix and the application of the projection to all singular vectors. To address these limitations, we introduce a novel method for dynamically approximating the truncated SVD of a sparse and temporally evolving matrix. Our approach leverages sparsity in the orthogonalization process of augmented matrices and utilizes an extended decomposition to independently store projections in the column space of singular vectors. Numerical experiments demonstrate a remarkable efficiency improvement of an order of magnitude compared to previous methods. Remarkably, this improvement is achieved while maintaining a comparable precision to existing approaches.

Submitted 17 January, 2024; originally announced January 2024.

arXiv:2312.09979 [pdf, other]

LoRAMoE: Alleviate World Knowledge Forgetting in Large Language Models via MoE-Style Plugin

Authors: Shihan Dou, Enyu Zhou, Yan Liu, Songyang Gao, Jun Zhao, Wei Shen, Yuhao Zhou, Zhiheng Xi, Xiao Wang, Xiaoran Fan, Shiliang Pu, Jiang Zhu, Rui Zheng, Tao Gui, Qi Zhang, Xuanjing Huang

Abstract: Supervised fine-tuning (SFT) is a crucial step for large language models (LLMs), enabling them to align with human instructions and enhance their capabilities in downstream tasks. Increasing instruction data substantially is a direct solution to align the model with a broader range of downstream tasks or notably improve its performance on a specific task. However, we find that large-scale increases in instruction data can damage the world knowledge previously stored in LLMs. To address this challenge, we propose LoRAMoE, a novel framework that introduces several low-rank adapters (LoRA) and integrates them by using a router network, like a plugin version of Mixture of Experts (MoE). It freezes the backbone model and forces a portion of LoRAs to focus on leveraging world knowledge to solve downstream tasks, to alleviate world knowledge forgetting. Experimental results show that, as the instruction data increases, LoRAMoE can significantly improve the ability to process downstream tasks, while maintaining the world knowledge stored in the LLM.

Submitted 8 March, 2024; v1 submitted 15 December, 2023; originally announced December 2023.

Comments: 14 pages, 7 figures

arXiv:2312.09068 [pdf, other]

Global and local polarization of $Λ$ hyperons across RHIC-BES energies

Authors: Xiang-Yu Wu, Cong Yi, Guang-You Qin, Shi Pu

Abstract: We report our recent study on the global and local polarization of $Λ$ hyperons in Au+Au collisions at RHIC-BES energies within the (3+1)-dimensional CLVisc hydrodynamics framework. We present our numerical results for the global polarization as a function of collision energy and the local polarization along the beam direction as a function of azimuthal angle in $20-50$% centrality at $\sqrt{s_{NN}}$=7.7 GeV Au+Au collision energy. We discuss the effects of initial conditions, the spin Hall effect, and baryon diffusion.

Submitted 14 December, 2023; originally announced December 2023.

Comments: 4 pages, 3 figures. Contribution to the proceedings of Quark Matter 2023 (Houston, TX, 3-9 Sep. 2023)

arXiv:2312.06779 [pdf, other]

doi 10.1103/PhysRevB.110.L081107

Fingerprints of Composite Fermion Lambda Levels in Scanning Tunneling Microscopy

Authors: Songyang Pu, Ajit C. Balram, Yuwen Hu, Yen-Chen Tsui, Minhao He, Nicolas Regnault, Michael P. Zaletel, Ali Yazdani, Zlatko Papić

Abstract: The composite fermion (CF) is a topological quasiparticle that emerges from a non-perturbative attachment of vortices to electrons in strongly correlated two-dimensional materials. Similar to non-interacting fermions that form Landau levels in a magnetic field, CFs can fill analogous "Lambda" levels, giving rise to the fractional quantum Hall (FQH) effect of electrons. Here, we show that Lambda levels can be directly visualized through the characteristic peak structure in the signal obtained via spectroscopy with scanning tunneling microscopy (STM) on an FQH state. Complementary to transport, which probes low-energy properties of CFs, we show that \emph{high-energy} features in STM spectra can be interpreted in terms of Lambda levels. We numerically demonstrate that STM spectra can be accurately modeled using Jain's CF theory. Our results show that STM provides a powerful tool for revealing the anatomy of FQH states and identifying physics beyond the non-interacting CF paradigm.

Submitted 15 August, 2024; v1 submitted 11 December, 2023; originally announced December 2023.

Comments: Seven figures including supplementary materials

Journal ref: Phys. Rev. B 110, L081107 (2024)

arXiv:2311.15197 [pdf, other]

Spin polarization and spin alignment from quantum kinetic theory with self-energy corrections

Authors: Shuo Fang, Shi Pu, Di-Lun Yang

Abstract: We derive the quantum kinetic theory for massive fermions with collision terms and self-energy corrections based on quantum field theory. We adopt an effective power counting scheme with $\hbar$ expansion to obtain the leading-order perturbative solutions of the vector and axial Wigner functions and the corresponding kinetic equations. We observe that both the onshell relation and the structure of Wigner functions, along with the kinetic equations, are modified due to the presence of self-energies and their space-time gradients. We further apply our formalism to investigate the spin polarization phenomena in relativistic heavy ion collisions and derive the modification to the spin polarization spectrum of massive quarks. We find that the gradient of vector self-energy plays a similar role to the background electromagnetic fields, which induces a more dominant contribution than the collisional effects by a naive power counting in the gradient expansion and weak coupling. Our findings could further modify the spin polarization of strange quarks and spin alignment of $φ$ mesons beyond local thermal equilibrium.

Submitted 27 March, 2024; v1 submitted 26 November, 2023; originally announced November 2023.

Comments: 52 pages, 1 table

Journal ref: Phys.Rev.D 109 (2024) 3, 034034

arXiv:2310.08298 [pdf, other]

MProto: Multi-Prototype Network with Denoised Optimal Transport for Distantly Supervised Named Entity Recognition

Authors: Shuhui Wu, Yongliang Shen, Zeqi Tan, Wenqi Ren, Jietian Guo, Shiliang Pu, Weiming Lu

Abstract: Distantly supervised named entity recognition (DS-NER) aims to locate entity mentions and classify their types with only knowledge bases or gazetteers and unlabeled corpus. However, distant annotations are noisy and degrade the performance of NER models. In this paper, we propose a noise-robust prototype network named MProto for the DS-NER task. Different from previous prototype-based NER methods, MProto represents each entity type with multiple prototypes to characterize the intra-class variance among entity representations. To optimize the classifier, each token should be assigned an appropriate ground-truth prototype and we consider such token-prototype assignment as an optimal transport (OT) problem. Furthermore, to mitigate the noise from incomplete labeling, we propose a novel denoised optimal transport (DOT) algorithm. Specifically, we utilize the assignment result between Other class tokens and all prototypes to distinguish unlabeled entity tokens from true negatives. Experiments on several DS-NER benchmarks demonstrate that our MProto achieves state-of-the-art performance. The source code is now available on Github.

Submitted 12 October, 2023; originally announced October 2023.

Comments: Accepted to EMNLP-2023, camera ready version

arXiv:2309.11708 [pdf, other]

Stability and causality criteria in linear mode analysis: stability means causality

Authors: Dong-Lin Wang, Shi Pu

Abstract: Causality and stability are fundamental requirements for the differential equations describing predictable relativistic many-body systems. In this work, we investigate the stability and causality criteria in linear mode analysis. We discuss the updated stability criterion in 3+1 dimensional systems and introduce the improved sufficient criterion for causality. Our findings clearly demonstrate that stability implies causality in linear mode analysis. Furthermore, based on the theorems presented in this work, we conclude that if the updated stability criterion and the improved causality criterion are fulfilled in one inertial frame of reference (IFR), they hold in all IFRs.

Submitted 20 February, 2024; v1 submitted 20 September, 2023; originally announced September 2023.

Comments: 6+8 pages, 1 figure; references added, typos corrected

Journal ref: Phys. Rev. D 109, L031504 (2024)

arXiv:2309.04527 [pdf, other]

doi 10.1103/PhysRevResearch.6.013105

Deformed Fredkin model for the $ν{=}5/2$ Moore-Read state on thin cylinders

Authors: Cristian Voinea, Songyang Pu, Ammar Kirmani, Pouyan Ghaemi, Armin Rahmani, Zlatko Papić

Abstract: We propose a frustration-free model for the Moore-Read quantum Hall state on sufficiently thin cylinders with circumferences $\lesssim 7$ magnetic lengths. While the Moore-Read Hamiltonian involves complicated long-range interactions between triplets of electrons in a Landau level, our effective model is a simpler one-dimensional chain of qubits with deformed Fredkin gates. We show that the ground state of the Fredkin model has high overlap with the Moore-Read wave function and accurately reproduces the latter's entanglement properties. Moreover, we demonstrate that the model captures the dynamical response of the Moore-Read state to a geometric quench, induced by suddenly changing the anisotropy of the system. We elucidate the underlying mechanism of the quench dynamics and show that it coincides with the linearized bimetric field theory. The minimal model introduced here can be directly implemented as a first step towards quantum simulation of the Moore-Read state, as we demonstrate by deriving an efficient circuit approximation to the ground state and implementing it on an IBM quantum processor.

Submitted 8 September, 2023; originally announced September 2023.

Comments: 18 pages, 15 figures

Journal ref: Phys. Rev. Research 6, 013105 (2024)

arXiv:2308.14038 [pdf, other]

Momentum dependence of $φ$ meson's spin alignment

Authors: Xin-Li Sheng, Shi Pu, Qun Wang

Abstract: We study the rapidity and azimuthal angle dependences of the global spin alignment $ρ_{00}$ for $φ$ mesons with respect to the reaction plane in Au+Au collisions at RHIC by the relativistic coalescence model in the spin transport theory. The global spin alignment of $φ$ mesons arises from local fluctuations of strong force fields whose values are extracted from STAR data. The calculated results show that $ρ_{00}<1/3$ at the rapidity $Y=0$, and then it increases with rapidity and becomes $ρ_{00}>1/3$ at $Y=1$. Such a rapidity dependence is dominated by the relative motion of the $φ$ meson in the bulk matter. We also give predictions for the azimuthal angle dependence of $ρ_{00}$ at different rapidities.

Submitted 27 August, 2023; originally announced August 2023.

Comments: RevTex 4, 5 pages, 4 figures

arXiv:2308.08171 [pdf, other]

doi 10.1609/aaai.v38i8.28646

Learning to Pivot as a Smart Expert

Authors: Tianhao Liu, Shanwen Pu, Dongdong Ge, Yinyu Ye

Abstract: Linear programming has been practically solved mainly by simplex and interior point methods. Compared with the weakly polynomial complexity obtained by the interior point methods, the existence of strongly polynomial bounds for the length of the pivot path generated by the simplex methods remains a mystery. In this paper, we propose two novel pivot experts that leverage both global and local information of the linear programming instances for the primal simplex method and show their excellent performance numerically. The experts can be regarded as a benchmark to evaluate the performance of classical pivot rules, although they are hard to directly implement. To tackle this challenge, we employ a graph convolutional neural network model, trained via imitation learning, to mimic the behavior of the pivot expert. Our pivot rule, learned empirically, displays a significant advantage over conventional methods in various linear programming problems, as demonstrated through a series of rigorous experiments.

Submitted 31 August, 2023; v1 submitted 16 August, 2023; originally announced August 2023.

Showing 1–50 of 265 results for author: Pu, S