Search | arXiv e-print repository

Sparse MeZO: Less Parameters for Better Performance in Zeroth-Order LLM Fine-Tuning

Authors: Yong Liu, Zirui Zhu, Chaoyu Gong, Minhao Cheng, Cho-Jui Hsieh, Yang You

Abstract: While fine-tuning large language models (LLMs) for specific tasks often yields impressive results, it comes at the cost of memory inefficiency due to back-propagation in gradient-based training. Memory-efficient Zeroth-order (MeZO) optimizers, recently proposed to address this issue, only require forward passes during training, making them more memory-friendly. However, the quality of gradient estimates in zeroth order optimization often depends on the data dimensionality, potentially explaining why MeZO still exhibits significant performance drops compared to standard fine-tuning across various tasks. Inspired by the success of Parameter-Efficient Fine-Tuning (PEFT), this paper introduces Sparse MeZO, a novel memory-efficient zeroth-order optimization approach that applies ZO only to a carefully chosen subset of parameters. We propose a simple yet effective parameter selection scheme that yields significant performance gains with Sparse-MeZO. Additionally, we develop a memory-optimized implementation for sparse masking, ensuring the algorithm requires only inference-level memory consumption, allowing Sparse-MeZO to fine-tune LLaMA-30b on a single A100 GPU. Experimental results illustrate that Sparse-MeZO consistently improves both performance and convergence speed over MeZO without any overhead. For example, it achieves a 9% absolute accuracy improvement and 3.5x speedup over MeZO on the RTE task.

Submitted 24 February, 2024; originally announced February 2024.
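
Note: the idea of applying a zeroth-order update only to a chosen subset of parameters can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a generic two-point (SPSA-style) estimate, and the names `sparse_zo_step`, `quadratic_loss`, and the fixed `mask` are hypothetical. The abstract does not describe the parameter selection scheme, so the mask here is simply hard-coded.

```python
import numpy as np


def sparse_zo_step(loss_fn, params, mask, lr=0.05, eps=1e-3, rng=None):
    """One two-point zeroth-order update restricted to masked coordinates.

    Hypothetical helper for illustration; not the paper's code.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Random perturbation direction, zeroed outside the selected subset.
    z = rng.standard_normal(params.shape) * mask
    # Two forward evaluations of the loss; no back-propagation is needed.
    loss_plus = loss_fn(params + eps * z)
    loss_minus = loss_fn(params - eps * z)
    # Scalar estimate of the directional derivative along z.
    g = (loss_plus - loss_minus) / (2.0 * eps)
    # Move only along z, so unmasked coordinates stay untouched.
    return params - lr * g * z


def quadratic_loss(w):
    """Toy objective with its minimum at w = 1 in every coordinate."""
    return float(np.sum((w - 1.0) ** 2))


rng = np.random.default_rng(0)
w0 = np.zeros(8)
# Hypothetical mask: only the first three coordinates are trainable.
mask = np.array([1, 1, 1, 0, 0, 0, 0, 0], dtype=float)

w = w0.copy()
for _ in range(200):
    w = sparse_zo_step(quadratic_loss, w, mask, rng=rng)
```

In this toy setting only the masked coordinates move, and each step costs two loss evaluations regardless of how many parameters are selected.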

arXiv:2402.15410 [pdf, other]

Detailed Report on the Measurement of the Positive Muon Anomalous Magnetic Moment to 0.20 ppm

Authors: D. P. Aguillard, T. Albahri, D. Allspach, A. Anisenkov, K. Badgley, S. Baeßler, I. Bailey, L. Bailey, V. A. Baranov, E. Barlas-Yucel, T. Barrett, E. Barzi, F. Bedeschi, M. Berz, M. Bhattacharya, H. P. Binney, P. Bloom, J. Bono, E. Bottalico, T. Bowcock, S. Braun, M. Bressler, G. Cantatore, R. M. Carey, B. C. K. Casey , et al. (168 additional authors not shown)

Abstract: We present details on a new measurement of the muon magnetic anomaly, $a_μ = (g_μ-2)/2$. The result is based on positive muon data taken at Fermilab's Muon Campus during the 2019 and 2020 accelerator runs. The measurement uses $3.1$ GeV$/c$ polarized muons stored in a $7.1$-m-radius storage ring with a $1.45$ T uniform magnetic field. The value of $a_μ$ is determined from the measured difference between the muon spin precession frequency and its cyclotron frequency. This difference is normalized to the strength of the magnetic field, measured using Nuclear Magnetic Resonance (NMR). The ratio is then corrected for small contributions from beam motion, beam dispersion, and transient magnetic fields. We measure $a_μ = 116 592 057 (25) \times 10^{-11}$ (0.21 ppm). This is the world's most precise measurement of this quantity and represents a factor of $2.2$ improvement over our previous result based on the 2018 dataset. In combination, the two datasets yield $a_μ(\text{FNAL}) = 116 592 055 (24) \times 10^{-11}$ (0.20 ppm). Combining this with the measurements from Brookhaven National Laboratory for both positive and negative muons, the new world average is $a_μ(\text{exp}) = 116 592 059 (22) \times 10^{-11}$ (0.19 ppm).

Submitted 22 May, 2024; v1 submitted 23 February, 2024; originally announced February 2024.

Comments: 48 pages, 29 figures; 4 pages of Supplement Material; version accepted for publication in Physical Review D

Report number: FERMILAB-PUB-24-0084-AD-CSAID-PPD

arXiv:2402.12928 [pdf, other]

A Literature Review of Literature Reviews in Pattern Analysis and Machine Intelligence

Authors: Penghai Zhao, Xin Zhang, Jiayue Cao, Ming-Ming Cheng, Jian Yang, Xiang Li

Abstract: The rapid advancements in Pattern Analysis and Machine Intelligence (PAMI) have led to an overwhelming expansion of scientific knowledge, spawning numerous literature reviews aimed at collecting and synthesizing fragmented information. This paper presents a thorough analysis of these literature reviews within the PAMI field, and tries to address three core research questions: (1) What are the prevalent structural and statistical characteristics of PAMI literature reviews? (2) What strategies can researchers employ to efficiently navigate the growing corpus of reviews? (3) What are the advantages and limitations of AI-generated reviews compared to human-authored ones? To address the first research question, we begin with a narrative overview to highlight common preferences in composing PAMI reviews, followed by a statistical analysis to quantitatively uncover patterns in these preferences. Our findings reveal several key insights. First, fewer than 20% of PAMI reviews currently comply with PRISMA standards, although this proportion is gradually increasing. Second, there is a moderate positive correlation between the quality of references and the scholarly impact of reviews, emphasizing the importance of reference selection. To further assist researchers in efficiently managing the rapidly growing number of literature reviews, we introduce four novel, real-time, article-level bibliometric indicators that facilitate the screening of numerous reviews. Finally, our comparative analysis reveals that AI-generated reviews currently fall short of human-authored ones in accurately evaluating the academic significance of newly published articles and integrating rich visual elements, which limits their practical utility. Overall, this study provides a deeper understanding of PAMI literature reviews by uncovering key trends, evaluating current practices, and highlighting areas for future improvement.

Submitted 14 December, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

Comments: V2, V3, and V4 with incremental quality improvements. V5 introduces major updates, featuring 27 pages, 16 figures, and 12 tables

arXiv:2402.12741 [pdf, other]

MuLan: Multimodal-LLM Agent for Progressive and Interactive Multi-Object Diffusion

Authors: Sen Li, Ruochen Wang, Cho-Jui Hsieh, Minhao Cheng, Tianyi Zhou

Abstract: Existing text-to-image models still struggle to generate images of multiple objects, especially in handling their spatial positions, relative sizes, overlapping, and attribute bindings. To efficiently address these challenges, we develop a training-free Multimodal-LLM agent (MuLan), as a human painter, that can progressively generate multi-object with intricate planning and feedback control. MuLan harnesses a large language model (LLM) to decompose a prompt to a sequence of sub-tasks, each generating only one object by stable diffusion, conditioned on previously generated objects. Unlike existing LLM-grounded methods, MuLan only produces a high-level plan at the beginning while the exact size and location of each object are determined upon each sub-task by an LLM and attention guidance. Moreover, MuLan adopts a vision-language model (VLM) to provide feedback to the image generated in each sub-task and control the diffusion model to re-generate the image if it violates the original prompt. Hence, each model in every step of MuLan only needs to address an easy sub-task it is specialized for. The multi-step process also allows human users to monitor the generation process and make preferred changes at any intermediate step via text prompts, thereby improving the human-AI collaboration experience. We collect 200 prompts containing multi-objects with spatial relationships and attribute bindings from different benchmarks to evaluate MuLan. The results demonstrate the superiority of MuLan in generating multiple objects over baselines and its creativity when collaborating with human users. The code is available at https://github.com/measure-infinity/mulan-code.

Submitted 24 May, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

Comments: Added the application to human-agent interaction; added discussion with concurrent work

arXiv:2402.11241 [pdf, other]

DiffPoint: Single and Multi-view Point Cloud Reconstruction with ViT Based Diffusion Model

Authors: Yu Feng, Xing Shi, Mengli Cheng, Yun Xiong

Abstract: As the task of 2D-to-3D reconstruction has gained significant attention in various real-world scenarios, it becomes crucial to be able to generate high-quality point clouds. Despite the recent success of deep learning models in generating point clouds, there are still challenges in producing high-fidelity results due to the disparities between images and point clouds. While vision transformers (ViT) and diffusion models have shown promise in various vision tasks, their benefits for reconstructing point clouds from images have not been demonstrated yet. In this paper, we first propose a neat and powerful architecture called DiffPoint that combines ViT and diffusion models for the task of point cloud reconstruction. At each diffusion step, we divide the noisy point clouds into irregular patches. Then, using a standard ViT backbone that treats all inputs as tokens (including time information, image embeddings, and noisy patches), we train our model to predict target points based on input images. We evaluate DiffPoint on both single-view and multi-view reconstruction tasks and achieve state-of-the-art results. Additionally, we introduce a unified and flexible feature fusion module for aggregating image features from single or multiple input images. Furthermore, our work demonstrates the feasibility of applying unified architectures across languages and images to improve 3D reconstruction tasks.

Submitted 17 February, 2024; originally announced February 2024.

arXiv:2402.11129 [pdf, other]

BlendFilter: Advancing Retrieval-Augmented Large Language Models via Query Generation Blending and Knowledge Filtering

Authors: Haoyu Wang, Ruirui Li, Haoming Jiang, Jinjin Tian, Zhengyang Wang, Chen Luo, Xianfeng Tang, Monica Cheng, Tuo Zhao, Jing Gao

Abstract: Retrieval-augmented Large Language Models (LLMs) offer substantial benefits in enhancing performance across knowledge-intensive scenarios. However, these methods often face challenges with complex inputs and encounter difficulties due to noisy knowledge retrieval, notably hindering model effectiveness. To address this issue, we introduce BlendFilter, a novel approach that elevates retrieval-augmented LLMs by integrating query generation blending with knowledge filtering. BlendFilter proposes the blending process through its query generation method, which integrates both external and internal knowledge augmentation with the original query, ensuring comprehensive information gathering. Additionally, our distinctive knowledge filtering module capitalizes on the intrinsic capabilities of the LLM, effectively eliminating extraneous data. We conduct extensive experiments on three open-domain question answering benchmarks, and the findings clearly indicate that our innovative BlendFilter surpasses state-of-the-art baselines significantly.

Submitted 15 October, 2024; v1 submitted 16 February, 2024; originally announced February 2024.

Comments: EMNLP 2024 main

arXiv:2402.02056 [pdf, other]

AnthroScore: A Computational Linguistic Measure of Anthropomorphism

Authors: Myra Cheng, Kristina Gligoric, Tiziano Piccardi, Dan Jurafsky

Abstract: Anthropomorphism, or the attribution of human-like characteristics to non-human entities, has shaped conversations about the impacts and possibilities of technology. We present AnthroScore, an automatic metric of implicit anthropomorphism in language. We use a masked language model to quantify how non-human entities are implicitly framed as human by the surrounding context. We show that AnthroScore corresponds with human judgments of anthropomorphism and dimensions of anthropomorphism described in social science literature. Motivated by concerns of misleading anthropomorphism in computer science discourse, we use AnthroScore to analyze 15 years of research papers and downstream news articles. In research papers, we find that anthropomorphism has steadily increased over time, and that papers related to language models have the most anthropomorphism. Within ACL papers, temporal increases in anthropomorphism are correlated with key neural advancements. Building upon concerns of scientific misinformation in mass media, we identify higher levels of anthropomorphism in news headlines compared to the research papers they cite. Since AnthroScore is lexicon-free, it can be directly applied to a wide range of text sources.

Submitted 3 February, 2024; originally announced February 2024.

Comments: EACL 2024 Main Conference

arXiv:2401.17357 [pdf, other]

doi 10.1103/PhysRevX.15.011069

Mixed-state quantum anomaly and multipartite entanglement

Authors: Leonardo A. Lessa, Meng Cheng, Chong Wang

Abstract: Quantum entanglement measures of many-body states have been increasingly useful to characterize phases of matter. Here we explore a surprising connection between mixed state entanglement and 't Hooft anomaly. More specifically, we consider lattice systems in $d$ space dimensions with anomalous symmetry $G$ where the anomaly is characterized by an invariant in the group cohomology $H^{d+2}(G,U(1))$. We show that any mixed state $ρ$ that is strongly symmetric under $G$, in the sense that $Gρ \propto ρ$, is necessarily $(d+2)$-nonseparable, i.e. is not the mixture of tensor products of $d+2$ states in the Hilbert space. Furthermore, such states cannot be prepared from any $(d+2)$-separable states using finite-depth local quantum channels, so the nonseparability is long-ranged in nature. We provide proof of these results in $d\leq1$, and plausibility arguments in $d>1$. The anomaly-nonseparability connection thus allows us to generate simple examples of mixed states with nontrivial long-ranged multipartite entanglement. In particular, in $d=1$ we found an example of intrinsically mixed quantum phase, in the sense that states in this phase cannot be two-way connected to any pure state through finite-depth local quantum channels. We also analyze mixed anomaly involving both strong and weak symmetries, including systems constrained by the Lieb-Schultz-Mattis type of anomaly. We find that, while strong-weak mixed anomaly in general does not constrain quantum entanglement, it does constrain long-range correlations of mixed states in nontrivial ways. Namely, such states are not symmetrically invertible and not gapped Markovian, generalizing familiar properties of anomalous pure states.

Submitted 29 November, 2024; v1 submitted 30 January, 2024; originally announced January 2024.

Comments: 27 pages, 9 figures; New results on strong-weak mixed anomaly and other revisions

Journal ref: Phys. Rev. X 15, 011069 (2025)

arXiv:2401.17172 [pdf, other]

doi 10.1016/j.cma.2024.116779

Learning Domain-Independent Green's Function For Elliptic Partial Differential Equations

Authors: Pawan Negi, Maggie Cheng, Mahesh Krishnamurthy, Wenjun Ying, Shuwang Li

Abstract: Green's function characterizes a partial differential equation (PDE) and maps its solution in the entire domain as integrals. Finding the analytical form of Green's function is a non-trivial exercise, especially for a PDE defined on a complex domain or a PDE with variable coefficients. In this paper, we propose a novel boundary integral network to learn the domain-independent Green's function, referred to as BIN-G. We evaluate the Green's function in the BIN-G using a radial basis function (RBF) kernel-based neural network. We train the BIN-G by minimizing the residual of the PDE and the mean squared errors of the solutions to the boundary integral equations for prescribed test functions. By leveraging the symmetry of the Green's function and controlling refinements of the RBF kernel near the singularity of the Green function, we demonstrate that our numerical scheme enables fast training and accurate evaluation of the Green's function for PDEs with variable coefficients. The learned Green's function is independent of the domain geometries, forcing terms, and boundary conditions in the boundary integral formulation. Numerical experiments verify the desired properties of the method and the expected accuracy for the two-dimensional Poisson and Helmholtz equations with variable coefficients.

Submitted 30 January, 2024; originally announced January 2024.

arXiv:2401.16181 [pdf, other]

On Decentralized Linearly Separable Computation With the Minimum Computation Cost

Authors: Haoning Chen, Minquan Cheng, Zhenhao Huang, Youlong Wu

Abstract: The distributed linearly separable computation problem finds extensive applications across domains such as distributed gradient coding, distributed linear transform, real-time rendering, etc. In this paper, we investigate this problem in a fully decentralized scenario, where $\mathsf{N}$ workers collaboratively perform the computation task without a central master. Each worker aims to compute a linearly separable computation that can be manifested as $\mathsf{K}_{\mathrm{c}}$ linear combinations of $\mathsf{K}$ messages, where each message is a function of a distinct dataset. We require that each worker successfully fulfill the task based on the transmissions from any $\mathsf{N}_{\mathrm{r}}$ workers, such that the system can tolerate any $\mathsf{N}-\mathsf{N}_{\mathrm{r}}$ stragglers. We focus on the scenario where the computation cost (the number of uncoded datasets assigned to each worker) is minimum, and aim to minimize the communication cost (the number of symbols the fastest $\mathsf{N}_{\mathrm{r}}$ workers transmit). We propose a novel distributed computing scheme that is optimal under the widely used cyclic data assignment. Interestingly, we demonstrate that the side information at each worker is ineffective in reducing the communication cost when $\mathsf{K}_{\mathrm{c}}\leq {\mathsf{K}}\mathsf{N}_{\mathrm{r}}/{\mathsf{N}}$, while it helps reduce the communication cost as $\mathsf{K}_{\mathrm{c}}$ increases.

Submitted 29 January, 2024; originally announced January 2024.

arXiv:2401.09548 [pdf, other]

doi 10.1103/PhysRevB.111.205104

Universal contributions to charge fluctuations in spin chains at finite temperature

Authors: Kang-Le Cai, Meng Cheng

Abstract: At finite temperature, conserved charges undergo thermal fluctuations in a quantum many-body system in the grand canonical ensemble. The full structure of the fluctuations of the total U(1) charge $Q$ can be succinctly captured by the generating function $G(θ)=\left\langle e^{i θQ}\right\rangle$. For a 1D translation-invariant spin chain, in the thermodynamic limit the magnitude $|G(θ)|$ scales with the system size $L$ as $\ln |G(θ)|=-α(θ)L+γ(θ)$, where $γ(θ)$ is the scale-invariant contribution and may encode universal information about the underlying system. In this work we investigate the behavior and physical meaning of $γ(θ)$ when the system is periodic. We find that $γ(θ)$ only takes nonzero values at isolated points of $θ$, which is $θ=π$ for all our examples. In two exemplary lattice systems we show that $γ(π)$ takes quantized values when the U(1) symmetry exhibits a specific type of 't Hooft anomaly with other symmetries. In other cases, we investigate how $γ(θ)$ depends on microscopic conditions (such as the filling factor) in field theory and exactly solvable lattice models.

Submitted 30 April, 2025; v1 submitted 17 January, 2024; originally announced January 2024.

Comments: 21 pages, 5 figures, published version

Journal ref: Phys. Rev. B 111, 205104 (2025)

arXiv:2401.08052 [pdf, other]

Multi-Input Multi-Output Target-Speaker Voice Activity Detection For Unified, Flexible, and Robust Audio-Visual Speaker Diarization

Authors: Ming Cheng, Ming Li

Abstract: Audio-visual learning has demonstrated promising results in many classical speech tasks (e.g., speech separation, automatic speech recognition, wake-word spotting). We believe that introducing visual modality will also benefit speaker diarization. To date, Target-Speaker Voice Activity Detection (TS-VAD) plays an important role in highly accurate speaker diarization. However, previous TS-VAD models take audio features and utilize the speaker's acoustic footprint to distinguish his or her personal speech activities, which is easily affected by overlapped speech in multi-speaker scenarios. Although visual information naturally tolerates overlapped speech, it suffers from spatial occlusion, low resolution, etc. The potential modality-missing problem blocks TS-VAD towards an audio-visual approach. This paper proposes a novel Multi-Input Multi-Output Target-Speaker Voice Activity Detection (MIMO-TSVAD) framework for speaker diarization. The proposed method can take audio-visual input and leverage the speaker's acoustic footprint or lip track to flexibly conduct audio-based, video-based, and audio-visual speaker diarization in a unified sequence-to-sequence framework. Experimental results show that the MIMO-TSVAD framework demonstrates state-of-the-art performance on the VoxConverse, DIHARD-III, and MISP 2022 datasets under corresponding evaluation metrics, obtaining the Diarization Error Rates (DERs) of 4.18%, 10.10%, and 8.15%, respectively. In addition, it can perform robustly in heavy lip-missing scenarios.

Submitted 29 February, 2024; v1 submitted 15 January, 2024; originally announced January 2024.

Comments: Under review of IEEE/ACM Transactions on Audio, Speech, and Language Processing

arXiv:2401.00330 [pdf, other]

Two-Step Offline Preference-Based Reinforcement Learning with Constrained Actions

Authors: Yinglun Xu, Tarun Suresh, Rohan Gumaste, David Zhu, Ruirui Li, Zhengyang Wang, Haoming Jiang, Xianfeng Tang, Qingyu Yin, Monica Xiao Cheng, Qi Zeng, Chao Zhang, Gagandeep Singh

Abstract: Preference-based reinforcement learning (PBRL) in the offline setting has succeeded greatly in industrial applications such as chatbots. A two-step learning framework where one applies a reinforcement learning step after a reward modeling step has been widely adopted for the problem. However, such a method faces challenges from the risk of reward hacking and the complexity of reinforcement learning. To overcome the challenge, our insight is that both challenges come from the state-actions not supported in the dataset. Such state-actions are unreliable and increase the complexity of the reinforcement learning problem at the second step. Based on the insight, we develop a novel two-step learning method called PRC: preference-based reinforcement learning with constrained actions. The high-level idea is to limit the reinforcement learning agent to optimize over a constrained action space that excludes the out-of-distribution state-actions. We empirically verify that our method has high learning efficiency on various datasets in robotic control environments.

Submitted 25 October, 2024; v1 submitted 30 December, 2023; originally announced January 2024.

arXiv:2312.15661 [pdf, other]

Unlocking the Potential of Large Language Models for Explainable Recommendations

Authors: Yucong Luo, Mingyue Cheng, Hao Zhang, Junyu Lu, Qi Liu, Enhong Chen

Abstract: Generating user-friendly explanations regarding why an item is recommended has become increasingly common, largely due to advances in language generation technology, which can enhance user trust and facilitate more informed decision-making when using online services. However, existing explainable recommendation systems focus on using small-size language models. It remains uncertain what impact replacing the explanation generator with the recently emerging large language models (LLMs) would have. Can we expect unprecedented results? In this study, we propose LLMXRec, a simple yet effective two-stage explainable recommendation framework aimed at further boosting the explanation quality by employing LLMs. Unlike most existing LLM-based recommendation works, a key characteristic of LLMXRec is its emphasis on the close collaboration between previous recommender models and LLM-based explanation generators. Specifically, by adopting several key fine-tuning techniques, including parameter-efficient instructing tuning and personalized prompt techniques, controllable and fluent explanations can be well generated to achieve the goal of explanation recommendation. Most notably, we provide three different perspectives to evaluate the effectiveness of the explanations. Finally, we conduct extensive experiments over several benchmark recommender models and publicly available datasets. The experimental results not only yield positive results in terms of effectiveness and efficiency but also uncover some previously unknown outcomes. To facilitate further explorations in this area, the full code and detailed original results are open-sourced at https://github.com/GodFire66666/LLM_rec_explanation/.

Submitted 3 January, 2024; v1 submitted 25 December, 2023; originally announced December 2023.

arXiv:2312.15190 [pdf, other]

SAIC: Integration of Speech Anonymization and Identity Classification

Authors: Ming Cheng, Xingjian Diao, Shitong Cheng, Wenjun Liu

Abstract: Speech anonymization and de-identification have garnered significant attention recently, especially in the healthcare area including telehealth consultations, patient voiceprint matching, and patient real-time monitoring. Speaker identity classification tasks, which involve recognizing specific speakers from audio to learn identity features, are crucial for de-identification. Since rare studies have effectively combined speech anonymization with identity classification, we propose SAIC - an innovative pipeline for integrating Speech Anonymization and Identity Classification. SAIC demonstrates remarkable performance and reaches state-of-the-art in the speaker identity classification task on the Voxceleb1 dataset, with a top-1 accuracy of 96.1%. Although SAIC is not trained or evaluated specifically on clinical data, the result strongly proves the model's effectiveness and the possibility to generalize into the healthcare area, providing insightful guidance for future work.

Submitted 23 December, 2023; originally announced December 2023.

arXiv:2312.13498 [pdf]

doi 10.21468/SciPostPhys.17.1.010

Extracting subleading corrections in entanglement entropy at quantum phase transitions

Authors: Menghan Song, Jiarui Zhao, Zi Yang Meng, Cenke Xu, Meng Cheng

Abstract: We systematically investigate the finite size scaling behavior of the Rényi entanglement entropy (EE) of several representative 2d quantum many-body systems between a subregion and its complement, with smooth boundaries as well as boundaries with corners. In order to reveal the subleading correction, we investigate the quantity "subtracted EE" $S^s(l) = S(2l) - 2S(l)$ for each model, which is designed to cancel out the leading perimeter law. We find that $\mathbf{(1)}$ for a spin-1/2 model on a 2d square lattice whose ground state is the Néel order, the coefficient of the logarithmic correction to the perimeter law is consistent with the prediction based on the Goldstone modes; $\mathbf{(2)}$ for the $(2+1)d$ O(3) Wilson-Fisher quantum critical point (QCP), realized with the bilayer antiferromagnetic Heisenberg model, a logarithmic subleading correction exists when the subregion has a sharp corner, but for a subregion with a smooth boundary our data suggests the absence of the logarithmic correction to the best of our efforts; $\mathbf{(3)}$ for the $(2+1)d$ SU(2) J-Q$_2$ and J-Q$_3$ models for the deconfined quantum critical point (DQCP), we find a logarithmic correction for the EE even with a smooth boundary.
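
Note on the subtracted EE: the cancellation described in the abstract can be checked with a short worked example. The scaling form below, with a non-universal perimeter coefficient $a$, a logarithmic coefficient $s$, and a constant $c$, is an assumption made here for illustration; the abstract itself only states that a perimeter law and a possible logarithmic correction are present.

```latex
% Assumed scaling form (illustrative; not taken from the abstract):
%   S(l) = a l - s \ln l + c
\begin{align}
S(2l)  &= 2al - s\ln 2 - s\ln l + c, \\
2S(l)  &= 2al - 2s\ln l + 2c, \\
S^s(l) &= S(2l) - 2S(l) = s\ln l - s\ln 2 - c .
\end{align}
```

Under this assumed form the term linear in $l$ drops out, so $S^s(l)$ depends on $l$ only through $s\ln l$, and the logarithmic coefficient can be read off from its slope against $\ln l$.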

Submitted 16 July, 2024; v1 submitted 20 December, 2023; originally announced December 2023.

Journal ref: SciPost Phys. 17, 010 (2024)

arXiv:2312.13311 [pdf, other]

Unlocking Deep Learning: A BP-Free Approach for Parallel Block-Wise Training of Neural Networks

Authors: Anzhe Cheng, Zhenkun Wang, Chenzhong Yin, Mingxi Cheng, Heng Ping, Xiongye Xiao, Shahin Nazarian, Paul Bogdan

Abstract: Backpropagation (BP) has been a successful optimization technique for deep learning models. However, its limitations, such as backward- and update-locking, and its biological implausibility, hinder the concurrent updating of layers and do not mimic the local learning processes observed in the human brain. To address these issues, recent research has suggested using local error signals to asynchronously train network blocks. However, this approach often involves extensive trial-and-error iterations to determine the best configuration for local training. This includes decisions on how to decouple network blocks and which auxiliary networks to use for each block. In our work, we introduce a novel BP-free approach: a block-wise BP-free (BWBPF) neural network that leverages local error signals to optimize distinct sub-neural networks separately, where the global loss is only responsible for updating the output layer. The local error signals used in the BP-free model can be computed in parallel, enabling a potential speed-up in the weight update process through parallel implementation. Our experimental results consistently show that this approach can identify transferable decoupled architectures for VGG and ResNet variations, outperforming models trained with end-to-end backpropagation and other state-of-the-art block-wise learning techniques on datasets such as CIFAR-10 and Tiny-ImageNet. The code is released at https://github.com/Belis0811/BWBPF.

Submitted 20 December, 2023; originally announced December 2023.

Comments: The paper has been accepted by ICASSP2024

arXiv:2312.12722 [pdf, other]

Fine-Grained Knowledge Selection and Restoration for Non-Exemplar Class Incremental Learning

Authors: Jiang-Tian Zhai, Xialei Liu, Lu Yu, Ming-Ming Cheng

Abstract: Non-exemplar class incremental learning aims to learn both the new and old tasks without accessing any training data from the past. This strict restriction makes it harder to alleviate catastrophic forgetting, since all techniques can only be applied to current task data. Considering this challenge, we propose a novel framework of fine-grained knowledge selection and restoration. The conventional knowledge distillation-based methods place overly strict constraints on the network parameters and features to prevent forgetting, which limits the training of new tasks. To loosen this constraint, we propose a novel fine-grained selective patch-level distillation to adaptively balance plasticity and stability. Some task-agnostic patches can be used to preserve the decision boundary of the old task, while some patches containing the important foreground are favorable for learning the new task. Moreover, we employ a task-agnostic mechanism to generate more realistic prototypes of old tasks with the current task sample for reducing classifier bias for fine-grained knowledge restoration. Extensive experiments on CIFAR100, TinyImageNet and ImageNet-Subset demonstrate the effectiveness of our method. Code is available at https://github.com/scok30/vit-cil.

Submitted 19 December, 2023; originally announced December 2023.

Comments: to appear at AAAI 2024

arXiv:2312.12667 [pdf, other]

Discovering Malicious Signatures in Software from Structural Interactions

Authors: Chenzhong Yin, Hantang Zhang, Mingxi Cheng, Xiongye Xiao, Xinghe Chen, Xin Ren, Paul Bogdan

Abstract: Malware represents a significant security concern in today's digital landscape, as it can destroy or disable operating systems, steal sensitive user information, and occupy valuable disk space. However, current malware detection methods, such as static-based and dynamic-based approaches, struggle to identify newly developed ("zero-day") malware and are limited by customized virtual machine (VM) environments. To overcome these limitations, we propose a novel malware detection approach that leverages deep learning, mathematical techniques, and network science. Our approach focuses on static and dynamic analysis and utilizes the Low-Level Virtual Machine (LLVM) to profile applications within a complex network. The generated network topologies are input into the GraphSAGE architecture to efficiently distinguish between benign and malicious software applications, with the operation names denoted as node features. Importantly, the GraphSAGE models analyze the network's topological geometry to make predictions, enabling them to detect state-of-the-art malware and prevent potential damage during execution in a VM. To evaluate our approach, we conduct a study on a dataset comprising source code from 24,376 applications, specifically written in C/C++, sourced directly from widely-recognized malware and various types of benign software. The results show a high detection performance with an Area Under the Receiver Operating Characteristic Curve (AUROC) of 99.85%. Our approach marks a substantial improvement in malware detection, providing a notably more accurate and efficient solution when compared to current state-of-the-art malware detection methods.

Submitted 19 December, 2023; originally announced December 2023.

Comments: ICASSP 2024, Accepted

arXiv:2312.09608 [pdf, other]

Faster Diffusion: Rethinking the Role of the Encoder for Diffusion Model Inference

Authors: Senmao Li, Taihang Hu, Joost van de Weijer, Fahad Shahbaz Khan, Tao Liu, Linxuan Li, Shiqi Yang, Yaxing Wang, Ming-Ming Cheng, Jian Yang

Abstract: One of the main drawbacks of diffusion models is the slow inference time for image generation. Among the most successful approaches to addressing this problem are distillation methods. However, these methods require considerable computational resources. In this paper, we take another approach to diffusion model acceleration. We conduct a comprehensive study of the UNet encoder and empirically analyze the encoder features. This provides insights regarding their changes during the inference process. In particular, we find that encoder features change minimally, whereas the decoder features exhibit substantial variations across different time-steps. This insight motivates us to omit encoder computation at certain adjacent time-steps and reuse encoder features of previous time-steps as input to the decoder in multiple time-steps. Importantly, this allows us to perform decoder computation in parallel, further accelerating the denoising process. Additionally, we introduce a prior noise injection method to improve the texture details in the generated image. Besides the standard text-to-image task, we also validate our approach on other tasks: text-to-video, personalized generation and reference-guided generation. Without utilizing any knowledge distillation technique, our approach accelerates both the Stable Diffusion (SD) and DeepFloyd-IF model sampling by 41% and 24% respectively, and DiT model sampling by 34%, while maintaining high-quality generation performance.

Submitted 15 October, 2024; v1 submitted 15 December, 2023; originally announced December 2023.

Comments: NeurIPS 2024

arXiv:2312.08912 [pdf, other]

Dataset Distillation via Adversarial Prediction Matching

Authors: Mingyang Chen, Bo Huang, Junda Lu, Bing Li, Yi Wang, Minhao Cheng, Wei Wang

Abstract: Dataset distillation is the technique of synthesizing smaller condensed datasets from large original datasets while retaining necessary information to preserve the effect. In this paper, we approach the dataset distillation problem from a novel perspective: we regard minimizing the prediction discrepancy on the real data distribution between models, which are respectively trained on the large original dataset and on the small distilled dataset, as a conduit for condensing information from the raw data into the distilled version. An adversarial framework is proposed to solve the problem efficiently. In contrast to existing distillation methods involving nested optimization or long-range gradient unrolling, our approach hinges on single-level optimization. This ensures the memory efficiency of our method and provides a flexible tradeoff between time and memory budgets, allowing us to distil ImageNet-1K using a minimum of only 6.5GB of GPU memory. Under the optimal tradeoff strategy, it requires 2.5$\times$ less memory and 5$\times$ less runtime compared to the state-of-the-art. Empirically, our method can produce synthetic datasets just 10% the size of the original, yet achieve, on average, 94% of the test accuracy of models trained on the full original datasets including ImageNet-1K, significantly surpassing state-of-the-art. Additionally, extensive tests reveal that our distilled datasets excel in cross-architecture generalization capabilities.

Submitted 14 December, 2023; originally announced December 2023.

arXiv:2312.06947 [pdf, other]

MaTe3D: Mask-guided Text-based 3D-aware Portrait Editing

Authors: Kangneng Zhou, Daiheng Gao, Xuan Wang, Jie Zhang, Peng Zhang, Xusen Sun, Longhao Zhang, Shiqi Yang, Bang Zhang, Liefeng Bo, Yaxing Wang, Ming-Ming Cheng

Abstract: 3D-aware portrait editing has a wide range of applications in multiple fields. However, current approaches are limited in that they can only perform either mask-guided or text-based editing. Even by fusing the two procedures into a model, the editing quality and stability cannot be ensured. To address this limitation, we propose MaTe3D: mask-guided text-based 3D-aware portrait editing. In this framework, first, we introduce a new SDF-based 3D generator which learns local and global representations with proposed SDF and density consistency losses. This enhances mask-based editing in local areas; second, we present a novel distillation strategy: Conditional Distillation on Geometry and Texture (CDGT). Compared to existing distillation strategies, it mitigates visual ambiguity and avoids mismatch between texture and geometry, thereby producing stable texture and convincing geometry while editing. Additionally, we create the CatMask-HQ dataset, a large-scale high-resolution cat face annotation for exploration of model generalization and expansion. We perform extensive experiments on both the FFHQ and CatMask-HQ datasets to demonstrate the editing quality and stability of the proposed method. Our method faithfully generates a 3D-aware edited face image based on a modified mask and a text prompt. Our code and models will be publicly released.

Submitted 5 July, 2024; v1 submitted 11 December, 2023; originally announced December 2023.

Comments: 16 pages, 13 figures

arXiv:2312.05830 [pdf, other]

A Decoupled Spatio-Temporal Framework for Skeleton-based Action Segmentation

Authors: Yunheng Li, Zhongyu Li, Shanghua Gao, Qilong Wang, Qibin Hou, Ming-Ming Cheng

Abstract: Effectively modeling discriminative spatio-temporal information is essential for segmenting activities in long action sequences. However, we observe that existing methods are limited by weak spatio-temporal modeling capability due to two forms of coupled modeling: (i) cascaded interaction couples spatial and temporal modeling, which over-smooths motion modeling over the long sequence, and (ii) joint-shared temporal modeling adopts shared weights to model each joint, ignoring the distinct motion patterns of different joints. We propose a Decoupled Spatio-Temporal Framework (DeST) to address the above issues. Firstly, we decouple the cascaded spatio-temporal interaction to avoid stacking multiple spatio-temporal blocks, while achieving sufficient spatio-temporal interaction. Specifically, DeST performs unified spatial modeling once and divides the spatial features into different groups of sub-features, which then adaptively interact with temporal features from different layers. Since the different sub-features contain distinct spatial semantics, the model could learn the optimal interaction pattern at each layer. Meanwhile, inspired by the fact that different joints move at different speeds, we propose joint-decoupled temporal modeling, which employs independent trainable weights to capture distinctive temporal features of each joint. On four large-scale benchmarks of different scenes, DeST significantly outperforms current state-of-the-art methods with less computational complexity.

Submitted 10 December, 2023; originally announced December 2023.

arXiv:2312.05801 [pdf, other]

Stability and Character of Zero Field Skyrmionic States in Hybrid Magnetic Multilayer Nanodots

Authors: Alexander Kang-Jun Toh, McCoy W. Lim, T. S. Suraj, Xiaoye Chen, Hang Khume Tan, Royston Lim, Xuan Min Cheng, Nelson Lim, Sherry Yap, Durgesh Kumar, S. N. Piramanayagam, Pin Ho, Anjan Soumyanarayanan

Abstract: Ambient magnetic skyrmions stabilized in multilayer nanostructures are of immense interest due to their relevance to magnetic tunnel junction (MTJ) devices for memory and unconventional computing applications. However, existing skyrmionic nanostructures built using conventional metallic or oxide multilayer nanodots are unable to concurrently fulfill the requirements of nanoscale skyrmion stability and feasibility of all-electrical readout and manipulation. Here, we develop a few-repeat hybrid multilayer platform consisting of metallic [Pt/CoB/Ir]3 and oxide [Pt/CoB/MgO] components that are coupled to evolve together as a single, composite stack. Zero-field (ZF) skyrmions with sizes as small as 50 nm are stabilized in the hybrid multilayer nanodots, which are smoothly modulated by up to 2.5x by varying CoB thickness and dot sizes. Meanwhile, skyrmion multiplets are also stabilized by small bias fields. Crucially, we observe higher order 'target' skyrmions with varying magnetization rotations in moderately-sized, low anisotropy nanodots. These results provide a viable route to realize long-sought skyrmionic MTJ devices and new possibilities for multi-state skyrmionic device concepts.

Submitted 10 December, 2023; originally announced December 2023.

arXiv:2312.05430 [pdf, other]

FT2TF: First-Person Statement Text-To-Talking Face Generation

Authors: Xingjian Diao, Ming Cheng, Wayner Barrios, SouYoung Jin

Abstract: Talking face generation has gained immense popularity in the computer vision community, with various applications including AR, VR, teleconferencing, digital assistants, and avatars. Traditional methods are mainly audio-driven, which have to deal with the inevitable resource-intensive nature of audio storage and processing. To address such a challenge, we propose FT2TF - First-Person Statement Text-To-Talking Face Generation, a novel one-stage end-to-end pipeline for talking face generation driven by first-person statement text. Different from previous work, our model only leverages visual and textual information without any other sources (e.g., audio/landmark/pose) during inference. Extensive experiments are conducted on LRS2 and LRS3 datasets, and results on multi-dimensional evaluation metrics are reported. Both quantitative and qualitative results showcase that FT2TF outperforms existing relevant methods and reaches the state-of-the-art. This achievement highlights our model's capability to bridge first-person statements and dynamic face generation, providing insightful guidance for future work.

Submitted 19 November, 2024; v1 submitted 8 December, 2023; originally announced December 2023.

Comments: Accepted at WACV 2025

arXiv:2312.04461 [pdf, other]

PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding

Authors: Zhen Li, Mingdeng Cao, Xintao Wang, Zhongang Qi, Ming-Ming Cheng, Ying Shan

Abstract: Recent advances in text-to-image generation have made remarkable progress in synthesizing realistic human photos conditioned on given text prompts. However, existing personalized generation methods cannot simultaneously satisfy the requirements of high efficiency, promising identity (ID) fidelity, and flexible text controllability. In this work, we introduce PhotoMaker, an efficient personalized text-to-image generation method, which mainly encodes an arbitrary number of input ID images into a stacked ID embedding for preserving ID information. Such an embedding, serving as a unified ID representation, can not only encapsulate the characteristics of the same input ID comprehensively, but also accommodate the characteristics of different IDs for subsequent integration. This paves the way for more intriguing and practically valuable applications. Besides, to drive the training of our PhotoMaker, we propose an ID-oriented data construction pipeline to assemble the training data. Trained on the dataset constructed through the proposed pipeline, our PhotoMaker demonstrates better ID preservation ability than test-time fine-tuning based methods, yet provides significant speed improvements, high-quality generation results, strong generalization capabilities, and a wide range of applications. Our project page is available at https://photo-maker.github.io/

Submitted 7 December, 2023; originally announced December 2023.

Comments: Tech report; Project page: https://photo-maker.github.io/

arXiv:2312.04248 [pdf, other]

TeMO: Towards Text-Driven 3D Stylization for Multi-Object Meshes

Authors: Xuying Zhang, Bo-Wen Yin, Yuming Chen, Zheng Lin, Yunheng Li, Qibin Hou, Ming-Ming Cheng

Abstract: Recent progress in the text-driven 3D stylization of a single object has been considerably promoted by CLIP-based methods. However, the stylization of multi-object 3D scenes is still impeded in that the image-text pairs used for pre-training CLIP mostly consist of a single object. Meanwhile, the local details of multiple objects may be susceptible to omission because the existing supervision primarily relies on coarse-grained contrast of image-text pairs. To overcome these challenges, we present a novel framework, dubbed TeMO, to parse multi-object 3D scenes and edit their styles under the contrast supervision at multiple levels. We first propose a Decoupled Graph Attention (DGA) module to distinguishably reinforce the features of 3D surface points. Particularly, a cross-modal graph is constructed to align the object points accurately and noun phrases decoupled from the 3D mesh and textual description. Then, we develop a Cross-Grained Contrast (CGC) supervision system, where a fine-grained loss between the words in the textual description and the randomly rendered images is constructed to complement the coarse-grained loss. Extensive experiments show that our method can synthesize high-quality stylized content and outperform the existing methods over a wide range of multi-object 3D meshes. Our code and results will be made publicly available.

Submitted 7 December, 2023; originally announced December 2023.

arXiv:2311.18282 [pdf]

Pressure-Modulated Structural and Magnetic Phase Transitions in Two-Dimensional FeTe: Tetragonal and Hexagonal Polymorphs

Authors: Wuxiao Han, Jiajia Feng, Hongliang Dong, Mo Cheng, Liu Yang, Yunfei Yu, Guoshuai Du, Jiayin Li, Yubing Du, Tiansong Zhang, Zhiwei Wang, Bin Chen, Jianping Shi, Yabin Chen

Abstract: Two-dimensional (2D) Fe-chalcogenides with rich structures, magnetisms and superconductivities are highly desirable to reveal the tortuous transition mechanism and explore their potential applications in spintronics and nanoelectronics. Hydrostatic pressure can effectively stimulate novel phase transitions between various ordered states and help map out the phase diagram. Herein, the structural evolution and transport characteristics of 2D FeTe were systematically investigated under extreme conditions through comparing two distinct symmetries, i.e., tetragonal (t-) and hexagonal (h-) FeTe. We found that 2D t-FeTe presented the pressure-induced transition from antiferromagnetic to ferromagnetic states at ~3 GPa, corresponding to the tetragonal collapse of layered structure. In contrast, the ferromagnetic order of 2D h-FeTe was retained up to 15 GPa, as confirmed by electrical transport and Raman measurements. Furthermore, the detailed P-T phase diagrams of both 2D t-FeTe and h-FeTe were mapped out with the delicate critical conditions. We believe our results can provide a unique platform to elaborate the extraordinary physical properties of Fe-chalcogenides and further to develop their practical applications.

Submitted 30 November, 2023; originally announced November 2023.

Comments: 22 Pages, 5 Figures

arXiv:2311.00388 [pdf, other]

AutoSAM: Towards Automatic Sampling of User Behaviors for Sequential Recommender Systems

Authors: Hao Zhang, Mingyue Cheng, Qi Liu, Zhiding Liu, Junzhe Jiang, Enhong Chen

Abstract: Sequential recommender systems (SRS) have gained widespread popularity in recommendation due to their ability to effectively capture dynamic user preferences. One default setting in the current SRS is to uniformly consider each historical behavior as a positive interaction. However, this setting has the potential to yield sub-optimal performance, as each item makes a distinct contribution to the user's interest. For example, purchased items should be given more importance than clicked ones. Hence, we propose a general automatic sampling framework, named AutoSAM, to non-uniformly treat historical behaviors. Specifically, AutoSAM augments the standard sequential recommendation architecture with an additional sampler layer to adaptively learn the skewed distribution of the raw input, and then sample informative sub-sets to build more generalizable SRS. To overcome the challenges of non-differentiable sampling actions and also introduce multiple decision factors for sampling, we further introduce a novel reinforcement learning based method to guide the training of the sampler. We theoretically design multi-objective sampling rewards including Future Prediction and Sequence Perplexity, and then optimize the whole framework in an end-to-end manner by combining the policy gradient. We conduct extensive experiments on benchmark recommender models and four real-world datasets. The experimental results demonstrate the effectiveness of the proposed approach. We will make our code publicly available after the acceptance.

Submitted 2 January, 2025; v1 submitted 1 November, 2023; originally announced November 2023.

arXiv:2310.20348 [pdf, other]

Class Incremental Learning with Pre-trained Vision-Language Models

Authors: Xialei Liu, Xusheng Cao, Haori Lu, Jia-wen Xiao, Andrew D. Bagdanov, Ming-Ming Cheng

Abstract: With the advent of large-scale pre-trained models, interest in adapting and exploiting them for continual learning scenarios has grown. In this paper, we propose an approach to exploiting pre-trained vision-language models (e.g. CLIP) that enables further adaptation instead of only using zero-shot learning of new tasks. We augment a pre-trained CLIP model with additional layers after the Image Encoder or before the Text Encoder. We investigate three different strategies: a Linear Adapter, a Self-attention Adapter, each operating on the image embedding, and Prompt Tuning which instead modifies prompts input to the CLIP text encoder. We also propose a method for parameter retention in the adapter layers that uses a measure of parameter importance to better maintain stability and plasticity during incremental learning. Our experiments demonstrate that the simplest solution -- a single Linear Adapter layer with parameter retention -- produces the best results. Experiments on several conventional benchmarks consistently show a significant margin of improvement over the current state-of-the-art.

Submitted 31 October, 2023; originally announced October 2023.

arXiv:2310.20239 [pdf, other]

Coded Caching Schemes for Multiaccess Topologies via Combinatorial Design

Authors: Minquan Cheng, Kai Wan, Petros Elia, Giuseppe Caire

Abstract: This paper studies a multiaccess coded caching (MACC) where the connectivity topology between the users and the caches can be described by a class of combinatorial designs. Our model includes as special cases several MACC topologies considered in previous works. The considered MACC network includes a server containing $N$ files, $\Gamma$ cache nodes and $K$ cacheless users, where each user can access $L$ cache nodes. The server is connected to the users via an error-free shared link, while the users can directly access the content in their connected cache nodes. Our goal is to minimise the worst-case transmission load on the shared link in the delivery phase. The main limitation of the existing MACC works is that only some specific access topologies are considered, and thus the number of users $K$ should be either linear or exponential in $\Gamma$. We overcome this limitation by formulating a new access topology derived from two classical combinatorial structures, referred to as the $t$-design and the $t$-group divisible design. In these topologies, $K$ scales linearly, polynomially, or even exponentially with $\Gamma$. By leveraging the properties of the considered combinatorial structures, we propose two classes of coded caching schemes for a flexible number of users, where the number of users can scale linearly, polynomially or exponentially with the number of cache nodes. In addition, our schemes can unify most schemes for the shared link network and unify many schemes for the multi-access network except for the cyclic wrap-around topology.

Submitted 31 October, 2023; originally announced October 2023.

Comments: 48 pages

arXiv:2310.20167 [pdf]

Phase-Modulated Elastic Properties of Two-Dimensional Magnetic FeTe: Hexagonal and Tetragonal Polymorphs

Authors: Yunfei Yu, Mo Cheng, Zicheng Tao, Wuxiao Han, Guoshuai Du, Yanfeng Guo, Jianping Shi, Yabin Chen

Abstract: Two-dimensional (2D) layered magnets, such as iron chalcogenides, have emerged in recent years as a new family of unconventional superconductors and provided key insights into the phonon-electron interaction and pairing mechanism. Their mechanical properties are of strategic importance for potential applications in spintronics and optoelectronics. However, despite extensive studies, there is still a lack of efficient approaches to tune the elastic modulus. Herein, we report the modulated elastic modulus of 2D magnetic FeTe and its thickness dependence via phase engineering. 2D FeTe grown by chemical vapor deposition can present various polymorphs, i.e., tetragonal FeTe (t-FeTe, antiferromagnetic) and hexagonal FeTe (h-FeTe, ferromagnetic). The Young's modulus of t-FeTe measured by the nanoindentation method showed an obvious thickness dependence, decreasing from 290.9 ± 9.2 to 113.0 ± 8.7 GPa as the thickness increased from 13.2 to 42.5 nm. In comparison, the elastic modulus of h-FeTe remains unchanged. Our results could shed light on the efficient modulation of mechanical properties of 2D magnetic materials and pave the way for their practical applications in nanodevices.

Submitted 31 October, 2023; originally announced October 2023.

Comments: 19 pages, 4 figures

arXiv:2310.18439 [pdf, other]

Machine learning detecting Majorana Zero Mode from Zero Bias Peak measurements

Authors: Mouyang Cheng, Ryotaro Okabe, Abhijatmedhi Chotrattanapituk, Mingda Li

Abstract: Majorana zero modes (MZMs), emerging as exotic quasiparticles that carry non-Abelian statistics, hold great promise for achieving fault-tolerant topological quantum computation. A key signature of the presence of MZMs is the zero-bias peaks (ZBPs) from tunneling differential conductance. However, the identification of MZMs from ZBPs has faced tremendous challenges, due to the presence of topologically trivial states that generate spurious ZBP signals. In this work, we introduce a machine-learning framework that can discern MZM from other signals using ZBP data. Quantum transport simulation from tight-binding models is used to generate the training data, while persistent cohomology analysis confirms the feasibility of classification via machine learning. In particular, even with added data noise, the XGBoost classifier reaches $85\%$ accuracy for 1D tunneling conductance data and $94\%$ for 2D data incorporating Zeeman splitting. Tests on prior ZBP experiments show that some data are more likely to originate from MZM than others. Our model offers a quantitative approach to assess MZMs using ZBP data. Furthermore, our results shed light on the use of machine learning on exotic quantum systems with experimental-computational integration.

Submitted 27 October, 2023; originally announced October 2023.

arXiv:2310.17931 [pdf, other]

Coded Caching Scheme for Partially Connected Linear Networks Via Multi-antenna Placement Delivery Array

Authors: Minquan Cheng, Yun Xie, Zhenhao Huang, Mingming Zhang, Youlong Wu

Abstract: In this paper, we study the coded caching scheme for the $(K,L,M_{\text{T}},M_{\text{U}},N)$ partially connected linear network, where there are $N$ files of equal size, $K+L-1$ transmitters and $K$ users; each user and transmitter caches at most $M_{\text{U}}$ and $M_{\text{T}}$ files respectively; each user cyclically communicates with $L$ transmitters. The goal is to design caching and delivery schemes to reduce the transmission latency measured by the normalized delivery time (NDT) metric. By delicately designing the data placement of the transmitters and users according to the topology, we show that a combinatorial structure called multiple-antenna placement delivery array (MAPDA), which was originally proposed for the multiple-input single-output broadcast channels, can also be used to design schemes for the partially connected linear network. Then, based on existing MAPDAs and our construction approach, we propose new schemes that achieve the optimal NDT when $M_{\text{T}}+M_{\text{U}}\geq N$ and a smaller NDT than that of the existing schemes when ($M_{\text{T}}+M_{\text{U}}\leq N$, $\frac{M_{\text{U}}}{N}+\frac{M_{\text{T}}}{N} \frac{L}{K}\left\lceil \frac{K}{L} \right\rceil \geq 1$) or ($M_{\text{U}}+M_{\text{T}}< N$, $\frac{K}{L}\notin\mathbb{Z}^+$). Moreover, our schemes operate in one-shot linear delivery and significantly reduce the subpacketization compared to the existing scheme, which implies that our schemes have a wider range of applications and lower implementation complexity.

Submitted 27 October, 2023; originally announced October 2023.

Comments: 13 pages

arXiv:2310.16878 [pdf, other]

Topological holography, quantum criticality, and boundary states

Authors: Sheng-Jie Huang, Meng Cheng

Abstract: Topological holography is a holographic principle that describes the generalized global symmetry of a local quantum system in terms of a topological order in one higher dimension. This framework separates the topological data from the local dynamics of a theory and provides a unified description of the symmetry and duality in gapped and gapless phases of matter. In this work, we develop the topological holographic picture for (1+1)d quantum phases, covering both gapped phases and a wide range of quantum critical points, including phase transitions between symmetry protected topological (SPT) phases, symmetry enriched quantum critical points, deconfined quantum critical points, and intrinsically gapless SPT phases. Topological holography puts a strong constraint on the emergent symmetry and the anomaly for these critical theories. We show how the partition functions of these critical points can be obtained from dualizing (orbifolding) more familiar critical theories. The topological responses of the defect operators are also discussed in this framework. We further develop a topological holographic picture for conformal boundary states of (1+1)d rational conformal field theories. This framework provides a simple physical picture to understand conformal boundary states and also uncovers the nature of the gapped phases corresponding to the boundary states.

Submitted 1 April, 2025; v1 submitted 25 October, 2023; originally announced October 2023.

Comments: 43 pages, 10 figures, 3 tables. v2: references added. v3: Added a conclusion section and minor revision

arXiv:2310.15371 [pdf, other]

doi 10.1007/978-3-031-34048-2_28

Vicinal Feature Statistics Augmentation for Federated 3D Medical Volume Segmentation

Authors: Yongsong Huang, Wanqing Xie, Mingzhen Li, Mingmei Cheng, Jinzhou Wu, Weixiao Wang, Jane You, Xiaofeng Liu

Abstract: Federated learning (FL) enables multiple client medical institutes to collaboratively train a deep learning (DL) model with privacy protection. However, the performance of FL can be constrained by the limited availability of labeled data in small institutes and the heterogeneous (i.e., non-i.i.d.) data distribution across institutes. Though data augmentation has been a proven technique to boost the generalization capabilities of conventional centralized DL as a "free lunch", its application in FL is largely underexplored. Notably, constrained by costly labeling, 3D medical segmentation generally relies on data augmentation. In this work, we aim to develop a vicinal feature-level data augmentation (VFDA) scheme to efficiently alleviate the local feature shift and facilitate collaborative training for privacy-aware FL segmentation. We take both the inner- and inter-institute divergence into consideration, without the need for cross-institute transfer of raw data or their mixup. Specifically, we exploit the batch-wise feature statistics (e.g., mean and standard deviation) in each institute to abstractly represent the discrepancy of data, and model each feature statistic probabilistically via a Gaussian prototype, with the mean corresponding to the original statistic and the variance quantifying the augmentation scope. From the vicinal risk minimization perspective, novel feature statistics can be drawn from the Gaussian distribution to fulfill augmentation. The variance is explicitly derived from the data bias in each individual institute and the underlying feature statistics characterized by all participating institutes. The added-on VFDA consistently yielded marked improvements over six advanced FL methods on both 3D brain tumor and cardiac segmentation.

Submitted 23 October, 2023; originally announced October 2023.

Comments: 28th biennial international conference on Information Processing in Medical Imaging (IPMI 2023): Oral Paper

Journal ref: In: Frangi, A., de Bruijne, M., Wassermann, D., Navab, N. (eds) Information Processing in Medical Imaging. IPMI 2023. Lecture Notes in Computer Science, vol 13939. Springer, Cham

arXiv:2310.13215 [pdf, other]

Zone Evaluation: Revealing Spatial Bias in Object Detection

Authors: Zhaohui Zheng, Yuming Chen, Qibin Hou, Xiang Li, Ping Wang, Ming-Ming Cheng

Abstract: A fundamental limitation of object detectors is that they suffer from "spatial bias", and in particular perform less satisfactorily when detecting objects near image borders. For a long time, there has been a lack of effective ways to measure and identify spatial bias, and little is known about where it comes from and how severe it is. To this end, we present a new zone evaluation protocol, extending the traditional evaluation to a more generalized one, which measures the detection performance over zones, yielding a series of Zone Precisions (ZPs). For the first time, we provide numerical results showing that object detectors perform quite unevenly across the zones. Surprisingly, the detector's performance in the 96% border zone of the image does not reach the AP value (Average Precision, commonly regarded as the average detection performance in the entire image zone). To better understand spatial bias, a series of heuristic experiments are conducted. Our investigation rules out two intuitive conjectures about spatial bias: the object scale and the absolute positions of objects barely influence it. We find that the key lies in the human-imperceptible divergence in data patterns between objects in different zones, which eventually forms a visible performance gap between the zones. With these findings, we finally discuss a future direction for object detection, namely the spatial disequilibrium problem, which aims at a balanced detection ability over the entire image zone. By broadly evaluating 10 popular object detectors and 5 detection datasets, we shed light on the spatial bias of object detectors. We hope this work could raise a focus on detection robustness. The source codes, evaluation protocols, and tutorials are publicly available at https://github.com/Zzh-tju/ZoneEval.

Submitted 1 June, 2024; v1 submitted 19 October, 2023; originally announced October 2023.

Comments: Accepted by IEEE TPAMI

arXiv:2310.12192 [pdf, other]

doi 10.5642/jhummath.YMZO2460

The Braids on your Blanket

Authors: Michelle Cheng, Robert Laugwitz

Abstract: In this expositional essay, we introduce some elements of the study of groups by analysing the braid pattern on a knitted blanket. We determine that the blanket features pure braids with a minimal number of crossings. Moreover, we determine polynomial invariants associated to the links obtained by closing the braid patterns of the blanket.

Submitted 18 October, 2023; originally announced October 2023.

Comments: Expositional article for a general readership. 32 pages, several figures

MSC Class: 00A66 (Primary) 00-01; 20F36; 57K10 (Secondary)

Journal ref: Journal of Humanistic Mathematics, Volume 14 Issue 2 (July 2024), pages 286-337. Available at: https://scholarship.claremont.edu/jhm/vol14/iss2/10

arXiv:2310.11762 [pdf, other]

A Quasi-Wasserstein Loss for Learning Graph Neural Networks

Authors: Minjie Cheng, Hongteng Xu

Abstract: When learning graph neural networks (GNNs) in node-level prediction tasks, most existing loss functions are applied to each node independently, even if node embeddings and their labels are non-i.i.d. because of their graph structures. To eliminate such inconsistency, in this study we propose a novel Quasi-Wasserstein (QW) loss with the help of the optimal transport defined on graphs, leading to new learning and prediction paradigms of GNNs. In particular, we design a "Quasi-Wasserstein" distance between the observed multi-dimensional node labels and their estimations, optimizing the label transport defined on graph edges. The estimations are parameterized by a GNN in which the optimal label transport may optionally determine the graph edge weights. By reformulating the strict constraint of the label transport as a Bregman divergence-based regularizer, we obtain the proposed Quasi-Wasserstein loss associated with two efficient solvers learning the GNN together with optimal label transport. When predicting node labels, our model combines the output of the GNN with the residual component provided by the optimal label transport, leading to a new transductive prediction paradigm. Experiments show that the proposed QW loss applies to various GNNs and helps to improve their performance in node-level classification and regression tasks. The code of this work can be found at https://github.com/SDS-Lab/QW_Loss.

Submitted 13 March, 2024; v1 submitted 18 October, 2023; originally announced October 2023.

arXiv:2310.11501 [pdf, other]

CoMPosT: Characterizing and Evaluating Caricature in LLM Simulations

Authors: Myra Cheng, Tiziano Piccardi, Diyi Yang

Abstract: Recent work has aimed to capture nuances of human behavior by using LLMs to simulate responses from particular demographics in settings like social science experiments and public opinion surveys. However, there are currently no established ways to discuss or evaluate the quality of such LLM simulations. Moreover, there is growing concern that these LLM simulations are flattened caricatures of the personas that they aim to simulate, failing to capture the multidimensionality of people and perpetuating stereotypes. To bridge these gaps, we present CoMPosT, a framework to characterize LLM simulations using four dimensions: Context, Model, Persona, and Topic. We use this framework to measure open-ended LLM simulations' susceptibility to caricature, defined via two criteria: individuation and exaggeration. We evaluate the level of caricature in scenarios from existing work on LLM simulations. We find that for GPT-4, simulations of certain demographics (political and marginalized groups) and topics (general, uncontroversial) are highly susceptible to caricature.

Submitted 17 October, 2023; originally announced October 2023.

Comments: To appear at EMNLP 2023 (Main)

arXiv:2310.08210 [pdf, other]

CLExtract: Recovering Highly Corrupted DVB/GSE Satellite Stream with Contrastive Learning

Authors: Minghao Lin, Minghao Cheng, Dongsheng Luo, Yueqi Chen

Abstract: Since satellite systems are playing an increasingly important role in our civilization, their security and privacy weaknesses are of growing concern. For example, prior work demonstrates that the communication channel between maritime VSAT and the ground segment can be eavesdropped on using consumer-grade equipment. The stream decoder GSExtract developed in this prior work performs well for most packets but is unable to handle corrupted streams. We discovered that such stream corruption commonly exists not only in European and North Atlantic areas but also in Asian areas. In our experiment, using GSExtract, we are only able to decode 2.1\% of the satellite streams we eavesdropped on in Asia. Therefore, in this work, we propose to use a contrastive learning technique with data augmentation to decode and recover such highly corrupted streams. Rather than relying on critical information in corrupted streams to search for headers and perform decoding, contrastive learning directly learns the features of packet headers at different protocol layers and identifies them in a stream sequence. By filtering them out, we can extract the innermost data payload for further analysis. Our evaluation shows that this new approach can successfully recover 71-99\% of the eavesdropped data at a speed hundreds of times faster than GSExtract. Besides, the effectiveness of our approach does not degrade substantially when stream corruption becomes more severe.

Submitted 12 October, 2023; originally announced October 2023.

Comments: SpaceSec'23, 11 pages, 14 figures

arXiv:2310.07885 [pdf, other]

Leader-Follower Neural Networks with Local Error Signals Inspired by Complex Collectives

Authors: Chenzhong Yin, Mingxi Cheng, Xiongye Xiao, Xinghe Chen, Shahin Nazarian, Andrei Irimia, Paul Bogdan

Abstract: The collective behavior of a network with heterogeneous, resource-limited information processing units (e.g., group of fish, flock of birds, or network of neurons) demonstrates high self-organization and complexity. These emergent properties arise from simple interaction rules where certain individuals can exhibit leadership-like behavior and influence the collective activity of the group. Motivated by the intricacy of these collectives, we propose a neural network (NN) architecture inspired by the rules observed in nature's collective ensembles. This NN structure contains workers that encompass one or more information processing units (e.g., neurons, filters, layers, or blocks of layers). Workers are either leaders or followers, and we train a leader-follower neural network (LFNN) by leveraging local error signals and optionally incorporating backpropagation (BP) and global loss. We investigate worker behavior and evaluate LFNNs through extensive experimentation. Our LFNNs trained with local error signals achieve significantly lower error rates than previous BP-free algorithms on MNIST and CIFAR-10 and even surpass BP-enabled baselines. In the case of ImageNet, our LFNN-l demonstrates superior scalability and outperforms previous BP-free algorithms by a significant margin.

Submitted 11 October, 2023; originally announced October 2023.

arXiv:2310.05414 [pdf, other]

doi 10.1016/j.autcon.2024.105369

Ethics of Artificial Intelligence and Robotics in the Architecture, Engineering, and Construction Industry

Authors: Ci-Jyun Liang, Thai-Hoa Le, Youngjib Ham, Bharadwaj R. K. Mantha, Marvin H. Cheng, Jacob J. Lin

Abstract: Artificial intelligence (AI) and robotics research and implementation emerged in the architecture, engineering, and construction (AEC) industry to positively impact project efficiency and effectiveness concerns such as safety, productivity, and quality. This shift, however, warrants ethical consideration of AI and robotics adoption due to its potential negative impacts on aspects such as job security, safety, and privacy. Nevertheless, this has not received sufficient attention, particularly within the academic community. This research systematically reviews AI and robotics research through the lens of ethics in the AEC community for the past five years. It identifies nine key ethical issues, namely job loss, data privacy, data security, data transparency, decision-making conflict, acceptance and trust, reliability and safety, fear of surveillance, and liability, by summarizing existing literature and filtering it further based on its AEC relevance. Furthermore, thirteen research topics along the process were identified based on existing AEC studies that had direct relevance to the theme of ethics in general and their parallels are further discussed. Finally, the current challenges and knowledge gaps are discussed and seven specific future research directions are recommended. This study not only signifies more stakeholder awareness of this important topic but also provides imminent steps towards safer and more efficient realization.

Submitted 9 October, 2023; originally announced October 2023.

Comments: 109 pages, 5 figures, submitted to Automation in Construction

arXiv:2310.05108 [pdf, other]

Enhancing Representations through Heterogeneous Self-Supervised Learning

Authors: Zhong-Yu Li, Bo-Wen Yin, Yongxiang Liu, Li Liu, Ming-Ming Cheng

Abstract: Incorporating heterogeneous representations from different architectures has facilitated various vision tasks, e.g., some hybrid networks combine transformers and convolutions. However, complementarity between such heterogeneous architectures has not been well exploited in self-supervised learning. Thus, we propose Heterogeneous Self-Supervised Learning (HSSL), which enforces a base model to learn from an auxiliary head whose architecture is heterogeneous from the base model. In this process, HSSL endows the base model with new characteristics in a representation learning way without structural changes. To comprehensively understand HSSL, we conduct experiments on various heterogeneous pairs containing a base model and an auxiliary head. We discover that the representation quality of the base model improves as their architecture discrepancy grows. This observation motivates us to propose a search strategy that quickly determines the most suitable auxiliary head for a specific base model to learn and several simple but effective methods to enlarge the model discrepancy. HSSL is compatible with various self-supervised methods, achieving superior performance on various downstream tasks, including image classification, semantic segmentation, instance segmentation, and object detection. Our source code will be made publicly available.

Submitted 23 April, 2024; v1 submitted 8 October, 2023; originally announced October 2023.

arXiv:2310.05026 [pdf, other]

Low-Resolution Self-Attention for Semantic Segmentation

Authors: Yu-Huan Wu, Shi-Chen Zhang, Yun Liu, Le Zhang, Xin Zhan, Daquan Zhou, Jiashi Feng, Ming-Ming Cheng, Liangli Zhen

Abstract: Semantic segmentation tasks naturally require high-resolution information for pixel-wise segmentation and global context information for class prediction. While existing vision transformers demonstrate promising performance, they often utilize high-resolution context modeling, resulting in a computational bottleneck. In this work, we challenge conventional wisdom and introduce the Low-Resolution Self-Attention (LRSA) mechanism to capture global context at a significantly reduced computational cost, i.e., FLOPs. Our approach involves computing self-attention in a fixed low-resolution space regardless of the input image's resolution, with additional 3x3 depth-wise convolutions to capture fine details in the high-resolution space. We demonstrate the effectiveness of our LRSA approach by building the LRFormer, a vision transformer with an encoder-decoder structure. Extensive experiments on the ADE20K, COCO-Stuff, and Cityscapes datasets demonstrate that LRFormer outperforms state-of-the-art models. The code is available at https://github.com/yuhuan-wu/LRFormer.

Submitted 22 January, 2025; v1 submitted 8 October, 2023; originally announced October 2023.

Comments: added many experiments. 13 pages, 12 tables, 6 figures

arXiv:2310.01875 [pdf, other]

Towards Stable Backdoor Purification through Feature Shift Tuning

Authors: Rui Min, Zeyu Qin, Li Shen, Minhao Cheng

Abstract: It has been widely observed that deep neural networks (DNNs) are vulnerable to backdoor attacks, where attackers can maliciously manipulate the model behavior by tampering with a small set of training samples. Although a line of defense methods has been proposed to mitigate this threat, they either require complicated modifications to the training process or rely heavily on the specific model architecture, which makes them hard to deploy in real-world applications. Therefore, in this paper, we instead start with fine-tuning, one of the most common and easy-to-deploy backdoor defenses, through comprehensive evaluations against diverse attack scenarios. Observations made through initial experiments show that, in contrast to the promising defensive results at high poisoning rates, vanilla tuning methods completely fail in low poisoning rate scenarios. Our analysis shows that with a low poisoning rate, the entanglement between backdoor and clean features undermines the effect of tuning-based defenses. Therefore, it is necessary to disentangle the backdoor and clean features in order to improve backdoor purification. To address this, we introduce Feature Shift Tuning (FST), a method for tuning-based backdoor purification. Specifically, FST encourages feature shifts by actively deviating the classifier weights from the originally compromised weights. Extensive experiments demonstrate that our FST provides consistently stable performance under different attack settings. Without complex parameter adjustments, FST also achieves much lower tuning costs, only 10 epochs. Our codes are available at https://github.com/AISafety-HKUST/stable_backdoor_purification.

Submitted 21 October, 2023; v1 submitted 3 October, 2023; originally announced October 2023.

Comments: NeurIPS 2023 paper. The first two authors contributed equally

arXiv:2310.00854 [pdf, other]

Regulating CPU Temperature With Thermal-Aware Scheduling Using a Reduced Order Learning Thermal Model

Authors: Anthony Dowling, Lin Jiang, Ming-Cheng Cheng, Yu Liu

Abstract: Modern real-time systems utilize considerable amounts of power while executing computation-intensive tasks. The execution of these tasks leads to significant power dissipation and heating of the device. It therefore results in severe thermal issues like temperature escalation, high thermal gradients, and excessive hot spot formation, which may result in degrading chip performance, accelerating device aging, and premature failure. Thermal-Aware Scheduling (TAS) enables optimization of thermal dissipation to maintain a safe thermal state. In this work, we implement a new TAS algorithm, POD-TAS, which manages the thermal behavior of a multi-core CPU based on a defined set of states and their transitions. We compare the performance of a dynamic RC thermal circuit simulator (HotSpot) and a reduced order Proper Orthogonal Decomposition (POD)-based thermal model, and we select the latter for use in our POD-TAS algorithm. We implement a novel simulation-based evaluation methodology to compare TAS algorithms. This methodology is used to evaluate the performance of the proposed POD-TAS algorithm. Additionally, we compare the performance of a state-of-the-art TAS algorithm, RT-TAS, to our proposed POD-TAS algorithm. Furthermore, we utilize the COMBS benchmark suite to provide CPU workloads for task scheduling. Our experimental results on a multi-core processor using a set of 4 benchmarks demonstrate that the proposed POD-TAS method can improve thermal performance by decreasing the peak thermal variance by 53.0% and the peak chip temperature by 29.01%. Using a set of 8 benchmarks, the comparison of the two algorithms shows a decrease of 29.57% in the peak spatial variance of the chip temperature and 26.26% in the peak chip temperature. We also identify several potential future research directions.

Submitted 6 February, 2024; v1 submitted 1 October, 2023; originally announced October 2023.

Comments: This version includes revisions to the previous version to improve the clarity and presentation of the work

arXiv:2309.15877

Neuro-Inspired Hierarchical Multimodal Learning

Authors: Xiongye Xiao, Gengshuo Liu, Gaurav Gupta, Defu Cao, Shixuan Li, Yaxing Li, Tianqing Fang, Mingxi Cheng, Paul Bogdan

Abstract: Integrating and processing information from various sources or modalities is critical for obtaining a comprehensive and accurate perception of the real world. Drawing inspiration from neuroscience, we develop the Information-Theoretic Hierarchical Perception (ITHP) model, which utilizes the concept of information bottleneck. Distinct from most traditional fusion models that aim to incorporate all modalities as input, our model designates the prime modality as input, while the remaining modalities act as detectors in the information pathway. Our proposed perception model focuses on constructing an effective and compact information flow by achieving a balance between the minimization of mutual information between the latent state and the input modal state, and the maximization of mutual information between the latent states and the remaining modal states. This approach leads to compact latent state representations that retain relevant information while minimizing redundancy, thereby substantially enhancing the performance of downstream tasks. Experimental evaluations on both the MUStARD and CMU-MOSI datasets demonstrate that our model consistently distills crucial information in multimodal learning scenarios, outperforming state-of-the-art benchmarks.

Submitted 23 April, 2024; v1 submitted 27 September, 2023; originally announced September 2023.

Comments: I am requesting the withdrawal of this submission due to an inadvertent duplication. The paper was submitted twice under different IDs, which was not intentional. The other submission (arXiv:2404.09403) contains the most updated and comprehensive version of the paper, and I would like to retain that as the sole version on the platform

arXiv:2309.15084 [pdf, other]

The Surveillance AI Pipeline

Authors: Pratyusha Ria Kalluri, William Agnew, Myra Cheng, Kentrell Owens, Luca Soldaini, Abeba Birhane

Abstract: A rapidly growing number of voices argue that AI research, and computer vision in particular, is powering mass surveillance. Yet the direct path from computer vision research to surveillance has remained obscured and difficult to assess. Here, we reveal the Surveillance AI pipeline by analyzing three decades of computer vision research papers and downstream patents, more than 40,000 documents. We find the large majority of annotated computer vision papers and patents self-report their technology enables extracting data about humans. Moreover, the majority of these technologies specifically enable extracting data about human bodies and body parts. We present both quantitative and rich qualitative analysis illuminating these practices of human data extraction. Studying the roots of this pipeline, we find that institutions that prolifically produce computer vision research, namely elite universities and "big tech" corporations, are subsequently cited in thousands of surveillance patents. Further, we find consistent evidence against the narrative that only these few rogue entities are contributing to surveillance. Rather, we expose the fieldwide norm that when an institution, nation, or subfield authors computer vision papers with downstream patents, the majority of these papers are used in surveillance patents. In total, we find the number of papers with downstream surveillance patents increased more than five-fold between the 1990s and the 2010s, with computer vision research now having been used in more than 11,000 surveillance patents. Finally, in addition to the high levels of surveillance we find documented in computer vision papers and patents, we unearth pervasive patterns of documents using language that obfuscates the extent of surveillance. Our analysis reveals the pipeline by which computer vision research has powered the ongoing expansion of surveillance.

Submitted 17 October, 2023; v1 submitted 26 September, 2023; originally announced September 2023.

arXiv:2309.14415 [pdf, other]

Initial mass function variability from the integrated light of diverse stellar systems

Authors: Chloe M. Cheng, Alexa Villaume, Michael L. Balogh, Jean P. Brodie, Ignacio Martín-Navarro, Aaron J. Romanowsky, Pieter G. van Dokkum

Abstract: We present a uniform analysis of the stellar initial mass function (IMF) from integrated light spectroscopy of 15 compact stellar systems (11 globular clusters in M31 and 4 ultra compact dwarfs in the Virgo cluster, UCDs) and two brightest Coma cluster galaxies (BCGs), covering a wide range of metallicities ($-$1.7 $<$ [Fe/H] $<$ 0.01) and velocity dispersions (7.4 km~s$^{-1}$ $< σ <$ 275 km~s$^{-1}$). The S/N $\sim 100$ Å$^{-1}$ Keck LRIS spectra are fitted over the range $4000 < λ/\mbox{Å} < 10,000$ with flexible, full-spectrum stellar population synthesis models. We use the models to fit simultaneously for ages, metallicities, and individual elemental abundances of the population, allowing us to decouple abundance variations from variations in IMF slope. We show that compact stellar systems do not follow the same trends with physical parameters that have been found for early-type galaxies. Most globular clusters in our sample have an IMF consistent with that of the Milky Way, over a wide range of [Fe/H] and [Mg/Fe]. There is more diversity among the UCDs, with some showing evidence for a bottom-heavy IMF, but with no clear correlation with metallicity, abundance, or velocity dispersion. The two Coma BCGs have similar velocity dispersion and metallicity, but we find the IMF of NGC~4874 is consistent with that of the Milky Way while NGC~4889 presents evidence for a significantly bottom-heavy IMF. For this sample, the IMF appears to vary between objects in a way that is not explained by a single metallicity-dependent prescription.

Submitted 25 September, 2023; originally announced September 2023.

Comments: Accepted for publication in MNRAS

Report number: MN-23-2545-MJ

Showing 201–250 of 857 results for author: Cheng, M