Search | arXiv e-print repository

arXiv:2507.02057

MGC: A Compiler Framework Exploiting Compositional Blindness in Aligned LLMs for Malware Generation

Authors: Lu Yan, Zhuo Zhang, Xiangzhe Xu, Shengwei An, Guangyu Shen, Zhou Xuan, Xuan Chen, Xiangyu Zhang

Abstract: Large language models (LLMs) have democratized software development, reducing the expertise barrier for programming complex applications. This accessibility extends to malicious software development, raising significant security concerns. While LLM providers have implemented alignment mechanisms to prevent direct generation of overtly malicious code, these safeguards predominantly evaluate individual prompts in isolation, overlooking a critical vulnerability: malicious operations can be systematically decomposed into benign-appearing sub-tasks. In this paper, we introduce the Malware Generation Compiler (MGC), a novel framework that leverages this vulnerability through modular decomposition and alignment-evasive generation. MGC employs a specialized Malware Description Intermediate Representation (MDIR) to bridge high-level malicious intents and benign-appearing code snippets. Extensive evaluation demonstrates that our attack reliably generates functional malware across diverse task specifications and categories, outperforming jailbreaking methods by +365.79% and underground services by +78.07% in correctness on three benchmark datasets. Case studies further show that MGC can reproduce and even enhance 16 real-world malware samples. This work provides critical insights for security researchers by exposing the risks of compositional attacks against aligned AI systems. Demonstrations are available at https://sites.google.com/view/malware-generation-compiler.

Submitted 2 July, 2025; originally announced July 2025.

arXiv:2506.10424

SOFT: Selective Data Obfuscation for Protecting LLM Fine-tuning against Membership Inference Attacks

Authors: Kaiyuan Zhang, Siyuan Cheng, Hanxi Guo, Yuetian Chen, Zian Su, Shengwei An, Yuntao Du, Charles Fleming, Ashish Kundu, Xiangyu Zhang, Ninghui Li

Abstract: Large language models (LLMs) have achieved remarkable success and are widely adopted for diverse applications. However, fine-tuning these models often involves private or sensitive information, raising critical privacy concerns. In this work, we conduct the first comprehensive study evaluating the vulnerability of fine-tuned LLMs to membership inference attacks (MIAs). Our empirical analysis demonstrates that MIAs exploit the loss reduction during fine-tuning, making them highly effective in revealing membership information. These findings motivate the development of our defense. We propose SOFT (Selective data Obfuscation in LLM Fine-Tuning), a novel defense technique that mitigates privacy leakage by leveraging influential data selection with an adjustable parameter to balance utility preservation and privacy protection. Our extensive experiments span six diverse domains and multiple LLM architectures and scales. Results show that SOFT effectively reduces privacy risks while maintaining competitive model performance, offering a practical and scalable solution to safeguard sensitive information in fine-tuned LLMs.

Submitted 12 June, 2025; originally announced June 2025.

Comments: Accepted by the 34th USENIX Security Symposium 2025. Code is available at https://github.com/KaiyuanZh/SOFT

arXiv:2506.07025

Soliton eigenvalue control by interaction of circularly polarized lights in a nonlinear fiber

Authors: Peng Gao, Xiaofang Wang, Sha An, Kai Wen, Juanjuan Zheng, Tanping Li, Peng Gao

Abstract: We propose a physical method for controlling soliton eigenvalues in optical fibers, which is realized through the interaction between circularly polarized lights. Using this method, we not only achieve the decomposition of high-order solitons (HOSs) with different orders, but also realize physical processes of reconstructing HOSs for the first time. Compared with existing methods, our approach ensures accurate measurement of the discrete eigenvalues of HOSs while exhibiting higher decomposition efficiency. It is worth noting that the probe soliton, which induces these phenomena, plays a key role. The requirement for a moderate steepness of the probe suggests the presence of an uncertainty principle in the measurement of soliton eigenvalues, similar to the detection of microscopic particles. Our results can deepen the understanding of microscopic properties of solitons and their interaction mechanisms, and moreover provide a promising all-optical solution for the design of eigenvalue-based multiplexers and demultiplexers.

Submitted 8 June, 2025; originally announced June 2025.

Comments: 7 pages, 4 figures

arXiv:2506.03195

Unlabeled Data Improves Fine-Grained Image Zero-shot Classification with Multimodal LLMs

Authors: Yunqi Hong, Sohyun An, Andrew Bai, Neil Y. C. Lin, Cho-Jui Hsieh

Abstract: Despite Multimodal Large Language Models (MLLMs) showing promising results on general zero-shot image classification tasks, fine-grained image classification remains challenging. It demands precise attention to subtle visual details to distinguish between visually similar subcategories--details that MLLMs may easily overlook without explicit guidance. To address this, we introduce AutoSEP, an iterative self-supervised prompt learning framework designed to enhance MLLM fine-grained classification capabilities in a fully unsupervised manner. Our core idea is to leverage unlabeled data to learn a description prompt that guides MLLMs in identifying crucial discriminative features within an image and boosts classification accuracy. We developed an automatic self-enhancing prompt learning framework called AutoSEP to iteratively improve the description prompt using unlabeled data, based on an instance-level classification scoring function. AutoSEP only requires black-box access to MLLMs, eliminating the need for any training or fine-tuning. We evaluate our approach on multiple fine-grained classification datasets. It consistently outperforms other unsupervised baselines, demonstrating the effectiveness of our self-supervised optimization framework. Notably, AutoSEP on average improves 13 percent over standard zero-shot classification and 5 percent over the best-performing baselines. Code is available at: https://github.com/yq-hong/AutoSEP

Submitted 1 June, 2025; originally announced June 2025.

arXiv:2505.21765

Don't Think Longer, Think Wisely: Optimizing Thinking Dynamics for Large Reasoning Models

Authors: Sohyun An, Ruochen Wang, Tianyi Zhou, Cho-Jui Hsieh

Abstract: While the recent success of large reasoning models (LRMs) significantly advanced LLMs' reasoning capability by optimizing the final answer accuracy using reinforcement learning, they may also drastically increase the output length due to overthinking, characterized by unnecessarily complex reasoning paths that waste computation and potentially degrade the performance. We hypothesize that such inefficiencies stem from LRMs' limited capability to dynamically select the proper modular reasoning strategies, termed thinking patterns, at the right position. To investigate this hypothesis, we propose a dynamic optimization framework that segments model-generated reasoning paths into distinct thinking patterns, systematically identifying and promoting beneficial patterns that improve the answer while removing detrimental ones. Empirical analysis confirms that our optimized thinking paths yield more concise yet sufficiently informative trajectories, enhancing reasoning efficiency by reducing attention FLOPs by up to 47% while maintaining accuracy for originally correct responses. Moreover, a non-trivial portion of originally incorrect responses are transformed into correct ones, achieving a 15.6% accuracy improvement with reduced length. Motivated by the improvement brought by the optimized thinking paths, we apply a preference optimization technique supported by a pairwise dataset contrasting suboptimal and optimal reasoning paths. Experimental evaluations across multiple mathematical reasoning benchmarks reveal that our method notably reduces computational overhead while simultaneously improving reasoning accuracy, achieving up to a 12% accuracy improvement and reducing token usage from approximately 5,000 to 3,000 tokens.

Submitted 27 May, 2025; originally announced May 2025.

Comments: Work In Progress

arXiv:2505.11769

Technical Report for ICRA 2025 GOOSE 2D Semantic Segmentation Challenge: Boosting Off-Road Segmentation via Photometric Distortion and Exponential Moving Average

Authors: Wonjune Kim, Lae-kyoung Lee, Su-Yong An

Abstract: We report on the application of a high-capacity semantic segmentation pipeline to the GOOSE 2D Semantic Segmentation Challenge for unstructured off-road environments. Using a FlashInternImage-B backbone together with a UPerNet decoder, we adapt established techniques, rather than designing new ones, to the distinctive conditions of off-road scenes. Our training recipe couples strong photometric distortion augmentation (to emulate the wide lighting variations of outdoor terrain) with an Exponential Moving Average (EMA) of weights for better generalization. Using only the GOOSE training dataset, we achieve 88.8% mIoU on the validation set.

Submitted 16 May, 2025; originally announced May 2025.

Comments: Winners of the GOOSE 2D Semantic Segmentation Challenge at the IEEE ICRA Workshop on Field Robotics 2025
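The EMA of weights mentioned in this abstract can be illustrated with a minimal sketch. The function name `ema_update`, the decay value, and the toy "optimizer step" are assumptions for illustration; the report's actual training code and hyperparameters are not given in the listing.

```python
def ema_update(ema_weights, model_weights, decay=0.999):
    """Return the exponential moving average of the weights after one step.

    Each averaged weight moves a fraction (1 - decay) toward the current
    model weight. The decay value is an illustrative assumption.
    """
    return [decay * e + (1.0 - decay) * w
            for e, w in zip(ema_weights, model_weights)]


# Toy usage: the "optimizer step" is a stand-in that nudges each weight.
weights = [0.0, 1.0]
ema = list(weights)  # the EMA copy starts from the initial weights
for step in range(3):
    weights = [w + 0.1 for w in weights]  # hypothetical training update
    ema = ema_update(ema, weights, decay=0.9)
print(ema)
```

At evaluation time the averaged copy is typically used in place of the raw weights, which smooths out step-to-step noise in the optimizer trajectory.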

arXiv:2504.15431

Trillion 7B Technical Report

Authors: Sungjun Han, Juyoung Suk, Suyeong An, Hyungguk Kim, Kyuseok Kim, Wonsuk Yang, Seungtaek Choi, Jamin Shin

Abstract: We introduce Trillion-7B, the most token-efficient Korean-centric multilingual LLM available. Our novel Cross-lingual Document Attention (XLDA) mechanism enables highly efficient and effective knowledge transfer from English to target languages like Korean and Japanese. Combined with optimized data mixtures, language-specific filtering, and tailored tokenizer construction, Trillion-7B achieves competitive performance while dedicating only 10% of its 2T training tokens to multilingual data and requiring just 59.4K H100 GPU hours ($148K) for full training. Comprehensive evaluations across 27 benchmarks in four languages demonstrate Trillion-7B's robust multilingual performance and exceptional cross-lingual consistency.

Submitted 21 April, 2025; originally announced April 2025.

Comments: Preview version
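The figures quoted in this abstract imply a per-GPU-hour rate and a multilingual token count. The short calculation below uses only the numbers stated above; the variable names are illustrative.

```python
# Figures quoted in the abstract.
gpu_hours = 59_400          # H100 GPU hours for full training
total_cost_usd = 148_000    # reported training cost in USD
total_tokens = 2e12         # 2T training tokens
multilingual_share = 0.10   # fraction dedicated to multilingual data

# Derived quantities.
cost_per_gpu_hour = total_cost_usd / gpu_hours
multilingual_tokens = total_tokens * multilingual_share

print(f"Implied cost per H100 hour: ${cost_per_gpu_hour:.2f}")
print(f"Multilingual tokens: {multilingual_tokens:.1e}")
```

That works out to roughly $2.49 per H100 hour and about 200 billion multilingual tokens.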

arXiv:2504.06534

Single-Source Shortest Path Problem in Weighted Disk Graphs

Authors: Shinwoo An, Eunjin Oh, Jie Xue

Abstract: In this paper, we present efficient algorithms for the single-source shortest path problem in weighted disk graphs. A disk graph is the intersection graph of a family of disks in the plane. Here, the weight of an edge is defined as the Euclidean distance between the centers of the disks corresponding to the endpoints of the edge. Given a family of $n$ disks in the plane whose radii lie in $[1,Ψ]$ and a source disk, we can compute a shortest path tree from a source vertex in the weighted disk graph in $O(n\log^2 n \log Ψ)$ time. Moreover, in the case that the radii of disks are arbitrarily large, we can compute a shortest path tree from a source vertex in the weighted disk graph in $O(n\log^4 n)$ time. This improves the best-known algorithm running in $O(n\log^6 n)$ time presented in ESA'23.

Submitted 8 April, 2025; originally announced April 2025.

Comments: In SoCG'25
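To make the problem definition in this abstract concrete, here is a brute-force baseline: plain Dijkstra over a disk graph whose edges are found by checking every pair. It is not the paper's algorithm and does not achieve its near-linear bounds. The function name `disk_graph_sssp` and the `(x, y, r)` tuple layout are assumptions for illustration, and it returns distances rather than an explicit shortest path tree.

```python
import heapq
import math


def disk_graph_sssp(disks, source):
    """Shortest-path distances from `source` in a weighted disk graph.

    `disks` is a list of (x, y, r) tuples. Two disks are adjacent when they
    intersect (center distance <= r1 + r2), and the edge weight is the
    Euclidean distance between their centers. Neighbors are found by brute
    force, so this runs in roughly O(n^2 log n) time.
    """
    n = len(disks)
    dist = [math.inf] * n
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        ux, uy, ur = disks[u]
        for v in range(n):
            if v == u:
                continue
            vx, vy, vr = disks[v]
            w = math.hypot(ux - vx, uy - vy)
            if w <= ur + vr and d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (dist[v], v))
    return dist


# Three unit disks in a chain plus one isolated disk.
disks = [(0, 0, 1), (2, 0, 1), (4, 0, 1), (10, 0, 1)]
print(disk_graph_sssp(disks, 0))
```

The isolated disk is unreachable, so its distance stays at infinity. The quadratic neighbor scan is exactly the cost that the algorithms in the abstract avoid.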

arXiv:2504.03515

Dexterous Manipulation through Imitation Learning: A Survey

Authors: Shan An, Ziyu Meng, Chao Tang, Yuning Zhou, Tengyu Liu, Fangqiang Ding, Shufang Zhang, Yao Mu, Ran Song, Wei Zhang, Zeng-Guang Hou, Hong Zhang

Abstract: Dexterous manipulation, which refers to the ability of a robotic hand or multi-fingered end-effector to skillfully control, reorient, and manipulate objects through precise, coordinated finger movements and adaptive force modulation, enables complex interactions similar to human hand dexterity. With recent advances in robotics and machine learning, there is a growing demand for these systems to operate in complex and unstructured environments. Traditional model-based approaches struggle to generalize across tasks and object variations due to the high dimensionality and complex contact dynamics of dexterous manipulation. Although model-free methods such as reinforcement learning (RL) show promise, they require extensive training, large-scale interaction data, and carefully designed rewards for stability and effectiveness. Imitation learning (IL) offers an alternative by allowing robots to acquire dexterous manipulation skills directly from expert demonstrations, capturing fine-grained coordination and contact dynamics while bypassing the need for explicit modeling and large-scale trial-and-error. This survey provides an overview of dexterous manipulation methods based on imitation learning, details recent advances, and addresses key challenges in the field. Additionally, it explores potential research directions to enhance IL-driven dexterous manipulation. Our goal is to offer researchers and practitioners a comprehensive introduction to this rapidly evolving domain.

Submitted 17 May, 2025; v1 submitted 4 April, 2025; originally announced April 2025.

Comments: 22 pages, 5 figures

arXiv:2503.06863

HIF: Height Interval Filtering for Efficient Dynamic Points Removal

Authors: Shufang Zhang, Tao Jiang, Jiazheng Wu, Ziyu Meng, Ziyang Zhang, Shan An

Abstract: 3D point cloud mapping plays an essential role in localization and autonomous navigation. However, dynamic objects often leave residual traces during the map construction process, which undermine the performance of subsequent tasks. Therefore, dynamic object removal has become a critical challenge in point cloud based map construction within dynamic scenarios. Existing approaches, however, often incur significant computational overhead, making it difficult to meet real-time processing requirements. To address this issue, we introduce the Height Interval Filtering (HIF) method. This approach constructs pillar-based height interval representations to probabilistically model the vertical dimension, with interval probabilities updated through Bayesian inference. It ensures real-time performance while achieving high accuracy and improving robustness in complex environments. Additionally, we propose a low-height preservation strategy that enhances the detection of unknown spaces, reducing misclassification in areas blocked by obstacles (occluded regions). Experiments on public datasets demonstrate that HIF delivers a 7.7 times improvement in time efficiency with comparable accuracy to existing SOTA methods. The code will be publicly available.

Submitted 9 March, 2025; originally announced March 2025.
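The abstract says interval probabilities are updated through Bayesian inference but gives no formula. As a rough illustration only, the sketch below applies a generic binary Bayes update to a single height interval. The function name `bayes_update`, the sensor likelihoods, and the hit/miss observation model are all assumptions, not the HIF formulation.

```python
def bayes_update(prior, p_hit_given_occupied, p_hit_given_free, observed_hit):
    """Posterior probability that a height interval is occupied.

    Generic binary Bayes rule: `prior` is P(occupied) before the scan and
    the two likelihoods describe a hypothetical sensor model.
    """
    if observed_hit:
        like_occ, like_free = p_hit_given_occupied, p_hit_given_free
    else:
        like_occ = 1.0 - p_hit_given_occupied
        like_free = 1.0 - p_hit_given_free
    numerator = like_occ * prior
    return numerator / (numerator + like_free * (1.0 - prior))


# Toy usage: two hits followed by a miss in the same interval.
p = 0.5
for hit in (True, True, False):
    p = bayes_update(p, p_hit_given_occupied=0.8, p_hit_given_free=0.2,
                     observed_hit=hit)
print(p)
```

Repeated consistent observations push the probability toward 0 or 1, while a contradictory scan pulls it back. This is the behavior one would want when deciding whether points in an interval belong to a static structure or a passing object.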

arXiv:2503.05995

ReJSHand: Efficient Real-Time Hand Pose Estimation and Mesh Reconstruction Using Refined Joint and Skeleton Features

Authors: Shan An, Shipeng Dai, Mahrukh Ansari, Yu Liang, Ming Zeng, Konstantinos A. Tsintotas, Changhong Fu, Hong Zhang

Abstract: Accurate hand pose estimation is vital in robotics, advancing dexterous manipulation in human-computer interaction. Toward this goal, this paper presents ReJSHand (which stands for Refined Joint and Skeleton Features), a cutting-edge network formulated for real-time hand pose estimation and mesh reconstruction. The proposed framework is designed to accurately predict 3D hand gestures under real-time constraints, which is essential for systems that demand agile and responsive hand motion tracking. The network's design prioritizes computational efficiency without compromising accuracy, a prerequisite for instantaneous robotic interactions. Specifically, ReJSHand comprises a 2D keypoint generator, a 3D keypoint generator, an expansion block, and a feature interaction block for meticulously reconstructing 3D hand poses from 2D imagery. In addition, the multi-head self-attention mechanism and a coordinate attention layer enhance feature representation, streamlining the creation of hand mesh vertices through sophisticated feature mapping and linear transformation. Regarding performance, comprehensive evaluations on the FreiHand dataset demonstrate ReJSHand's computational prowess. It achieves a frame rate of 72 frames per second while maintaining a PA-MPJPE (Position-Accurate Mean Per Joint Position Error) of 6.3 mm and a PA-MPVPE (Position-Accurate Mean Per Vertex Position Error) of 6.4 mm. Moreover, our model reaches scores of 0.756 for F@05 and 0.984 for F@15, surpassing modern pipelines and solidifying its position at the forefront of robotic hand pose estimators. To facilitate future studies, we provide our source code at https://github.com/daishipeng/ReJSHand.

Submitted 7 March, 2025; originally announced March 2025.

arXiv:2503.05341

Separating the bulk and interface contribution of spin-orbit torque in ferromagnet-Heavy metal bilayers tuned by variation of resistivity of heavy metal

Authors: Abu Bakkar Miah, Dhananjaya Mahapatra, Soumik Aon, Harekrishna Bhunia, Partha Mitra

Abstract: Harmonic Hall measurements were conducted on a series of ferromagnetic metal/heavy metal (FM/HM) bilayers with $\beta$-tungsten (W) as the HM and in-plane magnetized permalloy (Py) as the FM, and the efficiencies of the two orthogonal components of the spin-orbit torque were extracted. Two sets of Hall bar-shaped devices were considered, in which the HM resistivity was systematically varied over a wide range ($\sim$150-1000 $\mu\Omega$-cm) while the FM layer remained the same, with each set having a different aspect ratio of voltage pickup line width to Hall bar width. Using numerical simulations of the current distribution in the region between the voltage pickup lines, we have normalised the SOT efficiencies and examined their dependence. The current-induced spin-orbit torque efficiency in ferromagnetic metal (FM)/heavy metal (HM) bilayers is quantitatively investigated in this study. $\beta$-W, known for its high spin-orbit coupling, served as the HM layer, while Py, an FM with in-plane magnetic anisotropy, comprised the other layer. We performed a thorough analysis of the second harmonic Hall resistance ($R_{xy}^{2\omega}$) obtained from Py/$\beta$-W bilayer devices, systematically varying the resistivity ($\rho_W$) of the $\beta$-W layer within the range of 200 to 1000 $\mu\Omega$-cm by employing a fixed current density ($J_W \sim 0.8\times10^{11}$ A/m$^2$) through $\beta$-W. Through this analysis, we derived the Slonczewski-like efficiency ($\xi_{SL}$) and field-like efficiency ($\xi_{FL}$) as a function of $\rho_W$. Notably, the device with a resistivity of 980 $\mu\Omega$-cm exhibited the highest $\xi_{SL}$, yielding a value of $-0.42 \pm 0.09$. These results highlight the promising potential of highly resistive $\beta$-W as a material of interest in spintronics research.

Submitted 7 March, 2025; originally announced March 2025.

arXiv:2503.05202

Bridging between reheating and late-time observations in quintessential inflation

Authors: Ok Song An, Jin U Kang, Yong Jin Kim, Ui Ri Mun

Abstract: We propose an idea to build a bridge between reheating and late-time observations in quintessential inflation by backtracking the evolution of the inflaton field from the present time to the end of reheating. This idea is implemented when the potential gradient is negligible compared to the Hubble friction, rendering the inflaton field frozen, till the present time. We find a simple analytic relation between the reheating temperature and the observational parameters for dark energy, and numerically confirm its validity for typical models of quintessential inflation. This relation is universal and can apply to all quintessential inflation models with any reheating mechanism. It also implies that any quintessential inflation model with a successful reheating with the reheating temperature $1\textrm{MeV}\lesssim T_\textrm{re}\lesssim 10^{15}\textrm{GeV}$ predicts the equation of state of dark energy today extremely close to $-1$, i.e. $-1+10^{-60}\lesssim w_0\lesssim -1+10^{-24}$, unless the inflaton field unfreezes before the present time.

Submitted 7 March, 2025; originally announced March 2025.

Comments: 27 pages, 10 figures

arXiv:2503.05117

HyperGraph ROS: An Open-Source Robot Operating System for Hybrid Parallel Computing based on Computational HyperGraph

Authors: Shufang Zhang, Jiazheng Wu, Jiacheng He, Kaiyi Wang, Shan An

Abstract: This paper presents HyperGraph ROS, an open-source robot operating system that unifies intra-process, inter-process, and cross-device computation into a computational hypergraph for efficient message passing and parallel execution. In order to optimize communication, HyperGraph ROS dynamically selects the optimal communication mechanism while maintaining a consistent API. For intra-process messages, Intel-TBB Flow Graph is used with C++ pointer passing, which ensures zero memory copying and instant delivery. Meanwhile, inter-process and cross-device communication seamlessly switch to ZeroMQ. When a node receives a message from any source, it is immediately activated and scheduled for parallel execution by Intel-TBB. The computational hypergraph consists of nodes represented by TBB flow graph nodes and edges formed by TBB pointer-based connections for intra-process communication, as well as ZeroMQ links for inter-process and cross-device communication. This structure enables seamless distributed parallelism. Additionally, HyperGraph ROS provides ROS-like utilities such as a parameter server, a coordinate transformation tree, and visualization tools. Evaluation in diverse robotic scenarios demonstrates significantly higher transmission and throughput efficiency compared to ROS 2. Our work is available at https://github.com/wujiazheng2020a/hyper_graph_ros.

Submitted 6 March, 2025; originally announced March 2025.

arXiv:2502.11387

RoleMRC: A Fine-Grained Composite Benchmark for Role-Playing and Instruction-Following

Authors: Junru Lu, Jiazheng Li, Guodong Shen, Lin Gui, Siyu An, Yulan He, Di Yin, Xing Sun

Abstract: Role-playing is important for Large Language Models (LLMs) to follow diverse instructions while maintaining role identity and the role's pre-defined ability limits. Existing role-playing datasets mostly contribute to controlling role style and knowledge boundaries, but overlook role-playing in instruction-following scenarios. We introduce a fine-grained role-playing and instruction-following composite benchmark, named RoleMRC, including: (1) Multi-turn dialogues between ideal roles and humans, including free chats or discussions upon given passages; (2) Role-playing machine reading comprehension, involving response, refusal, and attempts according to passage answerability and role ability; (3) More complex scenarios with nested, multi-turn and prioritized instructions. The final RoleMRC features a 10.2k role profile meta-pool, 37.9k well-synthesized role-playing instructions, and 1.4k testing samples. We develop a pipeline to quantitatively evaluate the fine-grained role-playing and instruction-following capabilities of several mainstream LLMs, as well as models that are fine-tuned on our data. Moreover, cross-evaluation on external role-playing datasets confirms that models fine-tuned on RoleMRC enhance instruction-following without compromising general role-playing and reasoning capabilities. We also probe the neural-level activation maps of different capabilities over post-tuned LLMs. Access to our RoleMRC, RoleMRC-mix and Codes: https://github.com/LuJunru/RoleMRC.

Submitted 16 February, 2025; originally announced February 2025.

arXiv:2502.06139

LCIRC: A Recurrent Compression Approach for Efficient Long-form Context and Query Dependent Modeling in LLMs

Authors: Sumin An, Junyoung Sung, Wonpyo Park, Chanjun Park, Paul Hongsuck Seo

Abstract: While large language models (LLMs) excel in generating coherent and contextually rich outputs, their capacity to efficiently handle long-form contexts is limited by fixed-length position embeddings. Additionally, the computational cost of processing long sequences increases quadratically, making it challenging to extend context length. To address these challenges, we propose Long-form Context Injection with Recurrent Compression (LCIRC), a method that enables the efficient processing of long-form sequences beyond the model's length limit through recurrent compression without retraining the entire model. We further introduce query dependent context modeling, which selectively compresses query-relevant information, ensuring that the model retains the most pertinent content. Our empirical results demonstrate that Query Dependent LCIRC (QD-LCIRC) significantly improves LLMs' ability to manage extended contexts, making it well-suited for tasks that require both comprehensive context understanding and query relevance.

Submitted 22 May, 2025; v1 submitted 9 February, 2025; originally announced February 2025.

Comments: Accepted to NAACL 2025. Project Page: https://ssuminan.github.io/LCIRC/

arXiv:2501.15718 [pdf, other]

doi 10.14722/ndss.2025.230915

CENSOR: Defense Against Gradient Inversion via Orthogonal Subspace Bayesian Sampling

Authors: Kaiyuan Zhang, Siyuan Cheng, Guangyu Shen, Bruno Ribeiro, Shengwei An, Pin-Yu Chen, Xiangyu Zhang, Ninghui Li

Abstract: Federated learning collaboratively trains a neural network on a global server, where each local client receives the current global model weights and sends back parameter updates (gradients) based on its local private data. The process of sending these model updates may leak the client's private data. Existing gradient inversion attacks can exploit this vulnerability to recover private training instances from a client's gradient vectors. Recently, researchers have proposed advanced gradient inversion techniques that existing defenses struggle to handle effectively. In this work, we present a novel defense tailored for large neural network models. Our defense capitalizes on the high dimensionality of the model parameters to perturb gradients within a subspace orthogonal to the original gradient. By leveraging cold posteriors over orthogonal subspaces, our defense implements a refined gradient update mechanism. This enables the selection of an optimal gradient that not only safeguards against gradient inversion attacks but also maintains model utility. We conduct comprehensive experiments across three different datasets and evaluate our defense against various state-of-the-art attacks and defenses. Code is available at https://censor-gradient.github.io.

Submitted 26 January, 2025; originally announced January 2025.

Comments: Accepted by 32nd Annual Network and Distributed System Security Symposium (NDSS 2025). Code is available at https://censor-gradient.github.io

arXiv:2501.14185 [pdf, other]

Tensor-Based Binary Graph Encoding for Variational Quantum Classifiers

Authors: Shiwen An, Konstantinos Slavakis

Abstract: Quantum computing has been a prominent research area for decades, inspiring transformative fields such as quantum simulation, quantum teleportation, and quantum machine learning (QML), which are undergoing rapid development. Within QML, hybrid classical-quantum algorithms like Quantum Neural Networks (QNNs) and Variational Quantum Classifiers (VQCs) have shown promise in leveraging quantum circuits and classical optimizers to classify classical data efficiently. Simultaneously, classical machine learning has made significant strides in graph classification, employing Graph Neural Networks (GNNs) to analyze systems ranging from large-scale structures like the Large Hadron Collider to molecular and biological systems like proteins and DNA. Combining the advancements in quantum computing and graph classification presents a unique opportunity to develop quantum algorithms capable of extracting features from graphs and performing their classification effectively. In this paper, we propose a novel quantum encoding framework for graph classification using VQCs. Unlike existing approaches such as PCA-VQC, which rely on dimensionality reduction techniques like Principal Component Analysis (PCA) and may lead to information loss, our method preserves the integrity of graph data. Furthermore, our encoding approach is optimized for Noisy Intermediate-Scale Quantum (NISQ) devices, requiring a limited number of qubits while achieving comparable or superior classification performance to PCA-VQC. By constructing slightly more complex circuits tailored for graph encoding, we demonstrate that VQCs can effectively classify graphs within the constraints of current quantum hardware.

Submitted 30 March, 2025; v1 submitted 23 January, 2025; originally announced January 2025.

arXiv:2501.09985 [pdf, ps, other]

Fully viable DHOST bounce with extra scalar

Authors: Ok Song An, Jin U Kang, Yong Jin Kim, Ui Ri Mun, Un Gyong Ri

Abstract: In this paper we construct a class of Degenerate Higher-Order Scalar-Tensor (DHOST) theories with an extra scalar field, which admits viable solutions of a bouncing universe satisfying the following requirements: (i) absence of Belinski-Khalatnikov-Lifshitz (BKL) instability, ghost and gradient instability, (ii) absence of superluminality, (iii) generation of nearly scale-invariant curvature perturbations and a very small tensor-to-scalar ratio, and (iv) conventional asymptotics in the distant past and future, where the gravity sector is described by General Relativity and the DHOST scalar has a canonical form of Lagrangian. We also expect our models to have sufficiently small non-Gaussianities of primordial curvature perturbations to be compatible with observations. As such, this work exemplifies for the first time a fully viable two-field DHOST bouncing cosmology, which is free of instability and superluminality problems as well as compatible with observations.

Submitted 13 April, 2025; v1 submitted 17 January, 2025; originally announced January 2025.

Comments: 28 pages, two appendices, 12 figures

arXiv:2412.19031 [pdf, other]

Repository Structure-Aware Training Makes SLMs Better Issue Resolver

Authors: Zexiong Ma, Shengnan An, Zeqi Lin, Yanzhen Zou, Bing Xie

Abstract: Language models have been applied to various software development tasks, but the performance varies according to the scale of the models. Large Language Models (LLMs) outperform Small Language Models (SLMs) in complex tasks like repository-level issue resolving, but raise concerns about privacy and cost. In contrast, SLMs are more accessible but under-perform in complex tasks. In this paper, we introduce ReSAT (Repository Structure-Aware Training), which constructs training data based on a large number of issues and corresponding pull requests from open-source communities to enhance the model's understanding of repository structure and issue-resolving ability. We construct two types of training data: (1) localization training data, multi-level progressive localization data that improves code understanding and localization capability; (2) code edit training data, which improves context-based code editing capability. The evaluation results on SWE-Bench-verified and RepoQA demonstrate that ReSAT effectively enhances SLMs' issue-resolving and repository-level long-context understanding capabilities.

Submitted 25 December, 2024; originally announced December 2024.

arXiv:2412.14905 [pdf, other]

Dehallucinating Parallel Context Extension for Retrieval-Augmented Generation

Authors: Zexiong Ma, Shengnan An, Zeqi Lin, Yanzhen Zou, Jian-Guang Lou, Bing Xie

Abstract: Large language models (LLMs) are susceptible to generating hallucinated information, despite the integration of retrieval-augmented generation (RAG). Parallel context extension (PCE) is a line of research attempting to effectively integrate parallel (unordered) contexts, but it still suffers from hallucinations when adapted to RAG scenarios. In this paper, we propose DePaC (Dehallucinating Parallel Context Extension), which alleviates the hallucination problem with context-aware negative training and information-calibrated aggregation. DePaC is designed to alleviate two types of in-context hallucination: fact fabrication (i.e., LLMs present claims that are not supported by the contexts) and fact omission (i.e., LLMs fail to present claims that can be supported by the contexts). Specifically, (1) for fact fabrication, we apply context-aware negative training, which fine-tunes the LLMs with negative supervision, thus explicitly guiding the LLMs to refuse to answer when contexts are not related to questions; (2) for fact omission, we propose information-calibrated aggregation, which prioritizes context windows with higher information increment from their contexts. The experimental results on nine RAG tasks demonstrate that DePaC significantly alleviates the two types of hallucination and consistently achieves better performance on these tasks.

Submitted 19 December, 2024; originally announced December 2024.

arXiv:2412.11787 [pdf, other]

A Method for Detecting Legal Article Competition for Korean Criminal Law Using a Case-augmented Mention Graph

Authors: Seonho An, Young Yik Rhim, Min-Soo Kim

Abstract: As social systems become increasingly complex, legal articles are also growing more intricate, making it progressively harder for humans to identify any potential competitions among them, particularly when drafting new laws or applying existing laws. Despite this challenge, no method for detecting such competitions has been proposed so far. In this paper, we propose a new legal AI task called Legal Article Competition Detection (LACD), which aims to identify competing articles within a given law. Our novel retrieval method, CAM-Re2, outperforms existing relevant methods, reducing false positives by 20.8% and false negatives by 8.3%, while achieving a 98.2% improvement in precision@5, for the LACD task. We release our codes at https://github.com/asmath472/LACD-public.

Submitted 16 December, 2024; originally announced December 2024.

Comments: under review

ACM Class: I.2.7

arXiv:2412.05825 [pdf, other]

Self-Supervised Learning with Probabilistic Density Labeling for Rainfall Probability Estimation

Authors: Junha Lee, Sojung An, Sujeong You, Namik Cho

Abstract: Numerical weather prediction (NWP) models are fundamental in meteorology for simulating and forecasting the behavior of various atmospheric variables. The accuracy of precipitation forecasts and the acquisition of sufficient lead time are crucial for preventing hazardous weather events. However, the performance of NWP models is limited by the nonlinear and unpredictable patterns of extreme weather phenomena driven by temporal dynamics. In this regard, we propose Self-Supervised Learning with Probabilistic Density Labeling (SSLPDL) for estimating rainfall probability by post-processing NWP forecasts. Our post-processing method uses self-supervised learning (SSL) with masked modeling for reconstructing atmospheric physics variables, enabling the model to learn the dependency between variables. The pre-trained encoder is then utilized in transfer learning to a precipitation segmentation task. Furthermore, we introduce a straightforward labeling approach based on probability density to address the class imbalance in extreme weather phenomena like heavy rain events. Experimental results show that SSLPDL surpasses other precipitation forecasting models in regional precipitation post-processing and demonstrates competitive performance in extending forecast lead times. Our code is available at https://github.com/joonha425/SSLPDL

Submitted 8 December, 2024; originally announced December 2024.

Comments: Accepted by WACV 2025

arXiv:2412.04862 [pdf, other]

EXAONE 3.5: Series of Large Language Models for Real-world Use Cases

Authors: LG AI Research, Soyoung An, Kyunghoon Bae, Eunbi Choi, Kibong Choi, Stanley Jungkyu Choi, Seokhee Hong, Junwon Hwang, Hyojin Jeon, Gerrard Jeongwon Jo, Hyunjik Jo, Jiyeon Jung, Yountae Jung, Hyosang Kim, Joonkee Kim, Seonghwan Kim, Soyeon Kim, Sunkyoung Kim, Yireun Kim, Yongil Kim, Youchul Kim, Edward Hwayoung Lee, Haeju Lee, Honglak Lee, Jinsik Lee , et al. (8 additional authors not shown)

Abstract: This technical report introduces the EXAONE 3.5 instruction-tuned language models, developed and released by LG AI Research. The EXAONE 3.5 language models are offered in three configurations: 32B, 7.8B, and 2.4B. These models feature several standout capabilities: 1) exceptional instruction following capabilities in real-world scenarios, achieving the highest scores across seven benchmarks, 2) outstanding long-context comprehension, attaining the top performance in four benchmarks, and 3) competitive results compared to state-of-the-art open models of similar sizes across nine general benchmarks. The EXAONE 3.5 language models are open to anyone for research purposes and can be downloaded from https://huggingface.co/LGAI-EXAONE. For commercial use, please reach out to the official contact point of LG AI Research: [email protected].

Submitted 9 December, 2024; v1 submitted 6 December, 2024; originally announced December 2024.

Comments: arXiv admin note: text overlap with arXiv:2408.03541

arXiv:2412.02051 [pdf, ps, other]

Postnikov--Stanley polynomials are Lorentzian

Authors: Serena An, Katherine Tung, Yuchong Zhang

Abstract: Postnikov--Stanley polynomials $D_u^w$ are a generalization of skew dual Schubert polynomials to the setting of arbitrary Weyl groups. We prove that Postnikov--Stanley polynomials are Lorentzian by showing that they are degree polynomials of Richardson varieties. Our result yields an interesting class of Lorentzian polynomials related to the geometry of Richardson varieties, generalizes the result that dual Schubert polynomials are Lorentzian (Huh--Matherne--Mészáros--St. Dizier 2022), and resolves the conjecture that Postnikov--Stanley polynomials have M-convex support (An--Tung--Zhang 2024).

Submitted 17 December, 2024; v1 submitted 2 December, 2024; originally announced December 2024.

Comments: 10 pages, 1 figure

arXiv:2412.01471 [pdf, other]

Multi-Granularity Video Object Segmentation

Authors: Sangbeom Lim, Seongchan Kim, Seungjun An, Seokju Cho, Paul Hongsuck Seo, Seungryong Kim

Abstract: Current benchmarks for video segmentation are limited to annotating only salient objects (i.e., foreground instances). Despite their impressive architectural designs, previous works trained on these benchmarks have struggled to adapt to real-world scenarios. Thus, developing a new video segmentation dataset aimed at tracking multi-granularity segmentation targets in the video scene is necessary. In this work, we aim to generate a multi-granularity video segmentation dataset that is annotated for both salient and non-salient masks. To achieve this, we propose a large-scale, densely annotated multi-granularity video object segmentation (MUG-VOS) dataset that includes various types and granularities of mask annotations. We automatically collected a training set that assists in tracking both salient and non-salient objects, and we also curated a human-annotated test set for reliable evaluation. In addition, we present a memory-based mask propagation model (MMPM), trained and evaluated on the MUG-VOS dataset, which achieves the best performance among existing video object segmentation methods and SAM-based video segmentation methods. Project page is available at https://cvlab-kaist.github.io/MUG-VOS.

Submitted 3 December, 2024; v1 submitted 2 December, 2024; originally announced December 2024.

Comments: Project Page: https://cvlab-kaist.github.io/MUG-VOS

arXiv:2411.18040 [pdf, other]

A New Rarity Assessment of the `Disk of Satellites': the Milky Way System Is the Exception Rather than the Rule in the $Λ$CDM Cosmology

Authors: Chanoul Seo, Suk-Jin Yoon, Sanjaya Paudel, Sung-Ho An, Jun-Sung Moon

Abstract: The majority of satellite galaxies around the Milky Way (MW) show disk-like distributions (the disk of satellites; DoS), which is a small-scale problem of the $Λ$CDM cosmology. The conventional definition of the MW-like DoS is a satellite system with a minor-to-major axis ratio ($c$/$a$) lower than the MW's $c$/$a$ value of 0.181. Here we question the validity of the $c$/$a$-based DoS rarity assessment and propose an alternative approach. How satellites are placed around a galaxy is dictated mainly by two factors: the distributions of satellites' orbital poles and distances from the host. Based on this premise, we construct the `satellite distribution generator' code and generate 10$^5$ `spatially and kinematically analogous systems (SKASs)' sharing these two factors. The SKAS can disclose the intrinsic, underlying $c$/$a$ probability distribution function (PDF), from which a present-day $c$/$a$ value is fortuitously determined. We find that the $c$/$a$ PDF of the MW DoS defined by 11 classical satellites is quite broad ($σ_{c/a}$$\sim$0.105), implying that a simple present-day $c$/$a$ value, combined with its highly time-variable nature, cannot fully represent the degree of flatness. Moreover, based on the intrinsic $c$/$a$ PDF, we re-evaluate the rarity of the MW DoS by comparing it with IllustrisTNG50-1 host-satellite systems and find that even with the new measure, the MW DoS remains rare (0.00$\sim$3.40%). We show that the reason behind the rareness is that both orbital poles and distances of the 11 MW satellites are far more plane-friendly than those of simulated host-satellite systems, challenging the current structure and galaxy formation model.

Submitted 26 November, 2024; originally announced November 2024.

Comments: 23 pages, 15 figures

arXiv:2411.16654 [pdf, ps, other]

Newton polytopes of dual Schubert polynomials

Authors: Serena An, Katherine Tung, Yuchong Zhang

Abstract: The M-convexity of dual Schubert polynomials was first proven by Huh, Matherne, Mészáros, and St. Dizier in 2022. We give a full characterization of the supports of dual Schubert polynomials, which yields an elementary alternative proof of the M-convexity result, and furthermore strengthens it by explicitly characterizing the vertices of their Newton polytopes combinatorially.

Submitted 25 November, 2024; originally announced November 2024.

arXiv:2411.08346 [pdf, ps, other]

doi 10.1063/5.0263240

Evidence of orbital Hall current induced correlation in second harmonic response of longitudinal and transverse voltage in light metal-ferromagnet bilayers

Authors: Dhananjaya Mahapatra, Abu Bakkar Miah, HareKrishna Bhunia, Soumik Aon, Partha Mitra

Abstract: We investigate the effect of the orbital current arising from the orbital Hall effect in thin films of Nb and Ti in ohmic contact with ferromagnetic Ni on the second harmonic longitudinal and transverse voltages in response to an a.c. current applied to the bilayer structures. Our experiments were analogous to those on Heavy Metal-Ferromagnet bilayers and we extract the Orbital Hall Torque efficiency and unidirectional magnetoresistance (UMR). Through second-harmonic measurements, we investigate orbital Hall torque and UMR in bilayer devices composed of ferromagnetic materials (FM), such as Ni and NiFe, paired with light metals (LM), such as Ti and Nb. Our results demonstrate that LM/Ni bilayers exhibit enhanced damping-like torque and UMR compared to LM/NiFe bilayers. This enhancement suggests that angular momentum is generated via the orbital Hall effect within the light metal, where it undergoes orbital-to-spin conversion within the Ni ferromagnet, ultimately transferring to the magnetization of the ferromagnetic layer. Torque and UMR are also absent in single-layer devices, highlighting the necessity of the bilayer structure for orbital current generation.

Submitted 11 June, 2025; v1 submitted 13 November, 2024; originally announced November 2024.

Journal ref: Applied Physics Letters 2025

arXiv:2411.05214 [pdf, other]

STAND-Guard: A Small Task-Adaptive Content Moderation Model

Authors: Minjia Wang, Pingping Lin, Siqi Cai, Shengnan An, Shengjie Ma, Zeqi Lin, Congrui Huang, Bixiong Xu

Abstract: Content moderation, the process of reviewing and monitoring the safety of generated content, is important for the development of welcoming online platforms and responsible large language models. Content moderation contains various tasks, each with its unique requirements tailored to specific scenarios. Therefore, it is crucial to develop a model that can be easily and accurately adapted to novel or customized content moderation tasks without extensive model tuning. This paper presents STAND-GUARD, a Small Task-Adaptive coNtent moDeration model. The basic motivation is: by performing instruction tuning on various content moderation tasks, we can unleash the power of small language models (SLMs) on unseen (out-of-distribution) content moderation tasks. We also carefully study the effects of training tasks and model size on the efficacy of the cross-task fine-tuning mechanism. Experiments demonstrate STAND-Guard is comparable to GPT-3.5-Turbo across over 40 public datasets, as well as proprietary datasets derived from real-world business scenarios. Remarkably, STAND-Guard achieved nearly equivalent results to GPT-4-Turbo on unseen English binary classification tasks.

Submitted 7 November, 2024; originally announced November 2024.

Comments: 20 pages, 1 figure

arXiv:2411.00813 [pdf, other]

Personality Analysis from Online Short Video Platforms with Multi-domain Adaptation

Authors: Sixu An, Xiangguo Sun, Yicong Li, Yu Yang, Guandong Xu

Abstract: Personality analysis from online short videos has gained prominence due to its applications in personalized recommendation systems, sentiment analysis, and human-computer interaction. Traditional assessment methods, such as questionnaires based on the Big Five Personality Framework, are limited by self-report biases and are impractical for large-scale or real-time analysis. Leveraging the rich, multi-modal data present in short videos offers a promising alternative for more accurate personality inference. However, integrating these diverse and asynchronous modalities poses significant challenges, particularly in aligning time-varying data and ensuring models generalize well to new domains with limited labeled data. In this paper, we propose a novel multi-modal personality analysis framework that addresses these challenges by synchronizing and integrating features from multiple modalities and enhancing model generalization through domain adaptation. We introduce a timestamp-based modality alignment mechanism that synchronizes data based on spoken word timestamps, ensuring accurate correspondence across modalities and facilitating effective feature integration. To capture temporal dependencies and inter-modal interactions, we employ Bidirectional Long Short-Term Memory networks and self-attention mechanisms, allowing the model to focus on the most informative features for personality prediction. Furthermore, we develop a gradient-based domain adaptation method that transfers knowledge from multiple source domains to improve performance in target domains with scarce labeled data. Extensive experiments on real-world datasets demonstrate that our framework significantly outperforms existing methods in personality prediction tasks, highlighting its effectiveness in capturing complex behavioral cues and robustness in adapting to new domains.

Submitted 25 October, 2024; originally announced November 2024.

arXiv:2410.15377 [pdf, other]

Engineering the Environment of a Superconducting Qubit with an Artificial Giant Atom

Authors: Jingjing Hu, Dengfeng Li, Yufan Qie, Zelong Yin, Anton Frisk Kockum, Franco Nori, Shuoming An

Abstract: In quantum computing, precise control of system-environment coupling is essential for high-fidelity gates, measurements, and networking. We present an architecture that employs an artificial giant atom from waveguide quantum electrodynamics to tailor the interaction between a superconducting qubit and its environment. This frequency-tunable giant atom exhibits both frequency and power selectivity for photons: when resonant with the qubit, it reflects single photons emitted from the qubit while remaining transparent to strong microwave signals for readout and control. This approach surpasses the Purcell limit and significantly extends the qubit's lifetime by ten times while maintaining the readout speed, thereby improving both gate operations and readout. Our architecture holds promise for bridging circuit and waveguide quantum electrodynamics systems in quantum technology applications.

Submitted 20 October, 2024; originally announced October 2024.

arXiv:2410.07701 [pdf, other]

Autonomous Driving in Unstructured Environments: How Far Have We Come?

Authors: Chen Min, Shubin Si, Xu Wang, Hanzhang Xue, Weizhong Jiang, Yang Liu, Juan Wang, Qingtian Zhu, Qi Zhu, Lun Luo, Fanjie Kong, Jinyu Miao, Xudong Cai, Shuai An, Wei Li, Jilin Mei, Tong Sun, Heng Zhai, Qifeng Liu, Fangzhou Zhao, Liang Chen, Shuai Wang, Erke Shang, Linzhi Shang, Kunlong Zhao , et al. (13 additional authors not shown)

Abstract: Research on autonomous driving in unstructured outdoor environments is less advanced than in structured urban settings due to challenges like environmental diversity and scene complexity. These environments, such as rural areas and rugged terrains, pose unique obstacles that are not common in structured urban areas. Despite these difficulties, autonomous driving in unstructured outdoor environments is crucial for applications in agriculture, mining, and military operations. Our survey reviews over 250 papers on autonomous driving in unstructured outdoor environments, covering offline mapping, pose estimation, environmental perception, path planning, end-to-end autonomous driving, datasets, and relevant challenges. We also discuss emerging trends and future research directions. This review aims to consolidate knowledge and encourage further research on autonomous driving in unstructured environments. To support ongoing work, we maintain an active repository with up-to-date literature and open-source projects at: https://github.com/chaytonmin/Survey-Autonomous-Driving-in-Unstructured-Environments.

Submitted 31 October, 2024; v1 submitted 10 October, 2024; originally announced October 2024.

Comments: Survey paper; 38 pages

arXiv:2410.04806 [pdf]

Topological beaming of light: Proof-of-concept experiment

Authors: Yu Sung Choi, Ki Young Lee, Soo-Chan An, Minchul Jang, Youngjae Kim, Seung Han Shin, Jae Woong Yoon

Abstract: Beam shaping in nanophotonic systems remains a challenge due to the reliance on complex heuristic optimization procedures. In this work, we experimentally demonstrate a novel approach to topological beam shaping using Jackiw-Rebbi states in metasurfaces. By fabricating thin-film dielectric structures with engineered Dirac-mass distributions, we create domain walls that allow precise control over beam profiles. We observe the emergence of Jackiw-Rebbi states and confirm their localized characteristics. Notably, we achieve a flat-top beam profile by carefully tailoring the Dirac mass distribution, highlighting the potential of this method for customized beam shaping. This experimental realization establishes our approach as a new mechanism for beam control, rooted in topological physics, and offers an efficient strategy for nanophotonic design.

Submitted 7 October, 2024; originally announced October 2024.

arXiv:2410.02465 [pdf, other]

Revealing the Inherent Instructability of Pre-Trained Language Models

Authors: Seokhyun An, Minji Kim, Hyounghun Kim

Abstract: Instruction tuning -- supervised fine-tuning using instruction-response pairs -- is a key step in making pre-trained large language models (LLMs) instructable. Meanwhile, LLMs perform multitask learning during their pre-training, acquiring extensive knowledge and capabilities. We hypothesize that the pre-training stage can enable them to develop the ability to comprehend and address instructions. To verify this, we propose Response Tuning (RT), which removes the instruction and its corresponding mapping to the response from instruction tuning. Instead, it focuses solely on establishing the response distribution. Our experiments demonstrate that RT models, trained only on responses, can effectively respond to a wide range of instructions and exhibit helpfulness approaching that of their instruction-tuned counterparts. In addition, we observe that the models can recognize and reject unsafe queries after learning the refusal conditions from training responses. Furthermore, we demonstrate that these observations also hold in an in-context learning setting. These findings support our hypothesis, highlighting the extensive inherent capabilities of pre-trained LLMs.

Submitted 16 February, 2025; v1 submitted 3 October, 2024; originally announced October 2024.

Comments: 31 pages

arXiv:2409.18164 [pdf]

Data-Prep-Kit: getting your data ready for LLM application development

Authors: David Wood, Boris Lublinsky, Alexy Roytman, Shivdeep Singh, Constantin Adam, Abdulhamid Adebayo, Sungeun An, Yuan Chi Chang, Xuan-Hong Dang, Nirmit Desai, Michele Dolfi, Hajar Emami-Gohari, Revital Eres, Takuya Goto, Dhiraj Joshi, Yan Koyfman, Mohammad Nassar, Hima Patel, Paramesvaran Selvam, Yousaf Shah, Saptha Surendran, Daiki Tsuzuku, Petros Zerfos, Shahrokh Daijavad

Abstract: Data preparation is the first and a very important step towards any Large Language Model (LLM) development. This paper introduces an easy-to-use, extensible, and scale-flexible open-source data preparation toolkit called Data Prep Kit (DPK). DPK is architected and designed to enable users to scale their data preparation to their needs. With DPK they can prepare data on a local machine or effortlessly scale to run on a cluster with thousands of CPU Cores. DPK comes with a highly scalable, yet extensible set of modules that transform natural language and code data. If the user needs additional transforms, they can be easily developed using extensive DPK support for transform creation. These modules can be used independently or pipelined to perform a series of operations. In this paper, we describe DPK architecture and show its performance from a small scale to a very large number of CPUs. The modules from DPK have been used for the preparation of Granite Models [1] [2]. We believe DPK is a valuable contribution to the AI community to easily prepare data to enhance the performance of their LLM models or to fine-tune models with Retrieval-Augmented Generation (RAG).

Submitted 12 November, 2024; v1 submitted 26 September, 2024; originally announced September 2024.

Comments: 10 pages, 7 figures

arXiv:2409.16913 [pdf, ps, other]

Tell Me What You Don't Know: Enhancing Refusal Capabilities of Role-Playing Agents via Representation Space Analysis and Editing

Authors: Wenhao Liu, Siyu An, Junru Lu, Muling Wu, Tianlong Li, Xiaohua Wang, Changze lv, Xiaoqing Zheng, Di Yin, Xing Sun, Xuanjing Huang

Abstract: Role-Playing Agents (RPAs) have shown remarkable performance in various applications, yet they often struggle to recognize and appropriately respond to hard queries that conflict with their role-play knowledge. To investigate RPAs' performance when faced with different types of conflicting requests, we develop an evaluation benchmark that includes contextual knowledge conflicting requests, parametric knowledge conflicting requests, and non-conflicting requests to assess RPAs' ability to identify conflicts and refuse to answer appropriately without over-refusing. Through extensive evaluation, we find that most RPAs exhibit significant performance gaps across different types of conflicting requests. To elucidate the reasons, we conduct an in-depth representation-level analysis of RPAs under various conflict scenarios. Our findings reveal the existence of rejection regions and direct response regions within the model's forwarding representation, which influence the RPA's final response behavior. Therefore, we introduce a lightweight representation editing approach that conveniently shifts conflicting requests to the rejection region, thereby enhancing the model's refusal accuracy. The experimental results validate the effectiveness of our editing method, improving RPAs' ability to refuse conflicting requests while maintaining their general role-playing capabilities.

Submitted 13 June, 2025; v1 submitted 25 September, 2024; originally announced September 2024.

Journal ref: Annual Meeting of the Association for Computational Linguistics (ACL), 2025, Findings

arXiv:2409.16202 [pdf, other]

CJEval: A Benchmark for Assessing Large Language Models Using Chinese Junior High School Exam Data

Authors: Qian-Wen Zhang, Haochen Wang, Fang Li, Siyu An, Lingfeng Qiao, Liangcai Gao, Di Yin, Xing Sun

Abstract: Online education platforms have significantly transformed the dissemination of educational resources by providing a dynamic and digital infrastructure. With the further enhancement of this transformation, the advent of Large Language Models (LLMs) has elevated the intelligence levels of these platforms. However, current academic benchmarks provide limited guidance for real-world industry scenarios. This limitation arises because educational applications require more than mere test question responses. To bridge this gap, we introduce CJEval, a benchmark based on Chinese Junior High School Exam Evaluations. CJEval consists of 26,136 samples across four application-level educational tasks covering ten subjects. These samples include not only questions and answers but also detailed annotations such as question types, difficulty levels, knowledge concepts, and answer explanations. By utilizing this benchmark, we assessed LLMs' potential applications and conducted a comprehensive analysis of their performance by fine-tuning on various educational tasks. Extensive experiments and discussions have highlighted the opportunities and challenges of applying LLMs in the field of education.

Submitted 24 September, 2024; v1 submitted 24 September, 2024; originally announced September 2024.

arXiv:2409.13403 [pdf, other]

Dynamic parameterized problems on unit disk graphs

Authors: Shinwoo An, Kyungjin Cho, Leo Jang, Byeonghyeon Jung, Yudam Lee, Eunjin Oh, Donghun Shin, Hyeonjun Shin, Chanho Song

Abstract: In this paper, we study fundamental parameterized problems such as $k$-Path/Cycle, Vertex Cover, Triangle Hitting Set, Feedback Vertex Set, and Cycle Packing for dynamic unit disk graphs. Given a vertex set $V$ changing dynamically under vertex insertions and deletions, our goal is to maintain data structures so that the aforementioned parameterized problems on the unit disk graph induced by $V$ can be solved efficiently. Although dynamic parameterized problems on general graphs have been studied extensively, no previous work focuses on unit disk graphs. In this paper, we present the first data structures for fundamental parameterized problems on dynamic unit disk graphs. More specifically, our data structure supports $2^{O(\sqrt{k})}$ update time and $O(k)$ query time for $k$-Path/Cycle. For the other problems, our data structures support $O(\log n)$ update time and $2^{O(\sqrt{k})}$ query time, where $k$ denotes the output size.

Submitted 20 September, 2024; originally announced September 2024.

Comments: To appear in ISAAC 2024

arXiv:2409.04763 [pdf]

Chalcogenide Metasurfaces Enabling Ultra-Wideband Detectors from Visible to Mid-infrared

Authors: Shutao Zhang, Shu An, Mingjin Dai, Qing Yang Steve Wu, Nur Qalishah Adanan, Jun Zhang, Yan Liu, Henry Yit Loong Lee, Nancy Lai Mun Wong, Ady Suwardi, Jun Ding, Robert Edward Simpson, Qi Jie Wang, Joel K. W. Yang, Zhaogang Dong

Abstract: Thermoelectric materials can be designed to support optical resonances across multiple spectral ranges to enable ultra-wideband photodetection. For instance, antimony telluride (Sb2Te3) chalcogenide exhibits interband plasmonic resonances in the visible range and Mie resonances in the mid-infrared (mid-IR) range, while simultaneously possessing large thermoelectric Seebeck coefficients. In this paper, we designed and fabricated Sb2Te3 metasurface devices to achieve resonant absorption, enabling photodetectors that operate across an ultra-wideband spectrum, from visible to mid-IR. Furthermore, relying on an asymmetric Sb2Te3 metasurface, we demonstrated thermoelectric photodetectors with polarization selectivity. This work provides a potential platform for portable ultra-wideband spectrometers operating at room temperature for environmental sensing applications.

Submitted 7 September, 2024; originally announced September 2024.

arXiv:2408.09591 [pdf, other]

Pre-assignment problem for unique minimum vertex cover on bounded clique-width graphs

Authors: Shinwoo An, Yeonsu Chang, Kyungjin Cho, O-joung Kwon, Myounghwan Lee, Eunjin Oh, Hyeonjun Shin

Abstract: Horiyama et al. (AAAI 2024) considered the problem of generating instances with a unique minimum vertex cover under certain conditions. The Pre-assignment for Uniquification of Minimum Vertex Cover problem (PAU-VC for short) asks, for a given graph $G$, to find a minimum set $S$ of vertices in $G$ such that there is a unique minimum vertex cover of $G$ containing $S$. We show that PAU-VC is fixed-parameter tractable parameterized by clique-width, which improves an exponential algorithm for trees given by Horiyama et al. Among natural graph classes with unbounded clique-width, we show that the problem can be solved in linear time on split graphs and unit interval graphs.

Submitted 22 August, 2024; v1 submitted 18 August, 2024; originally announced August 2024.

Comments: 19 pages, 3 figures

arXiv:2408.03541 [pdf, ps, other]

EXAONE 3.0 7.8B Instruction Tuned Language Model

Authors: LG AI Research: Soyoung An, Kyunghoon Bae, Eunbi Choi, Stanley Jungkyu Choi, Yemuk Choi, Seokhee Hong, Yeonjung Hong, Junwon Hwang, Hyojin Jeon, Gerrard Jeongwon Jo, Hyunjik Jo, Jiyeon Jung, Yountae Jung, Euisoon Kim, Hyosang Kim, Joonkee Kim, Seonghwan Kim, Soyeon Kim, Sunkyoung Kim, Yireun Kim, Youchul Kim, Edward Hwayoung Lee, Haeju Lee, et al. (14 additional authors not shown)

Abstract: We introduce EXAONE 3.0 instruction-tuned language model, the first open model in the family of Large Language Models (LLMs) developed by LG AI Research. Among different model sizes, we publicly release the 7.8B instruction-tuned model to promote open research and innovations. Through extensive evaluations across a wide range of public and in-house benchmarks, EXAONE 3.0 demonstrates highly competitive real-world performance with instruction-following capability against other state-of-the-art open models of similar size. Our comparative analysis shows that EXAONE 3.0 excels particularly in Korean, while achieving compelling performance across general tasks and complex reasoning. With its strong real-world effectiveness and bilingual proficiency, we hope that EXAONE keeps contributing to advancements in Expert AI. Our EXAONE 3.0 instruction-tuned model is available at https://huggingface.co/LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct

Submitted 13 August, 2024; v1 submitted 7 August, 2024; originally announced August 2024.

arXiv:2408.00644 [pdf, other]

doi 10.1145/3664647.3681443

Towards End-to-End Explainable Facial Action Unit Recognition via Vision-Language Joint Learning

Authors: Xuri Ge, Junchen Fu, Fuhai Chen, Shan An, Nicu Sebe, Joemon M. Jose

Abstract: Facial action units (AUs), as defined in the Facial Action Coding System (FACS), have received significant research interest owing to their diverse range of applications in facial state analysis. Current mainstream FAU recognition models have a notable limitation, i.e., focusing only on the accuracy of AU recognition and overlooking explanations of corresponding AU states. In this paper, we propose an end-to-end Vision-Language joint learning network for explainable FAU recognition (termed VL-FAU), which aims to reinforce AU representation capability and language interpretability through the integration of joint multimodal tasks. Specifically, VL-FAU brings together language models to generate fine-grained local muscle descriptions and distinguishable global face description when optimising FAU recognition. Through this, the global facial representation and its local AU representations will achieve higher distinguishability among different AUs and different subjects. In addition, multi-level AU representation learning is utilised to improve AU individual attention-aware representation capabilities based on multi-scale combined facial stem feature. Extensive experiments on DISFA and BP4D AU datasets show that the proposed approach achieves superior performance over the state-of-the-art methods on most of the metrics. In addition, compared with mainstream FAU recognition methods, VL-FAU can provide local- and global-level interpretability language descriptions with the AUs' predictions.

Submitted 1 August, 2024; originally announced August 2024.

Comments: 10 pages, 5 figures, 4 tables

Journal ref: ACM Multimedia 2024

arXiv:2408.00611 [pdf, other]

Using CSNNs to Perform Event-based Data Processing & Classification on ASL-DVS

Authors: Ria Patel, Sujit Tripathy, Zachary Sublett, Seoyoung An, Riya Patel

Abstract: Recent advancements in bio-inspired visual sensing and neuromorphic computing have led to the development of various highly efficient bio-inspired solutions with real-world applications. One notable application integrates event-based cameras with spiking neural networks (SNNs) to process event-based sequences that are asynchronous and sparse, making them difficult to handle. In this project, we develop a convolutional spiking neural network (CSNN) architecture that leverages convolutional operations and recurrent properties of a spiking neuron to learn the spatial and temporal relations in the ASL-DVS gesture dataset. The ASL-DVS gesture dataset is a neuromorphic dataset containing hand gestures when displaying 24 letters (A to Y, excluding J and Z due to the nature of their symbols) from the American Sign Language (ASL). We performed classification on a pre-processed subset of the full ASL-DVS dataset to identify letter signs and achieved 100% training accuracy. Specifically, this was achieved by training in the Google Cloud compute platform while using a learning rate of 0.0005, batch size of 25 (total of 20 batches), 200 iterations, and 10 epochs.

Submitted 1 August, 2024; originally announced August 2024.

Comments: 8 pages, 14 figures

arXiv:2408.00359 [pdf, other]

Memorization Capacity for Additive Fine-Tuning with Small ReLU Networks

Authors: Jy-yong Sohn, Dohyun Kwon, Seoyeon An, Kangwook Lee

Abstract: Fine-tuning large pre-trained models is a common practice in machine learning applications, yet its mathematical analysis remains largely unexplored. In this paper, we study fine-tuning through the lens of memorization capacity. Our new measure, the Fine-Tuning Capacity (FTC), is defined as the maximum number of samples a neural network can fine-tune, or equivalently, as the minimum number of neurons ($m$) needed to arbitrarily change $N$ labels among $K$ samples considered in the fine-tuning process. In essence, FTC extends the memorization capacity concept to the fine-tuning scenario. We analyze FTC for the additive fine-tuning scenario where the fine-tuned network is defined as the summation of the frozen pre-trained network $f$ and a neural network $g$ (with $m$ neurons) designed for fine-tuning. When $g$ is a ReLU network with either 2 or 3 layers, we obtain tight upper and lower bounds on FTC; we show that $N$ samples can be fine-tuned with $m=\Theta(N)$ neurons for 2-layer networks, and with $m=\Theta(\sqrt{N})$ neurons for 3-layer networks, no matter how large $K$ is. Our results recover the known memorization capacity results when $N = K$ as a special case.

Submitted 19 August, 2024; v1 submitted 1 August, 2024; originally announced August 2024.

Comments: 10 pages, 9 figures, UAI 2024

arXiv:2407.13808 [pdf, other]

CoAPT: Context Attribute words for Prompt Tuning

Authors: Gun Lee, Subin An, Sungyong Baik, Soochahn Lee

Abstract: We propose a novel prompt tuning method called CoAPT (Context Attribute words in Prompt Tuning) for few/zero-shot image classification. The core motivation is that attributes are descriptive words with rich information about a given concept. Thus, we aim to enrich text queries of existing prompt tuning methods, improving alignment between text and image embeddings in CLIP embedding space. To do so, CoAPT integrates attribute words as additional prompts within learnable prompt tuning and can be easily incorporated into various existing prompt tuning methods. To facilitate the incorporation of attributes into text embeddings and the alignment with image embeddings, soft prompts are trained together with an additional meta-network that generates input-image-wise feature biases from the concatenated feature encodings of the image-text combined queries. Our experiments demonstrate that CoAPT leads to considerable improvements for existing baseline methods on several few/zero-shot image classification tasks, including base-to-novel generalization, cross-dataset transfer, and domain generalization. Our findings highlight the importance of combining hard and soft prompts and pave the way for future research on the interplay between text and image latent spaces in pre-trained models.

Submitted 18 July, 2024; originally announced July 2024.

Comments: 14 pages, 4 figures

arXiv:2407.11372 [pdf, other]

UNIT: Backdoor Mitigation via Automated Neural Distribution Tightening

Authors: Siyuan Cheng, Guangyu Shen, Kaiyuan Zhang, Guanhong Tao, Shengwei An, Hanxi Guo, Shiqing Ma, Xiangyu Zhang

Abstract: Deep neural networks (DNNs) have demonstrated effectiveness in various fields. However, DNNs are vulnerable to backdoor attacks, which inject a unique pattern, called a trigger, into the input to cause misclassification to an attack-chosen target label. While existing works have proposed various methods to mitigate backdoor effects in poisoned models, they tend to be less effective against recent advanced attacks. In this paper, we introduce a novel post-training defense technique UNIT that can effectively eliminate backdoor effects for a variety of attacks. Specifically, UNIT approximates a unique and tight activation distribution for each neuron in the model. It then proactively dispels substantially large activation values that exceed the approximated boundaries. Our experimental results demonstrate that UNIT outperforms 7 popular defense methods against 14 existing backdoor attacks, including 2 advanced attacks, using only 5% of clean training data. UNIT is also cost efficient. The code is accessible at https://github.com/Megum1/UNIT.

Submitted 16 July, 2024; originally announced July 2024.

Comments: The 18th European Conference on Computer Vision ECCV 2024

arXiv:2407.03014 [pdf]

Dielectric Fano Nanoantennas for Enabling Sub-Nanosecond Lifetimes in NV-based Single Photon Emitters

Authors: Shu An, Dmitry Kalashnikov, Wenqiao Shi, Zackaria Mahfoud, Ah Bian Chew, Yan Liu, Jing Wu, Di Zhu, Weibo Gao, Cheng-Wei Qiu, Victor Leong, Zhaogang Dong

Abstract: Solid-state quantum emitters are essential sources of single photons, and enhancing their emission rates is of paramount importance for applications in quantum communications, computing, and metrology. One approach is to couple quantum emitters with resonant photonic nanostructures, where the emission rate is enhanced due to the Purcell effect. Dielectric nanoantennas are promising as they provide strong emission enhancement compared to plasmonic ones, which suffer from high Ohmic loss. Here, we designed and fabricated a dielectric Fano resonator based on a pair of silicon (Si) ellipses and a disk, which supports the mode hybridization between quasi-bound-states-in-the-continuum (quasi-BIC) and Mie resonance. We demonstrated the performance of the developed resonant system by interfacing it with single photon emitters (SPEs) based on nitrogen-vacancy (NV-) centers in nanodiamonds (NDs). We observed that the interfaced emitters have a Purcell enhancement factor of ~10, with sub-ns emission lifetime and a polarization contrast of 9. Our results indicate a promising method for developing efficient and compact single-photon sources for integrated quantum photonics applications.

Submitted 3 July, 2024; originally announced July 2024.

Comments: 20 pages, 4 figures

arXiv:2407.02536 [pdf, other]

doi 10.4230/LIPIcs.GIScience.2023.3

Reducing False Discoveries in Statistically-Significant Regional-Colocation Mining: A Summary of Results

Authors: Subhankar Ghosh, Jayant Gupta, Arun Sharma, Shuai An, Shashi Shekhar

Abstract: Given a set $S$ of spatial feature types, its feature instances, a study area, and a neighbor relationship, the goal is to find pairs <a region ($r_{g}$), a subset $C$ of $S$> such that $C$ is a statistically significant regional-colocation pattern in $r_{g}$. This problem is important for applications in various domains including ecology, economics, and sociology. The problem is computationally challenging due to the exponential number of regional colocation patterns and candidate regions. Previously, we proposed a miner (doi:10.1145/3557989.3566158) that finds statistically significant regional colocation patterns. However, the numerous simultaneous statistical inferences raise the risk of false discoveries (also known as the multiple comparisons problem) and carry a high computational cost. We propose a novel algorithm, namely, multiple comparisons regional colocation miner (MultComp-RCM) which uses a Bonferroni correction. Theoretical analysis, experimental evaluation, and case study results show that the proposed method reduces both the false discovery rate and computational cost.

Submitted 1 July, 2024; originally announced July 2024.

ACM Class: E.m; F.2; E.1; H.3; I.5; J.0

arXiv:2407.00256 [pdf, other]

One Prompt is not Enough: Automated Construction of a Mixture-of-Expert Prompts

Authors: Ruochen Wang, Sohyun An, Minhao Cheng, Tianyi Zhou, Sung Ju Hwang, Cho-Jui Hsieh

Abstract: Large Language Models (LLMs) exhibit strong generalization capabilities to novel tasks when prompted with language instructions and in-context demos. Since this ability sensitively depends on the quality of prompts, various methods have been explored to automate the instruction design. While these methods demonstrated promising results, they also restricted the searched prompt to one instruction. Such simplification significantly limits their capacity, as a single demo-free instruction might not be able to cover the entire complex problem space of the targeted task. To alleviate this issue, we adopt the Mixture-of-Expert paradigm and divide the problem space into a set of sub-regions; each sub-region is governed by a specialized expert, equipped with both an instruction and a set of demos. A two-phase process is developed to construct the specialized expert for each region: (1) demo assignment: Inspired by the theoretical connection between in-context learning and kernel regression, we group demos into experts based on their semantic similarity; (2) instruction assignment: A region-based joint search of an instruction per expert complements the demos assigned to it, yielding a synergistic effect. The resulting method, codenamed Mixture-of-Prompts (MoP), achieves an average win rate of 81% against prior art across several major benchmarks.

Submitted 28 June, 2024; originally announced July 2024.

Comments: ICML 2024. Code available at https://github.com/ruocwang/mixture-of-prompts

MSC Class: 68T01

Journal ref: Proceedings of the 41st International Conference on Machine Learning (ICML), Vienna, Austria, 2024

Showing 1–50 of 260 results for author: An, S