-
Improving RL Exploration for LLM Reasoning through Retrospective Replay
Authors:
Shihan Dou,
Muling Wu,
Jingwen Xu,
Rui Zheng,
Tao Gui,
Qi Zhang,
Xuanjing Huang
Abstract:
Reinforcement learning (RL) has increasingly become a pivotal technique in the post-training of large language models (LLMs). The effective exploration of the output space is essential for the success of RL. We observe that for complex problems, during the early stages of training, the model exhibits strong exploratory capabilities and can identify promising solution ideas. However, its limited ca…
▽ More
Reinforcement learning (RL) has increasingly become a pivotal technique in the post-training of large language models (LLMs). The effective exploration of the output space is essential for the success of RL. We observe that for complex problems, during the early stages of training, the model exhibits strong exploratory capabilities and can identify promising solution ideas. However, its limited capability at this stage prevents it from successfully solving these problems. The early suppression of these potentially valuable solution ideas by the policy gradient hinders the model's ability to revisit and re-explore these ideas later. Consequently, although the LLM's capabilities improve in the later stages of training, it still struggles to effectively address these complex problems. To address this exploration issue, we propose a novel algorithm named Retrospective Replay-based Reinforcement Learning (RRL), which introduces a dynamic replay mechanism throughout the training process. RRL enables the model to revisit promising states identified in the early stages, thereby improving its efficiency and effectiveness in exploration. To evaluate the effectiveness of RRL, we conduct extensive experiments on complex reasoning tasks, including mathematical reasoning and code generation, and general dialogue tasks. The results indicate that RRL maintains high exploration efficiency throughout the training period, significantly enhancing the effectiveness of RL in optimizing LLMs for complicated reasoning tasks. Moreover, it also improves the performance of RLHF, making the model both safer and more helpful.
△ Less
Submitted 19 April, 2025;
originally announced April 2025.
-
Seed1.5-Thinking: Advancing Superb Reasoning Models with Reinforcement Learning
Authors:
ByteDance Seed,
:,
Jiaze Chen,
Tiantian Fan,
Xin Liu,
Lingjun Liu,
Zhiqi Lin,
Mingxuan Wang,
Chengyi Wang,
Xiangpeng Wei,
Wenyuan Xu,
Yufeng Yuan,
Yu Yue,
Lin Yan,
Qiying Yu,
Xiaochen Zuo,
Chi Zhang,
Ruofei Zhu,
Zhecheng An,
Zhihao Bai,
Yu Bao,
Xingyan Bin,
Jiangjie Chen,
Feng Chen,
Hongmin Chen
, et al. (249 additional authors not shown)
Abstract:
We introduce Seed1.5-Thinking, capable of reasoning through thinking before responding, resulting in improved performance on a wide range of benchmarks. Seed1.5-Thinking achieves 86.7 on AIME 2024, 55.0 on Codeforces and 77.3 on GPQA, demonstrating excellent reasoning abilities in STEM and coding. Beyond reasoning tasks, the method demonstrates notable generalization across diverse domains. For in…
▽ More
We introduce Seed1.5-Thinking, capable of reasoning through thinking before responding, resulting in improved performance on a wide range of benchmarks. Seed1.5-Thinking achieves 86.7 on AIME 2024, 55.0 on Codeforces and 77.3 on GPQA, demonstrating excellent reasoning abilities in STEM and coding. Beyond reasoning tasks, the method demonstrates notable generalization across diverse domains. For instance, it surpasses DeepSeek R1 by 8% in win rate on non-reasoning tasks, indicating its broader applicability. Compared to other state-of-the-art reasoning models, Seed1.5-Thinking is a Mixture-of-Experts (MoE) model with a relatively small size, featuring 20B activated and 200B total parameters. As part of our effort to assess generalized reasoning, we develop two internal benchmarks, BeyondAIME and Codeforces, both of which will be publicly released to support future research. Model trial link: https://www.volcengine.com/experience/ark.
△ Less
Submitted 29 April, 2025; v1 submitted 10 April, 2025;
originally announced April 2025.
-
PFDial: A Structured Dialogue Instruction Fine-tuning Method Based on UML Flowcharts
Authors:
Ming Zhang,
Yuhui Wang,
Yujiong Shen,
Tingyi Yang,
Changhao Jiang,
Yilong Wu,
Shihan Dou,
Qinhao Chen,
Zhiheng Xi,
Zhihao Zhang,
Yi Dong,
Zhen Wang,
Zhihui Fei,
Mingyang Wan,
Tao Liang,
Guojun Ma,
Qi Zhang,
Tao Gui,
Xuanjing Huang
Abstract:
Process-driven dialogue systems, which operate under strict predefined process constraints, are essential in customer service and equipment maintenance scenarios. Although Large Language Models (LLMs) have shown remarkable progress in dialogue and reasoning, they still struggle to solve these strictly constrained dialogue tasks. To address this challenge, we construct Process Flow Dialogue (PFDial…
▽ More
Process-driven dialogue systems, which operate under strict predefined process constraints, are essential in customer service and equipment maintenance scenarios. Although Large Language Models (LLMs) have shown remarkable progress in dialogue and reasoning, they still struggle to solve these strictly constrained dialogue tasks. To address this challenge, we construct Process Flow Dialogue (PFDial) dataset, which contains 12,705 high-quality Chinese dialogue instructions derived from 440 flowcharts containing 5,055 process nodes. Based on PlantUML specification, each UML flowchart is converted into atomic dialogue units i.e., structured five-tuples. Experimental results demonstrate that a 7B model trained with merely 800 samples, and a 0.5B model trained on total data both can surpass 90% accuracy. Additionally, the 8B model can surpass GPT-4o up to 43.88% with an average of 11.00%. We further evaluate models' performance on challenging backward transitions in process flows and conduct an in-depth analysis of various dataset formats to reveal their impact on model performance in handling decision and sequential branches. The data is released in https://github.com/KongLongGeFDU/PFDial.
△ Less
Submitted 9 March, 2025;
originally announced March 2025.
-
Detecting Essence Code Clones via Information Theoretic Analysis
Authors:
Lida Zhao,
Shihan Dou,
Yutao Hu,
Yueming Wu,
Jiahui Wu,
Chengwei Liu,
Lyuye Zhang,
Yi Liu,
Jun Sun,
Xuanjing Huang,
Yang Liu
Abstract:
Code cloning, a widespread practice in software development, involves replicating code fragments to save time but often at the expense of software maintainability and quality. In this paper, we address the specific challenge of detecting "essence clones", a complex subtype of Type-3 clones characterized by sharing critical logic despite different peripheral codes. Traditional techniques often fail…
▽ More
Code cloning, a widespread practice in software development, involves replicating code fragments to save time but often at the expense of software maintainability and quality. In this paper, we address the specific challenge of detecting "essence clones", a complex subtype of Type-3 clones characterized by sharing critical logic despite different peripheral codes. Traditional techniques often fail to detect essence clones due to their syntactic focus. To overcome this limitation, we introduce ECScan, a novel detection tool that leverages information theory to assess the semantic importance of code lines. By assigning weights to each line based on its information content, ECScan emphasizes core logic over peripheral code differences. Our comprehensive evaluation across various real-world projects shows that ECScan significantly outperforms existing tools in detecting essence clones, achieving an average F1-score of 85%. It demonstrates robust performance across all clone types and offers exceptional scalability. This study advances clone detection by providing a practical tool for developers to enhance code quality and reduce maintenance burdens, emphasizing the semantic aspects of code through an innovative information-theoretic approach.
△ Less
Submitted 26 February, 2025;
originally announced February 2025.
-
Measuring Data Diversity for Instruction Tuning: A Systematic Analysis and A Reliable Metric
Authors:
Yuming Yang,
Yang Nan,
Junjie Ye,
Shihan Dou,
Xiao Wang,
Shuo Li,
Huijie Lv,
Mingqi Wu,
Tao Gui,
Qi Zhang,
Xuanjing Huang
Abstract:
Data diversity is crucial for the instruction tuning of large language models. Existing studies have explored various diversity-aware data selection methods to construct high-quality datasets and enhance model performance. However, the fundamental problem of precisely defining and measuring data diversity remains underexplored, limiting clear guidance for data engineering. To address this, we syst…
▽ More
Data diversity is crucial for the instruction tuning of large language models. Existing studies have explored various diversity-aware data selection methods to construct high-quality datasets and enhance model performance. However, the fundamental problem of precisely defining and measuring data diversity remains underexplored, limiting clear guidance for data engineering. To address this, we systematically analyze 11 existing diversity measurement methods by evaluating their correlation with model performance through extensive fine-tuning experiments. Our results indicate that a reliable diversity measure should properly account for both inter-sample differences and the information distribution in the sample space. Building on this, we propose NovelSum, a new diversity metric based on sample-level "novelty." Experiments on both simulated and real-world data show that NovelSum accurately captures diversity variations and achieves a 0.97 correlation with instruction-tuned model performance, highlighting its value in guiding data engineering practices. With NovelSum as an optimization objective, we further develop a greedy, diversity-oriented data selection strategy that outperforms existing approaches, validating both the effectiveness and practical significance of our metric.
△ Less
Submitted 28 February, 2025; v1 submitted 24 February, 2025;
originally announced February 2025.
-
SMI: An Information-Theoretic Metric for Predicting Model Knowledge Solely from Pre-Training Signals
Authors:
Changhao Jiang,
Ming Zhang,
Junjie Ye,
Xiaoran Fan,
Yifei Cao,
Jiajun Sun,
Zhiheng Xi,
Shihan Dou,
Yi Dong,
Yujiong Shen,
Jingqi Tong,
Zhen Wang,
Tao Liang,
Zhihui Fei,
Mingyang Wan,
Guojun Ma,
Qi Zhang,
Tao Gui,
Xuanjing Huang
Abstract:
The GPT-4 technical report highlights the possibility of predicting model performance on downstream tasks using only pre-training signals, though detailed methodologies are absent. Such predictive capabilities are essential for resource-efficient pre-training and the construction of task-aligned datasets. In this paper, we aim to predict performance in closed-book question answering (QA), a vital…
▽ More
The GPT-4 technical report highlights the possibility of predicting model performance on downstream tasks using only pre-training signals, though detailed methodologies are absent. Such predictive capabilities are essential for resource-efficient pre-training and the construction of task-aligned datasets. In this paper, we aim to predict performance in closed-book question answering (QA), a vital downstream task indicative of a model's internal knowledge. We address three primary challenges: (1) limited access to and understanding of pre-training corpora, (2) limitations of current evaluation methods for pre-trained models, and (3) limitations of frequency-based metrics in predicting model performance. In response to these challenges, we conduct large-scale retrieval and semantic analysis across the pre-training corpora of 21 publicly available and 3 custom-trained large language models. Subsequently, we develop a multi-template QA evaluation framework incorporating paraphrased question variants. Building on these foundations, we propose Size-dependent Mutual Information (SMI), an information-theoretic metric that linearly correlates pre-training data characteristics, model size, and QA accuracy, without requiring any additional training. The experimental results demonstrate that SMI outperforms co-occurrence-based baselines, achieving $R^2$ > 0.75 on models with over one billion parameters. Theoretical analysis further reveals the marginal benefits of scaling model size and optimizing data, indicating that the upper limit of specific QA task accuracy is approximately 80%. Our project is available at https://github.com/yuhui1038/SMI.
△ Less
Submitted 13 May, 2025; v1 submitted 6 February, 2025;
originally announced February 2025.
-
Graph Neural Network-Driven Hierarchical Mining for Complex Imbalanced Data
Authors:
Yijiashun Qi,
Quanchao Lu,
Shiyu Dou,
Xiaoxuan Sun,
Muqing Li,
Yankaiqi Li
Abstract:
This study presents a hierarchical mining framework for high-dimensional imbalanced data, leveraging a depth graph model to address the inherent performance limitations of conventional approaches in handling complex, high-dimensional data distributions with imbalanced sample representations. By constructing a structured graph representation of the dataset and integrating graph neural network (GNN)…
▽ More
This study presents a hierarchical mining framework for high-dimensional imbalanced data, leveraging a depth graph model to address the inherent performance limitations of conventional approaches in handling complex, high-dimensional data distributions with imbalanced sample representations. By constructing a structured graph representation of the dataset and integrating graph neural network (GNN) embeddings, the proposed method effectively captures global interdependencies among samples. Furthermore, a hierarchical strategy is employed to enhance the characterization and extraction of minority class feature patterns, thereby facilitating precise and robust imbalanced data mining. Empirical evaluations across multiple experimental scenarios validate the efficacy of the proposed approach, demonstrating substantial improvements over traditional methods in key performance metrics, including pattern discovery count, average support, and minority class coverage. Notably, the method exhibits superior capabilities in minority-class feature extraction and pattern correlation analysis. These findings underscore the potential of depth graph models, in conjunction with hierarchical mining strategies, to significantly enhance the efficiency and accuracy of imbalanced data analysis. This research contributes a novel computational framework for high-dimensional complex data processing and lays the foundation for future extensions to dynamically evolving imbalanced data and multi-modal data applications, thereby expanding the applicability of advanced data mining methodologies to more intricate analytical domains.
△ Less
Submitted 6 February, 2025;
originally announced February 2025.
-
A Structured Reasoning Framework for Unbalanced Data Classification Using Probabilistic Models
Authors:
Junliang Du,
Shiyu Dou,
Bohuan Yang,
Jiacheng Hu,
Tai An
Abstract:
This paper studies a Markov network model for unbalanced data, aiming to solve the problems of classification bias and insufficient minority class recognition ability of traditional machine learning models in environments with uneven class distribution. By constructing joint probability distribution and conditional dependency, the model can achieve global modeling and reasoning optimization of sam…
▽ More
This paper studies a Markov network model for unbalanced data, aiming to solve the problems of classification bias and insufficient minority class recognition ability of traditional machine learning models in environments with uneven class distribution. By constructing joint probability distribution and conditional dependency, the model can achieve global modeling and reasoning optimization of sample categories. The study introduced marginal probability estimation and weighted loss optimization strategies, combined with regularization constraints and structured reasoning methods, effectively improving the generalization ability and robustness of the model. In the experimental stage, a real credit card fraud detection dataset was selected and compared with models such as logistic regression, support vector machine, random forest and XGBoost. The experimental results show that the Markov network performs well in indicators such as weighted accuracy, F1 score, and AUC-ROC, significantly outperforming traditional classification models, demonstrating its strong decision-making ability and applicability in unbalanced data scenarios. Future research can focus on efficient model training, structural optimization, and deep learning integration in large-scale unbalanced data environments and promote its wide application in practical applications such as financial risk control, medical diagnosis, and intelligent monitoring.
△ Less
Submitted 5 February, 2025;
originally announced February 2025.
-
DocFusion: A Unified Framework for Document Parsing Tasks
Authors:
Mingxu Chai,
Ziyu Shen,
Chong Zhang,
Yue Zhang,
Xiao Wang,
Shihan Dou,
Jihua Kang,
Jiazheng Zhang,
Qi Zhang
Abstract:
Document parsing is essential for analyzing complex document structures and extracting fine-grained information, supporting numerous downstream applications. However, existing methods often require integrating multiple independent models to handle various parsing tasks, leading to high complexity and maintenance overhead. To address this, we propose DocFusion, a lightweight generative model with o…
▽ More
Document parsing is essential for analyzing complex document structures and extracting fine-grained information, supporting numerous downstream applications. However, existing methods often require integrating multiple independent models to handle various parsing tasks, leading to high complexity and maintenance overhead. To address this, we propose DocFusion, a lightweight generative model with only 0.28B parameters. It unifies task representations and achieves collaborative training through an improved objective function. Experiments reveal and leverage the mutually beneficial interaction among recognition tasks, and integrating recognition data significantly enhances detection performance. The final results demonstrate that DocFusion achieves state-of-the-art (SOTA) performance across four key tasks.
△ Less
Submitted 16 December, 2024;
originally announced December 2024.
-
AuctionNet: A Novel Benchmark for Decision-Making in Large-Scale Games
Authors:
Kefan Su,
Yusen Huo,
Zhilin Zhang,
Shuai Dou,
Chuan Yu,
Jian Xu,
Zongqing Lu,
Bo Zheng
Abstract:
Decision-making in large-scale games is an essential research area in artificial intelligence (AI) with significant real-world impact. However, the limited access to realistic large-scale game environments has hindered research progress in this area. In this paper, we present AuctionNet, a benchmark for bid decision-making in large-scale ad auctions derived from a real-world online advertising pla…
▽ More
Decision-making in large-scale games is an essential research area in artificial intelligence (AI) with significant real-world impact. However, the limited access to realistic large-scale game environments has hindered research progress in this area. In this paper, we present AuctionNet, a benchmark for bid decision-making in large-scale ad auctions derived from a real-world online advertising platform. AuctionNet is composed of three parts: an ad auction environment, a pre-generated dataset based on the environment, and performance evaluations of several baseline bid decision-making algorithms. More specifically, the environment effectively replicates the integrity and complexity of real-world ad auctions through the interaction of several modules: the ad opportunity generation module employs deep generative networks to bridge the gap between simulated and real-world data while mitigating the risk of sensitive data exposure; the bidding module implements diverse auto-bidding agents trained with different decision-making algorithms; and the auction module is anchored in the classic Generalized Second Price (GSP) auction but also allows for customization of auction mechanisms as needed. To facilitate research and provide insights into the environment, we have also pre-generated a substantial dataset based on the environment. The dataset contains 10 million ad opportunities, 48 diverse auto-bidding agents, and over 500 million auction records. Performance evaluations of baseline algorithms such as linear programming, reinforcement learning, and generative models for bid decision-making are also presented as a part of AuctionNet. We believe that AuctionNet is applicable not only to research on bid decision-making in ad auctions but also to the general area of decision-making in large-scale games.
△ Less
Submitted 28 December, 2024; v1 submitted 14 December, 2024;
originally announced December 2024.
-
Multi-Programming Language Sandbox for LLMs
Authors:
Shihan Dou,
Jiazheng Zhang,
Jianxiang Zang,
Yunbo Tao,
Weikang Zhou,
Haoxiang Jia,
Shichun Liu,
Yuming Yang,
Zhiheng Xi,
Shenxi Wu,
Shaoqing Zhang,
Muling Wu,
Changze Lv,
Limao Xiong,
Wenyu Zhan,
Lin Zhang,
Rongxiang Weng,
Jingang Wang,
Xunliang Cai,
Yueming Wu,
Ming Wen,
Rui Zheng,
Tao Ji,
Yixin Cao,
Tao Gui
, et al. (3 additional authors not shown)
Abstract:
We introduce MPLSandbox, an out-of-the-box multi-programming language sandbox designed to provide unified and comprehensive feedback from compiler and analysis tools for Large Language Models (LLMs). It can automatically identify the programming language of the code, compiling and executing it within an isolated sub-sandbox to ensure safety and stability. In addition, MPLSandbox also integrates bo…
▽ More
We introduce MPLSandbox, an out-of-the-box multi-programming language sandbox designed to provide unified and comprehensive feedback from compiler and analysis tools for Large Language Models (LLMs). It can automatically identify the programming language of the code, compiling and executing it within an isolated sub-sandbox to ensure safety and stability. In addition, MPLSandbox also integrates both traditional and LLM-based code analysis tools, providing a comprehensive analysis of generated code. MPLSandbox can be effortlessly integrated into the training and deployment of LLMs to improve the quality and correctness of their generated code. It also helps researchers streamline their workflows for various LLM-based code-related tasks, reducing the development cost. To validate the effectiveness of MPLSandbox, we integrate it into training and deployment approaches, and also employ it to optimize workflows for a wide range of real-world code-related tasks. Our goal is to enhance researcher productivity on LLM-based code-related tasks by simplifying and automating workflows through delegation to MPLSandbox.
△ Less
Submitted 5 November, 2024; v1 submitted 30 October, 2024;
originally announced October 2024.
-
RMB: Comprehensively Benchmarking Reward Models in LLM Alignment
Authors:
Enyu Zhou,
Guodong Zheng,
Binghai Wang,
Zhiheng Xi,
Shihan Dou,
Rong Bao,
Wei Shen,
Limao Xiong,
Jessica Fan,
Yurong Mou,
Rui Zheng,
Tao Gui,
Qi Zhang,
Xuanjing Huang
Abstract:
Reward models (RMs) guide the alignment of large language models (LLMs), steering them toward behaviors preferred by humans. Evaluating RMs is the key to better aligning LLMs. However, the current evaluation of RMs may not directly correspond to their alignment performance due to the limited distribution of evaluation data and evaluation methods that are not closely related to alignment objectives…
▽ More
Reward models (RMs) guide the alignment of large language models (LLMs), steering them toward behaviors preferred by humans. Evaluating RMs is the key to better aligning LLMs. However, the current evaluation of RMs may not directly correspond to their alignment performance due to the limited distribution of evaluation data and evaluation methods that are not closely related to alignment objectives. To address these limitations, we propose RMB, a comprehensive RM benchmark that covers over 49 real-world scenarios and includes both pairwise and Best-of-N (BoN) evaluations to better reflect the effectiveness of RMs in guiding alignment optimization. We demonstrate a positive correlation between our benchmark and the downstream alignment task performance. Based on our benchmark, we conduct extensive analysis on the state-of-the-art RMs, revealing their generalization defects that were not discovered by previous benchmarks, and highlighting the potential of generative RMs. Furthermore, we delve into open questions in reward models, specifically examining the effectiveness of majority voting for the evaluation of reward models and analyzing the impact factors of generative RMs, including the influence of evaluation criteria and instructing methods. Our evaluation code and datasets are available at https://github.com/Zhou-Zoey/RMB-Reward-Model-Benchmark.
△ Less
Submitted 4 April, 2025; v1 submitted 13 October, 2024;
originally announced October 2024.
-
TransferTOD: A Generalizable Chinese Multi-Domain Task-Oriented Dialogue System with Transfer Capabilities
Authors:
Ming Zhang,
Caishuang Huang,
Yilong Wu,
Shichun Liu,
Huiyuan Zheng,
Yurui Dong,
Yujiong Shen,
Shihan Dou,
Jun Zhao,
Junjie Ye,
Qi Zhang,
Tao Gui,
Xuanjing Huang
Abstract:
Task-oriented dialogue (TOD) systems aim to efficiently handle task-oriented conversations, including information collection. How to utilize TOD accurately, efficiently and effectively for information collection has always been a critical and challenging task. Recent studies have demonstrated that Large Language Models (LLMs) excel in dialogue, instruction generation, and reasoning, and can signif…
▽ More
Task-oriented dialogue (TOD) systems aim to efficiently handle task-oriented conversations, including information collection. How to utilize TOD accurately, efficiently and effectively for information collection has always been a critical and challenging task. Recent studies have demonstrated that Large Language Models (LLMs) excel in dialogue, instruction generation, and reasoning, and can significantly enhance the performance of TOD through fine-tuning. However, current datasets primarily cater to user-led systems and are limited to predefined specific scenarios and slots, thereby necessitating improvements in the proactiveness, diversity, and capabilities of TOD. In this study, we present a detailed multi-domain task-oriented data construction process for conversations, and a Chinese dialogue dataset generated based on this process, TransferTOD, which authentically simulates human-computer dialogues in 30 popular life service scenarios. Leveraging this dataset, we trained a model called TransferTOD-7B using full-parameter fine-tuning, showcasing notable abilities in slot filling and questioning. Our work has demonstrated its strong generalization capabilities in various downstream scenarios, significantly enhancing both data utilization efficiency and system performance. The data is released in https://github.com/KongLongGeFDU/TransferTOD.
△ Less
Submitted 12 October, 2024; v1 submitted 31 July, 2024;
originally announced July 2024.
-
What's Wrong with Your Code Generated by Large Language Models? An Extensive Study
Authors:
Shihan Dou,
Haoxiang Jia,
Shenxi Wu,
Huiyuan Zheng,
Weikang Zhou,
Muling Wu,
Mingxu Chai,
Jessica Fan,
Caishuang Huang,
Yunbo Tao,
Yan Liu,
Enyu Zhou,
Ming Zhang,
Yuhao Zhou,
Yueming Wu,
Rui Zheng,
Ming Wen,
Rongxiang Weng,
Jingang Wang,
Xunliang Cai,
Tao Gui,
Xipeng Qiu,
Qi Zhang,
Xuanjing Huang
Abstract:
The increasing development of large language models (LLMs) in code generation has drawn significant attention among researchers. To enhance LLM-based code generation ability, current efforts are predominantly directed towards collecting high-quality datasets and leveraging diverse training technologies. However, there is a notable lack of comprehensive studies examining the limitations and boundar…
▽ More
The increasing development of large language models (LLMs) in code generation has drawn significant attention among researchers. To enhance LLM-based code generation ability, current efforts are predominantly directed towards collecting high-quality datasets and leveraging diverse training technologies. However, there is a notable lack of comprehensive studies examining the limitations and boundaries of these existing methods. To bridge this gap, we conducted an extensive empirical study evaluating the performance of three leading closed-source LLMs and four popular open-source LLMs on three commonly used benchmarks. Our investigation, which evaluated the length, cyclomatic complexity and API number of the generated code, revealed that these LLMs face challenges in generating successful code for more complex problems, and tend to produce code that is shorter yet more complicated as compared to canonical solutions. Additionally, we developed a taxonomy of bugs for incorrect codes that includes three categories and 12 sub-categories, and analyze the root cause for common bug types. Furthermore, to better understand the performance of LLMs in real-world projects, we manually created a real-world benchmark comprising 140 code generation tasks. Our analysis highlights distinct differences in bug distributions between actual scenarios and existing benchmarks. Finally, we propose a novel training-free iterative method that introduces self-critique, enabling LLMs to critique and correct their generated code based on bug types and compiler feedback. Experimental results demonstrate that our approach can significantly mitigate bugs and increase the passing rate by 29.2% after two iterations, indicating substantial potential for LLMs to handle more complex problems.
△ Less
Submitted 8 July, 2024;
originally announced July 2024.
-
SafeAligner: Safety Alignment against Jailbreak Attacks via Response Disparity Guidance
Authors:
Caishuang Huang,
Wanxu Zhao,
Rui Zheng,
Huijie Lv,
Wenyu Zhan,
Shihan Dou,
Sixian Li,
Xiao Wang,
Enyu Zhou,
Junjie Ye,
Yuming Yang,
Tao Gui,
Qi Zhang,
Xuanjing Huang
Abstract:
As the development of large language models (LLMs) rapidly advances, securing these models effectively without compromising their utility has become a pivotal area of research. However, current defense strategies against jailbreak attacks (i.e., efforts to bypass security protocols) often suffer from limited adaptability, restricted general capability, and high cost. To address these challenges, w…
▽ More
As the development of large language models (LLMs) rapidly advances, securing these models effectively without compromising their utility has become a pivotal area of research. However, current defense strategies against jailbreak attacks (i.e., efforts to bypass security protocols) often suffer from limited adaptability, restricted general capability, and high cost. To address these challenges, we introduce SafeAligner, a methodology implemented at the decoding stage to fortify defenses against jailbreak attacks. We begin by developing two specialized models: the Sentinel Model, which is trained to foster safety, and the Intruder Model, designed to generate riskier responses. SafeAligner leverages the disparity in security levels between the responses from these models to differentiate between harmful and beneficial tokens, effectively guiding the safety alignment by altering the output token distribution of the target model. Extensive experiments show that SafeAligner can increase the likelihood of beneficial tokens, while reducing the occurrence of harmful ones, thereby ensuring secure alignment with minimal loss to generality.
△ Less
Submitted 24 December, 2024; v1 submitted 26 June, 2024;
originally announced June 2024.
-
Aligning Large Language Models from Self-Reference AI Feedback with one General Principle
Authors:
Rong Bao,
Rui Zheng,
Shihan Dou,
Xiao Wang,
Enyu Zhou,
Bo Wang,
Qi Zhang,
Liang Ding,
Dacheng Tao
Abstract:
In aligning large language models (LLMs), utilizing feedback from existing advanced AI rather than humans is an important method to scale supervisory signals. However, it is highly challenging for AI to understand human intentions and societal values, and provide accurate preference feedback based on these. Current AI feedback methods rely on powerful LLMs, carefully designed specific principles t…
▽ More
In aligning large language models (LLMs), utilizing feedback from existing advanced AI rather than humans is an important method to scale supervisory signals. However, it is highly challenging for AI to understand human intentions and societal values, and provide accurate preference feedback based on these. Current AI feedback methods rely on powerful LLMs, carefully designed specific principles to describe human intentions, and are easily influenced by position bias. To address these issues, we propose a self-reference-based AI feedback framework that enables a 13B Llama2-Chat to provide high-quality feedback under simple and general principles such as ``best for humanity``. Specifically, we allow the AI to first respond to the user's instructions, then generate criticism of other answers based on its own response as a reference, and finally determine which answer better fits human preferences according to the criticism. Additionally, we use a self-consistency method to further reduce the impact of position bias, and employ semantic perplexity to calculate the preference strength differences between different answers. Experimental results show that our method enables 13B and 70B Llama2-Chat annotators to provide high-quality preference feedback, and the policy models trained based on these preference data achieve significant advantages in benchmark datasets through reinforcement learning.
△ Less
Submitted 16 June, 2024;
originally announced June 2024.
-
MetaRM: Shifted Distributions Alignment via Meta-Learning
Authors:
Shihan Dou,
Yan Liu,
Enyu Zhou,
Tianlong Li,
Haoxiang Jia,
Limao Xiong,
Xin Zhao,
Junjie Ye,
Rui Zheng,
Tao Gui,
Qi Zhang,
Xuanjing Huang
Abstract:
The success of Reinforcement Learning from Human Feedback (RLHF) in language model alignment is critically dependent on the capability of the reward model (RM). However, as the training process progresses, the output distribution of the policy model shifts, leading to the RM's reduced ability to distinguish between responses. This issue is further compounded when the RM, trained on a specific data…
▽ More
The success of Reinforcement Learning from Human Feedback (RLHF) in language model alignment is critically dependent on the capability of the reward model (RM). However, as the training process progresses, the output distribution of the policy model shifts, leading to the RM's reduced ability to distinguish between responses. This issue is further compounded when the RM, trained on a specific data distribution, struggles to generalize to examples outside of that distribution. These two issues can be united as a challenge posed by the shifted distribution of the environment. To surmount this challenge, we introduce MetaRM, a method leveraging meta-learning to align the RM with the shifted environment distribution. MetaRM is designed to train the RM by minimizing data loss, particularly for data that can improve the differentiation ability to examples of the shifted target distribution. Extensive experiments demonstrate that MetaRM significantly improves the RM's distinguishing ability in iterative RLHF optimization, and also provides the capacity to identify subtle differences in out-of-distribution samples.
△ Less
Submitted 1 May, 2024;
originally announced May 2024.
-
CC2Vec: Combining Typed Tokens with Contrastive Learning for Effective Code Clone Detection
Authors:
Shihan Dou,
Yueming Wu,
Haoxiang Jia,
Yuhao Zhou,
Yan Liu,
Yang Liu
Abstract:
With the development of the open source community, the code is often copied, spread, and evolved in multiple software systems, which brings uncertainty and risk to the software system (e.g., bug propagation and copyright infringement). Therefore, it is important to conduct code clone detection to discover similar code pairs. Many approaches have been proposed to detect code clones where token-base…
▽ More
With the development of the open source community, the code is often copied, spread, and evolved in multiple software systems, which brings uncertainty and risk to the software system (e.g., bug propagation and copyright infringement). Therefore, it is important to conduct code clone detection to discover similar code pairs. Many approaches have been proposed to detect code clones where token-based tools can scale to big code. However, due to the lack of program details, they cannot handle more complicated code clones, i.e., semantic code clones. In this paper, we introduce CC2Vec, a novel code encoding method designed to swiftly identify simple code clones while also enhancing the capability for semantic code clone detection. To retain the program details between tokens, CC2Vec divides them into different categories (i.e., typed tokens) according to the syntactic types and then applies two self-attention mechanism layers to encode them. To resist changes in the code structure of semantic code clones, CC2Vec performs contrastive learning to reduce the differences introduced by different code implementations. We evaluate CC2Vec on two widely used datasets (i.e., BigCloneBench and Google Code Jam) and the results report that our method can effectively detect simple code clones. In addition, CC2Vec not only attains comparable performance to widely used semantic code clone detection systems such as ASTNN, SCDetector, and FCCA by simply fine-tuning, but also significantly surpasses these methods in both detection efficiency.
△ Less
Submitted 1 May, 2024;
originally announced May 2024.
-
EasyJailbreak: A Unified Framework for Jailbreaking Large Language Models
Authors:
Weikang Zhou,
Xiao Wang,
Limao Xiong,
Han Xia,
Yingshuang Gu,
Mingxu Chai,
Fukang Zhu,
Caishuang Huang,
Shihan Dou,
Zhiheng Xi,
Rui Zheng,
Songyang Gao,
Yicheng Zou,
Hang Yan,
Yifan Le,
Ruohui Wang,
Lijun Li,
Jing Shao,
Tao Gui,
Qi Zhang,
Xuanjing Huang
Abstract:
Jailbreak attacks are crucial for identifying and mitigating the security vulnerabilities of Large Language Models (LLMs). They are designed to bypass safeguards and elicit prohibited outputs. However, due to significant differences among various jailbreak methods, there is no standard implementation framework available for the community, which limits comprehensive security evaluations. This paper…
▽ More
Jailbreak attacks are crucial for identifying and mitigating the security vulnerabilities of Large Language Models (LLMs). They are designed to bypass safeguards and elicit prohibited outputs. However, due to significant differences among various jailbreak methods, there is no standard implementation framework available for the community, which limits comprehensive security evaluations. This paper introduces EasyJailbreak, a unified framework simplifying the construction and evaluation of jailbreak attacks against LLMs. It builds jailbreak attacks using four components: Selector, Mutator, Constraint, and Evaluator. This modular framework enables researchers to easily construct attacks from combinations of novel and existing components. So far, EasyJailbreak supports 11 distinct jailbreak methods and facilitates the security validation of a broad spectrum of LLMs. Our validation across 10 distinct LLMs reveals a significant vulnerability, with an average breach probability of 60% under various jailbreaking attacks. Notably, even advanced models like GPT-3.5-Turbo and GPT-4 exhibit average Attack Success Rates (ASR) of 57% and 33%, respectively. We have released a wealth of resources for researchers, including a web platform, PyPI published package, screencast video, and experimental outputs.
△ Less
Submitted 18 March, 2024;
originally announced March 2024.
-
CodeChameleon: Personalized Encryption Framework for Jailbreaking Large Language Models
Authors:
Huijie Lv,
Xiao Wang,
Yuansen Zhang,
Caishuang Huang,
Shihan Dou,
Junjie Ye,
Tao Gui,
Qi Zhang,
Xuanjing Huang
Abstract:
Adversarial misuse, particularly through `jailbreaking' that circumvents a model's safety and ethical protocols, poses a significant challenge for Large Language Models (LLMs). This paper delves into the mechanisms behind such successful attacks, introducing a hypothesis for the safety mechanism of aligned LLMs: intent security recognition followed by response generation. Grounded in this hypothes…
▽ More
Adversarial misuse, particularly through `jailbreaking' that circumvents a model's safety and ethical protocols, poses a significant challenge for Large Language Models (LLMs). This paper delves into the mechanisms behind such successful attacks, introducing a hypothesis for the safety mechanism of aligned LLMs: intent security recognition followed by response generation. Grounded in this hypothesis, we propose CodeChameleon, a novel jailbreak framework based on personalized encryption tactics. To elude the intent security recognition phase, we reformulate tasks into a code completion format, enabling users to encrypt queries using personalized encryption functions. To guarantee response generation functionality, we embed a decryption function within the instructions, which allows the LLM to decrypt and execute the encrypted queries successfully. We conduct extensive experiments on 7 LLMs, achieving state-of-the-art average Attack Success Rate (ASR). Remarkably, our method achieves an 86.6\% ASR on GPT-4-1106.
△ Less
Submitted 26 February, 2024;
originally announced February 2024.
-
Trajectory-wise Iterative Reinforcement Learning Framework for Auto-bidding
Authors:
Haoming Li,
Yusen Huo,
Shuai Dou,
Zhenzhe Zheng,
Zhilin Zhang,
Chuan Yu,
Jian Xu,
Fan Wu
Abstract:
In online advertising, advertisers participate in ad auctions to acquire ad opportunities, often by utilizing auto-bidding tools provided by demand-side platforms (DSPs). The current auto-bidding algorithms typically employ reinforcement learning (RL). However, due to safety concerns, most RL-based auto-bidding policies are trained in simulation, leading to a performance degradation when deployed…
▽ More
In online advertising, advertisers participate in ad auctions to acquire ad opportunities, often by utilizing auto-bidding tools provided by demand-side platforms (DSPs). The current auto-bidding algorithms typically employ reinforcement learning (RL). However, due to safety concerns, most RL-based auto-bidding policies are trained in simulation, leading to a performance degradation when deployed in online environments. To narrow this gap, we can deploy multiple auto-bidding agents in parallel to collect a large interaction dataset. Offline RL algorithms can then be utilized to train a new policy. The trained policy can subsequently be deployed for further data collection, resulting in an iterative training framework, which we refer to as iterative offline RL. In this work, we identify the performance bottleneck of this iterative offline RL framework, which originates from the ineffective exploration and exploitation caused by the inherent conservatism of offline RL algorithms. To overcome this bottleneck, we propose Trajectory-wise Exploration and Exploitation (TEE), which introduces a novel data collecting and data utilization method for iterative offline RL from a trajectory perspective. Furthermore, to ensure the safety of online exploration while preserving the dataset quality for TEE, we propose Safe Exploration by Adaptive Action Selection (SEAS). Both offline experiments and real-world experiments on Alibaba display advertising platform demonstrate the effectiveness of our proposed method.
△ Less
Submitted 8 April, 2024; v1 submitted 23 February, 2024;
originally announced February 2024.
-
Advancing Translation Preference Modeling with RLHF: A Step Towards Cost-Effective Solution
Authors:
Nuo Xu,
Jun Zhao,
Can Zu,
Sixian Li,
Lu Chen,
Zhihao Zhang,
Rui Zheng,
Shihan Dou,
Wenjuan Qin,
Tao Gui,
Qi Zhang,
Xuanjing Huang
Abstract:
Faithfulness, expressiveness, and elegance is the constant pursuit in machine translation. However, traditional metrics like \textit{BLEU} do not strictly align with human preference of translation quality. In this paper, we explore leveraging reinforcement learning with human feedback (\textit{RLHF}) to improve translation quality. It is non-trivial to collect a large high-quality dataset of huma…
▽ More
Faithfulness, expressiveness, and elegance is the constant pursuit in machine translation. However, traditional metrics like \textit{BLEU} do not strictly align with human preference of translation quality. In this paper, we explore leveraging reinforcement learning with human feedback (\textit{RLHF}) to improve translation quality. It is non-trivial to collect a large high-quality dataset of human comparisons between translations, especially for low-resource languages. To address this issue, we propose a cost-effective preference learning strategy, optimizing reward models by distinguishing between human and machine translations. In this manner, the reward model learns the deficiencies of machine translation compared to human and guides subsequent improvements in machine translation. Experimental results demonstrate that \textit{RLHF} can effectively enhance translation quality and this improvement benefits other translation directions not trained with \textit{RLHF}. Further analysis indicates that the model's language capabilities play a crucial role in preference learning. A reward model with strong language capabilities can more sensitively learn the subtle differences in translation quality and align better with real human translation preferences.
△ Less
Submitted 27 February, 2024; v1 submitted 18 February, 2024;
originally announced February 2024.
-
Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning
Authors:
Zhiheng Xi,
Wenxiang Chen,
Boyang Hong,
Senjie Jin,
Rui Zheng,
Wei He,
Yiwen Ding,
Shichun Liu,
Xin Guo,
Junzhe Wang,
Honglin Guo,
Wei Shen,
Xiaoran Fan,
Yuhao Zhou,
Shihan Dou,
Xiao Wang,
Xinbo Zhang,
Peng Sun,
Tao Gui,
Qi Zhang,
Xuanjing Huang
Abstract:
In this paper, we propose R$^3$: Learning Reasoning through Reverse Curriculum Reinforcement Learning (RL), a novel method that employs only outcome supervision to achieve the benefits of process supervision for large language models. The core challenge in applying RL to complex reasoning is to identify a sequence of actions that result in positive rewards and provide appropriate supervision for o…
▽ More
In this paper, we propose R$^3$: Learning Reasoning through Reverse Curriculum Reinforcement Learning (RL), a novel method that employs only outcome supervision to achieve the benefits of process supervision for large language models. The core challenge in applying RL to complex reasoning is to identify a sequence of actions that result in positive rewards and provide appropriate supervision for optimization. Outcome supervision provides sparse rewards for final results without identifying error locations, whereas process supervision offers step-wise rewards but requires extensive manual annotation. R$^3$ overcomes these limitations by learning from correct demonstrations. Specifically, R$^3$ progressively slides the start state of reasoning from a demonstration's end to its beginning, facilitating easier model exploration at all stages. Thus, R$^3$ establishes a step-wise curriculum, allowing outcome supervision to offer step-level signals and precisely pinpoint errors. Using Llama2-7B, our method surpasses RL baseline on eight reasoning tasks by $4.1$ points on average. Notebaly, in program-based reasoning on GSM8K, it exceeds the baseline by $4.2$ points across three backbone models, and without any extra data, Codellama-7B + R$^3$ performs comparable to larger models or closed-source models.
△ Less
Submitted 17 March, 2024; v1 submitted 8 February, 2024;
originally announced February 2024.
-
StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback
Authors:
Shihan Dou,
Yan Liu,
Haoxiang Jia,
Limao Xiong,
Enyu Zhou,
Wei Shen,
Junjie Shan,
Caishuang Huang,
Xiao Wang,
Xiaoran Fan,
Zhiheng Xi,
Yuhao Zhou,
Tao Ji,
Rui Zheng,
Qi Zhang,
Xuanjing Huang,
Tao Gui
Abstract:
The advancement of large language models (LLMs) has significantly propelled the field of code generation. Previous work integrated reinforcement learning (RL) with compiler feedback for exploring the output space of LLMs to enhance code generation quality. However, the lengthy code generated by LLMs in response to complex human requirements makes RL exploration a challenge. Also, since the unit te…
▽ More
The advancement of large language models (LLMs) has significantly propelled the field of code generation. Previous work integrated reinforcement learning (RL) with compiler feedback for exploring the output space of LLMs to enhance code generation quality. However, the lengthy code generated by LLMs in response to complex human requirements makes RL exploration a challenge. Also, since the unit tests may not cover the complicated code, optimizing LLMs by using these unexecuted code snippets is ineffective. To tackle these challenges, we introduce StepCoder, a novel RL framework for code generation, consisting of two main components: CCCS addresses the exploration challenge by breaking the long sequences code generation task into a Curriculum of Code Completion Subtasks, while FGO only optimizes the model by masking the unexecuted code segments to provide Fine-Grained Optimization. In addition, we furthermore construct the APPS+ dataset for RL training, which is manually verified to ensure the correctness of unit tests. Experimental results show that our method improves the ability to explore the output space and outperforms state-of-the-art approaches in corresponding benchmarks. Our dataset APPS+ and StepCoder are available online.
△ Less
Submitted 5 February, 2024; v1 submitted 2 February, 2024;
originally announced February 2024.
-
DROP: Decouple Re-Identification and Human Parsing with Task-specific Features for Occluded Person Re-identification
Authors:
Shuguang Dou,
Xiangyang Jiang,
Yuanpeng Tu,
Junyao Gao,
Zefan Qu,
Qingsong Zhao,
Cairong Zhao
Abstract:
The paper introduces the Decouple Re-identificatiOn and human Parsing (DROP) method for occluded person re-identification (ReID). Unlike mainstream approaches using global features for simultaneous multi-task learning of ReID and human parsing, or relying on semantic information for attention guidance, DROP argues that the inferior performance of the former is due to distinct granularity requireme…
▽ More
The paper introduces the Decouple Re-identificatiOn and human Parsing (DROP) method for occluded person re-identification (ReID). Unlike mainstream approaches using global features for simultaneous multi-task learning of ReID and human parsing, or relying on semantic information for attention guidance, DROP argues that the inferior performance of the former is due to distinct granularity requirements for ReID and human parsing features. ReID focuses on instance part-level differences between pedestrian parts, while human parsing centers on semantic spatial context, reflecting the internal structure of the human body. To address this, DROP decouples features for ReID and human parsing, proposing detail-preserving upsampling to combine varying resolution feature maps. Parsing-specific features for human parsing are decoupled, and human position information is exclusively added to the human parsing branch. In the ReID branch, a part-aware compactness loss is introduced to enhance instance-level part differences. Experimental results highlight the efficacy of DROP, especially achieving a Rank-1 accuracy of 76.8% on Occluded-Duke, surpassing two mainstream methods. The codebase is accessible at https://github.com/shuguang-52/DROP.
△ Less
Submitted 31 January, 2024;
originally announced January 2024.
-
MouSi: Poly-Visual-Expert Vision-Language Models
Authors:
Xiaoran Fan,
Tao Ji,
Changhao Jiang,
Shuo Li,
Senjie Jin,
Sirui Song,
Junke Wang,
Boyang Hong,
Lu Chen,
Guodong Zheng,
Ming Zhang,
Caishuang Huang,
Rui Zheng,
Zhiheng Xi,
Yuhao Zhou,
Shihan Dou,
Junjie Ye,
Hang Yan,
Tao Gui,
Qi Zhang,
Xipeng Qiu,
Xuanjing Huang,
Zuxuan Wu,
Yu-Gang Jiang
Abstract:
Current large vision-language models (VLMs) often encounter challenges such as insufficient capabilities of a single visual component and excessively long visual tokens. These issues can limit the model's effectiveness in accurately interpreting complex visual information and over-lengthy contextual information. Addressing these challenges is crucial for enhancing the performance and applicability…
▽ More
Current large vision-language models (VLMs) often encounter challenges such as insufficient capabilities of a single visual component and excessively long visual tokens. These issues can limit the model's effectiveness in accurately interpreting complex visual information and over-lengthy contextual information. Addressing these challenges is crucial for enhancing the performance and applicability of VLMs. This paper proposes the use of ensemble experts technique to synergizes the capabilities of individual visual encoders, including those skilled in image-text matching, OCR, image segmentation, etc. This technique introduces a fusion network to unify the processing of outputs from different visual experts, while bridging the gap between image encoders and pre-trained LLMs. In addition, we explore different positional encoding schemes to alleviate the waste of positional encoding caused by lengthy image feature sequences, effectively addressing the issue of position overflow and length limitations. For instance, in our implementation, this technique significantly reduces the positional occupancy in models like SAM, from a substantial 4096 to a more efficient and manageable 64 or even down to 1. Experimental results demonstrate that VLMs with multiple experts exhibit consistently superior performance over isolated visual encoders and mark a significant performance boost as more experts are integrated. We have open-sourced the training code used in this report. All of these resources can be found on our project website.
△ Less
Submitted 30 January, 2024;
originally announced January 2024.
-
Linear Alignment: A Closed-form Solution for Aligning Human Preferences without Tuning and Feedback
Authors:
Songyang Gao,
Qiming Ge,
Wei Shen,
Shihan Dou,
Junjie Ye,
Xiao Wang,
Rui Zheng,
Yicheng Zou,
Zhi Chen,
Hang Yan,
Qi Zhang,
Dahua Lin
Abstract:
The success of AI assistants based on Language Models (LLMs) hinges on Reinforcement Learning from Human Feedback (RLHF) to comprehend and align with user intentions. However, traditional alignment algorithms, such as PPO, are hampered by complex annotation and training requirements. This reliance limits the applicability of RLHF and hinders the development of professional assistants tailored to d…
▽ More
The success of AI assistants based on Language Models (LLMs) hinges on Reinforcement Learning from Human Feedback (RLHF) to comprehend and align with user intentions. However, traditional alignment algorithms, such as PPO, are hampered by complex annotation and training requirements. This reliance limits the applicability of RLHF and hinders the development of professional assistants tailored to diverse human preferences. In this work, we introduce \textit{Linear Alignment}, a novel algorithm that aligns language models with human preferences in one single inference step, eliminating the reliance on data annotation and model training. Linear alignment incorporates a new parameterization for policy optimization under divergence constraints, which enables the extraction of optimal policy in a closed-form manner and facilitates the direct estimation of the aligned response. Extensive experiments on both general and personalized preference datasets demonstrate that linear alignment significantly enhances the performance and efficiency of LLM alignment across diverse scenarios. Our code and dataset is published on \url{https://github.com/Wizardcoast/Linear_Alignment.git}.
△ Less
Submitted 1 July, 2024; v1 submitted 21 January, 2024;
originally announced January 2024.
-
Revisiting Jailbreaking for Large Language Models: A Representation Engineering Perspective
Authors:
Tianlong Li,
Zhenghua Wang,
Wenhao Liu,
Muling Wu,
Shihan Dou,
Changze Lv,
Xiaohua Wang,
Xiaoqing Zheng,
Xuanjing Huang
Abstract:
The recent surge in jailbreaking attacks has revealed significant vulnerabilities in Large Language Models (LLMs) when exposed to malicious inputs. While various defense strategies have been proposed to mitigate these threats, there has been limited research into the underlying mechanisms that make LLMs vulnerable to such attacks. In this study, we suggest that the self-safeguarding capability of…
▽ More
The recent surge in jailbreaking attacks has revealed significant vulnerabilities in Large Language Models (LLMs) when exposed to malicious inputs. While various defense strategies have been proposed to mitigate these threats, there has been limited research into the underlying mechanisms that make LLMs vulnerable to such attacks. In this study, we suggest that the self-safeguarding capability of LLMs is linked to specific activity patterns within their representation space. Although these patterns have little impact on the semantic content of the generated text, they play a crucial role in shaping LLM behavior under jailbreaking attacks. Our findings demonstrate that these patterns can be detected with just a few pairs of contrastive queries. Extensive experimentation shows that the robustness of LLMs against jailbreaking can be manipulated by weakening or strengthening these patterns. Further visual analysis provides additional evidence for our conclusions, providing new insights into the jailbreaking phenomenon. These findings highlight the importance of addressing the potential misuse of open-source LLMs within the community.
△ Less
Submitted 21 February, 2025; v1 submitted 11 January, 2024;
originally announced January 2024.
-
Secrets of RLHF in Large Language Models Part II: Reward Modeling
Authors:
Binghai Wang,
Rui Zheng,
Lu Chen,
Yan Liu,
Shihan Dou,
Caishuang Huang,
Wei Shen,
Senjie Jin,
Enyu Zhou,
Chenyu Shi,
Songyang Gao,
Nuo Xu,
Yuhao Zhou,
Xiaoran Fan,
Zhiheng Xi,
Jun Zhao,
Xiao Wang,
Tao Ji,
Hang Yan,
Lixing Shen,
Zhan Chen,
Tao Gui,
Qi Zhang,
Xipeng Qiu,
Xuanjing Huang
, et al. (2 additional authors not shown)
Abstract:
Reinforcement Learning from Human Feedback (RLHF) has become a crucial technology for aligning language models with human values and intentions, enabling models to produce more helpful and harmless responses. Reward models are trained as proxies for human preferences to drive reinforcement learning optimization. While reward models are often considered central to achieving high performance, they f…
▽ More
Reinforcement Learning from Human Feedback (RLHF) has become a crucial technology for aligning language models with human values and intentions, enabling models to produce more helpful and harmless responses. Reward models are trained as proxies for human preferences to drive reinforcement learning optimization. While reward models are often considered central to achieving high performance, they face the following challenges in practical applications: (1) Incorrect and ambiguous preference pairs in the dataset may hinder the reward model from accurately capturing human intent. (2) Reward models trained on data from a specific distribution often struggle to generalize to examples outside that distribution and are not suitable for iterative RLHF training.
In this report, we attempt to address these two issues. (1) From a data perspective, we propose a method to measure the strength of preferences within the data, based on a voting mechanism of multiple reward models. Experimental results confirm that data with varying preference strengths have different impacts on reward model performance. We introduce a series of novel methods to mitigate the influence of incorrect and ambiguous preferences in the dataset and fully leverage high-quality preference data. (2) From an algorithmic standpoint, we introduce contrastive learning to enhance the ability of reward models to distinguish between chosen and rejected responses, thereby improving model generalization. Furthermore, we employ meta-learning to enable the reward model to maintain the ability to differentiate subtle differences in out-of-distribution samples, and this approach can be utilized for iterative RLHF optimization.
△ Less
Submitted 12 January, 2024; v1 submitted 11 January, 2024;
originally announced January 2024.
-
ToolEyes: Fine-Grained Evaluation for Tool Learning Capabilities of Large Language Models in Real-world Scenarios
Authors:
Junjie Ye,
Guanyu Li,
Songyang Gao,
Caishuang Huang,
Yilong Wu,
Sixian Li,
Xiaoran Fan,
Shihan Dou,
Tao Ji,
Qi Zhang,
Tao Gui,
Xuanjing Huang
Abstract:
Existing evaluations of tool learning primarily focus on validating the alignment of selected tools for large language models (LLMs) with expected outcomes. However, these approaches rely on a limited set of scenarios where answers can be pre-determined, diverging from genuine needs. Furthermore, a sole emphasis on outcomes disregards the complex capabilities required for LLMs to effectively use t…
▽ More
Existing evaluations of tool learning primarily focus on validating the alignment of selected tools for large language models (LLMs) with expected outcomes. However, these approaches rely on a limited set of scenarios where answers can be pre-determined, diverging from genuine needs. Furthermore, a sole emphasis on outcomes disregards the complex capabilities required for LLMs to effectively use tools. To tackle this issue, we propose ToolEyes, a fine-grained system tailored for the evaluation of the LLMs' tool learning capabilities in authentic scenarios. The system meticulously examines seven real-world scenarios, analyzing five dimensions crucial to LLMs in tool learning: format alignment, intent comprehension, behavior planning, tool selection, and answer organization. Additionally, ToolEyes incorporates a tool library boasting approximately 600 tools, serving as an intermediary between LLMs and the physical world. Evaluations involving ten LLMs across three categories reveal a preference for specific scenarios and limited cognitive abilities in tool learning. Intriguingly, expanding the model size even exacerbates the hindrance to tool learning. The code and data are available at https://github.com/Junjie-Ye/ToolEyes.
△ Less
Submitted 5 December, 2024; v1 submitted 1 January, 2024;
originally announced January 2024.
-
LoRAMoE: Alleviate World Knowledge Forgetting in Large Language Models via MoE-Style Plugin
Authors:
Shihan Dou,
Enyu Zhou,
Yan Liu,
Songyang Gao,
Jun Zhao,
Wei Shen,
Yuhao Zhou,
Zhiheng Xi,
Xiao Wang,
Xiaoran Fan,
Shiliang Pu,
Jiang Zhu,
Rui Zheng,
Tao Gui,
Qi Zhang,
Xuanjing Huang
Abstract:
Supervised fine-tuning (SFT) is a crucial step for large language models (LLMs), enabling them to align with human instructions and enhance their capabilities in downstream tasks. Increasing instruction data substantially is a direct solution to align the model with a broader range of downstream tasks or notably improve its performance on a specific task. However, we find that large-scale increase…
▽ More
Supervised fine-tuning (SFT) is a crucial step for large language models (LLMs), enabling them to align with human instructions and enhance their capabilities in downstream tasks. Increasing instruction data substantially is a direct solution to align the model with a broader range of downstream tasks or notably improve its performance on a specific task. However, we find that large-scale increases in instruction data can damage the world knowledge previously stored in LLMs. To address this challenge, we propose LoRAMoE, a novelty framework that introduces several low-rank adapters (LoRA) and integrates them by using a router network, like a plugin version of Mixture of Experts (MoE). It freezes the backbone model and forces a portion of LoRAs to focus on leveraging world knowledge to solve downstream tasks, to alleviate world knowledge-edge forgetting. Experimental results show that, as the instruction data increases, LoRAMoE can significantly improve the ability to process downstream tasks, while maintaining the world knowledge stored in the LLM.
△ Less
Submitted 8 March, 2024; v1 submitted 15 December, 2023;
originally announced December 2023.
-
Gitor: Scalable Code Clone Detection by Building Global Sample Graph
Authors:
Junjie Shan,
Shihan Dou,
Yueming Wu,
Hairu Wu,
Yang Liu
Abstract:
Code clone detection is about finding out similar code fragments, which has drawn much attention in software engineering since it is important for software maintenance and evolution. Researchers have proposed many techniques and tools for source code clone detection, but current detection methods concentrate on analyzing or processing code samples individually without exploring the underlying conn…
▽ More
Code clone detection is about finding out similar code fragments, which has drawn much attention in software engineering since it is important for software maintenance and evolution. Researchers have proposed many techniques and tools for source code clone detection, but current detection methods concentrate on analyzing or processing code samples individually without exploring the underlying connections among code samples. In this paper, we propose Gitor to capture the underlying connections among different code samples. Specifically, given a source code database, we first tokenize all code samples to extract the pre-defined individual information. After obtaining all samples individual information, we leverage them to build a large global sample graph where each node is a code sample or a type of individual information. Then we apply a node embedding technique on the global sample graph to extract all the samples vector representations. After collecting all code samples vectors, we can simply compare the similarity between any two samples to detect possible clone pairs. More importantly, since the obtained vector of a sample is from a global sample graph, we can combine it with its own code features to improve the code clone detection performance. To demonstrate the effectiveness of Gitor, we evaluate it on a widely used dataset namely BigCloneBench. Our experimental results show that Gitor has higher accuracy in terms of code clone detection and excellent execution time for inputs of various sizes compared to existing state-of-the-art tools. Moreover, we also evaluate the combination of Gitor with other traditional vector-based clone detection methods, the results show that the use of Gitor enables them detect more code clones with higher F1.
△ Less
Submitted 18 November, 2023; v1 submitted 15 November, 2023;
originally announced November 2023.
-
Tailoring Personality Traits in Large Language Models via Unsupervisedly-Built Personalized Lexicons
Authors:
Tianlong Li,
Shihan Dou,
Changze Lv,
Wenhao Liu,
Jianhan Xu,
Muling Wu,
Zixuan Ling,
Xiaoqing Zheng,
Xuanjing Huang
Abstract:
Personality plays a pivotal role in shaping human expression patterns, thus regulating the personality of large language models (LLMs) holds significant potential in enhancing the user experience of LLMs. Previous methods either relied on fine-tuning LLMs on specific corpora or necessitated manually crafted prompts to elicit specific personalities from LLMs. However, the former approach is ineffic…
▽ More
Personality plays a pivotal role in shaping human expression patterns, thus regulating the personality of large language models (LLMs) holds significant potential in enhancing the user experience of LLMs. Previous methods either relied on fine-tuning LLMs on specific corpora or necessitated manually crafted prompts to elicit specific personalities from LLMs. However, the former approach is inefficient and costly, while the latter cannot precisely manipulate personality traits at a fine-grained level. To address the above challenges, we have employed a novel Unsupervisedly-Built Personalized Lexicons (UBPL) in a pluggable manner during the decoding phase of LLMs to manipulate their personality traits. UBPL is a lexicon built through an unsupervised approach from a situational judgment test dataset (SJTs4LLM). Users can utilize UBPL to adjust the probability vectors of predicted words in the decoding phase of LLMs, thus influencing the personality expression of LLMs. Extensive experimentation demonstrates the remarkable effectiveness and pluggability of our method for fine-grained manipulation of LLM's personality.
△ Less
Submitted 6 January, 2024; v1 submitted 25 October, 2023;
originally announced October 2023.
-
Improving Generalization of Alignment with Human Preferences through Group Invariant Learning
Authors:
Rui Zheng,
Wei Shen,
Yuan Hua,
Wenbin Lai,
Shihan Dou,
Yuhao Zhou,
Zhiheng Xi,
Xiao Wang,
Haoran Huang,
Tao Gui,
Qi Zhang,
Xuanjing Huang
Abstract:
The success of AI assistants based on language models (LLMs) hinges crucially on Reinforcement Learning from Human Feedback (RLHF), which enables the generation of responses more aligned with human preferences. As universal AI assistants, there's a growing expectation for them to perform consistently across various domains. However, previous work shows that Reinforcement Learning (RL) often exploi…
▽ More
The success of AI assistants based on language models (LLMs) hinges crucially on Reinforcement Learning from Human Feedback (RLHF), which enables the generation of responses more aligned with human preferences. As universal AI assistants, there's a growing expectation for them to perform consistently across various domains. However, previous work shows that Reinforcement Learning (RL) often exploits shortcuts to attain high rewards and overlooks challenging samples. This focus on quick reward gains undermines both the stability in training and the model's ability to generalize to new, unseen data. In this work, we propose a novel approach that can learn a consistent policy via RL across various data groups or domains. Given the challenges associated with acquiring group annotations, our method automatically classifies data into different groups, deliberately maximizing performance variance. Then, we optimize the policy to perform well on challenging groups. Lastly, leveraging the established groups, our approach adaptively adjusts the exploration space, allocating more learning capacity to more challenging data and preventing the model from over-optimizing on simpler data. Experimental results indicate that our approach significantly enhances training stability and model generalization.
△ Less
Submitted 25 December, 2023; v1 submitted 18 October, 2023;
originally announced October 2023.
-
Loose lips sink ships: Mitigating Length Bias in Reinforcement Learning from Human Feedback
Authors:
Wei Shen,
Rui Zheng,
Wenyu Zhan,
Jun Zhao,
Shihan Dou,
Tao Gui,
Qi Zhang,
Xuanjing Huang
Abstract:
Reinforcement learning from human feedback serves as a crucial bridge, aligning large language models with human and societal values. This alignment requires a vast corpus of human feedback to learn a reward model, which is subsequently used to finetune language models. However, we have identified that the reward model often finds shortcuts to bypass its intended objectives, misleadingly assuming…
▽ More
Reinforcement learning from human feedback serves as a crucial bridge, aligning large language models with human and societal values. This alignment requires a vast corpus of human feedback to learn a reward model, which is subsequently used to finetune language models. However, we have identified that the reward model often finds shortcuts to bypass its intended objectives, misleadingly assuming that humans prefer longer responses. The emergence of length bias often induces the model to favor longer outputs, yet it doesn't equate to an increase in helpful information within these outputs. In this paper, we propose an innovative solution, applying the Product-of-Experts (PoE) technique to separate reward modeling from the influence of sequence length. In our framework, the main expert concentrates on understanding human intents, while the biased expert targets the identification and capture of length bias. To further enhance the learning of bias, we introduce perturbations into the bias-focused expert, disrupting the flow of semantic information. Experimental results validate the effectiveness of our approach, indicating that language model performance is improved, irrespective of sequence length.
△ Less
Submitted 29 November, 2023; v1 submitted 8 October, 2023;
originally announced October 2023.
-
The Rise and Potential of Large Language Model Based Agents: A Survey
Authors:
Zhiheng Xi,
Wenxiang Chen,
Xin Guo,
Wei He,
Yiwen Ding,
Boyang Hong,
Ming Zhang,
Junzhe Wang,
Senjie Jin,
Enyu Zhou,
Rui Zheng,
Xiaoran Fan,
Xiao Wang,
Limao Xiong,
Yuhao Zhou,
Weiran Wang,
Changhao Jiang,
Yicheng Zou,
Xiangyang Liu,
Zhangyue Yin,
Shihan Dou,
Rongxiang Weng,
Wensen Cheng,
Qi Zhang,
Wenjuan Qin
, et al. (4 additional authors not shown)
Abstract:
For a long time, humanity has pursued artificial intelligence (AI) equivalent to or surpassing the human level, with AI agents considered a promising vehicle for this pursuit. AI agents are artificial entities that sense their environment, make decisions, and take actions. Many efforts have been made to develop intelligent agents, but they mainly focus on advancement in algorithms or training stra…
▽ More
For a long time, humanity has pursued artificial intelligence (AI) equivalent to or surpassing the human level, with AI agents considered a promising vehicle for this pursuit. AI agents are artificial entities that sense their environment, make decisions, and take actions. Many efforts have been made to develop intelligent agents, but they mainly focus on advancement in algorithms or training strategies to enhance specific capabilities or performance on particular tasks. Actually, what the community lacks is a general and powerful model to serve as a starting point for designing AI agents that can adapt to diverse scenarios. Due to the versatile capabilities they demonstrate, large language models (LLMs) are regarded as potential sparks for Artificial General Intelligence (AGI), offering hope for building general AI agents. Many researchers have leveraged LLMs as the foundation to build AI agents and have achieved significant progress. In this paper, we perform a comprehensive survey on LLM-based agents. We start by tracing the concept of agents from its philosophical origins to its development in AI, and explain why LLMs are suitable foundations for agents. Building upon this, we present a general framework for LLM-based agents, comprising three main components: brain, perception, and action, and the framework can be tailored for different applications. Subsequently, we explore the extensive applications of LLM-based agents in three aspects: single-agent scenarios, multi-agent scenarios, and human-agent cooperation. Following this, we delve into agent societies, exploring the behavior and personality of LLM-based agents, the social phenomena that emerge from an agent society, and the insights they offer for human society. Finally, we discuss several key topics and open problems within the field. A repository for the related papers at https://github.com/WooooDyy/LLM-Agent-Paper-List.
△ Less
Submitted 19 September, 2023; v1 submitted 14 September, 2023;
originally announced September 2023.
-
Reviving Static Charts into Live Charts
Authors:
Lu Ying,
Yun Wang,
Haotian Li,
Shuguang Dou,
Haidong Zhang,
Xinyang Jiang,
Huamin Qu,
Yingcai Wu
Abstract:
Data charts are prevalent across various fields due to their efficacy in conveying complex data relationships. However, static charts may sometimes struggle to engage readers and efficiently present intricate information, potentially resulting in limited understanding. We introduce "Live Charts," a new format of presentation that decomposes complex information within a chart and explains the infor…
▽ More
Data charts are prevalent across various fields due to their efficacy in conveying complex data relationships. However, static charts may sometimes struggle to engage readers and efficiently present intricate information, potentially resulting in limited understanding. We introduce "Live Charts," a new format of presentation that decomposes complex information within a chart and explains the information pieces sequentially through rich animations and accompanying audio narration. We propose an automated approach to revive static charts into Live Charts. Our method integrates GNN-based techniques to analyze the chart components and extract data from charts. Then we adopt large natural language models to generate appropriate animated visuals along with a voice-over to produce Live Charts from static ones. We conducted a thorough evaluation of our approach, which involved the model performance, use cases, a crowd-sourced user study, and expert interviews. The results demonstrate Live Charts offer a multi-sensory experience where readers can follow the information and understand the data insights better. We analyze the benefits and drawbacks of Live Charts over static charts as a new information consumption experience.
△ Less
Submitted 17 May, 2024; v1 submitted 6 September, 2023;
originally announced September 2023.
-
Towards Understanding the Capability of Large Language Models on Code Clone Detection: A Survey
Authors:
Shihan Dou,
Junjie Shan,
Haoxiang Jia,
Wenhao Deng,
Zhiheng Xi,
Wei He,
Yueming Wu,
Tao Gui,
Yang Liu,
Xuanjing Huang
Abstract:
Code cloning, the duplication of code fragments, is common in software development. While some reuse aids productivity, excessive cloning hurts maintainability and introduces bugs. Hence, automatic code clone detection is vital. Meanwhile, large language models (LLMs) possess diverse code-related knowledge, making them versatile for various software engineering challenges. However, LLMs' performan…
▽ More
Code cloning, the duplication of code fragments, is common in software development. While some reuse aids productivity, excessive cloning hurts maintainability and introduces bugs. Hence, automatic code clone detection is vital. Meanwhile, large language models (LLMs) possess diverse code-related knowledge, making them versatile for various software engineering challenges. However, LLMs' performance in code clone detection is unclear and needs more study for accurate assessment. In this paper, we provide the first comprehensive evaluation of LLMs for clone detection, covering different clone types, languages, and prompts. We find advanced LLMs excel in detecting complex semantic clones, surpassing existing methods. Adding intermediate reasoning steps via chain-of-thought prompts noticeably enhances performance. Additionally, representing code as vector embeddings, especially with text encoders, effectively aids clone detection.Lastly, the ability of LLMs to detect code clones differs among various programming languages. Our study suggests that LLMs have potential for clone detection due to their language capabilities, offering insights for developing robust LLM-based methods to enhance software engineering.
△ Less
Submitted 5 August, 2023; v1 submitted 2 August, 2023;
originally announced August 2023.
-
Secrets of RLHF in Large Language Models Part I: PPO
Authors:
Rui Zheng,
Shihan Dou,
Songyang Gao,
Yuan Hua,
Wei Shen,
Binghai Wang,
Yan Liu,
Senjie Jin,
Qin Liu,
Yuhao Zhou,
Limao Xiong,
Lu Chen,
Zhiheng Xi,
Nuo Xu,
Wenbin Lai,
Minghao Zhu,
Cheng Chang,
Zhangyue Yin,
Rongxiang Weng,
Wensen Cheng,
Haoran Huang,
Tianxiang Sun,
Hang Yan,
Tao Gui,
Qi Zhang
, et al. (2 additional authors not shown)
Abstract:
Large language models (LLMs) have formulated a blueprint for the advancement of artificial general intelligence. Its primary objective is to function as a human-centric (helpful, honest, and harmless) assistant. Alignment with humans assumes paramount significance, and reinforcement learning with human feedback (RLHF) emerges as the pivotal technological paradigm underpinning this pursuit. Current…
▽ More
Large language models (LLMs) have formulated a blueprint for the advancement of artificial general intelligence. Its primary objective is to function as a human-centric (helpful, honest, and harmless) assistant. Alignment with humans assumes paramount significance, and reinforcement learning with human feedback (RLHF) emerges as the pivotal technological paradigm underpinning this pursuit. Current technical routes usually include \textbf{reward models} to measure human preferences, \textbf{Proximal Policy Optimization} (PPO) to optimize policy model outputs, and \textbf{process supervision} to improve step-by-step reasoning capabilities. However, due to the challenges of reward design, environment interaction, and agent training, coupled with huge trial and error cost of large language models, there is a significant barrier for AI researchers to motivate the development of technical alignment and safe landing of LLMs. The stable training of RLHF has still been a puzzle. In the first report, we dissect the framework of RLHF, re-evaluate the inner workings of PPO, and explore how the parts comprising PPO algorithms impact policy agent training. We identify policy constraints being the key factor for the effective implementation of the PPO algorithm. Therefore, we explore the PPO-max, an advanced version of PPO algorithm, to efficiently improve the training stability of the policy model. Based on our main results, we perform a comprehensive analysis of RLHF abilities compared with SFT models and ChatGPT. The absence of open-source implementations has posed significant challenges to the investigation of LLMs alignment. Therefore, we are eager to release technical reports, reward models and PPO codes, aiming to make modest contributions to the advancement of LLMs.
△ Less
Submitted 18 July, 2023; v1 submitted 10 July, 2023;
originally announced July 2023.
-
On the Universal Adversarial Perturbations for Efficient Data-free Adversarial Detection
Authors:
Songyang Gao,
Shihan Dou,
Qi Zhang,
Xuanjing Huang,
Jin Ma,
Ying Shan
Abstract:
Detecting adversarial samples that are carefully crafted to fool the model is a critical step to socially-secure applications. However, existing adversarial detection methods require access to sufficient training data, which brings noteworthy concerns regarding privacy leakage and generalizability. In this work, we validate that the adversarial sample generated by attack algorithms is strongly rel…
▽ More
Detecting adversarial samples that are carefully crafted to fool the model is a critical step to socially-secure applications. However, existing adversarial detection methods require access to sufficient training data, which brings noteworthy concerns regarding privacy leakage and generalizability. In this work, we validate that the adversarial sample generated by attack algorithms is strongly related to a specific vector in the high-dimensional inputs. Such vectors, namely UAPs (Universal Adversarial Perturbations), can be calculated without original training data. Based on this discovery, we propose a data-agnostic adversarial detection framework, which induces different responses between normal and adversarial samples to UAPs. Experimental results show that our method achieves competitive detection performance on various text classification tasks, and maintains an equivalent time consumption to normal inference.
△ Less
Submitted 26 June, 2023;
originally announced June 2023.
-
DSRM: Boost Textual Adversarial Training with Distribution Shift Risk Minimization
Authors:
Songyang Gao,
Shihan Dou,
Yan Liu,
Xiao Wang,
Qi Zhang,
Zhongyu Wei,
Jin Ma,
Ying Shan
Abstract:
Adversarial training is one of the best-performing methods in improving the robustness of deep language models. However, robust models come at the cost of high time consumption, as they require multi-step gradient ascents or word substitutions to obtain adversarial samples. In addition, these generated samples are deficient in grammatical quality and semantic consistency, which impairs the effecti…
▽ More
Adversarial training is one of the best-performing methods in improving the robustness of deep language models. However, robust models come at the cost of high time consumption, as they require multi-step gradient ascents or word substitutions to obtain adversarial samples. In addition, these generated samples are deficient in grammatical quality and semantic consistency, which impairs the effectiveness of adversarial training. To address these problems, we introduce a novel, effective procedure for instead adversarial training with only clean data. Our procedure, distribution shift risk minimization (DSRM), estimates the adversarial loss by perturbing the input data's probability distribution rather than their embeddings. This formulation results in a robust model that minimizes the expected global loss under adversarial attacks. Our approach requires zero adversarial samples for training and reduces time consumption by up to 70\% compared to current best-performing adversarial training methods. Experiments demonstrate that DSRM considerably improves BERT's resistance to textual adversarial attacks and achieves state-of-the-art robust accuracy on various benchmarks.
△ Less
Submitted 26 June, 2023;
originally announced June 2023.
-
CausalAPM: Generalizable Literal Disentanglement for NLU Debiasing
Authors:
Songyang Gao,
Shihan Dou,
Junjie Shan,
Qi Zhang,
Xuanjing Huang
Abstract:
Dataset bias, i.e., the over-reliance on dataset-specific literal heuristics, is getting increasing attention for its detrimental effect on the generalization ability of NLU models. Existing works focus on eliminating dataset bias by down-weighting problematic data in the training process, which induce the omission of valid feature information while mitigating bias. In this work, We analyze the ca…
▽ More
Dataset bias, i.e., the over-reliance on dataset-specific literal heuristics, is getting increasing attention for its detrimental effect on the generalization ability of NLU models. Existing works focus on eliminating dataset bias by down-weighting problematic data in the training process, which induce the omission of valid feature information while mitigating bias. In this work, We analyze the causes of dataset bias from the perspective of causal inference and propose CausalAPM, a generalizable literal disentangling framework to ameliorate the bias problem from feature granularity. The proposed approach projects literal and semantic information into independent feature subspaces, and constrains the involvement of literal information in subsequent predictions. Extensive experiments on three NLP benchmarks (MNLI, FEVER, and QQP) demonstrate that our proposed framework significantly improves the OOD generalization performance while maintaining ID performance.
△ Less
Submitted 4 May, 2023;
originally announced May 2023.
-
Multi-Level Association Rule Mining for Wireless Network Time Series Data
Authors:
Chen Zhu,
Chengbo Qiu,
Shaoyu Dou,
Minghao Liao
Abstract:
Key performance indicators(KPIs) are of great significance in the monitoring of wireless network service quality. The network service quality can be improved by adjusting relevant configuration parameters(CPs) of the base station. However, there are numerous CPs and different cells may affect each other, which bring great challenges to the association analysis of wireless network data. In this pap…
▽ More
Key performance indicators(KPIs) are of great significance in the monitoring of wireless network service quality. The network service quality can be improved by adjusting relevant configuration parameters(CPs) of the base station. However, there are numerous CPs and different cells may affect each other, which bring great challenges to the association analysis of wireless network data. In this paper, we propose an adjustable multi-level association rule mining framework, which can quantitatively mine association rules at each level with environmental information, including engineering parameters and performance management(PMs), and it has interpretability at each level. Specifically, We first cluster similar cells, then quantify KPIs and CPs, and integrate expert knowledge into the association rule mining model, which improve the robustness of the model. The experimental results in real world dataset prove the effectiveness of our method.
△ Less
Submitted 15 December, 2022;
originally announced December 2022.
-
CPMLHO:Hyperparameter Tuning via Cutting Plane and Mixed-Level Optimization
Authors:
Shuo Yang,
Yang Jiao,
Shaoyu Dou,
Mana Zheng,
Chen Zhu
Abstract:
The hyperparameter optimization of neural network can be expressed as a bilevel optimization problem. The bilevel optimization is used to automatically update the hyperparameter, and the gradient of the hyperparameter is the approximate gradient based on the best response function. Finding the best response function is very time consuming. In this paper we propose CPMLHO, a new hyperparameter opti…
▽ More
The hyperparameter optimization of neural network can be expressed as a bilevel optimization problem. The bilevel optimization is used to automatically update the hyperparameter, and the gradient of the hyperparameter is the approximate gradient based on the best response function. Finding the best response function is very time consuming. In this paper we propose CPMLHO, a new hyperparameter optimization method using cutting plane method and mixed-level objective function.The cutting plane is added to the inner layer to constrain the space of the response function. To obtain more accurate hypergradient,the mixed-level can flexibly adjust the loss function by using the loss of the training set and the verification set. Compared to existing methods, the experimental results show that our method can automatically update the hyperparameters in the training process, and can find more superior hyperparameters with higher accuracy and faster convergence.
△ Less
Submitted 11 December, 2022;
originally announced December 2022.
-
Similarity Distribution based Membership Inference Attack on Person Re-identification
Authors:
Junyao Gao,
Xinyang Jiang,
Huishuai Zhang,
Yifan Yang,
Shuguang Dou,
Dongsheng Li,
Duoqian Miao,
Cheng Deng,
Cairong Zhao
Abstract:
While person Re-identification (Re-ID) has progressed rapidly due to its wide real-world applications, it also causes severe risks of leaking personal information from training data. Thus, this paper focuses on quantifying this risk by membership inference (MI) attack. Most of the existing MI attack algorithms focus on classification models, while Re-ID follows a totally different training and inf…
▽ More
While person Re-identification (Re-ID) has progressed rapidly due to its wide real-world applications, it also causes severe risks of leaking personal information from training data. Thus, this paper focuses on quantifying this risk by membership inference (MI) attack. Most of the existing MI attack algorithms focus on classification models, while Re-ID follows a totally different training and inference paradigm. Re-ID is a fine-grained recognition task with complex feature embedding, and model outputs commonly used by existing MI like logits and losses are not accessible during inference. Since Re-ID focuses on modelling the relative relationship between image pairs instead of individual semantics, we conduct a formal and empirical analysis which validates that the distribution shift of the inter-sample similarity between training and test set is a critical criterion for Re-ID membership inference. As a result, we propose a novel membership inference attack method based on the inter-sample similarity distribution. Specifically, a set of anchor images are sampled to represent the similarity distribution conditioned on a target image, and a neural network with a novel anchor selection module is proposed to predict the membership of the target image. Our experiments validate the effectiveness of the proposed approach on both the Re-ID task and conventional classification task.
△ Less
Submitted 28 November, 2022;
originally announced November 2022.
-
Invisible Backdoor Attack with Dynamic Triggers against Person Re-identification
Authors:
Wenli Sun,
Xinyang Jiang,
Shuguang Dou,
Dongsheng Li,
Duoqian Miao,
Cheng Deng,
Cairong Zhao
Abstract:
In recent years, person Re-identification (ReID) has rapidly progressed with wide real-world applications, but also poses significant risks of adversarial attacks. In this paper, we focus on the backdoor attack on deep ReID models. Existing backdoor attack methods follow an all-to-one or all-to-all attack scenario, where all the target classes in the test set have already been seen in the training…
▽ More
In recent years, person Re-identification (ReID) has rapidly progressed with wide real-world applications, but also poses significant risks of adversarial attacks. In this paper, we focus on the backdoor attack on deep ReID models. Existing backdoor attack methods follow an all-to-one or all-to-all attack scenario, where all the target classes in the test set have already been seen in the training set. However, ReID is a much more complex fine-grained open-set recognition problem, where the identities in the test set are not contained in the training set. Thus, previous backdoor attack methods for classification are not applicable for ReID. To ameliorate this issue, we propose a novel backdoor attack on deep ReID under a new all-to-unknown scenario, called Dynamic Triggers Invisible Backdoor Attack (DT-IBA). Instead of learning fixed triggers for the target classes from the training set, DT-IBA can dynamically generate new triggers for any unknown identities. Specifically, an identity hashing network is proposed to first extract target identity information from a reference image, which is then injected into the benign images by image steganography. We extensively validate the effectiveness and stealthiness of the proposed attack on benchmark datasets, and evaluate the effectiveness of several defense methods against our attack.
△ Less
Submitted 10 May, 2023; v1 submitted 20 November, 2022;
originally announced November 2022.
-
Kernel-Whitening: Overcome Dataset Bias with Isotropic Sentence Embedding
Authors:
Songyang Gao,
Shihan Dou,
Qi Zhang,
Xuanjing Huang
Abstract:
Dataset bias has attracted increasing attention recently for its detrimental effect on the generalization ability of fine-tuned models. The current mainstream solution is designing an additional shallow model to pre-identify biased instances. However, such two-stage methods scale up the computational complexity of training process and obstruct valid feature information while mitigating bias. To ad…
▽ More
Dataset bias has attracted increasing attention recently for its detrimental effect on the generalization ability of fine-tuned models. The current mainstream solution is designing an additional shallow model to pre-identify biased instances. However, such two-stage methods scale up the computational complexity of training process and obstruct valid feature information while mitigating bias. To address this issue, we utilize the representation normalization method which aims at disentangling the correlations between features of encoded sentences. We find it also promising in eliminating the bias problem by providing isotropic data distribution. We further propose Kernel-Whitening, a Nystrom kernel approximation method to achieve more thorough debiasing on nonlinear spurious correlations. Our framework is end-to-end with similar time consumption to fine-tuning. Experiments show that Kernel-Whitening significantly improves the performance of BERT on out-of-distribution datasets while maintaining in-distribution accuracy.
△ Less
Submitted 14 October, 2022;
originally announced October 2022.
-
Focus-Driven Contrastive Learniang for Medical Question Summarization
Authors:
Ming Zhang,
Shuai Dou,
Ziyang Wang,
Yunfang Wu
Abstract:
Automatic medical question summarization can significantly help the system to understand consumer health questions and retrieve correct answers. The Seq2Seq model based on maximum likelihood estimation (MLE) has been applied in this task, which faces two general problems: the model can not capture well question focus and and the traditional MLE strategy lacks the ability to understand sentence-lev…
▽ More
Automatic medical question summarization can significantly help the system to understand consumer health questions and retrieve correct answers. The Seq2Seq model based on maximum likelihood estimation (MLE) has been applied in this task, which faces two general problems: the model can not capture well question focus and and the traditional MLE strategy lacks the ability to understand sentence-level semantics. To alleviate these problems, we propose a novel question focus-driven contrastive learning framework (QFCL). Specially, we propose an easy and effective approach to generate hard negative samples based on the question focus, and exploit contrastive learning at both encoder and decoder to obtain better sentence level representations. On three medical benchmark datasets, our proposed model achieves new state-of-the-art results, and obtains a performance gain of 5.33, 12.85 and 3.81 points over the baseline BART model on three datasets respectively. Further human judgement and detailed analysis prove that our QFCL model learns better sentence representations with the ability to distinguish different sentence meanings, and generates high-quality summaries by capturing question focus.
△ Less
Submitted 14 February, 2023; v1 submitted 1 September, 2022;
originally announced September 2022.
-
Task-aware Similarity Learning for Event-triggered Time Series
Authors:
Shaoyu Dou,
Kai Yang,
Yang Jiao,
Chengbo Qiu,
Kui Ren
Abstract:
Time series analysis has achieved great success in diverse applications such as network security, environmental monitoring, and medical informatics. Learning similarities among different time series is a crucial problem since it serves as the foundation for downstream analysis such as clustering and anomaly detection. It often remains unclear what kind of distance metric is suitable for similarity…
▽ More
Time series analysis has achieved great success in diverse applications such as network security, environmental monitoring, and medical informatics. Learning similarities among different time series is a crucial problem since it serves as the foundation for downstream analysis such as clustering and anomaly detection. It often remains unclear what kind of distance metric is suitable for similarity learning due to the complex temporal dynamics of the time series generated from event-triggered sensing, which is common in diverse applications, including automated driving, interactive healthcare, and smart home automation. The overarching goal of this paper is to develop an unsupervised learning framework that is capable of learning task-aware similarities among unlabeled event-triggered time series. From the machine learning vantage point, the proposed framework harnesses the power of both hierarchical multi-scale sequence autoencoders and Gaussian Mixture Model (GMM) to effectively learn the low-dimensional representations from the time series. Finally, the obtained similarity measure can be easily visualized for explaining. The proposed framework aspires to offer a stepping stone that gives rise to a systematic approach to model and learn similarities among a multitude of event-triggered time series. Through extensive qualitative and quantitative experiments, it is revealed that the proposed method outperforms state-of-the-art methods considerably.
△ Less
Submitted 17 July, 2022;
originally announced July 2022.
-
Towards Privacy-Preserving Person Re-identification via Person Identify Shift
Authors:
Shuguang Dou,
Xinyang Jiang,
Qingsong Zhao,
Dongsheng Li,
Cairong Zhao
Abstract:
Recently privacy concerns of person re-identification (ReID) raise more and more attention and preserving the privacy of the pedestrian images used by ReID methods become essential. De-identification (DeID) methods alleviate privacy issues by removing the identity-related of the ReID data. However, most of the existing DeID methods tend to remove all personal identity-related information and compr…
▽ More
Recently privacy concerns of person re-identification (ReID) raise more and more attention and preserving the privacy of the pedestrian images used by ReID methods become essential. De-identification (DeID) methods alleviate privacy issues by removing the identity-related of the ReID data. However, most of the existing DeID methods tend to remove all personal identity-related information and compromise the usability of de-identified data on the ReID task. In this paper, we aim to develop a technique that can achieve a good trade-off between privacy protection and data usability for person ReID. To achieve this, we propose a novel de-identification method designed explicitly for person ReID, named Person Identify Shift (PIS). PIS removes the absolute identity in a pedestrian image while preserving the identity relationship between image pairs. By exploiting the interpolation property of variational auto-encoder, PIS shifts each pedestrian image from the current identity to another with a new identity, resulting in images still preserving the relative identities. Experimental results show that our method has a better trade-off between privacy-preserving and model performance than existing de-identification methods and can defend against human and model attacks for data privacy.
△ Less
Submitted 15 July, 2022;
originally announced July 2022.