XBOUND: Exploring the Capability Boundaries of Device-Control Agents through Trajectory Tree Exploration

Authors: Shaoqing Zhang, Kehai Chen, Zhuosheng Zhang, Rumei Li, Rongxiang Weng, Yang Xiang, Liqiang Nie, Min Zhang

Abstract: Recent advancements in vision-language models (VLMs) have spurred increased interest in Device-Control Agents (DC agents), such as utilizing in-the-wild device control to manage graphical user interfaces. Conventional methods for assessing the capabilities of DC agents, such as computing step-wise action accuracy and overall task success rates, provide a macroscopic view of DC agents' performance; however, they fail to offer microscopic insights into potential errors that may occur in real-world applications. Conducting a finer-grained performance evaluation of DC agents presents significant challenges. This study introduces a new perspective on evaluation methods for DC agents by proposing the XBOUND evaluation method, which employs the calculation of a novel Explore Metric to delineate the capability boundaries of DC agents. Compared to previous evaluation methods, XBOUND focuses on individual states to assess the proficiency of DC agents in mastering these states. Furthermore, we have developed a "pseudo" episode tree dataset derived from Android Control test data. Utilizing this dataset and XBOUND, we comprehensively evaluate the OS-Atlas and UI-TARS series, examining both the overall and specific performance across five common tasks. Additionally, we select representative cases to highlight the current deficiencies and limitations inherent in both series. Code is available at https://github.com/sqzhang-lazy/XBOUND.

Submitted 27 May, 2025; originally announced May 2025.

arXiv:2505.10597

Two Minds Better Than One: Collaborative Reward Modeling for LLM Alignment

Authors: Jiazheng Zhang, Wenqing Jing, Zizhuo Zhang, Zhiheng Xi, Shihan Dou, Rongxiang Weng, Jiahuan Li, Jingang Wang, Mingxu Chai, Shibo Hong, Tao Gui, Qi Zhang

Abstract: Reward models (RMs) play a pivotal role in aligning large language models (LLMs) with human values. However, noisy preferences in human feedback can lead to reward misgeneralization - a phenomenon where reward models learn spurious correlations or overfit to noisy preferences, which poses important challenges to the generalization of RMs. This paper systematically analyzes the characteristics of preference pairs and aims to identify how noisy preferences differ from human-aligned preferences in reward modeling. Our analysis reveals that noisy preferences are difficult for RMs to fit, as they cause sharp training fluctuations and irregular gradient updates. These distinctive dynamics suggest the feasibility of identifying and excluding such noisy preferences. Empirical studies demonstrate that a policy LLM optimized with a reward model trained on the full preference dataset, which includes substantial noise, performs worse than one trained on a subset of exclusively high-quality preferences. To address this challenge, we propose an online Collaborative Reward Modeling (CRM) framework to achieve robust preference learning through peer review and curriculum learning. In particular, CRM maintains two RMs that collaboratively filter potential noisy preferences by peer-reviewing each other's data selections. Curriculum learning synchronizes the capabilities of the two models, mitigating excessive disparities to promote the utility of peer review. Extensive experiments demonstrate that CRM significantly enhances RM generalization, with up to 9.94 points improvement on RewardBench under an extreme 40% noise. Moreover, CRM can seamlessly extend to implicit-reward alignment methods, offering a robust and versatile alignment strategy.

Submitted 18 May, 2025; v1 submitted 15 May, 2025; originally announced May 2025.

arXiv:2504.18818

Frequency-Integrated Transformer for Arbitrary-Scale Super-Resolution

Authors: Xufei Wang, Fei Ge, Jinchen Zhu, Mingjian Zhang, Qi Wu, Jifeng Ren, Shizhuang Weng

Abstract: Methods based on implicit neural representation have demonstrated remarkable capabilities in arbitrary-scale super-resolution (ASSR) tasks, but they neglect the potential value of the frequency domain, leading to sub-optimal performance. We propose a novel network called Frequency-Integrated Transformer (FIT) to incorporate and utilize frequency information to enhance ASSR performance. FIT employs a Frequency Incorporation Module (FIM) to introduce frequency information in a lossless manner and a Frequency Utilization Self-Attention module (FUSAM) to efficiently leverage frequency information by exploiting the spatial-frequency interrelationship and the global nature of frequency. FIM enriches detail characterization by incorporating frequency information through a combination of the Fast Fourier Transform (FFT) with real-imaginary mapping. In FUSAM, Interaction Implicit Self-Attention (IISA) achieves cross-domain information synergy by interacting spatial and frequency information in subspace, while Frequency Correlation Self-attention (FCSA) captures the global context by computing correlation in frequency. Experimental results demonstrate that FIT yields superior performance compared to existing methods across multiple benchmark datasets. Visual feature maps demonstrate the superiority of FIM in enriching detail characterization. Frequency error maps validate that IISA improves frequency fidelity. Local attribution maps validate that FCSA effectively captures global context.

Submitted 26 April, 2025; originally announced April 2025.

Comments: 11 pages, 8 figures

arXiv:2504.01801

Investigating and Scaling up Code-Switching for Multilingual Language Model Pre-Training

Authors: Zhijun Wang, Jiahuan Li, Hao Zhou, Rongxiang Weng, Jingang Wang, Xin Huang, Xue Han, Junlan Feng, Chao Deng, Shujian Huang

Abstract: Large language models (LLMs) exhibit remarkable multilingual capabilities despite the extreme language imbalance in the pre-training data. In this paper, we closely examine the reasons behind this phenomenon, focusing on the pre-training corpus. We find that the existence of code-switching, alternating between different languages within a context, is key to multilingual capabilities. We conduct an analysis to investigate code-switching in the pre-training corpus, examining its presence and categorizing it into four types within two quadrants. We then assess its impact on multilingual performance. These types of code-switching data are unbalanced in proportions and demonstrate different effects on facilitating language transfer. To better explore the power of code-switching for language alignment during pre-training, we investigate the strategy of synthetic code-switching. We continuously scale up the synthetic code-switching data and observe remarkable improvements in both benchmarks and representation space. Extensive experiments indicate that incorporating synthetic code-switching data enables better language alignment and generalizes well to high, medium, and low-resource languages with pre-training corpora of varying qualities.

Submitted 22 April, 2025; v1 submitted 2 April, 2025; originally announced April 2025.

arXiv:2502.05878

Retrieval-augmented Large Language Models for Financial Time Series Forecasting

Authors: Mengxi Xiao, Zihao Jiang, Lingfei Qian, Zhengyu Chen, Yueru He, Yijing Xu, Yuecheng Jiang, Dong Li, Ruey-Ling Weng, Min Peng, Jimin Huang, Sophia Ananiadou, Qianqian Xie

Abstract: Accurately forecasting stock price movements is critical for informed financial decision-making, supporting applications ranging from algorithmic trading to risk management. However, this task remains challenging due to the difficulty of retrieving subtle yet high-impact patterns from noisy financial time-series data, where conventional retrieval methods, whether based on generic language models or simplistic numeric similarity, often fail to capture the intricate temporal dependencies and context-specific signals essential for precise market prediction. To bridge this gap, we introduce FinSrag, the first retrieval-augmented generation (RAG) framework with a novel domain-specific retriever FinSeer for financial time-series forecasting. FinSeer leverages a candidate selection mechanism refined by LLM feedback and a similarity-driven training objective to align queries with historically influential sequences while filtering out financial noise. Such training enables FinSeer to identify the most relevant time-series data segments for downstream forecasting tasks, unlike embedding or distance-based retrieval methods used in existing RAG frameworks. The retrieved patterns are then fed into StockLLM, a 1B-parameter LLM fine-tuned for stock movement prediction, which serves as the generative backbone. Beyond the retrieval method, we enrich the retrieval corpus by curating new datasets that integrate a broader set of financial indicators, capturing previously overlooked market dynamics. Experiments demonstrate that FinSeer outperforms existing textual retrievers and traditional distance-based retrieval approaches in enhancing the prediction accuracy of StockLLM, underscoring the importance of domain-specific retrieval frameworks in handling the complexity of financial time-series data.

Submitted 6 June, 2025; v1 submitted 9 February, 2025; originally announced February 2025.

Comments: 11 pages, 4 figures

arXiv:2502.05559

Channel Estimation for RIS-Aided MU-MIMO mmWave Systems with Practical Hybrid Architecture

Authors: Liuchang Zhuo, Cunhua Pan, Hong Ren, Ruisong Weng, Shi Jin, A. Lee Swindlehurst, Jiangzhou Wang

Abstract: This paper proposes a correlation-based three-stage channel estimation strategy with low pilot overhead for reconfigurable intelligent surface (RIS)-aided millimeter wave (mmWave) multi-user (MU) MIMO systems, in which both users and base station (BS) are equipped with a hybrid RF architecture. In Stage I, all users jointly transmit pilots and recover the uncompressed received signals to estimate the angle of arrival (AoA) at the BS using the discrete Fourier transform (DFT). Based on the observation that the overall cascaded MIMO channel can be decomposed into multiple sub-channels, the cascaded channel for a typical user is estimated in Stage II. Specifically, using the invariance of angles and the linear correlation of gains related to different cascaded subchannels, we use compressive sensing (CS), least squares (LS), and a one-dimensional search to estimate the Angles of Departure (AoDs), based on which the overall cascaded channel is obtained. In Stage III, the remaining users independently transmit pilots to estimate their individual cascaded channel with the same approach as in Stage II, which exploits the equivalent common RIS-BS channel obtained in Stage II to reduce the pilot overhead. In addition, the hybrid combining matrix and the RIS phase shift matrix are designed to reduce the noise power, thereby further improving the estimation performance. Simulation results demonstrate that the proposed algorithm can achieve high estimation accuracy especially when the number of antennas at the users is small, and reduce pilot overhead by more than five times compared with the existing benchmark approach.

Submitted 8 February, 2025; originally announced February 2025.

Comments: 13 pages, 7 figures, 1 table

arXiv:2502.05551

FRAME: Boosting LLMs with A Four-Quadrant Multi-Stage Pretraining Strategy

Authors: Xuemiao Zhang, Feiyu Duan, Liangyu Xu, Yongwei Zhou, Sirui Wang, Rongxiang Weng, Jingang Wang, Xunliang Cai

Abstract: Large language models (LLMs) have significantly advanced human language understanding and generation, with pretraining data quality and organization being crucial to their performance. Multi-stage pretraining is a promising approach, but existing methods often lack quantitative criteria for data partitioning and instead rely on intuitive heuristics. In this paper, we propose the novel Four-quadRAnt Multi-stage prEtraining strategy (FRAME), guided by the established principle of organizing the pretraining process into four stages to achieve significant loss reductions four times. This principle is grounded in two key findings: first, training on high Perplexity (PPL) data followed by low PPL data, and second, training on low PPL difference (PD) data followed by high PD data, both of which cause the loss to drop significantly twice and enhance performance. By partitioning data into four quadrants and strategically organizing them, FRAME achieves a remarkable 16.8% average improvement over random across MMLU and CMMLU for the 3B model, effectively boosting LLM performance.

Submitted 31 May, 2025; v1 submitted 8 February, 2025; originally announced February 2025.
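
The four-quadrant partition mentioned in the FRAME abstract can be illustrated with a minimal sketch. It assumes each sample already carries a PPL and a PD value, and it splits at the median of each; the median thresholds, the field names `ppl` and `pd`, and the function name are illustrative assumptions, not details taken from the paper. The order in which the quadrants are scheduled for training is not stated in the abstract and is left out.

```python
from statistics import median


def partition_quadrants(samples):
    """Split samples into four (PPL, PD) quadrants.

    `samples` is a list of dicts with hypothetical keys "ppl" and "pd".
    Median cut-offs are an assumption made for this sketch.
    """
    ppl_cut = median(s["ppl"] for s in samples)
    pd_cut = median(s["pd"] for s in samples)
    quadrants = {
        ("low_ppl", "low_pd"): [],
        ("low_ppl", "high_pd"): [],
        ("high_ppl", "low_pd"): [],
        ("high_ppl", "high_pd"): [],
    }
    for s in samples:
        ppl_side = "high_ppl" if s["ppl"] > ppl_cut else "low_ppl"
        pd_side = "high_pd" if s["pd"] > pd_cut else "low_pd"
        quadrants[(ppl_side, pd_side)].append(s)
    return quadrants


toy = [
    {"id": "a", "ppl": 5.0, "pd": 0.1},
    {"id": "b", "ppl": 50.0, "pd": 0.1},
    {"id": "c", "ppl": 5.0, "pd": 0.9},
    {"id": "d", "ppl": 50.0, "pd": 0.9},
]
toy_quadrants = partition_quadrants(toy)
```

With the toy data, each sample lands in its own quadrant; a real pipeline would compute PPL and PD with reference models before this step.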

arXiv:2502.00761

FIRE: Flexible Integration of Data Quality Ratings for Effective Pre-Training

Authors: Liangyu Xu, Xuemiao Zhang, Feiyu Duan, Sirui Wang, Rongxiang Weng, Jingang Wang, Xunliang Cai

Abstract: Selecting high-quality data can improve the pretraining efficiency of large language models (LLMs). Existing methods generally rely on heuristic techniques or single quality signals, limiting their ability to evaluate data quality comprehensively. In this work, we propose FIRE, a flexible and scalable framework for integrating multiple data quality raters, which allows for a comprehensive assessment of data quality across various dimensions. FIRE aligns multiple quality signals into a unified space, and integrates diverse data quality raters to provide a comprehensive quality signal for each data point. Further, we introduce a progressive data selection scheme based on FIRE that iteratively refines the selection of high-quality data points. Extensive experiments show that FIRE outperforms other data selection methods and significantly boosts pretrained model performance across a wide range of downstream tasks, while requiring less than 37.5% of the training data needed by the Random baseline to reach the target performance.

Submitted 22 May, 2025; v1 submitted 2 February, 2025; originally announced February 2025.

Comments: 21 pages, 11 figures

arXiv:2501.13126

Preference Curriculum: LLMs Should Always Be Pretrained on Their Preferred Data

Authors: Xuemiao Zhang, Liangyu Xu, Feiyu Duan, Yongwei Zhou, Sirui Wang, Rongxiang Weng, Jingang Wang, Xunliang Cai

Abstract: Large language models (LLMs) generally utilize a consistent data distribution throughout the pretraining process. However, as the model's capability improves, it is intuitive that its data preferences dynamically change, indicating the need for pretraining with different data at various training stages. To achieve this, we propose the Perplexity Difference (PD) based Preference Curriculum learning (PDPC) framework, which always perceives and uses the data preferred by LLMs to train and boost them. First, we introduce the PD metric to quantify the difference in how challenging a sample is for weak versus strong models. Samples with high PD are more challenging for weak models to learn and are more suitable to be arranged in the later stage of pretraining. Second, we propose the preference function to approximate and predict the data preference of the LLM at any training step, so as to complete the arrangement of the dataset offline and ensure continuous training without interruption. Experimental results on 1.3B and 3B models demonstrate that PDPC significantly surpasses baselines. Notably, the 3B model trained on 1T tokens achieves an increased average accuracy of over 8.1% across MMLU and CMMLU.

Submitted 17 February, 2025; v1 submitted 21 January, 2025; originally announced January 2025.

Comments: 18 pages, 13 figures
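
As a rough illustration of the PD idea in the abstract above, the sketch below computes perplexity from per-token log-probabilities under a weak and a strong model and orders samples so that high-PD samples come later. The normalized form `(ppl_weak - ppl_strong) / ppl_weak` is an assumption made here, as are the function and field names; the abstract does not give the exact formula, and the paper's preference function is not modeled.

```python
import math


def perplexity(token_logprobs):
    """Perplexity from natural-log token probabilities of one sample."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def perplexity_difference(weak_logprobs, strong_logprobs):
    """Assumed PD form: relative perplexity gap between weak and strong model."""
    ppl_weak = perplexity(weak_logprobs)
    ppl_strong = perplexity(strong_logprobs)
    return (ppl_weak - ppl_strong) / ppl_weak


def order_by_pd(samples):
    """Sort samples so low-PD data comes first and high-PD data comes later."""
    return sorted(
        samples,
        key=lambda s: perplexity_difference(s["weak"], s["strong"]),
    )


# Hypothetical log-probabilities for two samples.
toy = [
    {"id": "hard_for_weak", "weak": [-2.0, -2.0], "strong": [-1.0, -1.0]},
    {"id": "equally_easy", "weak": [-1.0, -1.0], "strong": [-1.0, -1.0]},
]
ordered_ids = [s["id"] for s in order_by_pd(toy)]
```

A static sort is the simplest possible curriculum; it is used here only to make the "high PD later" ordering concrete.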

arXiv:2412.10423

Look Before You Leap: Enhancing Attention and Vigilance Regarding Harmful Content with GuidelineLLM

Authors: Shaoqing Zhang, Zhuosheng Zhang, Kehai Chen, Rongxiang Weng, Muyun Yang, Tiejun Zhao, Min Zhang

Abstract: Despite being empowered with alignment mechanisms, large language models (LLMs) are increasingly vulnerable to emerging jailbreak attacks that can compromise their alignment mechanisms. This vulnerability poses significant risks to real-world applications. Existing work faces challenges in both training efficiency and generalization capabilities (i.e., Reinforcement Learning from Human Feedback and Red-Teaming). Developing effective strategies to enable LLMs to resist continuously evolving jailbreak attempts represents a significant challenge. To address this challenge, we propose a novel defensive paradigm called GuidelineLLM, which assists LLMs in recognizing queries that may have harmful content. Before LLMs respond to a query, GuidelineLLM first identifies potential risks associated with the query, summarizes these risks into guideline suggestions, and then feeds these guidelines to the responding LLMs. Importantly, our approach eliminates the necessity for additional safety fine-tuning of the LLMs themselves; only the GuidelineLLM requires fine-tuning. This characteristic enhances the general applicability of GuidelineLLM across various LLMs. Experimental results demonstrate that GuidelineLLM can significantly reduce the attack success rate (ASR) against LLM (an average reduction of 34.17% ASR) while maintaining the usefulness of LLM in handling benign queries. The code is available at https://github.com/sqzhang-lazy/GuidelineLLM.

Submitted 14 April, 2025; v1 submitted 10 December, 2024; originally announced December 2024.

Comments: AAAI 2025

arXiv:2412.00491

CDEMapper: Enhancing NIH Common Data Element Normalization using Large Language Models

Authors: Yan Wang, Jimin Huang, Huan He, Vincent Zhang, Yujia Zhou, Xubing Hao, Pritham Ram, Lingfei Qian, Qianqian Xie, Ruey-Ling Weng, Fongci Lin, Yan Hu, Licong Cui, Xiaoqian Jiang, Hua Xu, Na Hong

Abstract: Common Data Elements (CDEs) standardize data collection and sharing across studies, enhancing data interoperability and improving research reproducibility. However, implementing CDEs presents challenges due to the broad range and variety of data elements. This study aims to develop an effective and efficient mapping tool to bridge the gap between local data elements and National Institutes of Health (NIH) CDEs. We propose CDEMapper, a large language model (LLM) powered mapping tool designed to assist in mapping local data elements to NIH CDEs. CDEMapper has three core modules: (1) CDE indexing and embeddings. NIH CDEs were indexed and embedded to support semantic search; (2) CDE recommendations. The tool combines Elasticsearch (BM25 similarity methods) with state-of-the-art GPT services to recommend candidate CDEs and their permissible values; and (3) Human review. Users review and select the NIH CDEs and values that best match their data elements and value sets. We evaluate the tool recommendation accuracy against manually annotated mapping results. CDEMapper offers a publicly available, LLM-powered, and intuitive user interface that consolidates essential and advanced mapping services into a streamlined pipeline. It provides a step-by-step, quality-assured mapping workflow designed with a user-centered approach. The evaluation results demonstrated that augmenting BM25 with GPT embeddings and a ranker consistently enhances CDEMapper mapping accuracy in three different mapping settings across four evaluation datasets. This work opens up the potential of using LLMs to assist with CDE recommendation and human curation when aligning local data elements with NIH CDEs. Additionally, this effort enhances clinical research data interoperability and helps researchers better understand the gaps between local data elements and NIH CDEs.

Submitted 30 November, 2024; originally announced December 2024.

Comments: 11 pages, 4 figures
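
The BM25 similarity that the CDEMapper abstract refers to can be sketched in plain Python. This is the classic Okapi BM25 scoring function with whitespace tokenization, not Elasticsearch's implementation and not CDEMapper's pipeline; the element names in the toy corpus are hypothetical, and the GPT-based embedding and ranking stages are not modeled.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document against the query with classic Okapi BM25."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(
                1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5)
            )
            # Length-normalized term-frequency saturation.
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical data-element descriptions standing in for an indexed corpus.
candidates = [
    "patient age at enrollment",
    "systolic blood pressure",
    "date of birth",
]
scores = bm25_scores("blood pressure", candidates)
best = candidates[scores.index(max(scores))]
```

Lexical scoring like this only matches shared tokens, which is one reason a pipeline might add an embedding-based ranker on top of it.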

arXiv:2411.16579

Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision

Authors: Zhiheng Xi, Dingwen Yang, Jixuan Huang, Jiafu Tang, Guanyu Li, Yiwen Ding, Wei He, Boyang Hong, Shihan Do, Wenyu Zhan, Xiao Wang, Rui Zheng, Tao Ji, Xiaowei Shi, Yitao Zhai, Rongxiang Weng, Jingang Wang, Xunliang Cai, Tao Gui, Zuxuan Wu, Qi Zhang, Xipeng Qiu, Xuanjing Huang, Yu-Gang Jiang

Abstract: Training large language models (LLMs) to spend more time thinking and reflecting before responding is crucial for effectively solving complex reasoning tasks in fields such as science, coding, and mathematics. However, the effectiveness of mechanisms like self-reflection and self-correction depends on the model's capacity to accurately assess its own performance, which can be limited by factors such as initial accuracy, question difficulty, and the lack of external feedback. In this paper, we delve into a two-player paradigm that separates the roles of reasoning and critique models, where the critique model provides step-level feedback to supervise the reasoning (actor) model during both test-time and train-time. We first propose AutoMathCritique, an automated and scalable framework for collecting critique data, resulting in a dataset of 76,321 responses paired with step-level feedback. Fine-tuning language models with this dataset enables them to generate natural language feedback for mathematical reasoning. We demonstrate that the critique models consistently improve the actor's performance on difficult queries at test-time, especially when scaling up inference-time computation. Motivated by these findings, we introduce the critique-based supervision to the actor's self-training process, and propose a critique-in-the-loop self-improvement method. Experiments show that the method improves the actor's exploration efficiency and solution diversity, especially on challenging queries, leading to a stronger reasoning model. Lastly, we take the preliminary step to explore training self-talk reasoning models via critique supervision and showcase its potential. Our code and datasets are at https://mathcritique.github.io/.

Submitted 25 November, 2024; originally announced November 2024.

Comments: Preprint

arXiv:2411.10020

Information Extraction from Clinical Notes: Are We Ready to Switch to Large Language Models?

Authors: Yan Hu, Xu Zuo, Yujia Zhou, Xueqing Peng, Jimin Huang, Vipina K. Keloth, Vincent J. Zhang, Ruey-Ling Weng, Qingyu Chen, Xiaoqian Jiang, Kirk E. Roberts, Hua Xu

Abstract: Background: Information extraction (IE) is critical in clinical natural language processing (NLP). While large language models (LLMs) excel on generative tasks, their performance on extractive tasks remains debated. Methods: We investigated Named Entity Recognition (NER) and Relation Extraction (RE) using 1,588 clinical notes from four sources (UT Physicians, MTSamples, MIMIC-III, and i2b2). We developed an annotated corpus covering 4 clinical entities and 16 modifiers, and compared instruction-tuned LLaMA-2 and LLaMA-3 against BERT in terms of performance, generalizability, computational resources, and throughput. Results: LLaMA models outperformed BERT across datasets. With sufficient training data, LLaMA showed modest improvements (1% on NER, 1.5-3.7% on RE); improvements were larger with limited training data. On unseen i2b2 data, LLaMA-3-70B outperformed BERT by 7% (F1) on NER and 4% on RE. However, LLaMA models required more computing resources and ran up to 28 times slower. We implemented "Kiwi," a clinical IE package featuring both models, available at https://kiwi.clinicalnlp.org/. Conclusion: This study is among the first to develop and evaluate a comprehensive clinical IE system using open-source LLMs. Results indicate that LLaMA models outperform BERT for clinical NER and RE but with higher computational costs and lower throughput. These findings highlight that choosing between LLMs and traditional deep learning methods for clinical IE applications should remain task-specific, taking into account both performance metrics and practical considerations such as available computing resources and the intended use case scenarios.

Submitted 7 January, 2025; v1 submitted 15 November, 2024; originally announced November 2024.

arXiv:2410.23074

Multi-Programming Language Sandbox for LLMs

Authors: Shihan Dou, Jiazheng Zhang, Jianxiang Zang, Yunbo Tao, Weikang Zhou, Haoxiang Jia, Shichun Liu, Yuming Yang, Zhiheng Xi, Shenxi Wu, Shaoqing Zhang, Muling Wu, Changze Lv, Limao Xiong, Wenyu Zhan, Lin Zhang, Rongxiang Weng, Jingang Wang, Xunliang Cai, Yueming Wu, Ming Wen, Rui Zheng, Tao Ji, Yixin Cao, Tao Gui, et al. (3 additional authors not shown)

Abstract: We introduce MPLSandbox, an out-of-the-box multi-programming language sandbox designed to provide unified and comprehensive feedback from compiler and analysis tools for Large Language Models (LLMs). It can automatically identify the programming language of the code, compiling and executing it within an isolated sub-sandbox to ensure safety and stability. In addition, MPLSandbox also integrates both traditional and LLM-based code analysis tools, providing a comprehensive analysis of generated code. MPLSandbox can be effortlessly integrated into the training and deployment of LLMs to improve the quality and correctness of their generated code. It also helps researchers streamline their workflows for various LLM-based code-related tasks, reducing the development cost. To validate the effectiveness of MPLSandbox, we integrate it into training and deployment approaches, and also employ it to optimize workflows for a wide range of real-world code-related tasks. Our goal is to enhance researcher productivity on LLM-based code-related tasks by simplifying and automating workflows through delegation to MPLSandbox.

Submitted 5 November, 2024; v1 submitted 30 October, 2024; originally announced October 2024.

Comments: 25 pages, 14 figures

arXiv:2409.06411

Length Desensitization in Direct Preference Optimization

Authors: Wei Liu, Yang Bai, Chengcheng Han, Rongxiang Weng, Jun Xu, Xuezhi Cao, Jingang Wang, Xunliang Cai

Abstract: Direct Preference Optimization (DPO) is widely utilized in the Reinforcement Learning from Human Feedback (RLHF) phase to align Large Language Models (LLMs) with human preferences, thereby enhancing both their harmlessness and efficacy. However, it has been observed that DPO tends to over-optimize for verbosity, which can detrimentally affect both performance and user experience. In this paper, we conduct an in-depth theoretical analysis of DPO's optimization objective and reveal a strong correlation between its implicit reward and data length. This correlation misguides the optimization direction, resulting in length sensitivity during the DPO training and leading to verbosity. To address this issue, we propose a length-desensitization improvement method for DPO, termed LD-DPO. The proposed method aims to desensitize DPO to data length by decoupling explicit length preference, which is relatively insignificant, from the other implicit preferences, thereby enabling more effective learning of the intrinsic preferences. We utilized two settings (Base and Instruct) of Llama2-13B, Llama3-8B, and Qwen2-7B for experimental validation on various benchmarks including MT-Bench and AlpacaEval 2. The experimental results indicate that LD-DPO consistently outperforms DPO and other baseline methods, achieving more concise responses with a 10-40% reduction in length compared to DPO. We conducted in-depth experimental analyses to demonstrate that LD-DPO can indeed achieve length desensitization and align the model more closely with human-like preferences.

Submitted 27 November, 2024; v1 submitted 10 September, 2024; originally announced September 2024.

Comments: 21 pages, 9 figures

arXiv:2408.11878 [pdf, ps, other]

Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications

Authors: Jimin Huang, Mengxi Xiao, Dong Li, Zihao Jiang, Yuzhe Yang, Yifei Zhang, Lingfei Qian, Yan Wang, Xueqing Peng, Yang Ren, Ruoyu Xiang, Zhengyu Chen, Xiao Zhang, Yueru He, Weiguang Han, Shunian Chen, Lihang Shen, Daniel Kim, Yangyang Yu, Yupeng Cao, Zhiyang Deng, Haohang Li, Duanyu Feng, Yongfu Dai, VijayaSai Somasundaram, et al. (19 additional authors not shown)

Abstract: Financial LLMs hold promise for advancing financial tasks and domain-specific applications. However, they are limited by scarce corpora, weak multimodal capabilities, and narrow evaluations, making them less suited for real-world application. To address this, we introduce Open-FinLLMs, the first open-source multimodal financial LLMs designed to handle diverse tasks across text, tabular, time-series, and chart data, excelling in zero-shot, few-shot, and fine-tuning settings. The suite includes FinLLaMA, pre-trained on a comprehensive 52-billion-token corpus; FinLLaMA-Instruct, fine-tuned with 573K financial instructions; and FinLLaVA, enhanced with 1.43M multimodal tuning pairs for strong cross-modal reasoning. We comprehensively evaluate Open-FinLLMs across 14 financial tasks, 30 datasets, and 4 multimodal tasks in zero-shot, few-shot, and supervised fine-tuning settings, introducing two new multimodal evaluation datasets. Our results show that Open-FinLLMs outperform advanced financial and general LLMs, such as GPT-4, across financial NLP, decision-making, and multi-modal tasks, highlighting their potential to tackle real-world challenges. To foster innovation and collaboration across academia and industry, we release all codes (https://anonymous.4open.science/r/PIXIU2-0D70/B1D7/LICENSE) and models under OSI-approved licenses.

Submitted 6 June, 2025; v1 submitted 20 August, 2024; originally announced August 2024.

Comments: 33 pages, 13 figures

arXiv:2407.06153 [pdf, other]

What's Wrong with Your Code Generated by Large Language Models? An Extensive Study

Authors: Shihan Dou, Haoxiang Jia, Shenxi Wu, Huiyuan Zheng, Weikang Zhou, Muling Wu, Mingxu Chai, Jessica Fan, Caishuang Huang, Yunbo Tao, Yan Liu, Enyu Zhou, Ming Zhang, Yuhao Zhou, Yueming Wu, Rui Zheng, Ming Wen, Rongxiang Weng, Jingang Wang, Xunliang Cai, Tao Gui, Xipeng Qiu, Qi Zhang, Xuanjing Huang

Abstract: The increasing development of large language models (LLMs) in code generation has drawn significant attention among researchers. To enhance LLM-based code generation ability, current efforts are predominantly directed towards collecting high-quality datasets and leveraging diverse training technologies. However, there is a notable lack of comprehensive studies examining the limitations and boundaries of these existing methods. To bridge this gap, we conducted an extensive empirical study evaluating the performance of three leading closed-source LLMs and four popular open-source LLMs on three commonly used benchmarks. Our investigation, which evaluated the length, cyclomatic complexity and API number of the generated code, revealed that these LLMs face challenges in generating successful code for more complex problems, and tend to produce code that is shorter yet more complicated as compared to canonical solutions. Additionally, we developed a taxonomy of bugs for incorrect codes that includes three categories and 12 sub-categories, and analyzed the root cause for common bug types. Furthermore, to better understand the performance of LLMs in real-world projects, we manually created a real-world benchmark comprising 140 code generation tasks. Our analysis highlights distinct differences in bug distributions between actual scenarios and existing benchmarks. Finally, we propose a novel training-free iterative method that introduces self-critique, enabling LLMs to critique and correct their generated code based on bug types and compiler feedback. Experimental results demonstrate that our approach can significantly mitigate bugs and increase the passing rate by 29.2% after two iterations, indicating substantial potential for LLMs to handle more complex problems.

Submitted 8 July, 2024; originally announced July 2024.

Comments: 17 pages, 7 figures

arXiv:2403.02942 [pdf, other]

Channel Estimation for mmWave MIMO-OFDM Systems in High-Mobility Scenarios: Instantaneous Model or Statistical Model?

Authors: Ruizhe Wang, Hong Ren, Cunhua Pan, Gui Zhou, Ruisong Weng, Jiangzhou Wang

Abstract: Classical linear statistical models, like the first-order auto-regressive (AR) model, are commonly used as the channel model in high-mobility scenarios. However, compared to sub-6G, the effect of Doppler frequency shifts is more significant at millimeter wave (mmWave) frequencies, and the effectiveness of the statistical channel model in high-mobility mmWave scenarios should be reconsidered. In this paper, we investigate the channel estimation for mmWave multiple-input multiple-output (MIMO) orthogonal frequency division multiplexing (OFDM) systems in high-mobility scenarios, with the focus on the comparison between the instantaneous channel model and the statistical channel model. For the instantaneous model, by leveraging the low-rank nature of mmWave channels and the multidimensional characteristics of MIMO-OFDM signals across space, time, and frequency, the received signals are structured as a fourth-order tensor fitting a low-rank CANDECOMP/PARAFAC (CP) model. Then, to solve the CP decomposition problem, an estimation of signal parameters via rotational invariance techniques (ESPRIT)-type decomposition based channel estimation method is proposed by exploring the Vandermonde structure of the factor matrix, and the channel parameters are then estimated from the factor matrices. We analyze the uniqueness condition of the CP decomposition and develop a concise derivation of the Cramer-Rao bound (CRB) for channel parameters. Simulations show that our method outperforms the existing benchmarks. Furthermore, the results based on the wireless environment generated by Wireless InSite verify that the channel estimation based on the instantaneous channel model performs better than that based on the statistical channel model. Therefore, the instantaneous channel model is recommended for designing channel estimation algorithms for mmWave systems in high-mobility scenarios.

Submitted 27 August, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

arXiv:2402.05847 [pdf, other]

Reconfigurable Intelligent Surface-Aided Dual-Function Radar and Communication Systems With MU-MIMO Communication

Authors: Yasheng Jin, Hong Ren, Cunhua Pan, Zhiyuan Yu, Ruisong Weng, Boshi Wang, Gui Zhou, Yongchao He, Maged Elkashlan

Abstract: In this paper, we investigate a reconfigurable intelligent surface (RIS)-aided integrated sensing and communication (ISAC) system. Our objective is to maximize the achievable sum rate of the multi-antenna communication users through the joint active and passive beamforming. Specifically, the weighted minimum mean-square error (WMMSE) method is first used to reformulate the original problem into an equivalent one. Then, we utilize an alternating optimization (AO) algorithm to decouple the optimization variables and decompose this challenging problem into two subproblems. Given reflecting coefficients, a penalty-based algorithm is utilized to deal with the non-convex radar signal-to-noise ratio (SNR) constraints. For the given beamforming matrix of the BS, we apply majorization-minimization (MM) to transform the problem into a quadratic constraint quadratic programming (QCQP) problem, which is ultimately solved using a semidefinite relaxation (SDR)-based algorithm. Simulation results illustrate the advantage of deploying RIS in the considered multi-user MIMO (MU-MIMO) ISAC systems.

Submitted 8 February, 2024; originally announced February 2024.

arXiv:2402.04532 [pdf, other]

Joint Beamforming Design for Double Active RIS-assisted Radar-Communication Coexistence Systems

Authors: Mengyu Liu, Hong Ren, Cunhua Pan, Boshi Wang, Zhiyuan Yu, Ruisong Weng, Kangda Zhi, Yongchao He

Abstract: Integrated sensing and communication (ISAC) technology has been considered as one of the key candidate technologies in the next-generation wireless communication systems. However, when radar and communication equipment coexist in the same system, i.e. radar-communication coexistence (RCC), the interference from communication systems to radar can be large and cannot be ignored. Recently, reconfigurable intelligent surface (RIS) has been introduced into RCC systems to reduce the interference. However, the "multiplicative fading" effect introduced by passive RIS limits its performance. To tackle this issue, we consider a double active RIS-assisted RCC system, which focuses on the design of the radar's beamforming vector and the active RISs' reflecting coefficient matrices, to maximize the achievable data rate of the communication system. The considered system needs to meet the radar detection constraint and the power budgets at the radar and the RISs. Since the problem is non-convex, we propose an algorithm based on the penalty dual decomposition (PDD) framework. Specifically, we initially introduce auxiliary variables to reformulate the coupled variables into equation constraints and incorporate these constraints into the objective function through the PDD framework. Then, we decouple the equivalent problem into several subproblems by invoking the block coordinate descent (BCD) method. Furthermore, we employ the Lagrange dual method to alternately optimize these subproblems. Simulation results verify the effectiveness of the proposed algorithm. Furthermore, the results also show that under the same power budget, deploying double active RISs in RCC systems can achieve higher data rate than those with single active RIS and double passive RISs.

Submitted 6 February, 2024; originally announced February 2024.

arXiv:2402.02122 [pdf, other]

Secure Wireless Communication in Active RIS-Assisted DFRC System

Authors: Yang Zhang, Hong Ren, Cunhua Pan, Boshi Wang, Zhiyuan Yu, Ruisong Weng, Tuo Wu, Yongchao He

Abstract: This work considers a dual-functional radar and communication (DFRC) system with an active reconfigurable intelligent surface (RIS) and a potential eavesdropper. Our purpose is to maximize the secrecy rate (SR) of the system by jointly designing the beamforming matrix at the DFRC base station (BS) and the reflecting coefficients at the active RIS, subject to the signal-to-interference-plus-noise-ratio (SINR) constraint of the radar echo and the power consumption constraints at the DFRC-BS and active RIS. An alternating optimization (AO) algorithm based on semi-definite relaxation (SDR) and majorization-minimization (MM) is applied to solve the SR-maximization problem by alternately optimizing the beamforming matrix and the reflecting coefficients. Specifically, we first apply the SDR and successive convex approximation (SCA) methods to transform the two subproblems into more tractable forms, then the MM method is applied to derive a concave surrogate function and iteratively solve the subproblems. Finally, simulation results indicate that the active RIS can better confront the impact of "multiplicative fading" and outperforms traditional passive RIS in terms of both secure data rate and radar sensing performance.

Submitted 3 February, 2024; originally announced February 2024.

Comments: 13 pages, 9 figures

arXiv:2310.10386 [pdf, other]

Rating of players by Laplace approximation and dynamic modeling

Authors: Hsuan-Fu Hua, Ching-Ju Chang, Tse-Ching Lin, Ruby Chiu-Hsing Weng

Abstract: The Elo rating system is a simple and widely used method for calculating players' skills from paired comparisons data. Many have extended it in various ways. Yet the question of updating players' variances remains to be further explored. In this paper, we address the issue of variance update by using the Laplace approximation for posterior distribution, together with a random walk model for the dynamics of players' strengths, and a lower bound on players' variances. The random walk model is motivated by the Glicko system, but here we assume nonidentically distributed increments to take care of player heterogeneity. Experiments on men's professional matches showed that the prediction accuracy slightly improves when the variance update is performed. They also showed that new players' strengths may be better captured with the variance update.

Submitted 16 October, 2023; originally announced October 2023.

arXiv:2309.07864 [pdf, other]

The Rise and Potential of Large Language Model Based Agents: A Survey

Authors: Zhiheng Xi, Wenxiang Chen, Xin Guo, Wei He, Yiwen Ding, Boyang Hong, Ming Zhang, Junzhe Wang, Senjie Jin, Enyu Zhou, Rui Zheng, Xiaoran Fan, Xiao Wang, Limao Xiong, Yuhao Zhou, Weiran Wang, Changhao Jiang, Yicheng Zou, Xiangyang Liu, Zhangyue Yin, Shihan Dou, Rongxiang Weng, Wensen Cheng, Qi Zhang, Wenjuan Qin, et al. (4 additional authors not shown)

Abstract: For a long time, humanity has pursued artificial intelligence (AI) equivalent to or surpassing the human level, with AI agents considered a promising vehicle for this pursuit. AI agents are artificial entities that sense their environment, make decisions, and take actions. Many efforts have been made to develop intelligent agents, but they mainly focus on advancement in algorithms or training strategies to enhance specific capabilities or performance on particular tasks. Actually, what the community lacks is a general and powerful model to serve as a starting point for designing AI agents that can adapt to diverse scenarios. Due to the versatile capabilities they demonstrate, large language models (LLMs) are regarded as potential sparks for Artificial General Intelligence (AGI), offering hope for building general AI agents. Many researchers have leveraged LLMs as the foundation to build AI agents and have achieved significant progress. In this paper, we perform a comprehensive survey on LLM-based agents. We start by tracing the concept of agents from its philosophical origins to its development in AI, and explain why LLMs are suitable foundations for agents. Building upon this, we present a general framework for LLM-based agents, comprising three main components: brain, perception, and action, and the framework can be tailored for different applications. Subsequently, we explore the extensive applications of LLM-based agents in three aspects: single-agent scenarios, multi-agent scenarios, and human-agent cooperation. Following this, we delve into agent societies, exploring the behavior and personality of LLM-based agents, the social phenomena that emerge from an agent society, and the insights they offer for human society. Finally, we discuss several key topics and open problems within the field. A repository of related papers is available at https://github.com/WooooDyy/LLM-Agent-Paper-List.

Submitted 19 September, 2023; v1 submitted 14 September, 2023; originally announced September 2023.

Comments: 86 pages, 12 figures

arXiv:2307.04964 [pdf, other]

Secrets of RLHF in Large Language Models Part I: PPO

Authors: Rui Zheng, Shihan Dou, Songyang Gao, Yuan Hua, Wei Shen, Binghai Wang, Yan Liu, Senjie Jin, Qin Liu, Yuhao Zhou, Limao Xiong, Lu Chen, Zhiheng Xi, Nuo Xu, Wenbin Lai, Minghao Zhu, Cheng Chang, Zhangyue Yin, Rongxiang Weng, Wensen Cheng, Haoran Huang, Tianxiang Sun, Hang Yan, Tao Gui, Qi Zhang, et al. (2 additional authors not shown)

Abstract: Large language models (LLMs) have formulated a blueprint for the advancement of artificial general intelligence. Their primary objective is to function as a human-centric (helpful, honest, and harmless) assistant. Alignment with humans assumes paramount significance, and reinforcement learning with human feedback (RLHF) emerges as the pivotal technological paradigm underpinning this pursuit. Current technical routes usually include reward models to measure human preferences, Proximal Policy Optimization (PPO) to optimize policy model outputs, and process supervision to improve step-by-step reasoning capabilities. However, due to the challenges of reward design, environment interaction, and agent training, coupled with the huge trial and error cost of large language models, there is a significant barrier for AI researchers to motivate the development of technical alignment and safe landing of LLMs. The stable training of RLHF remains a puzzle. In the first report, we dissect the framework of RLHF, re-evaluate the inner workings of PPO, and explore how the parts comprising PPO algorithms impact policy agent training. We identify policy constraints as the key factor for the effective implementation of the PPO algorithm. Therefore, we explore PPO-max, an advanced version of the PPO algorithm, to efficiently improve the training stability of the policy model. Based on our main results, we perform a comprehensive analysis of RLHF abilities compared with SFT models and ChatGPT. The absence of open-source implementations has posed significant challenges to the investigation of LLMs alignment. Therefore, we are eager to release technical reports, reward models and PPO codes, aiming to make modest contributions to the advancement of LLMs.

Submitted 18 July, 2023; v1 submitted 10 July, 2023; originally announced July 2023.

arXiv:2303.10966 [pdf, other]

Towards Reliable Neural Machine Translation with Consistency-Aware Meta-Learning

Authors: Rongxiang Weng, Qiang Wang, Wensen Cheng, Changfeng Zhu, Min Zhang

Abstract: Neural machine translation (NMT) has achieved remarkable success in producing high-quality translations. However, current NMT systems suffer from a lack of reliability, as their outputs are often affected by lexical or syntactic changes in inputs, resulting in large variations in quality. This limitation hinders the practicality and trustworthiness of NMT. A contributing factor to this problem is that NMT models trained with the one-to-one paradigm struggle to handle the source diversity phenomenon, where inputs with the same meaning can be expressed differently. In this work, we treat this problem as a bilevel optimization problem and present a consistency-aware meta-learning (CAML) framework derived from the model-agnostic meta-learning (MAML) algorithm to address it. Specifically, the NMT model with CAML (named CoNMT) first learns a consistent meta representation of semantically equivalent sentences in the outer loop. Subsequently, a mapping from the meta representation to the output sentence is learned in the inner loop, allowing the NMT model to translate semantically equivalent sentences to the same target sentence. We conduct experiments on the NIST Chinese to English task, three WMT translation tasks, and the TED M2O task. The results demonstrate that CoNMT effectively improves overall translation quality and reliably handles diverse inputs.

Submitted 19 September, 2023; v1 submitted 20 March, 2023; originally announced March 2023.

arXiv:2209.08738 [pdf, other]

Learning Decoupled Retrieval Representation for Nearest Neighbour Neural Machine Translation

Authors: Qiang Wang, Rongxiang Weng, Ming Chen

Abstract: K-Nearest Neighbor Neural Machine Translation (kNN-MT) successfully incorporates external corpus by retrieving word-level representations at test time. Generally, kNN-MT borrows the off-the-shelf context representation in the translation task, e.g., the output of the last decoder layer, as the query vector of the retrieval task. In this work, we highlight that coupling the representations of these two tasks is sub-optimal for fine-grained retrieval. To alleviate it, we leverage supervised contrastive learning to learn the distinctive retrieval representation derived from the original context representation. We also propose a fast and effective approach to constructing hard negative samples. Experimental results on five domains show that our approach improves the retrieval accuracy and BLEU score compared to vanilla kNN-MT.

Submitted 19 September, 2023; v1 submitted 18 September, 2022; originally announced September 2022.

Comments: Accepted by COLING 2022

arXiv:2209.01438 [pdf, other]

Active Reconfigurable Intelligent Surface for Mobile Edge Computing

Authors: Zhangjie Peng, Ruisong Weng, Zhenkun Zhang, Cunhua Pan, Jiangzhou Wang

Abstract: This paper investigates an active reconfigurable intelligent surface (RIS)-aided mobile edge computing (MEC) system. Compared with passive RIS, the active RIS is equipped with active reflective amplifier, which can effectively circumvent the "double path loss" attenuation. We propose a joint computing and communication design to minimize the maximum computational latency (MCL), subject to both the phase shift constraints and the edge computing capability constraints. Specifically, the original problem is decoupled into four subproblems, and then the block coordinate descent (BCD) method and the successive convex approximation (SCA) method are applied to alternately optimize the subproblems. The simulation results show that with the same power budget, the performance gain achieved by the active RIS is much larger than that by the passive RIS.

Submitted 6 September, 2022; v1 submitted 3 September, 2022; originally announced September 2022.

Comments: Accepted by IEEE Wireless Communications Letters. Keywords: Mobile edge computing (MEC), latency minimization, Internet of things, reconfigurable intelligent surface (RIS), active RIS

arXiv:2205.15495 [pdf, other]

Joint Spatial-Temporal and Appearance Modeling with Transformer for Multiple Object Tracking

Authors: Peng Dai, Yiqiang Feng, Renliang Weng, Changshui Zhang

Abstract: The recent trend in multiple object tracking (MOT) is heading towards leveraging deep learning to boost the tracking performance. In this paper, we propose a novel solution named TransSTAM, which leverages Transformer to effectively model both the appearance features of each object and the spatial-temporal relationships among objects. TransSTAM consists of two major parts: (1) The encoder utilizes the powerful self-attention mechanism of Transformer to learn discriminative features for each tracklet; (2) The decoder adopts the standard cross-attention mechanism to model the affinities between the tracklets and the detections by taking both spatial-temporal and appearance features into account. TransSTAM has two major advantages: (1) It is solely based on the encoder-decoder architecture and enjoys a compact network design, hence being computationally efficient; (2) It can effectively learn spatial-temporal and appearance features within one model, hence achieving better tracking accuracy. The proposed method is evaluated on multiple public benchmarks including MOT16, MOT17, and MOT20, and it achieves a clear performance improvement in both IDF1 and HOTA with respect to previous state-of-the-art approaches on all the benchmarks. Our code is available at https://github.com/icicle4/TranSTAM.

Submitted 30 May, 2022; originally announced May 2022.

arXiv:2205.02405 [pdf]

doi 10.1063/5.0080334

Enhanced optoelectronic performance and photogating effect in quasi-one-dimensional BiSeI wires

Authors: H. J. Hu, W. L. Zhen, S. R. Weng, Y. D. Li, R. Niu, Z. L. Yue, F. Xu, L. Pi, C. J. Zhang, W. K. Zhu

Abstract: Quasi-one-dimensional (quasi-1D) materials are a newly arising topic in low-dimensional research. As a result of reduced dimensionality and enhanced anisotropy, the quasi-1D structure gives rise to novel properties and promising applications such as photodetectors. However, it remains an open question whether performance crossover will occur when the channel material is downsized. Here we report on the fabrication and testing of photodetectors based on exfoliated quasi-1D BiSeI thin wires. Compared with the device on bulk crystal, a significantly enhanced photoresponse is observed, which is manifested by a series of performance parameters, including ultrahigh responsivity (7 x 10$^4$ A W$^{-1}$), specific detectivity (2.5 x 10$^{14}$ Jones) and external quantum efficiency (1.8 x 10$^7$%) when $V_{\textrm {ds}}$ = 3 V, $λ$ = 515 nm and $P$ = 0.01 mW cm$^{-2}$. The conventional photoconductive effect is unlikely to account for such a superior photoresponse, which is ultimately understood in terms of the increased specific surface area and the photogating effect caused by trapping states. This work provides a perspective for the modulation of optoelectronic properties and performance in quasi-1D materials.

Submitted 4 May, 2022; originally announced May 2022.

Comments: 23 pages, 4 figures and SI

Journal ref: Appl. Phys. Lett. 120, 201101 (2022)

arXiv:2204.06812 [pdf, other]

Learning to Generalize to More: Continuous Semantic Augmentation for Neural Machine Translation

Authors: Xiangpeng Wei, Heng Yu, Yue Hu, Rongxiang Weng, Weihua Luo, Jun Xie, Rong Jin

Abstract: The principal task in supervised neural machine translation (NMT) is to learn to generate target sentences conditioned on the source inputs from a set of parallel sentence pairs, and thus produce a model capable of generalizing to unseen instances. However, it is commonly observed that the generalization performance of the model is highly influenced by the amount of parallel data used in training. Although data augmentation is widely used to enrich the training data, conventional methods with discrete manipulations fail to generate diverse and faithful training samples. In this paper, we present a novel data augmentation paradigm termed Continuous Semantic Augmentation (CsaNMT), which augments each training instance with an adjacency semantic region that could cover adequate variants of literal expression under the same meaning. We conduct extensive experiments on both rich-resource and low-resource settings involving various language pairs, including WMT14 English-{German,French}, NIST Chinese-English and multiple low-resource IWSLT translation tasks. The provided empirical evidences show that CsaNMT sets a new level of performance among existing augmentation techniques, improving on the state-of-the-art by a large margin. The core codes are contained in Appendix E.

Submitted 14 April, 2022; originally announced April 2022.

Comments: Accepted by ACL 2022 (main conference)

arXiv:2203.11471 [pdf, other]

Ray3D: ray-based 3D human pose estimation for monocular absolute 3D localization

Authors: Yu Zhan, Fenghai Li, Renliang Weng, Wongun Choi

Abstract: In this paper, we propose a novel monocular ray-based 3D (Ray3D) absolute human pose estimation with calibrated camera. Accurate and generalizable absolute 3D human pose estimation from monocular 2D pose input is an ill-posed problem. To address this challenge, we convert the input from pixel space to 3D normalized rays. This conversion makes our approach robust to camera intrinsic parameter changes. To deal with the in-the-wild camera extrinsic parameter variations, Ray3D explicitly takes the camera extrinsic parameters as an input and jointly models the distribution between the 3D pose rays and camera extrinsic parameters. This novel network design is the key to the outstanding generalizability of Ray3D approach. To have a comprehensive understanding of how the camera intrinsic and extrinsic parameter variations affect the accuracy of absolute 3D key-point localization, we conduct in-depth systematic experiments on three single person 3D benchmarks as well as one synthetic benchmark. These experiments demonstrate that our method significantly outperforms existing state-of-the-art models. Our code and the synthetic dataset are available at https://github.com/YxZhxn/Ray3D.

Submitted 27 October, 2022; v1 submitted 22 March, 2022; originally announced March 2022.

Comments: Accepted by CVPR 2022

arXiv:2203.04478 [pdf, other]

3SD: Self-Supervised Saliency Detection With No Labels

Authors: Rajeev Yasarla, Renliang Weng, Wongun Choi, Vishal Patel, Amir Sadeghian

Abstract: We present a conceptually simple self-supervised method for saliency detection. Our method generates and uses pseudo-ground truth labels for training. The generated pseudo-GT labels don't require any kind of human annotations (e.g., pixel-wise labels or weak labels like scribbles). Recent works show that features extracted from classification tasks provide important saliency cues like structure and semantic information of salient objects in the image. Our method, called 3SD, exploits this idea by adding a branch for a self-supervised classification task in parallel with salient object detection, to obtain class activation maps (CAM maps). These CAM maps along with the edges of the input image are used to generate the pseudo-GT saliency maps to train our 3SD network. Specifically, we propose a contrastive learning-based training on multiple image patches for the classification task. We show the multi-patch classification with contrastive loss improves the quality of the CAM maps compared to naive classification on the entire image. Experiments on six benchmark datasets demonstrate that without any labels, our 3SD method outperforms all existing weakly supervised and unsupervised methods, and its performance is on par with the fully-supervised methods. Code is available at https://github.com/rajeevyasarla/3SD

Submitted 8 March, 2022; originally announced March 2022.

arXiv:2202.11860 [pdf, ps, other]

Robust Transmission Design for RIS-assisted Secure Multiuser Communication Systems in the Presence of Hardware Impairments

Authors: Zhangjie Peng, Ruisong Weng, Cunhua Pan, Gui Zhou, Marco Di Renzo, A. Lee Swindlehurst

Abstract: This paper investigates reconfigurable intelligent surface (RIS)-assisted secure multiuser communication systems subject to hardware impairments (HIs). We jointly optimize the beamforming vectors at the base station (BS) and the phase shifts of the reflecting elements at the RIS so as to maximize the weighted minimum secrecy rate (WMSR), subject to both transmission power constraints at the BS and unit-modulus constraints at the RIS. To address the formulated optimization problem, we first decouple it into two tractable subproblems and then use the block coordinate descent (BCD) method to alternately optimize the subproblems. Two different methods are proposed to solve the two obtained subproblems. The first method transforms each subproblem into a second order cone programming (SOCP) problem, which can be directly solved using CVX. The second method leverages the Minorization-Maximization (MM) algorithm. Specifically, we first derive a concave approximation function, which is a lower bound of the original objective function, and then the two subproblems are transformed into two simple surrogate problems with closed-form solutions. Simulation results verify the performance gains of the proposed robust transmission method over existing nonrobust designs. In addition, the MM algorithm is shown to have much lower complexity than the SOCP-based algorithm.

Submitted 10 October, 2022; v1 submitted 23 February, 2022; originally announced February 2022.

Comments: Revised version in IEEE TWC. Keywords: Reconfigurable intelligent surface (RIS), intelligent reflecting surface (IRS)

arXiv:2110.13385 [pdf, other]

doi 10.1109/BigData59044.2023.10386970

IIP-Transformer: Intra-Inter-Part Transformer for Skeleton-Based Action Recognition

Authors: Qingtian Wang, Jianlin Peng, Shuze Shi, Tingxi Liu, Jiabin He, Renliang Weng

Abstract: Recently, Transformer-based networks have shown great promise on skeleton-based action recognition tasks. The ability to capture global and local dependencies is the key to success, but it also brings quadratic computation and memory cost. Another problem is that previous studies mainly focus on the relationships among individual joints, which often suffer from the noisy skeleton joints introduced by the noisy inputs of sensors or inaccurate estimations. To address the above issues, we propose a novel Transformer-based network (IIP-Transformer). Instead of exploiting interactions among individual joints, our IIP-Transformer incorporates body joints and parts interactions simultaneously and thus can capture both joint-level (intra-part) and part-level (inter-part) dependencies efficiently and effectively. From the data aspect, we introduce a part-level skeleton data encoding that significantly reduces the computational complexity and is more robust to joint-level skeleton noise. Besides, a new part-level data augmentation is proposed to improve the performance of the model. On two large-scale datasets, NTU-RGB+D 60 and NTU RGB+D 120, the proposed IIP-Transformer achieves state-of-the-art performance with more than 8x less computational complexity than DSTA-Net, which is the SOTA Transformer-based method.

Submitted 25 October, 2021; originally announced October 2021.

Comments: 10 pages, 7 figures

arXiv:2103.07889 [pdf, other]

Learning a Proposal Classifier for Multiple Object Tracking

Authors: Peng Dai, Renliang Weng, Wongun Choi, Changshui Zhang, Zhangping He, Wei Ding

Abstract: The recent trend in multiple object tracking (MOT) is heading towards leveraging deep learning to boost the tracking performance. However, it is not trivial to solve the data-association problem in an end-to-end fashion. In this paper, we propose a novel proposal-based learnable framework, which models MOT as a proposal generation, proposal scoring and trajectory inference paradigm on an affinity graph. This framework is similar to the two-stage object detector Faster RCNN, and can solve the MOT problem in a data-driven way. For proposal generation, we propose an iterative graph clustering method to reduce the computational cost while maintaining the quality of the generated proposals. For proposal scoring, we deploy a trainable graph-convolutional-network (GCN) to learn the structural patterns of the generated proposals and rank them according to the estimated quality scores. For trajectory inference, a simple deoverlapping strategy is adopted to generate tracking output while complying with the constraints that no detection can be assigned to more than one track. We experimentally demonstrate that the proposed method achieves a clear performance improvement in both MOTA and IDF1 with respect to previous state-of-the-art on two public benchmarks. Our code is available at https://github.com/daip13/LPC_MOT.git.

Submitted 25 March, 2021; v1 submitted 14 March, 2021; originally announced March 2021.

Comments: Accepted at CVPR 2021, Poster, IEEE/CVF Conference on Computer Vision and Pattern Recognition

arXiv:2012.01915 [pdf, other]

Origin-Aware Next Destination Recommendation with Personalized Preference Attention

Authors: Nicholas Lim, Bryan Hooi, See-Kiong Ng, Xueou Wang, Yong Liang Goh, Renrong Weng, Rui Tan

Abstract: Next destination recommendation is an important task in the transportation domain of taxi and ride-hailing services, where users are recommended with personalized destinations given their current origin location. However, recent recommendation works do not satisfy this origin-awareness property, and only consider learning from historical destination locations, without origin information. Thus, the resulting approaches are unable to learn and predict origin-aware recommendations based on the user's current location, leading to sub-optimal performance and poor real-world practicality. Hence, in this work, we study the origin-aware next destination recommendation task. We propose the Spatial-Temporal Origin-Destination Personalized Preference Attention (STOD-PPA) encoder-decoder model to learn origin-origin (OO), destination-destination (DD), and origin-destination (OD) relationships by first encoding both origin and destination sequences with spatial and temporal factors in local and global views, then decoding them through personalized preference attention to predict the next destination. Experimental results on seven real-world user trajectory taxi datasets show that our model significantly outperforms baseline and state-of-the-art methods.

Submitted 11 January, 2021; v1 submitted 3 December, 2020; originally announced December 2020.

Comments: To appear in the Proceedings of the 14th ACM International Conference on Web Search and Data Mining (WSDM), 2021

arXiv:2010.07024 [pdf, other]

doi 10.1145/3340531.3411876

STP-UDGAT: Spatial-Temporal-Preference User Dimensional Graph Attention Network for Next POI Recommendation

Authors: Nicholas Lim, Bryan Hooi, See-Kiong Ng, Xueou Wang, Yong Liang Goh, Renrong Weng, Jagannadan Varadarajan

Abstract: Next Point-of-Interest (POI) recommendation is a longstanding problem across the domains of Location-Based Social Networks (LBSN) and transportation. Recent Recurrent Neural Network (RNN) based approaches learn POI-POI relationships in a local view based on independent user visit sequences. This limits the model's ability to directly connect and learn across users in a global view to recommend semantically trained POIs. In this work, we propose a Spatial-Temporal-Preference User Dimensional Graph Attention Network (STP-UDGAT), a novel explore-exploit model that concurrently exploits personalized user preferences and explores new POIs in global spatial-temporal-preference (STP) neighbourhoods, while allowing users to selectively learn from other users. In addition, we propose random walks as a masked self-attention option to leverage the STP graphs' structures and find new higher-order POI neighbours during exploration. Experimental results on six real-world datasets show that our model significantly outperforms baseline and state-of-the-art methods.

Submitted 6 October, 2020; originally announced October 2020.

Comments: To appear in Proceedings of the 29th ACM International Conference on Information and Knowledge Management (CIKM), 2020

arXiv:2010.04411 [pdf, other]

Uncertainty-Aware Semantic Augmentation for Neural Machine Translation

Authors: Xiangpeng Wei, Heng Yu, Yue Hu, Rongxiang Weng, Luxi Xing, Weihua Luo

Abstract: As a sequence-to-sequence generation task, neural machine translation (NMT) naturally contains intrinsic uncertainty, where a single sentence in one language has multiple valid counterparts in the other. However, the dominant methods for NMT only observe one of them from the parallel corpora for the model training but have to deal with adequate variations under the same meaning at inference. This leads to a discrepancy of the data distribution between the training and the inference phases. To address this problem, we propose uncertainty-aware semantic augmentation, which explicitly captures the universal semantic information among multiple semantically-equivalent source sentences and enhances the hidden representations with this information for better translations. Extensive experiments on various translation tasks reveal that our approach significantly outperforms the strong baselines and the existing methods.

Submitted 9 October, 2020; originally announced October 2020.

Comments: Accepted to EMNLP 2020, 12 pages, 2 figures, 9 tables

arXiv:2007.15960 [pdf, other]

On Learning Universal Representations Across Languages

Authors: Xiangpeng Wei, Rongxiang Weng, Yue Hu, Luxi Xing, Heng Yu, Weihua Luo

Abstract: Recent studies have demonstrated the overwhelming advantage of cross-lingual pre-trained models (PTMs), such as multilingual BERT and XLM, on cross-lingual NLP tasks. However, existing approaches essentially capture the co-occurrence among tokens through involving the masked language model (MLM) objective with token-level cross entropy. In this work, we extend these approaches to learn sentence-level representations and show the effectiveness on cross-lingual understanding and generation. Specifically, we propose a Hierarchical Contrastive Learning (HiCTL) method to (1) learn universal representations for parallel sentences distributed in one or multiple languages and (2) distinguish the semantically-related words from a shared cross-lingual vocabulary for each sentence. We conduct evaluations on two challenging cross-lingual tasks, XTREME and machine translation. Experimental results show that the HiCTL outperforms the state-of-the-art XLM-R by an absolute gain of 4.2% accuracy on the XTREME benchmark as well as achieves substantial improvements on both of the high-resource and low-resource English-to-X translation tasks over strong baselines.

Submitted 21 March, 2021; v1 submitted 31 July, 2020; originally announced July 2020.

Comments: Accepted to ICLR 2021

arXiv:2004.14021 [pdf, other]

Multiscale Collaborative Deep Models for Neural Machine Translation

Authors: Xiangpeng Wei, Heng Yu, Yue Hu, Yue Zhang, Rongxiang Weng, Weihua Luo

Abstract: Recent evidence reveals that Neural Machine Translation (NMT) models with deeper neural networks can be more effective but are difficult to train. In this paper, we present a MultiScale Collaborative (MSC) framework to ease the training of NMT models that are substantially deeper than those used previously. We explicitly boost the gradient back-propagation from top to bottom levels by introducing a block-scale collaboration mechanism into deep NMT models. Then, instead of forcing the whole encoder stack to directly learn a desired representation, we let each encoder block learn a fine-grained representation and enhance it by encoding spatial dependencies using a context-scale collaboration. We provide empirical evidence showing that the MSC nets are easy to optimize and can obtain improvements of translation quality from considerably increased depth. On IWSLT translation tasks with three translation directions, our extremely deep models (with 72-layer encoders) surpass strong baselines by +2.2 to +3.1 BLEU points. In addition, our deep MSC achieves a BLEU score of 30.56 on WMT14 English-German task that significantly outperforms state-of-the-art deep NMT models.

Submitted 10 May, 2020; v1 submitted 29 April, 2020; originally announced April 2020.

Comments: ACL 2020

arXiv:2004.02196 [pdf, other]

AR: Auto-Repair the Synthetic Data for Neural Machine Translation

Authors: Shanbo Cheng, Shaohui Kuang, Rongxiang Weng, Heng Yu, Changfeng Zhu, Weihua Luo

Abstract: Compared with only using limited authentic parallel data as training corpus, many studies have proved that incorporating synthetic parallel data, which is generated by back translation (BT) or forward translation (FT, or self-training), into the NMT training process can significantly improve translation quality. However, as a well-known shortcoming, synthetic parallel data is noisy because it is generated by an imperfect NMT system. As a result, the improvements in translation quality brought by the synthetic parallel data are greatly diminished. In this paper, we propose a novel Auto-Repair (AR) framework to improve the quality of synthetic data. Our proposed AR model can learn the transformation from low quality (noisy) input sentence to high quality sentence based on large scale monolingual data with BT and FT techniques. The noise in synthetic parallel data will be sufficiently eliminated by the proposed AR model and then the repaired synthetic parallel data can help the NMT models to achieve larger improvements. Experimental results show that our approach can effectively improve the quality of synthetic parallel data and the NMT model with the repaired synthetic data achieves consistent improvements on both WMT14 EN→DE and IWSLT14 DE→EN translation tasks.

Submitted 5 April, 2020; originally announced April 2020.

arXiv:2002.10101 [pdf, other]

GRET: Global Representation Enhanced Transformer

Authors: Rongxiang Weng, Haoran Wei, Shujian Huang, Heng Yu, Lidong Bing, Weihua Luo, Jiajun Chen

Abstract: Transformer, based on the encoder-decoder framework, has achieved state-of-the-art performance on several natural language generation tasks. The encoder maps the words in the input sentence into a sequence of hidden states, which are then fed into the decoder to generate the output sentence. These hidden states usually correspond to the input words and focus on capturing local information. However, the global (sentence level) information is seldom explored, leaving room for the improvement of generation quality. In this paper, we propose a novel global representation enhanced Transformer (GRET) to explicitly model global representation in the Transformer network. Specifically, in the proposed model, an external state is generated for the global representation from the encoder. The global representation is then fused into the decoder during the decoding process to improve generation quality. We conduct experiments in two text generation tasks: machine translation and text summarization. Experimental results on four WMT machine translation tasks and LCSTS text summarization task demonstrate the effectiveness of the proposed approach on natural language generation.

Submitted 24 February, 2020; originally announced February 2020.

Comments: Accepted by AAAI 2020

arXiv:1912.01774 [pdf, other]

Acquiring Knowledge from Pre-trained Model to Neural Machine Translation

Authors: Rongxiang Weng, Heng Yu, Shujian Huang, Shanbo Cheng, Weihua Luo

Abstract: Pre-training and fine-tuning have achieved great success in the natural language processing field. The standard paradigm of exploiting them includes two steps: first, pre-training a model, e.g. BERT, with large-scale unlabeled monolingual data; then, fine-tuning the pre-trained model with labeled data from downstream tasks. However, in neural machine translation (NMT), we address the problem that the training objective of the bilingual task is far different from the monolingual pre-trained model. Because of this gap, using only fine-tuning in NMT cannot fully utilize prior language knowledge. In this paper, we propose an APT framework for acquiring knowledge from the pre-trained model to NMT. The proposed approach includes two modules: (1) a dynamic fusion mechanism to fuse task-specific features adapted from general knowledge into the NMT network, and (2) a knowledge distillation paradigm to learn language knowledge continuously during the NMT training process. The proposed approach could integrate suitable knowledge from pre-trained models to improve the NMT. Experimental results on WMT English to German, German to English and Chinese to English machine translation tasks show that our model outperforms strong baselines and the fine-tuning counterparts.

Submitted 3 December, 2019; originally announced December 2019.

arXiv:1908.07688 [pdf, other]

Improving Neural Machine Translation with Pre-trained Representation

Authors: Rongxiang Weng, Heng Yu, Shujian Huang, Weihua Luo, Jiajun Chen

Abstract: Monolingual data has been demonstrated to be helpful in improving the translation quality of neural machine translation (NMT). The current methods stay at the usage of word-level knowledge, such as generating synthetic parallel data or extracting information from word embedding. In contrast, the power of sentence-level contextual knowledge which is more complex and diverse, playing an important role in natural language generation, has not been fully exploited. In this paper, we propose a novel structure which could leverage monolingual data to acquire sentence-level contextual representations. Then, we design a framework for integrating both source and target sentence-level representations into NMT model to improve the translation quality. Experimental results on Chinese-English, German-English machine translation tasks show that our proposed model achieves improvement over strong Transformer baselines, while experiments on English-Turkish further demonstrate the effectiveness of our approach in the low-resource scenario.

Submitted 20 August, 2019; originally announced August 2019.

Comments: In Progress

arXiv:1907.07328 [pdf, other]

Learning Representation Mapping for Relation Detection in Knowledge Base Question Answering

Authors: Peng Wu, Shujian Huang, Rongxiang Weng, Zaixiang Zheng, Jianbing Zhang, Xiaohui Yan, Jiajun Chen

Abstract: Relation detection is a core step in many natural language processing applications including knowledge base question answering. Previous efforts show that single-fact questions could be answered with high accuracy. However, one critical problem is that current approaches only get high accuracy for questions whose relations have been seen in the training data. For unseen relations, the performance drops rapidly. The main reason for this problem is that the representations for unseen relations are missing. In this paper, we propose a simple mapping method, named representation adapter, to learn the representation mapping for both seen and unseen relations based on previously learned relation embedding. We employ the adversarial objective and the reconstruction objective to improve the mapping performance. We re-organize the popular SimpleQuestion dataset to reveal and evaluate the problem of detecting unseen relations. Experiments show that our method can greatly improve the performance of unseen relations while the performance for those seen part is kept comparable to the state-of-the-art. Our code and data are available at https://github.com/wudapeng268/KBQA-Adapter.

Submitted 17 July, 2019; originally announced July 2019.

Comments: 10 pages, 5 figures, accepted by ACL 2019

arXiv:1907.03468 [pdf, other]

Correct-and-Memorize: Learning to Translate from Interactive Revisions

Authors: Rongxiang Weng, Hao Zhou, Shujian Huang, Lei Li, Yifan Xia, Jiajun Chen

Abstract: State-of-the-art machine translation models are still not on par with human translators. Previous work takes human interactions into the neural machine translation process to obtain improved results in target languages. However, not all model-translation errors are equal -- some are critical while others are minor. Meanwhile, the same translation mistakes occur repeatedly in a similar context. To solve both issues, we propose CAMIT, a novel method for translating in an interactive environment. Our proposed method works with critical revision instructions, and therefore allows humans to correct arbitrary words in model-translated sentences. In addition, CAMIT learns from and softly memorizes revision actions based on the context, alleviating the issue of repeating mistakes. Experiments in both ideal and real interactive translation settings demonstrate that our proposed method enhances machine translation results significantly while requiring fewer revision instructions from humans compared to previous methods.

Submitted 13 August, 2019; v1 submitted 8 July, 2019; originally announced July 2019.

Comments: Accepted at IJCAI 2019

arXiv:1810.10317 [pdf, other]

Learning to Discriminate Noises for Incorporating External Information in Neural Machine Translation

Authors: Zaixiang Zheng, Shujian Huang, Zewei Sun, Rongxiang Weng, Xin-Yu Dai, Jiajun Chen

Abstract: Previous studies show that incorporating external information could improve the translation quality of Neural Machine Translation (NMT) systems. However, the external information inevitably contains noise, which severely reduces the benefit that existing methods could receive from the incorporation. To tackle the problem, this study pays special attention to the discrimination of the noise during the incorporation. We argue that there exist two kinds of noise in this external information, i.e., global noise and local noise, which affect the translations for the whole sentence and for some specific words, respectively. Accordingly, we propose a general framework that learns to jointly discriminate both the global and local noise, so that the external information could be better leveraged. Our model is trained on a dataset derived from the original parallel corpus without any external labeled data or annotation. Experimental results in various real-world scenarios, language pairs, and neural architectures indicate that discriminating noise contributes to significant improvements in translation quality by better incorporating the external information, even in very noisy conditions.

Submitted 19 November, 2018; v1 submitted 24 October, 2018; originally announced October 2018.

Comments: 8 pages

arXiv:1809.01290 [pdf]

doi 10.1063/1.5094231

Origin of planar Hall effect in type-II Weyl semimetal MoTe2

Authors: D. D. Liang, Y. J. Wang, W. L. Zhen, J. Yang, S. R. Weng, X. Yan, Y. Y. Han, W. Tong, L. Pi, W. K. Zhu, C. J. Zhang

Abstract: Besides the negative longitudinal magnetoresistance (MR), the planar Hall effect (PHE) is a newly emerging experimental tool to test the chiral anomaly or nontrivial Berry curvature in Weyl semimetals (WSMs). However, the origins of PHE in various systems are not fully distinguished and understood. Here we perform a systematic study on the PHE and anisotropic MR (AMR) of Td-MoTe2, a type-II WSM. Although the PHE and AMR curves can be well fitted by the theoretical formulas, we demonstrate that the anisotropic resistivity arises from the orbital MR (OMR), instead of the negative MR as expected in the chiral anomaly effect. In contrast, the absence of negative MR indicates that the large OMR dominates over the chiral anomaly effect. This explains why it is difficult to measure negative MR in type-II WSMs. We argue that the measured PHE can be related to the chiral anomaly only when the negative MR is simultaneously observed.

Submitted 4 September, 2018; originally announced September 2018.

Comments: 14 pages, 4 figures

Journal ref: AIP Advances 9, 055015 (2019)

arXiv:1807.06229 [pdf]

doi 10.1103/PhysRevMaterials.3.014201

Current jetting distorted planar Hall effect in a Weyl semimetal with ultrahigh mobility

Authors: J. Yang, W. L. Zhen, D. D. Liang, Y. J. Wang, X. Yan, S. R. Weng, J. R. Wang, W. Tong, L. Pi, W. K. Zhu, C. J. Zhang

Abstract: A giant planar Hall effect (PHE) and anisotropic magnetoresistance (AMR) are observed in TaP, a nonmagnetic Weyl semimetal with ultrahigh mobility. The perpendicular resistivity (i.e., the planar magnetic field applied normal to the current) far exceeds the zero-field resistivity, which thus rules out the possible origin of negative longitudinal magnetoresistance. The giant PHE/AMR is finally attributed to the large anisotropic orbital magnetoresistance that stems from the ultrahigh mobility. Furthermore, the mobility-enhanced current jetting effects are found to strongly deform the line shape of the curves, and their evolution with the changing magnetic field and temperature is also studied. Although the giant PHE/AMR suggests promising applications in spintronics, the enhanced current jetting shows the other side of the coin, which needs to be considered in future device design.

Submitted 14 January, 2019; v1 submitted 17 July, 2018; originally announced July 2018.

Comments: Physical Review Materials

Journal ref: Phys. Rev. Materials 3, 014201 (2019)

arXiv:1712.10200 [pdf]

doi 10.1088/0256-307X/35/9/097101

Non-stoichiometry effects on the extreme magnetoresistance in Weyl semimetal WTe2

Authors: J. X. Gong, J. Yang, M. Ge, Y. J. Wang, D. D. Liang, L. Luo, X. Yan, W. L. Zhen, S. R. Weng, L. Pi, C. J. Zhang, W. K. Zhu

Abstract: The effect of non-stoichiometry on the extreme magnetoresistance is systematically investigated for the Weyl semimetal WTe2. Magnetoresistance and Hall resistivity are measured for the as-grown samples with a slight difference in Te vacancies and the annealed samples with increased Te vacancies. The fittings to a two-carrier model show that the magnetoresistance is strongly dependent on the residual resistivity ratio (i.e., the degree of non-stoichiometry), which is eventually understood in terms of electron doping that not only breaks the balance between electron-type and hole-type carrier densities but also reduces the average carrier mobility. Thus, the compensation effect and ultrahigh mobility are probably the main driving forces of the extreme magnetoresistance in WTe2.

Submitted 29 April, 2018; v1 submitted 29 December, 2017; originally announced December 2017.

Journal ref: Chin. Phys. Lett. 35, 097101 (2018)
