
Showing 1–50 of 242 results for author: Wei, D

Searching in archive cs.
  1. arXiv:2507.02705

    cs.CV

    SIU3R: Simultaneous Scene Understanding and 3D Reconstruction Beyond Feature Alignment

    Authors: Qi Xu, Dongxu Wei, Lingzhe Zhao, Wenpu Li, Zhangchi Huang, Shunping Ji, Peidong Liu

    Abstract: Simultaneous understanding and 3D reconstruction plays an important role in developing end-to-end embodied intelligent systems. To achieve this, recent approaches resort to the 2D-to-3D feature alignment paradigm, which leads to limited 3D understanding capability and potential semantic information loss. In light of this, we propose SIU3R, the first alignment-free framework for generalizable simultane…

    Submitted 3 July, 2025; originally announced July 2025.

  2. arXiv:2506.21589

    cs.CL

    A General Method for Detecting Information Generated by Large Language Models

    Authors: Minjia Mao, Dongjun Wei, Xiao Fang, Michael Chau

    Abstract: The proliferation of large language models (LLMs) has significantly transformed the digital information landscape, making it increasingly challenging to distinguish between human-written and LLM-generated content. Detecting LLM-generated information is essential for preserving trust on digital platforms (e.g., social media and e-commerce sites) and preventing the spread of misinformation, a topic…

    Submitted 18 June, 2025; originally announced June 2025.

  3. arXiv:2506.12963

    cs.AI cs.LG

    Reasoning Model Unlearning: Forgetting Traces, Not Just Answers, While Preserving Reasoning Skills

    Authors: Changsheng Wang, Chongyu Fan, Yihua Zhang, Jinghan Jia, Dennis Wei, Parikshit Ram, Nathalie Baracaldo, Sijia Liu

    Abstract: Recent advances in large reasoning models (LRMs) have enabled strong chain-of-thought (CoT) generation through test-time computation. While these multi-step reasoning capabilities represent a major milestone in language model performance, they also introduce new safety risks. In this work, we present the first systematic study to revisit the problem of machine unlearning in the context of LRMs. Ma…

    Submitted 15 June, 2025; originally announced June 2025.

  4. arXiv:2506.06782

    cs.LG cs.AI cs.CV

    Feature-Based Instance Neighbor Discovery: Advanced Stable Test-Time Adaptation in Dynamic World

    Authors: Qinting Jiang, Chuyang Ye, Dongyan Wei, Bingli Wang, Yuan Xue, Jingyan Jiang, Zhi Wang

    Abstract: Despite progress, deep neural networks still suffer performance declines under distribution shifts between training and test domains, leading to a substantial decrease in Quality of Experience (QoE) for applications. Existing test-time adaptation (TTA) methods are challenged by dynamic, multiple test distributions within batches. We observe that feature distributions across different domains inher…

    Submitted 7 June, 2025; originally announced June 2025.

  5. arXiv:2506.05586

    cs.LG cs.AI

    CoFrNets: Interpretable Neural Architecture Inspired by Continued Fractions

    Authors: Isha Puri, Amit Dhurandhar, Tejaswini Pedapati, Kartikeyan Shanmugam, Dennis Wei, Kush R. Varshney

    Abstract: In recent years there has been a considerable amount of research on local post hoc explanations for neural networks. However, work on building interpretable neural architectures has been relatively sparse. In this paper, we present a novel neural architecture, CoFrNet, inspired by the form of continued fractions which are known to have many attractive properties in number theory, such as fast conv…

    Submitted 5 June, 2025; originally announced June 2025.

    Journal ref: Advances in Neural Information Processing Systems (NeurIPS) 2021, vol 34, pp 21668-21690

  6. arXiv:2506.01339

    cs.LG

    Invariance Makes LLM Unlearning Resilient Even to Unanticipated Downstream Fine-Tuning

    Authors: Changsheng Wang, Yihua Zhang, Jinghan Jia, Parikshit Ram, Dennis Wei, Yuguang Yao, Soumyadeep Pal, Nathalie Baracaldo, Sijia Liu

    Abstract: Machine unlearning offers a promising solution to privacy and safety concerns in large language models (LLMs) by selectively removing targeted knowledge while preserving utility. However, current methods are highly sensitive to downstream fine-tuning, which can quickly recover forgotten information, even from unrelated tasks. To address this, we introduce invariance into unlearning for the first ti…

    Submitted 2 June, 2025; originally announced June 2025.

    Comments: Accepted by ICML 2025

  7. arXiv:2505.24141

    cs.CV cs.AI

    The Butterfly Effect in Pathology: Exploring Security in Pathology Foundation Models

    Authors: Jiashuai Liu, Yingjia Shang, Yingkang Zhan, Di Zhang, Yi Niu, Dong Wei, Xian Wu, Zeyu Gao, Chen Li, Yefeng Zheng

    Abstract: With the widespread adoption of pathology foundation models in both research and clinical decision support systems, exploring their security has become a critical concern. However, despite their growing impact, the vulnerability of these models to adversarial attacks remains largely unexplored. In this work, we present the first systematic investigation into the security of pathology foundation mo…

    Submitted 29 May, 2025; originally announced May 2025.

  8. arXiv:2505.17671

    cs.CL

    MIDB: Multilingual Instruction Data Booster for Enhancing Multilingual Instruction Synthesis

    Authors: Yilun Liu, Chunguang Zhao, Xinhua Yang, Hongyong Zeng, Shimin Tao, Weibin Meng, Minggui He, Chang Su, Yan Yu, Hongxia Ma, Li Zhang, Daimeng Wei, Hao Yang

    Abstract: Despite doubts about data quality, instruction synthesis has been widely applied to instruction tuning (IT) of LLMs as an economical and rapid alternative. Recent endeavors focus on improving data quality for synthesized instruction pairs in English and have facilitated IT of English-centric LLMs. However, data quality issues in multilingual synthesized instruction pairs are even more severe, since t…

    Submitted 23 May, 2025; originally announced May 2025.

  9. arXiv:2505.13554

    cs.CL cs.AI

    Combining the Best of Both Worlds: A Method for Hybrid NMT and LLM Translation

    Authors: Zhanglin Wu, Daimeng Wei, Xiaoyu Chen, Hengchao Shang, Jiaxin Guo, Zongyao Li, Yuanchang Luo, Jinlong Yang, Zhiqiang Rao, Hao Yang

    Abstract: Large language models (LLMs) show promising performance in a variety of downstream tasks, such as machine translation (MT). However, using LLMs for translation suffers from high computational costs and significant latency. Based on our evaluation, in most cases, translations produced by LLMs are comparable to those generated by neural machine translation (NMT) systems. Only in particular scenarios, LLM a…

    Submitted 19 May, 2025; originally announced May 2025.

    Comments: 9 pages, 2 figures, 9 tables, ACL 2025

  10. arXiv:2505.08909

    cs.CV cs.LG math.FA math.OC

    Learning Cocoercive Conservative Denoisers via Helmholtz Decomposition for Poisson Inverse Problems

    Authors: Deliang Wei, Peng Chen, Haobo Xu, Jiale Yao, Fang Li, Tieyong Zeng

    Abstract: Plug-and-play (PnP) methods with deep denoisers have shown impressive results in imaging problems. They typically require strong convexity or smoothness of the fidelity term and a (residual) non-expansive denoiser for convergence. These assumptions, however, are violated in Poisson inverse problems, and non-expansiveness can hinder denoising performance. To address these challenges, we propose a c…

    Submitted 13 May, 2025; originally announced May 2025.

    Comments: 31 pages

    MSC Class: 94A08; 47H10; 47J26; 46N10; 47N10

  11. arXiv:2504.19480

    cs.LG cs.AI

    An Automated Reinforcement Learning Reward Design Framework with Large Language Model for Cooperative Platoon Coordination

    Authors: Dixiao Wei, Peng Yi, Jinlong Lei, Yiguang Hong, Yuchuan Du

    Abstract: Reinforcement Learning (RL) has demonstrated excellent decision-making potential in platoon coordination problems. However, due to the variability of coordination goals, the complexity of the decision problem, and the time consumed by trial-and-error in manual design, finding a well-performing reward function to guide RL training to solve complex platoon coordination problems remains challengi…

    Submitted 28 April, 2025; originally announced April 2025.

  12. arXiv:2504.14804

    cs.CL cs.AI

    Automatic Evaluation Metrics for Document-level Translation: Overview, Challenges and Trends

    Authors: Jiaxin Guo, Xiaoyu Chen, Zhiqiang Rao, Jinlong Yang, Zongyao Li, Hengchao Shang, Daimeng Wei, Hao Yang

    Abstract: With the rapid development of deep learning technologies, the field of machine translation has witnessed significant progress, especially with the advent of large language models (LLMs) that have greatly propelled the advancement of document-level translation. However, accurately evaluating the quality of document-level translation remains an urgent issue. This paper first introduces the developme…

    Submitted 20 April, 2025; originally announced April 2025.

  13. arXiv:2504.05614

    cs.CL

    Two Intermediate Translations Are Better Than One: Fine-tuning LLMs for Document-level Translation Refinement

    Authors: Yichen Dong, Xinglin Lyu, Junhui Li, Daimeng Wei, Min Zhang, Shimin Tao, Hao Yang

    Abstract: Recent research has shown that large language models (LLMs) can enhance translation quality through self-refinement. In this paper, we build on this idea by extending the refinement from sentence-level to document-level translation, specifically focusing on document-to-document (Doc2Doc) translation refinement. Since sentence-to-sentence (Sent2Sent) and Doc2Doc translation address different aspect…

    Submitted 7 April, 2025; originally announced April 2025.

    Comments: Under Review

  14. arXiv:2504.05122

    cs.CL

    DoCIA: An Online Document-Level Context Incorporation Agent for Speech Translation

    Authors: Xinglin Lyu, Wei Tang, Yuang Li, Xiaofeng Zhao, Ming Zhu, Junhui Li, Yunfei Lu, Min Zhang, Daimeng Wei, Hao Yang, Min Zhang

    Abstract: Document-level context is crucial for handling discourse challenges in text-to-text document-level machine translation (MT). Despite the increased discourse challenges introduced by noise from automatic speech recognition (ASR), the integration of document-level context in speech translation (ST) remains insufficiently explored. In this paper, we develop DoCIA, an online framework that enhances ST…

    Submitted 7 April, 2025; originally announced April 2025.

  15. arXiv:2504.02873

    cs.CL

    Short-PHD: Detecting Short LLM-generated Text with Topological Data Analysis After Off-topic Content Insertion

    Authors: Dongjun Wei, Minjia Mao, Xiao Fang, Michael Chau

    Abstract: The malicious usage of large language models (LLMs) has motivated the detection of LLM-generated texts. Previous work in topological data analysis shows that the persistent homology dimension (PHD) of text embeddings can serve as a more robust and promising score than other zero-shot methods. However, effectively detecting short LLM-generated texts remains a challenge. This paper presents Short-PH…

    Submitted 1 April, 2025; originally announced April 2025.

  16. arXiv:2503.16185

    cs.CV

    MapGlue: Multimodal Remote Sensing Image Matching

    Authors: Peihao Wu, Yongxiang Yao, Wenfei Zhang, Dong Wei, Yi Wan, Yansheng Li, Yongjun Zhang

    Abstract: Multimodal remote sensing image (MRSI) matching is pivotal for cross-modal fusion, localization, and object detection, but it faces severe challenges due to geometric, radiometric, and viewpoint discrepancies across imaging modalities. Existing unimodal datasets lack scale and diversity, limiting deep learning solutions. This paper proposes MapGlue, a universal MRSI matching framework, and MapData…

    Submitted 20 March, 2025; originally announced March 2025.

    Comments: The dataset and code are available at https://github.com/PeihaoWu/MapGlue

  17. arXiv:2503.12152

    cs.CL

    Improving LLM-based Document-level Machine Translation with Multi-Knowledge Fusion

    Authors: Bin Liu, Xinglin Lyu, Junhui Li, Daimeng Wei, Min Zhang, Shimin Tao, Hao Yang

    Abstract: Recent studies on prompting large language models (LLMs) for document-level machine translation (DMT) primarily focus on the inter-sentence context by flattening the source document into a long sequence. This approach relies solely on the sequence of sentences within the document. However, the complexity of document-level sequences is greater than that of shorter sentence-level sequences, which may li…

    Submitted 15 March, 2025; originally announced March 2025.

  18. arXiv:2503.06669

    cs.RO cs.CV cs.LG

    AgiBot World Colosseo: A Large-scale Manipulation Platform for Scalable and Intelligent Embodied Systems

    Authors: AgiBot-World-Contributors, Qingwen Bu, Jisong Cai, Li Chen, Xiuqi Cui, Yan Ding, Siyuan Feng, Shenyuan Gao, Xindong He, Xuan Hu, Xu Huang, Shu Jiang, Yuxin Jiang, Cheng Jing, Hongyang Li, Jialu Li, Chiming Liu, Yi Liu, Yuxiang Lu, Jianlan Luo, Ping Luo, Yao Mu, Yuehan Niu, Yixuan Pan, Jiangmiao Pang, et al. (27 additional authors not shown)

    Abstract: We explore how scalable robot data can address real-world challenges for generalized robotic manipulation. Introducing AgiBot World, a large-scale platform comprising over 1 million trajectories across 217 tasks in five deployment scenarios, we achieve an order-of-magnitude increase in data scale compared to existing datasets. Accelerated by a standardized collection pipeline with human-in-the-loo…

    Submitted 30 April, 2025; v1 submitted 9 March, 2025; originally announced March 2025.

    Comments: Project website: https://agibot-world.com/. Github repo: https://github.com/OpenDriveLab/AgiBot-World. The author list is ordered alphabetically by surname, with detailed contributions provided in the appendix

  19. arXiv:2503.00237

    cs.AI

    Agentic AI Needs a Systems Theory

    Authors: Erik Miehling, Karthikeyan Natesan Ramamurthy, Kush R. Varshney, Matthew Riemer, Djallel Bouneffouf, John T. Richards, Amit Dhurandhar, Elizabeth M. Daly, Michael Hind, Prasanna Sattigeri, Dennis Wei, Ambrish Rawat, Jasmina Gajcin, Werner Geyer

    Abstract: The endowment of AI with reasoning capabilities and some degree of agency is widely viewed as a path toward more capable and generalizable systems. Our position is that the current development of agentic AI requires a more holistic, systems-theoretic perspective in order to fully understand their capabilities and mitigate any emergent risks. The primary motivation for our position is that AI devel…

    Submitted 28 February, 2025; originally announced March 2025.

  20. arXiv:2503.00060

    cs.CV cs.AI

    SAC-ViT: Semantic-Aware Clustering Vision Transformer with Early Exit

    Authors: Youbing Hu, Yun Cheng, Anqi Lu, Dawei Wei, Zhijun Li

    Abstract: The Vision Transformer (ViT) excels in global modeling but faces deployment challenges on resource-constrained devices due to the quadratic computational complexity of its attention mechanism. To address this, we propose the Semantic-Aware Clustering Vision Transformer (SAC-ViT), a non-iterative approach to enhance ViT's computational efficiency. SAC-ViT operates in two stages: Early Exit (EE) and…

    Submitted 26 February, 2025; originally announced March 2025.

  21. arXiv:2502.19735

    cs.CL

    R1-T1: Fully Incentivizing Translation Capability in LLMs via Reasoning Learning

    Authors: Minggui He, Yilun Liu, Shimin Tao, Yuanchang Luo, Hongyong Zeng, Chang Su, Li Zhang, Hongxia Ma, Daimeng Wei, Weibin Meng, Hao Yang, Boxing Chen, Osamu Yoshie

    Abstract: Despite recent breakthroughs in reasoning-enhanced large language models (LLMs) like DeepSeek-R1, incorporating inference-time reasoning into machine translation (MT), where human translators naturally employ structured, multi-layered reasoning chains of thought (CoTs), remains underexplored. Existing methods either design a fixed CoT tailored for a specific MT sub-task (e.g., literature translatio…

    Submitted 26 May, 2025; v1 submitted 26 February, 2025; originally announced February 2025.

  22. arXiv:2502.16137

    cs.CL cs.AI

    Chain-of-Description: What I can understand, I can put into words

    Authors: Jiaxin Guo, Daimeng Wei, Zongyao Li, Hengchao Shang, Yuanchang Luo, Hao Yang

    Abstract: In this paper, we propose a novel strategy defined as Chain-of-Description (CoD) Prompting, tailored for Multi-Modal Large Language Models. This approach involves having the model first provide a detailed description of the multi-modal input before generating an answer to the question. When applied to models such as Qwen2-Audio, Qwen2-VL, and Qwen2.5-VL, CoD Prompting significantly enhances perfor…

    Submitted 22 February, 2025; originally announced February 2025.

  23. arXiv:2502.11718

    cs.CL cs.CV

    "See the World, Discover Knowledge": A Chinese Factuality Evaluation for Large Vision Language Models

    Authors: Jihao Gu, Yingyao Wang, Pi Bu, Chen Wang, Ziming Wang, Tengtao Song, Donglai Wei, Jiale Yuan, Yingxiu Zhao, Yancheng He, Shilong Li, Jiaheng Liu, Meng Cao, Jun Song, Yingshui Tan, Xiang Li, Wenbo Su, Zhicheng Zheng, Xiaoyong Zhu, Bo Zheng

    Abstract: The evaluation of factual accuracy in large vision language models (LVLMs) has lagged behind their rapid development, making it challenging to fully reflect these models' knowledge capacity and reliability. In this paper, we introduce the first factuality-based visual question-answering benchmark in Chinese, named ChineseSimpleVQA, aimed at assessing the visual factuality of LVLMs across 8 major t…

    Submitted 30 May, 2025; v1 submitted 17 February, 2025; originally announced February 2025.

    Comments: 26 pages, 21 figures

  24. arXiv:2502.05228

    quant-ph cs.AI eess.SY

    Multi-Objective Mobile Damped Wave Algorithm (MOMDWA): A Novel Approach For Quantum System Control

    Authors: Juntao Yu, Jiaquan Yu, Dedai Wei, Xinye Sha, Shengwei Fu, Miuyu Qiu, Yurun Jin, Kaichen Ouyang

    Abstract: In this paper, we introduce a novel multi-objective optimization algorithm, the Multi-Objective Mobile Damped Wave Algorithm (MOMDWA), specifically designed to address complex quantum control problems. Our approach extends the capabilities of the original Mobile Damped Wave Algorithm (MDWA) by incorporating multiple objectives, enabling a more comprehensive optimization process. We applied MOMDWA…

    Submitted 6 February, 2025; originally announced February 2025.

  25. arXiv:2501.08523

    cs.CL cs.AI

    Doc-Guided Sent2Sent++: A Sent2Sent++ Agent with Doc-Guided memory for Document-level Machine Translation

    Authors: Jiaxin Guo, Yuanchang Luo, Daimeng Wei, Ling Zhang, Zongyao Li, Hengchao Shang, Zhiqiang Rao, Shaojun Li, Jinlong Yang, Zhanglin Wu, Hao Yang

    Abstract: The field of artificial intelligence has witnessed significant advancements in natural language processing, largely attributed to the capabilities of Large Language Models (LLMs). These models form the backbone of Agents designed to address long-context dependencies, particularly in Document-level Machine Translation (DocMT). DocMT presents unique challenges, with quality, consistency, and fluency…

    Submitted 14 January, 2025; originally announced January 2025.

  26. arXiv:2412.18299

    cs.CL cs.AI

    M-Ped: Multi-Prompt Ensemble Decoding for Large Language Models

    Authors: Jiaxin Guo, Daimeng Wei, Yuanchang Luo, Shimin Tao, Hengchao Shang, Zongyao Li, Shaojun Li, Jinlong Yang, Zhanglin Wu, Zhiqiang Rao, Hao Yang

    Abstract: With the widespread application of Large Language Models (LLMs) in the field of Natural Language Processing (NLP), enhancing their performance has become a research hotspot. This paper presents a novel multi-prompt ensemble decoding approach designed to bolster the generation quality of LLMs by leveraging the aggregation of outcomes from multiple prompts. Given a unique input $X$, we submit $n$ va…

    Submitted 24 December, 2024; originally announced December 2024.

  27. arXiv:2412.14462

    cs.CV

    Affordance-Aware Object Insertion via Mask-Aware Dual Diffusion

    Authors: Jixuan He, Wanhua Li, Ye Liu, Junsik Kim, Donglai Wei, Hanspeter Pfister

    Abstract: As a common image editing operation, image composition involves integrating foreground objects into background scenes. In this paper, we expand the application of the concept of Affordance from human-centered image composition tasks to a more general object-scene composition framework, addressing the complex interplay between foreground objects and background scenes. Following the principle of Aff…

    Submitted 20 April, 2025; v1 submitted 18 December, 2024; originally announced December 2024.

    Comments: Code is available at: https://github.com/KaKituken/affordance-aware-any. Project page at: https://kakituken.github.io/affordance-any.github.io/

  28. arXiv:2412.13599

    cs.CV cs.CL

    Unlocking the Potential of Weakly Labeled Data: A Co-Evolutionary Learning Framework for Abnormality Detection and Report Generation

    Authors: Jinghan Sun, Dong Wei, Zhe Xu, Donghuan Lu, Hong Liu, Hong Wang, Sotirios A. Tsaftaris, Steven McDonagh, Yefeng Zheng, Liansheng Wang

    Abstract: Anatomical abnormality detection and report generation of chest X-ray (CXR) are two essential tasks in clinical practice. The former aims at localizing and characterizing cardiopulmonary radiological findings in CXRs, while the latter summarizes the findings in a detailed report for further diagnosis and treatment. Existing methods often focused on either task separately, ignoring their correlatio…

    Submitted 18 December, 2024; originally announced December 2024.

  29. arXiv:2412.11529

    cs.CV

    Cross-View Geo-Localization with Street-View and VHR Satellite Imagery in Decentrality Settings

    Authors: Panwang Xia, Lei Yu, Yi Wan, Qiong Wu, Peiqi Chen, Liheng Zhong, Yongxiang Yao, Dong Wei, Xinyi Liu, Lixiang Ru, Yingying Zhang, Jiangwei Lao, Jingdong Chen, Ming Yang, Yongjun Zhang

    Abstract: Cross-View Geo-Localization tackles the challenge of image geo-localization in GNSS-denied environments, including disaster response scenarios, urban canyons, and dense forests, by matching street-view query images with geo-tagged aerial-view reference images. However, current research often relies on benchmarks and methods that assume center-aligned settings or account for only limited decentrali…

    Submitted 2 January, 2025; v1 submitted 16 December, 2024; originally announced December 2024.

  30. arXiv:2412.07259

    cs.AI

    Goal-Driven Reasoning in DatalogMTL with Magic Sets

    Authors: Shaoyu Wang, Kaiyue Zhao, Dongliang Wei, Przemysław Andrzej Wałęga, Dingmin Wang, Hongming Cai, Pan Hu

    Abstract: DatalogMTL is a powerful rule-based language for temporal reasoning. Due to its high expressive power and flexible modeling capabilities, it is suitable for a wide range of applications, including tasks from industrial and financial sectors. However, due to its high computational complexity, practical reasoning in DatalogMTL is highly challenging. To address this difficulty, we introduce a new rea…

    Submitted 27 February, 2025; v1 submitted 10 December, 2024; originally announced December 2024.

  31. arXiv:2412.06273

    cs.CV cs.GR

    Omni-Scene: Omni-Gaussian Representation for Ego-Centric Sparse-View Scene Reconstruction

    Authors: Dongxu Wei, Zhiqi Li, Peidong Liu

    Abstract: Prior works employing pixel-based Gaussian representation have demonstrated efficacy in feed-forward sparse-view reconstruction. However, such representation necessitates cross-view overlap for accurate depth estimation, and is challenged by object occlusions and frustum truncations. As a result, these methods require scene-centric data acquisition to maintain cross-view overlap and complete scene…

    Submitted 26 February, 2025; v1 submitted 9 December, 2024; originally announced December 2024.

    Comments: Accepted by CVPR 2025

  32. arXiv:2412.03906

    cs.LG stat.ML

    Final-Model-Only Data Attribution with a Unifying View of Gradient-Based Methods

    Authors: Dennis Wei, Inkit Padhi, Soumya Ghosh, Amit Dhurandhar, Karthikeyan Natesan Ramamurthy, Maria Chang

    Abstract: Training data attribution (TDA) is the task of attributing model behavior to elements in the training data. This paper draws attention to the common setting where one has access only to the final trained model, and not the training algorithm or intermediate information from training. To serve as a gold standard for TDA in this "final-model-only" setting, we propose further training, with appropria…

    Submitted 5 December, 2024; originally announced December 2024.

    Comments: 28 pages, 8 figures

  33. arXiv:2412.01075

    cs.LG cs.AI

    Multi-Agent Deep Reinforcement Learning for Distributed and Autonomous Platoon Coordination via Speed-regulation over Large-scale Transportation Networks

    Authors: Dixiao Wei, Peng Yi, Jinlong Lei, Xingyi Zhu

    Abstract: Truck platooning technology enables a group of trucks to travel closely together, with which the platoon can save fuel, improve traffic flow efficiency, and improve safety. In this paper, we consider the platoon coordination problem in a large-scale transportation network, to promote cooperation among trucks and optimize the overall efficiency. Involving the regulation of both speed and departure…

    Submitted 1 December, 2024; originally announced December 2024.

  34. arXiv:2411.19845

    cs.CV cs.LG eess.SP

    A Visual-inertial Localization Algorithm using Opportunistic Visual Beacons and Dead-Reckoning for GNSS-Denied Large-scale Applications

    Authors: Liqiang Zhang, Ye Tian, Dongyan Wei

    Abstract: With the development of smart cities, the demand for continuous pedestrian navigation in large-scale urban environments has significantly increased. While global navigation satellite systems (GNSS) provide low-cost and reliable positioning services, they are often hindered in complex urban canyon environments. Thus, exploring opportunistic signals for positioning in urban areas has become a key so…

    Submitted 14 December, 2024; v1 submitted 29 November, 2024; originally announced November 2024.

  35. arXiv:2410.21411

    cs.CV

    SocialGPT: Prompting LLMs for Social Relation Reasoning via Greedy Segment Optimization

    Authors: Wanhua Li, Zibin Meng, Jiawei Zhou, Donglai Wei, Chuang Gan, Hanspeter Pfister

    Abstract: Social relation reasoning aims to identify relation categories such as friends, spouses, and colleagues from images. While current methods adopt the paradigm of training a dedicated network end-to-end using labeled image data, they are limited in terms of generalizability and interpretability. To address these issues, we first present a simple yet well-crafted framework named SocialGPT, which combin…

    Submitted 28 October, 2024; originally announced October 2024.

    Comments: Accepted by NeurIPS 2024. Project page: https://mengzibin.github.io/SocialGPT.github.io/

  36. arXiv:2410.16484

    cs.LG

    Identifying Sub-networks in Neural Networks via Functionally Similar Representations

    Authors: Tian Gao, Amit Dhurandhar, Karthikeyan Natesan Ramamurthy, Dennis Wei

    Abstract: Providing human-understandable insights into the inner workings of neural networks is an important step toward achieving more explainable and trustworthy AI. Existing approaches to such mechanistic interpretability typically require substantial prior knowledge and manual effort, with strategies tailored to specific tasks. In this work, we take a step toward automating the understanding of the netw…

    Submitted 1 February, 2025; v1 submitted 21 October, 2024; originally announced October 2024.

  37. arXiv:2409.16539

    cs.AI

    Context-aware and Style-related Incremental Decoding framework for Discourse-Level Literary Translation

    Authors: Yuanchang Luo, Jiaxin Guo, Daimeng Wei, Hengchao Shang, Zongyao Li, Zhanglin Wu, Zhiqiang Rao, Shaojun Li, Jinlong Yang, Hao Yang

    Abstract: This report outlines our approach for the WMT24 Discourse-Level Literary Translation Task, focusing on the Chinese-English language pair in the Constrained Track. Translating literary texts poses significant challenges due to the nuanced meanings, idiomatic expressions, and intricate narrative structures inherent in such works. To address these challenges, we leveraged the Chinese-Llama2 model, sp…

    Submitted 29 September, 2024; v1 submitted 24 September, 2024; originally announced September 2024.

    Comments: 7 pages, 2 figures, WMT24

  38. arXiv:2409.16331

    cs.CL cs.AI

    Exploring the traditional NMT model and Large Language Model for chat translation

    Authors: Jinlong Yang, Hengchao Shang, Daimeng Wei, Jiaxin Guo, Zongyao Li, Zhanglin Wu, Zhiqiang Rao, Shaojun Li, Yuhao Xie, Yuanchang Luo, Jiawei Zheng, Bin Wei, Hao Yang

    Abstract: This paper describes the submissions of Huawei Translation Services Center (HW-TSC) to the WMT24 chat translation shared task for bidirectional English↔German (en-de) translation. The experiments involved fine-tuning models using chat data and exploring various strategies, including Minimum Bayes Risk (MBR) decoding and self-training. The results show significant performance improvements in certai…

    Submitted 24 September, 2024; originally announced September 2024.

    Comments: 7 pages, 6 Tables, WMT24

  39. arXiv:2409.15924

    cs.CL cs.AI

    Multilingual Transfer and Domain Adaptation for Low-Resource Languages of Spain

    Authors: Yuanchang Luo, Zhanglin Wu, Daimeng Wei, Hengchao Shang, Zongyao Li, Jiaxin Guo, Zhiqiang Rao, Shaojun Li, Jinlong Yang, Yuhao Xie, Jiawei Zheng, Bin Wei, Hao Yang

    Abstract: This article describes the submission by Huawei Translation Service Center (HW-TSC) to the Translation into Low-Resource Languages of Spain task at WMT 2024. We participated in three translation tasks: Spanish to Aragonese (es-arg), Spanish to Aranese (es-arn), and Spanish to Asturian (es-ast). For these three translation tasks, we use training strategies such as multilingual transfer, r…

    Submitted 29 September, 2024; v1 submitted 24 September, 2024; originally announced September 2024.

    Comments: 6 pages, WMT24. arXiv admin note: substantial text overlap with arXiv:2409.14842; text overlap with arXiv:2409.14800

  40. arXiv:2409.15879

    cs.CL cs.AI

    Machine Translation Advancements of Low-Resource Indian Languages by Transfer Learning

    Authors: Bin Wei, Jiawei Zhen, Zongyao Li, Zhanglin Wu, Daimeng Wei, Jiaxin Guo, Zhiqiang Rao, Shaojun Li, Yuanchang Luo, Hengchao Shang, Jinlong Yang, Yuhao Xie, Hao Yang

    Abstract: This paper introduces the submission by Huawei Translation Center (HW-TSC) to the WMT24 Indian Languages Machine Translation (MT) Shared Task. To develop a reliable machine translation system for low-resource Indian languages, we employed two distinct knowledge transfer strategies, taking into account the characteristics of the language scripts and the support available from existing open-source m…

    Submitted 24 September, 2024; originally announced September 2024.

    Comments: 6 pages, WMT24. arXiv admin note: substantial text overlap with arXiv:2409.14800

  41. arXiv:2409.14842  [pdf, other]

    cs.AI cs.CL

    HW-TSC's Submission to the CCMT 2024 Machine Translation Tasks

    Authors: Zhanglin Wu, Yuanchang Luo, Daimeng Wei, Jiawei Zheng, Bin Wei, Zongyao Li, Hengchao Shang, Jiaxin Guo, Shaojun Li, Weidong Zhang, Ning Xie, Hao Yang

    Abstract: This paper presents the submission of Huawei Translation Services Center (HW-TSC) to machine translation tasks of the 20th China Conference on Machine Translation (CCMT 2024). We participate in the bilingual machine translation task and multi-domain machine translation task. For these two translation tasks, we use training strategies such as regularized dropout, bidirectional training, data divers…

    Submitted 8 October, 2024; v1 submitted 23 September, 2024; originally announced September 2024.

    Comments: 13 pages, 2 figures, 6 Tables, CCMT2024. arXiv admin note: substantial text overlap with arXiv:2409.14800

  42. arXiv:2409.14800  [pdf, other]

    cs.AI

    Choose the Final Translation from NMT and LLM hypotheses Using MBR Decoding: HW-TSC's Submission to the WMT24 General MT Shared Task

    Authors: Zhanglin Wu, Daimeng Wei, Zongyao Li, Hengchao Shang, Jiaxin Guo, Shaojun Li, Zhiqiang Rao, Yuanchang Luo, Ning Xie, Hao Yang

    Abstract: This paper presents the submission of Huawei Translation Services Center (HW-TSC) to the WMT24 general machine translation (MT) shared task, where we participate in the English to Chinese (en2zh) language pair. Similar to previous years' work, we use training strategies such as regularized dropout, bidirectional training, data diversification, forward translation, back translation, alternated traini…

    Submitted 23 September, 2024; originally announced September 2024.

    Comments: 10 pages, 4 figures, 2 Tables, EMNLP2024

  43. arXiv:2409.08597  [pdf, other]

    cs.SD cs.CL eess.AS

    LA-RAG: Enhancing LLM-based ASR Accuracy with Retrieval-Augmented Generation

    Authors: Shaojun Li, Hengchao Shang, Daimeng Wei, Jiaxin Guo, Zongyao Li, Xianghui He, Min Zhang, Hao Yang

    Abstract: Recent advancements in integrating speech information into large language models (LLMs) have significantly improved automatic speech recognition (ASR) accuracy. However, existing methods are often constrained by the capabilities of the speech encoders under varied acoustic conditions, such as accents. To address this, we propose LA-RAG, a novel Retrieval-Augmented Generation (RAG) paradigm for LLM-bas…

    Submitted 13 September, 2024; originally announced September 2024.

    Comments: submitted to ICASSP 2025

  44. arXiv:2408.08056  [pdf, other]

    cs.LG

    DATTA: Towards Diversity Adaptive Test-Time Adaptation in Dynamic Wild World

    Authors: Chuyang Ye, Dongyan Wei, Zhendong Liu, Yuanyi Pang, Yixi Lin, Jiarong Liao, Qinting Jiang, Xianghua Fu, Qing Li, Jingyan Jiang

    Abstract: Test-time adaptation (TTA) effectively addresses distribution shifts between training and testing data by adjusting models on test samples, which is crucial for improving model inference in real-world applications. However, traditional TTA methods typically follow a fixed pattern to address the dynamic data patterns (low-diversity or high-diversity patterns), often leading to performance degradatio…

    Submitted 15 August, 2024; originally announced August 2024.

    Comments: 16 pages, 2 figures

  45. arXiv:2407.15693  [pdf, ps, other]

    math.AP cs.LG math.FA math.ST

    Fisher-Rao Gradient Flow: Geodesic Convexity and Functional Inequalities

    Authors: José A. Carrillo, Yifan Chen, Daniel Zhengyu Huang, Jiaoyang Huang, Dongyi Wei

    Abstract: The dynamics of probability density functions has been extensively studied in science and engineering to understand physical phenomena and facilitate algorithmic design. Of particular interest are dynamics that can be formulated as gradient flows of energy functionals under the Wasserstein metric. The development of functional inequalities, such as the log-Sobolev inequality, plays a pivotal role…

    Submitted 22 July, 2024; originally announced July 2024.

    Comments: 38 pages

  46. arXiv:2407.02005  [pdf, other]

    cs.CL cs.SD eess.AS

    An End-to-End Speech Summarization Using Large Language Model

    Authors: Hengchao Shang, Zongyao Li, Jiaxin Guo, Shaojun Li, Zhiqiang Rao, Yuanchang Luo, Daimeng Wei, Hao Yang

    Abstract: Abstractive Speech Summarization (SSum) aims to generate human-like text summaries from spoken content. It encounters difficulties in handling long speech input and capturing the intricate cross-modal mapping between long speech inputs and short text summaries. Research on large language models (LLMs) and multimodal information fusion has provided new insights for addressing these challenges. In t…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: InterSpeech 2024

  47. arXiv:2406.11780  [pdf, other]

    cs.LG cs.AI cs.CL

    Split, Unlearn, Merge: Leveraging Data Attributes for More Effective Unlearning in LLMs

    Authors: Swanand Ravindra Kadhe, Farhan Ahmed, Dennis Wei, Nathalie Baracaldo, Inkit Padhi

    Abstract: Large language models (LLMs) have been shown to pose social and ethical risks, such as generating toxic language or facilitating malicious use of hazardous knowledge. Machine unlearning is a promising approach to improve LLM safety by directly removing harmful behaviors and knowledge. In this paper, we propose "SPlit, UNlearn, MerGE" (SPUNGE), a framework that can be used with any unlearning method to a…

    Submitted 17 June, 2024; originally announced June 2024.

  48. arXiv:2406.09696  [pdf, other]

    eess.IV cs.CV

    MoME: Mixture of Multimodal Experts for Cancer Survival Prediction

    Authors: Conghao Xiong, Hao Chen, Hao Zheng, Dong Wei, Yefeng Zheng, Joseph J. Y. Sung, Irwin King

    Abstract: Survival analysis, as a challenging task, requires integrating Whole Slide Images (WSIs) and genomic data for comprehensive decision-making. There are two main challenges in this task: significant heterogeneity and complex inter- and intra-modal interactions between the two modalities. Previous approaches utilize co-attention methods, which fuse features from both modalities only once after separa…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: 8 + 1/2 pages, early accepted to MICCAI2024

  49. arXiv:2406.08666  [pdf, other]

    cs.LG cs.AI stat.ME stat.ML

    Interventional Causal Discovery in a Mixture of DAGs

    Authors: Burak Varıcı, Dmitriy Katz-Rogozhnikov, Dennis Wei, Prasanna Sattigeri, Ali Tajer

    Abstract: Causal interactions among a group of variables are often modeled by a single causal graph. In some domains, however, these interactions are best described by multiple co-existing causal graphs, e.g., in dynamical systems or genomics. This paper addresses the hitherto unknown role of interventions in learning causal interactions among variables governed by a mixture of causal systems, each modeled…

    Submitted 2 December, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

    Comments: NeurIPS 2024 camera-ready version

  50. arXiv:2406.05413  [pdf, other]

    cs.LG cs.AI cs.CV cs.MM

    Discover Your Neighbors: Advanced Stable Test-Time Adaptation in Dynamic World

    Authors: Qinting Jiang, Chuyang Ye, Dongyan Wei, Yuan Xue, Jingyan Jiang, Zhi Wang

    Abstract: Despite progress, deep neural networks still suffer performance declines under distribution shifts between training and test domains, leading to a substantial decrease in Quality of Experience (QoE) for multimedia applications. Existing test-time adaptation (TTA) methods are challenged by dynamic, multiple test distributions within batches. This work provides a new perspective on analyzing batch n…

    Submitted 8 June, 2024; originally announced June 2024.

    Comments: 10 pages