Showing 1–50 of 1,026 results for author: Tao, D

  1. arXiv:2505.10049  [pdf, ps, other]

    cs.CV

    Advances in Radiance Field for Dynamic Scene: From Neural Field to Gaussian Field

    Authors: Jinlong Fan, Xuepu Zeng, Jing Zhang, Mingming Gong, Yuxiang Yang, Dacheng Tao

    Abstract: Dynamic scene representation and reconstruction have undergone transformative advances in recent years, catalyzed by breakthroughs in neural radiance fields and 3D Gaussian splatting techniques. While initially developed for static environments, these methodologies have rapidly evolved to address the complexities inherent in 4D dynamic scenes through an expansive body of research. Coupled with inn…

    Submitted 15 May, 2025; originally announced May 2025.

  2. arXiv:2505.06679  [pdf, other]

    cs.CV

    Jailbreaking the Text-to-Video Generative Models

    Authors: Jiayang Liu, Siyuan Liang, Shiqian Zhao, Rongcheng Tu, Wenbo Zhou, Xiaochun Cao, Dacheng Tao, Siew Kei Lam

    Abstract: Text-to-video generative models have achieved significant progress, driven by the rapid advancements in diffusion models, with notable examples including Pika, Luma, Kling, and Sora. Despite their remarkable generation ability, their vulnerability to jailbreak attack, i.e. to generate unsafe content, including pornography, violence, and discrimination, raises serious safety concerns. Existing effo…

    Submitted 10 May, 2025; originally announced May 2025.

  3. arXiv:2505.06413  [pdf, ps, other]

    cs.CV cs.AI

    Natural Reflection Backdoor Attack on Vision Language Model for Autonomous Driving

    Authors: Ming Liu, Siyuan Liang, Koushik Howlader, Liwen Wang, Dacheng Tao, Wensheng Zhang

    Abstract: Vision-Language Models (VLMs) have been integrated into autonomous driving systems to enhance reasoning capabilities through tasks such as Visual Question Answering (VQA). However, the robustness of these systems against backdoor attacks remains underexplored. In this paper, we propose a natural reflection-based backdoor attack targeting VLM systems in autonomous driving scenarios, aiming to induc…

    Submitted 9 May, 2025; originally announced May 2025.

  4. arXiv:2505.01822  [pdf, other]

    cs.LG cs.AI

    Analytic Energy-Guided Policy Optimization for Offline Reinforcement Learning

    Authors: Jifeng Hu, Sili Huang, Zhejian Yang, Shengchao Hu, Li Shen, Hechang Chen, Lichao Sun, Yi Chang, Dacheng Tao

    Abstract: Conditional decision generation with diffusion models has shown powerful competitiveness in reinforcement learning (RL). Recent studies reveal the relation between energy-function-guidance diffusion models and constrained RL problems. The main challenge lies in estimating the intermediate energy, which is intractable due to the log-expectation formulation during the generation process. To address…

    Submitted 3 May, 2025; originally announced May 2025.

  5. arXiv:2505.01665  [pdf, other]

    cs.LG

    Adaptively Point-weighting Curriculum Learning

    Authors: Wensheng Li, Hao Wang, Ruifeng Zhou, Hanting Guan, Chao Zhang, Dacheng Tao

    Abstract: Curriculum learning (CL) is referred to as a training strategy that makes easy samples learned first and then fits hard samples. It imitates the process of humans learning knowledge, and has become a potential manner of effectively training deep networks. In this study, we develop the adaptively point-weighting (APW) curriculum learning algorithm, which adaptively assigns the weight to every train…

    Submitted 2 May, 2025; originally announced May 2025.

  6. arXiv:2505.01043  [pdf, other]

    cs.LG cs.AI

    Low-Precision Training of Large Language Models: Methods, Challenges, and Opportunities

    Authors: Zhiwei Hao, Jianyuan Guo, Li Shen, Yong Luo, Han Hu, Guoxia Wang, Dianhai Yu, Yonggang Wen, Dacheng Tao

    Abstract: Large language models (LLMs) have achieved impressive performance across various domains. However, the substantial hardware resources required for their training present a significant barrier to efficiency and scalability. To mitigate this challenge, low-precision training techniques have been widely adopted, leading to notable advancements in training efficiency. Despite these gains, low-precisio…

    Submitted 2 May, 2025; originally announced May 2025.

  7. arXiv:2504.21659  [pdf, other]

    cs.AI cs.CL

    AdaR1: From Long-CoT to Hybrid-CoT via Bi-Level Adaptive Reasoning Optimization

    Authors: Haotian Luo, Haiying He, Yibo Wang, Jinluan Yang, Rui Liu, Naiqiang Tan, Xiaochun Cao, Dacheng Tao, Li Shen

    Abstract: Recently, long-thought reasoning models achieve strong performance on complex reasoning tasks, but often incur substantial inference overhead, making efficiency a critical concern. Our empirical analysis reveals that the benefit of using Long-CoT varies across problems: while some problems require elaborate reasoning, others show no improvement, or even degraded accuracy. This motivates adaptive r…

    Submitted 30 April, 2025; originally announced April 2025.

  8. arXiv:2504.20644  [pdf, other]

    cs.LG

    Combatting Dimensional Collapse in LLM Pre-Training Data via Diversified File Selection

    Authors: Ziqing Fan, Siyuan Du, Shengchao Hu, Pingjie Wang, Li Shen, Ya Zhang, Dacheng Tao, Yanfeng Wang

    Abstract: Selecting high-quality pre-training data for large language models (LLMs) is crucial for enhancing their overall performance under limited computation budget, improving both training and sample efficiency. Recent advancements in file selection primarily rely on using an existing or trained proxy model to assess the similarity of samples to a target domain, such as high quality sources BookCorpus a…

    Submitted 29 April, 2025; originally announced April 2025.

  9. arXiv:2504.20380  [pdf, other]

    cs.RO

    LPVIMO-SAM: Tightly-coupled LiDAR/Polarization Vision/Inertial/Magnetometer/Optical Flow Odometry via Smoothing and Mapping

    Authors: Derui Shan, Peng Guo, Wenshuo Li, Du Tao

    Abstract: We propose a tightly-coupled LiDAR/Polarization Vision/Inertial/Magnetometer/Optical Flow Odometry via Smoothing and Mapping (LPVIMO-SAM) framework, which integrates LiDAR, polarization vision, inertial measurement unit, magnetometer, and optical flow in a tightly-coupled fusion. This framework enables high-precision and highly robust real-time state estimation and map construction in challenging…

    Submitted 28 April, 2025; originally announced April 2025.

    Comments: submitted to IROS2025

  10. TSUE: A Two-Stage Data Update Method for an Erasure Coded Cluster File System

    Authors: Zheng Wei, Jing Xing, Yida Gu, Wenjing Huang, Dong Dai, Guangming Tan, Dingwen Tao

    Abstract: Compared to replication-based storage systems, erasure-coded storage incurs significantly higher overhead during data updates. To address this issue, various parity logging methods have been proposed. Nevertheless, due to the long update path and substantial amount of random I/O involved in erasure code update processes, the resulting long latency and low throughput often fail to meet the requir…

    Submitted 24 April, 2025; originally announced April 2025.

    Comments: 14 pages, 8 figures, accepted by ACM HPDC 2025

  11. arXiv:2504.17490  [pdf, ps, other]

    cs.LG cs.AI

    Plasticine: Accelerating Research in Plasticity-Motivated Deep Reinforcement Learning

    Authors: Mingqi Yuan, Qi Wang, Guozheng Ma, Bo Li, Xin Jin, Yunbo Wang, Xiaokang Yang, Wenjun Zeng, Dacheng Tao

    Abstract: Developing lifelong learning agents is crucial for artificial general intelligence. However, deep reinforcement learning (RL) systems often suffer from plasticity loss, where neural networks gradually lose their ability to adapt during training. Despite its significance, this field lacks unified benchmarks and evaluation protocols. We introduce Plasticine, the first open-source framework for bench…

    Submitted 24 April, 2025; originally announced April 2025.

    Comments: 23 pages

  12. Ultrafast ultrasound coded vector Doppler imaging of blood flow velocity and resistivity

    Authors: Shaoyuan Yan, Yiming Ding, Guoao Ma, Yapeng Fu, Kailiang Xu, Dean Ta

    Abstract: Dynamic and precise measurement of cerebral blood flow velocity is crucial in neuroscience and the diagnosis of cerebrovascular diseases. Traditional color Doppler ultrasound can only measure the velocity component along the ultrasound beam, which restricts its ability to accurately capture the complete blood flow vector in complex environments. To overcome these limitations, we propose an ultrafa…

    Submitted 24 April, 2025; originally announced April 2025.

    Journal ref: S. Yan, Y. Ding, G. Ma, Y. Fu, K. Xu, and D. Ta, Ultrafast Ultrasound Coded Vector Doppler Imaging of Blood Flow Velocity and Resistivity, Acta Physica Sinica, vol. 74, no. 1, 2025

  13. arXiv:2504.15585  [pdf, other]

    cs.CR cs.AI cs.CL cs.LG

    A Comprehensive Survey in LLM(-Agent) Full Stack Safety: Data, Training and Deployment

    Authors: Kun Wang, Guibin Zhang, Zhenhong Zhou, Jiahao Wu, Miao Yu, Shiqian Zhao, Chenlong Yin, Jinhu Fu, Yibo Yan, Hanjun Luo, Liang Lin, Zhihao Xu, Haolang Lu, Xinye Cao, Xinyun Zhou, Weifei Jin, Fanci Meng, Junyuan Mao, Hao Wu, Minghe Wang, Fan Zhang, Junfeng Fang, Chengwei Liu, Yifan Zhang, Qiankun Li, et al. (57 additional authors not shown)

    Abstract: The remarkable success of Large Language Models (LLMs) has illuminated a promising pathway toward achieving Artificial General Intelligence for both academic and industrial communities, owing to their unprecedented performance across various applications. As LLMs continue to gain prominence in both research and commercial domains, their security and safety implications have become a growing concer…

    Submitted 22 April, 2025; originally announced April 2025.

  14. arXiv:2504.15512  [pdf, other]

    cs.CR cs.LG

    T2VShield: Model-Agnostic Jailbreak Defense for Text-to-Video Models

    Authors: Siyuan Liang, Jiayang Liu, Jiecheng Zhai, Tianmeng Fang, Rongcheng Tu, Aishan Liu, Xiaochun Cao, Dacheng Tao

    Abstract: The rapid development of generative artificial intelligence has made text to video models essential for building future multimodal world simulators. However, these models remain vulnerable to jailbreak attacks, where specially crafted prompts bypass safety mechanisms and lead to the generation of harmful or unsafe content. Such vulnerabilities undermine the reliability and security of simulation b…

    Submitted 26 April, 2025; v1 submitted 21 April, 2025; originally announced April 2025.

    Comments: 33 pages, 9 figures

  15. arXiv:2504.14294  [pdf, other]

    cs.CV

    From Missing Pieces to Masterpieces: Image Completion with Context-Adaptive Diffusion

    Authors: Pourya Shamsolmoali, Masoumeh Zareapoor, Huiyu Zhou, Michael Felsberg, Dacheng Tao, Xuelong Li

    Abstract: Image completion is a challenging task, particularly when ensuring that generated content seamlessly integrates with existing parts of an image. While recent diffusion models have shown promise, they often struggle with maintaining coherence between known and unknown (missing) regions. This issue arises from the lack of explicit spatial and semantic alignment during the diffusion process, resultin…

    Submitted 19 April, 2025; originally announced April 2025.

    Comments: Accepted in TPAMI

  16. arXiv:2504.14174  [pdf, other]

    cs.LG cs.AI

    A Physics-guided Multimodal Transformer Path to Weather and Climate Sciences

    Authors: Jing Han, Hanting Chen, Kai Han, Xiaomeng Huang, Yongyun Hu, Wenjun Xu, Dacheng Tao, Ping Zhang

    Abstract: With the rapid development of machine learning in recent years, many problems in meteorology can now be addressed using AI models. In particular, data-driven algorithms have significantly improved accuracy compared to traditional methods. Meteorological data is often transformed into 2D images or 3D videos, which are then fed into AI models for learning. Additionally, these models often incorporat…

    Submitted 19 April, 2025; originally announced April 2025.

    Comments: Perspective article

  17. arXiv:2504.13504  [pdf, other]

    astro-ph.IM astro-ph.SR

    SolarZip: An Efficient and Adaptive Compression Framework for Solar EUV Imaging Data

    Authors: Zedong Liu, Song Tan, Alexander Warmuth, Frédéric Schuller, Yun Hong, Wenjing Huang, Yida Gu, Bojing Zhu, Guangming Tan, Dingwen Tao

    Abstract: Context: With the advancement of solar physics research, next-generation solar space missions and ground-based telescopes face significant challenges in efficiently transmitting and/or storing large-scale observational data. Aims: We develop an efficient compression and evaluation framework for solar EUV data, specifically optimized for Solar Orbiter Extreme Ultraviolet Imager (EUI) data, signific…

    Submitted 21 April, 2025; v1 submitted 18 April, 2025; originally announced April 2025.

  18. arXiv:2504.12323  [pdf, other]

    cs.CL cs.AI

    The Other Side of the Coin: Exploring Fairness in Retrieval-Augmented Generation

    Authors: Zheng Zhang, Ning Li, Qi Liu, Rui Li, Weibo Gao, Qingyang Mao, Zhenya Huang, Baosheng Yu, Dacheng Tao

    Abstract: Retrieval-Augmented Generation (RAG) enhances Large Language Models (LLMs) by retrieving relevant document from external knowledge sources. By referencing this external knowledge, RAG effectively reduces the generation of factually incorrect content and addresses hallucination issues within LLMs. Recently, there has been growing attention to improving the performance and efficiency of RAG systems…

    Submitted 19 April, 2025; v1 submitted 11 April, 2025; originally announced April 2025.

    Comments: 12 pages

  19. arXiv:2504.12319  [pdf, other]

    cs.IR cs.AI cs.CL q-fin.CP

    Specialized text classification: an approach to classifying Open Banking transactions

    Authors: Duc Tuyen TA, Wajdi Ben Saad, Ji Young Oh

    Abstract: With the introduction of the PSD2 regulation in the EU which established the Open Banking framework, a new window of opportunities has opened for banks and fintechs to explore and enrich Bank transaction descriptions with the aim of building a better understanding of customer behavior, while using this understanding to prevent fraud, reduce risks and offer more competitive and tailored services.…

    Submitted 10 April, 2025; originally announced April 2025.

    Journal ref: 2023 IEEE 18th International Conference on Computer Science and Information Technologies (CSIT)

  20. arXiv:2504.12108  [pdf, other]

    cs.CL

    Entropy-Guided Watermarking for LLMs: A Test-Time Framework for Robust and Traceable Text Generation

    Authors: Shizhan Cai, Liang Ding, Dacheng Tao

    Abstract: The rapid development of Large Language Models (LLMs) has intensified concerns about content traceability and potential misuse. Existing watermarking schemes for sampled text often face trade-offs between maintaining text quality and ensuring robust detection against various attacks. To address these issues, we propose a novel watermarking scheme that improves both detectability and text quality b…

    Submitted 16 April, 2025; originally announced April 2025.

  21. arXiv:2504.09130  [pdf, other]

    cs.CL

    VisuoThink: Empowering LVLM Reasoning with Multimodal Tree Search

    Authors: Yikun Wang, Siyin Wang, Qinyuan Cheng, Zhaoye Fei, Liang Ding, Qipeng Guo, Dacheng Tao, Xipeng Qiu

    Abstract: Recent advancements in Large Vision-Language Models have showcased remarkable capabilities. However, they often falter when confronted with complex reasoning tasks that humans typically address through visual aids and deliberate, step-by-step thinking. While existing methods have explored text-based slow thinking or rudimentary visual assistance, they fall short of capturing the intricate, interle…

    Submitted 12 April, 2025; originally announced April 2025.

    Comments: 12 pages

  22. arXiv:2504.08690  [pdf, other]

    cs.CL cs.AI

    Fast-Slow-Thinking: Complex Task Solving with Large Language Models

    Authors: Yiliu Sun, Yanfang Zhang, Zicheng Zhao, Sheng Wan, Dacheng Tao, Chen Gong

    Abstract: Nowadays, Large Language Models (LLMs) have been gradually employed to solve complex tasks. To face the challenge, task decomposition has become an effective way, which proposes to divide a complex task into multiple simpler subtasks and then solve them separately so that the difficulty of the original task can be reduced. However, the performance of existing task decomposition methods can be subo…

    Submitted 11 April, 2025; originally announced April 2025.

    Comments: 37 pages, 7 figures

  23. arXiv:2504.08000  [pdf, other]

    cs.AI cs.LG

    Neuron-level Balance between Stability and Plasticity in Deep Reinforcement Learning

    Authors: Jiahua Lan, Sen Zhang, Haixia Pan, Ruijun Liu, Li Shen, Dacheng Tao

    Abstract: In contrast to the human ability to continuously acquire knowledge, agents struggle with the stability-plasticity dilemma in deep reinforcement learning (DRL), which refers to the trade-off between retaining existing skills (stability) and learning new knowledge (plasticity). Current methods focus on balancing these two aspects at the network level, lacking sufficient differentiation and fine-grai…

    Submitted 9 April, 2025; originally announced April 2025.

    Comments: Reinforcement learning, RL skill neuron, stability and plasticity

  24. Fairness Mediator: Neutralize Stereotype Associations to Mitigate Bias in Large Language Models

    Authors: Yisong Xiao, Aishan Liu, Siyuan Liang, Xianglong Liu, Dacheng Tao

    Abstract: LLMs have demonstrated remarkable performance across diverse applications, yet they inadvertently absorb spurious correlations from training data, leading to stereotype associations between biased concepts and specific social groups. These associations perpetuate and even amplify harmful social biases, raising significant fairness concerns. To mitigate such biases, prior studies have attempted to…

    Submitted 10 April, 2025; originally announced April 2025.

    Comments: Accepted by ISSTA 2025. 20 pages

  25. arXiv:2504.07527  [pdf, other]

    cs.CL

    Supervised Optimism Correction: Be Confident When LLMs Are Sure

    Authors: Junjie Zhang, Rushuai Yang, Shunyu Liu, Ting-En Lin, Fei Huang, Yi Chen, Yongbin Li, Dacheng Tao

    Abstract: In this work, we establish a novel theoretical connection between supervised fine-tuning and offline reinforcement learning under the token-level Markov decision process, revealing that large language models indeed learn an implicit $Q$-function for inference. Through this theoretical lens, we demonstrate that the widely used beam search method suffers from unacceptable over-optimism, where infere…

    Submitted 10 April, 2025; originally announced April 2025.

  26. arXiv:2504.06553  [pdf, other]

    cs.RO cs.CV

    ASHiTA: Automatic Scene-grounded HIerarchical Task Analysis

    Authors: Yun Chang, Leonor Fermoselle, Duy Ta, Bernadette Bucher, Luca Carlone, Jiuguang Wang

    Abstract: While recent work in scene reconstruction and understanding has made strides in grounding natural language to physical 3D environments, it is still challenging to ground abstract, high-level instructions to a 3D scene. High-level instructions might not explicitly invoke semantic elements in the scene, and even the process of breaking a high-level task into a set of more concrete subtasks, a proces…

    Submitted 11 April, 2025; v1 submitted 8 April, 2025; originally announced April 2025.

  27. arXiv:2504.02492  [pdf]

    cs.RO cs.AI

    Industrial Internet Robot Collaboration System and Edge Computing Optimization

    Authors: Qian Zuo, Dajun Tao, Tian Qi, Jieyi Xie, Zijie Zhou, Zhen Tian, Yu Mingyu

    Abstract: In a complex environment, for a mobile robot to safely and collision-free avoid all obstacles, it poses high requirements for its intelligence level. Given that the information such as the position and geometric characteristics of obstacles is random, the control parameters of the robot, such as velocity and angular velocity, are also prone to random deviations. To address this issue in the fram…

    Submitted 3 April, 2025; originally announced April 2025.

  28. arXiv:2504.02272  [pdf, other]

    cs.CV

    Generative Classifier for Domain Generalization

    Authors: Shaocong Long, Qianyu Zhou, Xiangtai Li, Chenhao Ying, Yunhai Tong, Lizhuang Ma, Yuan Luo, Dacheng Tao

    Abstract: Domain generalization (DG) aims to improve the generalizability of computer vision models toward distribution shifts. The mainstream DG methods focus on learning domain invariance, however, such methods overlook the potential inherent in domain-specific information. While the prevailing practice of discriminative linear classifier has been tailored to domain-invariant features, it struggles when c…

    Submitted 3 April, 2025; originally announced April 2025.

    Comments: Code will be available at https://github.com/longshaocong/GCDG

  29. arXiv:2503.23924  [pdf, other]

    cs.CL cs.LG

    Model Hemorrhage and the Robustness Limits of Large Language Models

    Authors: Ziyang Ma, Zuchao Li, Lefei Zhang, Gui-Song Xia, Bo Du, Liangpei Zhang, Dacheng Tao

    Abstract: Large language models (LLMs) demonstrate strong performance across natural language processing tasks, yet undergo significant performance degradation when modified for deployment through quantization, pruning, or decoding strategy adjustments. We define this phenomenon as model hemorrhage - performance decline caused by parameter alterations and architectural changes. Through systematic analysis o…

    Submitted 31 March, 2025; originally announced March 2025.

    Comments: 33 pages, 18 figures

  30. arXiv:2503.21460  [pdf, other]

    cs.CL

    Large Language Model Agent: A Survey on Methodology, Applications and Challenges

    Authors: Junyu Luo, Weizhi Zhang, Ye Yuan, Yusheng Zhao, Junwei Yang, Yiyang Gu, Bohan Wu, Binqi Chen, Ziyue Qiao, Qingqing Long, Rongcheng Tu, Xiao Luo, Wei Ju, Zhiping Xiao, Yifan Wang, Meng Xiao, Chenwu Liu, Jingyang Yuan, Shichang Zhang, Yiqiao Jin, Fan Zhang, Xian Wu, Hanqing Zhao, Dacheng Tao, Philip S. Yu, et al. (1 additional author not shown)

    Abstract: The era of intelligent agents is upon us, driven by revolutionary advancements in large language models. Large Language Model (LLM) agents, with goal-driven behaviors and dynamic adaptation capabilities, potentially represent a critical pathway toward artificial general intelligence. This survey systematically deconstructs LLM agent systems through a methodology-centered taxonomy, linking architec…

    Submitted 27 March, 2025; originally announced March 2025.

    Comments: 329 papers surveyed, resources are at https://github.com/luo-junyu/Awesome-Agent-Papers

  31. arXiv:2503.20031  [pdf, other]

    astro-ph.IM cs.CE

    Lossy Compression of Scientific Data: Applications Constrains and Requirements

    Authors: Franck Cappello, Allison Baker, Ebru Bozda, Martin Burtscher, Kyle Chard, Sheng Di, Paul Christopher O Grady, Peng Jiang, Shaomeng Li, Erik Lindahl, Peter Lindstrom, Magnus Lundborg, Kai Zhao, Xin Liang, Masaru Nagaso, Kento Sato, Amarjit Singh, Seung Woo Son, Dingwen Tao, Jiannan Tian, Robert Underwood, Kazutomo Yoshii, Danylo Lykov, Yuri Alexeev, Kyle Gerard Felker

    Abstract: Increasing data volumes from scientific simulations and instruments (supercomputers, accelerators, telescopes) often exceed network, storage, and analysis capabilities. The scientific community's response to this challenge is scientific data reduction. Reduction can take many forms, such as triggering, sampling, filtering, quantization, and dimensionality reduction. This report focuses on a specif…

    Submitted 25 March, 2025; originally announced March 2025.

    Comments: 33 pages

  32. arXiv:2503.19937  [pdf, other]

    cs.CV cs.AI

    Reverse Prompt: Cracking the Recipe Inside Text-to-Image Generation

    Authors: Zhiyao Ren, Yibing Zhan, Baosheng Yu, Dacheng Tao

    Abstract: Text-to-image generation has become increasingly popular, but achieving the desired images often requires extensive prompt engineering. In this paper, we explore how to decode textual prompts from reference images, a process we refer to as image reverse prompt engineering. This technique enables us to gain insights from reference images, understand the creative processes of great artists, and gene…

    Submitted 24 March, 2025; originally announced March 2025.

  33. arXiv:2503.15092  [pdf, other]

    cs.CR cs.AI cs.CL

    Towards Understanding the Safety Boundaries of DeepSeek Models: Evaluation and Findings

    Authors: Zonghao Ying, Guangyi Zheng, Yongxin Huang, Deyue Zhang, Wenxin Zhang, Quanchen Zou, Aishan Liu, Xianglong Liu, Dacheng Tao

    Abstract: This study presents the first comprehensive safety evaluation of the DeepSeek models, focusing on evaluating the safety risks associated with their generated content. Our evaluation encompasses DeepSeek's latest generation of large language models, multimodal large language models, and text-to-image models, systematically examining their performance regarding unsafe content generation. Notably, we…

    Submitted 19 March, 2025; originally announced March 2025.

  34. arXiv:2503.14910  [pdf, other]

    cs.CV

    Robust Distribution Alignment for Industrial Anomaly Detection under Distribution Shift

    Authors: Jingyi Liao, Xun Xu, Yongyi Su, Rong-Cheng Tu, Yifan Liu, Dacheng Tao, Xulei Yang

    Abstract: Anomaly detection plays a crucial role in quality control for industrial applications. However, ensuring robustness under unseen domain shifts such as lighting variations or sensor drift remains a significant challenge. Existing methods attempt to address domain shifts by training generalizable models but often rely on prior knowledge of target distributions and can hardly generalise to backbones…

    Submitted 19 March, 2025; originally announced March 2025.

  35. arXiv:2503.12937  [pdf, other]

    cs.AI cs.CL cs.CV cs.LG

    R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization

    Authors: Jingyi Zhang, Jiaxing Huang, Huanjin Yao, Shunyu Liu, Xikun Zhang, Shijian Lu, Dacheng Tao

    Abstract: Recent studies generally enhance MLLMs' reasoning capabilities via supervised fine-tuning on high-quality chain-of-thought reasoning data, which often leads models to merely imitate successful reasoning paths without understanding what the wrong reasoning paths are. In this work, we aim to enhance the MLLMs' reasoning ability beyond passively imitating positive reasoning paths. To this end, we des…

    Submitted 17 March, 2025; originally announced March 2025.

  36. arXiv:2503.11701  [pdf, other]

    cs.LG

    A Survey of Direct Preference Optimization

    Authors: Shunyu Liu, Wenkai Fang, Zetian Hu, Junjie Zhang, Yang Zhou, Kongcheng Zhang, Rongcheng Tu, Ting-En Lin, Fei Huang, Mingli Song, Yongbin Li, Dacheng Tao

    Abstract: Large Language Models (LLMs) have demonstrated unprecedented generative capabilities, yet their alignment with human values remains critical for ensuring helpful and harmless deployments. While Reinforcement Learning from Human Feedback (RLHF) has emerged as a powerful paradigm for aligning LLMs with human preferences, its reliance on complex reward modeling introduces inherent trade-offs in compu…

    Submitted 12 March, 2025; originally announced March 2025.

  37. arXiv:2503.09656  [pdf, other]

    cs.LG cs.CL

    LLM-PS: Empowering Large Language Models for Time Series Forecasting with Temporal Patterns and Semantics

    Authors: Jialiang Tang, Shuo Chen, Chen Gong, Jing Zhang, Dacheng Tao

    Abstract: Time Series Forecasting (TSF) is critical in many real-world domains like financial planning and health monitoring. Recent studies have revealed that Large Language Models (LLMs), with their powerful in-contextual modeling capabilities, hold significant potential for TSF. However, existing LLM-based methods usually perform suboptimally because they neglect the inherent characteristics of time seri…

    Submitted 12 March, 2025; originally announced March 2025.

  38. arXiv:2503.07340  [pdf]

    cs.RO cs.AI cs.LG

    Research and Design on Intelligent Recognition of Unordered Targets for Robots Based on Reinforcement Learning

    Authors: Yiting Mao, Dajun Tao, Shengyuan Zhang, Tian Qi, Keqin Li

    Abstract: In the field of robot target recognition research driven by artificial intelligence (AI), factors such as the disordered distribution of targets, the complexity of the environment, the massive scale of data, and noise interference have significantly restricted the improvement of target recognition accuracy. Against the backdrop of the continuous iteration and upgrading of current AI technologies,…

    Submitted 10 March, 2025; originally announced March 2025.

  39. arXiv:2503.06796  [pdf, other]

    cs.RO

    RoboDesign1M: A Large-scale Dataset for Robot Design Understanding

    Authors: Tri Le, Toan Nguyen, Quang Tran, Quang Nguyen, Baoru Huang, Hoan Nguyen, Minh Nhat Vu, Tung D. Ta, Anh Nguyen

    Abstract: Robot design is a complex and time-consuming process that requires specialized expertise. Gaining a deeper understanding of robot design data can enable various applications, including automated design generation, retrieving example designs from text, and developing AI-powered design assistants. While recent advancements in foundation models present promising approaches to addressing these challen…

    Submitted 9 March, 2025; originally announced March 2025.

    Comments: 8 pages

  40. arXiv:2503.05794  [pdf, other]

    cs.CR cs.AI cs.LG cs.SD eess.AS

    CBW: Towards Dataset Ownership Verification for Speaker Verification via Clustering-based Backdoor Watermarking

    Authors: Yiming Li, Kaiying Yan, Shuo Shao, Tongqing Zhai, Shu-Tao Xia, Zhan Qin, Dacheng Tao

    Abstract: With the increasing adoption of deep learning in speaker verification, large-scale speech datasets have become valuable intellectual property. To audit and prevent the unauthorized usage of these valuable released datasets, especially in commercial or open-source scenarios, we propose a novel dataset ownership verification method. Our approach introduces a clustering-based backdoor watermark (CBW)…

    Submitted 5 April, 2025; v1 submitted 1 March, 2025; originally announced March 2025.

    Comments: 14 pages. The journal extension of our ICASSP'21 paper (arXiv:2010.11607)

  41. arXiv:2503.05497  [pdf, ps, other]

    nucl-ex nucl-th

    $t$+$t$ cluster states in $^{6}$He

    Authors: W. H. Ma, D. Y. Tao, B. Zhou, J. S. Wang, Y. G. Ma, D. Q. Fang, W. B. He, Y. Y. Yang, J. B. Ma, S. L. Jin, P. Ma, J. X. Li, Y. S. Song, Q. Hu, Z. Bai, M. R. Huang, X. Q. Liu, Z. H. Gao, F. F. Duan, S. Y. Jin, S. W. Xu, G. M. Yu, T. F. Wang, Q. Wang

    Abstract: The study of $t$+$t$ cluster states in $^{6}$He provides valuable insights into exotic nuclear structures and the behavior of fermionic cluster systems. This study shows rich cluster resonant state structures above the threshold, identified by experimental reconstruction and theoretical calculations. The excitation energy spectrum above the $t$+$t$ threshold in $^{6}$He is measured via the fragmen…

    Submitted 7 March, 2025; originally announced March 2025.

    Comments: 9 pages, 7 figures, 1 table

  42. arXiv:2503.04550  [pdf, other]

    cs.AI

    Benchmarking Reasoning Robustness in Large Language Models

    Authors: Tong Yu, Yongcheng Jing, Xikun Zhang, Wentao Jiang, Wenjie Wu, Yingjie Wang, Wenbin Hu, Bo Du, Dacheng Tao

    Abstract: Despite the recent success of large language models (LLMs) in reasoning such as DeepSeek, we for the first time identify a key dilemma in reasoning robustness and generalization: significant performance degradation on novel or incomplete data, suggesting a reliance on memorized patterns rather than systematic reasoning. Our closer examination reveals four key unique limitations underlying this iss…

    Submitted 6 March, 2025; originally announced March 2025.

  43. arXiv:2503.01642  [pdf, other]

    cs.AI

    Graph-Augmented Reasoning: Evolving Step-by-Step Knowledge Graph Retrieval for LLM Reasoning

    Authors: Wenjie Wu, Yongcheng Jing, Yingjie Wang, Wenbin Hu, Dacheng Tao

    Abstract: Recent large language model (LLM) reasoning, despite its success, suffers from limited domain knowledge, susceptibility to hallucinations, and constrained reasoning depth, particularly in small-scale models deployed in resource-constrained environments. This paper presents the first investigation into integrating step-wise knowledge graph retrieval with step-wise reasoning to address these challen…

    Submitted 3 March, 2025; originally announced March 2025.

  44. arXiv:2503.01222  [pdf, other]

    cs.CV cs.CL

    Retrieval-Augmented Perception: High-Resolution Image Perception Meets Visual RAG

    Authors: Wenbin Wang, Yongcheng Jing, Liang Ding, Yingjie Wang, Li Shen, Yong Luo, Bo Du, Dacheng Tao

    Abstract: High-resolution (HR) image perception remains a key challenge in multimodal large language models (MLLMs). To overcome the limitations of existing methods, this paper shifts away from prior dedicated heuristic approaches and revisits the most fundamental idea to HR perception by enhancing the long-context capability of MLLMs, driven by recent advances in long-context techniques like retrieval-augm…

    Submitted 3 March, 2025; originally announced March 2025.

  45. arXiv:2502.19982  [pdf, other]

    cs.CL cs.LG

    Erasing Without Remembering: Safeguarding Knowledge Forgetting in Large Language Models

    Authors: Huazheng Wang, Yongcheng Jing, Haifeng Sun, Yingjie Wang, Jingyu Wang, Jianxin Liao, Dacheng Tao

    Abstract: In this paper, we explore machine unlearning from a novel dimension, by studying how to safeguard model unlearning in large language models (LLMs). Our goal is to prevent unlearned models from recalling any related memory of the targeted knowledge. We begin by uncovering a surprisingly simple yet overlooked fact: existing methods typically erase only the exact expressions of the targeted knowledge,…

    Submitted 27 February, 2025; originally announced February 2025.

  46. arXiv:2502.18865  [pdf, other]

    cs.LG cs.AI

    A Theoretical Perspective: How to Prevent Model Collapse in Self-consuming Training Loops

    Authors: Shi Fu, Yingjie Wang, Yuzhu Chen, Xinmei Tian, Dacheng Tao

    Abstract: High-quality data is essential for training large generative models, yet the vast reservoir of real data available online has become nearly depleted. Consequently, models increasingly generate their own data for further training, forming Self-consuming Training Loops (STLs). However, the empirical results have been strikingly inconsistent: some models degrade or even collapse, while others success…

    Submitted 26 February, 2025; originally announced February 2025.

    Comments: Accepted at ICLR 2025

  47. arXiv:2502.18511  [pdf, other]

    cs.CR cs.AI

    ELBA-Bench: An Efficient Learning Backdoor Attacks Benchmark for Large Language Models

    Authors: Xuxu Liu, Siyuan Liang, Mengya Han, Yong Luo, Aishan Liu, Xiantao Cai, Zheng He, Dacheng Tao

    Abstract: Generative large language models are crucial in natural language processing, but they are vulnerable to backdoor attacks, where subtle triggers compromise their behavior. Although backdoor attacks against LLMs are constantly emerging, existing benchmarks remain limited in terms of sufficient coverage of attack, metric system integrity, backdoor attack alignment. And existing pre-trained backdoor a…

    Submitted 22 February, 2025; originally announced February 2025.

  48. arXiv:2502.16235  [pdf, other]

    cs.AI

    Dynamic Parallel Tree Search for Efficient LLM Reasoning

    Authors: Yifu Ding, Wentao Jiang, Shunyu Liu, Yongcheng Jing, Jinyang Guo, Yingjie Wang, Jing Zhang, Zengmao Wang, Ziwei Liu, Bo Du, Xianglong Liu, Dacheng Tao

    Abstract: Tree of Thoughts (ToT) enhances Large Language Model (LLM) reasoning by structuring problem-solving as a spanning tree. However, recent methods focus on search accuracy while overlooking computational efficiency. The challenges of accelerating the ToT lie in the frequent switching of reasoning focus, and the redundant exploration of suboptimal solutions. To alleviate this dilemma, we propose Dynam…

    Submitted 27 February, 2025; v1 submitted 22 February, 2025; originally announced February 2025.

    Comments: 17 pages, 11 figures

  49. arXiv:2502.14881  [pdf, other]

    cs.CR cs.CV

    A Survey of Safety on Large Vision-Language Models: Attacks, Defenses and Evaluations

    Authors: Mang Ye, Xuankun Rong, Wenke Huang, Bo Du, Nenghai Yu, Dacheng Tao

    Abstract: With the rapid advancement of Large Vision-Language Models (LVLMs), ensuring their safety has emerged as a crucial area of research. This survey provides a comprehensive analysis of LVLM safety, covering key aspects such as attacks, defenses, and evaluation methods. We introduce a unified framework that integrates these interrelated components, offering a holistic perspective on the vulnerabilitie…

    Submitted 14 February, 2025; originally announced February 2025.

    Comments: 22 pages, 2 figures

  50. arXiv:2502.14645  [pdf, other]

    cs.CL cs.AI

    Edit Once, Update Everywhere: A Simple Framework for Cross-Lingual Knowledge Synchronization in LLMs

    Authors: Yuchen Wu, Liang Ding, Li Shen, Dacheng Tao

    Abstract: Knowledge editing allows for efficient adaptation of large language models (LLMs) to new information or corrections without requiring full retraining. However, prior methods typically focus on either single-language editing or basic multilingual editing, failing to achieve true cross-linguistic knowledge synchronization. To address this, we present a simple and practical state-of-the-art (SOTA) re…

    Submitted 20 February, 2025; originally announced February 2025.