
Showing 1–50 of 192 results for author: Dong, B

Searching in archive cs.
  1. arXiv:2506.21071  [pdf, ps, other]

    cs.LG cs.CL

    Enhancing LLM Tool Use with High-quality Instruction Data from Knowledge Graph

    Authors: Jingwei Wang, Zai Zhang, Hao Qian, Chunjing Gan, Binbin Hu, Ziqi Liu, Zhiqiang Zhang, Jun Zhou, Bin Shi, Bo Dong

    Abstract: Teaching large language models (LLMs) to use tools is crucial for improving their problem-solving abilities and expanding their applications. However, effectively using tools is challenging because it requires a deep understanding of tool functionalities and user intentions. Previous methods relied mainly on LLMs to generate instruction data, but the quality of these data was often insufficient. I…

    Submitted 26 June, 2025; originally announced June 2025.

    Comments: 20 pages, 12 figures

  2. arXiv:2506.18048  [pdf, ps, other]

    cs.CV

    CLGRPO: Reasoning Ability Enhancement for Small VLMs

    Authors: Fanyi Wang, Binzhi Dong, Haotian Hu, Jinjin Xu, Zhiwang Zhang

    Abstract: Small Vision Language Models (SVLMs) generally refer to models with parameter sizes less than or equal to 2B. Their low cost and power consumption characteristics confer high commercial value. However, their reasoning abilities are limited by the number of parameters. To address this issue, this paper proposes a post-training optimization paradigm called the Incremental Training Strategy to enhanc…

    Submitted 22 June, 2025; originally announced June 2025.

    Comments: 11 pages, 5 figures

  3. arXiv:2506.07406  [pdf, ps, other]

    cs.LG cs.AI

    InverseScope: Scalable Activation Inversion for Interpreting Large Language Models

    Authors: Yifan Luo, Zhennan Zhou, Bin Dong

    Abstract: Understanding the internal representations of large language models (LLMs) is a central challenge in interpretability research. Existing feature interpretability methods often rely on strong assumptions about the structure of representations that may not hold in practice. In this work, we introduce InverseScope, an assumption-light and scalable framework for interpreting neural activations via inp…

    Submitted 8 June, 2025; originally announced June 2025.

    Comments: 18 pages, 8 figures

  4. arXiv:2506.03673  [pdf, ps, other]

    cs.AI

    Reason from Future: Reverse Thought Chain Enhances LLM Reasoning

    Authors: Yinlong Xu, Yanzhao Zheng, Shuoshuo Sun, Shuaihan Huang, Baohua Dong, Hangcheng Zhu, Ruohui Huang, Gang Yu, Hongxia Xu, Jian Wu

    Abstract: It has been demonstrated that carefully designed reasoning paradigms, like Chain-of-Thought (CoT) and Tree-of-Thought (ToT), can enhance the reasoning capabilities of small language models through detailed thinking and extensive thought searching, although unbounded branching factors in the search space create prohibitive reasoning costs. However, these methods fall into the trap of local optimum reason…

    Submitted 4 June, 2025; originally announced June 2025.

    Comments: Accepted by ACL 2025 findings

  5. arXiv:2505.24238  [pdf, ps, other]

    cs.CV cs.LG

    MIRAGE: Assessing Hallucination in Multimodal Reasoning Chains of MLLM

    Authors: Bowen Dong, Minheng Ni, Zitong Huang, Guanglei Yang, Wangmeng Zuo, Lei Zhang

    Abstract: Multimodal hallucination in multimodal large language models (MLLMs) restricts the correctness of MLLMs. However, multimodal hallucinations are multi-sourced and arise from diverse causes. Existing benchmarks fail to adequately distinguish between perception-induced hallucinations and reasoning-induced hallucinations. This failure constitutes a significant issue and hinders the diagnosis of multim…

    Submitted 2 June, 2025; v1 submitted 30 May, 2025; originally announced May 2025.

  6. arXiv:2505.23950  [pdf, ps, other]

    cs.AI

    InterMT: Multi-Turn Interleaved Preference Alignment with Human Feedback

    Authors: Boyuan Chen, Donghai Hong, Jiaming Ji, Jiacheng Zheng, Bowen Dong, Jiayi Zhou, Kaile Wang, Juntao Dai, Xuyao Wang, Wenqi Chen, Qirui Zheng, Wenxin Li, Sirui Han, Yike Guo, Yaodong Yang

    Abstract: As multimodal large models (MLLMs) continue to advance across challenging tasks, a key question emerges: What essential capabilities are still missing? A critical aspect of human learning is continuous interaction with the environment -- not limited to language, but also involving multimodal understanding and generation. To move closer to human-level intelligence, models must similarly support mul…

    Submitted 29 May, 2025; originally announced May 2025.

  7. arXiv:2505.21297  [pdf, ps, other]

    cs.CL

    rStar-Coder: Scaling Competitive Code Reasoning with a Large-Scale Verified Dataset

    Authors: Yifei Liu, Li Lyna Zhang, Yi Zhu, Bingcheng Dong, Xudong Zhou, Ning Shang, Fan Yang, Mao Yang

    Abstract: Advancing code reasoning in large language models (LLMs) is fundamentally limited by the scarcity of high-difficulty datasets, especially those with verifiable input-output test cases necessary for rigorous solution validation at scale. We introduce rStar-Coder, which significantly improves LLM code reasoning capabilities by constructing a large-scale, verified dataset of 418K competition-level co…

    Submitted 27 May, 2025; originally announced May 2025.

  8. arXiv:2505.21036  [pdf, ps, other]

    cs.CV cs.AI

    RainFusion: Adaptive Video Generation Acceleration via Multi-Dimensional Visual Redundancy

    Authors: Aiyue Chen, Bin Dong, Jingru Li, Jing Lin, Kun Tian, Yiwu Yao, Gongyi Wang

    Abstract: Video generation using diffusion models is highly computationally intensive, with 3D attention in Diffusion Transformer (DiT) models accounting for over 80% of the total computational resources. In this work, we introduce RainFusion, a novel training-free sparse attention method that exploits the inherent sparsity of visual data to accelerate attention computation while preserving video…

    Submitted 9 June, 2025; v1 submitted 27 May, 2025; originally announced May 2025.

  9. arXiv:2505.20613  [pdf, ps, other]

    cs.CL cs.AI cs.LG cs.LO

    REAL-Prover: Retrieval Augmented Lean Prover for Mathematical Reasoning

    Authors: Ziju Shen, Naohao Huang, Fanyi Yang, Yutong Wang, Guoxiong Gao, Tianyi Xu, Jiedong Jiang, Wanyi He, Pu Yang, Mengzhou Sun, Haocheng Ju, Peihao Wu, Bryan Dai, Bin Dong

    Abstract: Nowadays, formal theorem provers have made monumental progress on high-school and competition-level mathematics, but few of them generalize to more advanced mathematics. In this paper, we present REAL-Prover, a new open-source stepwise theorem prover for Lean 4 to push this boundary. This prover, based on our fine-tuned large language model (REAL-Prover-v1) and integrated with a retrieval system (…

    Submitted 16 June, 2025; v1 submitted 26 May, 2025; originally announced May 2025.

  10. arXiv:2505.15585  [pdf, ps, other]

    cs.IR cs.CL cs.LG

    MIRB: Mathematical Information Retrieval Benchmark

    Authors: Haocheng Ju, Bin Dong

    Abstract: Mathematical Information Retrieval (MIR) is the task of retrieving information from mathematical documents and plays a key role in various applications, including theorem search in mathematical libraries, answer retrieval on math forums, and premise selection in automated theorem proving. However, a unified benchmark for evaluating these diverse retrieval tasks has been lacking. In this paper, we…

    Submitted 21 May, 2025; originally announced May 2025.

    Comments: Our code and data are available at https://github.com/j991222/mirb and https://huggingface.co/collections/hcju/mirb-6827001711765454f58c5a76

  11. arXiv:2505.05727  [pdf, ps, other]

    cs.NE

    A High-Dimensional Feature Selection Algorithm Based on Multiobjective Differential Evolution

    Authors: Zhenxing Zhang, Qianxiang An, Yilei Wang, Chenfeng Wu, Baoling Dong, Chunjie Zhou

    Abstract: Multiobjective feature selection seeks to determine the most discriminative feature subset by simultaneously optimizing two conflicting objectives: minimizing the number of selected features and the classification error rate. The goal is to enhance the model's predictive performance and computational efficiency. However, feature redundancy and interdependence in high-dimensional data present consi…

    Submitted 8 May, 2025; originally announced May 2025.

  12. arXiv:2505.04405  [pdf]

    physics.optics cs.AI physics.app-ph

    High-speed multiwavelength photonic temporal integration using silicon photonics

    Authors: Yi Zhang, Nikolaos Farmakidis, Ioannis Roumpos, Miltiadis Moralis-Pegios, Apostolos Tsakyridis, June Sang Lee, Bowei Dong, Yuhan He, Samarth Aggarwal, Nikolaos Pleros, Harish Bhaskaran

    Abstract: Optical systems have been pivotal for energy-efficient computing, performing high-speed, parallel operations in low-loss carriers. While these predominantly analog optical accelerators bypass digitization to perform parallel floating-point computations, scaling optical hardware to map large-vector sizes for AI tasks remains challenging. Here, we overcome this limitation by unfolding scalar operati…

    Submitted 7 May, 2025; originally announced May 2025.

  13. arXiv:2503.12963  [pdf, other]

    cs.CV

    Unlock Pose Diversity: Accurate and Efficient Implicit Keypoint-based Spatiotemporal Diffusion for Audio-driven Talking Portrait

    Authors: Chaolong Yang, Kai Yao, Yuyao Yan, Chenru Jiang, Weiguang Zhao, Jie Sun, Guangliang Cheng, Yifei Zhang, Bin Dong, Kaizhu Huang

    Abstract: Audio-driven single-image talking portrait generation plays a crucial role in virtual reality, digital human creation, and filmmaking. Existing approaches are generally categorized into keypoint-based and image-based methods. Keypoint-based methods effectively preserve character identity but struggle to capture fine facial details due to the fixed points limitation of the 3D Morphable Model. Moreo…

    Submitted 17 March, 2025; originally announced March 2025.

  14. arXiv:2503.11982  [pdf, other]

    quant-ph cs.CR

    TetrisLock: Quantum Circuit Split Compilation with Interlocking Patterns

    Authors: Qian Wang, Jayden John, Ben Dong, Yuntao Liu

    Abstract: In quantum computing, quantum circuits are fundamental representations of quantum algorithms, which are compiled into executable functions for quantum solutions. Quantum compilers transform algorithmic quantum circuits into ones compatible with target quantum computers, bridging quantum software and hardware. However, untrusted quantum compilers pose significant risks. They can lead to the theft of…

    Submitted 14 March, 2025; originally announced March 2025.

    Comments: To appear at DAC 2025

  15. arXiv:2503.07417  [pdf, ps, other]

    cs.CV

    GM-MoE: Low-Light Enhancement with Gated-Mechanism Mixture-of-Experts

    Authors: Minwen Liao, Hao Bo Dong, Xinyi Wang, Kurban Ubul, Ziyang Yan, Yihua Shao

    Abstract: Low-light enhancement has wide applications in autonomous driving, 3D reconstruction, remote sensing, surveillance, and so on, which can significantly improve information utilization. However, most existing methods lack generalization and are limited to specific tasks such as image recovery. To address these issues, we propose Gated-Mechanism Mixture-of-Experts (GM-MoE), the first framework to int…

    Submitted 28 June, 2025; v1 submitted 10 March, 2025; originally announced March 2025.

  16. arXiv:2503.07248  [pdf, other]

    eess.IV cs.AI cs.CV

    AI-Driven Automated Tool for Abdominal CT Body Composition Analysis in Gastrointestinal Cancer Management

    Authors: Xinyu Nan, Meng He, Zifan Chen, Bin Dong, Lei Tang, Li Zhang

    Abstract: The incidence of gastrointestinal cancers remains significantly high, particularly in China, emphasizing the importance of accurate prognostic assessments and effective treatment strategies. Research shows a strong correlation between abdominal muscle and fat tissue composition and patient outcomes. However, existing manual methods for analyzing abdominal tissue composition are time-consuming and…

    Submitted 10 March, 2025; originally announced March 2025.

  17. arXiv:2503.02988  [pdf, other]

    cs.LG

    Out-of-Distribution Generalization on Graphs via Progressive Inference

    Authors: Yiming Xu, Bin Shi, Zhen Peng, Huixiang Liu, Bo Dong, Chen Chen

    Abstract: The development and evaluation of graph neural networks (GNNs) generally follow the independent and identically distributed (i.i.d.) assumption. Yet this assumption is often untenable in practice due to the uncontrollable data generation mechanism. In particular, when the data distribution shows a significant shift, most GNNs would fail to produce reliable predictions and may even make decisions r…

    Submitted 4 March, 2025; originally announced March 2025.

    Comments: Accepted by AAAI 2025

  18. arXiv:2502.14848  [pdf, other]

    cs.CL

    GATE: Graph-based Adaptive Tool Evolution Across Diverse Tasks

    Authors: Jianwen Luo, Yiming Huang, Jinxiang Meng, Fangyu Lei, Shizhu He, Xiao Liu, Shanshan Jiang, Bin Dong, Jun Zhao, Kang Liu

    Abstract: Large Language Models (LLMs) have shown great promise in tool-making, yet existing frameworks often struggle to efficiently construct reliable toolsets and are limited to single-task settings. To address these challenges, we propose GATE (Graph-based Adaptive Tool Evolution), an adaptive framework that dynamically constructs and evolves a hierarchical graph of reusable tools across multiple scenar…

    Submitted 20 February, 2025; originally announced February 2025.

    Comments: 8 pages of main text, 38 pages of appendices

    MSC Class: 68T50 ACM Class: I.2.7

  19. arXiv:2502.11347  [pdf, other]

    cs.PF

    Evaluating the Performance of the DeepSeek Model in Confidential Computing Environment

    Authors: Ben Dong, Qian Wang

    Abstract: The increasing adoption of Large Language Models (LLMs) in cloud environments raises critical security concerns, particularly regarding model confidentiality and data privacy. Confidential computing, enabled by Trusted Execution Environments (TEEs), offers a promising solution to mitigate these risks. However, existing TEE implementations, primarily CPU-based, struggle to efficiently support the r…

    Submitted 16 February, 2025; originally announced February 2025.

  20. Middleman Bias in Advertising: Aligning Relevance of Keyphrase Recommendations with Search

    Authors: Soumik Dey, Wei Zhang, Hansi Wu, Bingfeng Dong, Binbin Li

    Abstract: E-commerce sellers are recommended keyphrases, based on their inventory, on which they advertise to increase buyer engagement (clicks/sales). Keyphrases must be pertinent to items; otherwise, they can result in seller dissatisfaction and poor targeting -- to that end, relevance filters are employed. In this work, we describe the shortcomings of training relevance filter models on biased click/sale…

    Submitted 14 February, 2025; v1 submitted 31 January, 2025; originally announced February 2025.

    Journal ref: WWW '25: Companion Proceedings of the ACM on Web Conference 2025

  21. arXiv:2501.16737  [pdf, ps, other]

    cs.CV

    Consistency Diffusion Models for Single-Image 3D Reconstruction with Priors

    Authors: Chenru Jiang, Chengrui Zhang, Xi Yang, Jie Sun, Yifei Zhang, Bin Dong, Kaizhu Huang

    Abstract: This paper delves into the study of 3D point cloud reconstruction from a single image. Our objective is to develop the Consistency Diffusion Model, exploring synergistic 2D and 3D priors in the Bayesian framework to ensure superior consistency in the reconstruction process, a challenging yet critical requirement in this field. Specifically, we introduce a pioneering training framework under diffus…

    Submitted 31 January, 2025; v1 submitted 28 January, 2025; originally announced January 2025.

  22. arXiv:2501.12612  [pdf, other]

    cs.CL cs.CR

    T2ISafety: Benchmark for Assessing Fairness, Toxicity, and Privacy in Image Generation

    Authors: Lijun Li, Zhelun Shi, Xuhao Hu, Bowen Dong, Yiran Qin, Xihui Liu, Lu Sheng, Jing Shao

    Abstract: Text-to-image (T2I) models have rapidly advanced, enabling the generation of high-quality images from text prompts across various domains. However, these models present notable safety concerns, including the risk of generating harmful, biased, or private content. Current research on assessing T2I safety remains in its early stages. While some efforts have been made to evaluate models on specific s…

    Submitted 20 February, 2025; v1 submitted 21 January, 2025; originally announced January 2025.

  23. arXiv:2501.01834  [pdf, other]

    cs.CV cs.AI

    MoColl: Agent-Based Specific and General Model Collaboration for Image Captioning

    Authors: Pu Yang, Bin Dong

    Abstract: Image captioning is a critical task at the intersection of computer vision and natural language processing, with wide-ranging applications across various domains. For complex tasks such as diagnostic report generation, deep learning models require not only domain-specific image-caption datasets but also the incorporation of relevant general knowledge to provide contextual accuracy. Existing approa…

    Submitted 27 January, 2025; v1 submitted 3 January, 2025; originally announced January 2025.

  24. arXiv:2501.01259  [pdf, other]

    cs.AR

    Adaptive Hybrid FFT: A Novel Pipeline and Memory-Based Architecture for Radix-$2^k$ FFT in Large Size Processing

    Authors: Fangyu Zhao, Chunhua Xiao, Zhiguo Wang, Xiaohua Du, Bo Dong

    Abstract: In the field of digital signal processing, the fast Fourier transform (FFT) is a fundamental algorithm, with its processors being implemented using either the pipelined architecture, well-known for high-throughput applications but weak in hardware utilization, or the memory-based architecture, designed for area-constrained scenarios but failing to meet stringent throughput requirements. Therefore,…

    Submitted 2 January, 2025; originally announced January 2025.

  25. arXiv:2501.01146  [pdf, ps, other]

    cs.CR

    PoVF: Empowering Decentralized Blockchain Systems with Verifiable Function Consensus

    Authors: Chenxi Xiong, Ting Yang, Yu Wang, Bing Dong

    Abstract: The consensus mechanism is the core technology that lets a blockchain ensure transactions are executed in sequence. It also determines the decentralization, security, and efficiency of the blockchain. Existing mechanisms all have certain centralization issues and fail to ensure the decentralization of blockchain networks. A decentralized and efficient mechanism is required to improve blockchain systems. T…

    Submitted 2 January, 2025; originally announced January 2025.

  26. arXiv:2412.16449  [pdf, other]

    cs.LG cs.CR

    CBNN: 3-Party Secure Framework for Customized Binary Neural Networks Inference

    Authors: Benchang Dong, Zhili Chen, Xin Chen, Shiwen Wei, Jie Fu, Huifa Li

    Abstract: Binarized Neural Networks (BNN) offer efficient implementations for machine learning tasks and facilitate Privacy-Preserving Machine Learning (PPML) by simplifying operations with binary values. Nevertheless, challenges persist in terms of communication and accuracy in their application scenarios. In this work, we introduce CBNN, a three-party secure computation framework tailored for efficient BN…

    Submitted 20 December, 2024; originally announced December 2024.

  27. arXiv:2412.15979  [pdf, other]

    cs.CV

    MR-GDINO: Efficient Open-World Continual Object Detection

    Authors: Bowen Dong, Zitong Huang, Guanglei Yang, Lei Zhang, Wangmeng Zuo

    Abstract: Open-world (OW) recognition and detection models show strong zero- and few-shot adaptation abilities, inspiring their use as initializations in continual learning methods to improve performance. Despite promising results on seen classes, such OW abilities on unseen classes are largely degraded by catastrophic forgetting. To tackle this challenge, we propose an open-world continual object de…

    Submitted 23 December, 2024; v1 submitted 20 December, 2024; originally announced December 2024.

    Comments: Website: https://m1saka.moe/owcod/ . Code is available at: https://github.com/DongSky/MR-GDINO

  28. CLDG: Contrastive Learning on Dynamic Graphs

    Authors: Yiming Xu, Bin Shi, Teng Ma, Bo Dong, Haoyi Zhou, Qinghua Zheng

    Abstract: Graphs with complex annotations are the most potent data type, and their constant evolution motivates further exploration of unsupervised dynamic graph representation. One of the representative paradigms is graph contrastive learning. It constructs self-supervised signals by maximizing the mutual information between the static graph's augmentation views. However, the semantics and labels may…

    Submitted 18 December, 2024; originally announced December 2024.

    Comments: Accepted by ICDE 2023

  29. arXiv:2412.12126  [pdf]

    cs.DC cs.CV cs.LG eess.IV eess.SP

    Seamless Optical Cloud Computing across Edge-Metro Network for Generative AI

    Authors: Sizhe Xing, Aolong Sun, Chengxi Wang, Yizhi Wang, Boyu Dong, Junhui Hu, Xuyu Deng, An Yan, Yingjun Liu, Fangchen Hu, Zhongya Li, Ouhan Huang, Junhao Zhao, Yingjun Zhou, Ziwei Li, Jianyang Shi, Xi Xiao, Richard Penty, Qixiang Cheng, Nan Chi, Junwen Zhang

    Abstract: The rapid advancement of generative artificial intelligence (AI) in recent years has profoundly reshaped modern lifestyles, necessitating a revolutionary architecture to support the growing demands for computational power. Cloud computing has become the driving force behind this transformation. However, it consumes significant power and faces computation security risks due to the reliance on exten…

    Submitted 1 May, 2025; v1 submitted 4 December, 2024; originally announced December 2024.

  30. Class Balance Matters to Active Class-Incremental Learning

    Authors: Zitong Huang, Ze Chen, Yuanze Li, Bowen Dong, Erjin Zhou, Yong Liu, Rick Siow Mong Goh, Chun-Mei Feng, Wangmeng Zuo

    Abstract: Few-Shot Class-Incremental Learning has shown remarkable efficacy in efficiently learning new concepts with limited annotations. Nevertheless, the heuristic few-shot annotations may not always cover the most informative samples, which largely restricts the capability of the incremental learner. We aim to start from a pool of large-scale unlabeled data and then annotate the most informative samples for i…

    Submitted 9 December, 2024; originally announced December 2024.

    Comments: ACM MM 2024

  31. arXiv:2411.11581  [pdf, other]

    cs.CL

    OASIS: Open Agent Social Interaction Simulations with One Million Agents

    Authors: Ziyi Yang, Zaibin Zhang, Zirui Zheng, Yuxian Jiang, Ziyue Gan, Zhiyu Wang, Zijian Ling, Jinsong Chen, Martz Ma, Bowen Dong, Prateek Gupta, Shuyue Hu, Zhenfei Yin, Guohao Li, Xu Jia, Lijun Wang, Bernard Ghanem, Huchuan Lu, Chaochao Lu, Wanli Ouyang, Yu Qiao, Philip Torr, Jing Shao

    Abstract: There has been a growing interest in enhancing rule-based agent-based models (ABMs) for social media platforms (e.g., X, Reddit) with more realistic large language model (LLM) agents, thereby allowing for a more nuanced study of complex systems. As a result, several LLM-based ABMs have been proposed in the past year. While they hold promise, each simulator is specifically designed to study a parti…

    Submitted 23 March, 2025; v1 submitted 18 November, 2024; originally announced November 2024.

  32. arXiv:2411.11020  [pdf, other]

    cs.LG cs.SI

    Training a Label-Noise-Resistant GNN with Reduced Complexity

    Authors: Rui Zhao, Bin Shi, Zhiming Liang, Jianfei Ruan, Bo Dong, Lu Lin

    Abstract: Graph Neural Networks (GNNs) have been widely employed for semi-supervised node classification tasks on graphs. However, the performance of GNNs is significantly affected by label noise, that is, a small amount of incorrectly labeled nodes can substantially misguide model training. Mainstream solutions define node classification with label noise (NCLN) as a reliable labeling task, often introducin…

    Submitted 17 November, 2024; originally announced November 2024.

  33. arXiv:2411.03990  [pdf, other]

    cs.RO cs.CV cs.LG

    ET-SEED: Efficient Trajectory-Level SE(3) Equivariant Diffusion Policy

    Authors: Chenrui Tie, Yue Chen, Ruihai Wu, Boxuan Dong, Zeyi Li, Chongkai Gao, Hao Dong

    Abstract: Imitation learning, e.g., diffusion policy, has been proven effective in various robotic manipulation tasks. However, extensive demonstrations are required for policy robustness and generalization. To reduce the demonstration reliance, we leverage spatial symmetry and propose ET-SEED, an efficient trajectory-level SE(3) equivariant diffusion model for generating action sequences in complex robot m…

    Submitted 2 March, 2025; v1 submitted 6 November, 2024; originally announced November 2024.

    Comments: Accepted to ICLR 2025

  34. arXiv:2410.13454  [pdf, ps, other]

    eess.SY cs.MA

    Byzantine-Resilient Output Optimization of Multiagent via Self-Triggered Hybrid Detection Approach

    Authors: Chenhang Yan, Liping Yan, Yuezu Lv, Bolei Dong, Yuanqing Xia

    Abstract: How to achieve precise distributed optimization despite unknown attacks, especially Byzantine attacks, is one of the critical challenges for multiagent systems. This paper addresses distributed resilient optimization for linear heterogeneous multi-agent systems faced with adversarial threats. We establish a framework aimed at realizing resilient optimization for continuous-time systems by in…

    Submitted 17 October, 2024; originally announced October 2024.

  35. arXiv:2410.10878  [pdf, other]

    cs.CL cs.AI cs.LG cs.LO

    Herald: A Natural Language Annotated Lean 4 Dataset

    Authors: Guoxiong Gao, Yutong Wang, Jiedong Jiang, Qi Gao, Zihan Qin, Tianyi Xu, Bin Dong

    Abstract: Verifiable formal languages like Lean have profoundly impacted mathematical reasoning, particularly through the use of large language models (LLMs) for automated reasoning. A significant challenge in training LLMs for these formal languages is the lack of parallel datasets that align natural language with formal language proofs. To address this challenge, this paper introduces a novel framework fo…

    Submitted 27 February, 2025; v1 submitted 9 October, 2024; originally announced October 2024.

  36. arXiv:2410.10150  [pdf, other]

    cs.CL cs.AI

    Jailbreak Instruction-Tuned LLMs via end-of-sentence MLP Re-weighting

    Authors: Yifan Luo, Zhennan Zhou, Meitan Wang, Bin Dong

    Abstract: In this paper, we investigate the safety mechanisms of instruction fine-tuned large language models (LLMs). We discover that re-weighting MLP neurons can significantly compromise a model's safety, especially for MLPs in end-of-sentence inferences. We hypothesize that LLMs evaluate the harmfulness of prompts during end-of-sentence inferences, and MLP layers play a critical role in this process. Ba…

    Submitted 14 October, 2024; originally announced October 2024.

  37. arXiv:2410.03756  [pdf, other]

    cs.AI cs.CY cs.DC cs.LG eess.SY

    The Smart Buildings Control Suite: A Diverse Open Source Benchmark to Evaluate and Scale HVAC Control Policies for Sustainability

    Authors: Judah Goldfeder, Victoria Dean, Zixin Jiang, Xuezheng Wang, Bing Dong, Hod Lipson, John Sipple

    Abstract: Commercial buildings account for 17% of U.S. carbon emissions, with roughly half of that from Heating, Ventilation, and Air Conditioning (HVAC). HVAC devices form a complex thermodynamic system, and while Model Predictive Control and Reinforcement Learning have been used to optimize control policies, scaling to thousands of buildings remains a significant unsolved challenge. Most current algorithm…

    Submitted 31 January, 2025; v1 submitted 2 October, 2024; originally announced October 2024.

  38. arXiv:2409.11323  [pdf, other]

    cs.CV cs.LG

    LPT++: Efficient Training on Mixture of Long-tailed Experts

    Authors: Bowen Dong, Pan Zhou, Wangmeng Zuo

    Abstract: We introduce LPT++, a comprehensive framework for long-tailed classification that combines parameter-efficient fine-tuning (PEFT) with a learnable model ensemble. LPT++ enhances frozen Vision Transformers (ViTs) through the integration of three core components. The first is a universal long-tailed adaptation module, which aggregates long-tailed prompts and visual adapters to adapt the pretrained m…

    Submitted 17 September, 2024; originally announced September 2024.

    Comments: Extended version of arXiv:2210.01033

  39. arXiv:2409.05298  [pdf, other]

    cs.CR

    Evaluating Post-Quantum Cryptography on Embedded Systems: A Performance Analysis

    Authors: Ben Dong, Qian Wang

    Abstract: The National Institute of Standards and Technology (NIST) has finalized the selection of post-quantum cryptographic (PQC) algorithms for use in the era of quantum computing. Despite their integration into the TLS protocol for key establishment and signature generation, there has been limited study on profiling these newly standardized algorithms in resource-constrained communication systems. In this work, w…

    Submitted 8 September, 2024; originally announced September 2024.

  40. arXiv:2408.16749  [pdf]

    cs.CL cs.AI

    Assessing Large Language Models for Online Extremism Research: Identification, Explanation, and New Knowledge

    Authors: Beidi Dong, Jin R. Lee, Ziwei Zhu, Balassubramanian Srinivasan

    Abstract: The United States has experienced a significant increase in violent extremism, prompting the need for automated tools to detect and limit the spread of extremist ideology online. This study evaluates the performance of Bidirectional Encoder Representations from Transformers (BERT) and Generative Pre-Trained Transformers (GPT) in detecting and classifying online domestic extremist posts. We collect…

    Submitted 29 August, 2024; originally announced August 2024.

  41. arXiv:2408.14032  [pdf, other]

    cs.CV

    More Pictures Say More: Visual Intersection Network for Open Set Object Detection

    Authors: Bingcheng Dong, Yuning Ding, Jinrong Zhang, Sifan Zhang, Shenglan Liu

    Abstract: Open Set Object Detection has seen rapid development recently, but it continues to pose significant challenges. Language-based methods, grappling with the substantial modal disparity between textual and visual modalities, require extensive computational resources to bridge this gap. Although integrating visual prompts into these frameworks shows promise for enhancing performance, it always comes w…

    Submitted 26 August, 2024; originally announced August 2024.

    Comments: 7 pages

  42. arXiv:2408.13836  [pdf, other]

    cs.CV cs.AI

    PAM: A Propagation-Based Model for Segmenting Any 3D Objects across Multi-Modal Medical Images

    Authors: Zifan Chen, Xinyu Nan, Jiazheng Li, Jie Zhao, Haifeng Li, Ziling Lin, Haoshen Li, Heyun Chen, Yiting Liu, Lei Tang, Li Zhang, Bin Dong

    Abstract: Volumetric segmentation is important in medical imaging, but current methods face challenges like requiring lots of manual annotations and being tailored to specific tasks, which limits their versatility. General segmentation models used for natural images don't perform well with the unique features of medical images. There's a strong need for an adaptable approach that can effectively handle diff…

    Submitted 25 October, 2024; v1 submitted 25 August, 2024; originally announced August 2024.

    Comments: 28 pages, 6 figures

  43. arXiv:2408.10588  [pdf, other]

    cs.CV cs.GR

    DEGAS: Detailed Expressions on Full-Body Gaussian Avatars

    Authors: Zhijing Shao, Duotun Wang, Qing-Yao Tian, Yao-Dong Yang, Hengyu Meng, Zeyu Cai, Bo Dong, Yu Zhang, Kang Zhang, Zeyu Wang

    Abstract: Although neural rendering has made significant advances in creating lifelike, animatable full-body and head avatars, incorporating detailed expressions into full-body avatars remains largely unexplored. We present DEGAS, the first 3D Gaussian Splatting (3DGS)-based modeling method for full-body avatars with rich facial expressions. Trained on multiview videos of a given subject, our method learns…

    Submitted 7 February, 2025; v1 submitted 20 August, 2024; originally announced August 2024.

    Comments: 3DV 2025

  44. arXiv:2407.21314  [pdf, other]

    cs.LG stat.ML

    State-observation augmented diffusion model for nonlinear assimilation with unknown dynamics

    Authors: Zhuoyuan Li, Bin Dong, Pingwen Zhang

    Abstract: Data assimilation has become a key technique for combining physical models with observational data to estimate state variables. However, classical assimilation algorithms often struggle with the high nonlinearity present in both physical and observational models. To address this challenge, a novel generative model, termed the State-Observation Augmented Diffusion (SOAD) model, is proposed for data-…

    Submitted 7 February, 2025; v1 submitted 30 July, 2024; originally announced July 2024.

    MSC Class: 49N45; 60J60; 62F15; 68T20

  45. arXiv:2407.09521  [pdf, other]

    cs.CV cs.NE

    Apprenticeship-Inspired Elegance: Synergistic Knowledge Distillation Empowers Spiking Neural Networks for Efficient Single-Eye Emotion Recognition

    Authors: Yang Wang, Haiyang Mei, Qirui Bao, Ziqi Wei, Mike Zheng Shou, Haizhou Li, Bo Dong, Xin Yang

    Abstract: We introduce a novel multimodality synergistic knowledge distillation scheme tailored for efficient single-eye emotion recognition tasks. This method allows a lightweight, unimodal student spiking neural network (SNN) to extract rich knowledge from an event-frame multimodal teacher network. The core strength of this approach is its ability to utilize the ample, coarser temporal cues found in conven…

    Submitted 20 June, 2024; originally announced July 2024.

    Comments: Accepted by IJCAI 2024

  46. arXiv:2406.10878  [pdf, other]

    cs.AI cs.CL

    Demonstration Notebook: Finding the Most Suited In-Context Learning Example from Interactions

    Authors: Yiming Tang, Bin Dong

    Abstract: Large language models (LLMs) benefit greatly from prompt engineering, with in-context learning standing as a pivotal technique. While former approaches have provided various ways to construct the demonstrations used for in-context learning, they often ignore the inherent heterogeneity within datasets, applying the same demonstrations to all reasoning questions. We observed that the effectiveness o…

    Submitted 16 June, 2024; originally announced June 2024.

  47. arXiv:2405.05714  [pdf, other]

    cs.CV cs.LG

    Estimating Noisy Class Posterior with Part-level Labels for Noisy Label Learning

    Authors: Rui Zhao, Bin Shi, Jianfei Ruan, Tianze Pan, Bo Dong

    Abstract: In noisy label learning, estimating noisy class posteriors plays a fundamental role in developing consistent classifiers, as it forms the basis for estimating clean class posteriors and the transition matrix. Existing methods typically learn noisy class posteriors by training a classification model with noisy labels. However, when labels are incorrect, these models may be misled to overemphasize…

    Submitted 2 July, 2024; v1 submitted 8 May, 2024; originally announced May 2024.

    Comments: CVPR 2024

  48. arXiv:2405.03136  [pdf, other]

    cs.CR

    FOBNN: Fast Oblivious Binarized Neural Network Inference

    Authors: Xin Chen, Zhili Chen, Benchang Dong, Shiwen Wei, Lin Chen, Daojing He

    Abstract: The superior performance of deep learning has propelled the rise of Deep Learning as a Service, enabling users to transmit their private data to service providers for model execution and inference retrieval. Nevertheless, the primary concern remains safeguarding the confidentiality of sensitive user data while optimizing the efficiency of secure protocols. To address this, we develop a fast oblivi…

    Submitted 5 May, 2024; originally announced May 2024.

  49. arXiv:2404.16331  [pdf, other]

    cs.CV cs.AI

    IMWA: Iterative Model Weight Averaging Benefits Class-Imbalanced Learning Tasks

    Authors: Zitong Huang, Ze Chen, Bowen Dong, Chaoqi Liang, Erjin Zhou, Wangmeng Zuo

    Abstract: Model Weight Averaging (MWA) is a technique that seeks to enhance a model's performance by averaging the weights of multiple trained models. This paper first empirically finds that 1) vanilla MWA can benefit class-imbalanced learning, and 2) performing model averaging in the early epochs of training yields a greater performance improvement than doing so in later epochs. Inspired by these t…

    Submitted 4 December, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

  50. arXiv:2404.02823  [pdf, other]

    cs.CL cs.AI cs.LG

    Conifer: Improving Complex Constrained Instruction-Following Ability of Large Language Models

    Authors: Haoran Sun, Lixin Liu, Junjie Li, Fengyu Wang, Baohua Dong, Ran Lin, Ruohui Huang

    Abstract: The ability of large language models (LLMs) to follow instructions is crucial to real-world applications. Despite recent advances, several studies have highlighted that LLMs struggle when faced with challenging instructions, especially those that include complex constraints, hindering their effectiveness in various tasks. To address this challenge, we introduce Conifer, a novel instruction tuning…

    Submitted 3 April, 2024; originally announced April 2024.