Showing 1–50 of 357 results for author: Dai, B

Searching in archive cs.
  1. arXiv:2507.01464  [pdf, ps, other]

    cs.IT

    Coding for Quasi-Static Fading Channel with Imperfect CSI at the Transmitter and Quantized Feedback

    Authors: Yuhan Yang, Mei Han, Haonan Zhang, Haoheng Yuan, Fan Cheng, Bin Dai

    Abstract: The classical Schalkwijk-Kailath (SK) scheme for the additive Gaussian noise channel with noiseless feedback is highly efficient since its coding complexity is extremely low and the decoding error doubly exponentially decays as the coding blocklength tends to infinity. However, its application to the fading channel with imperfect CSI at the transmitter (I-CSIT) is challenging since the SK scheme i…

    Submitted 2 July, 2025; originally announced July 2025.

    Comments: 7 pages, 6 figures, conference, this paper will be presented at the 2025 IEEE ITW

  2. arXiv:2507.00942  [pdf, ps, other]

    cs.IT

    Optimal Feedback Schemes for Dirty Paper Channels With State Estimation at the Receiver

    Authors: Dengfeng Xia, Han Deng, Haonan Zhang, Fan Cheng, Bin Dai, Liuguo Yin

    Abstract: In the literature, it has been shown that feedback does not increase the optimal rate-distortion region of the dirty paper channel with state estimation at the receiver (SE-R). On the other hand, it is well-known that feedback helps to construct low-complexity coding schemes in Gaussian channels, such as the elegant Schalkwijk-Kailath (SK) feedback scheme. This motivates us to explore capacity-ach…

    Submitted 1 July, 2025; originally announced July 2025.

    Comments: This paper will be presented at the 2025 IEEE Information Theory Workshop (ITW)

  3. arXiv:2506.22401  [pdf, ps, other]

    cs.LG math.OC

    Exploration from a Primal-Dual Lens: Value-Incentivized Actor-Critic Methods for Sample-Efficient Online RL

    Authors: Tong Yang, Bo Dai, Lin Xiao, Yuejie Chi

    Abstract: Online reinforcement learning (RL) with complex function approximations such as transformers and deep neural networks plays a significant role in the modern practice of artificial intelligence. Despite its popularity and importance, balancing the fundamental trade-off between exploration and exploitation remains a long-standing challenge; in particular, we are still in lack of efficient and practi…

    Submitted 27 June, 2025; originally announced June 2025.

  4. arXiv:2506.18613  [pdf, ps, other]

    cs.IT

    A Simple but Accurate Approximation for Multivariate Gaussian Rate-Distortion Function and Its Application in Maximal Coding Rate Reduction

    Authors: Zhenglin Huang, Qifa Yan, Bin Dai, Xiaohu Tang

    Abstract: The multivariate Gaussian rate-distortion (RD) function is crucial in various applications, such as digital communications, data storage, or neural networks. However, the complex form of the multivariate Gaussian RD function prevents its application in many neural network-based scenarios that rely on its analytical properties, for example, white-box neural networks, multi-device task-oriented comm…

    Submitted 23 June, 2025; originally announced June 2025.

    Comments: 15 pages, 9 figures. The article has been accepted by Tsinghua Science and Technology

  5. arXiv:2506.15190  [pdf, ps, other]

    cs.LG q-bio.NC

    Learning Task-Agnostic Skill Bases to Uncover Motor Primitives in Animal Behaviors

    Authors: Jiyi Wang, Jingyang Ke, Bo Dai, Anqi Wu

    Abstract: Animals flexibly recombine a finite set of core motor primitives to meet diverse task demands, but existing behavior-segmentation methods oversimplify this process by imposing discrete syllables under restrictive generative assumptions. To reflect the animal behavior generation procedure, we introduce skill-based imitation learning (SKIL) for behavior understanding, a reinforcement learning-based…

    Submitted 18 June, 2025; originally announced June 2025.

    Comments: 9 pages and 4 figures for the main text

  6. arXiv:2506.14758  [pdf, ps, other]

    cs.CL

    Reasoning with Exploration: An Entropy Perspective

    Authors: Daixuan Cheng, Shaohan Huang, Xuekai Zhu, Bo Dai, Wayne Xin Zhao, Zhenliang Zhang, Furu Wei

    Abstract: Balancing exploration and exploitation is a central goal in reinforcement learning (RL). Despite recent advances in enhancing language model (LM) reasoning, most methods lean toward exploitation, and increasingly encounter performance plateaus. In this work, we revisit entropy -- a signal of exploration in RL -- and examine its relationship to exploratory reasoning in LMs. Through empirical analys…

    Submitted 17 June, 2025; originally announced June 2025.

  7. arXiv:2506.05341  [pdf, ps, other]

    cs.CV cs.AI

    Direct Numerical Layout Generation for 3D Indoor Scene Synthesis via Spatial Reasoning

    Authors: Xingjian Ran, Yixuan Li, Linning Xu, Mulin Yu, Bo Dai

    Abstract: Realistic 3D indoor scene synthesis is vital for embodied AI and digital content creation. It can be naturally divided into two subtasks: object generation and layout generation. While recent generative models have significantly advanced object-level quality and controllability, layout generation remains challenging due to limited datasets. Existing methods either overfit to these datasets or rely…

    Submitted 5 June, 2025; originally announced June 2025.

    Comments: Project Page: https://directlayout.github.io/

  8. arXiv:2506.01524  [pdf, ps, other]

    cs.CL cs.AI

    V-VAE: A Variational Auto Encoding Framework Towards Fine-Grained Control over Human-Like Chat

    Authors: Qi Lin, Weikai Xu, Lisi Chen, Bin Dai

    Abstract: With the continued proliferation of Large Language Model (LLM) based chatbots, there is a growing demand for generating responses that are not only linguistically fluent but also consistently aligned with persona-specific traits in conversations. However, existing role-play and persona-based chat approaches rely heavily on static role descriptions, coarse-grained signal space, and low-quality synt…

    Submitted 2 June, 2025; originally announced June 2025.

  9. arXiv:2505.23716  [pdf, ps, other]

    cs.CV

    AnySplat: Feed-forward 3D Gaussian Splatting from Unconstrained Views

    Authors: Lihan Jiang, Yucheng Mao, Linning Xu, Tao Lu, Kerui Ren, Yichen Jin, Xudong Xu, Mulin Yu, Jiangmiao Pang, Feng Zhao, Dahua Lin, Bo Dai

    Abstract: We introduce AnySplat, a feed forward network for novel view synthesis from uncalibrated image collections. In contrast to traditional neural rendering pipelines that demand known camera poses and per scene optimization, or recent feed forward methods that buckle under the computational weight of dense views, our model predicts everything in one shot. A single forward pass yields a set of 3D Gauss…

    Submitted 29 May, 2025; originally announced May 2025.

    Comments: Project page: https://city-super.github.io/anysplat/

  10. arXiv:2505.21501  [pdf, ps, other]

    cs.CV

    Vision Transformers with Self-Distilled Registers

    Authors: Yinjie Chen, Zipeng Yan, Chong Zhou, Bo Dai, Andrew F. Luo

    Abstract: Vision Transformers (ViTs) have emerged as the dominant architecture for visual processing tasks, demonstrating excellent scalability with increased training data and model size. However, recent work has identified the emergence of artifact tokens in ViTs that are incongruous with the local semantics. These anomalous tokens degrade ViT performance in tasks that require fine-grained localization or…

    Submitted 27 May, 2025; originally announced May 2025.

    Comments: 27 pages, 14 figures

  11. arXiv:2505.21483  [pdf, ps, other]

    cs.CV

    MV-CoLight: Efficient Object Compositing with Consistent Lighting and Shadow Generation

    Authors: Kerui Ren, Jiayang Bai, Linning Xu, Lihan Jiang, Jiangmiao Pang, Mulin Yu, Bo Dai

    Abstract: Object compositing offers significant promise for augmented reality (AR) and embodied intelligence applications. Existing approaches predominantly focus on single-image scenarios or intrinsic decomposition techniques, facing challenges with multi-view consistency, complex scenes, and diverse lighting conditions. Recent inverse rendering advancements, such as 3D Gaussian and diffusion-based methods…

    Submitted 27 May, 2025; originally announced May 2025.

  12. arXiv:2505.21439  [pdf, other]

    cs.CL cs.IR

    Towards Better Instruction Following Retrieval Models

    Authors: Yuchen Zhuang, Aaron Trinh, Rushi Qiang, Haotian Sun, Chao Zhang, Hanjun Dai, Bo Dai

    Abstract: Modern information retrieval (IR) models, trained exclusively on standard <query, passage> pairs, struggle to effectively interpret and follow explicit user instructions. We introduce InF-IR, a large-scale, high-quality training corpus tailored for enhancing retrieval models in Instruction-Following IR. InF-IR expands traditional training pairs into over 38,000 expressive <instruction, query, pass…

    Submitted 27 May, 2025; originally announced May 2025.

    Comments: Retrieval Models, Embedding, Retrieval with Instructions

  13. arXiv:2505.20613  [pdf, ps, other]

    cs.CL cs.AI cs.LG cs.LO

    REAL-Prover: Retrieval Augmented Lean Prover for Mathematical Reasoning

    Authors: Ziju Shen, Naohao Huang, Fanyi Yang, Yutong Wang, Guoxiong Gao, Tianyi Xu, Jiedong Jiang, Wanyi He, Pu Yang, Mengzhou Sun, Haocheng Ju, Peihao Wu, Bryan Dai, Bin Dong

    Abstract: Nowadays, formal theorem provers have made monumental progress on high-school and competition-level mathematics, but few of them generalize to more advanced mathematics. In this paper, we present REAL-Prover, a new open-source stepwise theorem prover for Lean 4 to push this boundary. This prover, based on our fine-tuned large language model (REAL-Prover-v1) and integrated with a retrieval system (…

    Submitted 16 June, 2025; v1 submitted 26 May, 2025; originally announced May 2025.

  14. arXiv:2505.20282  [pdf, ps, other]

    cs.CL

    One-shot Entropy Minimization

    Authors: Zitian Gao, Lynx Chen, Joey Zhou, Bryan Dai

    Abstract: We trained 13,440 large language models and found that entropy minimization requires only a single unlabeled data and 10 steps optimization to achieve performance improvements comparable to or even greater than those obtained using thousands of data and carefully designed rewards in rule-based reinforcement learning. This striking result may prompt a rethinking of post-training paradigms for large…

    Submitted 3 June, 2025; v1 submitted 26 May, 2025; originally announced May 2025.

    Comments: Work in progress

  15. arXiv:2505.20267  [pdf, ps, other]

    cs.CV

    HaloGS: Loose Coupling of Compact Geometry and Gaussian Splats for 3D Scenes

    Authors: Changjian Jiang, Kerui Ren, Linning Xu, Jiong Chen, Jiangmiao Pang, Yu Zhang, Bo Dai, Mulin Yu

    Abstract: High fidelity 3D reconstruction and rendering hinge on capturing precise geometry while preserving photo realistic detail. Most existing methods either fuse these goals into a single cumbersome model or adopt hybrid schemes whose uniform primitives lead to a trade off between efficiency and fidelity. In this paper, we introduce HaloGS, a dual representation that loosely couples coarse triangles fo…

    Submitted 26 May, 2025; originally announced May 2025.

  16. arXiv:2505.18983  [pdf, other]

    cs.LG cs.CV

    AmorLIP: Efficient Language-Image Pretraining via Amortization

    Authors: Haotian Sun, Yitong Li, Yuchen Zhuang, Niao He, Hanjun Dai, Bo Dai

    Abstract: Contrastive Language-Image Pretraining (CLIP) has demonstrated strong zero-shot performance across diverse downstream text-image tasks. Existing CLIP methods typically optimize a contrastive objective using negative samples drawn from each minibatch. To achieve robust representation learning, these methods require extremely large batch sizes and escalate computational demands to hundreds or even t…

    Submitted 25 May, 2025; originally announced May 2025.

  17. arXiv:2505.15725  [pdf, ps, other]

    cs.RO cs.CV

    UAV-Flow Colosseo: A Real-World Benchmark for Flying-on-a-Word UAV Imitation Learning

    Authors: Xiangyu Wang, Donglin Yang, Yue Liao, Wenhao Zheng, Wenjun Wu, Bin Dai, Hongsheng Li, Si Liu

    Abstract: Unmanned Aerial Vehicles (UAVs) are evolving into language-interactive platforms, enabling more intuitive forms of human-drone interaction. While prior works have primarily focused on high-level planning and long-horizon navigation, we shift attention to language-guided fine-grained trajectory control, where UAVs execute short-range, reactive flight behaviors in response to language instructions.…

    Submitted 26 May, 2025; v1 submitted 21 May, 2025; originally announced May 2025.

  18. arXiv:2505.12748  [pdf, ps, other]

    cs.RO cs.AI cs.CV

    TeleOpBench: A Simulator-Centric Benchmark for Dual-Arm Dexterous Teleoperation

    Authors: Hangyu Li, Qin Zhao, Haoran Xu, Xinyu Jiang, Qingwei Ben, Feiyu Jia, Haoyu Zhao, Liang Xu, Jia Zeng, Hanqing Wang, Bo Dai, Junting Dong, Jiangmiao Pang

    Abstract: Teleoperation is a cornerstone of embodied-robot learning, and bimanual dexterous teleoperation in particular provides rich demonstrations that are difficult to obtain with fully autonomous systems. While recent studies have proposed diverse hardware pipelines-ranging from inertial motion-capture gloves to exoskeletons and vision-based interfaces-there is still no unified benchmark that enables fa…

    Submitted 19 May, 2025; originally announced May 2025.

    Comments: 13 pages

  19. arXiv:2505.11468  [pdf, other]

    cs.CV

    PSDiffusion: Harmonized Multi-Layer Image Generation via Layout and Appearance Alignment

    Authors: Dingbang Huang, Wenbo Li, Yifei Zhao, Xinyu Pan, Yanhong Zeng, Bo Dai

    Abstract: Diffusion models have made remarkable advancements in generating high-quality images from textual descriptions. Recent works like LayerDiffuse have extended the previous single-layer, unified image generation paradigm to transparent image layer generation. However, existing multi-layer generation methods fail to handle the interactions among multiple layers such as rational global layout, physics-…

    Submitted 16 May, 2025; originally announced May 2025.

    Comments: Project Page: https://github.com/dingbang777/PSDiffusion/

  20. arXiv:2505.07782  [pdf, ps, other]

    cs.LG

    MLE-Dojo: Interactive Environments for Empowering LLM Agents in Machine Learning Engineering

    Authors: Rushi Qiang, Yuchen Zhuang, Yinghao Li, Dingu Sagar V K, Rongzhi Zhang, Changhao Li, Ian Shu-Hei Wong, Sherry Yang, Percy Liang, Chao Zhang, Bo Dai

    Abstract: We introduce MLE-Dojo, a Gym-style framework for systematically reinforcement learning, evaluating, and improving autonomous large language model (LLM) agents in iterative machine learning engineering (MLE) workflows. Unlike existing benchmarks that primarily rely on static datasets or single-attempt evaluations, MLE-Dojo provides an interactive environment enabling agents to iteratively experimen…

    Submitted 12 May, 2025; originally announced May 2025.

  21. Virtualized 3D Gaussians: Flexible Cluster-based Level-of-Detail System for Real-Time Rendering of Composed Scenes

    Authors: Xijie Yang, Linning Xu, Lihan Jiang, Dahua Lin, Bo Dai

    Abstract: 3D Gaussian Splatting (3DGS) enables the reconstruction of intricate digital 3D assets from multi-view images by leveraging a set of 3D Gaussian primitives for rendering. Its explicit and discrete representation facilitates the seamless composition of complex digital worlds, offering significant advantages over previous neural implicit methods. However, when applied to large-scale compositions, su…

    Submitted 10 May, 2025; originally announced May 2025.

    Comments: project page: https://xijie-yang.github.io/V3DG/

  22. arXiv:2505.03155  [pdf, ps, other]

    cs.LG

    Rethinking the Global Convergence of Softmax Policy Gradient with Linear Function Approximation

    Authors: Max Qiushi Lin, Jincheng Mei, Matin Aghaei, Michael Lu, Bo Dai, Alekh Agarwal, Dale Schuurmans, Csaba Szepesvari, Sharan Vaswani

    Abstract: Policy gradient (PG) methods have played an essential role in the empirical successes of reinforcement learning. In order to handle large state-action spaces, PG methods are typically used with function approximation. In this setting, the approximation error in modeling problem-dependent quantities is a key notion for characterizing the global convergence of PG methods. We focus on Softmax PG with…

    Submitted 6 May, 2025; originally announced May 2025.

    Comments: 75 pages

  23. arXiv:2504.16667  [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    Representation Learning via Non-Contrastive Mutual Information

    Authors: Zhaohan Daniel Guo, Bernardo Avila Pires, Khimya Khetarpal, Dale Schuurmans, Bo Dai

    Abstract: Labeling data is often very time consuming and expensive, leaving us with a majority of unlabeled data. Self-supervised representation learning methods such as SimCLR (Chen et al., 2020) or BYOL (Grill et al., 2020) have been very successful at learning meaningful latent representations from unlabeled image data, resulting in much more general and transferable representations for downstream tasks.…

    Submitted 23 April, 2025; originally announced April 2025.

    ACM Class: I.2.6; I.2.10

  24. arXiv:2504.09546  [pdf, other]

    physics.ed-ph cs.AI

    A simulation-heuristics dual-process model for intuitive physics

    Authors: Shiqian Li, Yuxi Ma, Jiajun Yan, Bo Dai, Yujia Peng, Chi Zhang, Yixin Zhu

    Abstract: The role of mental simulation in human physical reasoning is widely acknowledged, but whether it is employed across scenarios with varying simulation costs and where its boundary lies remains unclear. Using a pouring-marble task, our human study revealed two distinct error patterns when predicting pouring angles, differentiated by simulation time. While mental simulation accurately captured human…

    Submitted 19 May, 2025; v1 submitted 13 April, 2025; originally announced April 2025.

    Comments: 8 pages, CogSci 2025

  25. arXiv:2504.04126  [pdf, other]

    cs.CV cs.AI

    Multi-identity Human Image Animation with Structural Video Diffusion

    Authors: Zhenzhi Wang, Yixuan Li, Yanhong Zeng, Yuwei Guo, Dahua Lin, Tianfan Xue, Bo Dai

    Abstract: Generating human videos from a single image while ensuring high visual quality and precise control is a challenging task, especially in complex scenarios involving multiple individuals and interactions with objects. Existing methods, while effective for single-human cases, often fail to handle the intricacies of multi-identity interactions because they struggle to associate the correct pairs of hu…

    Submitted 5 April, 2025; originally announced April 2025.

    Comments: 11 pages

  26. arXiv:2504.03279  [pdf, other]

    cs.DB

    Yannakakis+: Practical Acyclic Query Evaluation with Theoretical Guarantees

    Authors: Qichen Wang, Bingnan Chen, Binyang Dai, Ke Yi, Feifei Li, Liang Lin

    Abstract: Acyclic conjunctive queries form the backbone of most analytical workloads, and have been extensively studied in the literature from both theoretical and practical angles. However, there is still a large divide between theory and practice. While the 40-year-old Yannakakis algorithm has strong theoretical running time guarantees, it has not been adopted in real systems due to its high hidden consta…

    Submitted 4 April, 2025; originally announced April 2025.

    Comments: Technical report for the SIGMOD 2025 paper

    ACM Class: H.2.4

  27. arXiv:2504.02130  [pdf, other]

    cs.LG

    Ordering-based Conditions for Global Convergence of Policy Gradient Methods

    Authors: Jincheng Mei, Bo Dai, Alekh Agarwal, Mohammad Ghavamzadeh, Csaba Szepesvari, Dale Schuurmans

    Abstract: We prove that, for finite-arm bandits with linear function approximation, the global convergence of policy gradient (PG) methods depends on inter-related properties between the policy update and the representation. First, we establish a few key observations that frame the study: (i) Global convergence can be achieved under linear function approximation without policy or r…

    Submitted 2 April, 2025; originally announced April 2025.

    Comments: arXiv version for the NeurIPS 2023 paper; to be updated for a technical issue

  28. arXiv:2503.19901  [pdf, other]

    cs.CV

    TokenHSI: Unified Synthesis of Physical Human-Scene Interactions through Task Tokenization

    Authors: Liang Pan, Zeshi Yang, Zhiyang Dou, Wenjia Wang, Buzhen Huang, Bo Dai, Taku Komura, Jingbo Wang

    Abstract: Synthesizing diverse and physically plausible Human-Scene Interactions (HSI) is pivotal for both computer animation and embodied AI. Despite encouraging progress, current methods mainly focus on developing separate controllers, each specialized for a specific interaction task. This significantly hinders the ability to tackle a wide variety of challenging HSI tasks that require the integration of m…

    Submitted 3 April, 2025; v1 submitted 25 March, 2025; originally announced March 2025.

    Comments: CVPR 2025

  29. arXiv:2503.17662  [pdf, other]

    cs.CL

    Enhancing Persona Consistency for LLMs' Role-Playing using Persona-Aware Contrastive Learning

    Authors: Ke Ji, Yixin Lian, Linxu Li, Jingsheng Gao, Weiyuan Li, Bin Dai

    Abstract: In recent years, large language models (LLMs) have achieved breakthrough progress in many dialogue generation tasks. However, their lack of emotion and fine-grained role awareness limits the model's ability to provide personalized and diverse interactions further. Current methods face high costs in collecting high-quality annotated data for scenarios such as role-playing, and traditional human ali…

    Submitted 25 March, 2025; v1 submitted 22 March, 2025; originally announced March 2025.

    Comments: 18 pages, 4 figures

  30. arXiv:2503.17227  [pdf, other]

    cs.RO

    Control the Soft Robot Arm with its Physical Twin

    Authors: Qinghua Guan, Hung Hon Cheng, Benhui Dai, Josie Hughes

    Abstract: To exploit the compliant capabilities of soft robot arms we require controller which can exploit their physical capabilities. Teleoperation, leveraging a human in the loop, is a key step towards achieving more complex control strategies. Whilst teleoperation is widely used for rigid robots, for soft robots we require teleoperation methods where the configuration of the whole body is considered. We…

    Submitted 21 March, 2025; originally announced March 2025.

  31. arXiv:2503.13424  [pdf, other]

    cs.CV

    Infinite Mobility: Scalable High-Fidelity Synthesis of Articulated Objects via Procedural Generation

    Authors: Xinyu Lian, Zichao Yu, Ruiming Liang, Yitong Wang, Li Ray Luo, Kaixu Chen, Yuanzhen Zhou, Qihong Tang, Xudong Xu, Zhaoyang Lyu, Bo Dai, Jiangmiao Pang

    Abstract: Large-scale articulated objects with high quality are desperately needed for multiple tasks related to embodied AI. Most existing methods for creating articulated objects are either data-driven or simulation based, which are limited by the scale and quality of the training data or the fidelity and heavy labour of the simulation. In this paper, we propose Infinite Mobility, a novel method for synth…

    Submitted 17 March, 2025; originally announced March 2025.

    Comments: Project page: https://infinite-mobility.github.io; 10 pages, 12 figures

  32. arXiv:2503.08180  [pdf, other]

    cs.CV

    Towards Synthesized and Editable Motion In-Betweening Through Part-Wise Phase Representation

    Authors: Minyue Dai, Jingbo Wang, Ke Fan, Bin Ji, Haoyu Zhao, Junting Dong, Bo Dai

    Abstract: Styled motion in-betweening is crucial for computer animation and gaming. However, existing methods typically encode motion styles by modeling whole-body motions, often overlooking the representation of individual body parts. This limitation reduces the flexibility of infilled motion, particularly in adjusting the motion styles of specific limbs independently. To overcome this challenge, we propos…

    Submitted 12 March, 2025; v1 submitted 11 March, 2025; originally announced March 2025.

    Comments: 10 pages, 5 figures

  33. arXiv:2502.20650  [pdf, other]

    cs.CV cs.CR

    Gungnir: Exploiting Stylistic Features in Images for Backdoor Attacks on Diffusion Models

    Authors: Yu Pan, Jiahao Chen, Bingrong Dai, Lin Wang, Yi Du, Jiao Liu

    Abstract: In recent years, Diffusion Models (DMs) have demonstrated significant advances in the field of image generation. However, according to current research, DMs are vulnerable to backdoor attacks, which allow attackers to control the model's output by inputting data containing covert triggers, such as a specific visual patch or phrase. Existing defense strategies are well equipped to thwart such attac…

    Submitted 8 June, 2025; v1 submitted 27 February, 2025; originally announced February 2025.

  34. arXiv:2502.14768  [pdf, other]

    cs.CL cs.AI

    Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning

    Authors: Tian Xie, Zitian Gao, Qingnan Ren, Haoming Luo, Yuqian Hong, Bryan Dai, Joey Zhou, Kai Qiu, Zhirong Wu, Chong Luo

    Abstract: Inspired by the success of DeepSeek-R1, we explore the potential of rule-based reinforcement learning (RL) in large reasoning models. To analyze reasoning dynamics, we use synthetic logic puzzles as training data due to their controllable complexity and straightforward answer verification. We make some key technical contributions that lead to effective and stable RL training: a system prompt that…

    Submitted 20 February, 2025; originally announced February 2025.

  35. arXiv:2502.09780  [pdf, ps, other]

    cs.LG cs.AI cs.GT math.OC

    Incentivize without Bonus: Provably Efficient Model-based Online Multi-agent RL for Markov Games

    Authors: Tong Yang, Bo Dai, Lin Xiao, Yuejie Chi

    Abstract: Multi-agent reinforcement learning (MARL) lies at the heart of a plethora of applications involving the interaction of a group of agents in a shared unknown environment. A prominent framework for studying MARL is Markov games, with the goal of finding various notions of equilibria in a sample-efficient manner, such as the Nash equilibrium (NE) and the coarse correlated equilibrium (CCE). However,…

    Submitted 13 February, 2025; originally announced February 2025.

  36. arXiv:2502.07141  [pdf, other]

    cs.LG

    Small steps no more: Global convergence of stochastic gradient bandits for arbitrary learning rates

    Authors: Jincheng Mei, Bo Dai, Alekh Agarwal, Sharan Vaswani, Anant Raj, Csaba Szepesvari, Dale Schuurmans

    Abstract: We provide a new understanding of the stochastic gradient bandit algorithm by showing that it converges to a globally optimal policy almost surely using any constant learning rate. This result demonstrates that the stochastic gradient algorithm continues to balance exploration and exploitation appropriately even in scenarios where standard smoothness and noise control assumptions break down…

    Submitted 10 February, 2025; originally announced February 2025.

    Comments: Updated version for a paper published at NeurIPS 2024

  37. arXiv:2502.06957  [pdf, other]

    cs.CV

    GAS: Generative Avatar Synthesis from a Single Image

    Authors: Yixing Lu, Junting Dong, Youngjoong Kwon, Qin Zhao, Bo Dai, Fernando De la Torre

    Abstract: We introduce a generalizable and unified framework to synthesize view-consistent and temporally coherent avatars from a single image, addressing the challenging problem of single-image avatar generation. While recent methods employ diffusion models conditioned on human templates like depth or normal maps, they often struggle to preserve appearance information due to the discrepancy between sparse…

    Submitted 10 February, 2025; originally announced February 2025.

  38. arXiv:2502.00361  [pdf, ps, other]

    cs.LG

    Efficient Online Reinforcement Learning for Diffusion Policy

    Authors: Haitong Ma, Tianyi Chen, Kai Wang, Na Li, Bo Dai

    Abstract: Diffusion policies have achieved superior performance in imitation learning and offline reinforcement learning (RL) due to their rich expressiveness. However, the conventional diffusion training procedure requires samples from target distribution, which is impossible in online RL since we cannot sample from the optimal policy. Backpropagating policy gradient through the diffusion process incurs hu…

    Submitted 29 June, 2025; v1 submitted 1 February, 2025; originally announced February 2025.

    Comments: 17 pages, 5 figures

  39. arXiv:2501.07256  [pdf, other]

    cs.CV

    EdgeTAM: On-Device Track Anything Model

    Authors: Chong Zhou, Chenchen Zhu, Yunyang Xiong, Saksham Suri, Fanyi Xiao, Lemeng Wu, Raghuraman Krishnamoorthi, Bo Dai, Chen Change Loy, Vikas Chandra, Bilge Soran

    Abstract: On top of Segment Anything Model (SAM), SAM 2 further extends its capability from image to video inputs through a memory bank mechanism and obtains a remarkable performance compared with previous methods, making it a foundation model for video segmentation task. In this paper, we aim at making SAM 2 much more efficient so that it even runs on mobile devices while maintaining a comparable performan…

    Submitted 13 January, 2025; originally announced January 2025.

    Comments: Code will be released at https://github.com/facebookresearch/EdgeTAM

  40. arXiv:2501.01393  [pdf, other]

    cs.CV cs.GR

    Learning 3D Garment Animation from Trajectories of A Piece of Cloth

    Authors: Yidi Shao, Chen Change Loy, Bo Dai

    Abstract: Garment animation is ubiquitous in various applications, such as virtual reality, gaming, and film producing. Recently, learning-based approaches obtain compelling performance in animating diverse garments under versatile scenarios. Nevertheless, to mimic the deformations of the observed garments, data-driven methods require large scale of garment data, which are both resource-wise expensive and t…

    Submitted 2 January, 2025; originally announced January 2025.

    Comments: Accepted by NeurIPS 2024, 16 pages

  41. arXiv:2501.00756  [pdf, other]

    cs.LG

    FasterSTS: A Faster Spatio-Temporal Synchronous Graph Convolutional Networks for Traffic flow Forecasting

    Authors: Ben-Ao Dai, Nengchao Lyu, Yongchao Miao

    Abstract: Accurate traffic flow prediction heavily relies on the spatio-temporal correlation of traffic flow data. Most current studies separately capture correlations in spatial and temporal dimensions, making it difficult to capture complex spatio-temporal heterogeneity, and often at the expense of increasing model complexity to improve prediction accuracy. Although there have been groundbreaking attempts…

    Submitted 1 January, 2025; originally announced January 2025.

    Comments: 13 pages, 3 figures

  42. arXiv:2412.17804  [pdf, other]

    cs.CV cs.GR

    GausSim: Foreseeing Reality by Gaussian Simulator for Elastic Objects

    Authors: Yidi Shao, Mu Huang, Chen Change Loy, Bo Dai

    Abstract: We introduce GausSim, a novel neural network-based simulator designed to capture the dynamic behaviors of real-world elastic objects represented through Gaussian kernels. We leverage continuum mechanics and treat each kernel as a Center of Mass System (CMS) that represents continuous piece of matter, accounting for realistic deformations without idealized assumptions. To improve computational effi…

    Submitted 10 March, 2025; v1 submitted 23 December, 2024; originally announced December 2024.

    Comments: Project page: https://www.mmlab-ntu.com/project/gausim/index.html

  43. arXiv:2412.17635  [pdf, other]

    cs.CV

    LangSurf: Language-Embedded Surface Gaussians for 3D Scene Understanding

    Authors: Hao Li, Roy Qin, Zhengyu Zou, Diqi He, Bohan Li, Bingquan Dai, Dingewn Zhang, Junwei Han

    Abstract: Applying Gaussian Splatting to perception tasks for 3D scene understanding is becoming increasingly popular. Most existing works primarily focus on rendering 2D feature maps from novel viewpoints, which leads to an imprecise 3D language field with outlier languages, ultimately failing to align objects in 3D space. By utilizing masked images for feature extraction, these approaches also lack essent…

    Submitted 23 December, 2024; v1 submitted 23 December, 2024; originally announced December 2024.

    Comments: https://langsurf.github.io

  44. arXiv:2412.16456  [pdf, other]

    cs.RO

    Safe Dynamic Motion Generation in Configuration Space Using Differentiable Distance Fields

    Authors: Xuemin Chi, Yiming Li, Jihao Huang, Bolun Dai, Zhitao Liu, Sylvain Calinon

    Abstract: Generating collision-free motions in dynamic environments is a challenging problem for high-dimensional robotics, particularly under real-time constraints. Control Barrier Functions (CBFs), widely utilized in safety-critical control, have shown significant potential for motion generation. However, for high-dimensional robot manipulators, existing QP formulations and CBF-based methods rely on posit…

    Submitted 20 December, 2024; originally announced December 2024.

    Comments: 8 pages, 5 figures

  45. arXiv:2412.15287  [pdf, other]

    cs.CL cs.AI cs.LG

    Inference-Aware Fine-Tuning for Best-of-N Sampling in Large Language Models

    Authors: Yinlam Chow, Guy Tennenholtz, Izzeddin Gur, Vincent Zhuang, Bo Dai, Sridhar Thiagarajan, Craig Boutilier, Rishabh Agarwal, Aviral Kumar, Aleksandra Faust

    Abstract: Recent studies have indicated that effectively utilizing inference-time compute is crucial for attaining better performance from large language models (LLMs). In this work, we propose a novel inference-aware fine-tuning paradigm, in which the model is fine-tuned in a manner that directly optimizes the performance of the inference-time strategy. We study this paradigm using the simple yet effective…

    Submitted 18 December, 2024; originally announced December 2024.

  46. arXiv:2412.14559  [pdf, other]

    cs.CV cs.LG

    ScaMo: Exploring the Scaling Law in Autoregressive Motion Generation Model

    Authors: Shunlin Lu, Jingbo Wang, Zeyu Lu, Ling-Hao Chen, Wenxun Dai, Junting Dong, Zhiyang Dou, Bo Dai, Ruimao Zhang

    Abstract: The scaling law has been validated in various domains, such as natural language processing (NLP) and massive computer vision tasks; however, its application to motion generation remains largely unexplored. In this paper, we introduce a scalable motion generation framework that includes the motion tokenizer Motion FSQ-VAE and a text-prefix autoregressive transformer. Through comprehensive experimen…

    Submitted 19 December, 2024; originally announced December 2024.

  47. arXiv:2412.08685  [pdf, other]

    cs.CV

    ChatDyn: Language-Driven Multi-Actor Dynamics Generation in Street Scenes

    Authors: Yuxi Wei, Jingbo Wang, Yuwen Du, Dingju Wang, Liang Pan, Chenxin Xu, Yao Feng, Bo Dai, Siheng Chen

    Abstract: Generating realistic and interactive dynamics of traffic participants according to specific instruction is critical for street scene simulation. However, there is currently a lack of a comprehensive method that generates realistic dynamics of different types of participants including vehicles and pedestrians, with different kinds of interactions between them. In this paper, we introduce ChatDyn, t…

    Submitted 11 December, 2024; originally announced December 2024.

  48. arXiv:2412.08331  [pdf, other]

    cs.CV

    SLGaussian: Fast Language Gaussian Splatting in Sparse Views

    Authors: Kangjie Chen, BingQuan Dai, Minghan Qin, Dongbin Zhang, Peihao Li, Yingshuang Zou, Haoqian Wang

    Abstract: 3D semantic field learning is crucial for applications like autonomous navigation, AR/VR, and robotics, where accurate comprehension of 3D scenes from limited viewpoints is essential. Existing methods struggle under sparse view conditions, relying on inefficient per-scene multi-view optimizations, which are impractical for many real-world tasks. To address this, we propose SLGaussian, a feed-forwa…

    Submitted 11 December, 2024; originally announced December 2024.

  49. arXiv:2412.07660  [pdf, other]

    cs.CV

    Proc-GS: Procedural Building Generation for City Assembly with 3D Gaussians

    Authors: Yixuan Li, Xingjian Ran, Linning Xu, Tao Lu, Mulin Yu, Zhenzhi Wang, Yuanbo Xiangli, Dahua Lin, Bo Dai

    Abstract: Buildings are primary components of cities, often featuring repeated elements such as windows and doors. Traditional 3D building asset creation is labor-intensive and requires specialized skills to develop design rules. Recent generative models for building creation often overlook these patterns, leading to low visual fidelity and limited scalability. Drawing inspiration from procedural modeling t…

    Submitted 10 December, 2024; originally announced December 2024.

    Comments: Project page: https://city-super.github.io/procgs/

  50. arXiv:2412.06275  [pdf, other]

    cs.IT

    Performance Analysis and Code Design for Resistive Random-Access Memory Using Channel Decomposition Approach

    Authors: Guanghui Song, Meiru Gao, Ying Li, Bin Dai, Kui Cai

    Abstract: A novel framework for performance analysis and code design is proposed to address the sneak path (SP) problem in resistive random-access memory (ReRAM) arrays. The main idea is to decompose the ReRAM channel, which is both non-ergodic and data-dependent, into multiple stationary memoryless channels. A finite-length performance bound is derived by analyzing the capacity and dispersion of these stat…

    Submitted 9 December, 2024; originally announced December 2024.