Showing 1–50 of 1,575 results for author: Chen, B

Searching in archive cs.
  1. arXiv:2505.10556  [pdf]

    cs.LG physics.ao-ph

    An AI-driven framework for the prediction of personalised health response to air pollution

    Authors: Nazanin Zounemat Kermani, Sadjad Naderi, Claire H. Dilliway, Claire E. Heaney, Shrreya Behll, Boyang Chen, Hisham Abubakar-Waziri, Alexandra E. Porter, Marc Chadeau-Hyam, Fangxin Fang, Ian M. Adcock, Kian Fan Chung, Christopher C. Pain

    Abstract: Air pollution poses a significant threat to public health, causing or exacerbating many respiratory and cardiovascular diseases. In addition, climate change is bringing about more extreme weather events such as wildfires and heatwaves, which can increase levels of pollution and worsen the effects of pollution exposure. Recent advances in personal sensing have transformed the collection of behaviou…

    Submitted 15 May, 2025; originally announced May 2025.

    Comments: Kermani and Naderi share first authorship. 20 pages, 6 figures and 1 table

  2. arXiv:2505.09986  [pdf, other]

    cs.CV eess.IV

    High Quality Underwater Image Compression with Adaptive Correction and Codebook-based Augmentation

    Authors: Yimin Zhou, Yichong Xia, Sicheng Pan, Bin Chen, Baoyi An, Haoqian Wang, Zhi Wang, Yaowei Wang, Zikun Zhou

    Abstract: With the increasing exploration and exploitation of the underwater world, underwater images have become a critical medium for human interaction with marine environments, driving extensive research into their efficient transmission and storage. However, contemporary underwater image compression algorithms fail to fully leverage the unique characteristics distinguishing underwater scenes from terres…

    Submitted 15 May, 2025; originally announced May 2025.

  3. UICopilot: Automating UI Synthesis via Hierarchical Code Generation from Webpage Designs

    Authors: Yi Gui, Yao Wan, Zhen Li, Zhongyi Zhang, Dongping Chen, Hongyu Zhang, Yi Su, Bohua Chen, Xing Zhou, Wenbin Jiang, Xiangliang Zhang

    Abstract: Automating the synthesis of User Interfaces (UIs) plays a crucial role in enhancing productivity and accelerating the development lifecycle, reducing both development time and manual effort. Recently, the rapid development of Multimodal Large Language Models (MLLMs) has made it possible to generate front-end Hypertext Markup Language (HTML) code directly from webpage designs. However, real-world w…

    Submitted 14 May, 2025; originally announced May 2025.

    Comments: WWW 2025

  4. arXiv:2505.08189  [pdf, other]

    cs.LG cs.AI

    DSADF: Thinking Fast and Slow for Decision Making

    Authors: Alex Zhihao Dou, Dongfei Cui, Jun Yan, Weida Wang, Benteng Chen, Haoming Wang, Zeke Xie, Shufei Zhang

    Abstract: Although Reinforcement Learning (RL) agents are effective in well-defined environments, they often struggle to generalize their learned policies to dynamic settings due to their reliance on trial-and-error interactions. Recent work has explored applying Large Language Models (LLMs) or Vision Language Models (VLMs) to boost the generalization of RL agents through policy optimization guidance or pri…

    Submitted 12 May, 2025; originally announced May 2025.

  5. arXiv:2505.07861  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    Scalable LLM Math Reasoning Acceleration with Low-rank Distillation

    Authors: Harry Dong, Bilge Acun, Beidi Chen, Yuejie Chi

    Abstract: Due to long generations, large language model (LLM) math reasoning demands significant computational resources and time. While many existing efficient inference methods have been developed with excellent performance preservation on language tasks, they often severely degrade math performance. In this paper, we propose Caprese, a low-cost distillation method to recover lost capabilities from deploy…

    Submitted 8 May, 2025; originally announced May 2025.

  6. arXiv:2505.07715  [pdf, ps, other]

    cs.CV cs.AI

    Hybrid Spiking Vision Transformer for Object Detection with Event Cameras

    Authors: Qi Xu, Jie Deng, Jiangrong Shen, Biwu Chen, Huajin Tang, Gang Pan

    Abstract: Event-based object detection has gained increasing attention due to its advantages such as high temporal resolution, wide dynamic range, and asynchronous address-event representation. Leveraging these advantages, Spiking Neural Networks (SNNs) have emerged as a promising approach, offering low energy consumption and rich spatiotemporal dynamics. To further enhance the performance of event-based ob…

    Submitted 12 May, 2025; originally announced May 2025.

  7. arXiv:2505.07209  [pdf, other]

    cs.CV

    Discovering Fine-Grained Visual-Concept Relations by Disentangled Optimal Transport Concept Bottleneck Models

    Authors: Yan Xie, Zequn Zeng, Hao Zhang, Yucheng Ding, Yi Wang, Zhengjue Wang, Bo Chen, Hongwei Liu

    Abstract: Concept Bottleneck Models (CBMs) try to make the decision-making process transparent by exploring an intermediate concept space between the input image and the output prediction. Existing CBMs just learn coarse-grained relations between the whole image and the concepts, less considering local image information, leading to two main drawbacks: i) they often produce spurious visual-concept relations,…

    Submitted 11 May, 2025; originally announced May 2025.

    Comments: CVPR 2025

  8. arXiv:2505.06537  [pdf, ps, other]

    cs.CV cs.AI

    ProFashion: Prototype-guided Fashion Video Generation with Multiple Reference Images

    Authors: Xianghao Kong, Qiaosong Qi, Yuanbin Wang, Anyi Rao, Biaolong Chen, Aixi Zhang, Si Liu, Hao Jiang

    Abstract: Fashion video generation aims to synthesize temporally consistent videos from reference images of a designated character. Despite significant progress, existing diffusion-based methods only support a single reference image as input, severely limiting their capability to generate view-consistent fashion videos, especially when there are different patterns on the clothes from different perspectives.…

    Submitted 10 May, 2025; originally announced May 2025.

  9. arXiv:2505.05870  [pdf, ps, other]

    cs.CV cs.AI eess.IV

    Towards Facial Image Compression with Consistency Preserving Diffusion Prior

    Authors: Yimin Zhou, Yichong Xia, Bin Chen, Baoyi An, Haoqian Wang, Zhi Wang, Yaowei Wang, Zikun Zhou

    Abstract: With the widespread application of facial image data across various domains, the efficient storage and transmission of facial images has garnered significant attention. However, the existing learned face image compression methods often produce unsatisfactory reconstructed image quality at low bit rates. Simply adapting diffusion-based compression methods to facial compression tasks results in reco…

    Submitted 9 May, 2025; originally announced May 2025.

  10. arXiv:2505.04410  [pdf, other]

    cs.CV

    DeCLIP: Decoupled Learning for Open-Vocabulary Dense Perception

    Authors: Junjie Wang, Bin Chen, Yulin Li, Bin Kang, Yichi Chen, Zhuotao Tian

    Abstract: Dense visual prediction tasks have been constrained by their reliance on predefined categories, limiting their applicability in real-world scenarios where visual concepts are unbounded. While Vision-Language Models (VLMs) like CLIP have shown promise in open-vocabulary tasks, their direct application to dense prediction often leads to suboptimal performance due to limitations in local feature repr…

    Submitted 7 May, 2025; originally announced May 2025.

  11. arXiv:2505.02705  [pdf, other]

    eess.IV cs.CV

    Multi-View Learning with Context-Guided Receptance for Image Denoising

    Authors: Binghong Chen, Tingting Chai, Wei Jiang, Yuanrong Xu, Guanglu Zhou, Xiangqian Wu

    Abstract: Image denoising is essential in low-level vision applications such as photography and automated driving. Existing methods struggle with distinguishing complex noise patterns in real-world scenes and consume significant computational resources due to reliance on Transformer-based models. In this work, the Context-guided Receptance Weighted Key-Value (CRWKV) model is proposed, combining enhanced multi-…

    Submitted 5 May, 2025; originally announced May 2025.

    Comments: Accepted by IJCAI 2025, code will be available at https://github.com/Seeker98/CRWKV

  12. arXiv:2505.01744  [pdf, ps, other]

    cs.LG

    Memory-Efficient LLM Training by Various-Grained Low-Rank Projection of Gradients

    Authors: Yezhen Wang, Zhouhao Yang, Brian K Chen, Fanyi Pu, Bo Li, Tianyu Gao, Kenji Kawaguchi

    Abstract: Building upon the success of low-rank adapter (LoRA), low-rank gradient projection (LoRP) has emerged as a promising solution for memory-efficient fine-tuning. However, existing LoRP methods typically treat each row of the gradient matrix as the default projection unit, leaving the role of projection granularity underexplored. In this work, we propose a novel framework, VLoRP, that extends low-ran…

    Submitted 3 May, 2025; originally announced May 2025.

  13. arXiv:2505.01490  [pdf, other]

    cs.CV

    WorldGenBench: A World-Knowledge-Integrated Benchmark for Reasoning-Driven Text-to-Image Generation

    Authors: Daoan Zhang, Che Jiang, Ruoshi Xu, Biaoxiang Chen, Zijian Jin, Yutian Lu, Jianguo Zhang, Liang Yong, Jiebo Luo, Shengda Luo

    Abstract: Recent advances in text-to-image (T2I) generation have achieved impressive results, yet existing models still struggle with prompts that require rich world knowledge and implicit reasoning: both of which are critical for producing semantically accurate, coherent, and contextually appropriate images in real-world scenarios. To address this gap, we introduce WorldGenBench, a benchmark desig…

    Submitted 2 May, 2025; originally announced May 2025.

  14. arXiv:2504.21263  [pdf, other]

    cs.CV cs.LG cs.MM

    Embracing Collaboration Over Competition: Condensing Multiple Prompts for Visual In-Context Learning

    Authors: Jinpeng Wang, Tianci Luo, Yaohua Zha, Yan Feng, Ruisheng Luo, Bin Chen, Tao Dai, Long Chen, Yaowei Wang, Shu-Tao Xia

    Abstract: Visual In-Context Learning (VICL) enables adaptively solving vision tasks by leveraging pixel demonstrations, mimicking human-like task completion through analogy. Prompt selection is critical in VICL, but current methods assume the existence of a single "ideal" prompt in a pool of candidates, which in practice may not hold true. Multiple suitable prompts may exist, but individually they often fal…

    Submitted 29 April, 2025; originally announced April 2025.

    Comments: Accepted by CVPR'25. 10 pages, 5 figures, 6 tables

  15. arXiv:2504.20829  [pdf, other]

    cs.CV cs.AI

    GaussTrap: Stealthy Poisoning Attacks on 3D Gaussian Splatting for Targeted Scene Confusion

    Authors: Jiaxin Hong, Sixu Chen, Shuoyang Sun, Hongyao Yu, Hao Fang, Yuqi Tan, Bin Chen, Shuhan Qi, Jiawei Li

    Abstract: As 3D Gaussian Splatting (3DGS) emerges as a breakthrough in scene representation and novel view synthesis, its rapid adoption in safety-critical domains (e.g., autonomous systems, AR/VR) urgently demands scrutiny of potential security vulnerabilities. This paper presents the first systematic study of backdoor threats in 3DGS pipelines. We identify that adversaries may implant backdoor views to in…

    Submitted 29 April, 2025; originally announced April 2025.

  16. arXiv:2504.20609  [pdf, other]

    cs.CL

    WenyanGPT: A Large Language Model for Classical Chinese Tasks

    Authors: Xinyu Yao, Mengdi Wang, Bo Chen, Xiaobing Zhao

    Abstract: Classical Chinese, as the core carrier of Chinese culture, plays a crucial role in the inheritance and study of ancient literature. However, existing natural language processing models primarily optimize for Modern Chinese, resulting in inadequate performance on Classical Chinese. This paper presents a comprehensive solution for Classical Chinese language processing. By continuing pre-training and…

    Submitted 29 April, 2025; originally announced April 2025.

  17. arXiv:2504.20471  [pdf, other]

    cs.LG cs.AI stat.ME

    The Estimation of Continual Causal Effect for Dataset Shifting Streams

    Authors: Baining Chen, Yiming Zhang, Yuqiao Han, Ruyue Zhang, Ruihuan Du, Zhishuo Zhou, Zhengdan Zhu, Xun Liu, Jiecheng Guo

    Abstract: Causal effect estimation has been widely used in marketing optimization. The framework of an uplift model followed by a constrained optimization algorithm is popular in practice. To enhance performance in the online environment, the framework needs to be improved to address the complexities caused by temporal dataset shift. This paper focuses on capturing the dataset shift from user behavior and d…

    Submitted 29 April, 2025; originally announced April 2025.

  18. arXiv:2504.19687  [pdf, other]

    cs.CV

    Prompt Guiding Multi-Scale Adaptive Sparse Representation-driven Network for Low-Dose CT MAR

    Authors: Baoshun Shi, Bing Chen, Shaolei Zhang, Huazhu Fu, Zhanli Hu

    Abstract: Low-dose CT (LDCT) can reduce X-ray radiation exposure, but it may degrade image quality and even produce metal artifacts in the presence of metallic implants. For simultaneous LDCT reconstruction and metal artifact reduction (LDMAR), existing deep learning-based efforts face two main limitations: i) the network design neglects multi-scale and within-scale information; ii) trainin…

    Submitted 28 April, 2025; originally announced April 2025.

  19. arXiv:2504.19441  [pdf, ps, other]

    cs.IT eess.SP

    Age of Information Analysis for NOMA-Assisted Grant-Free Transmissions with Randomly Arrived Packets

    Authors: Yanshi Sun, Yanglin Ye, Caihong Kai, Zhiguo Ding, Bin Chen

    Abstract: This paper investigates the application of non-orthogonal multiple access (NOMA) to grant-free transmissions to reduce the age of information (AoI) in uplink status update systems, where multiple sources upload their status updates to a common receiver. Unlike existing studies which adopted the idealized generate-at-will (GAW) model, i.e., a status update data can be generated and transmit…

    Submitted 27 April, 2025; originally announced April 2025.

  20. arXiv:2504.17836  [pdf, other]

    stat.ML cs.LG eess.SY physics.comp-ph

    Learning Enhanced Ensemble Filters

    Authors: Eviatar Bach, Ricardo Baptista, Edoardo Calvello, Bohan Chen, Andrew Stuart

    Abstract: The filtering distribution in hidden Markov models evolves according to the law of a mean-field model in state–observation space. The ensemble Kalman filter (EnKF) approximates this mean-field model with an ensemble of interacting particles, employing a Gaussian ansatz for the joint distribution of the state and observation at each observation time. These methods are robust, but the Gaussian ansa…

    Submitted 24 April, 2025; originally announced April 2025.

    Comments: Preprint submitted to Journal of Computational Physics

  21. arXiv:2504.17457  [pdf, other]

    cs.CV

    Unveiling Hidden Vulnerabilities in Digital Human Generation via Adversarial Attacks

    Authors: Zhiying Li, Yeying Jin, Fan Shen, Zhi Liu, Weibin Chen, Pengju Zhang, Xiaomei Zhang, Boyu Chen, Michael Shen, Kejian Wu, Zhaoxin Fan, Jin Dong

    Abstract: Expressive human pose and shape estimation (EHPS) is crucial for digital human generation, especially in applications like live streaming. While existing research primarily focuses on reducing estimation errors, it largely neglects robustness and security aspects, leaving these systems vulnerable to adversarial attacks. To address this significant challenge, we propose the Tangible Attack…

    Submitted 24 April, 2025; originally announced April 2025.

    Comments: 14 pages, 7 figures

  22. arXiv:2504.17404  [pdf, other]

    cs.AI

    Redefining Superalignment: From Weak-to-Strong Alignment to Human-AI Co-Alignment to Sustainable Symbiotic Society

    Authors: Yi Zeng, Feifei Zhao, Yuwei Wang, Enmeng Lu, Yaodong Yang, Lei Wang, Chao Liu, Yitao Liang, Dongcheng Zhao, Bing Han, Haibo Tong, Yao Liang, Dongqi Liang, Kang Sun, Boyuan Chen, Jinyu Fan

    Abstract: Artificial Intelligence (AI) systems are becoming increasingly powerful and autonomous, and may progress to surpass human intelligence levels, namely Artificial Superintelligence (ASI). During the progression from AI to ASI, it may exceed human control, violate human values, and even lead to irreversible catastrophic consequences in extreme cases. This gives rise to a pressing issue that needs to…

    Submitted 25 April, 2025; v1 submitted 24 April, 2025; originally announced April 2025.

  23. arXiv:2504.16299  [pdf, other]

    cs.IT quant-ph

    Towards Quantum Universal Hypothesis Testing

    Authors: Arick Grootveld, Haodong Yang, Biao Chen, Venkata Gandikota, Jason Pollack

    Abstract: Hoeffding's formulation and solution to the universal hypothesis testing (UHT) problem had a profound impact on many subsequent works dealing with asymmetric hypotheses. In this work, we introduce a quantum universal hypothesis testing framework that serves as a quantum analog to Hoeffding's UHT. Motivated by Hoeffding's approach, which estimates the empirical distribution and uses it to construct…

    Submitted 22 April, 2025; originally announced April 2025.

  24. arXiv:2504.15736  [pdf, other]

    cs.LG stat.ML

    Riemannian Neural Geodesic Interpolant

    Authors: Jiawen Wu, Bingguang Chen, Yuyi Zhou, Qi Meng, Rongchan Zhu, Zhi-Ming Ma

    Abstract: Stochastic interpolants are efficient generative models that bridge two arbitrary probability density functions in finite time, enabling flexible generation from the source to the target distribution or vice versa. These models are primarily developed in Euclidean space, and are therefore limited in their application to many distribution learning problems defined on Riemannian manifolds in real-wo…

    Submitted 22 April, 2025; originally announced April 2025.

  25. arXiv:2504.15472  [pdf, other]

    cs.RO cs.LG eess.SY

    LAPP: Large Language Model Feedback for Preference-Driven Reinforcement Learning

    Authors: Pingcheng Jian, Xiao Wei, Yanbaihui Liu, Samuel A. Moore, Michael M. Zavlanos, Boyuan Chen

    Abstract: We introduce Large Language Model-Assisted Preference Prediction (LAPP), a novel framework for robot learning that enables efficient, customizable, and expressive behavior acquisition with minimum human effort. Unlike prior approaches that rely heavily on reward engineering, human demonstrations, motion capture, or expensive pairwise preference labels, LAPP leverages large language models (LLMs) t…

    Submitted 21 April, 2025; originally announced April 2025.

  26. arXiv:2504.14519  [pdf, other]

    cs.LG cs.AI

    SlimPipe: Memory-Thrifty and Efficient Pipeline Parallelism for Long-Context LLM Training

    Authors: Zhouyang Li, Yuliang Liu, Wei Zhang, Tailing Yuan, Bin Chen, Chengru Song, Di Zhang

    Abstract: Pipeline Parallelism (PP) serves as a crucial technique for training Large Language Models (LLMs), owing to its capability to alleviate memory pressure from model states with relatively low communication overhead. However, in long-context scenarios, existing pipeline parallelism methods fail to address the substantial activation memory pressure, primarily due to the peak memory consumption resulti…

    Submitted 20 April, 2025; originally announced April 2025.

  27. arXiv:2504.14330   

    cs.IT cs.RO

    DLW-CI: A Dynamic Likelihood-Weighted Cooperative Infotaxis Approach for Multi-Source Search in Urban Environments Using Consumer Drone Networks

    Authors: Xiaoran Zhang, Yatai Ji, Yong Zhao, Chuan Ai, Bin Chen, Zhengqiu Zhu

    Abstract: Consumer-grade drones equipped with low-cost sensors have emerged as a cornerstone of Autonomous Intelligent Systems (AISs) for environmental monitoring and hazardous substance detection in urban environments. However, existing research primarily addresses single-source search problems, overlooking the complexities of real-world urban scenarios where both the location and quantity of hazardous sou…

    Submitted 29 April, 2025; v1 submitted 19 April, 2025; originally announced April 2025.

    Comments: Errors found in the Methods section

  28. arXiv:2504.14286  [pdf, other]

    cs.LG

    SRPO: A Cross-Domain Implementation of Large-Scale Reinforcement Learning on LLM

    Authors: Xiaojiang Zhang, Jinghui Wang, Zifei Cheng, Wenhao Zhuang, Zheng Lin, Minglei Zhang, Shaojie Wang, Yinghan Cui, Chao Wang, Junyi Peng, Shimiao Jiang, Shiqi Kuang, Shouyu Yin, Chaohang Wen, Haotian Zhang, Bin Chen, Bing Yu

    Abstract: Recent advances of reasoning models, exemplified by OpenAI's o1 and DeepSeek's R1, highlight the significant potential of Reinforcement Learning (RL) to enhance the reasoning capabilities of Large Language Models (LLMs). However, replicating these advancements across diverse domains remains challenging due to limited methodological transparency. In this work, we present two-Staged history-Resampli…

    Submitted 22 April, 2025; v1 submitted 19 April, 2025; originally announced April 2025.

  29. arXiv:2504.11286  [pdf, other]

    eess.IV cs.CV

    Efficient Medical Image Restoration via Reliability Guided Learning in Frequency Domain

    Authors: Pengcheng Zheng, Kecheng Chen, Jiaxin Huang, Bohao Chen, Ju Liu, Yazhou Ren, Xiaorong Pu

    Abstract: Medical image restoration tasks aim to recover high-quality images from degraded observations, exhibiting emergent desires in many clinical scenarios, such as low-dose CT image denoising, MRI super-resolution, and MRI artifact removal. Despite the success achieved by existing deep learning-based restoration methods with sophisticated modules, they struggle with rendering computationally-efficient…

    Submitted 15 April, 2025; originally announced April 2025.

  30. arXiv:2504.09975  [pdf, other]

    cs.GR cs.CV

    OctGPT: Octree-based Multiscale Autoregressive Models for 3D Shape Generation

    Authors: Si-Tong Wei, Rui-Huan Wang, Chuan-Zhi Zhou, Baoquan Chen, Peng-Shuai Wang

    Abstract: Autoregressive models have achieved remarkable success across various domains, yet their performance in 3D shape generation lags significantly behind that of diffusion models. In this paper, we introduce OctGPT, a novel multiscale autoregressive model for 3D shape generation that dramatically improves the efficiency and performance of prior 3D autoregressive approaches, while rivaling or surpassin…

    Submitted 15 April, 2025; v1 submitted 14 April, 2025; originally announced April 2025.

    Comments: SIGGRAPH 2025

  31. arXiv:2504.09903  [pdf, other]

    cs.CL

    Refining Financial Consumer Complaints through Multi-Scale Model Interaction

    Authors: Bo-Wei Chen, An-Zi Yen, Chung-Chi Chen

    Abstract: Legal writing demands clarity, formality, and domain-specific precision, qualities often lacking in documents authored by individuals without legal training. To bridge this gap, this paper explores the task of legal text refinement that transforms informal, conversational inputs into persuasive legal arguments. We introduce FinDR, a Chinese dataset of financial dispute records, annotated with offic…

    Submitted 14 April, 2025; originally announced April 2025.

  32. arXiv:2504.06928  [pdf]

    cs.CY cs.AI

    Beyond Tools: Generative AI as Epistemic Infrastructure in Education

    Authors: Bodong Chen

    Abstract: As generative AI rapidly integrates into educational infrastructures worldwide, it transforms how knowledge gets created, validated, and shared, yet current discourse inadequately addresses its implications as epistemic infrastructure mediating teaching and learning. This paper investigates how AI systems function as epistemic infrastructures in education and their impact on human epistemic agency…

    Submitted 9 April, 2025; originally announced April 2025.

    Comments: 23 pages, 2 figures

    ACM Class: K.3.1; K.4.3; H.5.2

  33. arXiv:2504.04702  [pdf, ps, other]

    cs.LG cs.AI cs.CC

    Provable Failure of Language Models in Learning Majority Boolean Logic via Gradient Descent

    Authors: Bo Chen, Zhenmei Shi, Zhao Song, Jiahao Zhang

    Abstract: Recent advancements in Transformer-based architectures have led to impressive breakthroughs in natural language processing tasks, with models such as GPT-4, Claude, and Gemini demonstrating human-level reasoning abilities. However, despite their high performance, concerns remain about the inherent limitations of these models, especially when it comes to learning basic logical functions. While comp…

    Submitted 6 April, 2025; originally announced April 2025.

  34. arXiv:2504.03648  [pdf, other]

    cs.DC cs.AI

    AIBrix: Towards Scalable, Cost-Effective Large Language Model Inference Infrastructure

    Authors: The AIBrix Team, Jiaxin Shan, Varun Gupta, Le Xu, Haiyang Shi, Jingyuan Zhang, Ning Wang, Linhui Xu, Rong Kang, Tongping Liu, Yifei Zhang, Yiqing Zhu, Shuowei Jin, Gangmuk Lim, Binbin Chen, Zuzhi Chen, Xiao Liu, Xin Chen, Kante Yin, Chak-Pong Chung, Chenyu Jiang, Yicheng Lu, Jianjun Chen, Caixue Lin, Wu Xiang, et al. (2 additional authors not shown)

    Abstract: We introduce AIBrix, a cloud-native, open-source framework designed to optimize and simplify large-scale LLM deployment in cloud environments. Unlike traditional cloud-native stacks, AIBrix follows a co-design philosophy, ensuring every layer of the infrastructure is purpose-built for seamless integration with inference engines like vLLM. AIBrix introduces several key innovations to reduce inferen…

    Submitted 22 February, 2025; originally announced April 2025.

  35. arXiv:2504.03624  [pdf, other]

    cs.CL cs.AI cs.LG

    Nemotron-H: A Family of Accurate and Efficient Hybrid Mamba-Transformer Models

    Authors: NVIDIA, Aaron Blakeman, Aarti Basant, Abhinav Khattar, Adithya Renduchintala, Akhiad Bercovich, Aleksander Ficek, Alexis Bjorlin, Ali Taghibakhshi, Amala Sanjay Deshmukh, Ameya Sunil Mahabaleshwarkar, Andrew Tao, Anna Shors, Ashwath Aithal, Ashwin Poojary, Ayush Dattagupta, Balaram Buddharaju, Bobby Chen, Boris Ginsburg, Boxin Wang, Brandon Norick, Brian Butterfield, Bryan Catanzaro, Carlo del Mundo, et al. (176 additional authors not shown)

    Abstract: As inference-time scaling becomes critical for enhanced reasoning capabilities, it is increasingly becoming important to build models that are efficient to infer. We introduce Nemotron-H, a family of 8B and 56B/47B hybrid Mamba-Transformer models designed to reduce inference cost for a given accuracy level. To achieve this goal, we replace the majority of self-attention layers in the common Transf…

    Submitted 15 April, 2025; v1 submitted 4 April, 2025; originally announced April 2025.

  36. arXiv:2504.03600  [pdf, other]

    eess.IV cs.AI cs.CV

    MedSAM2: Segment Anything in 3D Medical Images and Videos

    Authors: Jun Ma, Zongxin Yang, Sumin Kim, Bihui Chen, Mohammed Baharoon, Adibvafa Fallahpour, Reza Asakereh, Hongwei Lyu, Bo Wang

    Abstract: Medical image and video segmentation is a critical task for precision medicine, which has witnessed considerable progress in developing task or modality-specific and generalist models for 2D images. However, there have been limited studies on building general-purpose models for 3D images and videos with comprehensive user studies. Here, we present MedSAM2, a promptable segmentation foundation mode…

    Submitted 4 April, 2025; originally announced April 2025.

    Comments: https://medsam2.github.io/

  37. arXiv:2504.03598  [pdf, other]

    cs.CL cs.AI cs.IR

    EnrichIndex: Using LLMs to Enrich Retrieval Indices Offline

    Authors: Peter Baile Chen, Tomer Wolfson, Michael Cafarella, Dan Roth

    Abstract: Existing information retrieval systems excel in cases where the language of target documents closely matches that of the user query. However, real-world retrieval systems are often required to implicitly reason whether a document is relevant. For example, when retrieving technical texts or tables, their relevance to the user query may be implied through a particular jargon or structure, rather tha…

    Submitted 4 April, 2025; originally announced April 2025.

    Comments: Dataset and code are available at https://peterbaile.github.io/enrichindex/

  38. arXiv:2504.03587  [pdf, other]

    cs.CV cs.IR cs.MM

    AutoSSVH: Exploring Automated Frame Sampling for Efficient Self-Supervised Video Hashing

    Authors: Niu Lian, Jun Li, Jinpeng Wang, Ruisheng Luo, Yaowei Wang, Shu-Tao Xia, Bin Chen

    Abstract: Self-Supervised Video Hashing (SSVH) compresses videos into hash codes for efficient indexing and retrieval using unlabeled training videos. Existing approaches rely on random frame sampling to learn video features and treat all frames equally. This results in suboptimal hash codes, as it ignores frame-specific information density and reconstruction difficulty. To address this limitation, we propo…

    Submitted 4 April, 2025; originally announced April 2025.

    Comments: Accepted by CVPR'25. 11 pages, 5 figures, 3 tables

  39. arXiv:2504.03279  [pdf, other]

    cs.DB

    Yannakakis+: Practical Acyclic Query Evaluation with Theoretical Guarantees

    Authors: Qichen Wang, Bingnan Chen, Binyang Dai, Ke Yi, Feifei Li, Liang Lin

    Abstract: Acyclic conjunctive queries form the backbone of most analytical workloads, and have been extensively studied in the literature from both theoretical and practical angles. However, there is still a large divide between theory and practice. While the 40-year-old Yannakakis algorithm has strong theoretical running time guarantees, it has not been adopted in real systems due to its high hidden consta…

    Submitted 4 April, 2025; originally announced April 2025.

    Comments: Technical report for the SIGMOD 2025 paper

    ACM Class: H.2.4

  40. arXiv:2504.02998  [pdf, other]

    cs.HC

    A Review of Prototyping in XR: Linking Extended Reality to Digital Fabrication

    Authors: Bixun Chen, Shaun Macdonald, Moataz Attallah, Paul Chapman, Rami Ghannam

    Abstract: Extended Reality (XR) has expanded the horizons of entertainment and social life and shows great potential in the manufacturing industry. Prototyping in XR can help designers make initial proposals and iterations at low cost before manufacturers and investors decide whether to invest in research, development or even production. According to the literature (54 manuscripts in the last 15 years) prot…

    Submitted 3 April, 2025; originally announced April 2025.

  41. arXiv:2504.01913  [pdf, other]

    cs.GR cs.LG

    Representing Flow Fields with Divergence-Free Kernels for Reconstruction

    Authors: Xingyu Ni, Jingrui Xing, Xingqiao Li, Bin Wang, Baoquan Chen

    Abstract: Accurately reconstructing continuous flow fields from sparse or indirect measurements remains an open challenge, as existing techniques often suffer from oversmoothing artifacts, reliance on heterogeneous architectures, and the computational burden of enforcing physics-informed losses in implicit neural representations (INRs). In this paper, we introduce a novel flow field reconstruction framework…

    Submitted 2 April, 2025; originally announced April 2025.

  42. arXiv:2504.00745  [pdf, other]

    cs.GR physics.flu-dyn

    The Granule-In-Cell Method for Simulating Sand–Water Mixtures

    Authors: Yizao Tang, Yuechen Zhu, Xingyu Ni, Baoquan Chen

    Abstract: The simulation of sand–water mixtures requires capturing the stochastic behavior of individual sand particles within a uniform, continuous fluid medium, such as the characteristic of migration, deposition, and plugging across various scenarios. In this paper, we introduce a Granule-in-Cell (GIC) method for simulating such sand–water interaction. We leverage the Discrete Element Method (DEM) to c…

    Submitted 1 April, 2025; originally announced April 2025.

  43. arXiv:2503.22952  [pdf, other]

    cs.CV

    OmniMMI: A Comprehensive Multi-modal Interaction Benchmark in Streaming Video Contexts

    Authors: Yuxuan Wang, Yueqian Wang, Bo Chen, Tong Wu, Dongyan Zhao, Zilong Zheng

    Abstract: The rapid advancement of multi-modal language models (MLLMs) like GPT-4o has propelled the development of Omni language models, designed to process and proactively respond to continuous streams of multi-modal data. Despite their potential, evaluating their real-world interactive capabilities in streaming video contexts remains a formidable challenge. In this work, we introduce OmniMMI, a comprehen…

    Submitted 28 March, 2025; originally announced March 2025.

    Comments: To appear at CVPR 2025

  44. arXiv:2503.22913  [pdf, other]

    cs.CL

    Resona: Improving Context Copying in Linear Recurrence Models with Retrieval

    Authors: Xinyu Wang, Linrui Ma, Jerry Huang, Peng Lu, Prasanna Parthasarathi, Xiao-Wen Chang, Boxing Chen, Yufei Cui

    Abstract: Recent shifts in the space of large language model (LLM) research have shown an increasing focus on novel architectures to compete with prototypical Transformer-based models that have long dominated this space. Linear recurrent models have proven to be a viable competitor due to their computational efficiency. However, such models still demonstrate a sizable gap compared to Transformers in terms o…

    Submitted 28 March, 2025; originally announced March 2025.

  45. arXiv:2503.22634  [pdf, other]

    cs.RO cs.AI

    Empirical Analysis of Sim-and-Real Cotraining Of Diffusion Policies For Planar Pushing from Pixels

    Authors: Adam Wei, Abhinav Agarwal, Boyuan Chen, Rohan Bosworth, Nicholas Pfaff, Russ Tedrake

    Abstract: In imitation learning for robotics, cotraining with demonstration data generated both in simulation and on real hardware has emerged as a powerful recipe to overcome the sim2real gap. This work seeks to elucidate basic principles of this sim-and-real cotraining to help inform simulation design, sim-and-real dataset creation, and policy training. Focusing narrowly on the canonical task of planar pu…

    Submitted 28 March, 2025; originally announced March 2025.

    Comments: 9 pages, 15 figures, In Submission to IROS 2025

  46. arXiv:2503.21460  [pdf, other]

    cs.CL

    Large Language Model Agent: A Survey on Methodology, Applications and Challenges

    Authors: Junyu Luo, Weizhi Zhang, Ye Yuan, Yusheng Zhao, Junwei Yang, Yiyang Gu, Bohan Wu, Binqi Chen, Ziyue Qiao, Qingqing Long, Rongcheng Tu, Xiao Luo, Wei Ju, Zhiping Xiao, Yifan Wang, Meng Xiao, Chenwu Liu, Jingyang Yuan, Shichang Zhang, Yiqiao Jin, Fan Zhang, Xian Wu, Hanqing Zhao, Dacheng Tao, Philip S. Yu, et al. (1 additional author not shown)

    Abstract: The era of intelligent agents is upon us, driven by revolutionary advancements in large language models. Large Language Model (LLM) agents, with goal-driven behaviors and dynamic adaptation capabilities, potentially represent a critical pathway toward artificial general intelligence. This survey systematically deconstructs LLM agent systems through a methodology-centered taxonomy, linking architec…

    Submitted 27 March, 2025; originally announced March 2025.

    Comments: 329 papers surveyed, resources are at https://github.com/luo-junyu/Awesome-Agent-Papers

  47. arXiv:2503.21442  [pdf, other]

    cs.GR cs.CV

    RainyGS: Efficient Rain Synthesis with Physically-Based Gaussian Splatting

    Authors: Qiyu Dai, Xingyu Ni, Qianfan Shen, Wenzheng Chen, Baoquan Chen, Mengyu Chu

    Abstract: We consider the problem of adding dynamic rain effects to in-the-wild scenes in a physically-correct manner. Recent advances in scene modeling have made significant progress, with NeRF and 3DGS techniques emerging as powerful tools for reconstructing complex scenes. However, while effective for novel view synthesis, these methods typically struggle with challenging scene editing tasks, such as phy…

    Submitted 1 April, 2025; v1 submitted 27 March, 2025; originally announced March 2025.

    Comments: CVPR 2025

  48. arXiv:2503.20746  [pdf, other]

    cs.CV

    PhysGen3D: Crafting a Miniature Interactive World from a Single Image

    Authors: Boyuan Chen, Hanxiao Jiang, Shaowei Liu, Saurabh Gupta, Yunzhu Li, Hao Zhao, Shenlong Wang

    Abstract: Envisioning physically plausible outcomes from a single image requires a deep understanding of the world's dynamics. To address this, we introduce PhysGen3D, a novel framework that transforms a single image into an amodal, camera-centric, interactive 3D scene. By combining advanced image-based geometric and semantic understanding with physics-based simulation, PhysGen3D creates an interactive 3D w…

    Submitted 26 March, 2025; originally announced March 2025.

    Comments: CVPR 2025, Project page: https://by-luckk.github.io/PhysGen3D

  49. arXiv:2503.20672  [pdf, other]

    cs.CV

    BizGen: Advancing Article-level Visual Text Rendering for Infographics Generation

    Authors: Yuyang Peng, Shishi Xiao, Keming Wu, Qisheng Liao, Bohan Chen, Kevin Lin, Danqing Huang, Ji Li, Yuhui Yuan

    Abstract: Recently, state-of-the-art text-to-image generation models, such as Flux and Ideogram 2.0, have made significant progress in sentence-level visual text rendering. In this paper, we focus on the more challenging scenarios of article-level visual text rendering and address a novel task of generating high-quality business content, including infographics and slides, based on user provided article-leve…

    Submitted 26 March, 2025; originally announced March 2025.

    Comments: Accepted by CVPR 2025. Project Page: https://bizgen-msra.github.io

  50. arXiv:2503.19876  [pdf]

    cs.SE

    SLA-Awareness for AI-assisted coding

    Authors: Kishanthan Thangarajah, Arthur Leung, Boyuan Chen, Ahmed E. Hassan

    Abstract: The integration of AI-assisted coding tools within development environments drastically reduces development time, and allows developers to focus more on creative and critical aspects of software engineering through the use of Code Large Language Models (CodeLLMs). These coding assistants automate repetitive and time-consuming coding tasks such as code generation, code completion, code summarizatio…

    Submitted 25 March, 2025; originally announced March 2025.