Showing 1–50 of 409 results for author: Shi, S

Searching in archive cs.

  1. arXiv:2507.05244 [pdf, ps, other]

    cs.AI cs.MA

    Modeling Latent Partner Strategies for Adaptive Zero-Shot Human-Agent Collaboration

    Authors: Benjamin Li, Shuyang Shi, Lucia Romero, Huao Li, Yaqi Xie, Woojun Kim, Stefanos Nikolaidis, Michael Lewis, Katia Sycara, Simon Stepputtis

    Abstract: In collaborative tasks, being able to adapt to your teammates is a necessary requirement for success. When teammates are heterogeneous, such as in human-agent teams, agents need to be able to observe, recognize, and adapt to their human partners in real time. This becomes particularly challenging in tasks with time pressure and complex strategic spaces where the dynamics can change rapidly. In thi…

    Submitted 7 July, 2025; originally announced July 2025.

    Comments: Best Paper Award at the RSS 2025 Generative Models x HRI (GenAI-HRI) Workshop

  2. arXiv:2506.23729 [pdf, ps, other]

    cs.CV

    Proteus-ID: ID-Consistent and Motion-Coherent Video Customization

    Authors: Guiyu Zhang, Chen Shi, Zijian Jiang, Xunzhi Xiang, Jingjing Qian, Shaoshuai Shi, Li Jiang

    Abstract: Video identity customization seeks to synthesize realistic, temporally coherent videos of a specific subject, given a single reference image and a text prompt. This task presents two core challenges: (1) maintaining identity consistency while aligning with the described appearance and actions, and (2) generating natural, fluid motion without unrealistic stiffness. To address these challenges, we i…

    Submitted 30 June, 2025; originally announced June 2025.

    Comments: Preprint. Work in progress

  3. arXiv:2506.23120 [pdf, ps, other]

    cs.CV

    Enhancing Spatial Reasoning in Multimodal Large Language Models through Reasoning-based Segmentation

    Authors: Zhenhua Ning, Zhuotao Tian, Shaoshuai Shi, Guangming Lu, Daojing He, Wenjie Pei, Li Jiang

    Abstract: Recent advances in point cloud perception have demonstrated remarkable progress in scene understanding through vision-language alignment leveraging large language models (LLMs). However, existing methods may still encounter challenges in handling complex instructions that require accurate spatial reasoning, even if the 3D point cloud data provides detailed spatial cues such as size and position fo…

    Submitted 29 June, 2025; originally announced June 2025.

  4. arXiv:2506.21121 [pdf, ps, other]

    cs.CV cs.RO

    GoIRL: Graph-Oriented Inverse Reinforcement Learning for Multimodal Trajectory Prediction

    Authors: Muleilan Pei, Shaoshuai Shi, Lu Zhang, Peiliang Li, Shaojie Shen

    Abstract: Trajectory prediction for surrounding agents is a challenging task in autonomous driving due to its inherent uncertainty and underlying multimodality. Unlike prevailing data-driven methods that primarily rely on supervised learning, in this paper, we introduce a novel Graph-oriented Inverse Reinforcement Learning (GoIRL) framework, which is an IRL-based predictor equipped with vectorized context r…

    Submitted 26 June, 2025; originally announced June 2025.

    Comments: Accepted by ICML 2025

  5. arXiv:2506.13585 [pdf, ps, other]

    cs.CL cs.LG

    MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention

    Authors: MiniMax, Aili Chen, Aonian Li, Bangwei Gong, Binyang Jiang, Bo Fei, Bo Yang, Boji Shan, Changqing Yu, Chao Wang, Cheng Zhu, Chengjun Xiao, Chengyu Du, Chi Zhang, Chu Qiao, Chunhao Zhang, Chunhui Du, Congchao Guo, Da Chen, Deming Ding, Dianjun Sun, Dong Li, Enwei Jiao, Haigang Zhou, et al. (103 additional authors not shown)

    Abstract: We introduce MiniMax-M1, the world's first open-weight, large-scale hybrid-attention reasoning model. MiniMax-M1 is powered by a hybrid Mixture-of-Experts (MoE) architecture combined with a lightning attention mechanism. The model is developed based on our previous MiniMax-Text-01 model, which contains a total of 456 billion parameters with 45.9 billion parameters activated per token. The M1 model…

    Submitted 16 June, 2025; originally announced June 2025.

    Comments: A technical report from MiniMax. The authors are listed in alphabetical order. We open-source our MiniMax-M1 at https://github.com/MiniMax-AI/MiniMax-M1

  6. arXiv:2506.10507 [pdf, ps, other]

    cs.GR cs.CV

    Edit360: 2D Image Edits to 3D Assets from Any Angle

    Authors: Junchao Huang, Xinting Hu, Shaoshuai Shi, Zhuotao Tian, Li Jiang

    Abstract: Recent advances in diffusion models have significantly improved image generation and editing, but extending these capabilities to 3D assets remains challenging, especially for fine-grained edits that require multi-view consistency. Existing methods typically restrict editing to predetermined viewing angles, severely limiting their flexibility and practical applications. We introduce Edit360, a tun…

    Submitted 30 June, 2025; v1 submitted 12 June, 2025; originally announced June 2025.

    Comments: 11 pages, 9 figures

  7. arXiv:2506.10399 [pdf, ps, other]

    cs.CR

    FicGCN: Unveiling the Homomorphic Encryption Efficiency from Irregular Graph Convolutional Networks

    Authors: Zhaoxuan Kan, Husheng Han, Shangyi Shi, Tenghui Hua, Hang Lu, Xiaowei Li, Jianan Mu, Xing Hu

    Abstract: Graph Convolutional Neural Networks (GCNs) have gained widespread popularity in various fields like personal healthcare and financial systems, due to their remarkable performance. Despite the growing demand for cloud-based GCN services, privacy concerns over sensitive graph data remain significant. Homomorphic Encryption (HE) facilitates Privacy-Preserving Machine Learning (PPML) by allowing compu…

    Submitted 12 June, 2025; originally announced June 2025.

    Comments: Accepted by ICML 2025

  8. arXiv:2506.08541 [pdf, ps, other]

    cs.CV cs.AI

    TrajFlow: Multi-modal Motion Prediction via Flow Matching

    Authors: Qi Yan, Brian Zhang, Yutong Zhang, Daniel Yang, Joshua White, Di Chen, Jiachao Liu, Langechuan Liu, Binnan Zhuang, Shaoshuai Shi, Renjie Liao

    Abstract: Efficient and accurate motion prediction is crucial for ensuring safety and informed decision-making in autonomous driving, particularly under dynamic real-world conditions that necessitate multi-modal forecasts. We introduce TrajFlow, a novel flow matching-based motion prediction framework that addresses the scalability and efficiency challenges of existing generative trajectory prediction method…

    Submitted 5 July, 2025; v1 submitted 10 June, 2025; originally announced June 2025.

    Comments: IROS 2025

  9. arXiv:2506.07600 [pdf, ps, other]

    cs.CV cs.AI

    SceneRAG: Scene-level Retrieval-Augmented Generation for Video Understanding

    Authors: Nianbo Zeng, Haowen Hou, Fei Richard Yu, Si Shi, Ying Tiffany He

    Abstract: Despite recent advances in retrieval-augmented generation (RAG) for video understanding, effectively understanding long-form video content remains underexplored due to the vast scale and high complexity of video data. Current RAG approaches typically segment videos into fixed-length chunks, which often disrupts the continuity of contextual information and fails to capture authentic scene boundarie…

    Submitted 9 June, 2025; originally announced June 2025.

  10. arXiv:2506.01037 [pdf, ps, other]

    cs.CV

    Self-supervised ControlNet with Spatio-Temporal Mamba for Real-world Video Super-resolution

    Authors: Shijun Shi, Jing Xu, Lijing Lu, Zhihang Li, Kai Hu

    Abstract: Existing diffusion-based video super-resolution (VSR) methods are susceptible to introducing complex degradations and noticeable artifacts into high-resolution videos due to their inherent randomness. In this paper, we propose a noise-robust real-world VSR framework by incorporating self-supervised learning and Mamba into pre-trained latent diffusion models. To ensure content consistency across ad…

    Submitted 1 June, 2025; originally announced June 2025.

    Comments: 11 pages, 10 figures, accepted by CVPR 2025

    ACM Class: I.4.4; I.2.6

  11. arXiv:2505.19239 [pdf, ps, other]

    cs.CV

    DriveX: Omni Scene Modeling for Learning Generalizable World Knowledge in Autonomous Driving

    Authors: Chen Shi, Shaoshuai Shi, Kehua Sheng, Bo Zhang, Li Jiang

    Abstract: Data-driven learning has advanced autonomous driving, yet task-specific models struggle with out-of-distribution scenarios due to their narrow optimization objectives and reliance on costly annotated data. We present DriveX, a self-supervised world model that learns generalizable scene dynamics and holistic representations (geometric, semantic, and motion) from large-scale driving videos. DriveX i…

    Submitted 25 May, 2025; originally announced May 2025.

  12. arXiv:2505.17513 [pdf, ps, other]

    cs.LG cs.CL cs.SD eess.AS

    What You Read Isn't What You Hear: Linguistic Sensitivity in Deepfake Speech Detection

    Authors: Binh Nguyen, Shuji Shi, Ryan Ofman, Thai Le

    Abstract: Recent advances in text-to-speech technologies have enabled realistic voice generation, fueling audio-based deepfake attacks such as fraud and impersonation. While audio anti-spoofing systems are critical for detecting such threats, prior work has predominantly focused on acoustic-level perturbations, leaving the impact of linguistic variation largely unexplored. In this paper, we investigate the…

    Submitted 23 May, 2025; originally announced May 2025.

    Comments: 15 pages, 2 figures

    MSC Class: 53-04

  13. arXiv:2505.16805 [pdf, other]

    cs.CV

    SOLVE: Synergy of Language-Vision and End-to-End Networks for Autonomous Driving

    Authors: Xuesong Chen, Linjiang Huang, Tao Ma, Rongyao Fang, Shaoshuai Shi, Hongsheng Li

    Abstract: The integration of Vision-Language Models (VLMs) into autonomous driving systems has shown promise in addressing key challenges such as learning complexity, interpretability, and common-sense reasoning. However, existing approaches often struggle with efficient integration and realtime decision-making due to computational demands. In this paper, we introduce SOLVE, an innovative framework that syn…

    Submitted 22 May, 2025; originally announced May 2025.

    Comments: Accepted by CVPR 2025

  14. arXiv:2505.14519 [pdf, ps, other]

    quant-ph cs.AR cs.DC

    Distributed quantum computing with black-box subroutines

    Authors: X. Xu, Y. -D. Liu, S. Shi, Y. -J. Wang, D. -S. Wang

    Abstract: In this work, we propose a general protocol for distributed quantum computing that accommodates arbitrary unknown subroutines. It can be applied to scale up quantum computing through multi-chip interconnection, as well as to tasks such as estimating unknown parameters or processes for circuit depth reduction and constructing secure quantum cryptographic protocols. Our protocol builds upon a few te…

    Submitted 20 May, 2025; originally announced May 2025.

  15. arXiv:2505.05512 [pdf, other]

    cs.CV cs.RO

    Occupancy World Model for Robots

    Authors: Zhang Zhang, Qiang Zhang, Wei Cui, Shuai Shi, Yijie Guo, Gang Han, Wen Zhao, Jingkai Sun, Jiahang Cao, Jiaxu Wang, Hao Cheng, Xiaozhu Ju, Zhengping Che, Renjing Xu, Jian Tang

    Abstract: Understanding and forecasting the scene evolutions deeply affect the exploration and decision of embodied agents. While traditional methods simulate scene evolutions through trajectory prediction of potential instances, current works use the occupancy world model as a generative framework for describing fine-grained overall scene dynamics. However, existing methods cluster on the outdoor structure…

    Submitted 7 May, 2025; originally announced May 2025.

  16. arXiv:2505.02390 [pdf, ps, other]

    cs.LG cs.AI

    Quantitative Analysis of Performance Drop in DeepSeek Model Quantization

    Authors: Enbo Zhao, Yi Shen, Shuming Shi, Jieyun Huang, Zhihao Chen, Ning Wang, Siqi Xiao, Jian Zhang, Kai Wang, Shiguo Lian

    Abstract: Recently, there is a high demand for deploying DeepSeek-R1 and V3 locally, possibly because the official service often suffers from being busy and some organizations have data privacy concerns. While single-machine deployment offers infrastructure simplicity, the models' 671B FP8 parameter configuration exceeds the practical memory limits of a standard 8-GPU machine. Quantization is a widely used…

    Submitted 13 June, 2025; v1 submitted 5 May, 2025; originally announced May 2025.

    Comments: This version added the results of DeepSeek-V3-0324

  17. arXiv:2504.14709 [pdf, other]

    cs.CV cs.AI

    Exposing the Copycat Problem of Imitation-based Planner: A Novel Closed-Loop Simulator, Causal Benchmark and Joint IL-RL Baseline

    Authors: Hui Zhou, Shaoshuai Shi, Hongsheng Li

    Abstract: Machine learning (ML)-based planners have recently gained significant attention. They offer advantages over traditional optimization-based planning algorithms. These advantages include fewer manually selected parameters and faster development. Within ML-based planning, imitation learning (IL) is a common algorithm. It primarily learns driving policies directly from supervised trajectory data. Whil…

    Submitted 20 April, 2025; originally announced April 2025.

  18. arXiv:2504.14604 [pdf, other]

    cs.RO

    RoboOcc: Enhancing the Geometric and Semantic Scene Understanding for Robots

    Authors: Zhang Zhang, Qiang Zhang, Wei Cui, Shuai Shi, Yijie Guo, Gang Han, Wen Zhao, Hengle Ren, Renjing Xu, Jian Tang

    Abstract: 3D occupancy prediction enables the robots to obtain spatial fine-grained geometry and semantics of the surrounding scene, and has become an essential task for embodied perception. Existing methods based on 3D Gaussians instead of dense voxels do not effectively exploit the geometry and opacity properties of Gaussians, which limits the network's estimation of complex environments and also limits t…

    Submitted 20 April, 2025; originally announced April 2025.

  19. arXiv:2504.04032 [pdf]

    cs.LG cs.AI

    Contrastive and Variational Approaches in Self-Supervised Learning for Complex Data Mining

    Authors: Yingbin Liang, Lu Dai, Shuo Shi, Minghao Dai, Junliang Du, Haige Wang

    Abstract: Complex data mining has wide application value in many fields, especially in the feature extraction and classification tasks of unlabeled data. This paper proposes an algorithm based on self-supervised learning and verifies its effectiveness through experiments. The study found that in terms of the selection of optimizer and learning rate, the combination of AdamW optimizer and 0.002 learning rate…

    Submitted 4 April, 2025; originally announced April 2025.

    Comments: 5 pages

  20. arXiv:2504.02312 [pdf, other]

    cs.CV cs.AI

    OmniCam: Unified Multimodal Video Generation via Camera Control

    Authors: Xiaoda Yang, Jiayang Xu, Kaixuan Luan, Xinyu Zhan, Hongshun Qiu, Shijun Shi, Hao Li, Shuai Yang, Li Zhang, Checheng Yu, Cewu Lu, Lixin Yang

    Abstract: Camera control, which achieves diverse visual effects by changing camera position and pose, has attracted widespread attention. However, existing methods face challenges such as complex interaction and limited control capabilities. To address these issues, we present OmniCam, a unified multimodal camera control framework. Leveraging large language models and video diffusion models, OmniCam generat…

    Submitted 3 April, 2025; originally announced April 2025.

  21. arXiv:2504.00698 [pdf]

    cs.CL cs.AI cs.LG

    Command A: An Enterprise-Ready Large Language Model

    Authors: Team Cohere, Aakanksha, Arash Ahmadian, Marwan Ahmed, Jay Alammar, Milad Alizadeh, Yazeed Alnumay, Sophia Althammer, Arkady Arkhangorodsky, Viraat Aryabumi, Dennis Aumiller, Raphaël Avalos, Zahara Aviv, Sammie Bae, Saurabh Baji, Alexandre Barbet, Max Bartolo, Björn Bebensee, Neeral Beladia, Walter Beller-Morales, Alexandre Bérard, Andrew Berneshawi, Anna Bialas, Phil Blunsom, et al. (205 additional authors not shown)

    Abstract: In this report we describe the development of Command A, a powerful large language model purpose-built to excel at real-world enterprise use cases. Command A is an agent-optimised and multilingual-capable model, with support for 23 languages of global business, and a novel hybrid architecture balancing efficiency with top of the range performance. It offers best-in-class Retrieval Augmented Genera…

    Submitted 14 April, 2025; v1 submitted 1 April, 2025; originally announced April 2025.

    Comments: 55 pages

  22. arXiv:2503.23496 [pdf, other]

    cs.AR

    FlexMem: High-Parallel Near-Memory Architecture for Flexible Dataflow in Fully Homomorphic Encryption

    Authors: Shangyi Shi, Husheng Han, Jianan Mu, Xinyao Zheng, Ling Liang, Hang Lu, Zidong Du, Xiaowei Li, Xing Hu, Qi Guo

    Abstract: Fully Homomorphic Encryption (FHE) imposes substantial memory bandwidth demands, presenting significant challenges for efficient hardware acceleration. Near-memory Processing (NMP) has emerged as a promising architectural solution to alleviate the memory bottleneck. However, the irregular memory access patterns and flexible dataflows inherent to FHE limit the effectiveness of existing NMP accelera…

    Submitted 30 March, 2025; originally announced March 2025.

    Comments: 9 pages, ICCAD

  23. arXiv:2503.20681 [pdf, other]

    eess.AS cs.CV cs.LG cs.SD

    Benchmarking Machine Learning Methods for Distributed Acoustic Sensing

    Authors: Shuaikai Shi, Qijun Zong

    Abstract: Distributed acoustic sensing (DAS) technology represents an innovative fiber-optic-based sensing methodology that enables real-time acoustic signal monitoring through the detection of minute perturbations along optical fibers. This sensing approach offers compelling advantages, including extensive measurement ranges, exceptional spatial resolution, and an expansive dynamic measurement spectrum.…

    Submitted 26 March, 2025; originally announced March 2025.

  24. arXiv:2503.18584   

    cs.LG

    A Universal Model Combining Differential Equations and Neural Networks for Ball Trajectory Prediction

    Authors: Zhiwei Shi, Chengxi Zhu, Fan Yang, Jun Yan, Zheyun Qin, Songquan Shi, Zhumin Chen

    Abstract: This paper presents a data driven universal ball trajectory prediction method integrated with physics equations. Existing methods are designed for specific ball types and struggle to generalize. This challenge arises from three key factors. First, learning-based models require large datasets but suffer from accuracy drops in unseen scenarios. Second, physics-based models rely on complex formulas a…

    Submitted 25 March, 2025; v1 submitted 24 March, 2025; originally announced March 2025.

    Comments: This submission was made without my advisor's consent, and I mistakenly uploaded an incorrect version of the paper. Additionally, some content in the paper should not be made publicly available at this time, as per my advisor's wishes. I apologize for any inconvenience this may have caused.

  25. arXiv:2503.18100 [pdf, other]

    cs.CV

    M3Net: Multimodal Multi-task Learning for 3D Detection, Segmentation, and Occupancy Prediction in Autonomous Driving

    Authors: Xuesong Chen, Shaoshuai Shi, Tao Ma, Jingqiu Zhou, Simon See, Ka Chun Cheung, Hongsheng Li

    Abstract: The perception system for autonomous driving generally requires to handle multiple diverse sub-tasks. However, current algorithms typically tackle individual sub-tasks separately, which leads to low efficiency when aiming at obtaining full-perception results. Some multi-task learning methods try to unify multiple tasks with one model, but do not solve the conflicts in multi-task learning. In this…

    Submitted 23 March, 2025; originally announced March 2025.

    Comments: Accepted by AAAI 2025

  26. arXiv:2503.14576 [pdf, other]

    cs.LG cs.AI

    SocialJax: An Evaluation Suite for Multi-agent Reinforcement Learning in Sequential Social Dilemmas

    Authors: Zihao Guo, Shuqing Shi, Richard Willis, Tristan Tomilin, Joel Z. Leibo, Yali Du

    Abstract: Sequential social dilemmas pose a significant challenge in the field of multi-agent reinforcement learning (MARL), requiring environments that accurately reflect the tension between individual and collective interests. Previous benchmarks and environments, such as Melting Pot, provide an evaluation protocol that measures generalization to new social partners in various test scenarios. However, run…

    Submitted 19 May, 2025; v1 submitted 18 March, 2025; originally announced March 2025.

  27. arXiv:2503.08422 [pdf, other]

    cs.CV

    JiSAM: Alleviate Labeling Burden and Corner Case Problems in Autonomous Driving via Minimal Real-World Data

    Authors: Runjian Chen, Wenqi Shao, Bo Zhang, Shaoshuai Shi, Li Jiang, Ping Luo

    Abstract: Deep-learning-based autonomous driving (AD) perception introduces a promising picture for safe and environment-friendly transportation. However, the over-reliance on real labeled data in LiDAR perception limits the scale of on-road attempts. 3D real world data is notoriously time-and-energy-consuming to annotate and lacks corner cases like rare traffic participants. On the contrary, in simulators…

    Submitted 13 March, 2025; v1 submitted 11 March, 2025; originally announced March 2025.

  28. arXiv:2503.06099 [pdf, other]

    cs.HC

    Advancing Problem-Based Learning with Clinical Reasoning for Improved Differential Diagnosis in Medical Education

    Authors: Yuansong Xu, Yuheng Shao, Jiahe Dong, Shaohan Shi, Chang Jiang, Quan Li

    Abstract: Medical education increasingly emphasizes students' ability to apply knowledge in real-world clinical settings, focusing on evidence-based clinical reasoning and differential diagnoses. Problem-based learning (PBL) addresses traditional teaching limitations by embedding learning into meaningful contexts and promoting active participation. However, current PBL practices are often confined to medica…

    Submitted 8 March, 2025; originally announced March 2025.

    Comments: In the ACM CHI conference on Human Factors in Computing Systems (CHI) 2025

  29. arXiv:2503.04472 [pdf, ps, other]

    cs.LG cs.AI

    DAST: Difficulty-Adaptive Slow-Thinking for Large Reasoning Models

    Authors: Yi Shen, Jian Zhang, Jieyun Huang, Shuming Shi, Wenjing Zhang, Jiangze Yan, Ning Wang, Kai Wang, Zhaoxiang Liu, Shiguo Lian

    Abstract: Recent advancements in slow thinking reasoning models have shown exceptional performance in complex reasoning tasks. However, these models often exhibit overthinking (generating redundant reasoning steps for simple problems), leading to excessive computational resource usage. While current mitigation strategies uniformly reduce reasoning tokens, they risk degrading performance on challenging tasks…

    Submitted 3 June, 2025; v1 submitted 6 March, 2025; originally announced March 2025.

    Comments: Work in progress

  30. arXiv:2502.18532 [pdf, other]

    cs.AI cs.LG

    CuDIP: Enhancing Theorem Proving in LLMs via Curriculum Learning-based Direct Preference Optimization

    Authors: Shuming Shi, Ruobing Zuo, Gaolei He, Jianlin Wang, Chenyang Xu, Zhengfeng Yang

    Abstract: Automated theorem proving (ATP) is one of the most challenging mathematical reasoning tasks for Large Language Models (LLMs). Most existing LLM-based ATP methods rely on supervised fine-tuning, which results in a limited alignment between the theorem proving process and human preferences. Direct Preference Optimization (DPO), which aligns LLMs with human preferences, has shown positive effects for…

    Submitted 24 February, 2025; originally announced February 2025.

  31. arXiv:2502.11058 [pdf, other]

    cs.DC

    DreamDDP: Accelerating Data Parallel Distributed LLM Training with Layer-wise Scheduled Partial Synchronization

    Authors: Zhenheng Tang, Zichen Tang, Junlin Huang, Xinglin Pan, Rudan Yan, Yuxin Wang, Amelie Chi Zhou, Shaohuai Shi, Xiaowen Chu, Bo Li

    Abstract: The growth of large language models (LLMs) increases challenges of accelerating distributed training across multiple GPUs in different data centers. Moreover, concerns about data privacy and data exhaustion have heightened interest in geo-distributed data centers. Communication in geo-distributed data parallel training (DDP) with stochastic gradient descent (S-SGD) is the main bottleneck in low-ba…

    Submitted 16 February, 2025; originally announced February 2025.

  32. Outback: Fast and Communication-efficient Index for Key-Value Store on Disaggregated Memory

    Authors: Yi Liu, Minghao Xie, Shouqian Shi, Yuanchao Xu, Heiner Litz, Chen Qian

    Abstract: Disaggregated memory systems achieve resource utilization efficiency and system scalability by distributing computation and memory resources into distinct pools of nodes. RDMA is an attractive solution to support high-throughput communication between different disaggregated resource pools. However, existing RDMA solutions face a dilemma: one-sided RDMA completely bypasses computation at memory nod…

    Submitted 13 February, 2025; originally announced February 2025.

    Journal ref: PVLDB, 18(2): 335-348, 2024

  33. arXiv:2502.07901 [pdf, other]

    cs.NI

    StarCast: A Secure and Spectrum-Efficient Group Communication Scheme for LEO Satellite Networks

    Authors: Chaoyu Zhang, Hexuan Yu, Shanghao Shi, Shaoyu Li, Yi Shi, Eric Burger, Y. Thomas Hou, Wenjing Lou

    Abstract: Low Earth Orbit (LEO) satellite networks serve as a cornerstone infrastructure for providing ubiquitous connectivity in areas where terrestrial infrastructure is unavailable. With the emergence of Direct-to-Cell (DTC) satellites, these networks can provide direct access to mobile phones and IoT devices without relying on terrestrial base stations, leading to a surge in massive connectivity demands…

    Submitted 11 February, 2025; originally announced February 2025.

  34. arXiv:2502.03723 [pdf, other]

    cs.MA

    Speaking the Language of Teamwork: LLM-Guided Credit Assignment in Multi-Agent Reinforcement Learning

    Authors: Muhan Lin, Shuyang Shi, Yue Guo, Vaishnav Tadiparthi, Behdad Chalaki, Ehsan Moradi Pari, Simon Stepputtis, Woojun Kim, Joseph Campbell, Katia Sycara

    Abstract: Credit assignment, the process of attributing credit or blame to individual agents for their contributions to a team's success or failure, remains a fundamental challenge in multi-agent reinforcement learning (MARL), particularly in environments with sparse rewards. Commonly-used approaches such as value decomposition often lead to suboptimal policies in these settings, and designing dense reward…

    Submitted 28 February, 2025; v1 submitted 5 February, 2025; originally announced February 2025.

    Comments: 11 pages, 6 figures. Added the acknowledgement section

  35. arXiv:2501.17405 [pdf, other]

    cs.CR

    When Everyday Devices Become Weapons: A Closer Look at the Pager and Walkie-talkie Attacks

    Authors: Pantha Protim Sarker, Upoma Das, Nitin Varshney, Shang Shi, Akshay Kulkarni, Farimah Farahmandi, Mark Tehranipoor

    Abstract: Battery-powered technologies like pagers and walkie-talkies have long been integral to civilian and military operations. However, the potential for such everyday devices to be weaponized has largely been underestimated in the realm of cybersecurity. In September 2024, Lebanon experienced a series of unprecedented, coordinated explosions triggered through compromised pagers and walkie-talkies, crea…

    Submitted 28 January, 2025; originally announced January 2025.

    Comments: 18 pages, 10 figures

  36. arXiv:2501.14249 [pdf, other]

    cs.LG cs.AI cs.CL

    Humanity's Last Exam

    Authors: Long Phan, Alice Gatti, Ziwen Han, Nathaniel Li, Josephina Hu, Hugh Zhang, Chen Bo Calvin Zhang, Mohamed Shaaban, John Ling, Sean Shi, Michael Choi, Anish Agrawal, Arnav Chopra, Adam Khoja, Ryan Kim, Richard Ren, Jason Hausenloy, Oliver Zhang, Mantas Mazeika, Dmitry Dodonov, Tung Nguyen, Jaeho Lee, Daron Anderson, Mikhail Doroshenko, Alun Cennyth Stokes, et al. (1084 additional authors not shown)

    Abstract: Benchmarks are important tools for tracking the rapid advancements in large language model (LLM) capabilities. However, benchmarks are not keeping pace in difficulty: LLMs now achieve over 90% accuracy on popular benchmarks like MMLU, limiting informed measurement of state-of-the-art LLM capabilities. In response, we introduce Humanity's Last Exam (HLE), a multi-modal benchmark at the frontier of…

    Submitted 19 April, 2025; v1 submitted 24 January, 2025; originally announced January 2025.

    Comments: 29 pages, 6 figures

  37. arXiv:2501.12909 [pdf, other]

    cs.CL cs.GR cs.MA

    FilmAgent: A Multi-Agent Framework for End-to-End Film Automation in Virtual 3D Spaces

    Authors: Zhenran Xu, Longyue Wang, Jifang Wang, Zhouyi Li, Senbao Shi, Xue Yang, Yiyu Wang, Baotian Hu, Jun Yu, Min Zhang

    Abstract: Virtual film production requires intricate decision-making processes, including scriptwriting, virtual cinematography, and precise actor positioning and actions. Motivated by recent advances in automated decision-making with language agent-based societies, this paper introduces FilmAgent, a novel LLM-based multi-agent collaborative framework for end-to-end film automation in our constructed 3D vir…

    Submitted 22 January, 2025; originally announced January 2025.

    Comments: Work in progress. Project Page: https://filmagent.github.io/

  38. FSMoE: A Flexible and Scalable Training System for Sparse Mixture-of-Experts Models

    Authors: Xinglin Pan, Wenxiang Lin, Lin Zhang, Shaohuai Shi, Zhenheng Tang, Rui Wang, Bo Li, Xiaowen Chu

    Abstract: Recent large language models (LLMs) have tended to leverage sparsity to reduce computations, employing the sparsely activated mixture-of-experts (MoE) technique. MoE introduces four modules, including token routing, token communication, expert computation, and expert parallelism, that impact model quality and training efficiency. To enable versatile usage of MoE models, we introduce FSMoE, a flexi…

    Submitted 18 January, 2025; originally announced January 2025.

  39. arXiv:2501.08313 [pdf, other]

    cs.CL cs.CV

    MiniMax-01: Scaling Foundation Models with Lightning Attention

    Authors: MiniMax, Aonian Li, Bangwei Gong, Bo Yang, Boji Shan, Chang Liu, Cheng Zhu, Chunhao Zhang, Congchao Guo, Da Chen, Dong Li, Enwei Jiao, Gengxin Li, Guojun Zhang, Haohai Sun, Houze Dong, Jiadai Zhu, Jiaqi Zhuang, Jiayuan Song, Jin Zhu, Jingtao Han, Jingyang Li, Junbin Xie, Junhao Xu, Junjie Yan, et al. (65 additional authors not shown)

    Abstract: We introduce MiniMax-01 series, including MiniMax-Text-01 and MiniMax-VL-01, which are comparable to top-tier models while offering superior capabilities in processing longer contexts. The core lies in lightning attention and its efficient scaling. To maximize computational capacity, we integrate it with Mixture of Experts (MoE), creating a model with 32 experts and 456 billion total parameters, o…

    Submitted 14 January, 2025; originally announced January 2025.

    Comments: A technical report from MiniMax. The authors are listed in alphabetical order. We open-sourced our MiniMax-01 at https://github.com/MiniMax-AI

  40. arXiv:2501.05580 [pdf, other]

    hep-lat cs.LG hep-ph nucl-th

    Physics-Driven Learning for Inverse Problems in Quantum Chromodynamics

    Authors: Gert Aarts, Kenji Fukushima, Tetsuo Hatsuda, Andreas Ipp, Shuzhe Shi, Lingxiao Wang, Kai Zhou

    Abstract: The integration of deep learning techniques and physics-driven designs is reforming the way we address inverse problems, in which accurate physical properties are extracted from complex data sets. This is particularly relevant for quantum chromodynamics (QCD), the theory of strong interactions, with its inherent limitations in observational data and demanding computational approaches. This perspec…

    Submitted 9 January, 2025; originally announced January 2025.

    Comments: 14 pages, 5 figures, submitted version to Nat Rev Phys

    Report number: RIKEN-iTHEMS-Report-25

    Journal ref: Nature Reviews Physics (2025)

  41. arXiv:2501.03575  [pdf, other]

    cs.CV cs.AI cs.LG cs.RO

    Cosmos World Foundation Model Platform for Physical AI

    Authors: NVIDIA, Niket Agarwal, Arslan Ali, Maciej Bala, Yogesh Balaji, Erik Barker, Tiffany Cai, Prithvijit Chattopadhyay, Yongxin Chen, Yin Cui, Yifan Ding, Daniel Dworakowski, Jiaojiao Fan, Michele Fenzi, Francesco Ferroni, Sanja Fidler, Dieter Fox, Songwei Ge, Yunhao Ge, Jinwei Gu, Siddharth Gururani, Ethan He, Jiahui Huang, Jacob Huffman, et al. (54 additional authors not shown)

    Abstract: Physical AI needs to be trained digitally first. It needs a digital twin of itself, the policy model, and a digital twin of the world, the world model. In this paper, we present the Cosmos World Foundation Model Platform to help developers build customized world models for their Physical AI setups. We position a world foundation model as a general-purpose world model that can be fine-tuned into cu…

    Submitted 18 March, 2025; v1 submitted 7 January, 2025; originally announced January 2025.

  42. arXiv:2412.16979  [pdf, other]

    cs.CV

    A Conditional Diffusion Model for Electrical Impedance Tomography Image Reconstruction

    Authors: Shuaikai Shi, Ruiyuan Kang, Panos Liatsis

    Abstract: Electrical impedance tomography (EIT) is a non-invasive imaging technique, capable of reconstructing images of the electrical conductivity of tissues and materials. It is popular in diverse application areas, from medical imaging to industrial process monitoring and tactile sensing, due to its low cost, real-time capabilities and non-ionizing nature. EIT visualizes the conductivity distribution wi…

    Submitted 22 December, 2024; originally announced December 2024.

  43. arXiv:2412.10443  [pdf, other]

    cs.CV cs.AI

    SweetTok: Semantic-Aware Spatial-Temporal Tokenizer for Compact Video Discretization

    Authors: Zhentao Tan, Ben Xue, Jian Jia, Junhao Wang, Wencai Ye, Shaoyun Shi, Mingjie Sun, Wenjin Wu, Quan Chen, Peng Jiang

    Abstract: This paper presents the Semantic-aWarE spatial-tEmporal Tokenizer (SweetTok), a novel video tokenizer to overcome the limitations in current video tokenization methods for compact yet effective discretization. Unlike previous approaches that process flattened local visual patches via direct discretization or adaptive query tokenization, SweetTok propo…

    Submitted 10 March, 2025; v1 submitted 11 December, 2024; originally announced December 2024.

  44. arXiv:2412.09056  [pdf, other]

    cs.AI

    A Context-Enhanced Framework for Sequential Graph Reasoning

    Authors: Shuo Shi, Chao Peng, Chenyang Xu, Zhengfeng Yang

    Abstract: The paper studies sequential reasoning over graph-structured data, which stands as a fundamental task in various trending fields like automated math problem solving and neural graph algorithm learning, attracting a lot of research interest. Simultaneously managing both sequential and graph-structured information in such tasks presents a notable challenge. Over recent years, many neural architectur…

    Submitted 12 December, 2024; originally announced December 2024.

    Comments: Appeared at IJCAI 2024

  45. arXiv:2412.08577  [pdf, other]

    cs.SD cs.MM eess.AS

    Mel-Refine: A Plug-and-Play Approach to Refine Mel-Spectrogram in Audio Generation

    Authors: Hongming Guo, Ruibo Fu, Yizhong Geng, Shuai Liu, Shuchen Shi, Tao Wang, Chunyu Qiang, Chenxing Li, Ya Li, Zhengqi Wen, Yukun Liu, Xuefei Liu

    Abstract: Text-to-audio (TTA) models are capable of generating diverse audio from textual prompts. However, most mainstream TTA models, which predominantly rely on Mel-spectrograms, still face challenges in producing audio with rich content. The intricate details and texture required in Mel-spectrograms for such audio often surpass the models' capacity, leading to outputs that are blurred or lack coherence. I…

    Submitted 11 December, 2024; originally announced December 2024.

  46. arXiv:2412.07094  [pdf, other]

    cs.NI cs.AI

    Access Point Deployment for Localizing Accuracy and User Rate in Cell-Free Systems

    Authors: Fanfei Xu, Shengheng Liu, Zihuan Mao, Shangqing Shi, Dazhuan Xu, Dongming Wang, Yongming Huang

    Abstract: Evolving next-generation mobile networks are designed to provide ubiquitous coverage and networked sensing. By exploiting multi-view sensing and multi-node joint transmission, the cell-free architecture is a promising technique to realize this prospect. This paper aims to tackle the problem of access point (AP) deployment in cell-free systems to balance the sensing accuracy and user rate. By merging the D-optimal…

    Submitted 9 December, 2024; originally announced December 2024.

    Comments: Presented at MobiCom 2024

  47. arXiv:2412.05943  [pdf, other]

    cs.CV

    Adversarial Transferability in Deep Denoising Models: Theoretical Insights and Robustness Enhancement via Out-of-Distribution Typical Set Sampling

    Authors: Jie Ning, Jiebao Sun, Shengzhu Shi, Zhichang Guo, Yao Li, Hongwei Li, Boying Wu

    Abstract: Deep learning-based image denoising models demonstrate remarkable performance, but their lack of robustness analysis remains a significant concern. A major issue is that these models are susceptible to adversarial attacks, where small, carefully crafted perturbations to input data can cause them to fail. Surprisingly, perturbations specifically crafted for one model can easily transfer across vari…

    Submitted 8 December, 2024; originally announced December 2024.

  48. arXiv:2412.05848  [pdf, other]

    cs.CV

    MotionStone: Decoupled Motion Intensity Modulation with Diffusion Transformer for Image-to-Video Generation

    Authors: Shuwei Shi, Biao Gong, Xi Chen, Dandan Zheng, Shuai Tan, Zizheng Yang, Yuyuan Li, Jingwen He, Kecheng Zheng, Jingdong Chen, Ming Yang, Yinqiang Zheng

    Abstract: Image-to-video (I2V) generation is conditioned on a static image and has recently been enhanced by using motion intensity as an additional control signal. These motion-aware models are appealing for generating diverse motion patterns, yet there is no reliable motion estimator for training such models on large-scale in-the-wild video sets. Traditional metrics, e.g., SSIM or optical flow, are h…

    Submitted 8 December, 2024; originally announced December 2024.

  49. arXiv:2412.04261  [pdf, other]

    cs.CL

    Aya Expanse: Combining Research Breakthroughs for a New Multilingual Frontier

    Authors: John Dang, Shivalika Singh, Daniel D'souza, Arash Ahmadian, Alejandro Salamanca, Madeline Smith, Aidan Peppin, Sungjin Hong, Manoj Govindassamy, Terrence Zhao, Sandra Kublik, Meor Amer, Viraat Aryabumi, Jon Ander Campos, Yi-Chern Tan, Tom Kocmi, Florian Strub, Nathan Grinsztajn, Yannis Flet-Berliac, Acyr Locatelli, Hangyu Lin, Dwarak Talupuru, Bharat Venkitesh, David Cairuz, Bowen Yang, et al. (20 additional authors not shown)

    Abstract: We introduce the Aya Expanse model family, a new generation of 8B and 32B parameter multilingual language models, aiming to address the critical challenge of developing highly performant multilingual models that match or surpass the capabilities of monolingual models. By leveraging several years of research at Cohere For AI and Cohere, including advancements in data arbitrage, multilingual prefere…

    Submitted 5 December, 2024; originally announced December 2024.

  50. arXiv:2412.03085  [pdf, other]

    cs.CV

    Mimir: Improving Video Diffusion Models for Precise Text Understanding

    Authors: Shuai Tan, Biao Gong, Yutong Feng, Kecheng Zheng, Dandan Zheng, Shuwei Shi, Yujun Shen, Jingdong Chen, Ming Yang

    Abstract: Text serves as the key control signal in video generation due to its narrative nature. To render text descriptions into video clips, current video diffusion models borrow features from text encoders yet struggle with limited text comprehension. The recent success of large language models (LLMs) showcases the power of decoder-only transformers, which offers three clear benefits for text-to-video (T…

    Submitted 4 December, 2024; originally announced December 2024.