
Showing 1–50 of 212 results for author: Sha, S

Searching in archive cs.
  1. arXiv:2506.14229 [pdf, ps, other]

    cs.CV cs.AI

    HRGS: Hierarchical Gaussian Splatting for Memory-Efficient High-Resolution 3D Reconstruction

    Authors: Changbai Li, Haodong Zhu, Hanlin Chen, Juan Zhang, Tongfei Chen, Shuo Yang, Shuwei Shao, Wenhao Dong, Baochang Zhang

    Abstract: 3D Gaussian Splatting (3DGS) has made significant strides in real-time 3D scene reconstruction, but faces memory scalability issues in high-resolution scenarios. To address this, we propose Hierarchical Gaussian Splatting (HRGS), a memory-efficient framework with hierarchical block-level optimization. First, we generate a global, coarse Gaussian representation from low-resolution data. Then, we pa…

    Submitted 17 June, 2025; originally announced June 2025.

  2. arXiv:2506.13059 [pdf, ps, other]

    cs.CL cs.LG

    Multipole Attention for Efficient Long Context Reasoning

    Authors: Coleman Hooper, Sebastian Zhao, Luca Manolache, Sehoon Kim, Michael W. Mahoney, Yakun Sophia Shao, Kurt Keutzer, Amir Gholami

    Abstract: Large Reasoning Models (LRMs) have shown promising accuracy improvements on complex problem-solving tasks. While these models have attained high accuracy by leveraging additional computation at test time, they need to generate long chain-of-thought reasoning in order to think before answering, which requires generating thousands of tokens. While sparse attention methods can help reduce the KV cach…

    Submitted 15 June, 2025; originally announced June 2025.

    Comments: 15 pages

  3. arXiv:2506.11244 [pdf, ps, other]

    cs.CL

    Iterative Multilingual Spectral Attribute Erasure

    Authors: Shun Shao, Yftah Ziser, Zheng Zhao, Yifu Qiu, Shay B. Cohen, Anna Korhonen

    Abstract: Multilingual representations embed words with similar meanings to share a common semantic space across languages, creating opportunities to transfer debiasing effects between languages. However, existing methods for debiasing are unable to exploit this opportunity because they operate on individual languages. We present Iterative Multilingual Spectral Attribute Erasure (IMSAE), which identifies an…

    Submitted 12 June, 2025; originally announced June 2025.

    Comments: 8 pages, 3 figures

  4. arXiv:2506.10741 [pdf, ps, other]

    cs.CV

    PosterCraft: Rethinking High-Quality Aesthetic Poster Generation in a Unified Framework

    Authors: SiXiang Chen, Jianyu Lai, Jialin Gao, Tian Ye, Haoyu Chen, Hengyu Shi, Shitong Shao, Yunlong Lin, Song Fei, Zhaohu Xing, Yeying Jin, Junfeng Luo, Xiaoming Wei, Lei Zhu

    Abstract: Generating aesthetic posters is more challenging than simple design images: it requires not only precise text rendering but also the seamless integration of abstract artistic content, striking layouts, and overall stylistic harmony. To address this, we propose PosterCraft, a unified framework that abandons prior modular pipelines and rigid, predefined layouts, allowing the model to freely explore…

    Submitted 12 June, 2025; originally announced June 2025.

  5. arXiv:2506.04544 [pdf, other]

    cs.AR cs.AI cs.LG cs.PL

    hdl2v: A Code Translation Dataset for Enhanced LLM Verilog Generation

    Authors: Charles Hong, Brendan Roberts, Huijae An, Alex Um, Advay Ratan, Yakun Sophia Shao

    Abstract: Large language models (LLMs) are playing an increasingly large role in domains such as code generation, including hardware code generation, where Verilog is the key language. However, the amount of publicly available Verilog code pales in comparison to the amount of code available for software languages like Python. In this work, we present hdl2v ("HDL-to-Verilog"), a dataset which seeks to increa…

    Submitted 4 June, 2025; originally announced June 2025.

  6. arXiv:2506.01942 [pdf, ps, other]

    cs.CV

    OD3: Optimization-free Dataset Distillation for Object Detection

    Authors: Salwa K. Al Khatib, Ahmed ElHagry, Shitong Shao, Zhiqiang Shen

    Abstract: Training large neural networks on large-scale datasets requires substantial computational resources, particularly for dense prediction tasks such as object detection. Although dataset distillation (DD) has been proposed to alleviate these demands by synthesizing compact datasets from larger ones, most existing work focuses solely on image classification, leaving the more complex detection setting…

    Submitted 2 June, 2025; originally announced June 2025.

    Comments: Equal Contribution of the first three authors

  7. arXiv:2506.00618 [pdf, ps, other]

    cs.AI

    RiOSWorld: Benchmarking the Risk of Multimodal Computer-Use Agents

    Authors: Jingyi Yang, Shuai Shao, Dongrui Liu, Jing Shao

    Abstract: With the rapid development of multimodal large language models (MLLMs), they are increasingly deployed as autonomous computer-use agents capable of accomplishing complex computer tasks. However, a pressing issue arises: Can the safety risk principles designed and aligned for general MLLMs in dialogue scenarios be effectively transferred to real-world computer-use scenarios? Existing research on ev…

    Submitted 4 June, 2025; v1 submitted 31 May, 2025; originally announced June 2025.

    Comments: 40 pages, 6 figures, Project Page: https://yjyddq.github.io/RiOSWorld.github.io/

  8. arXiv:2505.22863 [pdf, other]

    cs.HC cs.CL

    Large Language Models for Depression Recognition in Spoken Language Integrating Psychological Knowledge

    Authors: Yupei Li, Shuaijie Shao, Manuel Milling, Björn W. Schuller

    Abstract: Depression is a growing concern gaining attention in both public discourse and AI research. While deep neural networks (DNNs) have been used for recognition, they still lack real-world effectiveness. Large language models (LLMs) show strong potential but require domain-specific fine-tuning and struggle with non-textual cues. Since depression is often expressed through vocal tone and behaviour rath…

    Submitted 28 May, 2025; originally announced May 2025.

  9. arXiv:2505.18637 [pdf, ps, other]

    cs.IT

    Neural Coding Is Not Always Semantic: Towards The Standardized Coding Workflow in Semantic Communications

    Authors: Hai-Long Qin, Jincheng Dai, Sixian Wang, Xiaoqi Qin, Shuo Shao, Kai Niu, Wenjun Xu, Ping Zhang

    Abstract: Semantic communication, leveraging advanced deep learning techniques, emerges as a new paradigm that meets the requirements of next-generation wireless networks. However, current semantic communication systems, which employ neural coding for feature extraction from raw data, have not adequately addressed the fundamental question: Is general feature extraction through deep neural networks sufficien…

    Submitted 24 May, 2025; originally announced May 2025.

  10. arXiv:2505.18574 [pdf, ps, other]

    cs.PL cs.AI cs.AR cs.LG

    Autocomp: LLM-Driven Code Optimization for Tensor Accelerators

    Authors: Charles Hong, Sahil Bhatia, Alvin Cheung, Yakun Sophia Shao

    Abstract: Hardware accelerators, especially those designed for tensor processing, have become ubiquitous in today's computing landscape. However, even with significant efforts in building compilers, programming these tensor accelerators remains challenging, leaving much of their potential underutilized. Recently, large language models (LLMs), trained on large amounts of code, have shown significant promise…

    Submitted 5 June, 2025; v1 submitted 24 May, 2025; originally announced May 2025.

  11. arXiv:2505.16505 [pdf, ps, other]

    cs.CL cs.AI cs.HC

    Sparse Activation Editing for Reliable Instruction Following in Narratives

    Authors: Runcong Zhao, Chengyu Cao, Qinglin Zhu, Xiucheng Lv, Shun Shao, Lin Gui, Ruifeng Xu, Yulan He

    Abstract: Complex narrative contexts often challenge language models' ability to follow instructions, and existing benchmarks fail to capture these difficulties. To address this, we propose Concise-SAE, a training-free framework that improves instruction following by identifying and editing instruction-relevant neurons using only natural language instructions, without requiring labelled data. To thoroughly…

    Submitted 22 May, 2025; originally announced May 2025.

  12. arXiv:2505.14135 [pdf, other]

    cs.CV

    Hunyuan-Game: Industrial-grade Intelligent Game Creation Model

    Authors: Ruihuang Li, Caijin Zhou, Shoujian Zheng, Jianxiang Lu, Jiabin Huang, Comi Chen, Junshu Tang, Guangzheng Xu, Jiale Tao, Hongmei Wang, Donghao Li, Wenqing Yu, Senbo Wang, Zhimin Li, Yetshuan Shi, Haoyu Yang, Yukun Wang, Wenxun Dai, Jiaqi Li, Linqing Wang, Qixun Wang, Zhiyong Xu, Yingfang Zhang, Jiangfeng Xiong, Weijie Kong, et al. (33 additional authors not shown)

    Abstract: Intelligent game creation represents a transformative advancement in game development, utilizing generative artificial intelligence to dynamically generate and enhance game content. Despite notable progress in generative models, the comprehensive synthesis of high-quality game assets, including both images and videos, remains a challenging frontier. To create high-fidelity game content that simult…

    Submitted 28 May, 2025; v1 submitted 20 May, 2025; originally announced May 2025.

  13. arXiv:2505.11792 [pdf, ps, other]

    cs.AI

    Solver-Informed RL: Grounding Large Language Models for Authentic Optimization Modeling

    Authors: Yitian Chen, Jingfan Xia, Siyu Shao, Dongdong Ge, Yinyu Ye

    Abstract: Optimization modeling is fundamental to decision-making across diverse domains. Despite progress in automating optimization formulation from natural language descriptions, Large Language Models (LLMs) often struggle to generate formally correct and usable models against hallucinations, posing a challenge for reliable automation. Inspired by the success of Reinforcement Learning (RL) in enhancing L…

    Submitted 28 May, 2025; v1 submitted 16 May, 2025; originally announced May 2025.

  14. arXiv:2504.21738 [pdf, ps, other]

    cs.RO

    LangWBC: Language-directed Humanoid Whole-Body Control via End-to-end Learning

    Authors: Yiyang Shao, Xiaoyu Huang, Bike Zhang, Qiayuan Liao, Yuman Gao, Yufeng Chi, Zhongyu Li, Sophia Shao, Koushil Sreenath

    Abstract: General-purpose humanoid robots are expected to interact intuitively with humans, enabling seamless integration into daily life. Natural language provides the most accessible medium for this purpose. However, translating language into humanoid whole-body motion remains a significant challenge, primarily due to the gap between linguistic understanding and physical actions. In this work, we present…

    Submitted 30 April, 2025; originally announced April 2025.

  15. arXiv:2504.17249 [pdf, other]

    cs.RO

    Demonstrating Berkeley Humanoid Lite: An Open-source, Accessible, and Customizable 3D-printed Humanoid Robot

    Authors: Yufeng Chi, Qiayuan Liao, Junfeng Long, Xiaoyu Huang, Sophia Shao, Borivoje Nikolic, Zhongyu Li, Koushil Sreenath

    Abstract: Despite significant interest and advancements in humanoid robotics, most existing commercially available hardware remains high-cost, closed-source, and non-transparent within the robotics community. This lack of accessibility and customization hinders the growth of the field and the broader development of humanoid technologies. To address these challenges and promote democratization in humanoid ro…

    Submitted 24 April, 2025; originally announced April 2025.

    Comments: Accepted in Robotics: Science and Systems (RSS) 2025

  16. arXiv:2504.16960 [pdf, other]

    cs.IT eess.IV

    Can Knowledge Improve Security? A Coding-Enhanced Jamming Approach for Semantic Communication

    Authors: Weixuan Chen, Qianqian Yang, Shuo Shao, Zhiguo Shi, Jiming Chen, Xuemin Shen

    Abstract: As semantic communication (SemCom) attracts growing attention as a novel communication paradigm, ensuring the security of transmitted semantic information over open wireless channels has become a critical issue. However, traditional encryption methods often introduce significant additional communication overhead to maintain stability, and conventional learning-based secure SemCom methods typically…

    Submitted 6 May, 2025; v1 submitted 23 April, 2025; originally announced April 2025.

  17. arXiv:2504.14152 [pdf, ps, other]

    cs.AR cs.LG

    FGMP: Fine-Grained Mixed-Precision Weight and Activation Quantization for Hardware-Accelerated LLM Inference

    Authors: Coleman Hooper, Charbel Sakr, Ben Keller, Rangharajan Venkatesan, Kurt Keutzer, Sophia Shao, Brucek Khailany

    Abstract: Quantization is a powerful tool to improve large language model (LLM) inference efficiency by utilizing more energy-efficient low-precision datapaths and reducing memory footprint. However, accurately quantizing LLM weights and activations to low precision is challenging without degrading model accuracy. We propose fine-grained mixed precision (FGMP) quantization, a post-training mixed-precision q…

    Submitted 18 April, 2025; originally announced April 2025.

  18. arXiv:2504.13151 [pdf, ps, other]

    cs.LG cs.AI cs.CL

    MIB: A Mechanistic Interpretability Benchmark

    Authors: Aaron Mueller, Atticus Geiger, Sarah Wiegreffe, Dana Arad, Iván Arcuschin, Adam Belfki, Yik Siu Chan, Jaden Fiotto-Kaufman, Tal Haklay, Michael Hanna, Jing Huang, Rohan Gupta, Yaniv Nikankin, Hadas Orgad, Nikhil Prakash, Anja Reusch, Aruna Sankaranarayanan, Shun Shao, Alessandro Stolfo, Martin Tutek, Amir Zur, David Bau, Yonatan Belinkov

    Abstract: How can we know whether new mechanistic interpretability methods achieve real improvements? In pursuit of lasting evaluation standards, we propose MIB, a Mechanistic Interpretability Benchmark, with two tracks spanning four tasks and five models. MIB favors methods that precisely and concisely recover relevant causal pathways or causal variables in neural language models. The circuit localization…

    Submitted 9 June, 2025; v1 submitted 17 April, 2025; originally announced April 2025.

    Comments: Accepted to ICML 2025. Project website at https://mib-bench.github.io

  19. arXiv:2504.01570 [pdf, other]

    stat.ML cs.LG physics.comp-ph stat.ME

    Density estimation via mixture discrepancy and moments

    Authors: Zhengyang Lei, Sihong Shao

    Abstract: With the aim of generalizing histogram statistics to higher dimensional cases, density estimation via discrepancy based sequential partition (DSP) has been proposed [D. Li, K. Yang, W. Wong, Advances in Neural Information Processing Systems (2016) 1099-1107] to learn an adaptive piecewise constant approximation defined on a binary sequential partition of the underlying domain, where the star discr…

    Submitted 2 April, 2025; originally announced April 2025.

  20. arXiv:2504.00587 [pdf, ps, other]

    cs.MA cs.CL

    AgentNet: Decentralized Evolutionary Coordination for LLM-based Multi-Agent Systems

    Authors: Yingxuan Yang, Huacan Chai, Shuai Shao, Yuanyi Song, Siyuan Qi, Renting Rui, Weinan Zhang

    Abstract: The rapid advancement of large language models (LLMs) has enabled the development of multi-agent systems where multiple LLM-based agents collaborate on complex tasks. However, existing systems often rely on centralized coordination, leading to scalability bottlenecks, reduced adaptability, and single points of failure. Privacy and proprietary knowledge concerns further hinder cross-organizational…

    Submitted 29 May, 2025; v1 submitted 1 April, 2025; originally announced April 2025.

  21. arXiv:2503.20211 [pdf, other]

    cs.CV cs.RO

    Synthetic-to-Real Self-supervised Robust Depth Estimation via Learning with Motion and Structure Priors

    Authors: Weilong Yan, Ming Li, Haipeng Li, Shuwei Shao, Robby T. Tan

    Abstract: Self-supervised depth estimation from monocular cameras in diverse outdoor conditions, such as daytime, rain, and nighttime, is challenging due to the difficulty of learning universal representations and the severe lack of labeled real-world adverse data. Previous methods either rely on synthetic inputs and pseudo-depth labels or directly apply daytime strategies to adverse conditions, resulting i…

    Submitted 26 March, 2025; originally announced March 2025.

  22. arXiv:2503.13319 [pdf, other]

    cs.CV

    MagicDistillation: Weak-to-Strong Video Distillation for Large-Scale Few-Step Synthesis

    Authors: Shitong Shao, Hongwei Yi, Hanzhong Guo, Tian Ye, Daquan Zhou, Michael Lingelbach, Zhiqiang Xu, Zeke Xie

    Abstract: Recently, open-source video diffusion models (VDMs), such as WanX, Magic141 and HunyuanVideo, have been scaled to over 10 billion parameters. These large-scale VDMs have demonstrated significant improvements over smaller-scale VDMs across multiple dimensions, including enhanced visual quality and more natural motion dynamics. However, these models face two major limitations: (1) High inference ove…

    Submitted 31 March, 2025; v1 submitted 17 March, 2025; originally announced March 2025.

  23. arXiv:2503.12387 [pdf, other]

    cs.RO

    M2UD: A Multi-model, Multi-scenario, Uneven-terrain Dataset for Ground Robot with Localization and Mapping Evaluation

    Authors: Yanpeng Jia, Shiyi Wang, Shiliang Shao, Yue Wang, Fu Zhang, Ting Wang

    Abstract: Ground robots play a crucial role in inspection, exploration, rescue, and other applications. In recent years, advancements in LiDAR technology have made sensors more accurate, lightweight, and cost-effective. Therefore, researchers increasingly integrate sensors for SLAM studies, providing robust technical support for ground robots and expanding their application domains. Public datasets are ess…

    Submitted 16 March, 2025; originally announced March 2025.

    Comments: 18 pages, 12 figures

  24. arXiv:2503.09662 [pdf, other]

    cs.CV

    CoRe^2: Collect, Reflect and Refine to Generate Better and Faster

    Authors: Shitong Shao, Zikai Zhou, Dian Xie, Yuetong Fang, Tian Ye, Lichen Bai, Zeke Xie

    Abstract: Making text-to-image (T2I) generative model sample both fast and well represents a promising research direction. Previous studies have typically focused on either enhancing the visual quality of synthesized images at the expense of sampling efficiency or dramatically accelerating sampling without improving the base model's generative capacity. Moreover, nearly all inference methods have not been a…

    Submitted 12 March, 2025; originally announced March 2025.

  25. arXiv:2503.05978 [pdf, other]

    cs.CV

    MagicInfinite: Generating Infinite Talking Videos with Your Words and Voice

    Authors: Hongwei Yi, Tian Ye, Shitong Shao, Xuancheng Yang, Jiantong Zhao, Hanzhong Guo, Terrance Wang, Qingyu Yin, Zeke Xie, Lei Zhu, Wei Li, Michael Lingelbach, Daquan Zhou

    Abstract: We present MagicInfinite, a novel diffusion Transformer (DiT) framework that overcomes traditional portrait animation limitations, delivering high-fidelity results across diverse character types: realistic humans, full-body figures, and stylized anime characters. It supports varied facial poses, including back-facing views, and animates single or multiple characters with input masks for precise spe…

    Submitted 7 March, 2025; originally announced March 2025.

    Comments: MagicInfinite is publicly accessible at https://www.hedra.com/. More examples are at https://magicinfinite.github.io/

  26. arXiv:2503.05794 [pdf, other]

    cs.CR cs.AI cs.LG cs.SD eess.AS

    CBW: Towards Dataset Ownership Verification for Speaker Verification via Clustering-based Backdoor Watermarking

    Authors: Yiming Li, Kaiying Yan, Shuo Shao, Tongqing Zhai, Shu-Tao Xia, Zhan Qin, Dacheng Tao

    Abstract: With the increasing adoption of deep learning in speaker verification, large-scale speech datasets have become valuable intellectual property. To audit and prevent the unauthorized usage of these valuable released datasets, especially in commercial or open-source scenarios, we propose a novel dataset ownership verification method. Our approach introduces a clustering-based backdoor watermark (CBW)…

    Submitted 5 April, 2025; v1 submitted 1 March, 2025; originally announced March 2025.

    Comments: 14 pages. The journal extension of our ICASSP'21 paper (arXiv:2010.11607)

  27. PCL: Prompt-based Continual Learning for User Modeling in Recommender Systems

    Authors: Mingdai Yang, Fan Yang, Yanhui Guo, Shaoyuan Xu, Tianchen Zhou, Yetian Chen, Simone Shao, Jia Liu, Yan Gao

    Abstract: User modeling in large e-commerce platforms aims to optimize user experiences by incorporating various customer activities. Traditional models targeting a single task often focus on specific business metrics, neglecting the comprehensive user behavior, and thus limiting their effectiveness. To develop more generalized user representations, some existing work adopts Multi-task Learning (MTL) approac…

    Submitted 26 February, 2025; originally announced February 2025.

    Comments: 5 pages. Accepted by WWW '25 as a short paper

  28. arXiv:2502.18508 [pdf, other]

    cs.CR cs.AI cs.CV cs.LG

    REFINE: Inversion-Free Backdoor Defense via Model Reprogramming

    Authors: Yukun Chen, Shuo Shao, Enhao Huang, Yiming Li, Pin-Yu Chen, Zhan Qin, Kui Ren

    Abstract: Backdoor attacks on deep neural networks (DNNs) have emerged as a significant security threat, allowing adversaries to implant hidden malicious behaviors during the model training phase. Pre-processing-based defense, which is one of the most important defense paradigms, typically focuses on input transformations or backdoor trigger inversion (BTI) to deactivate or eliminate embedded backdoor trigg…

    Submitted 22 February, 2025; originally announced February 2025.

    Comments: This paper is accepted by ICLR 2025. The first two authors contributed equally to this work. Our code is available at BackdoorBox (https://github.com/THUYimingLi/BackdoorBox) and Github repository (https://github.com/WhitolfChen/REFINE). 28 pages

  29. arXiv:2502.13575 [pdf, ps, other]

    cs.LG

    ETS: Efficient Tree Search for Inference-Time Scaling

    Authors: Coleman Hooper, Sehoon Kim, Suhong Moon, Kerem Dilmen, Monishwaran Maheswaran, Nicholas Lee, Michael W. Mahoney, Sophia Shao, Kurt Keutzer, Amir Gholami

    Abstract: Test-time compute scaling has emerged as a new axis along which to improve model accuracy, where additional computation is used at inference time to allow the model to think longer for more challenging problems. One promising approach for test-time compute scaling is search against a process reward model, where a model generates multiple potential candidates at each step of the search, and these p…

    Submitted 11 June, 2025; v1 submitted 19 February, 2025; originally announced February 2025.

    Comments: 15 pages

  30. arXiv:2502.07701   

    cs.CV

    Magic 1-For-1: Generating One Minute Video Clips within One Minute

    Authors: Hongwei Yi, Shitong Shao, Tian Ye, Jiantong Zhao, Qingyu Yin, Michael Lingelbach, Li Yuan, Yonghong Tian, Enze Xie, Daquan Zhou

    Abstract: In this technical report, we present Magic 1-For-1 (Magic141), an efficient video generation model with optimized memory consumption and inference latency. The key idea is simple: factorize the text-to-video generation task into two separate easier tasks for diffusion step distillation, namely text-to-image generation and image-to-video generation. We verify that with the same optimization algorit…

    Submitted 16 February, 2025; v1 submitted 11 February, 2025; originally announced February 2025.

    Comments: Serious updates are needed

  31. arXiv:2502.07644 [pdf, other]

    cs.AI

    SymGPT: Auditing Smart Contracts via Combining Symbolic Execution with Large Language Models

    Authors: Shihao Xia, Mengting He, Shuai Shao, Tingting Yu, Yiying Zhang, Linhai Song

    Abstract: To govern smart contracts running on Ethereum, multiple Ethereum Request for Comment (ERC) standards have been developed, each having a set of rules to guide the behaviors of smart contracts. Violating the ERC rules could cause serious security issues and financial loss, signifying the importance of verifying smart contracts follow ERCs. Today's practices of such verification are to manually audit…

    Submitted 12 February, 2025; v1 submitted 11 February, 2025; originally announced February 2025.

    Comments: 16 pages. arXiv admin note: text overlap with arXiv:2404.04306

  32. arXiv:2501.15509 [pdf, other]

    cs.CR cs.AI cs.LG

    FIT-Print: Towards False-claim-resistant Model Ownership Verification via Targeted Fingerprint

    Authors: Shuo Shao, Haozhe Zhu, Hongwei Yao, Yiming Li, Tianwei Zhang, Zhan Qin, Kui Ren

    Abstract: Model fingerprinting is a widely adopted approach to safeguard the copyright of open-source models by detecting and preventing their unauthorized reuse without modifying the protected model. However, in this paper, we reveal that existing fingerprinting methods are vulnerable to false claim attacks where adversaries falsely assert ownership of third-party non-reused models. We find that this vulne…

    Submitted 23 May, 2025; v1 submitted 26 January, 2025; originally announced January 2025.

  33. arXiv:2501.09520 [pdf, other]

    cs.IT

    RWZC: A Model-Driven Approach for Learning-based Robust Wyner-Ziv Coding

    Authors: Yuxuan Shi, Shuo Shao, Yongpeng Wu, Wenjun Zhang, Merouane Debbah

    Abstract: In this paper, a novel learning-based Wyner-Ziv coding framework is considered under a distributed image transmission scenario, where the correlated source is only available at the receiver. Unlike other learnable frameworks, our approach demonstrates robustness to non-stationary source correlation, where the overlapping information between image pairs varies. Specifically, we first model the affi…

    Submitted 5 February, 2025; v1 submitted 16 January, 2025; originally announced January 2025.

    Comments: 14 pages, 17 figures, accepted by IEEE Journal on Selected Areas in Communications

  34. arXiv:2501.00051 [pdf, other]

    cs.LG cs.AI eess.SY

    DDD-GenDT: Dynamic Data-driven Generative Digital Twin Framework

    Authors: Yu-Zheng Lin, Qinxuan Shi, Zhanglong Yang, Banafsheh Saber Latibari, Sicong Shao, Soheil Salehi, Pratik Satam

    Abstract: Digital twin (DT) technology has emerged as a transformative approach to simulate, predict, and optimize the behavior of physical systems, with applications that span manufacturing, healthcare, climate science, and more. However, the development of DT models often faces challenges such as high data requirements, integration complexity, and limited adaptability to dynamic changes in physical system…

    Submitted 27 December, 2024; originally announced January 2025.

  35. arXiv:2412.18263 [pdf, other]

    cs.LG math-ph physics.chem-ph physics.comp-ph quant-ph

    High-Rank Irreducible Cartesian Tensor Decomposition and Bases of Equivariant Spaces

    Authors: Shihao Shao, Yikang Li, Zhouchen Lin, Qinghua Cui

    Abstract: Irreducible Cartesian tensors (ICTs) play a crucial role in the design of equivariant graph neural networks, as well as in theoretical chemistry and chemical physics. Meanwhile, the design space of available linear operations on tensors that preserve symmetry presents a significant challenge. The ICT decomposition and a basis of this equivariant space are difficult to obtain for high-rank tensors.…

    Submitted 19 March, 2025; v1 submitted 24 December, 2024; originally announced December 2024.

    Comments: 48 pages

  36. arXiv:2412.12602 [pdf, other]

    cs.RO cs.HC

    Don't Yell at Your Robot: Physical Correction as the Collaborative Interface for Language Model Powered Robots

    Authors: Chuye Zhang, Yifei Simon Shao, Harshil Parekh, Junyao Shi, Pratik Chaudhari, Vijay Kumar, Nadia Figueroa

    Abstract: We present a novel approach for enhancing human-robot collaboration using physical interactions for real-time error correction of large language model (LLM) powered robots. Unlike other methods that rely on verbal or text commands, the robot leverages an LLM to proactively execute 6 DoF linear Dynamical System (DS) commands using a description of the scene in natural language. During motion, a hu…

    Submitted 17 December, 2024; originally announced December 2024.

    Comments: 7 pages, 3 figures; Generative Modeling meets HRI - RSS'24 Workshop

  37. arXiv:2412.10891 [pdf, other]

    cs.CV cs.LG

    Zigzag Diffusion Sampling: Diffusion Models Can Self-Improve via Self-Reflection

    Authors: Lichen Bai, Shitong Shao, Zikai Zhou, Zipeng Qi, Zhiqiang Xu, Haoyi Xiong, Zeke Xie

    Abstract: Diffusion models, the most popular generative paradigm so far, can inject conditional information into the generation path to guide the latent towards desired directions. However, existing text-to-image diffusion models often fail to maintain high image quality and high prompt-image alignment for those challenging prompts. To mitigate this issue and enhance existing pretrained diffusion models, we…

    Submitted 17 December, 2024; v1 submitted 14 December, 2024; originally announced December 2024.

  38. arXiv:2412.08604 [pdf, other]

    cs.IR cs.AI cs.LG stat.ML

    Preference Discerning with LLM-Enhanced Generative Retrieval

    Authors: Fabian Paischer, Liu Yang, Linfeng Liu, Shuai Shao, Kaveh Hassani, Jiacheng Li, Ricky Chen, Zhang Gabriel Li, Xialo Gao, Wei Shao, Xue Feng, Nima Noorshams, Sem Park, Bo Long, Hamid Eghbalzadeh

    Abstract: Sequential recommendation systems aim to provide personalized recommendations for users based on their interaction history. To achieve this, they often incorporate auxiliary information, such as textual descriptions of items and auxiliary tasks, like predicting user preferences and intent. Despite numerous efforts to enhance these models, they still suffer from limited personalization. To address…

    Submitted 11 December, 2024; originally announced December 2024.

    Comments: 11 pages + references and appendix

  39. arXiv:2412.03754 [pdf, other]

    cs.SE

    Enhancing IR-based Fault Localization using Large Language Models

    Authors: Shuai Shao, Tingting Yu

    Abstract: Information Retrieval-based Fault Localization (IRFL) techniques aim to identify source files containing the root causes of reported failures. While existing techniques excel in ranking source files, challenges persist in bug report analysis and query construction, leading to potential information loss. Leveraging large language models like GPT-4, this paper enhances IRFL by categorizing bug repor…

    Submitted 4 December, 2024; originally announced December 2024.

  40. arXiv:2411.19946 [pdf, other]

    cs.CV cs.AI cs.LG

    DELT: A Simple Diversity-driven EarlyLate Training for Dataset Distillation

    Authors: Zhiqiang Shen, Ammar Sherif, Zeyuan Yin, Shitong Shao

    Abstract: Recent advances in dataset distillation have led to solutions in two main directions. The conventional batch-to-batch matching mechanism is ideal for small-scale datasets and includes bi-level optimization methods on models and syntheses, such as FRePo, RCIG, and RaT-BPTT, as well as other methods like distribution matching, gradient matching, and weight trajectory matching. Conversely, batch-to-g…

    Submitted 6 June, 2025; v1 submitted 29 November, 2024; originally announced November 2024.

    Comments: CVPR 2025

  41. arXiv:2411.18814 [pdf, other]

    cs.IR cs.AI

    Unifying Generative and Dense Retrieval for Sequential Recommendation

    Authors: Liu Yang, Fabian Paischer, Kaveh Hassani, Jiacheng Li, Shuai Shao, Zhang Gabriel Li, Yun He, Xue Feng, Nima Noorshams, Sem Park, Bo Long, Robert D Nowak, Xiaoli Gao, Hamid Eghbalzadeh

    Abstract: Sequential dense retrieval models utilize advanced sequence learning techniques to compute item and user representations, which are then used to rank relevant items for a user through inner product computation between the user and all item representations. However, this approach requires storing a unique representation for each item, resulting in significant memory requirements as the number of it…

    Submitted 6 December, 2024; v1 submitted 27 November, 2024; originally announced November 2024.

  42. arXiv:2411.11478  [pdf, other]

    cs.CR

    SoK: On the Role and Future of AIGC Watermarking in the Era of Gen-AI

    Authors: Kui Ren, Ziqi Yang, Li Lu, Jian Liu, Yiming Li, Jie Wan, Xiaodi Zhao, Xianheng Feng, Shuo Shao

    Abstract: The rapid advancement of AI technology, particularly in generating AI-generated content (AIGC), has transformed numerous fields, e.g., art video generation, but also brings new risks, including the misuse of AI for misinformation and intellectual property theft. To address these concerns, AIGC watermarks offer an effective solution to mitigate malicious activities. However, existing watermarking s…

    Submitted 19 November, 2024; v1 submitted 18 November, 2024; originally announced November 2024.

  43. arXiv:2411.10781  [pdf, other]

    cs.CV cs.LG

    Bag of Design Choices for Inference of High-Resolution Masked Generative Transformer

    Authors: Shitong Shao, Zikai Zhou, Tian Ye, Lichen Bai, Zhiqiang Xu, Zeke Xie

    Abstract: Text-to-image diffusion models (DMs) develop at an unprecedented pace, supported by thorough theoretical exploration and empirical analysis. Unfortunately, the discrepancy between DMs and autoregressive models (ARMs) complicates the path toward achieving the goal of unified vision and language generation. Recently, the masked generative Transformer (MGT) serves as a promising intermediary between…

    Submitted 27 February, 2025; v1 submitted 16 November, 2024; originally announced November 2024.

  44. arXiv:2411.09502  [pdf, other]

    cs.LG cs.CV

    Golden Noise for Diffusion Models: A Learning Framework

    Authors: Zikai Zhou, Shitong Shao, Lichen Bai, Zhiqiang Xu, Bo Han, Zeke Xie

    Abstract: Text-to-image diffusion model is a popular paradigm that synthesizes personalized images by providing a text prompt and a random Gaussian noise. While people observe that some noises are "golden noises" that can achieve better text-image alignment and higher human preference than others, we still lack a machine learning framework to obtain those golden noises. To learn golden noises for diffusio…

    Submitted 17 January, 2025; v1 submitted 14 November, 2024; originally announced November 2024.

  45. arXiv:2411.02612  [pdf, other]

    cs.CC

    Eulerian orientations and Hadamard codes: A novel connection via counting

    Authors: Shuai Shao, Zhuxiao Tang

    Abstract: We discover a novel connection between two classical mathematical notions, Eulerian orientations and Hadamard codes by studying the counting problem of Eulerian orientations (#EO) with local constraint functions imposed on vertices. We present two special classes of constraint functions and a chain reaction algorithm, and show that the #EO problem defined by each class alone is polynomial-time s…

    Submitted 4 November, 2024; originally announced November 2024.

    ACM Class: F.0

  46. arXiv:2411.00780  [pdf, other]

    cs.IR

    Proactive Detection and Calibration of Seasonal Advertisements with Multimodal Large Language Models

    Authors: Hamid Eghbalzadeh, Shuai Shao, Saurabh Verma, Venugopal Mani, Hongnan Wang, Jigar Madia, Vitali Karpinchyk, Andrey Malevich

    Abstract: A myriad of factors affect large scale ads delivery systems and influence both user experience and revenue. One such factor is proactive detection and calibration of seasonal advertisements to help with increasing conversion and user satisfaction. In this paper, we present Proactive Detection and Calibration of Seasonal Advertisements (PDCaSA), a research problem that is of interest for the ads ra…

    Submitted 16 October, 2024; originally announced November 2024.

  47. arXiv:2410.18715  [pdf, other]

    cs.CV

    ChatSearch: a Dataset and a Generative Retrieval Model for General Conversational Image Retrieval

    Authors: Zijia Zhao, Longteng Guo, Tongtian Yue, Erdong Hu, Shuai Shao, Zehuan Yuan, Hua Huang, Jing Liu

    Abstract: In this paper, we investigate the task of general conversational image retrieval on open-domain images. The objective is to search for images based on interactive conversations between humans and computers. To advance this task, we curate a dataset called ChatSearch. This dataset includes a multi-round multimodal conversational context query for each target image, thereby requiring the retrieval s…

    Submitted 24 October, 2024; originally announced October 2024.

  48. arXiv:2410.18371  [pdf, other]

    cs.SD cs.AI eess.AS

    Gibberish is All You Need for Membership Inference Detection in Contrastive Language-Audio Pretraining

    Authors: Ruoxi Cheng, Yizhong Ding, Shuirong Cao, Shitong Shao, Zhiqiang Wang

    Abstract: Audio can disclose PII, particularly when combined with related text data. Therefore, it is essential to develop tools to detect privacy leakage in Contrastive Language-Audio Pretraining (CLAP). Existing MIAs need audio as input, risking exposure of voiceprint and requiring costly shadow models. We first propose PRMID, a membership inference detector based on probability ranking given by CLAP, which d…

    Submitted 2 November, 2024; v1 submitted 23 October, 2024; originally announced October 2024.

  49. arXiv:2410.13240  [pdf, other]

    cs.RO

    TRLO: An Efficient LiDAR Odometry with 3D Dynamic Object Tracking and Removal

    Authors: Yanpeng Jia, Ting Wang, Xieyuanli Chen, Shiliang Shao

    Abstract: Simultaneous state estimation and mapping is an essential capability for mobile robots working in dynamic urban environments. The majority of existing SLAM solutions heavily rely on a primarily static assumption. However, due to the presence of moving vehicles and pedestrians, this assumption does not always hold, leading to decreased localization accuracy and distorted maps. To address this challe…

    Submitted 17 October, 2024; originally announced October 2024.

    Comments: 8 pages, 5 figures

  50. arXiv:2410.12142   

    cs.RO eess.SY

    Design Space Exploration of Embedded SoC Architectures for Real-Time Optimal Control

    Authors: Kris Shengjun Dong, Dima Nikiforov, Widyadewi Soedarmadji, Minh Nguyen, Christopher Fletcher, Yakun Sophia Shao

    Abstract: Empowering resource-limited robots to execute computationally intensive tasks such as locomotion and manipulation is challenging. This project provides a comprehensive design space exploration to determine optimal hardware computation architectures suitable for model-based control algorithms. We profile and optimize representative architectural designs across general-purpose scalar, vector process…

    Submitted 24 October, 2024; v1 submitted 15 October, 2024; originally announced October 2024.

    Comments: This submission has been withdrawn following further internal review and discussions with collaborators, as it was determined that the current version does not meet our intended standards, and will not be updated further. This decision aligns with internal changes and agreements that were finalized post-submission.