Showing 1–50 of 313 results for author: Sun, B

Searching in archive cs.
  1. arXiv:2507.06261  [pdf, ps, other]

    cs.CL cs.AI

    Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities

    Authors: Gheorghe Comanici, Eric Bieber, Mike Schaekermann, Ice Pasupat, Noveen Sachdeva, Inderjit Dhillon, Marcel Blistein, Ori Ram, Dan Zhang, Evan Rosen, Luke Marris, Sam Petulla, Colin Gaffney, Asaf Aharoni, Nathan Lintz, Tiago Cardal Pais, Henrik Jacobsson, Idan Szpektor, Nan-Jiang Jiang, Krishna Haridasan, Ahmed Omran, Nikunj Saunshi, Dara Bahri, Gaurav Mishra, Eric Chu, et al. (3278 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 2.X model family: Gemini 2.5 Pro and Gemini 2.5 Flash, as well as our earlier Gemini 2.0 Flash and Flash-Lite models. Gemini 2.5 Pro is our most capable model yet, achieving SoTA performance on frontier coding and reasoning benchmarks. In addition to its incredible coding and reasoning skills, Gemini 2.5 Pro is a thinking model that excels at multimodal unde…

    Submitted 7 July, 2025; originally announced July 2025.

    Comments: 72 pages, 17 figures

  2. arXiv:2507.01634  [pdf, ps, other]

    cs.CV cs.AI

    Depth Anything at Any Condition

    Authors: Boyuan Sun, Modi Jin, Bowen Yin, Qibin Hou

    Abstract: We present Depth Anything at Any Condition (DepthAnything-AC), a foundation monocular depth estimation (MDE) model capable of handling diverse environmental conditions. Previous foundation MDE models achieve impressive performance across general scenes but do not perform well in complex open-world environments that involve challenging conditions, such as illumination variations, adverse weather, and…

    Submitted 2 July, 2025; originally announced July 2025.

  3. arXiv:2506.21862  [pdf, ps, other]

    cs.CV cs.AI cs.HC cs.MM

    LLaVA-Scissor: Token Compression with Semantic Connected Components for Video LLMs

    Authors: Boyuan Sun, Jiaxing Zhao, Xihan Wei, Qibin Hou

    Abstract: In this paper, we present LLaVA-Scissor, a training-free token compression strategy designed for video multimodal large language models. Previous methods mostly attempt to compress tokens based on attention scores, but fail to effectively capture all semantic regions and often lead to token redundancy. Differently, we propose to leverage the Semantic Connected Components (SCC) approach that assign…

    Submitted 26 June, 2025; originally announced June 2025.

    Comments: 21 pages, 4 figures, 7 tables

  4. arXiv:2506.21277  [pdf, ps, other]

    cs.CV cs.CL

    HumanOmniV2: From Understanding to Omni-Modal Reasoning with Context

    Authors: Qize Yang, Shimin Yao, Weixuan Chen, Shenghao Fu, Detao Bai, Jiaxing Zhao, Boyuan Sun, Bowen Yin, Xihan Wei, Jingren Zhou

    Abstract: With the rapid evolution of multimodal large language models, the capacity to deeply understand and interpret human intentions has emerged as a critical capability, which demands detailed and thoughtful reasoning. In recent studies, Reinforcement Learning (RL) has demonstrated potential in enhancing the reasoning capabilities of Large Language Models (LLMs). Nonetheless, the challenges associated…

    Submitted 26 June, 2025; originally announced June 2025.

  5. arXiv:2506.21017  [pdf, ps, other]

    cs.CV cs.AI

    Multimodal Prompt Alignment for Facial Expression Recognition

    Authors: Fuyan Ma, Yiran He, Bin Sun, Shutao Li

    Abstract: Prompt learning has been widely adopted to efficiently adapt vision-language models (VLMs) like CLIP for various downstream tasks. Despite their success, current VLM-based facial expression recognition (FER) methods struggle to capture fine-grained textual-visual relationships, which are essential for distinguishing subtle differences between facial expressions. To address this challenge, we propo…

    Submitted 26 June, 2025; originally announced June 2025.

    Comments: To appear in ICCV2025

  6. arXiv:2506.19488  [pdf, ps, other]

    cs.CV

    SceneCrafter: Controllable Multi-View Driving Scene Editing

    Authors: Zehao Zhu, Yuliang Zou, Chiyu Max Jiang, Bo Sun, Vincent Casser, Xiukun Huang, Jiahao Wang, Zhenpei Yang, Ruiqi Gao, Leonidas Guibas, Mingxing Tan, Dragomir Anguelov

    Abstract: Simulation is crucial for developing and evaluating autonomous vehicle (AV) systems. Recent literature builds on a new generation of generative models to synthesize highly realistic images for full-stack simulation. However, purely synthetically generated scenes are not grounded in reality and have difficulty in inspiring confidence in the relevance of its outcomes. Editing models, on the other ha…

    Submitted 24 June, 2025; originally announced June 2025.

    Comments: CVPR 2025

  7. arXiv:2506.17945  [pdf, ps, other]

    cs.MA

    Optimization of Flying Ad Hoc Network Topology and Collaborative Path Planning for Multiple UAVs

    Authors: Ming He, Peizhao Wang, Haihua Chen, Bin Sun, Hongpeng Wang

    Abstract: Multiple unmanned aerial vehicles (UAVs) play a vital role in monitoring and data collection in wide area environments with harsh conditions. In most scenarios, issues such as real-time data retrieval and real-time UAV positioning are often disregarded, essentially neglecting the communication constraints. In this paper, we comprehensively address both the coverage of the target area and the data…

    Submitted 22 June, 2025; originally announced June 2025.

  8. arXiv:2506.17232  [pdf, ps, other]

    cs.LG cs.AI cs.CV

    PCaM: A Progressive Focus Attention-Based Information Fusion Method for Improving Vision Transformer Domain Adaptation

    Authors: Zelin Zang, Fei Wang, Liangyu Li, Jinlin Wu, Chunshui Zhao, Zhen Lei, Baigui Sun

    Abstract: Unsupervised Domain Adaptation (UDA) aims to transfer knowledge from a labeled source domain to an unlabeled target domain. Recent UDA methods based on Vision Transformers (ViTs) have achieved strong performance through attention-based feature alignment. However, we identify a key limitation: foreground object mismatch, where the discrepancy in foreground object size and spatial distribution acros…

    Submitted 27 May, 2025; originally announced June 2025.

  9. arXiv:2506.17007  [pdf, ps, other]

    cs.LG

    Robust Reinforcement Learning for Discrete Compositional Generation via General Soft Operators

    Authors: Marco Jiralerspong, Esther Derman, Danilo Vucetic, Nikolay Malkin, Bilun Sun, Tianyu Zhang, Pierre-Luc Bacon, Gauthier Gidel

    Abstract: A major bottleneck in scientific discovery involves narrowing a large combinatorial set of objects, such as proteins or molecules, to a small set of promising candidates. While this process largely relies on expert knowledge, recent methods leverage reinforcement learning (RL) to enhance this filtering. They achieve this by estimating proxy reward functions from available datasets and using regula…

    Submitted 20 June, 2025; originally announced June 2025.

  10. arXiv:2506.12103  [pdf, other]

    cs.AI cs.CY cs.LG

    The Amazon Nova Family of Models: Technical Report and Model Card

    Authors: Amazon AGI, Aaron Langford, Aayush Shah, Abhanshu Gupta, Abhimanyu Bhatter, Abhinav Goyal, Abhinav Mathur, Abhinav Mohanty, Abhishek Kumar, Abhishek Sethi, Abi Komma, Abner Pena, Achin Jain, Adam Kunysz, Adam Opyrchal, Adarsh Singh, Aditya Rawal, Adok Achar Budihal Prasad, Adrià de Gispert, Agnika Kumar, Aishwarya Aryamane, Ajay Nair, Akilan M, Akshaya Iyengar, Akshaya Vishnu Kudlu Shanbhogue, et al. (761 additional authors not shown)

    Abstract: We present Amazon Nova, a new generation of state-of-the-art foundation models that deliver frontier intelligence and industry-leading price performance. Amazon Nova Pro is a highly-capable multimodal model with the best combination of accuracy, speed, and cost for a wide range of tasks. Amazon Nova Lite is a low-cost multimodal model that is lightning fast for processing images, video, documents…

    Submitted 17 March, 2025; originally announced June 2025.

    Comments: 48 pages, 10 figures

    Report number: 20250317

  11. arXiv:2506.11419  [pdf, ps, other]

    cs.AI cs.RO

    FocalAD: Local Motion Planning for End-to-End Autonomous Driving

    Authors: Bin Sun, Boao Zhang, Jiayi Lu, Xinjie Feng, Jiachen Shang, Rui Cao, Mengchao Zheng, Chuanye Wang, Shichun Yang, Yaoguang Cao, Ziying Song

    Abstract: In end-to-end autonomous driving, the motion prediction plays a pivotal role in ego-vehicle planning. However, existing methods often rely on globally aggregated motion features, ignoring the fact that planning decisions are primarily influenced by a small number of locally interacting agents. Failing to attend to these critical local interactions can obscure potential risks and undermine planning…

    Submitted 12 June, 2025; originally announced June 2025.

  12. arXiv:2506.08747  [pdf, ps, other]

    cs.AI stat.ML

    A Sample Efficient Conditional Independence Test in the Presence of Discretization

    Authors: Boyang Sun, Yu Yao, Xinshuai Dong, Zongfang Liu, Tongliang Liu, Yumou Qiu, Kun Zhang

    Abstract: In many real-world scenarios, interested variables are often represented as discretized values due to measurement limitations. Applying Conditional Independence (CI) tests directly to such discretized data, however, can lead to incorrect conclusions. To address this, recent advancements have sought to infer the correct CI relationship between the latent variables through binarizing observed data.…

    Submitted 10 June, 2025; originally announced June 2025.

  13. arXiv:2506.06297  [pdf, ps, other]

    cs.LG cs.AI

    Optimal patient allocation for echocardiographic assessments

    Authors: Bozhi Sun, Seda Tierney, Jeffrey A. Feinstein, Frederick Damen, Alison L. Marsden, Daniele E. Schiavazzi

    Abstract: Scheduling echocardiographic exams in a hospital presents significant challenges due to non-deterministic factors (e.g., patient no-shows, patient arrival times, diverse exam durations, etc.) and asymmetric resource constraints between fetal and non-fetal patient streams. To address these challenges, we first conducted extensive pre-processing on one week of operational data from the Echo Laborato…

    Submitted 17 May, 2025; originally announced June 2025.

  14. arXiv:2506.05606  [pdf, ps, other]

    cs.CL cs.HC

    OPeRA: A Dataset of Observation, Persona, Rationale, and Action for Evaluating LLMs on Human Online Shopping Behavior Simulation

    Authors: Ziyi Wang, Yuxuan Lu, Wenbo Li, Amirali Amini, Bo Sun, Yakov Bart, Weimin Lyu, Jiri Gesi, Tian Wang, Jing Huang, Yu Su, Upol Ehsan, Malihe Alikhani, Toby Jia-Jun Li, Lydia Chilton, Dakuo Wang

    Abstract: Can large language models (LLMs) accurately simulate the next web action of a specific user? While LLMs have shown promising capabilities in generating "believable" human behaviors, evaluating their ability to mimic real user behaviors remains an open challenge, largely due to the lack of high-quality, publicly available datasets that capture both the observable actions and the internal reasonin…

    Submitted 7 July, 2025; v1 submitted 5 June, 2025; originally announced June 2025.

  15. arXiv:2506.04627  [pdf, other]

    cs.RO physics.flu-dyn

    Enhancing Efficiency and Propulsion in Bio-mimetic Robotic Fish through End-to-End Deep Reinforcement Learning

    Authors: Xinyu Cui, Boai Sun, Yi Zhu, Ning Yang, Haifeng Zhang, Weicheng Cui, Dixia Fan, Jun Wang

    Abstract: Aquatic organisms are known for their ability to generate efficient propulsion with low energy expenditure. While existing research has sought to leverage bio-inspired structures to reduce energy costs in underwater robotics, the crucial role of control policies in enhancing efficiency has often been overlooked. In this study, we optimize the motion of a bio-mimetic robotic fish using deep reinfor…

    Submitted 5 June, 2025; originally announced June 2025.

    Journal ref: Physics of Fluids 36 (2024) 031910

  16. arXiv:2506.02929  [pdf, ps, other]

    cs.AR

    Large Processor Chip Model

    Authors: Kaiyan Chang, Mingzhi Chen, Yunji Chen, Zhirong Chen, Dongrui Fan, Junfeng Gong, Nan Guo, Yinhe Han, Qinfen Hao, Shuo Hou, Xuan Huang, Pengwei Jin, Changxin Ke, Cangyuan Li, Guangli Li, Huawei Li, Kuan Li, Naipeng Li, Shengwen Liang, Cheng Liu, Hongwei Liu, Jiahua Liu, Junliang Lv, Jianan Mu, Jin Qin, et al. (18 additional authors not shown)

    Abstract: Computer System Architecture serves as a crucial bridge between software applications and the underlying hardware, encompassing components like compilers, CPUs, coprocessors, and RTL designs. Its development, from early mainframes to modern domain-specific architectures, has been driven by rising computational demands and advancements in semiconductor technology. However, traditional paradigms in…

    Submitted 3 June, 2025; originally announced June 2025.

  17. arXiv:2505.24654  [pdf, ps, other]

    cs.RO cs.CV

    Black-box Adversarial Attacks on CNN-based SLAM Algorithms

    Authors: Maria Rafaela Gkeka, Bowen Sun, Evgenia Smirni, Christos D. Antonopoulos, Spyros Lalis, Nikolaos Bellas

    Abstract: Continuous advancements in deep learning have led to significant progress in feature detection, resulting in enhanced accuracy in tasks like Simultaneous Localization and Mapping (SLAM). Nevertheless, the vulnerability of deep neural networks to adversarial attacks remains a challenge for their reliable deployment in applications, such as navigation of autonomous agents. Even though CNN-based SLAM…

    Submitted 30 May, 2025; originally announced May 2025.

    Comments: 9 pages, 8 figures

    MSC Class: 68T40; 68T45; 68M25;

  18. arXiv:2505.16314  [pdf, ps, other]

    cs.CV cs.AI

    NTIRE 2025 challenge on Text to Image Generation Model Quality Assessment

    Authors: Shuhao Han, Haotian Fan, Fangyuan Kong, Wenjie Liao, Chunle Guo, Chongyi Li, Radu Timofte, Liang Li, Tao Li, Junhui Cui, Yunqiu Wang, Yang Tai, Jingwei Sun, Jianhui Sun, Xinli Yue, Tianyi Wang, Huan Hou, Junda Lu, Xinyang Huang, Zitang Zhou, Zijian Zhang, Xuhui Zheng, Xuecheng Wu, Chong Peng, Xuezhi Cao, et al. (90 additional authors not shown)

    Abstract: This paper reports on the NTIRE 2025 challenge on Text to Image (T2I) generation model quality assessment, which will be held in conjunction with the New Trends in Image Restoration and Enhancement Workshop (NTIRE) at CVPR 2025. The aim of this challenge is to address the fine-grained quality assessment of text-to-image generation models. This challenge evaluates text-to-image models from two aspe…

    Submitted 22 May, 2025; originally announced May 2025.

  19. arXiv:2505.12334  [pdf, ps, other]

    cs.AI

    Enhancing User-Oriented Proactivity in Open-Domain Dialogues with Critic Guidance

    Authors: Yufeng Wang, Jinwu Hu, Ziteng Huang, Kunyang Lin, Zitian Zhang, Peihao Chen, Yu Hu, Qianyue Wang, Zhuliang Yu, Bin Sun, Xiaofen Xing, Qingfang Zheng, Mingkui Tan

    Abstract: Open-domain dialogue systems aim to generate natural and engaging conversations, providing significant practical value in real applications such as social robotics and personal assistants. The advent of large language models (LLMs) has greatly advanced this field by improving context understanding and conversational fluency. However, existing LLM-based dialogue systems often fall short in proactiv…

    Submitted 18 May, 2025; originally announced May 2025.

    Comments: 9 pages, 7 figures

  20. arXiv:2505.11554  [pdf, ps, other]

    math.OC cs.AR cs.DC cs.OS

    Multi-Objective Memory Bandwidth Regulation and Cache Partitioning for Multicore Real-Time Systems

    Authors: Binqi Sun, Zhihang Wei, Andrea Bastoni, Debayan Roy, Mirco Theile, Tomasz Kloda, Rodolfo Pellizzoni, Marco Caccamo

    Abstract: Memory bandwidth regulation and cache partitioning are widely used techniques for achieving predictable timing in real-time computing systems. Combined with partitioned scheduling, these methods require careful co-allocation of tasks and resources to cores, as task execution times strongly depend on available allocated resources. To address this challenge, this paper presents a 0-1 linear program…

    Submitted 15 May, 2025; originally announced May 2025.

    Comments: Accepted in the 37th Euromicro Conference on Real-Time Systems (ECRTS 2025)

  21. arXiv:2504.16062  [pdf, other]

    cs.RO cs.CV

    ForesightNav: Learning Scene Imagination for Efficient Exploration

    Authors: Hardik Shah, Jiaxu Xing, Nico Messikommer, Boyang Sun, Marc Pollefeys, Davide Scaramuzza

    Abstract: Understanding how humans leverage prior knowledge to navigate unseen environments while making exploratory decisions is essential for developing autonomous robots with similar abilities. In this work, we propose ForesightNav, a novel exploration strategy inspired by human imagination and reasoning. Our approach equips robotic agents with the capability to predict contextual information, such as oc…

    Submitted 5 May, 2025; v1 submitted 22 April, 2025; originally announced April 2025.

    Journal ref: IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Nashville, 2025

  22. arXiv:2504.03612  [pdf, other]

    cs.CL

    AIR: A Systematic Analysis of Annotations, Instructions, and Response Pairs in Preference Dataset

    Authors: Bingxiang He, Wenbin Zhang, Jiaxi Song, Cheng Qian, Zixuan Fu, Bowen Sun, Ning Ding, Haiwen Hong, Longtao Huang, Hui Xue, Ganqu Cui, Wanxiang Che, Zhiyuan Liu, Maosong Sun

    Abstract: Preference learning is critical for aligning large language models (LLMs) with human values, yet its success hinges on high-quality datasets comprising three core components: Preference Annotations, Instructions, and Response Pairs. Current approaches conflate these components, obscuring their individual impacts and hindering systematic optimization. In this work, we pro…

    Submitted 4 April, 2025; originally announced April 2025.

    Comments: 29 pages, 11 figures

  23. arXiv:2504.00992  [pdf, other]

    cs.CV

    SuperDec: 3D Scene Decomposition with Superquadric Primitives

    Authors: Elisabetta Fedele, Boyang Sun, Leonidas Guibas, Marc Pollefeys, Francis Engelmann

    Abstract: We present SuperDec, an approach for creating compact 3D scene representations via decomposition into superquadric primitives. While most recent works leverage geometric primitives to obtain photorealistic 3D scene representations, we propose to leverage them to obtain a compact yet expressive representation. We propose to solve the problem locally on individual objects and leverage the capabiliti…

    Submitted 1 April, 2025; originally announced April 2025.

  24. arXiv:2503.23307  [pdf, other]

    cs.CV

    MoCha: Towards Movie-Grade Talking Character Synthesis

    Authors: Cong Wei, Bo Sun, Haoyu Ma, Ji Hou, Felix Juefei-Xu, Zecheng He, Xiaoliang Dai, Luxin Zhang, Kunpeng Li, Tingbo Hou, Animesh Sinha, Peter Vajda, Wenhu Chen

    Abstract: Recent advancements in video generation have achieved impressive motion realism, yet they often overlook character-driven storytelling, a crucial task for automated film, animation generation. We introduce Talking Characters, a more realistic task to generate talking character animations directly from speech and text. Unlike talking head, Talking Characters aims at generating the full portrait of…

    Submitted 30 March, 2025; originally announced March 2025.

    Comments: https://congwei1230.github.io/MoCha/

  25. arXiv:2503.18988  [pdf, other]

    cs.CV cs.AI cs.LG cs.RO

    SG-Tailor: Inter-Object Commonsense Relationship Reasoning for Scene Graph Manipulation

    Authors: Haoliang Shang, Hanyu Wu, Guangyao Zhai, Boyang Sun, Fangjinhua Wang, Federico Tombari, Marc Pollefeys

    Abstract: Scene graphs capture complex relationships among objects, serving as strong priors for content generation and manipulation. Yet, reasonably manipulating scene graphs -- whether by adding nodes or modifying edges -- remains a challenging and untouched task. Tasks such as adding a node to the graph or reasoning about a node's relationships with all others are computationally intractable, as even a s…

    Submitted 23 March, 2025; originally announced March 2025.

    Comments: The code will be available at https://github.com/josef5838/SG-Tailor

  26. arXiv:2503.16275  [pdf, other]

    cs.RO

    Loop Closure from Two Views: Revisiting PGO for Scalable Trajectory Estimation through Monocular Priors

    Authors: Tian Yi Lim, Boyang Sun, Marc Pollefeys, Hermann Blum

    Abstract: (Visual) Simultaneous Localization and Mapping (SLAM) remains a fundamental challenge in enabling autonomous systems to navigate and understand large-scale environments. Traditional SLAM approaches struggle to balance efficiency and accuracy, particularly in large-scale settings where extensive computational resources are required for scene reconstruction and Bundle Adjustment (BA). However, this…

    Submitted 20 March, 2025; originally announced March 2025.

  27. arXiv:2503.12404  [pdf, other]

    cs.CV

    SAM2-ELNet: Label Enhancement and Automatic Annotation for Remote Sensing Segmentation

    Authors: Jianhao Yang, Wenshuo Yu, Yuanchao Lv, Jiance Sun, Bokang Sun, Mingyang Liu

    Abstract: Remote sensing image segmentation is crucial for environmental monitoring, disaster assessment, and resource management, directly affecting the accuracy and efficiency of surface information extraction. The performance of existing supervised models in remote sensing image segmentation tasks highly depends on the quality of label data. However, current label data mainly relies on manual annotation,…

    Submitted 16 March, 2025; originally announced March 2025.

  28. arXiv:2503.08864  [pdf, other]

    cond-mat.soft cs.RO

    Real-time simulation enabled navigation control of magnetic soft continuum robots in confined lumens

    Authors: Dezhong Tong, Zhuonan Hao, Jiyu Li, Boxi Sun, Mingchao Liu, Liu Wang, Weicheng Huang

    Abstract: Magnetic soft continuum robots (MSCRs) have emerged as a promising technology for minimally invasive interventions, offering enhanced dexterity and remote-controlled navigation in confined lumens. Unlike conventional guidewires with pre-shaped tips, MSCRs feature a magnetic tip that actively bends under applied magnetic fields. Despite extensive studies in modeling and simulation, achieving real-t…

    Submitted 11 March, 2025; originally announced March 2025.

    Comments: 22 pages, 12 figures

  29. arXiv:2503.07135  [pdf, other]

    cs.RO cs.CV

    VidBot: Learning Generalizable 3D Actions from In-the-Wild 2D Human Videos for Zero-Shot Robotic Manipulation

    Authors: Hanzhi Chen, Boyang Sun, Anran Zhang, Marc Pollefeys, Stefan Leutenegger

    Abstract: Future robots are envisioned as versatile systems capable of performing a variety of household tasks. The big question remains, how can we bridge the embodiment gap while minimizing physical robot learning, which fundamentally does not scale well. We argue that learning from in-the-wild human videos offers a promising solution for robotic manipulation tasks, as vast amounts of relevant data alread…

    Submitted 27 March, 2025; v1 submitted 10 March, 2025; originally announced March 2025.

    Comments: Accepted to CVPR 2025

  30. arXiv:2502.18527  [pdf, other]

    cs.CR cs.AI cs.LG

    GOD model: Privacy Preserved AI School for Personal Assistant

    Authors: PIN AI Team, Bill Sun, Gavin Guo, Regan Peng, Boliang Zhang, Shouqiao Wang, Laura Florescu, Xi Wang, Davide Crapis, Ben Wu

    Abstract: Personal AI assistants (e.g., Apple Intelligence, Meta AI) offer proactive recommendations that simplify everyday tasks, but their reliance on sensitive user data raises concerns about privacy and trust. To address these challenges, we introduce the Guardian of Data (GOD), a secure, privacy-preserving framework for training and evaluating AI assistants directly on-device. Unlike traditional benchm…

    Submitted 27 February, 2025; v1 submitted 24 February, 2025; originally announced February 2025.

  31. arXiv:2502.18122  [pdf, other]

    cs.LG cs.AI

    EU-Nets: Enhanced, Explainable and Parsimonious U-Nets

    Authors: B. Sun, P. Liò

    Abstract: In this study, we propose MHEX+, a framework adaptable to any U-Net architecture. Built upon MHEX+, we introduce novel U-Net variants, EU-Nets, which enhance explainability and uncertainty estimation, addressing the limitations of traditional U-Net models while improving performance and stability. A key innovation is the Equivalent Convolutional Kernel, which unifies consecutive convolutional laye…

    Submitted 25 February, 2025; originally announced February 2025.

  32. arXiv:2502.16560  [pdf, other]

    cs.AI cs.CL cs.SI

    An Analytical Emotion Framework of Rumour Threads on Social Media

    Authors: Rui Xing, Boyang Sun, Kun Zhang, Preslav Nakov, Timothy Baldwin, Jey Han Lau

    Abstract: Rumours in online social media pose significant risks to modern society, motivating the need for better understanding of how they develop. We focus specifically on the interface between emotion and rumours in threaded discourses, building on the surprisingly sparse literature on the topic which has largely focused on a single aspect of emotions within the original rumour posts themselves, and largel…

    Submitted 13 May, 2025; v1 submitted 23 February, 2025; originally announced February 2025.

    Comments: Accepted to ICWSM 2025 MisD Workshop

  33. arXiv:2502.15867  [pdf]

    q-bio.OT cs.AI

    Strategic priorities for transformative progress in advancing biology with proteomics and artificial intelligence

    Authors: Yingying Sun, Jun A, Zhiwei Liu, Rui Sun, Liujia Qian, Samuel H. Payne, Wout Bittremieux, Markus Ralser, Chen Li, Yi Chen, Zhen Dong, Yasset Perez-Riverol, Asif Khan, Chris Sander, Ruedi Aebersold, Juan Antonio Vizcaíno, Jonathan R Krieger, Jianhua Yao, Han Wen, Linfeng Zhang, Yunping Zhu, Yue Xuan, Benjamin Boyang Sun, Liang Qiao, Henning Hermjakob, et al. (37 additional authors not shown)

    Abstract: Artificial intelligence (AI) is transforming scientific research, including proteomics. Advances in mass spectrometry (MS)-based proteomics data quality, diversity, and scale, combined with groundbreaking AI techniques, are unlocking new challenges and opportunities in biological discovery. Here, we highlight key areas where AI is driving innovation, from data analysis to new biological insights.…

    Submitted 21 February, 2025; originally announced February 2025.

    Comments: 28 pages, 2 figures, perspective in AI proteomics

  34. arXiv:2502.13163  [pdf, other]

    cs.OS cs.CR cs.SE

    A Survey of Fuzzing Open-Source Operating Systems

    Authors: Kun Hu, Qicai Chen, Zilong Lu, Wenzhuo Zhang, Bihuan Chen, You Lu, Haowen Jiang, Bingkun Sun, Xin Peng, Wenyun Zhao

    Abstract: Vulnerabilities in open-source operating systems (OSs) pose substantial security risks to software systems, making their detection crucial. While fuzzing has been an effective vulnerability detection technique in various domains, OS fuzzing (OSF) faces unique challenges due to OS complexity and multi-layered interaction, and has not been comprehensively reviewed. Therefore, this work systematicall…

    Submitted 20 February, 2025; v1 submitted 16 February, 2025; originally announced February 2025.

    Comments: 45 pages

  35. arXiv:2502.12644  [pdf, other]

    cs.GT

    Computing Efficient Envy-Free Partial Allocations of Indivisible Goods

    Authors: Robert Bredereck, Andrzej Kaczmarczyk, Junjie Luo, Bin Sun

    Abstract: Envy-freeness is one of the most prominent fairness concepts in the allocation of indivisible goods. Even though trivial envy-free allocations always exist, rich literature shows this is not true when one additionally requires some efficiency concept (e.g., completeness, Pareto-efficiency, or social welfare maximization). In fact, in such case even deciding the existence of an efficient envy-free…

    Submitted 18 February, 2025; originally announced February 2025.

    Comments: Published by AAMAS 2025

  36. arXiv:2502.05542  [pdf, other]

    cs.LG

    Democratic Training Against Universal Adversarial Perturbations

    Authors: Bing Sun, Jun Sun, Wei Zhao

    Abstract: Despite their advances and success, real-world deep neural networks are known to be vulnerable to adversarial attacks. Universal adversarial perturbation, an input-agnostic attack, poses a serious threat for them to be deployed in security-sensitive systems. In this case, a single universal adversarial perturbation deceives the model on a range of clean inputs without requiring input-specific opti…

    Submitted 8 February, 2025; originally announced February 2025.

  37. arXiv:2502.03817  [pdf, other]

    cs.DS cs.LG

    Knowing When to Stop Matters: A Unified Algorithm for Online Conversion under Horizon Uncertainty

    Authors: Yanzhao Wang, Hasti Nourmohammadi Sigaroudi, Bo Sun, Omid Ardakanian, Xiaoqi Tan

    Abstract: This paper investigates the online conversion problem, which involves sequentially trading a divisible resource (e.g., energy) under dynamically changing prices to maximize profit. A key challenge in online conversion is managing decisions under horizon uncertainty, where the duration of trading is either known, revealed partway, or entirely unknown. We propose a unified algorithm that achieves op…

    Submitted 6 February, 2025; originally announced February 2025.

    Comments: 36 pages, 6 figures

  38. arXiv:2502.02543  [pdf, other]

    cs.GT cs.DS

    Posted Price Mechanisms for Online Allocation with Diseconomies of Scale

    Authors: Hossein Nekouyan Jazi, Bo Sun, Raouf Boutaba, Xiaoqi Tan

    Abstract: This paper addresses the online $k$-selection problem with diseconomies of scale (OSDoS), where a seller seeks to maximize social welfare by optimally pricing items for sequentially arriving buyers, accounting for increasing marginal production costs. Previous studies have investigated deterministic dynamic pricing mechanisms for such settings. However, significant challenges remain, particularly…

    Submitted 4 February, 2025; originally announced February 2025.

    Comments: 36 pages, 2 figures, accepted to the ACM Web Conference (WWW) 2025

  39. arXiv:2502.01025  [pdf, other]

    cs.CL

    Knowing When to Stop: Dynamic Context Cutoff for Large Language Models

    Authors: Roy Xie, Junlin Wang, Paul Rosu, Chunyuan Deng, Bolun Sun, Zihao Lin, Bhuwan Dhingra

    Abstract: Large language models (LLMs) process entire input contexts indiscriminately, which is inefficient in cases where the information required to answer a query is localized within the context. We present dynamic context cutoff, a human-inspired method enabling LLMs to self-terminate processing upon acquiring sufficient task-relevant information. Through analysis of model internals, we discover that sp…

    Submitted 2 February, 2025; originally announced February 2025.

    Comments: Project Website: https://royxie.com/when-to-stop-project/

  40. arXiv:2501.18990  [pdf, ps, other]

    cs.LG

    Permutation-Based Rank Test in the Presence of Discretization and Application in Causal Discovery with Mixed Data

    Authors: Xinshuai Dong, Ignavier Ng, Boyang Sun, Haoyue Dai, Guang-Yuan Hao, Shunxing Fan, Peter Spirtes, Yumou Qiu, Kun Zhang

    Abstract: Recent advances have shown that statistical tests for the rank of cross-covariance matrices play an important role in causal discovery. These rank tests include partial correlation tests as special cases and provide further graphical information about latent variables. Existing rank tests typically assume that all the continuous variables can be perfectly measured, and yet, in practice many variab…

    Submitted 12 June, 2025; v1 submitted 31 January, 2025; originally announced January 2025.

  41. arXiv:2501.15782  [pdf, other]

    cs.GT cs.DS

    Online Allocation with Multi-Class Arrivals: Group Fairness vs Individual Welfare

    Authors: Faraz Zargari, Hossein Nekouyan Jazi, Bo Sun, Xiaoqi Tan

    Abstract: We introduce and study a multi-class online resource allocation problem with group fairness guarantees. The problem involves allocating a fixed amount of resources to a sequence of agents, each belonging to a specific group. The primary objective is to ensure fairness across different groups in an online setting. We focus on three fairness notions: one based on quantity and two based on utility. T…

    Submitted 27 January, 2025; originally announced January 2025.

  42. arXiv:2501.15111  [pdf, other]

    cs.CV

    HumanOmni: A Large Vision-Speech Language Model for Human-Centric Video Understanding

    Authors: Jiaxing Zhao, Qize Yang, Yixing Peng, Detao Bai, Shimin Yao, Boyuan Sun, Xiang Chen, Shenghao Fu, Weixuan chen, Xihan Wei, Liefeng Bo

    Abstract: In human-centric scenes, the ability to simultaneously understand visual and auditory information is crucial. While recent omni models can process multiple modalities, they generally lack effectiveness in human-centric scenes due to the absence of large-scale, specialized datasets and non-targeted architectures. In this work, we developed HumanOmni, the industry's first human-centric Omni-multimod…

    Submitted 25 January, 2025; originally announced January 2025.

  43. Glissando-Net: Deep sinGLe vIew category level poSe eStimation ANd 3D recOnstruction

    Authors: Bo Sun, Hao Kang, Li Guan, Haoxiang Li, Philippos Mordohai, Gang Hua

    Abstract: We present a deep learning model, dubbed Glissando-Net, to simultaneously estimate the pose and reconstruct the 3D shape of objects at the category level from a single RGB image. Previous works predominantly focused on either estimating poses (often at the instance level), or reconstructing shapes, but not both. Glissando-Net is composed of two auto-encoders that are jointly trained, one for RGB im…

    Submitted 24 January, 2025; originally announced January 2025.

    Comments: 15 pages, 13 Figures, accepted to TPAMI -- IEEE Transactions on Pattern Analysis and Machine Intelligence (2024)

  44. arXiv:2501.11351  [pdf, other]

    cs.CV eess.SP

    Automatic Labelling & Semantic Segmentation with 4D Radar Tensors

    Authors: Botao Sun, Ignacio Roldan, Francesco Fioranelli

    Abstract: In this paper, an automatic labelling process is presented for automotive datasets, leveraging complementary information from LiDAR and camera. The generated labels are then used as ground truth with the corresponding 4D radar data as inputs to a proposed semantic segmentation network, to associate a class label to each spatial voxel. Promising results are shown by applying both approaches to t…

    Submitted 20 January, 2025; originally announced January 2025.

    Comments: Accepted in ICASSP 2025

  45. arXiv:2501.10285  [pdf, ps, other]

    cs.NI

    Data-driven Online Slice Admission Control and Resource Allocation for 5G and Beyond Networks

    Authors: Muhammad Sulaiman, Bo Sun, Mohammad Ali Salahuddin, Raouf Boutaba, Aladdin Saleh

    Abstract: Virtualization in 5G and beyond networks allows the creation of virtual networks, or network slices, tailored to meet the requirements of various applications. However, this flexibility introduces several challenges for infrastructure providers (InPs) in slice admission control (AC) and resource allocation. To maximize revenue, InPs must decide in real-time whether to admit new slice requests (SRs…

    Submitted 17 January, 2025; originally announced January 2025.

  46. arXiv:2501.10124  [pdf, other]

    cs.LG

    Gene Regulatory Network Inference in the Presence of Selection Bias and Latent Confounders

    Authors: Gongxu Luo, Haoyue Dai, Boyang Sun, Loka Li, Biwei Huang, Petar Stojanov, Kun Zhang

    Abstract: Gene Regulatory Network Inference (GRNI) aims to identify causal relationships among genes using gene expression data, providing insights into regulatory mechanisms. A significant yet often overlooked challenge is selection bias, a process where only cells meeting specific criteria, such as gene expression thresholds, survive or are observed, distorting the true joint distribution of genes and thu…

    Submitted 17 January, 2025; originally announced January 2025.

  47. arXiv:2501.09165  [pdf, other]

    cs.HC

    Breaking Barriers or Building Dependency? Exploring Team-LLM Collaboration in AI-infused Classroom Debate

    Authors: Zihan Zhang, Black Sun, Pengcheng An

    Abstract: Classroom debates are a unique form of collaborative learning characterized by fast-paced, high-intensity interactions that foster critical thinking and teamwork. Despite the recognized importance of debates, the role of AI tools, particularly LLM-based systems, in supporting this dynamic learning environment has been under-explored in HCI. This study addresses this opportunity by investigating th…

    Submitted 26 January, 2025; v1 submitted 15 January, 2025; originally announced January 2025.

  48. arXiv:2501.07978  [pdf, other]

    cs.CV cs.AI

    Facial Dynamics in Video: Instruction Tuning for Improved Facial Expression Perception and Contextual Awareness

    Authors: Jiaxing Zhao, Boyuan Sun, Xiang Chen, Xihan Wei

    Abstract: Facial expression captioning has found widespread application across various domains. Recently, the emergence of video Multimodal Large Language Models (MLLMs) has shown promise in general video understanding tasks. However, describing facial expressions within videos poses two major challenges for these models: (1) the lack of adequate datasets and benchmarks, and (2) the limited visual token cap…

    Submitted 14 January, 2025; originally announced January 2025.

  49. arXiv:2501.05067  [pdf, other]

    cs.CV cs.AI

    LLaVA-Octopus: Unlocking Instruction-Driven Adaptive Projector Fusion for Video Understanding

    Authors: Jiaxing Zhao, Boyuan Sun, Xiang Chen, Xihan Wei, Qibin Hou

    Abstract: In this paper, we introduce LLaVA-Octopus, a novel video multimodal large language model. LLaVA-Octopus adaptively weights features from different visual projectors based on user instructions, enabling us to leverage the complementary strengths of each projector. We observe that different visual projectors exhibit distinct characteristics when handling specific tasks. For instance, some projectors…

    Submitted 14 March, 2025; v1 submitted 9 January, 2025; originally announced January 2025.

    Comments: 18 pages, 10 figures

  50. arXiv:2501.04597  [pdf, other]

    cs.RO cs.CV

    FrontierNet: Learning Visual Cues to Explore

    Authors: Boyang Sun, Hanzhi Chen, Stefan Leutenegger, Cesar Cadena, Marc Pollefeys, Hermann Blum

    Abstract: Exploration of unknown environments is crucial for autonomous robots; it allows them to actively reason and decide on what new data to acquire for different tasks, such as mapping, object discovery, and environmental assessment. Existing solutions, such as frontier-based exploration approaches, rely heavily on 3D map operations, which are limited by map quality and, more critically, often overlook…

    Submitted 7 May, 2025; v1 submitted 8 January, 2025; originally announced January 2025.