Showing 1–50 of 524 results for author: Dai, Y

Searching in archive cs.

  1. arXiv:2505.05237  [pdf, other]

    cs.LG

    Latte: Transferring LLMs' Latent-level Knowledge for Few-shot Tabular Learning

    Authors: Ruxue Shi, Hengrui Gu, Hangting Ye, Yiwei Dai, Xu Shen, Xin Wang

    Abstract: Few-shot tabular learning, in which machine learning models are trained with a limited amount of labeled data, provides a cost-effective approach to addressing real-world challenges. The advent of Large Language Models (LLMs) has sparked interest in leveraging their pre-trained knowledge for few-shot tabular learning. Despite promising results, existing approaches either rely on test-time knowledg…

    Submitted 8 May, 2025; originally announced May 2025.

  2. arXiv:2505.04917  [pdf, other]

    cs.CV

    A Simple Detector with Frame Dynamics is a Strong Tracker

    Authors: Chenxu Peng, Chenxu Wang, Minrui Zou, Danyang Li, Zhengpeng Yang, Yimian Dai, Ming-Ming Cheng, Xiang Li

    Abstract: Infrared object tracking plays a crucial role in Anti-Unmanned Aerial Vehicle (Anti-UAV) applications. Existing trackers often depend on cropped template regions and have limited motion modeling capabilities, which pose challenges when dealing with tiny targets. To address this, we propose a simple yet effective infrared tiny-object tracker that enhances tracking performance by integrating global…

    Submitted 7 May, 2025; originally announced May 2025.

    Comments: 2025 CVPR Anti-UAV Workshop

  3. arXiv:2505.04665  [pdf]

    cs.CL cs.AI

    Personalized Risks and Regulatory Strategies of Large Language Models in Digital Advertising

    Authors: Haoyang Feng, Yanjun Dai, Yuan Gao

    Abstract: Although large language models have demonstrated the potential for personalized advertising recommendations in experimental environments, in actual operations, how advertising recommendation systems can be combined with measures such as user privacy protection and data security is still an area worthy of in-depth discussion. To this end, this paper studies the personalized risks and regulatory str…

    Submitted 7 May, 2025; originally announced May 2025.

  4. arXiv:2505.03827  [pdf, other]

    cs.LG cs.AI

    MISE: Meta-knowledge Inheritance for Social Media-Based Stressor Estimation

    Authors: Xin Wang, Ling Feng, Huijun Zhang, Lei Cao, Kaisheng Zeng, Qi Li, Yang Ding, Yi Dai, David Clifton

    Abstract: Stress haunts people in modern society, which may cause severe health issues if left unattended. With social media becoming an integral part of daily life, leveraging social media to detect stress has gained increasing attention. While the majority of the work focuses on classifying stress states and stress categories, this study introduces a new task aimed at estimating more specific stressors (li…

    Submitted 3 May, 2025; originally announced May 2025.

    Comments: WWW2025, Oral Presentation

  5. arXiv:2505.01652  [pdf, other]

    cs.LG cs.AI

    Causally Fair Node Classification on Non-IID Graph Data

    Authors: Yucong Dai, Lu Zhang, Yaowei Hu, Susan Gauch, Yongkai Wu

    Abstract: Fair machine learning seeks to identify and mitigate biases in predictions against unfavorable populations characterized by demographic attributes, such as race and gender. Recently, a few works have extended fairness to graph data, such as social networks, but most of them neglect the causal relationships among data instances. This paper addresses the prevalent challenge in fairness-aware ML algo…

    Submitted 2 May, 2025; originally announced May 2025.

  6. arXiv:2504.21302  [pdf, other]

    cs.CV cs.RO

    CMD: Constraining Multimodal Distribution for Domain Adaptation in Stereo Matching

    Authors: Zhelun Shen, Zhuo Li, Chenming Wu, Zhibo Rao, Lina Liu, Yuchao Dai, Liangjun Zhang

    Abstract: Recently, learning-based stereo matching methods have achieved great improvement in public benchmarks, where soft argmin and smooth L1 loss play a core contribution to their success. However, in unsupervised domain adaptation scenarios, we observe that these two operations often yield multimodal disparity probability distributions in target domains, resulting in degraded generalization. In this pa…

    Submitted 30 April, 2025; originally announced April 2025.

    Comments: 13 pages, 5 figures, accepted for publication in Pattern Recognition

  7. arXiv:2504.21136  [pdf]

    cs.CV cs.LG

    Legilimens: Performant Video Analytics on the System-on-Chip Edge

    Authors: Murali Ramanujam, Yinwei Dai, Kyle Jamieson, Ravi Netravali

    Abstract: Continually retraining models has emerged as a primary technique to enable high-accuracy video analytics on edge devices. Yet, existing systems employ such adaptation by relying on the spare compute resources that traditional (memory-constrained) edge servers afford. In contrast, mobile edge devices such as drones and dashcams offer a fundamentally different resource profile: weak(er) compute with…

    Submitted 29 April, 2025; originally announced April 2025.

  8. arXiv:2504.17959  [pdf, other]

    cs.RO cs.LG eess.SY

    CIVIL: Causal and Intuitive Visual Imitation Learning

    Authors: Yinlong Dai, Robert Ramirez Sanchez, Ryan Jeronimus, Shahabedin Sagheb, Cara M. Nunez, Heramb Nemlekar, Dylan P. Losey

    Abstract: Today's robots learn new tasks by imitating human examples. However, this standard approach to visual imitation learning is fundamentally limited: the robot observes what the human does, but not why the human chooses those behaviors. Without understanding the features that factor into the human's decisions, robot learners often misinterpret the data and fail to perform the task when the environmen…

    Submitted 24 April, 2025; originally announced April 2025.

  9. arXiv:2504.17732  [pdf, other]

    cs.CV

    DPMambaIR: All-in-One Image Restoration via Degradation-Aware Prompt State Space Model

    Authors: Zhanwen Liu, Sai Zhou, Yuchao Dai, Yang Wang, Yisheng An, Xiangmo Zhao

    Abstract: All-in-One image restoration aims to address multiple image degradation problems using a single model, significantly reducing training costs and deployment complexity compared to traditional methods that design dedicated models for each degradation type. Existing approaches typically rely on Degradation-specific models or coarse-grained degradation prompts to guide image restoration. However, they…

    Submitted 24 April, 2025; originally announced April 2025.

    ACM Class: I.4.4

  10. arXiv:2504.15674  [pdf, other]

    cs.CR cs.LG

    TrojanDam: Detection-Free Backdoor Defense in Federated Learning through Proactive Model Robustification utilizing OOD Data

    Authors: Yanbo Dai, Songze Li, Zihan Gan, Xueluan Gong

    Abstract: Federated learning (FL) systems allow decentralized data-owning clients to jointly train a global model through uploading their locally trained updates to a centralized server. The property of decentralization enables adversaries to craft carefully designed backdoor updates to make the global model misclassify only when encountering adversary-chosen triggers. Existing defense mechanisms mainly rel…

    Submitted 22 April, 2025; originally announced April 2025.

  11. arXiv:2504.15300  [pdf, other]

    cs.LG cs.DC cs.MA

    Collaborative Learning of On-Device Small Model and Cloud-Based Large Model: Advances and Future Directions

    Authors: Chaoyue Niu, Yucheng Ding, Junhui Lu, Zhengxiang Huang, Hang Zeng, Yutong Dai, Xuezhen Tu, Chengfei Lv, Fan Wu, Guihai Chen

    Abstract: The conventional cloud-based large model learning framework is increasingly constrained by latency, cost, personalization, and privacy concerns. In this survey, we explore an emerging paradigm: collaborative learning between on-device small model and cloud-based large model, which promises low-latency, cost-efficient, and personalized intelligent services while preserving user privacy. We provide…

    Submitted 17 April, 2025; originally announced April 2025.

  12. arXiv:2504.14500  [pdf, other]

    cs.SE

    PinChecker: Identifying Unsound Safe Abstractions of Rust Pinning APIs

    Authors: Yuxuan Dai, Yang Feng

    Abstract: The pinning APIs of the Rust language guarantee memory location stability for self-referential and asynchronous constructs, as long as they are used according to the pinning API contract. Rust ensures violations of this contract are impossible in regular safe code, but not in unsafe code where unsafe pinning APIs can be used. Library authors can encapsulate arbitrary unsafe code within regular library function…

    Submitted 20 April, 2025; originally announced April 2025.

    ACM Class: D.2.5

  13. arXiv:2504.13344  [pdf]

    cond-mat.mtrl-sci cs.AI

    Adaptive AI decision interface for autonomous electronic material discovery

    Authors: Yahao Dai, Henry Chan, Aikaterini Vriza, Fredrick Kim, Yunfei Wang, Wei Liu, Naisong Shan, Jing Xu, Max Weires, Yukun Wu, Zhiqiang Cao, C. Suzanne Miller, Ralu Divan, Xiaodan Gu, Chenhui Zhu, Sihong Wang, Jie Xu

    Abstract: AI-powered autonomous experimentation (AI/AE) can accelerate materials discovery but its effectiveness for electronic materials is hindered by data scarcity from lengthy and complex design-fabricate-test-analyze cycles. Unlike experienced human scientists, even advanced AI algorithms in AI/AE lack the adaptability to make informative real-time decisions with limited datasets. Here, we address this…

    Submitted 17 April, 2025; originally announced April 2025.

  14. arXiv:2504.07891  [pdf, other]

    cs.LG cs.AI

    SpecReason: Fast and Accurate Inference-Time Compute via Speculative Reasoning

    Authors: Rui Pan, Yinwei Dai, Zhihao Zhang, Gabriele Oliaro, Zhihao Jia, Ravi Netravali

    Abstract: Recent advances in inference-time compute have significantly improved performance on complex tasks by generating long chains of thought (CoTs) using Large Reasoning Models (LRMs). However, this improved accuracy comes at the cost of high inference latency due to the length of generated reasoning sequences and the autoregressive nature of decoding. Our key insight in tackling these overheads is tha…

    Submitted 10 April, 2025; originally announced April 2025.

  15. arXiv:2504.07378  [pdf, other]

    cs.CV

    BRepFormer: Transformer-Based B-rep Geometric Feature Recognition

    Authors: Yongkang Dai, Xiaoshui Huang, Yunpeng Bai, Hao Guo, Hongping Gan, Ling Yang, Yilei Shi

    Abstract: Recognizing geometric features on B-rep models is a cornerstone technique for multimedia content-based retrieval and has been widely applied in intelligent manufacturing. However, previous research often merely focused on Machining Feature Recognition (MFR), falling short in effectively capturing the intricate topological and geometric characteristics of complex geometry features. In this paper, w…

    Submitted 10 April, 2025; v1 submitted 9 April, 2025; originally announced April 2025.

  16. arXiv:2504.06151  [pdf, other]

    cs.OS cs.DB

    Zerrow: True Zero-Copy Arrow Pipelines in Bauplan

    Authors: Yifan Dai, Jacopo Tagliabue, Andrea Arpaci-Dusseau, Remzi Arpaci-Dusseau, Tyler R. Caraza-Harter

    Abstract: Bauplan is a FaaS-based lakehouse specifically built for data pipelines: its execution engine uses Apache Arrow for data passing between the nodes in the DAG. While Arrow is known as the "zero copy format", in practice, limited Linux kernel support for shared memory makes it difficult to avoid copying entirely. In this work, we introduce several new techniques to eliminate nearly all copying from…

    Submitted 13 May, 2025; v1 submitted 8 April, 2025; originally announced April 2025.

    Comments: Pre-print conf submission

  17. arXiv:2503.22934  [pdf, other]

    cs.LG cs.AI

    FairSAM: Fair Classification on Corrupted Data Through Sharpness-Aware Minimization

    Authors: Yucong Dai, Jie Ji, Xiaolong Ma, Yongkai Wu

    Abstract: Image classification models trained on clean data often suffer from significant performance degradation when exposed to testing corrupted data, such as images with impulse noise, Gaussian noise, or environmental noise. This degradation not only impacts overall performance but also disproportionately affects various demographic subgroups, raising critical algorithmic bias concerns. Although robust…

    Submitted 28 March, 2025; originally announced March 2025.

  18. arXiv:2503.21268  [pdf, other]

    cs.CV

    ClimbingCap: Multi-Modal Dataset and Method for Rock Climbing in World Coordinate

    Authors: Ming Yan, Xincheng Lin, Yuhua Luo, Shuqi Fan, Yudi Dai, Qixin Zhong, Lincai Zhong, Yuexin Ma, Lan Xu, Chenglu Wen, Siqi Shen, Cheng Wang

    Abstract: Human Motion Recovery (HMR) research mainly focuses on ground-based motions such as running. The study on capturing climbing motion, an off-ground motion, is sparse. This is partly due to the limited availability of climbing motion datasets, especially large-scale and challenging 3D labeled datasets. To address the insufficiency of climbing motion datasets, we collect AscendMotion, a large-scale w…

    Submitted 27 March, 2025; originally announced March 2025.

    Comments: CVPR2025, project page: http://www.lidarhumanmotion.net/climbingcap/

  19. arXiv:2503.19599  [pdf, other]

    cs.SE cs.AI

    HoarePrompt: Structural Reasoning About Program Correctness in Natural Language

    Authors: Dimitrios Stamatios Bouras, Yihan Dai, Tairan Wang, Yingfei Xiong, Sergey Mechtaev

    Abstract: While software requirements are often expressed in natural language, verifying the correctness of a program against natural language requirements is a hard and underexplored problem. Large language models (LLMs) are promising candidates for addressing this challenge; however, our experience shows that they are ineffective in this task, often failing to detect even straightforward bugs. To address t…

    Submitted 25 March, 2025; originally announced March 2025.

  20. arXiv:2503.11251  [pdf, other]

    cs.CV cs.CL

    Step-Video-TI2V Technical Report: A State-of-the-Art Text-Driven Image-to-Video Generation Model

    Authors: Haoyang Huang, Guoqing Ma, Nan Duan, Xing Chen, Changyi Wan, Ranchen Ming, Tianyu Wang, Bo Wang, Zhiying Lu, Aojie Li, Xianfang Zeng, Xinhao Zhang, Gang Yu, Yuhe Yin, Qiling Wu, Wen Sun, Kang An, Xin Han, Deshan Sun, Wei Ji, Bizhu Huang, Brian Li, Chenfei Wu, Guanzhe Huang, Huixin Xiong, et al. (29 additional authors not shown)

    Abstract: We present Step-Video-TI2V, a state-of-the-art text-driven image-to-video generation model with 30B parameters, capable of generating videos up to 102 frames based on both text and image inputs. We build Step-Video-TI2V-Eval as a new benchmark for the text-driven image-to-video task and compare Step-Video-TI2V with open-source and commercial TI2V engines using this dataset. Experimental results de…

    Submitted 14 March, 2025; originally announced March 2025.

    Comments: 7 pages

  21. arXiv:2503.06998  [pdf, other]

    cs.CV

    SOYO: A Tuning-Free Approach for Video Style Morphing via Style-Adaptive Interpolation in Diffusion Models

    Authors: Haoyu Zheng, Qifan Yu, Binghe Yu, Yang Dai, Wenqiao Zhang, Juncheng Li, Siliang Tang, Yueting Zhuang

    Abstract: Diffusion models have achieved remarkable progress in image and video stylization. However, most existing methods focus on single-style transfer, while video stylization involving multiple styles necessitates seamless transitions between them. We refer to this smooth style transition between video frames as video style morphing. Current approaches often generate stylized video frames with disconti…

    Submitted 10 March, 2025; originally announced March 2025.

  22. arXiv:2503.06542  [pdf, other]

    cs.CV cs.AI

    ARMOR v0.1: Empowering Autoregressive Multimodal Understanding Model with Interleaved Multimodal Generation via Asymmetric Synergy

    Authors: Jianwen Sun, Yukang Feng, Chuanhao Li, Fanrui Zhang, Zizhen Li, Jiaxin Ai, Sizhuo Zhou, Yu Dai, Shenglin Zhang, Kaipeng Zhang

    Abstract: Unified models (UniMs) for multimodal understanding and generation have recently received much attention in the area of vision and language. Existing UniMs are designed to simultaneously learn both multimodal understanding and generation capabilities, demanding substantial computational resources, and often struggle to generate interleaved text-image. We present ARMOR, a resource-efficient and pur…

    Submitted 9 March, 2025; originally announced March 2025.

  23. arXiv:2503.06072  [pdf, other]

    cs.CL cs.AI

    A Survey on Post-training of Large Language Models

    Authors: Guiyao Tie, Zeli Zhao, Dingjie Song, Fuyang Wei, Rong Zhou, Yurou Dai, Wen Yin, Zhejian Yang, Jiangyue Yan, Yao Su, Zhenhan Dai, Yifeng Xie, Yihan Cao, Lichao Sun, Pan Zhou, Lifang He, Hechang Chen, Yu Zhang, Qingsong Wen, Tianming Liu, Neil Zhenqiang Gong, Jiliang Tang, Caiming Xiong, Heng Ji, Philip S. Yu, et al. (1 additional author not shown)

    Abstract: The emergence of Large Language Models (LLMs) has fundamentally transformed natural language processing, making them indispensable across domains ranging from conversational systems to scientific exploration. However, their pre-trained architectures often reveal limitations in specialized contexts, including restricted reasoning capacities, ethical uncertainties, and suboptimal domain-specific per…

    Submitted 8 March, 2025; originally announced March 2025.

    Comments: 87 pages, 21 figures, 9 tables

  24. arXiv:2503.05183  [pdf, other]

    cs.CV math.OC

    Spectral-Spatial Extraction through Layered Tensor Decomposition for Hyperspectral Anomaly Detection

    Authors: Quan Yu, Yu-Hong Dai, Minru Bai

    Abstract: Low rank tensor representation (LRTR) methods are very useful for hyperspectral anomaly detection (HAD). To overcome the limitations that they often overlook spectral anomaly and rely on large-scale matrix singular value decomposition, we first apply non-negative matrix factorization (NMF) to alleviate spectral dimensionality redundancy and extract spectral anomaly and then employ LRTR to extract…

    Submitted 7 March, 2025; originally announced March 2025.

    MSC Class: 15A69; 47A80; 65K05

  25. arXiv:2503.04018  [pdf]

    cs.CV

    NsBM-GAT: A Non-stationary Block Maximum and Graph Attention Framework for General Traffic Crash Risk Prediction

    Authors: Kequan Chen, Pan Liu, Yuxuan Wang, David Z. W. Wang, Yifan Dai, Zhibin Li

    Abstract: Accurate prediction of traffic crash risks for individual vehicles is essential for enhancing vehicle safety. While significant attention has been given to traffic crash risk prediction, existing studies face two main challenges: First, due to the scarcity of individual vehicle data before crashes, most models rely on hypothetical scenarios deemed dangerous by researchers. This raises doubts about…

    Submitted 5 March, 2025; originally announced March 2025.

  26. arXiv:2503.03770  [pdf]

    physics.med-ph cs.LG

    Fusion of Various Optimization Based Feature Smoothing Methods for Wearable and Non-invasive Blood Glucose Estimation

    Authors: Yiting Wei, Bingo Wing-Kuen Ling, Danni Chen, Yuheng Dai, Qing Liu

    Abstract: Recently, a wearable and non-invasive blood glucose estimation approach has been proposed. However, due to the unreliability of the acquisition device, the presence of noise and the variations of the acquisition environments, the obtained features and the reference blood glucose values are highly unreliable. To address this issue, this paper proposes a polynomial fitting approach to smooth t…

    Submitted 6 March, 2025; originally announced March 2025.

    Comments: This version corrects several typos

    Journal ref: IET Systems Biology, 2023, 17(3): 107-120

  27. arXiv:2503.01879  [pdf, other]

    cs.MM cs.CV cs.SD eess.AS

    Nexus-O: An Omni-Perceptive And -Interactive Model for Language, Audio, And Vision

    Authors: Che Liu, Yingji Zhang, Dong Zhang, Weijie Zhang, Chenggong Gong, Haohan Li, Yu Lu, Shilin Zhou, Yue Lu, Ziliang Gan, Ziao Wang, Junwei Liao, Haipang Wu, Ji Liu, André Freitas, Qifan Wang, Zenglin Xu, Rongjuncheng Zhang, Yong Dai

    Abstract: Human beings perceive the real world through a spectrum of sensory modalities, encompassing auditory, visual, and linguistic faculties. The journey towards achieving Artificial General Intelligence (AGI) necessitates the development of models that can emulate these multifaceted perceptual capabilities and comprehensively understand these diversified data. To this end, we introduce Nexus-O…

    Submitted 7 March, 2025; v1 submitted 26 February, 2025; originally announced March 2025.

  28. arXiv:2503.00828  [pdf, other]

    cs.CV cs.LG

    Training-Free Dataset Pruning for Instance Segmentation

    Authors: Yalun Dai, Lingao Xiao, Ivor W. Tsang, Yang He

    Abstract: Existing dataset pruning techniques primarily focus on classification tasks, limiting their applicability to more complex and practical tasks like instance segmentation. Instance segmentation presents three key challenges: pixel-level annotations, instance area variations, and class imbalances, which significantly complicate dataset pruning efforts. Directly adapting existing classification-based…

    Submitted 2 March, 2025; originally announced March 2025.

    Comments: Accepted by ICLR 2025

  29. arXiv:2502.21060  [pdf, other]

    cs.LG cs.IT

    Efficient Transformer-based Decoder for Varshamov-Tenengolts Codes

    Authors: Yali Wei, Alan J. X. Guo, Zihui Yan, Yufan Dai

    Abstract: In recent years, the rise of DNA data storage technology has brought significant attention to the challenge of correcting insertion, deletion, and substitution (IDS) errors. Among various coding methods for IDS correction, Varshamov-Tenengolts (VT) codes, primarily designed for single-error correction, have emerged as a central research focus. While existing decoding methods achieve high accuracy…

    Submitted 28 February, 2025; originally announced February 2025.

    Comments: 9 pages, 2 figures, 9 tables

  30. arXiv:2502.16488  [pdf, other]

    cs.CV

    Geometry-Aware 3D Salient Object Detection Network

    Authors: Chen Wang, Liyuan Zhang, Le Hui, Qi Liu, Yuchao Dai

    Abstract: Point cloud salient object detection has attracted the attention of researchers in recent years. Since existing works do not fully utilize the geometry context of 3D objects, blurry boundaries are generated when segmenting objects with complex backgrounds. In this paper, we propose a geometry-aware 3D salient object detection network that explicitly clusters points into superpoints to enhance the…

    Submitted 23 February, 2025; originally announced February 2025.

  31. arXiv:2502.14305  [pdf, other]

    cs.IR cs.LG

    Efficient AI in Practice: Training and Deployment of Efficient LLMs for Industry Applications

    Authors: Kayhan Behdin, Yun Dai, Ata Fatahibaarzi, Aman Gupta, Qingquan Song, Shao Tang, Hejian Sang, Gregory Dexter, Sirou Zhu, Siyu Zhu, Tejas Dharamsi, Maziar Sanjabi, Vignesh Kothapalli, Hamed Firooz, Zhoutong Fu, Yihan Cao, Pin-Lun Hsu, Fedor Borisyuk, Zhipeng Wang, Rahul Mazumder, Natesh Pillai, Luke Simon

    Abstract: Large language models (LLMs) have demonstrated remarkable performance across a wide range of industrial applications, from search and recommendations to generative tasks. Although scaling laws indicate that larger models generally yield better generalization and performance, their substantial computational requirements often render them impractical for many real-world scenarios at scale. In this p…

    Submitted 20 February, 2025; originally announced February 2025.

  32. arXiv:2502.13311  [pdf, other]

    cs.CL cs.AI

    Training Turn-by-Turn Verifiers for Dialogue Tutoring Agents: The Curious Case of LLMs as Your Coding Tutors

    Authors: Jian Wang, Yinpei Dai, Yichi Zhang, Ziqiao Ma, Wenjie Li, Joyce Chai

    Abstract: Intelligent tutoring agents powered by large language models (LLMs) have been increasingly explored to deliver personalized guidance in areas such as language learning and science education. However, their capabilities in guiding users to solve complex real-world tasks remain underexplored. To address this limitation, in this work, we focus on coding tutoring, a challenging problem that requires t…

    Submitted 21 February, 2025; v1 submitted 18 February, 2025; originally announced February 2025.

  33. arXiv:2502.12975  [pdf, other]

    cs.CV

    Instance-Level Moving Object Segmentation from a Single Image with Events

    Authors: Zhexiong Wan, Bin Fan, Le Hui, Yuchao Dai, Gim Hee Lee

    Abstract: Moving object segmentation plays a crucial role in understanding dynamic scenes involving multiple moving objects, while the difficulties lie in taking into account both spatial texture structures and temporal motion cues. Existing methods based on video frames encounter difficulties in distinguishing whether pixel displacements of an object are caused by camera motion or object motion due to the…

    Submitted 18 February, 2025; originally announced February 2025.

    Comments: accepted by IJCV

  34. arXiv:2502.12834  [pdf, other]

    cs.NI cs.LG

    NTP-INT: Network Traffic Prediction-Driven In-band Network Telemetry for High-load Switches

    Authors: Penghui Zhang, Hua Zhang, Yuqi Dai, Cheng Zeng, Jingyu Wang, Jianxin Liao

    Abstract: In-band network telemetry (INT) is essential to network management due to its real-time visibility. However, because of the rapid increase in network devices and services, it has become crucial to have targeted access to detailed network information in a dynamic network environment. This paper proposes an intelligent network telemetry system called NTP-INT to obtain more fine-grained network infor…

    Submitted 18 February, 2025; originally announced February 2025.

  35. arXiv:2502.12548  [pdf, other]

    cs.LG cs.AI

    Improving the Stability of GNN Force Field Models by Reducing Feature Correlation

    Authors: Yujie Zeng, Wenlong He, Ihor Vasyltsov, Jiaxin Wei, Ying Zhang, Lin Chen, Yuehua Dai

    Abstract: Recently, Graph Neural Network based Force Field (GNNFF) models have been widely used in Molecular Dynamics (MD) simulation, which is one of the most cost-effective means in semiconductor material research. However, even though such models provide high accuracy in energy and force Mean Absolute Error (MAE) over trained (in-distribution) datasets, they often become unstable during long-time MD simulation when u…

    Submitted 18 February, 2025; originally announced February 2025.

  36. arXiv:2502.11946  [pdf, other]

    cs.CL cs.AI cs.HC cs.SD eess.AS

    Step-Audio: Unified Understanding and Generation in Intelligent Speech Interaction

    Authors: Ailin Huang, Boyong Wu, Bruce Wang, Chao Yan, Chen Hu, Chengli Feng, Fei Tian, Feiyu Shen, Jingbei Li, Mingrui Chen, Peng Liu, Ruihang Miao, Wang You, Xi Chen, Xuerui Yang, Yechang Huang, Yuxiang Zhang, Zheng Gong, Zixin Zhang, Hongyu Zhou, Jianjian Sun, Brian Li, Chengting Feng, Changyi Wan, Hanpeng Hu, et al. (120 additional authors not shown)

    Abstract: Real-time speech interaction, serving as a fundamental interface for human-machine collaboration, holds immense potential. However, current open-source models face limitations such as high costs in voice data collection, weakness in dynamic control, and limited intelligence. To address these challenges, this paper introduces Step-Audio, the first production-ready open-source solution. Key contribu…

    Submitted 18 February, 2025; v1 submitted 17 February, 2025; originally announced February 2025.

  37. arXiv:2502.10706  [pdf, other]

    cs.LG cs.AI

    Raising the Bar in Graph OOD Generalization: Invariant Learning Beyond Explicit Environment Modeling

    Authors: Xu Shen, Yixin Liu, Yili Wang, Rui Miao, Yiwei Dai, Shirui Pan, Xin Wang

    Abstract: Out-of-distribution (OOD) generalization has emerged as a critical challenge in graph learning, as real-world graph data often exhibit diverse and shifting environments that traditional models fail to generalize across. A promising solution to address this issue is graph invariant learning (GIL), which aims to learn invariant representations by disentangling label-correlated invariant subgraphs fr…

    Submitted 18 February, 2025; v1 submitted 15 February, 2025; originally announced February 2025.

  38. arXiv:2502.10248  [pdf, other]

    cs.CV cs.CL

    Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model

    Authors: Guoqing Ma, Haoyang Huang, Kun Yan, Liangyu Chen, Nan Duan, Shengming Yin, Changyi Wan, Ranchen Ming, Xiaoniu Song, Xing Chen, Yu Zhou, Deshan Sun, Deyu Zhou, Jian Zhou, Kaijun Tan, Kang An, Mei Chen, Wei Ji, Qiling Wu, Wen Sun, Xin Han, Yanan Wei, Zheng Ge, Aojie Li, Bin Wang, et al. (90 additional authors not shown)

    Abstract: We present Step-Video-T2V, a state-of-the-art text-to-video pre-trained model with 30B parameters and the ability to generate videos up to 204 frames in length. A deep compression Variational Autoencoder, Video-VAE, is designed for video generation tasks, achieving 16x16 spatial and 8x temporal compression ratios, while maintaining exceptional video reconstruction quality. User prompts are encoded…

    Submitted 24 February, 2025; v1 submitted 14 February, 2025; originally announced February 2025.

    Comments: 36 pages, 14 figures

  39. arXiv:2502.08412  [pdf, ps, other]

    cs.GT econ.TH

    Non-Monetary Mechanism Design without Distributional Information: Using Scarce Audits Wisely

    Authors: Yan Dai, Moise Blanchard, Patrick Jaillet

    Abstract: We study a repeated resource allocation problem with strategic agents where monetary transfers are disallowed and the central planner has no prior information on agents' utility distributions. In light of Arrow's impossibility theorem, acquiring information about agent preferences through some form of feedback is necessary. We assume that the central planner can request powerful but expensive audi…

    Submitted 12 February, 2025; originally announced February 2025.

  40. arXiv:2502.06132  [pdf, other]

    cs.CV cs.CL

    Enhancing Document Key Information Localization Through Data Augmentation

    Authors: Yue Dai

    Abstract: The Visually Rich Form Document Intelligence and Understanding (VRDIU) Track B focuses on the localization of key information in document images. The goal is to develop a method capable of localizing objects in both digital and handwritten documents, using only digital documents for training. This paper presents a simple yet effective approach that includes a document augmentation phase and an obj…

    Submitted 9 February, 2025; originally announced February 2025.

    Comments: Accepted as a workshop paper in DOCUI-AAAI2025

  41. arXiv:2502.05130  [pdf, other]

    cs.SD cs.AI cs.CV cs.MM eess.AS

    Latent Swap Joint Diffusion for 2D Long-Form Latent Generation

    Authors: Yusheng Dai, Chenxi Wang, Chang Li, Chen Wang, Jun Du, Kewei Li, Ruoyu Wang, Jiefeng Ma, Lei Sun, Jianqing Gao

    Abstract: This paper introduces Swap Forward (SaFa), a modality-agnostic and efficient method to generate seamless and coherent long spectrum and panorama through latent swap joint diffusion across multi-views. We first investigate the spectrum aliasing problem in spectrum-based audio generation caused by existing joint diffusion methods. Through a comparative analysis of the VAE latent representation of M…

    Submitted 18 March, 2025; v1 submitted 7 February, 2025; originally announced February 2025.

  42. arXiv:2502.05034  [pdf, other]

    cs.CV

    MindAligner: Explicit Brain Functional Alignment for Cross-Subject Visual Decoding from Limited fMRI Data

    Authors: Yuqin Dai, Zhouheng Yao, Chunfeng Song, Qihao Zheng, Weijian Mai, Kunyu Peng, Shuai Lu, Wanli Ouyang, Jian Yang, Jiamin Wu

    Abstract: Brain decoding aims to reconstruct the visual perception of human subjects from fMRI signals, which is crucial for understanding the brain's perception mechanisms. Existing methods are confined to the single-subject paradigm due to substantial brain variability, which leads to weak generalization across individuals and incurs high training costs, exacerbated by the limited availability of fMRI data. To address…

    Submitted 7 February, 2025; originally announced February 2025.

  43. arXiv:2502.03417  [pdf, other]

    cs.LG

    From Features to Transformers: Redefining Ranking for Scalable Impact

    Authors: Fedor Borisyuk, Lars Hertel, Ganesh Parameswaran, Gaurav Srivastava, Sudarshan Srinivasa Ramanujam, Borja Ocejo, Peng Du, Andrei Akterskii, Neil Daftary, Shao Tang, Daqi Sun, Qiang Charles Xiao, Deepesh Nathani, Mohit Kothari, Yun Dai, Aman Gupta

    Abstract: We present LiGR, a large-scale ranking framework developed at LinkedIn that brings state-of-the-art transformer-based modeling architectures into production. We introduce a modified transformer architecture that incorporates learned normalization and simultaneous set-wise attention to user history and ranked items. This architecture enables several breakthrough achievements, including: (1) the dep…

    Submitted 5 February, 2025; originally announced February 2025.

  44. arXiv:2501.16981  [pdf, other]

    cs.CV

    Modulating CNN Features with Pre-Trained ViT Representations for Open-Vocabulary Object Detection

    Authors: Xiangyu Gao, Yu Dai, Benliu Qiu, Lanxiao Wang, Heqian Qiu, Hongliang Li

    Abstract: Owing to large-scale image-text contrastive training, pre-trained vision-language models (VLMs) like CLIP show superior open-vocabulary recognition ability. Most existing open-vocabulary object detectors attempt to utilize the pre-trained VLMs to attain generalized representation. F-ViT uses the pre-trained visual encoder as the backbone network and freezes it during training. However, its frozen b…

    Submitted 6 March, 2025; v1 submitted 28 January, 2025; originally announced January 2025.

  45. arXiv:2501.16450  [pdf, other]

    cs.IR cs.AI

    360Brew: A Decoder-only Foundation Model for Personalized Ranking and Recommendation

    Authors: Hamed Firooz, Maziar Sanjabi, Adrian Englhardt, Aman Gupta, Ben Levine, Dre Olgiati, Gungor Polatkan, Iuliia Melnychuk, Karthik Ramgopal, Kirill Talanine, Kutta Srinivasan, Luke Simon, Natesh Sivasubramoniapillai, Necip Fazil Ayan, Qingquan Song, Samira Sriram, Souvik Ghosh, Tao Song, Tejas Dharamsi, Vignesh Kothapalli, Xiaoling Zhai, Ya Xu, Yu Wang, Yun Dai

    Abstract: Ranking and recommendation systems are the foundation for numerous online experiences, ranging from search results to personalized content delivery. These systems have evolved into complex, multilayered architectures that leverage vast datasets and often incorporate thousands of predictive models. The maintenance and enhancement of these models is a labor-intensive process that requires extensive…

    Submitted 7 February, 2025; v1 submitted 27 January, 2025; originally announced January 2025.

  46. arXiv:2501.16186  [pdf, other]

    cs.LG

    Learn to Optimize Resource Allocation under QoS Constraint of AR

    Authors: Shiyong Chen, Yuwei Dai, Shengqian Han

    Abstract: This paper studies the uplink and downlink power allocation for interactive augmented reality (AR) services, where live video captured by an AR device is uploaded to the network edge and then the augmented video is subsequently downloaded. By modeling the AR transmission process as a tandem queuing system, we derive an upper bound for the probabilistic quality of service (QoS) requirement concerni…

    Submitted 27 January, 2025; originally announced January 2025.

    Comments: 7 pages, 5 figures

  47. arXiv:2501.15529  [pdf, other]

    cs.LG cs.AI cs.CR

    UNIDOOR: A Universal Framework for Action-Level Backdoor Attacks in Deep Reinforcement Learning

    Authors: Oubo Ma, Linkang Du, Yang Dai, Chunyi Zhou, Qingming Li, Yuwen Pu, Shouling Ji

    Abstract: Deep reinforcement learning (DRL) is widely applied to safety-critical decision-making scenarios. However, DRL is vulnerable to backdoor attacks, especially action-level backdoors, which pose significant threats through precise manipulation and flexible activation, risking outcomes like vehicle collisions or drone crashes. The key distinction of action-level backdoors lies in the utilization of th…

    Submitted 26 January, 2025; originally announced January 2025.

    Comments: 21 pages, 12 figures, 7 tables

  48. arXiv:2501.14249  [pdf, other]

    cs.LG cs.AI cs.CL

    Humanity's Last Exam

    Authors: Long Phan, Alice Gatti, Ziwen Han, Nathaniel Li, Josephina Hu, Hugh Zhang, Chen Bo Calvin Zhang, Mohamed Shaaban, John Ling, Sean Shi, Michael Choi, Anish Agrawal, Arnav Chopra, Adam Khoja, Ryan Kim, Richard Ren, Jason Hausenloy, Oliver Zhang, Mantas Mazeika, Dmitry Dodonov, Tung Nguyen, Jaeho Lee, Daron Anderson, Mikhail Doroshenko, Alun Cennyth Stokes, et al. (1084 additional authors not shown)

    Abstract: Benchmarks are important tools for tracking the rapid advancements in large language model (LLM) capabilities. However, benchmarks are not keeping pace in difficulty: LLMs now achieve over 90% accuracy on popular benchmarks like MMLU, limiting informed measurement of state-of-the-art LLM capabilities. In response, we introduce Humanity's Last Exam (HLE), a multi-modal benchmark at the frontier of…

    Submitted 19 April, 2025; v1 submitted 24 January, 2025; originally announced January 2025.

    Comments: 29 pages, 6 figures

  49. arXiv:2501.13306  [pdf, other]

    cs.SD cs.CL eess.AS

    OSUM: Advancing Open Speech Understanding Models with Limited Resources in Academia

    Authors: Xuelong Geng, Kun Wei, Qijie Shao, Shuiyun Liu, Zhennan Lin, Zhixian Zhao, Guojian Li, Wenjie Tian, Peikun Chen, Yangze Li, Pengcheng Guo, Mingchen Shao, Shuiyuan Wang, Yuang Cao, Chengyou Wang, Tianyi Xu, Yuhang Dai, Xinfa Zhu, Yue Li, Li Zhang, Lei Xie

    Abstract: Large Language Models (LLMs) have made significant progress in various downstream tasks, inspiring the development of Speech Understanding Language Models (SULMs) to enable comprehensive speech-based interactions. However, most advanced SULMs are developed by the industry, leveraging large-scale datasets and computational resources that are not readily available to the academic community. Moreover…

    Submitted 16 February, 2025; v1 submitted 22 January, 2025; originally announced January 2025.

    Comments: OSUM Technical Report v2. The experimental results reported herein differ from those in v1 because new data were added and training ran for more steps

  50. arXiv:2501.11720  [pdf, other]

    q-bio.TO cs.LG

    Prediction of Lung Metastasis from Hepatocellular Carcinoma using the SEER Database

    Authors: Jeff J. H. Kim, George R. Nahass, Yang Dai, Theja Tulabandhula

    Abstract: Hepatocellular carcinoma (HCC) is a leading cause of cancer-related mortality, with lung metastases being the most common site of distant spread and significantly worsening prognosis. Despite the growing availability of clinical and demographic data, predictive models for lung metastasis in HCC remain limited in scope and clinical applicability. In this study, we develop and validate an end-to-end…

    Submitted 20 January, 2025; originally announced January 2025.

    Comments: JJHK and GRN contributed equally, YD and TT are co-corresponding. 11 pages, 7 figures, 1 table