Showing 1–50 of 85 results for author: Sha, K

  1. arXiv:2506.18096  [pdf, ps, other]

    cs.AI

    Deep Research Agents: A Systematic Examination And Roadmap

    Authors: Yuxuan Huang, Yihang Chen, Haozheng Zhang, Kang Li, Meng Fang, Linyi Yang, Xiaoguang Li, Lifeng Shang, Songcen Xu, Jianye Hao, Kun Shao, Jun Wang

    Abstract: The rapid progress of Large Language Models (LLMs) has given rise to a new category of autonomous AI systems, referred to as Deep Research (DR) agents. These agents are designed to tackle complex, multi-turn informational research tasks by leveraging a combination of dynamic reasoning, adaptive long-horizon planning, multi-hop information retrieval, iterative tool use, and the generation of struct…

    Submitted 22 June, 2025; originally announced June 2025.

  2. arXiv:2506.17697  [pdf, ps, other]

    cs.AI

    Beyond Syntax: Action Semantics Learning for App Agents

    Authors: Bohan Tang, Dezhao Luo, Jingxuan Chen, Shaogang Gong, Jianye Hao, Jun Wang, Kun Shao

    Abstract: The advent of Large Language Models (LLMs) enables the rise of App agents that interpret user intent and operate smartphone Apps through actions such as clicking and scrolling. While prompt-based solutions with closed LLM APIs show promising ability, they incur heavy compute costs and external API dependency. Fine-tuning smaller open-source LLMs solves these limitations. However, current fine-tuni…

    Submitted 21 June, 2025; originally announced June 2025.

  3. arXiv:2506.17346  [pdf, ps, other]

    cs.CV cs.AI

    A Novel Multi-layer Task-centric and Data Quality Framework for Autonomous Driving

    Authors: Yuhan Zhou, Haihua Chen, Kewei Sha

    Abstract: The next-generation autonomous vehicles (AVs), embedded with frequent real-time decision-making, will rely heavily on a large volume of multisource and multimodal data. In real-world settings, the data quality (DQ) of different sources and modalities usually varies due to unexpected environmental factors or sensor issues. However, both researchers and practitioners in the AV field overwhelmingly c…

    Submitted 19 June, 2025; originally announced June 2025.

  4. arXiv:2506.11104  [pdf, ps, other]

    cs.CL cs.AI

    DAM: Dynamic Attention Mask for Long-Context Large Language Model Inference Acceleration

    Authors: Hanzhi Zhang, Heng Fan, Kewei Sha, Yan Huang, Yunhe Feng

    Abstract: Long-context understanding is crucial for many NLP applications, yet transformers struggle with efficiency due to the quadratic complexity of self-attention. Sparse attention methods alleviate this cost but often impose static, predefined masks, failing to capture heterogeneous attention patterns. This results in suboptimal token interactions, limiting adaptability and retrieval accuracy in long-s…

    Submitted 6 June, 2025; originally announced June 2025.

  5. arXiv:2506.06017  [pdf, ps, other]

    cs.CL

    AgentSwift: Efficient LLM Agent Design via Value-guided Hierarchical Search

    Authors: Yu Li, Lehui Li, Zhihao Wu, Qingmin Liao, Jianye Hao, Kun Shao, Fengli Xu, Yong Li

    Abstract: Large language model (LLM) agents have demonstrated strong capabilities across diverse domains. However, designing high-performing agentic systems remains challenging. Existing agent search methods suffer from three major limitations: (1) an emphasis on optimizing agentic workflows while under-utilizing proven human-designed components such as memory, planning, and tool use; (2) high evaluation co…

    Submitted 6 June, 2025; originally announced June 2025.

    Comments: 20 pages

  6. arXiv:2505.21334  [pdf, ps, other]

    cs.CV

    HoliTom: Holistic Token Merging for Fast Video Large Language Models

    Authors: Kele Shao, Keda Tao, Can Qin, Haoxuan You, Yang Sui, Huan Wang

    Abstract: Video large language models (video LLMs) excel at video comprehension but face significant computational inefficiency due to redundant video tokens. Existing token pruning methods offer solutions. However, approaches operating within the LLM (inner-LLM pruning), such as FastV, incur intrinsic computational overhead in shallow layers. In contrast, methods performing token pruning before the LLM (ou…

    Submitted 28 May, 2025; v1 submitted 27 May, 2025; originally announced May 2025.

    Comments: version provides code link: https://github.com/cokeshao/HoliTom

  7. arXiv:2504.19107  [pdf, ps, other]

    math.AP

    Slicing method for nonlinear integral inequalities related to critical nonlinear wave equations

    Authors: Takiko Sasaki, Kerun Shao, Hiroyuki Takamura

    Abstract: The so-called "slicing method" is one of the simple and powerful tools to show the blow-up, as well as optimal upper bound of the lifespan, of solutions to critical nonlinear wave equations by iteration with the logarithmic term. It has made strong advantages in various works on nonlinear hyperbolic PDEs. In this paper, we establish one more example as a short and simple proof of the blow-up theor…

    Submitted 27 April, 2025; originally announced April 2025.

    Comments: 6 pages

    MSC Class: primary 35L71; secondary 35B44

  8. arXiv:2504.13936  [pdf, other]

    cs.HC cs.LG eess.SY

    ViMo: A Generative Visual GUI World Model for App Agents

    Authors: Dezhao Luo, Bohan Tang, Kang Li, Georgios Papoudakis, Jifei Song, Shaogang Gong, Jianye Hao, Jun Wang, Kun Shao

    Abstract: App agents, which autonomously operate mobile Apps through Graphical User Interfaces (GUIs), have gained significant interest in real-world applications. Yet, they often struggle with long-horizon planning, failing to find the optimal actions for complex tasks with longer steps. To address this, world models are used to predict the next GUI observation based on user actions, enabling more effectiv…

    Submitted 20 May, 2025; v1 submitted 15 April, 2025; originally announced April 2025.

    Comments: https://ai-agents-2030.github.io/ViMo/

  9. arXiv:2504.05686  [pdf, other]

    cs.SD cs.AI cs.LG cs.MM eess.AS

    kNN-SVC: Robust Zero-Shot Singing Voice Conversion with Additive Synthesis and Concatenation Smoothness Optimization

    Authors: Keren Shao, Ke Chen, Matthew Baas, Shlomo Dubnov

    Abstract: Robustness is critical in zero-shot singing voice conversion (SVC). This paper introduces two novel methods to strengthen the robustness of the kNN-VC framework for SVC. First, kNN-VC's core representation, WavLM, lacks harmonic emphasis, resulting in dull sounds and ringing artifacts. To address this, we leverage the bijection between WavLM, pitch contours, and spectrograms to perform additive sy…

    Submitted 8 April, 2025; originally announced April 2025.

    Comments: 5 pages, 6 figures, 1 table, Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025

  10. arXiv:2504.04471  [pdf, other]

    cs.CV

    VideoAgent2: Enhancing the LLM-Based Agent System for Long-Form Video Understanding by Uncertainty-Aware CoT

    Authors: Zhuo Zhi, Qiangqiang Wu, Minghe Shen, Wenbo Li, Yinchuan Li, Kun Shao, Kaiwen Zhou

    Abstract: Long video understanding has emerged as an increasingly important yet challenging task in computer vision. Agent-based approaches are gaining popularity for processing long videos, as they can handle extended sequences and integrate various tools to capture fine-grained information. However, existing methods still face several challenges: (1) they often rely solely on the reasoning ability of larg…

    Submitted 6 April, 2025; originally announced April 2025.

  11. arXiv:2503.10212  [pdf, other]

    cs.CV

    MouseGPT: A Large-scale Vision-Language Model for Mouse Behavior Analysis

    Authors: Teng Xu, Taotao Zhou, Youjia Wang, Peng Yang, Simin Tang, Kuixiang Shao, Zifeng Tang, Yifei Liu, Xinyuan Chen, Hongshuang Wang, Xiaohui Wang, Huoqing Luo, Jingya Wang, Ji Hu, Jingyi Yu

    Abstract: Analyzing animal behavior is crucial in advancing neuroscience, yet quantifying and deciphering its intricate dynamics remains a significant challenge. Traditional machine vision approaches, despite their ability to detect spontaneous behaviors, fall short due to limited interpretability and reliance on manual labeling, which restricts the exploration of the full behavioral spectrum. Here, we intr…

    Submitted 27 March, 2025; v1 submitted 13 March, 2025; originally announced March 2025.

    Comments: 53 pages, 5 figures, 7 extended figures

  12. arXiv:2502.16268  [pdf, other]

    cs.CL

    ThinkBench: Dynamic Out-of-Distribution Evaluation for Robust LLM Reasoning

    Authors: Shulin Huang, Linyi Yang, Yan Song, Shuang Chen, Leyang Cui, Ziyu Wan, Qingcheng Zeng, Ying Wen, Kun Shao, Weinan Zhang, Jun Wang, Yue Zhang

    Abstract: Evaluating large language models (LLMs) poses significant challenges, particularly due to issues of data contamination and the leakage of correct answers. To address these challenges, we introduce ThinkBench, a novel evaluation framework designed to evaluate LLMs' reasoning capability robustly. ThinkBench proposes a dynamic data generation method for constructing out-of-distribution (OOD) datasets…

    Submitted 22 February, 2025; originally announced February 2025.

  13. arXiv:2502.07949  [pdf, other]

    cs.LG cs.AI

    Advancing Autonomous VLM Agents via Variational Subgoal-Conditioned Reinforcement Learning

    Authors: Qingyuan Wu, Jianheng Liu, Jianye Hao, Jun Wang, Kun Shao

    Abstract: State-of-the-art (SOTA) reinforcement learning (RL) methods have enabled vision-language model (VLM) agents to learn from interaction with online environments without human supervision. However, these methods often struggle with learning inefficiencies when applied to complex, real-world decision-making tasks with sparse rewards and long-horizon dependencies. We propose a novel framework, Variatio…

    Submitted 20 May, 2025; v1 submitted 11 February, 2025; originally announced February 2025.

  14. arXiv:2502.06395  [pdf, other]

    cs.AI

    AppVLM: A Lightweight Vision Language Model for Online App Control

    Authors: Georgios Papoudakis, Thomas Coste, Zhihao Wu, Jianye Hao, Jun Wang, Kun Shao

    Abstract: The utilisation of foundation models as smartphone assistants, termed app agents, is a critical research challenge. These agents aim to execute human instructions on smartphones by interpreting textual instructions and performing actions via the device's interface. While promising, current approaches face significant limitations. Methods that use large proprietary models, such as GPT-4o, are compu…

    Submitted 10 February, 2025; originally announced February 2025.

  15. arXiv:2502.05901  [pdf, ps, other]

    hep-ph

    Data determination of HQET parameters in inclusive charm decays

    Authors: Kang-Kang Shao, Chun Huang, Qin Qin

    Abstract: This work delves into the phenomenology of electronic inclusive decays of $D$ mesons, encompassing $D^0, D^+, D^+_s\to Xe^{+}ν$. The theoretical formulas for the decay widths and electron energy moments of these decays are presented as expansions with powers of $α_s$ and $Λ_{\rm QCD}/m_c$. Remarkably, the expansion exhibits excellent convergence properties when we choose the 1S mass scheme for cha…

    Submitted 9 February, 2025; originally announced February 2025.

    Comments: 10 pages, 3 tables

  16. arXiv:2502.00687  [pdf, other]

    cs.AR eess.SY

    A Flexible Precision Scaling Deep Neural Network Accelerator with Efficient Weight Combination

    Authors: Liang Zhao, Kunming Shao, Fengshi Tian, Tim Kwang-Ting Cheng, Chi-Ying Tsui, Yi Zou

    Abstract: Deploying mixed-precision neural networks on edge devices is friendly to hardware resources and power consumption. To support fully mixed-precision neural network inference, it is necessary to design flexible hardware accelerators for continuous varying precision operations. However, the previous works have issues on hardware utilization and overhead of reconfigurable logic. In this paper, we prop…

    Submitted 2 February, 2025; originally announced February 2025.

    Comments: Accepted by 2025 IEEE International Symposium on Circuits and Systems (ISCAS)

  17. arXiv:2501.13301  [pdf, other]

    math.DS

    A Data-Driven Framework for Koopman Semigroup Estimation in Stochastic Dynamical Systems

    Authors: Yuanchao Xu, Kaidi Shao, Isao Ishikawa, Yuka Hashimoto, Nikos Logothetis, Zhongwei Shen

    Abstract: We present Stochastic Dynamic Mode Decomposition (SDMD), a novel data-driven framework for approximating the Koopman semigroup in stochastic dynamical systems. Unlike existing methods, SDMD explicitly incorporates sampling time into its approximation, ensuring numerical stability and precision. By directly approximating the Koopman semigroup instead of the generator, SDMD avoids computationally ex…

    Submitted 24 May, 2025; v1 submitted 22 January, 2025; originally announced January 2025.

  18. arXiv:2501.00701  [pdf, other]

    cs.LG math.DS

    ResKoopNet: Learning Koopman Representations for Complex Dynamics with Spectral Residuals

    Authors: Yuanchao Xu, Kaidi Shao, Nikos Logothetis, Zhongwei Shen

    Abstract: Analyzing the long-term behavior of high-dimensional nonlinear dynamical systems remains a significant challenge. While the Koopman operator framework provides a powerful global linearization tool, current methods for approximating its spectral components often face theoretical limitations and depend on predefined dictionaries. Residual Dynamic Mode Decomposition (ResDMD) advanced the field by int…

    Submitted 27 May, 2025; v1 submitted 31 December, 2024; originally announced January 2025.

  19. arXiv:2412.08944  [pdf, other]

    cs.SD cs.LG eess.AS

    Interpreting Graphic Notation with MusicLDM: An AI Improvisation of Cornelius Cardew's Treatise

    Authors: Tornike Karchkhadze, Keren Shao, Shlomo Dubnov

    Abstract: This work presents a novel method for composing and improvising music inspired by Cornelius Cardew's Treatise, using AI to bridge graphic notation and musical expression. By leveraging OpenAI's ChatGPT to interpret the abstract visual elements of Treatise, we convert these graphical images into descriptive textual prompts. These prompts are then input into MusicLDM, a pre-trained latent diffusion…

    Submitted 12 December, 2024; originally announced December 2024.

    Journal ref: 2024 IEEE International Conference on Big Data (Big Data)

  20. arXiv:2412.05544  [pdf, ps, other]

    math.AP

    Criteria of the existence of global solutions to semilinear wave equations with first-order derivatives on exterior domains

    Authors: Kerun Shao

    Abstract: We study the existence of global solutions to semilinear wave equations on exterior domains $\mathbb{R}^n\setminus\mathcal{K}$, $n\geq2$, with small initial data and nonlinear terms $F(\partial u)$ where $F\in C^κ$ and $\partial^{\leqκ}F(0)=0$. If $n\geq2$ and $κ>n/2$, criteria of the existence of a global solution for general initial data are provided, except for non-empty obstacles…

    Submitted 7 December, 2024; originally announced December 2024.

    Comments: 27 pages

    MSC Class: 35L05; 35L71; 35B30; 35B33; 35B44

  21. arXiv:2411.16806  [pdf, other]

    cs.AR

    SynDCIM: A Performance-Aware Digital Computing-in-Memory Compiler with Multi-Spec-Oriented Subcircuit Synthesis

    Authors: Kunming Shao, Fengshi Tian, Xiaomeng Wang, Jiakun Zheng, Jia Chen, Jingyu He, Hui Wu, Jinbo Chen, Xihao Guan, Yi Deng, Fengbin Tu, Jie Yang, Mohamad Sawan, Tim Kwang-Ting Cheng, Chi-Ying Tsui

    Abstract: Digital Computing-in-Memory (DCIM) is an innovative technology that integrates multiply-accumulation (MAC) logic directly into memory arrays to enhance the performance of modern AI computing. However, the need for customized memory cells and logic components currently necessitates significant manual effort in DCIM design. Existing tools for facilitating DCIM macro designs struggle to optimize subc…

    Submitted 5 January, 2025; v1 submitted 25 November, 2024; originally announced November 2024.

    Comments: Accepted by 2025 Design, Automation & Test in Europe Conference & Exhibition (DATE) as a regular paper

  22. arXiv:2411.04890  [pdf, other]

    cs.AI cs.HC

    GUI Agents with Foundation Models: A Comprehensive Survey

    Authors: Shuai Wang, Weiwen Liu, Jingxuan Chen, Yuqi Zhou, Weinan Gan, Xingshan Zeng, Yuhan Che, Shuai Yu, Xinlong Hao, Kun Shao, Bin Wang, Chuhan Wu, Yasheng Wang, Ruiming Tang, Jianye Hao

    Abstract: Recent advances in foundation models, particularly Large Language Models (LLMs) and Multimodal Large Language Models (MLLMs), have facilitated the development of intelligent agents capable of performing complex tasks. By leveraging the ability of (M)LLMs to process and interpret Graphical User Interfaces (GUIs), these agents can autonomously execute user instructions, simulating human-like interac…

    Submitted 13 February, 2025; v1 submitted 7 November, 2024; originally announced November 2024.

  23. arXiv:2411.03562  [pdf, other]

    cs.LG cs.AI

    Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level

    Authors: Antoine Grosnit, Alexandre Maraval, James Doran, Giuseppe Paolo, Albert Thomas, Refinath Shahul Hameed Nabeezath Beevi, Jonas Gonzalez, Khyati Khandelwal, Ignacio Iacobacci, Abdelhakim Benechehab, Hamza Cherkaoui, Youssef Attia El-Hili, Kun Shao, Jianye Hao, Jun Yao, Balazs Kegl, Haitham Bou-Ammar, Jun Wang

    Abstract: We introduce Agent K v1.0, an end-to-end autonomous data science agent designed to automate, optimise, and generalise across diverse data science tasks. Fully automated, Agent K v1.0 manages the entire data science life cycle by learning from experience. It leverages a highly flexible structured reasoning framework to enable it to dynamically process memory in a nested structure, effectively learn…

    Submitted 5 November, 2024; originally announced November 2024.

  24. arXiv:2410.17883  [pdf, other]

    cs.AI

    Lightweight Neural App Control

    Authors: Filippos Christianos, Georgios Papoudakis, Thomas Coste, Jianye Hao, Jun Wang, Kun Shao

    Abstract: This paper introduces a novel mobile phone control architecture, Lightweight Multi-modal App Control (LiMAC), for efficient interactions and control across various Android apps. LiMAC takes as input a textual goal and a sequence of past mobile observations, such as screenshots and corresponding UI trees, to generate precise actions. To address the computational constraints inherent to smartphones,…

    Submitted 12 February, 2025; v1 submitted 23 October, 2024; originally announced October 2024.

    Comments: ICLR 2025 (spotlight)

  25. arXiv:2410.15164  [pdf, other]

    cs.AI

    SPA-Bench: A Comprehensive Benchmark for SmartPhone Agent Evaluation

    Authors: Jingxuan Chen, Derek Yuen, Bin Xie, Yuhao Yang, Gongwei Chen, Zhihao Wu, Li Yixing, Xurui Zhou, Weiwen Liu, Shuai Wang, Kaiwen Zhou, Rui Shao, Liqiang Nie, Yasheng Wang, Jianye Hao, Jun Wang, Kun Shao

    Abstract: Smartphone agents are increasingly important for helping users control devices efficiently, with (Multimodal) Large Language Model (MLLM)-based approaches emerging as key contenders. Fairly comparing these agents is essential but challenging, requiring a varied task scope, the integration of agents with different implementations, and a generalisable evaluation pipeline to assess their strengths an…

    Submitted 31 March, 2025; v1 submitted 19 October, 2024; originally announced October 2024.

    Comments: ICLR 2025 Spotlight

  26. arXiv:2410.14803  [pdf, other]

    cs.LG cs.AI cs.DC eess.SY

    DistRL: An Asynchronous Distributed Reinforcement Learning Framework for On-Device Control Agents

    Authors: Taiyi Wang, Zhihao Wu, Jianheng Liu, Jianye Hao, Jun Wang, Kun Shao

    Abstract: On-device control agents, especially on mobile devices, are responsible for operating mobile devices to fulfill users' requests, enabling seamless and intuitive interactions. Integrating Multimodal Large Language Models (MLLMs) into these agents enhances their ability to understand and execute complex commands, thereby improving user experience. However, fine-tuning MLLMs for on-device control pre…

    Submitted 21 February, 2025; v1 submitted 18 October, 2024; originally announced October 2024.

    Comments: Paper and Appendix, 26 pages

  27. arXiv:2409.18152  [pdf, other]

    cs.GT cs.LG math.OC

    Reinforcement Learning for Finite Space Mean-Field Type Games

    Authors: Kai Shao, Jiacheng Shen, Chijie An, Mathieu Laurière

    Abstract: Mean field type games (MFTGs) describe Nash equilibria between large coalitions: each coalition consists of a continuum of cooperative agents who maximize the average reward of their coalition while interacting non-cooperatively with a finite number of other coalitions. Although the theory has been extensively developed, we are still lacking efficient and scalable computational methods. Here, we d…

    Submitted 4 December, 2024; v1 submitted 25 September, 2024; originally announced September 2024.

  28. arXiv:2409.12899  [pdf, other]

    cs.RO

    LI-GS: Gaussian Splatting with LiDAR Incorporated for Accurate Large-Scale Reconstruction

    Authors: Changjian Jiang, Ruilan Gao, Kele Shao, Yue Wang, Rong Xiong, Yu Zhang

    Abstract: Large-scale 3D reconstruction is critical in the field of robotics, and the potential of 3D Gaussian Splatting (3DGS) for achieving accurate object-level reconstruction has been demonstrated. However, ensuring geometric accuracy in outdoor and unbounded scenes remains a significant challenge. This study introduces LI-GS, a reconstruction system that incorporates LiDAR and Gaussian Splatting to enh…

    Submitted 19 September, 2024; originally announced September 2024.

  29. arXiv:2408.10123  [pdf, other]

    cs.RO cs.CV

    Learning Precise Affordances from Egocentric Videos for Robotic Manipulation

    Authors: Gen Li, Nikolaos Tsagkas, Jifei Song, Ruaridh Mon-Williams, Sethu Vijayakumar, Kun Shao, Laura Sevilla-Lara

    Abstract: Affordance, defined as the potential actions that an object offers, is crucial for robotic manipulation tasks. A deep understanding of affordance can lead to more intelligent AI systems. For example, such knowledge directs an agent to grasp a knife by the handle for cutting and by the blade when passing it to someone. In this paper, we present a streamlined affordance learning system that encompas…

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: Project page: https://reagan1311.github.io/affgrasp

  30. arXiv:2408.00539  [pdf, other]

    cs.CL cs.AI

    Intermittent Semi-working Mask: A New Masking Paradigm for LLMs

    Authors: Mingcong Lu, Jiangcai Zhu, Wang Hao, Zheng Li, Shusheng Zhang, Kailai Shao, Chao Chen, Nan Li, Feng Wang, Xin Lu

    Abstract: Multi-turn dialogues are a key interaction method between humans and Large Language Models (LLMs), as conversations extend over multiple rounds, keeping LLMs' high generation quality and low latency is a challenge. Mainstream LLMs can be grouped into two categories based on masking strategy: causal LLM and prefix LLM. Several works have demonstrated that prefix LLMs tend to outperform causal ones…

    Submitted 1 August, 2024; originally announced August 2024.

  31. arXiv:2406.19741  [pdf, other]

    cs.RO cs.AI

    ROS-LLM: A ROS framework for embodied AI with task feedback and structured reasoning

    Authors: Christopher E. Mower, Yuhui Wan, Hongzhan Yu, Antoine Grosnit, Jonas Gonzalez-Billandon, Matthieu Zimmer, Jinlong Wang, Xinyu Zhang, Yao Zhao, Anbang Zhai, Puze Liu, Daniel Palenicek, Davide Tateo, Cesar Cadena, Marco Hutter, Jan Peters, Guangjian Tian, Yuzheng Zhuang, Kun Shao, Xingyue Quan, Jianye Hao, Jun Wang, Haitham Bou-Ammar

    Abstract: We present a framework for intuitive robot programming by non-experts, leveraging natural language prompts and contextual information from the Robot Operating System (ROS). Our system integrates large language models (LLMs), enabling non-experts to articulate task requirements to the system through a chat interface. Key features of the framework include: integration of ROS with an AI agent connect…

    Submitted 12 July, 2024; v1 submitted 28 June, 2024; originally announced June 2024.

    Comments: This document contains 26 pages and 13 figures

  32. arXiv:2406.19614  [pdf, other]

    cs.LG cs.AI

    A Survey on Data Quality Dimensions and Tools for Machine Learning

    Authors: Yuhan Zhou, Fengjiao Tu, Kewei Sha, Junhua Ding, Haihua Chen

    Abstract: Machine learning (ML) technologies have become substantial in practically all aspects of our society, and data quality (DQ) is critical for the performance, fairness, robustness, safety, and scalability of ML models. With the large and complex data in data-centric AI, traditional methods like exploratory data analysis (EDA) and cross-validation (CV) face challenges, highlighting the importance of…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: This paper has been accepted by The 6th IEEE International Conference on Artificial Intelligence Testing (IEEE AITest 2024) as an invited paper

  33. arXiv:2406.16968  [pdf, other]

    cs.LG cs.AI

    Multimodal Physiological Signals Representation Learning via Multiscale Contrasting for Depression Recognition

    Authors: Kai Shao, Rui Wang, Yixue Hao, Long Hu, Min Chen, Hans Arno Jacobsen

    Abstract: Depression recognition based on physiological signals such as functional near-infrared spectroscopy (fNIRS) and electroencephalogram (EEG) has made considerable progress. However, most existing studies ignore the complementarity and semantic consistency of multimodal physiological signals under the same stimulation task in complex spatio-temporal patterns. In this paper, we introduce a multimodal…

    Submitted 25 June, 2024; v1 submitted 22 June, 2024; originally announced June 2024.

  34. arXiv:2406.02098  [pdf, ps, other]

    math.AP

    Blow-up of solutions to semilinear wave equations with spatial derivatives

    Authors: Kerun Shao, Hiroyuki Takamura, Chengbo Wang

    Abstract: For small-amplitude semilinear wave equations with power type nonlinearity on the first-order spatial derivative, the expected sharp upper bound on the lifespan of solutions is obtained for both critical cases and subcritical cases, for all spatial dimensions $n>1$. It is achieved uniformly by constructing the integral equations, deriving the ordinary differential inequality system, and iteration…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: 15 pages

    MSC Class: 35L05; 35L71; 35B30; 35B33; 35B44

  35. arXiv:2405.18849  [pdf, other]

    cs.CV

    SFANet: Spatial-Frequency Attention Network for Weather Forecasting

    Authors: Jiaze Wang, Hao Chen, Hongcan Xu, Jinpeng Li, Bowen Wang, Kun Shao, Furui Liu, Huaxi Chen, Guangyong Chen, Pheng-Ann Heng

    Abstract: Weather forecasting plays a critical role in various sectors, driving decision-making and risk management. However, traditional methods often struggle to capture the complex dynamics of meteorological systems, particularly in the presence of high-resolution data. In this paper, we propose the Spatial-Frequency Attention Network (SFANet), a novel deep learning framework designed to address these ch…

    Submitted 29 May, 2024; originally announced May 2024.

  36. arXiv:2405.18679  [pdf, other]

    cs.CV

    Vim-F: Visual State Space Model Benefiting from Learning in the Frequency Domain

    Authors: Juntao Zhang, Shaogeng Liu, Kun Bian, You Zhou, Pei Zhang, Wenbo An, Jun Zhou, Kun Shao

    Abstract: In recent years, State Space Models (SSMs) with efficient hardware-aware designs, known as the Mamba deep learning models, have made significant progress in modeling long sequences such as language understanding. Therefore, building efficient and general-purpose visual backbones based on SSMs is a promising direction. Compared to traditional convolutional neural networks (CNNs) and Vision Transfor…

    Submitted 7 January, 2025; v1 submitted 28 May, 2024; originally announced May 2024.

  37. arXiv:2404.11116  [pdf, other]

    cs.SD cs.AI cs.LG cs.MM eess.AS

    Music Enhancement with Deep Filters: A Technical Report for The ICASSP 2024 Cadenza Challenge

    Authors: Keren Shao, Ke Chen, Shlomo Dubnov

    Abstract: In this challenge, we disentangle the deep filters from the original DeepfilterNet and incorporate them into our Spec-UNet-based network to further improve a hybrid Demucs (hdemucs) based remixing pipeline. The motivation behind the use of the deep filter component lies at its potential in better handling temporal fine structures. We demonstrate an incremental improvement in both the Signal-to-Dis…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: 2 pages, 2 figures, 1 table, Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2024

  38. arXiv:2404.02917  [pdf, ps, other]

    math.AP

    On the asymptotic behavior of solutions to the steady Navier-Stokes system in two-dimensional channels

    Authors: Han Li, Kaijian Sha

    Abstract: In this paper, we investigate the incompressible steady Navier-Stokes system with no-slip boundary condition in a two-dimensional channel. Given any flux, the existence of solutions is proved as long as the width of cross-section of the channel grows more slowly than the linear growth. Furthermore, if the flux is suitably small, the solution is unique even when the width of the channel is unbounde…

    Submitted 26 March, 2024; originally announced April 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2210.15204

    MSC Class: 35Q30; 35J67; 76D05; 76D03

  39. arXiv:2402.06570  [pdf, other]

    cs.LG cs.RO

    Distilling Morphology-Conditioned Hypernetworks for Efficient Universal Morphology Control

    Authors: Zheng Xiong, Risto Vuorio, Jacob Beck, Matthieu Zimmer, Kun Shao, Shimon Whiteson

    Abstract: Learning a universal policy across different robot morphologies can significantly improve learning efficiency and enable zero-shot generalization to unseen morphologies. However, learning a highly performant universal policy requires sophisticated architectures like transformers (TF) that have larger memory and computational cost than simpler multi-layer perceptrons (MLP). To achieve both good per…

    Submitted 3 June, 2024; v1 submitted 9 February, 2024; originally announced February 2024.

    Comments: ICML 2024

  40. arXiv:2312.14878  [pdf, other]

    cs.AI cs.LG

    Pangu-Agent: A Fine-Tunable Generalist Agent with Structured Reasoning

    Authors: Filippos Christianos, Georgios Papoudakis, Matthieu Zimmer, Thomas Coste, Zhihao Wu, Jingxuan Chen, Khyati Khandelwal, James Doran, Xidong Feng, Jiacheng Liu, Zheng Xiong, Yicheng Luo, Jianye Hao, Kun Shao, Haitham Bou-Ammar, Jun Wang

    Abstract: A key method for creating Artificial Intelligence (AI) agents is Reinforcement Learning (RL). However, constructing a standalone RL policy that maps perception to action directly encounters severe problems, chief among them being its lack of generality across multiple tasks and the need for a large amount of training data. The leading cause is that it cannot effectively integrate prior information…

    Submitted 22 December, 2023; originally announced December 2023.

    Comments: paper and appendix, 27 pages

  41. arXiv:2312.11063  [pdf, ps, other]

    cs.GT cs.AI cs.DS cs.LG econ.TH

    A survey on algorithms for Nash equilibria in finite normal-form games

    Authors: Hanyu Li, Wenhan Huang, Zhijian Duan, David Henry Mguni, Kun Shao, Jun Wang, Xiaotie Deng

    Abstract: Nash equilibrium is one of the most influential solution concepts in game theory. With the development of computer science and artificial intelligence, there is an increasing demand on Nash equilibrium computation, especially for Internet economics and multi-agent learning. This paper reviews various algorithms computing the Nash equilibrium and its approximation solutions in finite normal-form ga…

    Submitted 18 December, 2023; originally announced December 2023.

    Comments: The published version is in Computer Science Review

  42. arXiv:2311.16082  [pdf, other]

    quant-ph cs.AI cs.AR cs.ET cs.LG

    Transformer-QEC: Quantum Error Correction Code Decoding with Transferable Transformers

    Authors: Hanrui Wang, Pengyu Liu, Kevin Shao, Dantong Li, Jiaqi Gu, David Z. Pan, Yongshan Ding, Song Han

    Abstract: Quantum computing has the potential to solve problems that are intractable for classical systems, yet the high error rates in contemporary quantum devices often exceed tolerable limits for useful algorithm execution. Quantum Error Correction (QEC) mitigates this by employing redundancy, distributing quantum information across multiple data qubits and utilizing syndrome qubits to monitor their stat…

    Submitted 27 November, 2023; originally announced November 2023.

    Comments: Accepted to ICCAD 2023, FAST ML for Science Workshop; 7 pages, 8 figures

  43. A Non-Hermitian Moiré Valley Filter

    Authors: Kai Shao, Hao Geng, Erfu Liu, Jose L. Lado, Wei Chen, D. Y. Xing

    Abstract: A valley filter capable of generating a valley-polarized current is a crucial element in valleytronics, yet its implementation remains challenging. Here, we propose a valley filter made of a graphene bilayer which exhibits a 1D moiré pattern in the overlapping region of the two layers controlled by heterostrain. In the presence of a lattice modulation between layers, electrons propagating in one l…

    Submitted 18 April, 2024; v1 submitted 16 October, 2023; originally announced October 2023.

    Comments: 6 pages, 3 figures

    Journal ref: Phys. Rev. Lett. 132, 156301 (2024)

  44. arXiv:2310.08492  [pdf, ps, other]

    math.PR q-fin.MF

    Maximal Martingale Wasserstein Inequality

    Authors: Benjamin Jourdain, Kexin Shao

    Abstract: In this note, we complete the analysis of the Martingale Wasserstein Inequality started in arXiv:2011.11599 by checking that this inequality fails in dimension $d\ge 2$ when the integrability parameter $ρ$ belongs to $[1,2)$ while a stronger Maximal Martingale Wasserstein Inequality holds whatever the dimension $d$ when $ρ\ge 2$.

    Submitted 12 October, 2023; originally announced October 2023.

    Comments: 7 pages

  45. arXiv:2308.02723  [pdf, other]

    cs.SD cs.AI cs.LG cs.MM eess.AS

    Towards Improving Harmonic Sensitivity and Prediction Stability for Singing Melody Extraction

    Authors: Keren Shao, Ke Chen, Taylor Berg-Kirkpatrick, Shlomo Dubnov

    Abstract: In deep learning research, many melody extraction models rely on redesigning neural network architectures to improve performance. In this paper, we propose an input feature modification and a training objective modification based on two assumptions. First, harmonics in the spectrograms of audio data decay rapidly along the frequency axis. To enhance the model's sensitivity on the trailing harmonic…

    Submitted 4 August, 2023; originally announced August 2023.

    Comments: 7 pages, 4 figures, 2 tables, Proceedings of the 24th International Society for Music Information Retrieval Conference, ISMIR 2023

  46. arXiv:2306.09200  [pdf, other]

    cs.LG cs.AI

    ChessGPT: Bridging Policy Learning and Language Modeling

    Authors: Xidong Feng, Yicheng Luo, Ziyan Wang, Hongrui Tang, Mengyue Yang, Kun Shao, David Mguni, Yali Du, Jun Wang

    Abstract: When solving decision-making tasks, humans typically depend on information from two key sources: (1) Historical policy data, which provides interaction replay from the environment, and (2) Analytical insights in natural language form, exposing the invaluable thought process or strategic considerations. Despite this, the majority of preceding research focuses on only one source: they either use his…

    Submitted 21 December, 2023; v1 submitted 15 June, 2023; originally announced June 2023.

    Comments: Published as a conference article in NeurIPS 2023

  47. arXiv:2305.14853  [pdf, ps, other]

    math.AP

    Uniqueness and uniform structural stability of Poiseuille flows with large fluxes in two-dimensional strips

    Authors: Kaijian Sha, Yun Wang, Chunjing Xie

    Abstract: In this paper, we prove the uniform nonlinear structural stability of Poiseuille flows with suitably large flux for the steady Navier-Stokes system in a two-dimensional strip with arbitrary period. Furthermore, the well-posedness theory for the Navier-Stokes system is also proved even when the $L^2$-norm of the external force is large. In particular, if the vertical velocity is suitably small wher…

    Submitted 24 May, 2023; originally announced May 2023.

    Comments: arXiv admin note: substantial text overlap with arXiv:2011.07467

  48. arXiv:2305.00565  [pdf, ps, other]

    math.PR q-fin.MF

    Non-decreasing martingale couplings

    Authors: Benjamin Jourdain, Kexin Shao

    Abstract: For many examples of couples $(μ,ν)$ of probability measures on the real line in the convex order, we observe numerically that the Hobson and Neuberger martingale coupling, which maximizes for $ρ=1$ the integral of $|y-x|^ρ$ with respect to any martingale coupling between $μ$ and $ν$, is still a maximizer for $ρ\in(0,2)$ and a minimizer for $ρ>2$. We investigate the theoretical validity of this nu…

    Submitted 30 April, 2023; originally announced May 2023.

    Comments: 41 pages, 1 figure

  49. DropDim: A Regularization Method for Transformer Networks

    Authors: Hao Zhang, Dan Qu, Keji Shao, Xukui Yang

    Abstract: We introduce DropDim, a structured dropout method designed for regularizing the self-attention mechanism, which is a key component of the transformer. In contrast to the general dropout method, which randomly drops neurons, DropDim drops part of the embedding dimensions. In this way, the semantic information can be completely discarded. Thus, the excessive coadapting between different embedding dim…

    Submitted 20 April, 2023; originally announced April 2023.

    Journal ref: IEEE SIGNAL PROCESSING LETTERS, VOL. 29, 2022

  50. arXiv:2303.06697  [pdf, other]

    cs.CV

    Traj-MAE: Masked Autoencoders for Trajectory Prediction

    Authors: Hao Chen, Jiaze Wang, Kun Shao, Furui Liu, Jianye Hao, Chenyong Guan, Guangyong Chen, Pheng-Ann Heng

    Abstract: Trajectory prediction has been a crucial task in building a reliable autonomous driving system by anticipating possible dangers. One key issue is to generate consistent trajectory predictions without colliding. To overcome the challenge, we propose an efficient masked autoencoder for trajectory prediction (Traj-MAE) that better represents the complicated behaviors of agents in the driving environm…

    Submitted 12 March, 2023; originally announced March 2023.