Showing 1–50 of 205 results for author: Miao, Y

  1. arXiv:2505.09406  [pdf, ps, other]

    cs.CV

    FreeDriveRF: Monocular RGB Dynamic NeRF without Poses for Autonomous Driving via Point-Level Dynamic-Static Decoupling

    Authors: Yue Wen, Liang Song, Yijia Liu, Siting Zhu, Yanzi Miao, Lijun Han, Hesheng Wang

    Abstract: Dynamic scene reconstruction for autonomous driving enables vehicles to perceive and interpret complex scene changes more precisely. Dynamic Neural Radiance Fields (NeRFs) have recently shown promising capability in scene modeling. However, many existing methods rely heavily on accurate pose inputs and multi-sensor data, leading to increased system complexity. To address this, we propose FreeDriv…

    Submitted 14 May, 2025; originally announced May 2025.

    Comments: 7 pages, 9 figures, accepted by ICRA2025

  2. arXiv:2505.05740  [pdf, ps, other]

    cs.LG

    Deep-ICE: The first globally optimal algorithm for empirical risk minimization of two-layer maxout and ReLU networks

    Authors: Xi He, Yi Miao, Max A. Little

    Abstract: This paper introduces the first globally optimal algorithm for the empirical risk minimization problem of two-layer maxout and ReLU networks, i.e., minimizing the number of misclassifications. The algorithm has a worst-case time complexity of $O\left(N^{DK+1}\right)$, where $K$ denotes the number of hidden neurons and $D$ represents the number of features. It can be generalized to accommoda…

    Submitted 8 May, 2025; originally announced May 2025.

  3. arXiv:2505.01383  [pdf, other]

    cs.RO cs.AI

    FalconWing: An Open-Source Platform for Ultra-Light Fixed-Wing Aircraft Research

    Authors: Yan Miao, Will Shen, Hang Cui, Sayan Mitra

    Abstract: We present FalconWing -- an open-source, ultra-lightweight (150 g) fixed-wing platform for autonomy research. The hardware platform integrates a small camera, a standard airframe, offboard computation, and radio communication for manual overrides. We demonstrate FalconWing's capabilities by developing and deploying a purely vision-based control policy for autonomous landing (without IMU or motion…

    Submitted 2 May, 2025; originally announced May 2025.

  4. arXiv:2505.00443  [pdf, other]

    cs.DC

    Distributed Retrieval-Augmented Generation

    Authors: Chenhao Xu, Longxiang Gao, Yuan Miao, Xi Zheng

    Abstract: As large language models (LLMs) become increasingly adopted on edge devices, Retrieval-Augmented Generation (RAG) is gaining prominence as a solution to address factual deficiencies and hallucinations by integrating external knowledge. However, centralized RAG architectures face significant challenges in data privacy and scalability. For instance, smart healthcare services often rely on collecting…

    Submitted 1 May, 2025; originally announced May 2025.

  5. arXiv:2504.18346  [pdf, other]

    cs.CL cs.AI

    Comparing Uncertainty Measurement and Mitigation Methods for Large Language Models: A Systematic Review

    Authors: Toghrul Abbasli, Kentaroh Toyoda, Yuan Wang, Leon Witt, Muhammad Asif Ali, Yukai Miao, Dan Li, Qingsong Wei

    Abstract: Large Language Models (LLMs) have been transformative across many domains. However, hallucination -- confidently outputting incorrect information -- remains one of the leading challenges for LLMs. This raises the question of how to accurately assess and quantify the uncertainty of LLMs. Extensive literature on traditional models has explored Uncertainty Quantification (UQ) to measure uncertainty a…

    Submitted 25 April, 2025; originally announced April 2025.

  6. arXiv:2504.07733  [pdf, other]

    cs.CL econ.GN

    DeepGreen: Effective LLM-Driven Green-washing Monitoring System Designed for Empirical Testing -- Evidence from China

    Authors: Congluo Xu, Yu Miao, Yiling Xiao, Chengmengjia Lin

    Abstract: This paper proposes DeepGreen, a Large Language Model Driven (LLM-Driven) system for detecting corporate green-washing behaviour. Utilizing dual-layer LLM analysis, DeepGreen preliminarily identifies potential green keywords in financial statements and then assesses their implementation degree via iterative semantic analysis of LLM. A core variable GreenImplement is derived from the ratio from th…

    Submitted 10 April, 2025; originally announced April 2025.

  7. arXiv:2504.07491  [pdf, other]

    cs.CV

    Kimi-VL Technical Report

    Authors: Kimi Team, Angang Du, Bohong Yin, Bowei Xing, Bowen Qu, Bowen Wang, Cheng Chen, Chenlin Zhang, Chenzhuang Du, Chu Wei, Congcong Wang, Dehao Zhang, Dikang Du, Dongliang Wang, Enming Yuan, Enzhe Lu, Fang Li, Flood Sung, Guangda Wei, Guokun Lai, Han Zhu, Hao Ding, Hao Hu, Hao Yang, Hao Zhang, et al. (68 additional authors not shown)

    Abstract: We present Kimi-VL, an efficient open-source Mixture-of-Experts (MoE) vision-language model (VLM) that offers advanced multimodal reasoning, long-context understanding, and strong agent capabilities - all while activating only 2.8B parameters in its language decoder (Kimi-VL-A3B). Kimi-VL demonstrates strong performance across challenging domains: as a general-purpose VLM, Kimi-VL excels in multi-…

    Submitted 15 April, 2025; v1 submitted 10 April, 2025; originally announced April 2025.

  8. arXiv:2504.06863  [pdf, other]

    cs.CV

    MovSAM: A Single-image Moving Object Segmentation Framework Based on Deep Thinking

    Authors: Chang Nie, Yiqing Xu, Guangming Wang, Zhe Liu, Yanzi Miao, Hesheng Wang

    Abstract: Moving object segmentation plays a vital role in understanding dynamic visual environments. While existing methods rely on multi-frame image sequences to identify moving objects, single-image MOS is critical for applications like motion intention prediction and handling camera frame drops. However, segmenting moving objects from a single image remains challenging for existing methods due to the ab…

    Submitted 9 April, 2025; originally announced April 2025.

  9. arXiv:2504.06319  [pdf, other]

    cs.LG cs.AI

    Accelerating LLM Inference Throughput via Asynchronous KV Cache Prefetching

    Authors: Yanhao Dong, Yubo Miao, Weinan Li, Xiao Zheng, Chao Wang, Feng Lyu

    Abstract: Large Language Models (LLMs) exhibit pronounced memory-bound characteristics during inference due to High Bandwidth Memory (HBM) bandwidth constraints. In this paper, we propose an L2 Cache-oriented asynchronous KV Cache prefetching method to break through the memory bandwidth bottleneck in LLM inference through computation-load overlap. By strategically scheduling idle memory bandwidth during act…

    Submitted 8 April, 2025; originally announced April 2025.

    Comments: 8 pages, 5 figures

  10. arXiv:2504.04717  [pdf, other]

    cs.CL cs.AI

    Beyond Single-Turn: A Survey on Multi-Turn Interactions with Large Language Models

    Authors: Yubo Li, Xiaobin Shen, Xinyu Yao, Xueying Ding, Yidi Miao, Ramayya Krishnan, Rema Padman

    Abstract: Recent advancements in large language models (LLMs) have revolutionized their ability to handle single-turn tasks, yet real-world applications demand sophisticated multi-turn interactions. This survey provides a comprehensive review of recent advancements in evaluating and enhancing multi-turn interactions in LLMs. Focusing on task-specific scenarios, from instruction following in diverse domains…

    Submitted 13 May, 2025; v1 submitted 7 April, 2025; originally announced April 2025.

  11. arXiv:2503.22353  [pdf, other]

    cs.CL cs.AI

    Firm or Fickle? Evaluating Large Language Models Consistency in Sequential Interactions

    Authors: Yubo Li, Yidi Miao, Xueying Ding, Ramayya Krishnan, Rema Padman

    Abstract: Large Language Models (LLMs) have shown remarkable capabilities across various tasks, but their deployment in high-stake domains requires consistent performance across multiple interaction rounds. This paper introduces a comprehensive framework for evaluating and improving LLM response consistency, making three key contributions. First, we propose a novel Position-Weighted Consistency (PWC) score…

    Submitted 28 March, 2025; originally announced March 2025.

    Comments: 8 pages, 5 figures

  12. arXiv:2503.16065  [pdf, other]

    cs.CV

    Shining Yourself: High-Fidelity Ornaments Virtual Try-on with Diffusion Model

    Authors: Yingmao Miao, Zhanpeng Huang, Rui Han, Zibin Wang, Chenhao Lin, Chao Shen

    Abstract: While virtual try-on for clothes and shoes with diffusion models has gained traction, virtual try-on for ornaments, such as bracelets, rings, earrings, and necklaces, remains largely unexplored. Due to the intricate tiny patterns and repeated geometric sub-structures in most ornaments, it is much more difficult to guarantee identity and appearance consistency under large pose and scale variances…

    Submitted 20 March, 2025; originally announced March 2025.

  13. arXiv:2503.15870  [pdf, other]

    cs.LG

    FedSAF: A Federated Learning Framework for Enhanced Gastric Cancer Detection and Privacy Preservation

    Authors: Yuxin Miao, Xinyuan Yang, Hongda Fan, Yichun Li, Yishu Hong, Xiechen Guo, Ali Braytee, Weidong Huang, Ali Anaissi

    Abstract: Gastric cancer is one of the most commonly diagnosed cancers and has a high mortality rate. Due to limited medical resources, developing machine learning models for gastric cancer recognition provides an efficient solution for medical institutions. However, such models typically require large sample sizes for training and testing, which can challenge patient privacy. Federated learning offers an e…

    Submitted 20 March, 2025; originally announced March 2025.

  14. StructVizor: Interactive Profiling of Semi-Structured Textual Data

    Authors: Yanwei Huang, Yan Miao, Di Weng, Adam Perer, Yingcai Wu

    Abstract: Data profiling plays a critical role in understanding the structure of complex datasets and supporting numerous downstream tasks, such as social media analytics and financial fraud detection. While existing research predominantly focuses on structured data formats, a substantial portion of semi-structured textual data still requires ad-hoc and arduous manual profiling to extract and comprehend its…

    Submitted 9 March, 2025; originally announced March 2025.

    Comments: Accepted for CHI 2025

  15. arXiv:2503.04315  [pdf, other]

    cs.LG cs.AI math.ST

    Provable Robust Overfitting Mitigation in Wasserstein Distributionally Robust Optimization

    Authors: Shuang Liu, Yihan Wang, Yifan Zhu, Yibo Miao, Xiao-Shan Gao

    Abstract: Wasserstein distributionally robust optimization (WDRO) optimizes against worst-case distributional shifts within a specified uncertainty set, leading to enhanced generalization on unseen adversarial examples, compared to standard adversarial training which focuses on pointwise adversarial perturbations. However, WDRO still suffers fundamentally from the robust overfitting problem, as it does not…

    Submitted 6 March, 2025; originally announced March 2025.

    Journal ref: ICLR 2025

  16. arXiv:2503.04111  [pdf, other]

    cs.LG cs.AI math.ST

    Generalizability of Neural Networks Minimizing Empirical Risk Based on Expressive Ability

    Authors: Lijia Yu, Yibo Miao, Yifan Zhu, Xiao-Shan Gao, Lijun Zhang

    Abstract: The primary objective of learning methods is generalization. Classic uniform generalization bounds, which rely on VC-dimension or Rademacher complexity, fail to explain the significant attribute that over-parameterized models in deep learning exhibit nice generalizability. On the other hand, algorithm-dependent generalization bounds, like stability bounds, often rely on strict assumptions. To esta…

    Submitted 6 March, 2025; originally announced March 2025.

    Journal ref: ICLR 2025

  17. arXiv:2503.02198  [pdf, other]

    cs.RO

    Zero-Shot Sim-to-Real Visual Quadrotor Control with Hard Constraints

    Authors: Yan Miao, Will Shen, Sayan Mitra

    Abstract: We present the first framework demonstrating zero-shot sim-to-real transfer of visual control policies learned in a Neural Radiance Field (NeRF) environment for quadrotors to fly through racing gates. Robust transfer from simulation to real flight poses a major challenge, as standard simulators often lack sufficient visual fidelity. To address this, we construct a photorealistic simulation environ…

    Submitted 3 March, 2025; originally announced March 2025.

  18. arXiv:2503.00697  [pdf, other]

    cs.CV cs.AI eess.IV

    CREATE-FFPE: Cross-Resolution Compensated and Multi-Frequency Enhanced FS-to-FFPE Stain Transfer for Intraoperative IHC Images

    Authors: Yiyang Lin, Danling Jiang, Xinyu Liu, Yun Miao, Yixuan Yuan

    Abstract: In the immunohistochemical (IHC) analysis during surgery, frozen-section (FS) images are used to determine the benignity or malignancy of the tumor. However, FS image faces problems such as image contamination and poor nuclear detail, which may disturb the pathologist's diagnosis. In contrast, formalin-fixed and paraffin-embedded (FFPE) image has a higher staining quality, but it requires quite a…

    Submitted 1 March, 2025; originally announced March 2025.

  19. arXiv:2502.11550  [pdf, other]

    cs.CR

    Trinity: A Scalable and Forward-Secure DSSE for Spatio-Temporal Range Query

    Authors: Zhijun Li, Kuizhi Liu, Minghui Xu, Xiangyu Wang, Yinbin Miao, Jianfeng Ma, Xiuzhen Cheng

    Abstract: Cloud-based outsourced location-based services have profound impacts on various aspects of people's lives but bring security concerns. Existing spatio-temporal data secure retrieval schemes have significant shortcomings regarding dynamic updates, either compromising privacy through leakage during updates (forward insecurity) or incurring excessively high update costs that hinder practical applicat…

    Submitted 17 February, 2025; originally announced February 2025.

    Comments: 14 pages, 6 figures

  20. arXiv:2502.08678  [pdf, other]

    cs.CV eess.IV

    Multispectral Remote Sensing for Weed Detection in West Australian Agricultural Lands

    Authors: Haitian Wang, Muhammad Ibrahim, Yumeng Miao, Dustin Severtson, Atif Mansoor, Ajmal S. Mian

    Abstract: The Kondinin region in Western Australia faces significant agricultural challenges due to pervasive weed infestations, causing economic losses and ecological impacts. This study constructs a tailored multispectral remote sensing dataset and an end-to-end framework for weed detection to advance precision agriculture practices. Unmanned aerial vehicles were used to collect raw multispectral data fro…

    Submitted 12 February, 2025; originally announced February 2025.

    Comments: 8 pages, 9 figures, 1 table, Accepted for oral presentation at IEEE 25th International Conference on Digital Image Computing: Techniques and Applications (DICTA 2024). Conference Proceeding: 979-8-3503-7903-7/24/$31.00 (C) 2024 IEEE

    ACM Class: I.4.8; I.5.4

    Journal ref: Proceedings of the International Conference on Digital Image Computing: Techniques and Applications (DICTA), 2024, IEEE, ISBN: 979-8-3503-7903-7

  21. arXiv:2502.07487  [pdf, other]

    cs.CL

    Multi-Agent Collaboration for Multilingual Code Instruction Tuning

    Authors: Jian Yang, Wei Zhang, Jiaxi Yang, Yibo Miao, Shanghaoran Quan, Zhenhe Wu, Qiyao Peng, Liqun Yang, Tianyu Liu, Zeyu Cui, Binyuan Hui, Junyang Lin

    Abstract: Recent advancement in code understanding and generation demonstrates that code LLMs fine-tuned on a high-quality instruction dataset can gain powerful capabilities to address wide-ranging code-related tasks. However, most previous methods mainly view each programming language in isolation and ignore the knowledge transfer among different programming languages. To bridge the gap among diff…

    Submitted 11 February, 2025; originally announced February 2025.

  22. arXiv:2501.19358  [pdf, other]

    cs.LG

    The Energy Loss Phenomenon in RLHF: A New Perspective on Mitigating Reward Hacking

    Authors: Yuchun Miao, Sen Zhang, Liang Ding, Yuqi Zhang, Lefei Zhang, Dacheng Tao

    Abstract: This work identifies the Energy Loss Phenomenon in Reinforcement Learning from Human Feedback (RLHF) and its connection to reward hacking. Specifically, energy loss in the final layer of a Large Language Model (LLM) gradually increases during the RL process, with an excessive increase in energy loss characterizing reward hacking. Beyond empirical analysis, we further provide a theoretical foundati…

    Submitted 4 February, 2025; v1 submitted 31 January, 2025; originally announced January 2025.

    Comments: 28 pages, 21 figures

  23. arXiv:2501.12326  [pdf, other]

    cs.AI cs.CL cs.CV cs.HC

    UI-TARS: Pioneering Automated GUI Interaction with Native Agents

    Authors: Yujia Qin, Yining Ye, Junjie Fang, Haoming Wang, Shihao Liang, Shizuo Tian, Junda Zhang, Jiahao Li, Yunxin Li, Shijue Huang, Wanjun Zhong, Kuanye Li, Jiale Yang, Yu Miao, Woyu Lin, Longxiang Liu, Xu Jiang, Qianli Ma, Jingyu Li, Xiaojun Xiao, Kai Cai, Chuang Li, Yaowei Zheng, Chaolin Jin, Chen Li, et al. (10 additional authors not shown)

    Abstract: This paper introduces UI-TARS, a native GUI agent model that solely perceives the screenshots as input and performs human-like interactions (e.g., keyboard and mouse operations). Unlike prevailing agent frameworks that depend on heavily wrapped commercial models (e.g., GPT-4o) with expert-crafted prompts and workflows, UI-TARS is an end-to-end model that outperforms these sophisticated frameworks.…

    Submitted 21 January, 2025; originally announced January 2025.

  24. arXiv:2501.01930  [pdf, other]

    cs.LG

    GoBERT: Gene Ontology Graph Informed BERT for Universal Gene Function Prediction

    Authors: Yuwei Miao, Yuzhi Guo, Hehuan Ma, Jingquan Yan, Feng Jiang, Rui Liao, Junzhou Huang

    Abstract: Exploring the functions of genes and gene products is crucial to a wide range of fields, including medical research, evolutionary biology, and environmental science. However, discovering new functions largely relies on expensive and exhaustive wet lab experiments. Existing methods of automatic function annotation or prediction mainly focus on protein function prediction with sequence, 3D-structure…

    Submitted 3 January, 2025; originally announced January 2025.

    Comments: Accepted by AAAI-25

  25. arXiv:2501.01257  [pdf, other]

    cs.CL

    CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings

    Authors: Shanghaoran Quan, Jiaxi Yang, Bowen Yu, Bo Zheng, Dayiheng Liu, An Yang, Xuancheng Ren, Bofei Gao, Yibo Miao, Yunlong Feng, Zekun Wang, Jian Yang, Zeyu Cui, Yang Fan, Yichang Zhang, Binyuan Hui, Junyang Lin

    Abstract: With the increasing code reasoning capabilities of existing large language models (LLMs) and breakthroughs in reasoning models like OpenAI o1 and o3, there is a growing need to develop more challenging and comprehensive benchmarks that effectively test their sophisticated competition-level coding abilities. Existing benchmarks, like LiveCodeBench and USACO, fall short due to the unavailability of…

    Submitted 3 January, 2025; v1 submitted 2 January, 2025; originally announced January 2025.

  26. arXiv:2501.00756  [pdf, other]

    cs.LG

    FasterSTS: A Faster Spatio-Temporal Synchronous Graph Convolutional Networks for Traffic flow Forecasting

    Authors: Ben-Ao Dai, Nengchao Lyu, Yongchao Miao

    Abstract: Accurate traffic flow prediction heavily relies on the spatio-temporal correlation of traffic flow data. Most current studies separately capture correlations in spatial and temporal dimensions, making it difficult to capture complex spatio-temporal heterogeneity, and often at the expense of increasing model complexity to improve prediction accuracy. Although there have been groundbreaking attempts…

    Submitted 1 January, 2025; originally announced January 2025.

    Comments: 13 pages, 3 figures

  27. arXiv:2412.13571  [pdf, other]

    cs.LG math.NA

    PowerMLP: An Efficient Version of KAN

    Authors: Ruichen Qiu, Yibo Miao, Shiwen Wang, Lijia Yu, Yifan Zhu, Xiao-Shan Gao

    Abstract: The Kolmogorov-Arnold Network (KAN) is a new network architecture known for its high accuracy in several tasks such as function fitting and PDE solving. The superior expressive capability of KAN arises from the Kolmogorov-Arnold representation theorem and learnable spline functions. However, the computation of spline functions involves multiple iterations, which renders KAN significantly slower th…

    Submitted 18 December, 2024; originally announced December 2024.

    Journal ref: AAAI 2025

  28. arXiv:2412.12688  [pdf, other]

    cs.DB

    UniEntrezDB: Large-scale Gene Ontology Annotation Dataset and Evaluation Benchmarks with Unified Entrez Gene Identifiers

    Authors: Yuwei Miao, Yuzhi Guo, Hehuan Ma, Jingquan Yan, Feng Jiang, Weizhi An, Jean Gao, Junzhou Huang

    Abstract: Gene studies are crucial for fields such as protein structure prediction, drug discovery, and cancer genomics, yet they face challenges in fully utilizing the vast and diverse information available. Gene studies require clean, factual datasets to ensure reliable results. Ontology graphs, neatly organized domain terminology graphs, provide ideal sources for domain facts. However, available gene ont…

    Submitted 17 December, 2024; originally announced December 2024.

  29. arXiv:2412.11990  [pdf, other]

    cs.CL

    ExecRepoBench: Multi-level Executable Code Completion Evaluation

    Authors: Jian Yang, Jiajun Zhang, Jiaxi Yang, Ke Jin, Lei Zhang, Qiyao Peng, Ken Deng, Yibo Miao, Tianyu Liu, Zeyu Cui, Binyuan Hui, Junyang Lin

    Abstract: Code completion has become an essential tool for daily software development. Existing evaluation benchmarks often employ static methods that do not fully capture the dynamic nature of real-world coding environments and face significant challenges, including limited context length, reliance on superficial evaluation metrics, and potential overfitting to training datasets. In this work, we introduce…

    Submitted 16 December, 2024; originally announced December 2024.

  30. arXiv:2412.09453  [pdf, other]

    cs.CE cs.LG math.AP

    Finite-PINN: A Physics-Informed Neural Network Architecture for Solving Solid Mechanics Problems with General Geometries

    Authors: Haolin Li, Yuyang Miao, Zahra Sharif Khodaei, M. H. Aliabadi

    Abstract: PINN models have demonstrated impressive capabilities in addressing fluid PDE problems, and their potential in solid mechanics is beginning to emerge. This study identifies two key challenges when using PINN to solve general solid mechanics problems. These challenges become evident when comparing the limitations of PINN with the well-established numerical methods commonly used in solid mechanics,…

    Submitted 12 December, 2024; originally announced December 2024.

  31. arXiv:2412.05210  [pdf, other]

    cs.CL

    Evaluating and Aligning CodeLLMs on Human Preference

    Authors: Jian Yang, Jiaxi Yang, Ke Jin, Yibo Miao, Lei Zhang, Liqun Yang, Zeyu Cui, Yichang Zhang, Binyuan Hui, Junyang Lin

    Abstract: Code large language models (codeLLMs) have made significant strides in code generation. Most previous code-related benchmarks, which consist of various programming exercises along with the corresponding test cases, are used as a common measure to evaluate the performance and capabilities of code LLMs. However, the current code LLMs focus on synthesizing the correct code snippet, ignoring the align…

    Submitted 6 December, 2024; originally announced December 2024.

  32. arXiv:2412.04683  [pdf, other]

    cs.AI

    From Principles to Practice: A Deep Dive into AI Ethics and Regulations

    Authors: Nan Sun, Yuantian Miao, Hao Jiang, Ming Ding, Jun Zhang

    Abstract: In the rapidly evolving domain of Artificial Intelligence (AI), the complex interaction between innovation and regulation has become an emerging focus of our society. Despite tremendous advancements in AI's capabilities to excel in specific tasks and contribute to diverse sectors, establishing a high degree of trust in AI-generated outputs and decisions necessitates meticulous caution and continuo…

    Submitted 6 February, 2025; v1 submitted 5 December, 2024; originally announced December 2024.

    Comments: Submitted to JAIR

  33. arXiv:2412.02252  [pdf, other]

    cs.CL

    Compressing KV Cache for Long-Context LLM Inference with Inter-Layer Attention Similarity

    Authors: Da Ma, Lu Chen, Situo Zhang, Yuxun Miao, Su Zhu, Zhi Chen, Hongshen Xu, Hanqi Li, Shuai Fan, Lei Pan, Kai Yu

    Abstract: The increasing context window size in Large Language Models (LLMs), such as the GPT and LLaMA series, has improved their ability to tackle complex, long-text tasks, but at the cost of inference efficiency, particularly regarding memory and computational complexity. Existing methods, including selective token retention and window-based attention, improve efficiency but risk discarding important tok…

    Submitted 3 December, 2024; originally announced December 2024.

    Comments: preprint

  34. arXiv:2412.01398  [pdf, other]

    cs.CV cs.RO

    Holistic Understanding of 3D Scenes as Universal Scene Description

    Authors: Anna-Maria Halacheva, Yang Miao, Jan-Nico Zaech, Xi Wang, Luc Van Gool, Danda Pani Paudel

    Abstract: 3D scene understanding is a long-standing challenge in computer vision and a key component in enabling mixed reality, wearable computing, and embodied AI. Providing a solution to these applications requires a multifaceted approach that covers scene-centric, object-centric, as well as interaction-centric capabilities. While there exist numerous datasets approaching the former two problems, the task…

    Submitted 2 December, 2024; originally announced December 2024.

  35. arXiv:2411.16027  [pdf, other]

    cs.CV cs.AI

    From Dashcam Videos to Driving Simulations: Stress Testing Automated Vehicles against Rare Events

    Authors: Yan Miao, Georgios Fainekos, Bardh Hoxha, Hideki Okamoto, Danil Prokhorov, Sayan Mitra

    Abstract: Testing Automated Driving Systems (ADS) in simulation with realistic driving scenarios is important for verifying their performance. However, converting real-world driving videos into simulation scenarios is a significant challenge due to the complexity of interpreting high-dimensional video data and the time-consuming nature of precise manual scenario reconstruction. In this work, we propose a no…

    Submitted 27 January, 2025; v1 submitted 24 November, 2024; originally announced November 2024.

  36. arXiv:2411.11798  [pdf]

    cs.IT cs.AI eess.SP

    COST CA20120 INTERACT Framework of Artificial Intelligence Based Channel Modeling

    Authors: Ruisi He, Nicola D. Cicco, Bo Ai, Mi Yang, Yang Miao, Mate Boban

    Abstract: Accurate channel models are the prerequisite for communication-theoretic investigations as well as system design. Channel modeling generally relies on statistical and deterministic approaches. However, there are still significant limits for the traditional modeling methods in terms of accuracy, generalization ability, and computational complexity. The fundamental reason is that establishing a quan…

    Submitted 31 October, 2024; originally announced November 2024.

    Comments: to appear in IEEE Wireless Communications Magazine

  37. arXiv:2411.08045  [pdf]

    physics.pop-ph astro-ph.IM cs.CY cs.GR cs.HC

    Audience Reach of Scientific Data Visualizations in Planetarium-Screened Films

    Authors: Kalina Borkiewicz, Eric Jensen, Yiwen Miao, Stuart Levy, J. P. Naiman, Jeff Carpenter, Katherine E. Isaacs

    Abstract: Quantifying the global reach of planetarium dome shows presents significant challenges due to the lack of standardized viewership tracking mechanisms across diverse planetarium venues. We present an analysis of the global impact of dome shows, presenting data regarding four documentary films from a single visualization lab. Specifically, we designed and administered a viewership survey of four lon…

    Submitted 30 October, 2024; originally announced November 2024.

  38. arXiv:2411.00372  [pdf, ps, other]

    cs.LG cs.AI

    Generalizability of Memorization Neural Networks

    Authors: Lijia Yu, Xiao-Shan Gao, Lijun Zhang, Yibo Miao

    Abstract: The neural network memorization problem is to study the expressive power of neural networks to interpolate a finite dataset. Although memorization is widely believed to have a close relationship with the strong generalizability of deep learning when using over-parameterized models, to the best of our knowledge, there exists no theoretical study on the generalizability of memorization neural networ…

    Submitted 1 November, 2024; originally announced November 2024.

  39. arXiv:2410.18585  [pdf, other]

    cs.AI cs.LG

    Aligning CodeLLMs with Direct Preference Optimization

    Authors: Yibo Miao, Bofei Gao, Shanghaoran Quan, Junyang Lin, Daoguang Zan, Jiaheng Liu, Jian Yang, Tianyu Liu, Zhijie Deng

    Abstract: The last year has witnessed the rapid progress of large language models (LLMs) across diverse domains. Among them, CodeLLMs have garnered particular attention because they can not only assist in completing various programming tasks but also represent the decision-making and logical reasoning capabilities of LLMs. However, current CodeLLMs mainly focus on pre-training and supervised fine-tuning sce…

    Submitted 24 October, 2024; originally announced October 2024.

  40. arXiv:2410.10190  [pdf, other]

    cs.LG cs.AI

    Predicting from Strings: Language Model Embeddings for Bayesian Optimization

    Authors: Tung Nguyen, Qiuyi Zhang, Bangding Yang, Chansoo Lee, Jorg Bornschein, Yingjie Miao, Sagi Perel, Yutian Chen, Xingyou Song

    Abstract: Bayesian Optimization is ubiquitous in the field of experimental design and blackbox optimization for improving search efficiency, but has been traditionally restricted to regression models which are only applicable to fixed search spaces and tabular input features. We propose Embed-then-Regress, a paradigm for applying in-context regression over string inputs, through the use of string embedding…

    Submitted 15 October, 2024; v1 submitted 14 October, 2024; originally announced October 2024.

  41. arXiv:2410.07985  [pdf, other]

    cs.CL

    Omni-MATH: A Universal Olympiad Level Mathematic Benchmark For Large Language Models

    Authors: Bofei Gao, Feifan Song, Zhe Yang, Zefan Cai, Yibo Miao, Qingxiu Dong, Lei Li, Chenghao Ma, Liang Chen, Runxin Xu, Zhengyang Tang, Benyou Wang, Daoguang Zan, Shanghaoran Quan, Ge Zhang, Lei Sha, Yichang Zhang, Xuancheng Ren, Tianyu Liu, Baobao Chang

    Abstract: Recent advancements in large language models (LLMs) have led to significant breakthroughs in mathematical reasoning capabilities. However, existing benchmarks like GSM8K or MATH are now being solved with high accuracy (e.g., OpenAI o1 achieves 94.8% on MATH dataset), indicating their inadequacy for truly challenging these models. To bridge this gap, we propose a comprehensive and challenging benc…

    Submitted 23 December, 2024; v1 submitted 10 October, 2024; originally announced October 2024.

    Comments: 30 pages

  42. arXiv:2410.04366  [pdf, other]

    eess.SP cs.AI cs.HC

    RespDiff: An End-to-End Multi-scale RNN Diffusion Model for Respiratory Waveform Estimation from PPG Signals

    Authors: Yuyang Miao, Zehua Chen, Chang Li, Danilo Mandic

    Abstract: Respiratory rate (RR) is a critical health indicator often monitored under inconvenient scenarios, limiting its practicality for continuous monitoring. Photoplethysmography (PPG) sensors, increasingly integrated into wearable devices, offer a chance to continuously estimate RR in a portable manner. In this paper, we propose RespDiff, an end-to-end multi-scale RNN diffusion model for respiratory wa…

    Submitted 6 October, 2024; originally announced October 2024.

  43. arXiv:2409.20291  [pdf, other]

    cs.RO

    RL-GSBridge: 3D Gaussian Splatting Based Real2Sim2Real Method for Robotic Manipulation Learning

    Authors: Yuxuan Wu, Lei Pan, Wenhua Wu, Guangming Wang, Yanzi Miao, Fan Xu, Hesheng Wang

    Abstract: Sim-to-Real refers to the process of transferring policies learned in simulation to the real world, which is crucial for achieving practical robotics applications. However, recent Sim2real methods either rely on a large amount of augmented data or large learning models, which is inefficient for specific tasks. In recent years, with the emergence of radiance field reconstruction methods, especially…

    Submitted 22 February, 2025; v1 submitted 30 September, 2024; originally announced September 2024.

    Comments: 7 pages, 5 figures, 4 tables. Accepted by ICRA2025

  44. arXiv:2409.12186  [pdf, other]

    cs.CL

    Qwen2.5-Coder Technical Report

    Authors: Binyuan Hui, Jian Yang, Zeyu Cui, Jiaxi Yang, Dayiheng Liu, Lei Zhang, Tianyu Liu, Jiajun Zhang, Bowen Yu, Keming Lu, Kai Dang, Yang Fan, Yichang Zhang, An Yang, Rui Men, Fei Huang, Bo Zheng, Yibo Miao, Shanghaoran Quan, Yunlong Feng, Xingzhang Ren, Xuancheng Ren, Jingren Zhou, Junyang Lin

    Abstract: In this report, we introduce the Qwen2.5-Coder series, a significant upgrade from its predecessor, CodeQwen1.5. This series includes six models: Qwen2.5-Coder-(0.5B/1.5B/3B/7B/14B/32B). As a code-specific model, Qwen2.5-Coder is built upon the Qwen2.5 architecture and continues pretrained on a vast corpus of over 5.5 trillion tokens. Through meticulous data cleaning, scalable synthetic data genera…

    Submitted 12 November, 2024; v1 submitted 18 September, 2024; originally announced September 2024.

  45. arXiv:2409.03878  [pdf, other]

    cs.CV eess.SP physics.geo-ph

    Ground-roll Separation From Land Seismic Records Based on Convolutional Neural Network

    Authors: Zhuang Jia, Wenkai Lu, Meng Zhang, Yongkang Miao

    Abstract: Ground-roll wave is a common coherent noise in land field seismic data. This Rayleigh-type surface wave usually has low frequency, low apparent velocity, and high amplitude, therefore obscures the reflection events of seismic shot gathers. Commonly used techniques focus on the differences of ground-roll and reflection in transformed domain such as $f-k$ domain, wavelet domain, or curvelet domain.…

    Submitted 5 September, 2024; originally announced September 2024.

  46. arXiv:2409.03393  [pdf, other]

    cs.NI

    VQ-DeepVSC: A Dual-Stage Vector Quantization Framework for Video Semantic Communication

    Authors: Yongyi Miao, Zhongdang Li, Yang Wang, Die Hu, Jun Yan, Youfang Wang

    Abstract: In response to the rapid growth of global video traffic and the limitations of traditional wireless transmission systems, we propose a novel dual-stage vector quantization framework, VQ-DeepVSC, tailored to enhance video transmission over wireless channels. In the first stage, we design the adaptive keyframe extractor and interpolator, deployed respectively at the transmitter and receiver, which i…

    Submitted 5 September, 2024; originally announced September 2024.

  47. arXiv:2409.02795  [pdf, other]

    cs.CL

    Towards a Unified View of Preference Learning for Large Language Models: A Survey

    Authors: Bofei Gao, Feifan Song, Yibo Miao, Zefan Cai, Zhe Yang, Liang Chen, Helan Hu, Runxin Xu, Qingxiu Dong, Ce Zheng, Shanghaoran Quan, Wen Xiao, Ge Zhang, Daoguang Zan, Keming Lu, Bowen Yu, Dayiheng Liu, Zeyu Cui, Jian Yang, Lei Sha, Houfeng Wang, Zhifang Sui, Peiyi Wang, Tianyu Liu, Baobao Chang

    Abstract: Large Language Models (LLMs) exhibit remarkably powerful capabilities. One of the crucial factors to achieve success is aligning the LLM's output with human preferences. This alignment process often requires only a small amount of data to efficiently enhance the LLM's performance. While effective, research in this area spans multiple domains, and the methods involved are relatively complex to unde…

    Submitted 31 October, 2024; v1 submitted 4 September, 2024; originally announced September 2024.

    Comments: 23 pages, 6 figures

  48. ProphetFuzz: Fully Automated Prediction and Fuzzing of High-Risk Option Combinations with Only Documentation via Large Language Model

    Authors: Dawei Wang, Geng Zhou, Li Chen, Dan Li, Yukai Miao

    Abstract: Vulnerabilities related to option combinations pose a significant challenge in software security testing due to their vast search space. Previous research primarily addressed this challenge through mutation or filtering techniques, which inefficiently treated all option combinations as having equal potential for vulnerabilities, thus wasting considerable time on non-vulnerable targets and resultin…

    Submitted 1 September, 2024; originally announced September 2024.

    Comments: Preprint

  49. arXiv:2407.12164  [pdf, other]

    cs.CV cs.AI cs.LG

    Subject-driven Text-to-Image Generation via Preference-based Reinforcement Learning

    Authors: Yanting Miao, William Loh, Suraj Kothawade, Pascal Poupart, Abdullah Rashwan, Yeqing Li

    Abstract: Text-to-image generative models have recently attracted considerable interest, enabling the synthesis of high-quality images from textual prompts. However, these models often lack the capability to generate specific subjects from given reference images or to synthesize novel renditions under varying conditions. Methods like DreamBooth and Subject-driven Text-to-Image (SuTI) have made significant p…

    Submitted 22 December, 2024; v1 submitted 16 July, 2024; originally announced July 2024.

    Comments: NeurIPS 2024

  50. arXiv:2407.11011  [pdf, other]

    cs.CR cs.CV cs.LG

    Toward Availability Attacks in 3D Point Clouds

    Authors: Yifan Zhu, Yibo Miao, Yinpeng Dong, Xiao-Shan Gao

    Abstract: Despite the great progress of 3D vision, data privacy and security issues in 3D deep learning are not explored systematically. In the domain of 2D images, many availability attacks have been proposed to prevent data from being illicitly learned by unauthorized deep models. However, unlike images represented on a fixed dimensional grid, point clouds are characterized as unordered and unstructured s…

    Submitted 26 June, 2024; originally announced July 2024.

    Comments: ICML 2024, 21 pages