Showing 1–50 of 1,184 results for author: Feng, Y

Searching in archive cs.
  1. arXiv:2505.10003  [pdf, ps, other]

    cs.LG eess.SP

    AI2MMUM: AI-AI Oriented Multi-Modal Universal Model Leveraging Telecom Domain Large Model

    Authors: Tianyu Jiao, Zhuoran Xiao, Yihang Huang, Chenhui Ye, Yijia Feng, Liyu Cai, Jiang Chang, Fangkun Liu, Yin Xu, Dazhi He, Yunfeng Guan, Wenjun Zhang

    Abstract: Designing a 6G-oriented universal model capable of processing multi-modal data and executing diverse air interface tasks has emerged as a common goal in future wireless systems. Building on our prior work in communication multi-modal alignment and telecom large language model (LLM), we propose a scalable, task-aware artificial intelligence-air interface multi-modal universal model (AI2MMUM), which… ▽ More

    Submitted 15 May, 2025; originally announced May 2025.

  2. arXiv:2505.09936  [pdf]

    cs.HC cs.GR cs.MA cs.MM

    CartoAgent: a multimodal large language model-powered multi-agent cartographic framework for map style transfer and evaluation

    Authors: Chenglong Wang, Yuhao Kang, Zhaoya Gong, Pengjun Zhao, Yu Feng, Wenjia Zhang, Ge Li

    Abstract: The rapid development of generative artificial intelligence (GenAI) presents new opportunities to advance the cartographic process. Previous studies have either overlooked the artistic aspects of maps or faced challenges in creating both accurate and informative maps. In this study, we propose CartoAgent, a novel multi-agent cartographic framework powered by multimodal large language models (MLLMs… ▽ More

    Submitted 14 May, 2025; originally announced May 2025.

    Comments: 57 pages, 17 figures

  3. arXiv:2505.09450  [pdf, ps, other]

    cs.CV

    MrTrack: Register Mamba for Needle Tracking with Rapid Reciprocating Motion during Ultrasound-Guided Aspiration Biopsy

    Authors: Yuelin Zhang, Qingpeng Ding, Long Lei, Yongxuan Feng, Raymond Shing-Yan Tang, Shing Shin Cheng

    Abstract: Ultrasound-guided fine needle aspiration (FNA) biopsy is a common minimally invasive diagnostic procedure. However, an aspiration needle tracker addressing rapid reciprocating motion is still missing. MrTrack, an aspiration needle tracker with a mamba-based register mechanism, is proposed. MrTrack leverages a Mamba-based register extractor to sequentially distill global context from each historica… ▽ More

    Submitted 14 May, 2025; originally announced May 2025.

    Comments: Early Accepted by MICCAI 2025

  4. arXiv:2505.08957  [pdf, ps, other]

    cs.CG cs.DS

    Even Faster Algorithm for the Chamfer Distance

    Authors: Ying Feng, Piotr Indyk

    Abstract: For two d-dimensional point sets A, B of size up to n, the Chamfer distance from A to B is defined as CH(A,B) = \sum_{a \in A} \min_{b \in B} \|a-b\|. The Chamfer distance is a widely used measure for quantifying dissimilarity between sets of points, used in many machine learning and computer vision applications. A recent work of Bakshi et al., NeurIPS'23, gave the first near-linear time (1+eps)-ap… ▽ More

    Submitted 13 May, 2025; originally announced May 2025.

    Comments: ICALP 2025

  5. arXiv:2505.08600  [pdf, other]

    cs.CL

    Automatic Task Detection and Heterogeneous LLM Speculative Decoding

    Authors: Danying Ge, Jianhua Gao, Qizhi Jiang, Yifei Feng, Weixing Ji

    Abstract: Speculative decoding, which combines a draft model with a target model, has emerged as an effective approach to accelerate large language model (LLM) inference. However, existing methods often face a trade-off between the acceptance rate and decoding speed in downstream tasks due to the limited capacity of the draft model, making it difficult to ensure efficiency across diverse tasks. To address t… ▽ More

    Submitted 13 May, 2025; originally announced May 2025.

    Comments: 10 pages, 10 figures, 2 tables

    ACM Class: I.2.7

  6. arXiv:2505.06166  [pdf, other]

    cs.CV

    DiffLocks: Generating 3D Hair from a Single Image using Diffusion Models

    Authors: Radu Alexandru Rosu, Keyu Wu, Yao Feng, Youyi Zheng, Michael J. Black

    Abstract: We address the task of generating 3D hair geometry from a single image, which is challenging due to the diversity of hairstyles and the lack of paired image-to-3D hair data. Previous methods are primarily trained on synthetic data and cope with the limited amount of such data by using low-dimensional intermediate representations, such as guide strands and scalp-level embeddings, that require post-… ▽ More

    Submitted 9 May, 2025; originally announced May 2025.

    Comments: Accepted to CVPR 2025

  7. arXiv:2505.04918  [pdf, other]

    cs.LG cs.AI

    Physics-Assisted and Topology-Informed Deep Learning for Weather Prediction

    Authors: Jiaqi Zheng, Qing Ling, Yerong Feng

    Abstract: Although deep learning models have demonstrated remarkable potential in weather prediction, most of them overlook either the physics of the underlying weather evolution or the topology of the Earth's surface. In light of these disadvantages, we develop PASSAT, a novel Physics-ASSisted And Topology-informed deep learning model for weather prediction. PASSAT attributes the weather… ▽ More

    Submitted 7 May, 2025; originally announced May 2025.

    Comments: International Joint Conferences on Artificial Intelligence (IJCAI 2025)

  8. arXiv:2505.04270  [pdf, ps, other]

    cs.CV cs.AI

    Object-Shot Enhanced Grounding Network for Egocentric Video

    Authors: Yisen Feng, Haoyu Zhang, Meng Liu, Weili Guan, Liqiang Nie

    Abstract: Egocentric video grounding is a crucial task for embodied intelligence applications, distinct from exocentric video moment localization. Existing methods primarily focus on the distributional differences between egocentric and exocentric videos but often neglect key characteristics of egocentric videos and the fine-grained information emphasized by question-type queries. To address these limitatio… ▽ More

    Submitted 7 May, 2025; originally announced May 2025.

    Comments: Accepted by CVPR 2025

  9. arXiv:2505.03850  [pdf, other]

    cs.CR cs.AI

    Impact Analysis of Inference Time Attack of Perception Sensors on Autonomous Vehicles

    Authors: Hanlin Chen, Simin Chen, Wenyu Li, Wei Yang, Yiheng Feng

    Abstract: As a safety-critical cyber-physical system, cybersecurity and related safety issues for Autonomous Vehicles (AVs) have been important research topics for a while. Among all the modules on AVs, perception is one of the most accessible attack surfaces, as drivers and AVs have no control over the outside environment. Most current work targeting perception security for AVs focuses on perception correc… ▽ More

    Submitted 5 May, 2025; originally announced May 2025.

    Comments: Accepted and presented in TRBAM 2024

  10. arXiv:2505.03075  [pdf, other]

    cs.IR

    Direct Retrieval-augmented Optimization: Synergizing Knowledge Selection and Language Models

    Authors: Zhengliang Shi, Lingyong Yan, Weiwei Sun, Yue Feng, Pengjie Ren, Xinyu Ma, Shuaiqiang Wang, Dawei Yin, Maarten de Rijke, Zhaochun Ren

    Abstract: Retrieval-augmented generation (RAG) integrates large language models (LLMs) with retrievers to access external knowledge, improving the factuality of LLM generation in knowledge-grounded tasks. To optimize the RAG performance, most previous work independently fine-tunes the retriever to adapt to frozen LLMs or trains the LLMs to use documents retrieved by off-the-shelf retrievers, lacking end-… ▽ More

    Submitted 5 May, 2025; originally announced May 2025.

  11. arXiv:2505.02972  [pdf, ps, other]

    stat.ML cs.LG stat.ME

    GeoERM: Geometry-Aware Multi-Task Representation Learning on Riemannian Manifolds

    Authors: Aoran Chen, Yang Feng

    Abstract: Multi-Task Learning (MTL) seeks to boost statistical power and learning efficiency by discovering structure shared across related tasks. State-of-the-art MTL representation methods, however, usually treat the latent representation matrix as a point in ordinary Euclidean space, ignoring its often non-Euclidean geometry, thus sacrificing robustness when tasks are heterogeneous or even adversarial. W… ▽ More

    Submitted 5 May, 2025; originally announced May 2025.

  12. arXiv:2505.02625  [pdf, ps, other]

    cs.CL cs.AI cs.SD eess.AS

    LLaMA-Omni2: LLM-based Real-time Spoken Chatbot with Autoregressive Streaming Speech Synthesis

    Authors: Qingkai Fang, Yan Zhou, Shoutao Guo, Shaolei Zhang, Yang Feng

    Abstract: Real-time, intelligent, and natural speech interaction is an essential part of the next-generation human-computer interaction. Recent advancements have showcased the potential of building intelligent spoken chatbots based on large language models (LLMs). In this paper, we introduce LLaMA-Omni 2, a series of speech language models (SpeechLMs) ranging from 0.5B to 14B parameters, capable of achievin… ▽ More

    Submitted 5 May, 2025; originally announced May 2025.

    Comments: Preprint. Project: https://github.com/ictnlp/LLaMA-Omni2

  13. arXiv:2505.02385  [pdf, other]

    eess.IV cs.CV

    An Arbitrary-Modal Fusion Network for Volumetric Cranial Nerves Tract Segmentation

    Authors: Lei Xie, Huajun Zhou, Junxiong Huang, Jiahao Huang, Qingrun Zeng, Jianzhong He, Jiawei Zhang, Baohua Fan, Mingchu Li, Guoqiang Xie, Hao Chen, Yuanjing Feng

    Abstract: The segmentation of cranial nerves (CNs) tract provides a valuable quantitative tool for the analysis of the morphology and trajectory of individual CNs. Multimodal CNs tract segmentation networks, e.g., CNTSeg, which combine structural Magnetic Resonance Imaging (MRI) and diffusion MRI, have achieved promising segmentation performance. However, it is laborious or even infeasible to collect comple… ▽ More

    Submitted 5 May, 2025; originally announced May 2025.

  14. arXiv:2505.02214  [pdf, other]

    cs.LG

    An Empirical Study of Qwen3 Quantization

    Authors: Xingyu Zheng, Yuye Li, Haoran Chu, Yue Feng, Xudong Ma, Jie Luo, Jinyang Guo, Haotong Qin, Michele Magno, Xianglong Liu

    Abstract: The Qwen series has emerged as a leading family of open-source Large Language Models (LLMs), demonstrating remarkable capabilities in natural language understanding tasks. With the recent release of Qwen3, which exhibits superior performance across diverse benchmarks, there is growing interest in deploying these models efficiently in resource-constrained environments. Low-bit quantization presents… ▽ More

    Submitted 4 May, 2025; originally announced May 2025.

  15. arXiv:2504.21263  [pdf, other]

    cs.CV cs.LG cs.MM

    Embracing Collaboration Over Competition: Condensing Multiple Prompts for Visual In-Context Learning

    Authors: Jinpeng Wang, Tianci Luo, Yaohua Zha, Yan Feng, Ruisheng Luo, Bin Chen, Tao Dai, Long Chen, Yaowei Wang, Shu-Tao Xia

    Abstract: Visual In-Context Learning (VICL) enables adaptively solving vision tasks by leveraging pixel demonstrations, mimicking human-like task completion through analogy. Prompt selection is critical in VICL, but current methods assume the existence of a single "ideal" prompt in a pool of candidates, which in practice may not hold true. Multiple suitable prompts may exist, but individually they often fal… ▽ More

    Submitted 29 April, 2025; originally announced April 2025.

    Comments: Accepted by CVPR'25. 10 pages, 5 figures, 6 tables

  16. arXiv:2504.19295  [pdf, other]

    cs.CV

    FusionNet: Multi-model Linear Fusion Framework for Low-light Image Enhancement

    Authors: Kangbiao Shi, Yixu Feng, Tao Hu, Yu Cao, Peng Wu, Yijin Liang, Yanning Zhang, Qingsen Yan

    Abstract: The advent of Deep Neural Networks (DNNs) has driven remarkable progress in low-light image enhancement (LLIE), with diverse architectures (e.g., CNNs and Transformers) and color spaces (e.g., sRGB, HSV, HVI) yielding impressive results. Recent efforts have sought to leverage the complementary strengths of these paradigms, offering promising solutions to enhance performance across varying degradat… ▽ More

    Submitted 27 April, 2025; originally announced April 2025.

  17. arXiv:2504.19144  [pdf, other]

    cs.AI cs.AR cs.SE

    ChiseLLM: Unleashing the Power of Reasoning LLMs for Chisel Agile Hardware Development

    Authors: Bowei Wang, Jiaran Gao, Yelai Feng, Renzhi Chen, Shanshan Li, Lei Wang

    Abstract: The growing demand for Domain-Specific Architecture (DSA) has driven the development of Agile Hardware Development Methodology (AHDM). Hardware Construction Language (HCL) like Chisel offers high-level abstraction features, making it an ideal language for HCL-Based AHDM. While Large Language Models (LLMs) excel in code generation tasks, they still face challenges with Chisel generation, particular… ▽ More

    Submitted 27 April, 2025; originally announced April 2025.

  18. arXiv:2504.18765  [pdf, other]

    cs.AI

    A Vision for Auto Research with LLM Agents

    Authors: Chengwei Liu, Chong Wang, Jiayue Cao, Jingquan Ge, Kun Wang, Lvye Zhang, Ming-Ming Cheng, Penghai Zhao, Tianlin Li, Xiaojun Jia, Xiang Li, Xinfeng Li, Yang Liu, Yebo Feng, Yihao Huang, Yijia Xu, Yuqiang Sun, Zhenhong Zhou, Zhengzi Xu

    Abstract: This paper introduces Agent-Based Auto Research, a structured multi-agent framework designed to automate, coordinate, and optimize the full lifecycle of scientific research. Leveraging the capabilities of large language models (LLMs) and modular agent collaboration, the system spans all major research phases, including literature review, ideation, methodology planning, experimentation, paper writi… ▽ More

    Submitted 25 April, 2025; originally announced April 2025.

  19. arXiv:2504.18260  [pdf, other]

    cs.CL

    MAGI: Multi-Agent Guided Interview for Psychiatric Assessment

    Authors: Guanqun Bi, Zhuang Chen, Zhoufu Liu, Hongkai Wang, Xiyao Xiao, Yuqiang Xie, Wen Zhang, Yongkang Huang, Yuxuan Chen, Libiao Peng, Yi Feng, Minlie Huang

    Abstract: Automating structured clinical interviews could revolutionize mental healthcare accessibility, yet existing large language model (LLM) approaches fail to align with psychiatric diagnostic protocols. We present MAGI, the first framework that transforms the gold-standard Mini International Neuropsychiatric Interview (MINI) into automatic computational workflows through coordinated multi-agent coll… ▽ More

    Submitted 25 April, 2025; originally announced April 2025.

    Comments: In progress

  20. arXiv:2504.17200  [pdf, other]

    cs.CL

    A RAG-Based Multi-Agent LLM System for Natural Hazard Resilience and Adaptation

    Authors: Yangxinyu Xie, Bowen Jiang, Tanwi Mallick, Joshua David Bergerson, John K. Hutchison, Duane R. Verner, Jordan Branham, M. Ross Alexander, Robert B. Ross, Yan Feng, Leslie-Anne Levy, Weijie Su, Camillo J. Taylor

    Abstract: Large language models (LLMs) are a transformational capability at the frontier of artificial intelligence and machine learning that can support decision-makers in addressing pressing societal challenges such as extreme natural hazard events. As generalized models, LLMs often struggle to provide context-specific information, particularly in areas requiring specialized knowledge. In this work we pro… ▽ More

    Submitted 23 April, 2025; originally announced April 2025.

  21. arXiv:2504.15780  [pdf, other]

    cs.AI cs.CL

    TrustGeoGen: Scalable and Formal-Verified Data Engine for Trustworthy Multi-modal Geometric Problem Solving

    Authors: Daocheng Fu, Zijun Chen, Renqiu Xia, Qi Liu, Yuan Feng, Hongbin Zhou, Renrui Zhang, Shiyang Feng, Peng Gao, Junchi Yan, Botian Shi, Bo Zhang, Yu Qiao

    Abstract: Mathematical geometric problem solving (GPS) often requires effective integration of multimodal information and verifiable logical coherence. Despite the fast development of large language models in general problem solving, GPS remains unresolved with regard to both methodology and benchmarks, especially given the fact that existing synthetic GPS benchmarks are often not self-verified and contain no… ▽ More

    Submitted 22 April, 2025; originally announced April 2025.

  22. arXiv:2504.15313  [pdf, other]

    cs.AI cs.LG

    PolicyEvol-Agent: Evolving Policy via Environment Perception and Self-Awareness with Theory of Mind

    Authors: Yajie Yu, Yue Feng

    Abstract: Multi-agent systems have exhibited significant intelligence in real-world simulations with large language models (LLMs) due to their capabilities of social cognition and knowledge retrieval. However, existing research on agents equipped with effective cognition chains including reasoning, planning, decision-making and reflecting remains limited, especially in dynamically interactive scenarios. In additio… ▽ More

    Submitted 20 April, 2025; originally announced April 2025.

  23. arXiv:2504.14868  [pdf, ps, other]

    cs.CV

    Twin Co-Adaptive Dialogue for Progressive Image Generation

    Authors: Jianhui Wang, Yangfan He, Yan Zhong, Xinyuan Song, Jiayi Su, Yuheng Feng, Hongyang He, Wenyu Zhu, Xinhang Yuan, Kuan Lu, Menghao Huo, Miao Zhang, Keqin Li, Jiaqi Chen, Tianyu Shi, Xueqian Wang

    Abstract: Modern text-to-image generation systems have enabled the creation of remarkably realistic and high-quality visuals, yet they often falter when handling the inherent ambiguities in user prompts. In this work, we present Twin-Co, a framework that leverages synchronized, co-adaptive dialogue to progressively refine image generation. Instead of a static generation process, Twin-Co employs a dynamic, i… ▽ More

    Submitted 21 April, 2025; originally announced April 2025.

  24. arXiv:2504.14837  [pdf, other]

    cs.DB

    SQL-Factory: A Multi-Agent Framework for High-Quality and Large-Scale SQL Generation

    Authors: Jiahui Li, Tongwang Wu, Yuren Mao, Yunjun Gao, Yajie Feng, Huaizhong Liu

    Abstract: A high-quality SQL corpus is essential for intelligent databases. For example, Text-to-SQL requires SQL queries and corresponding natural language questions as training samples. However, collecting such a query corpus remains challenging in practice due to the high cost of manual annotation, which highlights the importance of automatic SQL generation. Despite recent advances, existing generation methods s… ▽ More

    Submitted 1 May, 2025; v1 submitted 20 April, 2025; originally announced April 2025.

  25. arXiv:2504.14788  [pdf, ps, other]

    cs.IR

    The 1st EReL@MIR Workshop on Efficient Representation Learning for Multimodal Information Retrieval

    Authors: Junchen Fu, Xuri Ge, Xin Xin, Haitao Yu, Yue Feng, Alexandros Karatzoglou, Ioannis Arapakis, Joemon M. Jose

    Abstract: Multimodal representation learning has garnered significant attention in the AI community, largely due to the success of large pre-trained multimodal foundation models like LLaMA, GPT, Mistral, and CLIP. These models have achieved remarkable performance across various tasks of multimodal information retrieval (MIR), including web search, cross-modal retrieval, and recommender systems, etc. However… ▽ More

    Submitted 20 April, 2025; originally announced April 2025.

    Comments: WWW2025 Workshop Summary

  26. arXiv:2504.14500  [pdf, other]

    cs.SE

    PinChecker: Identifying Unsound Safe Abstractions of Rust Pinning APIs

    Authors: Yuxuan Dai, Yang Feng

    Abstract: The pinning APIs of Rust language guarantee memory location stability for self-referential and asynchronous constructs, as long as used according to the pinning API contract. Rust ensures violations of such contract are impossible in regular safe code, but not in unsafe code where unsafe pinning APIs can be used. Library authors can encapsulate arbitrary unsafe code within regular library function… ▽ More

    Submitted 20 April, 2025; originally announced April 2025.

    ACM Class: D.2.5

  27. arXiv:2504.14158  [pdf, ps, other]

    cs.LO quant-ph

    Refinement orders for quantum programs

    Authors: Yuan Feng, Li Zhou

    Abstract: Refinement is an influential technique used in the verification and development of computer programs. It provides a systematic and rigorous approach to software development through stepwise refinement, where a high-level abstract specification is progressively transformed into an implementation that meets the desired requirements. Central to this technique is the notion of a refinement order, whic… ▽ More

    Submitted 18 April, 2025; originally announced April 2025.

    ACM Class: F.3.1

  28. arXiv:2504.13871  [pdf]

    cs.HC cs.AI econ.GN

    Human aversion? Do AI Agents Judge Identity More Harshly Than Performance

    Authors: Yuanjun Feng, Vivek Chodhary, Yash Raj Shrestha

    Abstract: This study examines the understudied role of algorithmic evaluation of human judgment in hybrid decision-making systems, a critical gap in management research. While extant literature focuses on human reluctance to follow algorithmic advice, we reverse the perspective by investigating how AI agents based on large language models (LLMs) assess and integrate human input. Our work addresses a pressin… ▽ More

    Submitted 30 March, 2025; originally announced April 2025.

  29. arXiv:2504.13054  [pdf, other]

    cs.CL cs.AI

    Aspect-Based Summarization with Self-Aspect Retrieval Enhanced Generation

    Authors: Yichao Feng, Shuai Zhao, Yueqiu Li, Luwei Xiao, Xiaobao Wu, Anh Tuan Luu

    Abstract: Aspect-based summarization aims to generate summaries tailored to specific aspects, addressing the resource constraints and limited generalizability of traditional summarization approaches. Recently, large language models have shown promise in this task without the need for training. However, they rely excessively on prompt engineering and face token limits and hallucination challenges, especially… ▽ More

    Submitted 17 April, 2025; originally announced April 2025.

  30. arXiv:2504.12663  [pdf, other]

    cs.CL cs.AI

    Persona-judge: Personalized Alignment of Large Language Models via Token-level Self-judgment

    Authors: Xiaotian Zhang, Ruizhe Chen, Yang Feng, Zuozhu Liu

    Abstract: Aligning language models with human preferences presents significant challenges, particularly in achieving personalization without incurring excessive computational costs. Existing methods rely on reward signals and additional annotated data, limiting their scalability and adaptability to diverse human values. To address these challenges, we introduce Persona-judge, a novel discriminative paradigm… ▽ More

    Submitted 17 April, 2025; originally announced April 2025.

  31. arXiv:2504.11999  [pdf, other]

    cs.CV

    A Complex-valued SAR Foundation Model Based on Physically Inspired Representation Learning

    Authors: Mengyu Wang, Hanbo Bi, Yingchao Feng, Linlin Xin, Shuo Gong, Tianqi Wang, Zhiyuan Yan, Peijin Wang, Wenhui Diao, Xian Sun

    Abstract: Vision foundation models in remote sensing have been extensively studied due to their superior generalization on various downstream tasks. Synthetic Aperture Radar (SAR) offers all-day, all-weather imaging capabilities, providing significant advantages for Earth observation. However, establishing a foundation model for SAR image interpretation inevitably encounters the challenges of insufficient i… ▽ More

    Submitted 16 April, 2025; originally announced April 2025.

  32. arXiv:2504.10686  [pdf, other]

    cs.CV eess.IV

    The Tenth NTIRE 2025 Efficient Super-Resolution Challenge Report

    Authors: Bin Ren, Hang Guo, Lei Sun, Zongwei Wu, Radu Timofte, Yawei Li, Yao Zhang, Xinning Chai, Zhengxue Cheng, Yingsheng Qin, Yucai Yang, Li Song, Hongyuan Yu, Pufan Xu, Cheng Wan, Zhijuan Huang, Peng Guo, Shuyuan Cui, Chenjun Li, Xuehai Hu, Pan Pan, Xin Zhang, Heng Zhang, Qing Luo, Linyan Jiang , et al. (122 additional authors not shown)

    Abstract: This paper presents a comprehensive review of the NTIRE 2025 Challenge on Single-Image Efficient Super-Resolution (ESR). The challenge aimed to advance the development of deep models that optimize key computational metrics, i.e., runtime, parameters, and FLOPs, while achieving a PSNR of at least 26.90 dB on the DIV2K_LSDIR_valid dataset and 26.99 dB on the… ▽ More

    Submitted 14 April, 2025; originally announced April 2025.

    Comments: Accepted by CVPR2025 NTIRE Workshop, Efficient Super-Resolution Challenge Report. 50 pages

  33. arXiv:2504.10210  [pdf, other]

    cs.AI

    Can Competition Enhance the Proficiency of Agents Powered by Large Language Models in the Realm of News-driven Time Series Forecasting?

    Authors: Yuxuan Zhang, Yangyang Feng, Daifeng Li, Kexin Zhang, Junlan Chen, Bowen Deng

    Abstract: Multi-agent-based, news-driven time series forecasting is considered a potential paradigm shift in the era of large language models (LLMs). The challenge of this task lies in measuring the influence of different news events on the fluctuations of time series. This requires agents to possess stronger abilities in innovative thinking and in identifying misleading logic. However, the existi… ▽ More

    Submitted 14 April, 2025; originally announced April 2025.

  34. arXiv:2504.09841  [pdf, other]

    cs.CR cs.AI

    StruPhantom: Evolutionary Injection Attacks on Black-Box Tabular Agents Powered by Large Language Models

    Authors: Yang Feng, Xudong Pan

    Abstract: The proliferation of autonomous agents powered by large language models (LLMs) has revolutionized popular business applications dealing with tabular data, i.e., tabular agents. Although LLMs are observed to be vulnerable against prompt injection attacks from external data sources, tabular agents impose strict data formats and predefined rules on the attacker's payload, which are ineffective unless… ▽ More

    Submitted 13 April, 2025; originally announced April 2025.

    Comments: Work in Progress

  35. arXiv:2504.09669  [pdf, other]

    cs.GT cs.DS

    Nash Social Welfare with Submodular Valuations: Approximation Algorithms and Integrality Gaps

    Authors: Xiaohui Bei, Yuda Feng, Yang Hu, Shi Li, Ruilong Zhang

    Abstract: We study the problem of allocating items to agents such that the (un)weighted Nash social welfare (NSW) is maximized under submodular valuations. The best-known results for unweighted and weighted problems are the $(4+ε)$ approximation given by Garg, Husic, Li, Vega, and Vondrak~\cite{stoc/GargHLVV23} and the $(233+ε)$ approximation given by Feng, Hu, Li, and Zhang~\cite{stoc/FHLZ25}, respectively… ▽ More

    Submitted 13 April, 2025; originally announced April 2025.

  36. arXiv:2504.09480  [pdf, other]

    cs.CV cs.AI

    Vision-Language Model for Object Detection and Segmentation: A Review and Evaluation

    Authors: Yongchao Feng, Yajie Liu, Shuai Yang, Wenrui Cai, Jinqing Zhang, Qiqi Zhan, Ziyue Huang, Hongxi Yan, Qiao Wan, Chenguang Liu, Junzhe Wang, Jiahui Lv, Ziqi Liu, Tengyuan Shi, Qingjie Liu, Yunhong Wang

    Abstract: Vision-Language Models (VLMs) have gained widespread adoption in Open-Vocabulary (OV) object detection and segmentation tasks. Although they have shown promise on OV-related tasks, their effectiveness in conventional vision tasks has thus far been unevaluated. In this work, we present a systematic review of VLM-based detection and segmentation, view VLM as the foundational model and conduct compreh… ▽ More

    Submitted 13 April, 2025; originally announced April 2025.

    Comments: A Review and Evaluation about Vision-Language Model for Object Detection and Segmentation

  37. arXiv:2504.09211  [pdf, ps, other]

    cs.LG eess.SP

    Accurate Diagnosis of Respiratory Viruses Using an Explainable Machine Learning with Mid-Infrared Biomolecular Fingerprinting of Nasopharyngeal Secretions

    Authors: Wenwen Zhang, Zhouzhuo Tang, Yingmei Feng, Xia Yu, Qi Jie Wang, Zhiping Lin

    Abstract: Accurate identification of respiratory viruses (RVs) is critical for outbreak control and public health. This study presents a diagnostic system that combines Attenuated Total Reflectance Fourier Transform Infrared Spectroscopy (ATR-FTIR) from nasopharyngeal secretions with an explainable Rotary Position Embedding-Sparse Attention Transformer (RoPE-SAT) model to accurately identify multiple RVs wi… ▽ More

    Submitted 12 April, 2025; originally announced April 2025.

  38. arXiv:2504.08758  [pdf, other]

    cs.IR cs.AI

    Hyper-RAG: Combating LLM Hallucinations using Hypergraph-Driven Retrieval-Augmented Generation

    Authors: Yifan Feng, Hao Hu, Xingliang Hou, Shiquan Liu, Shihui Ying, Shaoyi Du, Han Hu, Yue Gao

    Abstract: Large language models (LLMs) have transformed various sectors, including education, finance, and medicine, by enhancing content generation and decision-making processes. However, their integration into the medical field is cautious due to hallucinations, instances where generated content deviates from factual accuracy, potentially leading to adverse outcomes. To address this, we introduce Hyper-RA… ▽ More

    Submitted 30 March, 2025; originally announced April 2025.

  39. arXiv:2504.08543  [pdf, other]

    cs.CL

    UoB-NLP at SemEval-2025 Task 11: Leveraging Adapters for Multilingual and Cross-Lingual Emotion Detection

    Authors: Frances Laureano De Leon, Yixiao Wang, Yue Feng, Mark G. Lee

    Abstract: Emotion detection in natural language processing is a challenging task due to the complexity of human emotions and linguistic diversity. While significant progress has been made in high-resource languages, emotion detection in low-resource languages remains underexplored. In this work, we address multilingual and cross-lingual emotion detection by leveraging adapter-based fine-tuning with multilin… ▽ More

    Submitted 11 April, 2025; originally announced April 2025.

    Comments: Accepted to appear in Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)

  40. arXiv:2504.07589  [pdf, other]

    cs.SE

    Copy-and-Paste? Identifying EVM-Inequivalent Code Smells in Multi-chain Reuse Contracts

    Authors: Zexu Wang, Jiachi Chen, Tao Zhang, Yu Zhang, Weizhe Zhang, Yuming Feng, Zibin Zheng

    Abstract: With the development of Solidity contracts on Ethereum, more developers are reusing them on other compatible blockchains. However, developers may overlook the differences between the designs of the blockchain systems, such as the Gas Mechanism and Consensus Protocol, so that the same contracts on different blockchains cannot achieve execution consistent with that on Ethereum. This inconsistenc… ▽ More

    Submitted 11 April, 2025; v1 submitted 10 April, 2025; originally announced April 2025.

    Comments: Accepted by ISSTA2025

  41. arXiv:2504.05782  [pdf, other]

    cs.CV cs.AI

    MDK12-Bench: A Multi-Discipline Benchmark for Evaluating Reasoning in Multimodal Large Language Models

    Authors: Pengfei Zhou, Fanrui Zhang, Xiaopeng Peng, Zhaopan Xu, Jiaxin Ai, Yansheng Qiu, Chuanhao Li, Zhen Li, Ming Li, Yukang Feng, Jianwen Sun, Haoquan Zhang, Zizhen Li, Xiaofeng Mao, Wangbo Zhao, Kai Wang, Xiaojun Chang, Wenqi Shao, Yang You, Kaipeng Zhang

    Abstract: Multimodal reasoning, which integrates language and visual cues into problem solving and decision making, is a fundamental aspect of human intelligence and a crucial step toward artificial general intelligence. However, the evaluation of multimodal reasoning capabilities in Multimodal Large Language Models (MLLMs) remains inadequate. Most existing reasoning benchmarks are constrained by limited da… ▽ More

    Submitted 8 April, 2025; originally announced April 2025.

    Comments: 11 pages, 8 figures

  42. arXiv:2504.05590  [pdf, other]

    cs.CV

    CoA: Towards Real Image Dehazing via Compression-and-Adaptation

    Authors: Long Ma, Yuxin Feng, Yan Zhang, Jinyuan Liu, Weimin Wang, Guang-Yong Chen, Chengpei Xu, Zhuo Su

    Abstract: Learning-based image dehazing algorithms have shown remarkable success in synthetic domains. However, real image dehazing is still in suspense due to computational resource constraints and the diversity of real-world scenes. Therefore, there is an urgent need for an algorithm that excels in both efficiency and adaptability to address real image dehazing effectively. This work proposes a Compressio…

    Submitted 7 April, 2025; originally announced April 2025.

  43. arXiv:2504.04535  [pdf, other]

    cs.CV cs.AI

    SnapPix: Efficient-Coding--Inspired In-Sensor Compression for Edge Vision

    Authors: Weikai Lin, Tianrui Ma, Adith Boloor, Yu Feng, Ruofan Xing, Xuan Zhang, Yuhao Zhu

    Abstract: Energy-efficient image acquisition on the edge is crucial for enabling remote sensing applications where the sensor node has weak compute capabilities and must transmit data to a remote server/cloud for processing. To reduce the edge energy consumption, this paper proposes a sensor-algorithm co-designed system called SnapPix, which compresses raw pixels in the analog domain inside the sensor. We u…

    Submitted 6 April, 2025; originally announced April 2025.

    Comments: 7 pages, Accepted to Design Automation Conference (DAC), 2025

    ACM Class: I.2

  44. arXiv:2504.04178  [pdf, other]

    cs.IR

    MSL: Not All Tokens Are What You Need for Tuning LLM as a Recommender

    Authors: Bohao Wang, Feng Liu, Jiawei Chen, Xingyu Lou, Changwang Zhang, Jun Wang, Yuegang Sun, Yan Feng, Chun Chen, Can Wang

    Abstract: Large language models (LLMs), known for their comprehension capabilities and extensive knowledge, have been increasingly applied to recommendation systems (RS). Given the fundamental gap between the mechanism of LLMs and the requirement of RS, researchers have focused on fine-tuning LLMs with recommendation-specific data to enhance their performance. Language Modeling Loss (LML), originally design…

    Submitted 30 April, 2025; v1 submitted 5 April, 2025; originally announced April 2025.

    Comments: Accepted by SIGIR2025

  45. arXiv:2504.04159  [pdf]

    cs.LG

    Vehicle Acceleration Prediction Considering Environmental Influence and Individual Driving Behavior

    Authors: Wenxuan Wang, Lexing Zhang, Jiale Lei, Yin Feng, Hengxu Hu

    Abstract: Accurate vehicle acceleration prediction is critical for intelligent driving control and energy efficiency management, particularly in environments with complex driving behavior dynamics. This paper proposes a general short-term vehicle acceleration prediction framework that jointly models environmental influence and individual driving behavior. The framework adopts a dual input design by incorpor…

    Submitted 5 April, 2025; originally announced April 2025.

  46. arXiv:2504.04141  [pdf, other]

    cs.CL

    Cognitive Debiasing Large Language Models for Decision-Making

    Authors: Yougang Lyu, Shijie Ren, Yue Feng, Zihan Wang, Zhumin Chen, Zhaochun Ren, Maarten de Rijke

    Abstract: Large language models (LLMs) have shown potential in supporting decision-making applications, particularly as personal conversational assistants in the financial, healthcare, and legal domains. While prompt engineering strategies have enhanced the capabilities of LLMs in decision-making, cognitive biases inherent to LLMs present significant challenges. Cognitive biases are systematic patterns of d…

    Submitted 10 April, 2025; v1 submitted 5 April, 2025; originally announced April 2025.

  47. arXiv:2504.03211  [pdf, ps, other]

    cs.LG cs.AI cs.GT econ.TH

    Persuasive Calibration

    Authors: Yiding Feng, Wei Tang

    Abstract: We introduce and study the persuasive calibration problem, where a principal aims to provide trustworthy predictions about underlying events to a downstream agent to make desired decisions. We adopt the standard calibration framework that regulates predictions to be unbiased conditional on their own value, and thus, they can reliably be interpreted at the face value by the agent. Allowing a small…

    Submitted 4 April, 2025; originally announced April 2025.

  48. arXiv:2504.03166  [pdf, other]

    cs.CV

    RingMoE: Mixture-of-Modality-Experts Multi-Modal Foundation Models for Universal Remote Sensing Image Interpretation

    Authors: Hanbo Bi, Yingchao Feng, Boyuan Tong, Mengyu Wang, Haichen Yu, Yongqiang Mao, Hao Chang, Wenhui Diao, Peijin Wang, Yue Yu, Hanyang Peng, Yehong Zhang, Kun Fu, Xian Sun

    Abstract: The rapid advancement of foundation models has revolutionized visual representation learning in a self-supervised manner. However, their application in remote sensing (RS) remains constrained by a fundamental gap: existing models predominantly handle single or limited modalities, overlooking the inherently multi-modal nature of RS observations. Optical, synthetic aperture radar (SAR), and multi-sp…

    Submitted 4 April, 2025; originally announced April 2025.

  49. arXiv:2504.02059  [pdf, other]

    cs.DB

    Towards Operationalizing Heterogeneous Data Discovery

    Authors: Jin Wang, Yanlin Feng, Chen Shen, Sajjadur Rahman, Eser Kandogan

    Abstract: Querying and exploring massive collections of data sources, such as data lakes, has been an essential research topic in the database community. Although many efforts have been paid in the field of data discovery and data integration in data lakes, they mainly focused on the scenario where the data lake consists of structured tables. However, real-world enterprise data lakes are always more complic…

    Submitted 2 April, 2025; originally announced April 2025.

  50. arXiv:2504.01460  [pdf, other]

    cs.DB

    K2: On Optimizing Distributed Transactions in a Multi-region Data Store with TrueTime Clocks (Extended Version)

    Authors: Haoze Song, Yongqi Wang, Xusheng Chen, Hao Feng, Yazhi Feng, Xieyun Fang, Heming Cui, Linghe Kong

    Abstract: TrueTime clocks (TTCs) that offer accurate and reliable time within limited uncertainty bounds have been increasingly implemented in many clouds. Multi-region data stores that seek decentralized synchronization for high performance represent an ideal application of TTC. However, the co-designs between the two were often undervalued or failed to realize their full potential. This paper proposes K…

    Submitted 2 April, 2025; originally announced April 2025.

    Comments: To appear in PVLDB'25