Showing 1–50 of 755 results for author: Wu, M

  1. arXiv:2505.10300

    cs.HC cs.AI

    AI LEGO: Scaffolding Cross-Functional Collaboration in Industrial Responsible AI Practices during Early Design Stages

    Authors: Muzhe Wu, Yanzhi Zhao, Shuyi Han, Michael Xieyang Liu, Hong Shen

    Abstract: Responsible AI (RAI) efforts increasingly emphasize the importance of addressing potential harms early in the AI development lifecycle through social-technical lenses. However, in cross-functional industry teams, this work is often stalled by a persistent knowledge handoff challenge: the difficulty of transferring high-level, early-stage technical design rationales from technical experts to non-te…

    Submitted 15 May, 2025; originally announced May 2025.

  2. arXiv:2505.10281

    cs.CV

    MFogHub: Bridging Multi-Regional and Multi-Satellite Data for Global Marine Fog Detection and Forecasting

    Authors: Mengqiu Xu, Kaixin Chen, Heng Guo, Yixiang Huang, Ming Wu, Zhenwei Shi, Chuang Zhang, Jun Guo

    Abstract: Deep learning approaches for marine fog detection and forecasting have outperformed traditional methods, demonstrating significant scientific and practical importance. However, the limited availability of open-source datasets remains a major challenge. Existing datasets, often focused on a single region or satellite, restrict the ability to evaluate model performance across diverse conditions and…

    Submitted 15 May, 2025; originally announced May 2025.

  3. arXiv:2505.07868

    cs.RO

    VISTA: Generative Visual Imagination for Vision-and-Language Navigation

    Authors: Yanjia Huang, Mingyang Wu, Renjie Li, Zhengzhong Tu

    Abstract: Vision-and-Language Navigation (VLN) tasks agents with locating specific objects in unseen environments using natural language instructions and visual cues. Many existing VLN approaches typically follow an 'observe-and-reason' schema, that is, agents observe the environment and decide on the next action to take based on the visual observations of their surroundings. They often face challenges in l…

    Submitted 9 May, 2025; originally announced May 2025.

    Comments: 13 pages, 5 figures, CoRL 2025

  4. arXiv:2505.06991

    cs.CV

    Technical Report for ICRA 2025 GOOSE 2D Semantic Segmentation Challenge: Leveraging Color Shift Correction, RoPE-Swin Backbone, and Quantile-based Label Denoising Strategy for Robust Outdoor Scene Understanding

    Authors: Chih-Chung Hsu, I-Hsuan Wu, Wen-Hai Tseng, Ching-Heng Cheng, Ming-Hsuan Wu, Jin-Hui Jiang, Yu-Jou Hsiao

    Abstract: This report presents our semantic segmentation framework developed by team ACVLAB for the ICRA 2025 GOOSE 2D Semantic Segmentation Challenge, which focuses on parsing outdoor scenes into nine semantic categories under real-world conditions. Our method integrates a Swin Transformer backbone enhanced with Rotary Position Embedding (RoPE) for improved spatial generalization, alongside a Color Shift E…

    Submitted 11 May, 2025; originally announced May 2025.

  5. arXiv:2505.06892

    cs.LG

    Learning Soft Sparse Shapes for Efficient Time-Series Classification

    Authors: Zhen Liu, Yicheng Luo, Boyuan Li, Emadeldeen Eldele, Min Wu, Qianli Ma

    Abstract: Shapelets are discriminative subsequences (or shapes) with high interpretability in time series classification. Due to the time-intensive nature of shapelet discovery, existing shapelet-based methods mainly focus on selecting discriminative shapes while discarding others to achieve candidate subsequence sparsification. However, this approach may exclude beneficial shapes and overlook the varying c…

    Submitted 11 May, 2025; originally announced May 2025.

    Comments: Accepted in ICML 2025

  6. arXiv:2505.03336

    cs.IR cs.AI

    Avoid Recommending Out-of-Domain Items: Constrained Generative Recommendation with LLMs

    Authors: Hao Liao, Wensheng Lu, Jianxun Lian, Mingqi Wu, Shuo Wang, Yong Zhang, Yitian Huang, Mingyang Zhou, Xing Xie

    Abstract: Large Language Models (LLMs) have shown promise for generative recommender systems due to their transformative capabilities in user interaction. However, ensuring they do not recommend out-of-domain (OOD) items remains a challenge. We study two distinct methods to address this issue: RecLM-ret, a retrieval-based method, and RecLM-cgen, a constrained generation method. Both methods integrate seamle…

    Submitted 6 May, 2025; originally announced May 2025.

    Comments: 13 pages

  7. arXiv:2505.03153

    cs.CV

    Robust Fairness Vision-Language Learning for Medical Image Analysis

    Authors: Sparsh Bansal, Mingyang Wu, Xin Wang, Shu Hu

    Abstract: The advent of Vision-Language Models (VLMs) in medical image analysis has the potential to help process multimodal inputs and increase performance over traditional inference methods. However, when considering the domain in which these models will be implemented, fairness and robustness are important to ensure the model stays true for any patient. In this paper, we introduce a framework for ensurin…

    Submitted 5 May, 2025; originally announced May 2025.

  8. arXiv:2505.03007

    cs.CV

    NTIRE 2025 Challenge on UGC Video Enhancement: Methods and Results

    Authors: Nikolay Safonov, Alexey Bryncev, Andrey Moskalenko, Dmitry Kulikov, Dmitry Vatolin, Radu Timofte, Haibo Lei, Qifan Gao, Qing Luo, Yaqing Li, Jie Song, Shaozhe Hao, Meisong Zheng, Jingyi Xu, Chengbin Wu, Jiahui Liu, Ying Chen, Xin Deng, Mai Xu, Peipei Liang, Jie Ma, Junjie Jin, Yingxue Pang, Fangzhou Luo, Kai Chen, et al. (6 additional authors not shown)

    Abstract: This paper presents an overview of the NTIRE 2025 Challenge on UGC Video Enhancement. The challenge constructed a set of 150 user-generated content videos without reference ground truth, which suffer from real-world degradations such as noise, blur, faded colors, compression artifacts, etc. The goal of the participants was to develop an algorithm capable of improving the visual quality of such vid…

    Submitted 5 May, 2025; originally announced May 2025.

  9. arXiv:2504.21023

    cs.CL cs.LG

    Param$Δ$ for Direct Weight Mixing: Post-Train Large Language Model at Zero Cost

    Authors: Sheng Cao, Mingrui Wu, Karthik Prasad, Yuandong Tian, Zechun Liu

    Abstract: The post-training phase of large language models is essential for enhancing capabilities such as instruction-following, reasoning, and alignment with human preferences. However, it demands extensive high-quality data and poses risks like overfitting, alongside significant computational costs due to repeated post-training and evaluation after each base model update. This paper introduces $ParamΔ$,…

    Submitted 22 April, 2025; originally announced April 2025.

    Comments: Published as a conference paper at ICLR 2025

    Journal ref: ICLR 2025

  10. arXiv:2504.18880

    cs.AI cond-mat.mtrl-sci

    Reshaping MOFs Text Mining with a Dynamic Multi-Agent Framework of Large Language Agents

    Authors: Zuhong Lin, Daoyuan Ren, Kai Ran, Sun Jing, Xiaotiang Huang, Haiyang He, Pengxu Pan, Xiaohang Zhang, Ying Fang, Tianying Wang, Minli Wu, Zhanglin Li, Xiaochuan Zhang, Haipu Li, Jingjing Yao

    Abstract: The mining of synthesis conditions for metal-organic frameworks (MOFs) is a significant focus in materials science. However, identifying the precise synthesis conditions for specific MOFs within the vast array of possibilities presents a considerable challenge. Large Language Models (LLMs) offer a promising solution to this problem. We leveraged the capabilities of LLMs, specifically gpt-4o-mini,…

    Submitted 26 April, 2025; originally announced April 2025.

  11. arXiv:2504.18468

    cs.CV

    RGS-DR: Reflective Gaussian Surfels with Deferred Rendering for Shiny Objects

    Authors: Georgios Kouros, Minye Wu, Tinne Tuytelaars

    Abstract: We introduce RGS-DR, a novel inverse rendering method for reconstructing and rendering glossy and reflective objects with support for flexible relighting and scene editing. Unlike existing methods (e.g., NeRF and 3D Gaussian Splatting), which struggle with view-dependent effects, RGS-DR utilizes a 2D Gaussian surfel representation to accurately estimate geometry and surface normals, an essential p…

    Submitted 5 May, 2025; v1 submitted 25 April, 2025; originally announced April 2025.

  12. arXiv:2504.17441

    cs.CV

    Predict-Optimize-Distill: A Self-Improving Cycle for 4D Object Understanding

    Authors: Mingxuan Wu, Huang Huang, Justin Kerr, Chung Min Kim, Anthony Zhang, Brent Yi, Angjoo Kanazawa

    Abstract: Humans can resort to long-form inspection to build intuition on predicting the 3D configurations of unseen objects. The more we observe the object motion, the better we get at predicting its 3D state immediately. Existing systems either optimize underlying representations from multi-view observations or train a feed-forward predictor from supervised datasets. We introduce Predict-Optimize-Distill…

    Submitted 24 April, 2025; originally announced April 2025.

    Comments: First two authors contributed equally. See our website at: https://predict-optimize-distill.github.io/pod.github.io

  13. arXiv:2504.17355

    cs.LG cs.AI

    Collaborative Multi-Agent Reinforcement Learning for Automated Feature Transformation with Graph-Driven Path Optimization

    Authors: Xiaohan Huang, Dongjie Wang, Zhiyuan Ning, Ziyue Qiao, Qingqing Long, Haowei Zhu, Yi Du, Min Wu, Yuanchun Zhou, Meng Xiao

    Abstract: Feature transformation methods aim to find an optimal mathematical feature-feature crossing process that generates high-value features and improves the performance of downstream machine learning tasks. Existing frameworks, though designed to mitigate manual costs, often treat feature transformations as isolated operations, ignoring dynamic dependencies between transformation steps. To address the…

    Submitted 24 April, 2025; originally announced April 2025.

    Comments: 13 pages, Keywords: Automated Feature Transformation, Tabular Dataset, Reinforcement Learning

  14. arXiv:2504.16552

    cs.DC

    DTVM: Revolutionizing Smart Contract Execution with Determinism and Compatibility

    Authors: Wei Zhou, Changzheng Wei, Ying Yan, Wei Tang, Zhihao Chen, Xiong Xu, Xuebing Huang, Wengang Chen, Jie Zhang, Yang Chen, Xiaofu Zheng, Hanghang Wu, Shenglong Chen, Ermei Wang, Xiangfei Chen, Yang Yu, Meng Wu, Tao Zhu, Liwei Yuan, Feng Yu, Alex Zhang, Wei Wang, Ji Luo, Zhengyu He, Wenbiao Zhao

    Abstract: We introduce the DeTerministic Virtual Machine (DTVM) Stack, a next-generation smart contract execution framework designed to address critical performance, determinism, and ecosystem compatibility challenges in blockchain networks. Building upon WebAssembly (Wasm) while maintaining full Ethereum Virtual Machine (EVM) ABI compatibility, DTVM introduces a Deterministic Middle Intermediate Representa…

    Submitted 23 April, 2025; originally announced April 2025.

  15. arXiv:2504.15521

    cs.CL

    The Bitter Lesson Learned from 2,000+ Multilingual Benchmarks

    Authors: Minghao Wu, Weixuan Wang, Sinuo Liu, Huifeng Yin, Xintong Wang, Yu Zhao, Chenyang Lyu, Longyue Wang, Weihua Luo, Kaifu Zhang

    Abstract: As large language models (LLMs) continue to advance in linguistic capabilities, robust multilingual evaluation has become essential for promoting equitable technological progress. This position paper examines over 2,000 multilingual (non-English) benchmarks from 148 countries, published between 2021 and 2024, to evaluate past, present, and future practices in multilingual benchmarking. Our finding…

    Submitted 21 April, 2025; originally announced April 2025.

    Comments: work in progress; 22 pages, 8 figures, 3 tables

  16. arXiv:2504.14363

    cs.LG cs.CL

    Improving RL Exploration for LLM Reasoning through Retrospective Replay

    Authors: Shihan Dou, Muling Wu, Jingwen Xu, Rui Zheng, Tao Gui, Qi Zhang, Xuanjing Huang

    Abstract: Reinforcement learning (RL) has increasingly become a pivotal technique in the post-training of large language models (LLMs). The effective exploration of the output space is essential for the success of RL. We observe that for complex problems, during the early stages of training, the model exhibits strong exploratory capabilities and can identify promising solution ideas. However, its limited ca…

    Submitted 19 April, 2025; originally announced April 2025.

    Comments: 13 pages, 3 figures

  17. arXiv:2504.13131

    eess.IV cs.AI cs.CV

    NTIRE 2025 Challenge on Short-form UGC Video Quality Assessment and Enhancement: Methods and Results

    Authors: Xin Li, Kun Yuan, Bingchen Li, Fengbin Guan, Yizhen Shao, Zihao Yu, Xijun Wang, Yiting Lu, Wei Luo, Suhang Yao, Ming Sun, Chao Zhou, Zhibo Chen, Radu Timofte, Yabin Zhang, Ao-Xiang Zhang, Tianwu Zhi, Jianzhao Liu, Yang Li, Jingwen Xu, Yiting Liao, Yushen Zuo, Mingyang Wu, Renjie Li, Shengyun Zhong, et al. (88 additional authors not shown)

    Abstract: This paper presents a review for the NTIRE 2025 Challenge on Short-form UGC Video Quality Assessment and Enhancement. The challenge comprises two tracks: (i) Efficient Video Quality Assessment (KVQ), and (ii) Diffusion-based Image Super-Resolution (KwaiSR). Track 1 aims to advance the development of lightweight and efficient video quality assessment (VQA) models, with an emphasis on eliminating re…

    Submitted 17 April, 2025; originally announced April 2025.

    Comments: Challenge Report of NTIRE 2025; Methods from 18 Teams; Accepted by CVPR Workshop; 21 pages

  18. arXiv:2504.12753

    cs.CV

    Stronger, Steadier & Superior: Geometric Consistency in Depth VFM Forges Domain Generalized Semantic Segmentation

    Authors: Siyu Chen, Ting Han, Changshe Zhang, Xin Luo, Meiliu Wu, Guorong Cai, Jinhe Su

    Abstract: Vision Foundation Models (VFMs) have delivered remarkable performance in Domain Generalized Semantic Segmentation (DGSS). However, recent methods often overlook the fact that visual cues are susceptible, whereas the underlying geometry remains stable, rendering depth information more robust. In this paper, we investigate the potential of integrating depth information with features from VFMs, to im…

    Submitted 17 April, 2025; originally announced April 2025.

  19. arXiv:2504.11321

    cs.LG

    Subset-Contrastive Multi-Omics Network Embedding

    Authors: Pedro Henrique da Costa Avelar, Min Wu, Sophia Tsoka

    Abstract: Motivation: Network-based analyses of omics data are widely used, and while many of these methods have been adapted to single-cell scenarios, they often remain memory- and space-intensive. As a result, they are better suited to batch data or smaller datasets. Furthermore, the application of network-based methods in multi-omics often relies on similarity-based networks, which lack structurally-disc…

    Submitted 15 April, 2025; originally announced April 2025.

  20. arXiv:2504.10900

    cs.LG cs.AI

    Bridging Distribution Gaps in Time Series Foundation Model Pretraining with Prototype-Guided Normalization

    Authors: Peiliang Gong, Emadeldeen Eldele, Min Wu, Zhenghua Chen, Xiaoli Li, Daoqiang Zhang

    Abstract: Foundation models have achieved remarkable success across diverse machine-learning domains through large-scale pretraining on large, diverse datasets. However, pretraining on such datasets introduces significant challenges due to substantial mismatches in data distributions, a problem particularly pronounced with time series data. In this paper, we tackle this issue by proposing a domain-aware ada…

    Submitted 15 April, 2025; originally announced April 2025.

  21. arXiv:2504.10686

    cs.CV eess.IV

    The Tenth NTIRE 2025 Efficient Super-Resolution Challenge Report

    Authors: Bin Ren, Hang Guo, Lei Sun, Zongwei Wu, Radu Timofte, Yawei Li, Yao Zhang, Xinning Chai, Zhengxue Cheng, Yingsheng Qin, Yucai Yang, Li Song, Hongyuan Yu, Pufan Xu, Cheng Wan, Zhijuan Huang, Peng Guo, Shuyuan Cui, Chenjun Li, Xuehai Hu, Pan Pan, Xin Zhang, Heng Zhang, Qing Luo, Linyan Jiang, et al. (122 additional authors not shown)

    Abstract: This paper presents a comprehensive review of the NTIRE 2025 Challenge on Single-Image Efficient Super-Resolution (ESR). The challenge aimed to advance the development of deep models that optimize key computational metrics, i.e., runtime, parameters, and FLOPs, while achieving a PSNR of at least 26.90 dB on the $\operatorname{DIV2K\_LSDIR\_valid}$ dataset and 26.99 dB on the…

    Submitted 14 April, 2025; originally announced April 2025.

    Comments: Accepted by CVPR2025 NTIRE Workshop, Efficient Super-Resolution Challenge Report. 50 pages

  22. arXiv:2504.09967

    cs.CV cs.AI cs.LG

    Enhancing Multi-task Learning Capability of Medical Generalist Foundation Model via Image-centric Multi-annotation Data

    Authors: Xun Zhu, Fanbin Mo, Zheng Zhang, Jiaxi Wang, Yiming Shi, Ming Wu, Chuang Zhang, Miao Li, Ji Wu

    Abstract: The emergence of medical generalist foundation models has revolutionized conventional task-specific model development paradigms, aiming to better handle multiple tasks through joint training on large-scale medical datasets. However, recent advances prioritize simple data scaling or architectural component enhancement, while neglecting to re-examine multi-task learning from a data-centric perspecti…

    Submitted 14 April, 2025; originally announced April 2025.

  23. arXiv:2504.07596

    cs.AI

    Boosting Universal LLM Reward Design through Heuristic Reward Observation Space Evolution

    Authors: Zen Kit Heng, Zimeng Zhao, Tianhao Wu, Yuanfei Wang, Mingdong Wu, Yangang Wang, Hao Dong

    Abstract: Large Language Models (LLMs) are emerging as promising tools for automated reinforcement learning (RL) reward design, owing to their robust capabilities in commonsense reasoning and code generation. By engaging in dialogues with RL agents, LLMs construct a Reward Observation Space (ROS) by selecting relevant environment states and defining their internal operations. However, existing frameworks ha…

    Submitted 10 April, 2025; v1 submitted 10 April, 2025; originally announced April 2025.

    Comments: 7 pages, 5 figures

  24. arXiv:2504.06273

    cs.IR cs.AI cs.CL

    A Diverse and Effective Retrieval-Based Debt Collection System with Expert Knowledge

    Authors: Jiaming Luo, Weiyi Luo, Guoqing Sun, Mengchen Zhu, Haifeng Tang, Kunyao Lan, Mengyue Wu, Kenny Q. Zhu

    Abstract: Designing effective debt collection systems is crucial for improving operational efficiency and reducing costs in the financial industry. However, the challenges of maintaining script diversity, contextual relevance, and coherence make this task particularly difficult. This paper presents a debt collection system based on real debtor-collector data from a major commercial bank. We construct a scri…

    Submitted 3 March, 2025; originally announced April 2025.

    Comments: Accepted by NAACL 2025, Industry Track

  25. arXiv:2504.04787

    cs.CV cs.AI

    Dynamic Vision Mamba

    Authors: Mengxuan Wu, Zekai Li, Zhiyuan Liang, Moyang Li, Xuanlei Zhao, Samir Khaki, Zheng Zhu, Xiaojiang Peng, Konstantinos N. Plataniotis, Kai Wang, Wangbo Zhao, Yang You

    Abstract: Mamba-based vision models have gained extensive attention as a result of being computationally more efficient than attention-based models. However, spatial redundancy still exists in these models, represented by token and block redundancy. For token redundancy, we analytically find that early token pruning methods will result in inconsistency between training and inference or introduce extra compu…

    Submitted 7 April, 2025; originally announced April 2025.

  26. arXiv:2504.02454

    cs.CV

    Taylor Series-Inspired Local Structure Fitting Network for Few-shot Point Cloud Semantic Segmentation

    Authors: Changshuo Wang, Shuting He, Xiang Fang, Meiqing Wu, Siew-Kei Lam, Prayag Tiwari

    Abstract: Few-shot point cloud semantic segmentation aims to accurately segment "unseen" new categories in point cloud scenes using limited labeled data. However, pretraining-based methods not only introduce excessive time overhead but also overlook the local structure representation among irregular point clouds. To address these issues, we propose a pretraining-free local structure fitting network for few-…

    Submitted 3 April, 2025; originally announced April 2025.

    Journal ref: AAAI 2025

  27. arXiv:2504.01463

    physics.optics cs.AR

    Versatile silicon integrated photonic processor: a reconfigurable solution for next-generation AI clusters

    Authors: Ying Zhu, Yifan Liu, Xinyu Yang, Kailai Liu, Xin Hua, Ming Luo, Jia Liu, Siyao Chang, Shengxiang Zhang, Miao Wu, Zhicheng Wang, Hongguang Zhang, Daigao Chen, Xi Xiao, Shaohua Yu

    Abstract: Artificial Intelligence models pose serious challenges in intensive computing and high-bandwidth communication for conventional electronic circuit-based computing clusters. Silicon photonic technologies, owing to their high speed, low latency, large bandwidth, and complementary metal-oxide-semiconductor compatibility, have been widely implemented for data transfer and actively explored as phot…

    Submitted 2 April, 2025; originally announced April 2025.

  28. arXiv:2504.01373

    cs.LG

    UniFault: A Fault Diagnosis Foundation Model from Bearing Data

    Authors: Emadeldeen Eldele, Mohamed Ragab, Xu Qing, Edward, Zhenghua Chen, Min Wu, Xiaoli Li, Jay Lee

    Abstract: Machine fault diagnosis (FD) is a critical task for predictive maintenance, enabling early fault detection and preventing unexpected failures. Despite its importance, existing FD models are operation-specific with limited generalization across diverse datasets. Foundation models (FM) have demonstrated remarkable potential in both visual and language domains, achieving impressive generalization cap…

    Submitted 2 April, 2025; originally announced April 2025.

  29. arXiv:2503.20394

    cs.LG cs.AI

    FastFT: Accelerating Reinforced Feature Transformation via Advanced Exploration Strategies

    Authors: Tianqi He, Xiaohan Huang, Yi Du, Qingqing Long, Ziyue Qiao, Min Wu, Yanjie Fu, Yuanchun Zhou, Meng Xiao

    Abstract: Feature Transformation is crucial for classic machine learning that aims to generate feature combinations to enhance the performance of downstream tasks from a data-centric perspective. Current methodologies, such as manual expert-driven processes, iterative-feedback techniques, and exploration-generative tactics, have shown promise in automating such data engineering workflow by minimizing human…

    Submitted 26 March, 2025; originally announced March 2025.

    Comments: 14 pages, Accepted by ICDE 2025

  30. arXiv:2503.17940

    cs.CV

    FisherTune: Fisher-Guided Robust Tuning of Vision Foundation Models for Domain Generalized Segmentation

    Authors: Dong Zhao, Jinlong Li, Shuang Wang, Mengyao Wu, Qi Zang, Nicu Sebe, Zhun Zhong

    Abstract: Vision Foundation Models (VFMs) excel in generalization due to large-scale pretraining, but fine-tuning them for Domain Generalized Semantic Segmentation (DGSS) while maintaining this ability remains challenging. Existing approaches either selectively fine-tune parameters or freeze the VFMs and update only the adapters, both of which may underutilize the VFMs' full potential in DGSS tasks. We obse…

    Submitted 1 April, 2025; v1 submitted 23 March, 2025; originally announced March 2025.

    Journal ref: Conference on Computer Vision and Pattern Recognition 2025

  31. arXiv:2503.17671

    cs.MA cs.AI

    ComfyGPT: A Self-Optimizing Multi-Agent System for Comprehensive ComfyUI Workflow Generation

    Authors: Oucheng Huang, Yuhang Ma, Zeng Zhao, Mingrui Wu, Jiayi Ji, Rongsheng Zhang, Zhipeng Hu, Xiaoshuai Sun, Rongrong Ji

    Abstract: ComfyUI provides a widely-adopted, workflow-based interface that enables users to customize various image generation tasks through an intuitive node-based architecture. However, the intricate connections between nodes and diverse modules often present a steep learning curve for users. In this paper, we introduce ComfyGPT, the first self-optimizing multi-agent system designed to generate ComfyUI wo…

    Submitted 22 March, 2025; originally announced March 2025.

  32. arXiv:2503.16779

    cs.CL cs.AI

    Chain-of-Tools: Utilizing Massive Unseen Tools in the CoT Reasoning of Frozen Language Models

    Authors: Mengsong Wu, Tong Zhu, Han Han, Xiang Zhang, Wenbiao Shao, Wenliang Chen

    Abstract: Tool learning can further broaden the usage scenarios of large language models (LLMs). However, most existing methods either require finetuning, after which the model can only use tools seen in the training data, or add tool demonstrations to the prompt at the cost of lower efficiency. In this paper, we present a new Tool Learning method, Chain-of-Tools. It makes full use of the powerful semantic representatio…

    Submitted 20 March, 2025; originally announced March 2025.

    Comments: 11 pages, 10 figures

  33. What Makes a Good TODO Comment?

    Authors: Haoye Wang, Zhipeng Gao, Tingting Bi, John Grundy, Xinyu Wang, Minghui Wu, Xiaohu Yang

    Abstract: Software development is a collaborative process that involves various interactions among individuals and teams. TODO comments in source code play a critical role in managing and coordinating diverse tasks during this process. However, this study finds that a large proportion of open-source project TODO comments are left unresolved or take a long time to be resolved. About 46.7% of TODO comments i…

    Submitted 19 March, 2025; originally announced March 2025.

  34. arXiv:2503.13961

    cs.GR cs.CV

    BG-Triangle: Bézier Gaussian Triangle for 3D Vectorization and Rendering

    Authors: Minye Wu, Haizhao Dai, Kaixin Yao, Tinne Tuytelaars, Jingyi Yu

    Abstract: Differentiable rendering enables efficient optimization by allowing gradients to be computed through the rendering process, facilitating 3D reconstruction, inverse rendering and neural scene representation learning. To ensure differentiability, existing solutions approximate or re-formulate traditional rendering operations using smooth, probabilistic proxies such as volumes or Gaussian primitives.…

    Submitted 18 March, 2025; originally announced March 2025.

  35. Rubikon: Intelligent Tutoring for Rubik's Cube Learning Through AR-enabled Physical Task Reconfiguration

    Authors: Haocheng Ren, Muzhe Wu, Gregory Croisdale, Anhong Guo, Xu Wang

    Abstract: Learning to solve a Rubik's Cube requires the learners to repeatedly practice a skill component, e.g., identifying a misplaced square and putting it back. However, for 3D physical tasks such as this, generating sufficient repeated practice opportunities for learners can be challenging, in part because it is difficult for novices to reconfigure the physical object to specific states. We propose Rub…

    Submitted 14 May, 2025; v1 submitted 16 March, 2025; originally announced March 2025.

    Comments: DIS 2025

  36. arXiv:2503.12389

    cs.AI

    FedGAI: Federated Style Learning with Cloud-Edge Collaboration for Generative AI in Fashion Design

    Authors: Mingzhu Wu, Jianan Jiang, Xinglin Li, Hanhui Deng, Di Wu

    Abstract: Collaboration can amalgamate diverse ideas, styles, and visual elements, fostering creativity and innovation among different designers. In collaborative design, sketches play a pivotal role as a means of expressing design creativity. However, designers often tend to not openly share these meticulously crafted sketches. This phenomenon of data island in the design area hinders its digital transform…

    Submitted 16 March, 2025; originally announced March 2025.

  37. arXiv:2503.10351

    cs.CL

    New Trends for Modern Machine Translation with Large Reasoning Models

    Authors: Sinuo Liu, Chenyang Lyu, Minghao Wu, Longyue Wang, Weihua Luo, Kaifu Zhang, Zifu Shang

    Abstract: Recent advances in Large Reasoning Models (LRMs), particularly those leveraging Chain-of-Thought reasoning (CoT), have opened brand-new possibilities for Machine Translation (MT). This position paper argues that LRMs have substantially transformed traditional neural MT as well as LLM-based MT paradigms by reframing translation as a dynamic reasoning task that requires contextual, cultural, and linguisti…

    Submitted 14 March, 2025; v1 submitted 13 March, 2025; originally announced March 2025.

    Comments: arXiv admin note: text overlap with arXiv:1701.04715 by other authors

  38. arXiv:2503.10076

    cs.CV

    VMBench: A Benchmark for Perception-Aligned Video Motion Generation

    Authors: Xinran Ling, Chen Zhu, Meiqi Wu, Hangyu Li, Xiaokun Feng, Cundian Yang, Aiming Hao, Jiashu Zhu, Jiahong Wu, Xiangxiang Chu

    Abstract: Video generation has advanced rapidly, improving evaluation methods, yet assessing video's motion remains a major challenge. Specifically, there are two key issues: 1) current motion metrics do not fully align with human perceptions; 2) the existing motion prompts are limited. Based on these findings, we introduce VMBench--a comprehensive Video Motion Benchmark that has perception-aligned motion m…

    Submitted 16 March, 2025; v1 submitted 13 March, 2025; originally announced March 2025.

  39. arXiv:2503.10020

    cs.CV cs.LG

    One-Shot Federated Unsupervised Domain Adaptation with Scaled Entropy Attention and Multi-Source Smoothed Pseudo Labeling

    Authors: Ali Abedi, Q. M. Jonathan Wu, Ning Zhang, Farhad Pourpanah

    Abstract: Federated Learning (FL) is a promising approach for privacy-preserving collaborative learning. However, it faces significant challenges when dealing with domain shifts, especially when each client has access only to its source data and cannot share it during target domain adaptation. Moreover, FL methods often require high communication overhead due to multiple rounds of model updates between clie…

    Submitted 12 March, 2025; originally announced March 2025.

  40. arXiv:2503.07914

    cs.AI cs.CL cs.LG

    Demystifying the Accuracy-Interpretability Trade-Off: A Case Study of Inferring Ratings from Reviews

    Authors: Pranjal Atrey, Michael P. Brundage, Min Wu, Sanghamitra Dutta

    Abstract: Interpretable machine learning models offer understandable reasoning behind their decision-making process, though they may not always match the performance of their black-box counterparts. This trade-off between interpretability and model performance has sparked discussions around the deployment of AI, particularly in critical applications where knowing the rationale of decision-making is essentia…

    Submitted 10 March, 2025; originally announced March 2025.

    Comments: Accepted at DAI Workshop, AAAI-2025

  41. arXiv:2503.06709

    cs.CL cs.AI

    Delusions of Large Language Models

    Authors: Hongshen Xu, Zixv yang, Zichen Zhu, Kunyao Lan, Zihan Wang, Mengyue Wu, Ziwei Ji, Lu Chen, Pascale Fung, Kai Yu

    Abstract: Large Language Models often generate factually incorrect but plausible outputs, known as hallucinations. We identify a more insidious phenomenon, LLM delusion, defined as high belief hallucinations, incorrect outputs with abnormally high confidence, making them harder to detect and mitigate. Unlike ordinary hallucinations, delusions persist with low uncertainty, posing significant challenges to mo…

    Submitted 9 March, 2025; originally announced March 2025.

  42. arXiv:2503.06052  [pdf, other]

    cs.LG q-bio.QM

    Interpretable High-order Knowledge Graph Neural Network for Predicting Synthetic Lethality in Human Cancers

    Authors: Xuexin Chen, Ruichu Cai, Zhengting Huang, Zijian Li, Jie Zheng, Min Wu

    Abstract: Synthetic lethality (SL) is a promising gene interaction for cancer therapy. Recent SL prediction methods integrate knowledge graphs (KGs) into graph neural networks (GNNs) and employ attention mechanisms to extract local subgraphs as explanations for target gene pairs. However, attention mechanisms often lack fidelity, typically generate a single explanation per gene pair, and fail to ensure trus…

    Submitted 19 March, 2025; v1 submitted 7 March, 2025; originally announced March 2025.

    Comments: 15 pages. Accepted by Briefings in Bioinformatics

    Journal ref: Briefings in Bioinformatics 2025

  43. arXiv:2503.05244  [pdf, other]

    cs.AI cs.CL

    WritingBench: A Comprehensive Benchmark for Generative Writing

    Authors: Yuning Wu, Jiahao Mei, Ming Yan, Chenliang Li, Shaopeng Lai, Yuran Ren, Zijia Wang, Ji Zhang, Mengyue Wu, Qin Jin, Fei Huang

    Abstract: Recent advancements in large language models (LLMs) have significantly enhanced text generation capabilities, yet evaluating their performance in generative writing remains a challenge. Existing benchmarks primarily focus on generic text generation or are limited in writing tasks, failing to capture the diverse requirements of high-quality written content across various domains. To bridge this gap, w…

    Submitted 20 March, 2025; v1 submitted 7 March, 2025; originally announced March 2025.

  44. arXiv:2503.05242  [pdf, other]

    cs.CL

    MM-StoryAgent: Immersive Narrated Storybook Video Generation with a Multi-Agent Paradigm across Text, Image and Audio

    Authors: Xuenan Xu, Jiahao Mei, Chenliang Li, Yuning Wu, Ming Yan, Shaopeng Lai, Ji Zhang, Mengyue Wu

    Abstract: The rapid advancement of large language models (LLMs) and artificial intelligence-generated content (AIGC) has accelerated AI-native applications, such as AI-based storybooks that automate engaging story production for children. However, challenges remain in improving story attractiveness, enriching storytelling expressiveness, and developing open-source evaluation benchmarks and frameworks. There…

    Submitted 7 March, 2025; originally announced March 2025.

  45. arXiv:2503.05040  [pdf, other]

    cs.SE

    No Silver Bullets: Why Understanding Software Cycle Time is Messy, Not Magic

    Authors: John C. Flournoy, Carol S. Lee, Maggie Wu, Catherine M. Hicks

    Abstract: Understanding factors that influence software development velocity is crucial for engineering teams and organizations, yet empirical evidence at scale remains limited. A more robust understanding of the dynamics of cycle time may help practitioners avoid pitfalls in relying on velocity measures while evaluating software work. We analyze cycle time, a widely-used metric measuring time from ticket c…

    Submitted 11 March, 2025; v1 submitted 6 March, 2025; originally announced March 2025.

  46. arXiv:2503.02374  [pdf, other]

    cs.CL

    MedEthicEval: Evaluating Large Language Models Based on Chinese Medical Ethics

    Authors: Haoan Jin, Jiacheng Shi, Hanhui Xu, Kenny Q. Zhu, Mengyue Wu

    Abstract: Large language models (LLMs) demonstrate significant potential in advancing medical applications, yet their capabilities in addressing medical ethics challenges remain underexplored. This paper introduces MedEthicEval, a novel benchmark designed to systematically evaluate LLMs in the domain of medical ethics. Our framework encompasses two key components: knowledge, assessing the models' grasp of m…

    Submitted 4 March, 2025; originally announced March 2025.

  47. arXiv:2503.01461  [pdf, other]

    cs.LG cs.AI cs.CL

    Towards Widening The Distillation Bottleneck for Reasoning Models

    Authors: Huifeng Yin, Yu Zhao, Minghao Wu, Xuanfan Ni, Bo Zeng, Hao Wang, Tianqi Shi, Liangying Shao, Chenyang Lyu, Longyue Wang, Weihua Luo, Kaifu Zhang

    Abstract: Large Reasoning Models (LRMs) such as OpenAI o1 and DeepSeek-R1 have shown remarkable reasoning capabilities by scaling test-time compute and generating long Chain-of-Thought (CoT). Distillation--post-training on LRMs-generated data--is a straightforward yet effective method to enhance the reasoning abilities of smaller models, but faces a critical bottleneck: we found that distilled long CoT data p…

    Submitted 3 March, 2025; originally announced March 2025.

  48. arXiv:2503.00780  [pdf, other]

    cs.CV cs.AI

    Enhanced Multi-Class Classification of Gastrointestinal Endoscopic Images with Interpretable Deep Learning Model

    Authors: Astitva Kamble, Vani Bandodkar, Saakshi Dharmadhikary, Veena Anand, Pradyut Kumar Sanki, Mei X. Wu, Biswabandhu Jana

    Abstract: Endoscopy serves as an essential procedure for evaluating the gastrointestinal (GI) tract and plays a pivotal role in identifying GI-related disorders. Recent advancements in deep learning have demonstrated substantial progress in detecting abnormalities through intricate models and data augmentation methods. This research introduces a novel approach to enhance classification accuracy using 8,000 l…

    Submitted 2 March, 2025; originally announced March 2025.

  49. arXiv:2503.00461  [pdf, other]

    cs.AR cs.AI

    Leveraging Compute-in-Memory for Efficient Generative Model Inference in TPUs

    Authors: Zhantong Zhu, Hongou Li, Wenjie Ren, Meng Wu, Le Ye, Ru Huang, Tianyu Jia

    Abstract: With the rapid advent of generative models, efficiently deploying these models on specialized hardware has become critical. Tensor Processing Units (TPUs) are designed to accelerate AI workloads, but their high power consumption necessitates innovations for improving efficiency. Compute-in-memory (CIM) has emerged as a promising paradigm with superior area and energy efficiency. In this work, we p…

    Submitted 1 March, 2025; originally announced March 2025.

    Comments: Accepted to appear at DATE 2025

  50. arXiv:2502.21290  [pdf, other]

    cs.AI cs.LG q-bio.QM

    Contextualizing biological perturbation experiments through language

    Authors: Menghua Wu, Russell Littman, Jacob Levine, Lin Qiu, Tommaso Biancalani, David Richmond, Jan-Christian Huetter

    Abstract: High-content perturbation experiments allow scientists to probe biomolecular systems at unprecedented resolution, but experimental and analysis costs pose significant barriers to widespread adoption. Machine learning has the potential to guide efficient exploration of the perturbation space and extract novel insights from these data. However, current approaches neglect the semantic richness of the…

    Submitted 28 February, 2025; originally announced February 2025.

    Comments: The Thirteenth International Conference on Learning Representations (2025)