Showing 1–50 of 120 results for author: Lyu, C

Searching in archive cs.

  1. arXiv:2506.17029

    cs.LG

    Scalable and Reliable Multi-agent Reinforcement Learning for Traffic Assignment

    Authors: Leizhen Wang, Peibo Duan, Cheng Lyu, Zewen Wang, Zhiqiang He, Nan Zheng, Zhenliang Ma

    Abstract: The evolution of metropolitan cities and the increase in travel demands impose stringent requirements on traffic assignment methods. Multi-agent reinforcement learning (MARL) approaches outperform traditional methods in modeling adaptive routing behavior without requiring explicit system dynamics, which is beneficial for real-world deployment. However, MARL frameworks face challenges in scalabilit…

    Submitted 20 June, 2025; originally announced June 2025.

  2. arXiv:2506.11870

    cs.DB

    LLM-based Dynamic Differential Testing for Database Connectors with Reinforcement Learning-Guided Prompt Selection

    Authors: Ce Lyu, Minghao Zhao, Yanhao Wang, Liang Jie

    Abstract: Database connectors are critical components enabling applications to interact with underlying database management systems (DBMS), yet their security vulnerabilities often remain overlooked. Unlike traditional software defects, connector vulnerabilities exhibit subtle behavioral patterns and are inherently challenging to detect. Moreover, the nonstandardized implementation of connectors leaves potential…

    Submitted 13 June, 2025; originally announced June 2025.

    Comments: 5 pages

    MSC Class: 68N99; ACM Class: H.2.4; D.2.5

  3. arXiv:2506.11820

    cs.CV cs.CL

    Rethinking Multilingual Vision-Language Translation: Dataset, Evaluation, and Adaptation

    Authors: Xintong Wang, Jingheng Pan, Yixiao Liu, Xiaohu Zhao, Chenyang Lyu, Minghao Wu, Chris Biemann, Longyue Wang, Linlong Xu, Weihua Luo, Kaifu Zhang

    Abstract: Vision-Language Translation (VLT) is a challenging task that requires accurately recognizing multilingual text embedded in images and translating it into the target language with the support of visual context. While recent Large Vision-Language Models (LVLMs) have demonstrated strong multilingual and visual understanding capabilities, there is a lack of systematic evaluation and understanding of t…

    Submitted 13 June, 2025; originally announced June 2025.

  4. arXiv:2506.11066

    cs.SE cs.AI

    CoQuIR: A Comprehensive Benchmark for Code Quality-Aware Information Retrieval

    Authors: Jiahui Geng, Fengyu Cai, Shaobo Cui, Qing Li, Liangwei Chen, Chenyang Lyu, Haonan Li, Derui Zhu, Walter Pretschner, Heinz Koeppl, Fakhri Karray

    Abstract: Code retrieval is essential in modern software development, as it boosts code reuse and accelerates debugging. However, current benchmarks primarily emphasize functional relevance while neglecting critical dimensions of software quality. Motivated by this gap, we introduce CoQuIR, the first large-scale, multilingual benchmark specifically designed to evaluate quality-aware code retrieval across fo…

    Submitted 31 May, 2025; originally announced June 2025.

  5. arXiv:2506.09278

    cs.CV cs.LG cs.RO

    UFM: A Simple Path towards Unified Dense Correspondence with Flow

    Authors: Yuchen Zhang, Nikhil Keetha, Chenwei Lyu, Bhuvan Jhamb, Yutian Chen, Yuheng Qiu, Jay Karhade, Shreyas Jha, Yaoyu Hu, Deva Ramanan, Sebastian Scherer, Wenshan Wang

    Abstract: Dense image correspondence is central to many applications, such as visual odometry, 3D reconstruction, object association, and re-identification. Historically, dense correspondence has been tackled separately for wide-baseline scenarios and optical flow estimation, despite the common goal of matching content between two images. In this paper, we develop a Unified Flow & Matching model (UFM), whic…

    Submitted 10 June, 2025; originally announced June 2025.

    Comments: Project Page: https://uniflowmatch.github.io/

  6. arXiv:2506.00088

    cs.CL cs.AI cs.LG

    HD-NDEs: Neural Differential Equations for Hallucination Detection in LLMs

    Authors: Qing Li, Jiahui Geng, Zongxiong Chen, Derui Zhu, Yuxia Wang, Congbo Ma, Chenyang Lyu, Fakhri Karray

    Abstract: In recent years, large language models (LLMs) have made remarkable advancements, yet hallucination, where models produce inaccurate or non-factual statements, remains a significant challenge for real-world deployment. Although current classification-based methods, such as SAPLMA, are highly efficient in mitigating hallucinations, they struggle when non-factual information arises in the early or mi…

    Submitted 30 May, 2025; originally announced June 2025.

  7. arXiv:2505.20889

    cs.AI

    Reinforcement Learning-based Sequential Route Recommendation for System-Optimal Traffic Assignment

    Authors: Leizhen Wang, Peibo Duan, Cheng Lyu, Zhenliang Ma

    Abstract: Modern navigation systems and shared mobility platforms increasingly rely on personalized route recommendations to improve individual travel experience and operational efficiency. However, a key question remains: can such sequential, personalized routing decisions collectively lead to system-optimal (SO) traffic assignment? This paper addresses this question by proposing a learning-based framework…

    Submitted 27 May, 2025; originally announced May 2025.

  8. arXiv:2505.20362

    cs.IR cs.AI

    VSCBench: Bridging the Gap in Vision-Language Model Safety Calibration

    Authors: Jiahui Geng, Qing Li, Zongxiong Chen, Yuxia Wang, Derui Zhu, Zhuohan Xie, Chenyang Lyu, Xiuying Chen, Preslav Nakov, Fakhri Karray

    Abstract: The rapid advancement of vision-language models (VLMs) has brought a lot of attention to their safety alignment. However, existing methods have primarily focused on model undersafety, where the model responds to hazardous queries, while neglecting oversafety, where the model refuses to answer safe queries. In this paper, we introduce the concept of safety calibration, which systematical…

    Submitted 26 May, 2025; originally announced May 2025.

  9. arXiv:2505.14244

    cs.CL

    TransBench: Benchmarking Machine Translation for Industrial-Scale Applications

    Authors: Haijun Li, Tianqi Shi, Zifu Shang, Yuxuan Han, Xueyu Zhao, Hao Wang, Yu Qian, Zhiqiang Qian, Linlong Xu, Minghao Wu, Chenyang Lyu, Longyue Wang, Gongbo Tang, Weihua Luo, Zhao Xu, Kaifu Zhang

    Abstract: Machine translation (MT) has become indispensable for cross-border communication in globalized industries like e-commerce, finance, and legal services, with recent advancements in large language models (LLMs) significantly enhancing translation quality. However, applying general-purpose MT models to industrial scenarios reveals critical limitations due to domain-specific terminology, cultural nuan…

    Submitted 20 May, 2025; originally announced May 2025.

  10. arXiv:2504.15521

    cs.CL

    The Bitter Lesson Learned from 2,000+ Multilingual Benchmarks

    Authors: Minghao Wu, Weixuan Wang, Sinuo Liu, Huifeng Yin, Xintong Wang, Yu Zhao, Chenyang Lyu, Longyue Wang, Weihua Luo, Kaifu Zhang

    Abstract: As large language models (LLMs) continue to advance in linguistic capabilities, robust multilingual evaluation has become essential for promoting equitable technological progress. This position paper examines over 2,000 multilingual (non-English) benchmarks from 148 countries, published between 2021 and 2024, to evaluate past, present, and future practices in multilingual benchmarking. Our finding…

    Submitted 21 April, 2025; originally announced April 2025.

    Comments: Work in progress; 22 pages, 8 figures, 3 tables

  11. arXiv:2504.12605

    cs.CV

    AdaQual-Diff: Diffusion-Based Image Restoration via Adaptive Quality Prompting

    Authors: Xin Su, Chen Wu, Yu Zhang, Chen Lyu, Zhuoran Zheng

    Abstract: Restoring images afflicted by complex real-world degradations remains challenging, as conventional methods often fail to adapt to the unique mixture and severity of artifacts present. This stems from a reliance on indirect cues which poorly capture the true perceptual quality deficit. To address this fundamental limitation, we introduce AdaQual-Diff, a diffusion-based framework that integrates per…

    Submitted 16 April, 2025; originally announced April 2025.

  12. arXiv:2503.23440

    cs.RO

    VET: A Visual-Electronic Tactile System for Immersive Human-Machine Interaction

    Authors: Cong Zhang, Yisheng Yang, Shilong Mu, Chuqiao Lyu, Shoujie Li, Xinyue Chai, Wenbo Ding

    Abstract: In the pursuit of deeper immersion in human-machine interaction, achieving higher-dimensional tactile input and output on a single interface has become a key research focus. This study introduces the Visual-Electronic Tactile (VET) System, which builds upon vision-based tactile sensors (VBTS) and integrates electrical stimulation feedback to enable bidirectional tactile communication. We propose a…

    Submitted 1 April, 2025; v1 submitted 30 March, 2025; originally announced March 2025.

  13. arXiv:2503.20950

    cs.AI

    DEMENTIA-PLAN: An Agent-Based Framework for Multi-Knowledge Graph Retrieval-Augmented Generation in Dementia Care

    Authors: Yutong Song, Chenhan Lyu, Pengfei Zhang, Sabine Brunswicker, Nikil Dutt, Amir Rahmani

    Abstract: Mild-stage dementia patients primarily experience two critical symptoms: severe memory loss and emotional instability. To address these challenges, we propose DEMENTIA-PLAN, an innovative retrieval-augmented generation framework that leverages large language models to enhance conversational support. Our model employs a multiple knowledge graph architecture, integrating various dimensional knowledg…

    Submitted 26 March, 2025; originally announced March 2025.

    Comments: Accepted by AAAI 2025 Workshop on Knowledge Graphs for Personalized Public Health

  14. arXiv:2503.14530   

    cs.CV cs.AI

    SAUCE: Selective Concept Unlearning in Vision-Language Models with Sparse Autoencoders

    Authors: Qing Li, Jiahui Geng, Derui Zhu, Fengyu Cai, Chenyang Lyu, Fakhri Karray

    Abstract: Unlearning methods for vision-language models (VLMs) have primarily adapted techniques from large language models (LLMs), relying on weight updates that demand extensive annotated forget sets. Moreover, these methods perform unlearning at a coarse granularity, often leading to excessive forgetting and reduced model utility. To address this issue, we introduce SAUCE, a novel method that leverages s…

    Submitted 20 March, 2025; v1 submitted 16 March, 2025; originally announced March 2025.

    Comments: More comparative experiments are needed

  15. arXiv:2503.12218

    cs.CV

    Adaptive Label Correction for Robust Medical Image Segmentation with Noisy Labels

    Authors: Chengxuan Qian, Kai Han, Siqi Ma, Chongwen Lyu, Zhenlong Yuan, Jun Chen, Zhe Liu

    Abstract: Deep learning has shown remarkable success in medical image analysis, but its reliance on large volumes of high-quality labeled data limits its applicability. While noisy labeled data are easier to obtain, directly incorporating them into training can degrade model performance. To address this challenge, we propose a Mean Teacher-based Adaptive Label Correction (ALC) self-ensemble framework for ro…

    Submitted 15 March, 2025; originally announced March 2025.

  16. arXiv:2503.10351

    cs.CL

    New Trends for Modern Machine Translation with Large Reasoning Models

    Authors: Sinuo Liu, Chenyang Lyu, Minghao Wu, Longyue Wang, Weihua Luo, Kaifu Zhang, Zifu Shang

    Abstract: Recent advances in Large Reasoning Models (LRMs), particularly those leveraging Chain-of-Thought reasoning (CoT), have opened up new possibilities for Machine Translation (MT). This position paper argues that LRMs have substantially transformed traditional neural MT as well as LLM-based MT paradigms by reframing translation as a dynamic reasoning task that requires contextual, cultural, and linguisti…

    Submitted 14 March, 2025; v1 submitted 13 March, 2025; originally announced March 2025.

    Comments: arXiv admin note: text overlap with arXiv:1701.04715 by other authors

  17. arXiv:2503.06534

    cs.CL

    SafeSpeech: A Comprehensive and Interactive Tool for Analysing Sexist and Abusive Language in Conversations

    Authors: Xingwei Tan, Chen Lyu, Hafiz Muhammad Umer, Sahrish Khan, Mahathi Parvatham, Lois Arthurs, Simon Cullen, Shelley Wilson, Arshad Jhumka, Gabriele Pergola

    Abstract: Detecting toxic language, including sexism, harassment and abusive behaviour, remains a critical challenge, particularly in its subtle and context-dependent forms. Existing approaches largely focus on isolated message-level classification, overlooking toxicity that emerges across conversational contexts. To promote and enable future research in this direction, we introduce SafeSpeech, a comprehensi…

    Submitted 9 March, 2025; originally announced March 2025.

    Comments: NAACL 2025 system demonstration camera-ready

  18. arXiv:2503.06456

    cs.CV

    DynCIM: Dynamic Curriculum for Imbalanced Multimodal Learning

    Authors: Chengxuan Qian, Kai Han, Jingchao Wang, Zhenlong Yuan, Chongwen Lyu, Jun Chen, Zhe Liu

    Abstract: Multimodal learning integrates complementary information from diverse modalities to enhance the decision-making process. However, the potential of multimodal collaboration remains under-exploited due to disparities in data quality and modality representation capabilities. To address this, we introduce DynCIM, a novel dynamic curriculum learning framework designed to quantify the inherent imbalance…

    Submitted 13 March, 2025; v1 submitted 9 March, 2025; originally announced March 2025.

    Comments: 10 pages, 7 figures

  19. arXiv:2503.02846

    cs.CL

    Mask-DPO: Generalizable Fine-grained Factuality Alignment of LLMs

    Authors: Yuzhe Gu, Wenwei Zhang, Chengqi Lyu, Dahua Lin, Kai Chen

    Abstract: Large language models (LLMs) exhibit hallucinations (i.e., unfaithful or nonsensical information) when serving as AI assistants in various domains. Since hallucinations always come with truthful content in the LLM responses, previous factuality alignment methods that conduct response-level preference learning inevitably introduce noise during training. Therefore, this paper proposes a fine-grain…

    Submitted 4 March, 2025; originally announced March 2025.

    Comments: Accepted by ICLR 2025. Code is available at https://github.com/open-compass/ANAH

  20. arXiv:2503.01543

    cs.RO

    Exo-ViHa: A Cross-Platform Exoskeleton System with Visual and Haptic Feedback for Efficient Dexterous Skill Learning

    Authors: Xintao Chao, Shilong Mu, Yushan Liu, Shoujie Li, Chuqiao Lyu, Xiao-Ping Zhang, Wenbo Ding

    Abstract: Imitation learning has emerged as a powerful paradigm for robot skills learning. However, traditional data collection systems for dexterous manipulation face challenges, including a lack of balance between acquisition efficiency, consistency, and accuracy. To address these issues, we introduce Exo-ViHa, an innovative 3D-printed exoskeleton system that enables users to collect data from a first-per…

    Submitted 3 March, 2025; originally announced March 2025.

    Comments: 8 pages, 6 figures

  21. arXiv:2503.01461

    cs.LG cs.AI cs.CL

    Marco-o1 v2: Towards Widening The Distillation Bottleneck for Reasoning Models

    Authors: Huifeng Yin, Yu Zhao, Minghao Wu, Xuanfan Ni, Bo Zeng, Hao Wang, Tianqi Shi, Liangying Shao, Chenyang Lyu, Longyue Wang, Weihua Luo, Kaifu Zhang

    Abstract: Large Reasoning Models (LRMs) such as OpenAI o1 and DeepSeek-R1 have shown remarkable reasoning capabilities by scaling test-time compute and generating long Chain-of-Thought (CoT). Distillation (post-training on LRM-generated data) is a straightforward yet effective method to enhance the reasoning abilities of smaller models, but faces a critical bottleneck: we found that distilled long CoT data p…

    Submitted 31 May, 2025; v1 submitted 3 March, 2025; originally announced March 2025.

  22. arXiv:2503.01439

    cs.RO

    AVR: Active Vision-Driven Robotic Precision Manipulation with Viewpoint and Focal Length Optimization

    Authors: Yushan Liu, Shilong Mu, Xintao Chao, Zizhen Li, Yao Mu, Tianxing Chen, Shoujie Li, Chuqiao Lyu, Xiao-ping Zhang, Wenbo Ding

    Abstract: Robotic manipulation within dynamic environments presents challenges to precise control and adaptability. Traditional fixed-view camera systems struggle to adapt to changing viewpoints and scale variations, limiting perception and manipulation precision. To tackle these issues, we propose the Active Vision-driven Robotic (AVR) framework, a teleoperation hardware solution that supports dynamic…

    Submitted 23 March, 2025; v1 submitted 3 March, 2025; originally announced March 2025.

    Comments: Previously, there were some problems with our experimental data, and the conclusions need to be further verified. Now that we have completed a full-scale experiment and analysis, and added supporting materials to our website, we hope to resubmit it.

  23. arXiv:2502.16886

    cs.CL cs.AI

    DBudgetKV: Dynamic Budget in KV Cache Compression for Ensuring Optimal Performance

    Authors: Xuanfan Ni, Liyan Xu, Chenyang Lyu, Longyue Wang, Mo Yu, Lemao Liu, Fandong Meng, Jie Zhou, Piji Li

    Abstract: To alleviate memory burden during inference of large language models (LLMs), numerous studies have focused on compressing the KV cache by exploring aspects such as attention sparsity. These techniques are often designed with a pre-defined KV budget; however, as the optimal budget varies with input length and task type, a fixed budget could result in inconsistent performa…

    Submitted 9 June, 2025; v1 submitted 24 February, 2025; originally announced February 2025.

  24. arXiv:2502.14743

    cs.MA cs.AI

    Multi-Agent Coordination across Diverse Applications: A Survey

    Authors: Lijun Sun, Yijun Yang, Qiqi Duan, Yuhui Shi, Chao Lyu, Yu-Cheng Chang, Chin-Teng Lin, Yang Shen

    Abstract: Multi-agent coordination studies the underlying mechanism enabling the trending spread of diverse multi-agent systems (MAS) and has received increasing attention, driven by the expansion of emerging applications and rapid AI advances. This survey outlines the current state of coordination research across applications through a unified understanding that answers four fundamental coordination questi…

    Submitted 20 February, 2025; v1 submitted 20 February, 2025; originally announced February 2025.

    Comments: 23 pages, 4 figures, 2 tables

  25. arXiv:2502.13474

    cs.CL

    Towards Lightweight, Adaptive and Attribute-Aware Multi-Aspect Controllable Text Generation with Large Language Models

    Authors: Chenyu Zhu, Yefeng Liu, Chenyang Lyu, Xue Yang, Guanhua Chen, Longyue Wang, Weihua Luo, Kaifu Zhang

    Abstract: Multi-aspect controllable text generation aims to control text generation in attributes from multiple aspects, making it a complex but powerful task in natural language processing. Supervised fine-tuning methods are often employed for this task due to their simplicity and effectiveness. However, they still have some limitations: low-rank adaptation (LoRA) only fine-tunes a few parameters and has s…

    Submitted 19 February, 2025; originally announced February 2025.

    Comments: 17 pages, 9 figures

  26. arXiv:2502.06781

    cs.CL cs.LG

    Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning

    Authors: Chengqi Lyu, Songyang Gao, Yuzhe Gu, Wenwei Zhang, Jianfei Gao, Kuikun Liu, Ziyi Wang, Shuaibin Li, Qian Zhao, Haian Huang, Weihan Cao, Jiangning Liu, Hongwei Liu, Junnan Liu, Songyang Zhang, Dahua Lin, Kai Chen

    Abstract: Reasoning abilities, especially those for solving complex math problems, are crucial components of general intelligence. Recent advances by proprietary companies, such as OpenAI's o-series models, have made remarkable progress on reasoning tasks. However, the complete technical details remain unrevealed, and the only techniques believed with certainty to have been adopted are reinforcement learning…

    Submitted 10 February, 2025; originally announced February 2025.

    Comments: We released our code, data, and model on https://github.com/InternLM/OREAL

  27. arXiv:2501.10674

    cs.CV cs.CL

    Can Multimodal LLMs do Visual Temporal Understanding and Reasoning? The answer is No!

    Authors: Mohamed Fazli Imam, Chenyang Lyu, Alham Fikri Aji

    Abstract: Multimodal Large Language Models (MLLMs) have achieved significant advancements in tasks like Visual Question Answering (VQA) by leveraging foundational Large Language Models (LLMs). However, their abilities in specific areas such as visual temporal understanding, which is crucial for comprehending real-world dynamics, remain underexplored. To address this, we propose a challenging evaluation benc…

    Submitted 18 February, 2025; v1 submitted 18 January, 2025; originally announced January 2025.

    Comments: Our dataset can be found at https://huggingface.co/datasets/fazliimam/temporal-vqa

  28. arXiv:2412.11732

    cs.CL

    Findings of the WMT 2024 Shared Task on Discourse-Level Literary Translation

    Authors: Longyue Wang, Siyou Liu, Chenyang Lyu, Wenxiang Jiao, Xing Wang, Jiahao Xu, Zhaopeng Tu, Yan Gu, Weiyu Chen, Minghao Wu, Liting Zhou, Philipp Koehn, Andy Way, Yulin Yuan

    Abstract: Following last year's edition, we continued to host the WMT shared task on Discourse-Level Literary Translation this year, now in its second edition. We focus on three language directions: Chinese-English, Chinese-German, and Chinese-Russian, with the latter two newly added. This year, we received a total of 10 submissions from 5 academic and industry teams. We employ both automatic and huma…

    Submitted 16 December, 2024; originally announced December 2024.

    Comments: WMT2024

  29. arXiv:2412.04003

    cs.CL

    Marco-LLM: Bridging Languages via Massive Multilingual Training for Cross-Lingual Enhancement

    Authors: Lingfeng Ming, Bo Zeng, Chenyang Lyu, Tianqi Shi, Yu Zhao, Xue Yang, Yefeng Liu, Yiyu Wang, Linlong Xu, Yangyang Liu, Xiaohu Zhao, Hao Wang, Heng Liu, Hao Zhou, Huifeng Yin, Zifu Shang, Haijun Li, Longyue Wang, Weihua Luo, Kaifu Zhang

    Abstract: Large Language Models (LLMs) have achieved remarkable progress in recent years; however, their excellent performance is still largely limited to major world languages, primarily English. Many LLMs continue to face challenges with multilingual tasks, especially when it comes to low-resource languages. To address this issue, we introduced Marco-LLM: Massive multilingual training for cross-lingual en…

    Submitted 5 December, 2024; originally announced December 2024.

  30. arXiv:2412.03338

    cs.LG cs.AI

    AI-Driven Day-to-Day Route Choice

    Authors: Leizhen Wang, Peibo Duan, Zhengbing He, Cheng Lyu, Xin Chen, Nan Zheng, Li Yao, Zhenliang Ma

    Abstract: Understanding travelers' route choices can help policymakers devise optimal operational and planning strategies for both normal and abnormal circumstances. However, existing choice modeling methods often rely on predefined assumptions and struggle to capture the dynamic and adaptive nature of travel behavior. Recently, Large Language Models (LLMs) have emerged as a promising alternative, demonstra…

    Submitted 31 December, 2024; v1 submitted 4 December, 2024; originally announced December 2024.

  31. arXiv:2411.18884

    cs.RO cs.CV

    ETSM: Automating Dissection Trajectory Suggestion and Confidence Map-Based Safety Margin Prediction for Robot-assisted Endoscopic Submucosal Dissection

    Authors: Mengya Xu, Wenjin Mo, Guankun Wang, Huxin Gao, An Wang, Long Bai, Chaoyang Lyu, Xiaoxiao Yang, Zhen Li, Hongliang Ren

    Abstract: Robot-assisted Endoscopic Submucosal Dissection (ESD) improves the surgical procedure by providing a more comprehensive view through advanced robotic instruments and bimanual operation, thereby enhancing dissection efficiency and accuracy. Accurate prediction of dissection trajectories is crucial for better decision-making, reducing intraoperative errors, and improving surgical training. Neverthel…

    Submitted 27 November, 2024; originally announced November 2024.

  32. arXiv:2411.14405

    cs.CL

    Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions

    Authors: Yu Zhao, Huifeng Yin, Bo Zeng, Hao Wang, Tianqi Shi, Chenyang Lyu, Longyue Wang, Weihua Luo, Kaifu Zhang

    Abstract: OpenAI o1 has recently sparked a surge of interest in the study of large reasoning models (LRM). Building on this momentum, Marco-o1 not only focuses on disciplines with standard answers, such as mathematics, physics, and coding, which are well-suited for reinforcement learning (RL), but also places greater emphasis on open-ended resolutions. We aim to address the question: "Can the o1 model effe…

    Submitted 25 November, 2024; v1 submitted 21 November, 2024; originally announced November 2024.

  33. arXiv:2410.15287

    cs.CL

    Training Language Models to Critique With Multi-agent Feedback

    Authors: Tian Lan, Wenwei Zhang, Chengqi Lyu, Shuaibin Li, Chen Xu, Heyan Huang, Dahua Lin, Xian-Ling Mao, Kai Chen

    Abstract: Improving critique ability, a meta-cognitive capability of humans, remains a significant challenge for LLMs. Recent works primarily rely on supervised fine-tuning (SFT) using critiques generated by a single LLM like GPT-4. However, these model-generated critiques often exhibit flaws due to the inherent complexity of the critique. Consequently, fine-tuning LLMs on such flawed critiques typically l…

    Submitted 20 October, 2024; originally announced October 2024.

  34. arXiv:2410.09632

    cs.CL

    SciGisPy: a Novel Metric for Biomedical Text Simplification via Gist Inference Score

    Authors: Chen Lyu, Gabriele Pergola

    Abstract: Biomedical literature is often written in highly specialized language, posing significant comprehension challenges for non-experts. Automatic text simplification (ATS) offers a solution by making such texts more accessible while preserving critical information. However, evaluating ATS for biomedical texts is still challenging due to the limitations of existing evaluation metrics. General-domain me…

    Submitted 12 October, 2024; originally announced October 2024.

    Comments: Accepted by the Third Workshop on Text Simplification, Accessibility and Readability

  35. arXiv:2410.09631

    cs.CL

    Society of Medical Simplifiers

    Authors: Chen Lyu, Gabriele Pergola

    Abstract: Medical text simplification is crucial for making complex biomedical literature more accessible to non-experts. Traditional methods struggle with the specialized terms and jargon of medical texts, lacking the flexibility to adapt the simplification process dynamically. In contrast, recent advancements in large language models (LLMs) present unique opportunities by offering enhanced control over te…

    Submitted 12 October, 2024; originally announced October 2024.

    Comments: Accepted by the Third Workshop on Text Simplification, Accessibility and Readability

  36. arXiv:2410.06667

    cs.CL cs.AI

    Large Language Models as Code Executors: An Exploratory Study

    Authors: Chenyang Lyu, Lecheng Yan, Rui Xing, Wenxi Li, Younes Samih, Tianbo Ji, Longyue Wang

    Abstract: The capabilities of Large Language Models (LLMs) have significantly evolved, extending from natural language processing to complex tasks like code understanding and generation. We expand the scope of LLMs' capabilities to a broader context, using LLMs to execute code snippets to obtain the output. This paper pioneers the exploration of LLMs as code executors, where code snippets are directly fed t…

    Submitted 10 October, 2024; v1 submitted 9 October, 2024; originally announced October 2024.

  37. arXiv:2409.10983

    cs.RO

    MoDex: Planning High-Dimensional Dexterous Control via Learning Neural Internal Models

    Authors: Tong Wu, Shoujie Li, Chuqiao Lyu, Kit-Wa Sou, Wang-Sing Chan, Wenbo Ding

    Abstract: Controlling hands in high-dimensional action space has been a longstanding challenge, yet humans naturally perform dexterous tasks with ease. In this paper, we draw inspiration from the concept of internal model exhibited in human behavior and reconsider dexterous hands as learnable systems. Specifically, we introduce MoDex, a framework that includes a couple of neural networks (NNs) capturing the…

    Submitted 11 May, 2025; v1 submitted 17 September, 2024; originally announced September 2024.

    Comments: 21 pages

  38. arXiv:2408.13976

    cs.SE

    Sifting through the Chaff: On Utilizing Execution Feedback for Ranking the Generated Code Candidates

    Authors: Zhihong Sun, Yao Wan, Jia Li, Hongyu Zhang, Zhi Jin, Ge Li, Chen Lyu

    Abstract: Large Language Models (LLMs), such as GPT-4, StarCoder, and CodeLlama, are transforming the way developers approach programming by automatically generating code based on given natural language descriptions. Despite advancements, generating syntactically and semantically correct code remains challenging, especially for complex programming tasks. Existing approaches typically generate multiple candi…

    Submitted 19 September, 2024; v1 submitted 25 August, 2024; originally announced August 2024.

    Comments: Accepted by the 39th IEEE/ACM International Conference on Automated Software Engineering (ASE 2024)

  39. arXiv:2408.12960

    cs.SE

    Measuring Code Efficiency Optimization Capabilities with ACEOB

    Authors: Yue Pan, Xiuting Shao, Chen Lyu

    Abstract: As Moore's Law gains diminish, software performance and efficiency become increasingly vital. Optimizing code efficiency is challenging, even for professional programmers. However, related research remains relatively scarce, and rigorously assessing models' abilities to optimize code efficiency is fraught with difficulties. In response to this challenge, we first conduct an in-depth analysis of "c…

    Submitted 23 August, 2024; originally announced August 2024.

  40. arXiv:2408.12948

    cs.SE

    E-code: Mastering Efficient Code Generation through Pretrained Models and Expert Encoder Group

    Authors: Yue Pan, Chen Lyu, Zhenyu Yang, Lantian Li, Qi Liu, Xiuting Shao

    Abstract: Context: With the waning of Moore's Law, the software industry is placing increasing importance on finding alternative solutions for continuous performance enhancement. The significance and research results of software performance optimization have been on the rise in recent years, especially with the advancement propelled by Large Language Models (LLMs). However, traditional strategies for rectify…

    Submitted 23 August, 2024; originally announced August 2024.

  41. arXiv:2408.05767

    cs.CL cs.AI

    Reference-free Hallucination Detection for Large Vision-Language Models

    Authors: Qing Li, Jiahui Geng, Chenyang Lyu, Derui Zhu, Maxim Panov, Fakhri Karray

    Abstract: Large vision-language models (LVLMs) have made significant progress in recent years. While LVLMs exhibit excellent ability in language understanding, question answering, and conversations of visual inputs, they are prone to producing hallucinations. While several methods are proposed to evaluate the hallucinations in LVLMs, most are reference-based and depend on external tools, which complicates t…

    Submitted 19 November, 2024; v1 submitted 11 August, 2024; originally announced August 2024.

  42. arXiv:2407.19376  [pdf, other]

    cs.CE

    CIDER: Counterfactual-Invariant Diffusion-based GNN Explainer for Causal Subgraph Inference

    Authors: Qibin Zhang, Chengshang Lyu, Lingxi Chen, Qiqi Jin, Luonan Chen

    Abstract: Inferring causal links or subgraphs corresponding to a specific phenotype or label based solely on measured data is an important yet challenging task, which is also different from inferring causal nodes. While Graph Neural Network (GNN) Explainers have shown potential in subgraph identification, existing methods with GNN often offer associative rather than causal insights. This lack of transparenc…

    Submitted 27 July, 2024; originally announced July 2024.

  43. arXiv:2407.04693  [pdf, other]

    cs.CL cs.AI

    ANAH-v2: Scaling Analytical Hallucination Annotation of Large Language Models

    Authors: Yuzhe Gu, Ziwei Ji, Wenwei Zhang, Chengqi Lyu, Dahua Lin, Kai Chen

    Abstract: Large language models (LLMs) exhibit hallucinations in long-form question-answering tasks across various domains and wide applications. Current hallucination detection and mitigation datasets are limited in domains and sizes, which struggle to scale due to prohibitive labor costs and insufficient reliability of existing hallucination annotators. To facilitate the scalable oversight of LLM hallucin…

    Submitted 19 December, 2024; v1 submitted 5 July, 2024; originally announced July 2024.

    Comments: Accepted by NeurIPS 2024. Dataset, code, and model are released at https://github.com/open-compass/ANAH

  44. arXiv:2406.05967  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark

    Authors: David Romero, Chenyang Lyu, Haryo Akbarianto Wibowo, Teresa Lynn, Injy Hamed, Aditya Nanda Kishore, Aishik Mandal, Alina Dragonetti, Artem Abzaliev, Atnafu Lambebo Tonja, Bontu Fufa Balcha, Chenxi Whitehouse, Christian Salamea, Dan John Velasco, David Ifeoluwa Adelani, David Le Meur, Emilio Villa-Cueva, Fajri Koto, Fauzan Farooqui, Frederico Belcavello, Ganzorig Batnasan, Gisela Vallejo, Grainne Caulfield, Guido Ivetta, Haiyue Song, et al. (51 additional authors not shown)

    Abstract: Visual Question Answering (VQA) is an important task in multimodal AI, and it is often used to test the ability of vision-language models to understand and reason on knowledge present in both visual and textual data. However, most of the current VQA models use datasets that are primarily focused on English and a few major world languages, with images that are typically Western-centric. While recen…

    Submitted 4 November, 2024; v1 submitted 9 June, 2024; originally announced June 2024.

    Comments: 38th Conference on Neural Information Processing Systems (NeurIPS 2024) Track on Datasets and Benchmarks

  45. arXiv:2405.20315  [pdf, other]

    cs.CL cs.AI

    ANAH: Analytical Annotation of Hallucinations in Large Language Models

    Authors: Ziwei Ji, Yuzhe Gu, Wenwei Zhang, Chengqi Lyu, Dahua Lin, Kai Chen

    Abstract: Reducing the 'hallucination' problem of Large Language Models (LLMs) is crucial for their wide applications. A comprehensive and fine-grained measurement of the hallucination is the first key step for the governance of this issue but is under-explored in the community. Thus, we present ANAH, a bilingual dataset that offers ANalytical Annotation of…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: Accepted by ACL 2024

  46. arXiv:2405.19265  [pdf, other]

    cs.CL

    AlchemistCoder: Harmonizing and Eliciting Code Capability by Hindsight Tuning on Multi-source Data

    Authors: Zifan Song, Yudong Wang, Wenwei Zhang, Kuikun Liu, Chengqi Lyu, Demin Song, Qipeng Guo, Hang Yan, Dahua Lin, Kai Chen, Cairong Zhao

    Abstract: Open-source Large Language Models (LLMs) and their specialized variants, particularly Code LLMs, have recently delivered impressive performance. However, previous Code LLMs are typically fine-tuned on single-source data with limited quality and diversity, which may insufficiently elicit the potential of pre-trained Code LLMs. In this paper, we present AlchemistCoder, a series of Code LLMs with enh…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: Preprint with 20 pages and 20 figures. Source code and models at https://github.com/InternLM/AlchemistCoder

  47. arXiv:2404.17342  [pdf, other]

    cs.CL cs.AI

    From Multiple-Choice to Extractive QA: A Case Study for English and Arabic

    Authors: Teresa Lynn, Malik H. Altakrori, Samar Mohamed Magdy, Rocktim Jyoti Das, Chenyang Lyu, Mohamed Nasr, Younes Samih, Kirill Chirkunov, Alham Fikri Aji, Preslav Nakov, Shantanu Godbole, Salim Roukos, Radu Florian, Nizar Habash

    Abstract: The rapid evolution of Natural Language Processing (NLP) has favoured major languages such as English, leaving a significant gap for many others due to limited resources. This is especially evident in the context of data annotation, a task whose importance cannot be underestimated, but which is time-consuming and costly. Thus, any dataset for resource-poor languages is precious, in particular when…

    Submitted 24 January, 2025; v1 submitted 26 April, 2024; originally announced April 2024.

    Comments: Paper 8 pages, Appendix 12 pages. Published at COLING 2025

  48. arXiv:2403.13271  [pdf, other]

    cs.SE

    Enhancing Code Generation Performance of Smaller Models by Distilling the Reasoning Ability of LLMs

    Authors: Zhihong Sun, Chen Lyu, Bolun Li, Yao Wan, Hongyu Zhang, Ge Li, Zhi Jin

    Abstract: Large Language Models (LLMs) have recently made significant advances in code generation through the 'Chain-of-Thought' prompting technique. This technique empowers the model to autonomously devise "solution plans" to tackle intricate programming challenges, thereby improving its performance in code generation. Nevertheless, smaller models have been struggling to keep up with LLMs in deducing these…

    Submitted 19 March, 2024; originally announced March 2024.

    Comments: Accepted for LREC-COLING 2024

    ACM Class: D.2.3

  49. arXiv:2403.11324  [pdf, other]

    cs.CV

    GeoGaussian: Geometry-aware Gaussian Splatting for Scene Rendering

    Authors: Yanyan Li, Chenyu Lyu, Yan Di, Guangyao Zhai, Gim Hee Lee, Federico Tombari

    Abstract: During the Gaussian Splatting optimization process, the scene's geometry can gradually deteriorate if its structure is not deliberately preserved, especially in non-textured regions such as walls, ceilings, and furniture surfaces. This degradation significantly affects the rendering quality of novel views that deviate significantly from the viewpoints in the training data. To mitigate this issue,…

    Submitted 17 July, 2024; v1 submitted 17 March, 2024; originally announced March 2024.

    Comments: Accepted to ECCV 2024

  50. arXiv:2403.05530  [pdf, other]

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1112 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 16 December, 2024; v1 submitted 8 March, 2024; originally announced March 2024.