Showing 1–50 of 76 results for author: Cong, X

  1. arXiv:2506.07900  [pdf, ps, other]

    cs.CL cs.AI

    MiniCPM4: Ultra-Efficient LLMs on End Devices

    Authors: MiniCPM Team, Chaojun Xiao, Yuxuan Li, Xu Han, Yuzhuo Bai, Jie Cai, Haotian Chen, Wentong Chen, Xin Cong, Ganqu Cui, Ning Ding, Shengdan Fan, Yewei Fang, Zixuan Fu, Wenyu Guan, Yitong Guan, Junshao Guo, Yufeng Han, Bingxiang He, Yuxiang Huang, Cunliang Kong, Qiuzuo Li, Siyuan Li, Wenhao Li, Yanghao Li, et al. (50 additional authors not shown)

    Abstract: This paper introduces MiniCPM4, a highly efficient large language model (LLM) designed explicitly for end-side devices. We achieve this efficiency through systematic innovation in four key dimensions: model architecture, training data, training algorithms, and inference systems. Specifically, in terms of model architecture, we propose InfLLM v2, a trainable sparse attention mechanism that accelera…

    Submitted 9 June, 2025; originally announced June 2025.

    Comments: MiniCPM4 Technical Report

  2. arXiv:2506.03103  [pdf, ps, other]

    cs.CV

    DyTact: Capturing Dynamic Contacts in Hand-Object Manipulation

    Authors: Xiaoyan Cong, Angela Xing, Chandradeep Pokhariya, Rao Fu, Srinath Sridhar

    Abstract: Reconstructing dynamic hand-object contacts is essential for realistic manipulation in AI character animation, XR, and robotics, yet it remains challenging due to heavy occlusions, complex surface details, and limitations in existing capture techniques. In this paper, we introduce DyTact, a markerless capture method for accurately capturing dynamic contact in hand-object manipulations in a non-int…

    Submitted 3 June, 2025; originally announced June 2025.

  3. arXiv:2506.02395  [pdf, ps, other]

    cs.CV

    The Devil is in the Darkness: Diffusion-Based Nighttime Dehazing Anchored in Brightness Perception

    Authors: Xiaofeng Cong, Yu-Xin Zhang, Haoran Wei, Yeying Jin, Junming Hou, Jie Gui, Jing Zhang, Dacheng Tao

    Abstract: While nighttime image dehazing has been extensively studied, converting nighttime hazy images to daytime-equivalent brightness remains largely unaddressed. Existing methods face two critical limitations: (1) datasets overlook the brightness relationship between day and night, resulting in the brightness mapping being inconsistent with the real world during image synthesis; and (2) models do not ex…

    Submitted 2 June, 2025; originally announced June 2025.

  4. arXiv:2506.01391  [pdf, ps, other]

    cs.AI cs.CL cs.CV cs.HC

    AgentCPM-GUI: Building Mobile-Use Agents with Reinforcement Fine-Tuning

    Authors: Zhong Zhang, Yaxi Lu, Yikun Fu, Yupeng Huo, Shenzhi Yang, Yesai Wu, Han Si, Xin Cong, Haotian Chen, Yankai Lin, Jie Xie, Wei Zhou, Wang Xu, Yuanheng Zhang, Zhou Su, Zhongwu Zhai, Xiaoming Liu, Yudong Mei, Jianming Xu, Hongyan Tian, Chongyi Wang, Chi Chen, Yuan Yao, Zhiyuan Liu, Maosong Sun

    Abstract: The recent progress of large language model agents has opened new possibilities for automating tasks through graphical user interfaces (GUIs), especially in mobile environments where intelligent interaction can greatly enhance usability. However, practical deployment of such agents remains constrained by several key challenges. Existing training data is often noisy and lack semantic diversity, whi…

    Submitted 2 June, 2025; originally announced June 2025.

    Comments: The project is available at https://github.com/OpenBMB/AgentCPM-GUI

    ACM Class: I.2.8; I.2.7; I.2.10; H.5.2

  5. arXiv:2505.11833  [pdf, ps, other]

    cs.AI

    ToLeaP: Rethinking Development of Tool Learning with Large Language Models

    Authors: Haotian Chen, Zijun Song, Boye Niu, Ke Zhang, Litu Ou, Yaxi Lu, Zhong Zhang, Xin Cong, Yankai Lin, Zhiyuan Liu, Maosong Sun

    Abstract: Tool learning, which enables large language models (LLMs) to utilize external tools effectively, has garnered increasing attention for its potential to revolutionize productivity across industries. Despite rapid development in tool learning, key challenges and opportunities remain understudied, limiting deeper insights and future advancements. In this paper, we investigate the tool learning abilit…

    Submitted 17 May, 2025; originally announced May 2025.

  6. arXiv:2505.08750  [pdf, other]

    cs.CL

    AC-Reason: Towards Theory-Guided Actual Causality Reasoning with Large Language Models

    Authors: Yanxi Zhang, Xin Cong, Zhong Zhang, Xiao Liu, Dongyan Zhao, Yesai Wu

    Abstract: Actual causality (AC), a fundamental aspect of causal reasoning (CR), is responsible for attribution and responsibility assignment in real-world scenarios. However, existing LLM-based methods lack grounding in formal AC theory, resulting in limited interpretability. Therefore, we propose AC-Reason, a semi-formal reasoning framework that identifies causally relevant events within an AC scenario, in…

    Submitted 13 May, 2025; originally announced May 2025.

  7. arXiv:2504.10466  [pdf, other]

    cs.CV

    Art3D: Training-Free 3D Generation from Flat-Colored Illustration

    Authors: Xiaoyan Cong, Jiayi Shen, Zekun Li, Rao Fu, Tao Lu, Srinath Sridhar

    Abstract: Large-scale pre-trained image-to-3D generative models have exhibited remarkable capabilities in diverse shape generations. However, most of them struggle to synthesize plausible 3D assets when the reference image is flat-colored like hand drawings due to the lack of 3D illusion, which are often the most user-friendly input modalities in art content creation. To this end, we propose Art3D, a traini…

    Submitted 14 April, 2025; originally announced April 2025.

    Comments: Technical Report. Course Project of Brown CSCI 1430 Computer Vision. Project Page: https://joy-jy11.github.io/

  8. arXiv:2503.14906  [pdf, other]

    eess.IV cs.CV

    FetalFlex: Anatomy-Guided Diffusion Model for Flexible Control on Fetal Ultrasound Image Synthesis

    Authors: Yaofei Duan, Tao Tan, Zhiyuan Zhu, Yuhao Huang, Yuanji Zhang, Rui Gao, Patrick Cheong-Iao Pang, Xinru Gao, Guowei Tao, Xiang Cong, Zhou Li, Lianying Liang, Guangzhi He, Linliang Yin, Xuedong Deng, Xin Yang, Dong Ni

    Abstract: Fetal ultrasound (US) examinations require the acquisition of multiple planes, each providing unique diagnostic information to evaluate fetal development and screening for congenital anomalies. However, obtaining a comprehensive, multi-plane annotated fetal US dataset remains challenging, particularly for rare or complex anomalies owing to their low incidence and numerous subtypes. This poses diff…

    Submitted 19 March, 2025; originally announced March 2025.

    Comments: 18 pages, 10 figures

  9. arXiv:2502.18878  [pdf, other]

    cs.CL

    Learning to Generate Structured Output with Schema Reinforcement Learning

    Authors: Yaxi Lu, Haolun Li, Xin Cong, Zhong Zhang, Yesai Wu, Yankai Lin, Zhiyuan Liu, Fangming Liu, Maosong Sun

    Abstract: This study investigates the structured generation capabilities of large language models (LLMs), focusing on producing valid JSON outputs against a given schema. Despite the widespread use of JSON in integrating language models with programs, there is a lack of comprehensive analysis and benchmarking of these capabilities. We explore various aspects of JSON generation, such as structure understandi…

    Submitted 6 March, 2025; v1 submitted 26 February, 2025; originally announced February 2025.

    Comments: 8 pages, 4 figures

    ACM Class: I.2.7

  10. arXiv:2502.18407  [pdf, other]

    cs.CL cs.AI cs.LG

    AgentRM: Enhancing Agent Generalization with Reward Modeling

    Authors: Yu Xia, Jingru Fan, Weize Chen, Siyu Yan, Xin Cong, Zhong Zhang, Yaxi Lu, Yankai Lin, Zhiyuan Liu, Maosong Sun

    Abstract: Existing LLM-based agents have achieved strong performance on held-in tasks, but their generalizability to unseen tasks remains poor. Hence, some recent work focus on fine-tuning the policy model with more diverse tasks to improve the generalizability. In this work, we find that finetuning a reward model to guide the policy model is more robust than directly finetuning the policy model. Based on t…

    Submitted 25 February, 2025; originally announced February 2025.

  11. arXiv:2502.15690  [pdf, other]

    cs.IR cs.AI cs.CL

    Level-Navi Agent: A Framework and benchmark for Chinese Web Search Agents

    Authors: Chuanrui Hu, Shichong Xie, Baoxin Wang, Bin Chen, Xiaofeng Cong, Jun Zhang

    Abstract: Large language models (LLMs), adopted to understand human language, drive the development of artificial intelligence (AI) web search agents. Compared to traditional search engines, LLM-powered AI search agents are capable of understanding and responding to complex queries with greater depth, enabling more accurate operations and better context recognition. However, little attention and effort has…

    Submitted 20 December, 2024; originally announced February 2025.

  12. arXiv:2502.11678  [pdf, other]

    cs.CY cs.CL

    Exploring LLM-based Student Simulation for Metacognitive Cultivation

    Authors: Haoxuan Li, Jifan Yu, Xin Cong, Yang Dang, Daniel Zhang-li, Yisi Zhan, Huiqin Liu, Zhiyuan Liu

    Abstract: Metacognitive education plays a crucial role in cultivating students' self-regulation and reflective thinking, providing essential support for those with learning difficulties through academic advising. Simulating students with insufficient learning capabilities using large language models offers a promising approach to refining pedagogical methods without ethical concerns. However, existing simul…

    Submitted 27 April, 2025; v1 submitted 17 February, 2025; originally announced February 2025.

    Comments: This work has been submitted to the IEEE for possible publication

  13. arXiv:2411.11135  [pdf, other]

    cs.CV

    Oscillation Inversion: Understand the structure of Large Flow Model through the Lens of Inversion Method

    Authors: Yan Zheng, Zhenxiao Liang, Xiaoyan Cong, Lanqing guo, Yuehao Wang, Peihao Wang, Zhangyang Wang

    Abstract: We explore the oscillatory behavior observed in inversion methods applied to large-scale text-to-image diffusion models, with a focus on the "Flux" model. By employing a fixed-point-inspired iterative approach to invert real-world images, we observe that the solution does not achieve convergence, instead oscillating between distinct clusters. Through both toy experiments and real-world diffusion m…

    Submitted 17 November, 2024; originally announced November 2024.

  14. arXiv:2411.05451  [pdf, other]

    cs.SE cs.AI cs.CL

    WorkflowLLM: Enhancing Workflow Orchestration Capability of Large Language Models

    Authors: Shengda Fan, Xin Cong, Yuepeng Fu, Zhong Zhang, Shuyan Zhang, Yuanwei Liu, Yesai Wu, Yankai Lin, Zhiyuan Liu, Maosong Sun

    Abstract: Recent advancements in large language models (LLMs) have driven a revolutionary paradigm shift in process automation from Robotic Process Automation to Agentic Process Automation by automating the workflow orchestration procedure based on LLMs. However, existing LLMs (even the advanced OpenAI GPT-4o) are confined to achieving satisfactory capability in workflow orchestration. To address this limit…

    Submitted 8 November, 2024; originally announced November 2024.

  15. arXiv:2410.14641  [pdf, other]

    cs.CL cs.AI

    Distance between Relevant Information Pieces Causes Bias in Long-Context LLMs

    Authors: Runchu Tian, Yanghao Li, Yuepeng Fu, Siyang Deng, Qinyu Luo, Cheng Qian, Shuo Wang, Xin Cong, Zhong Zhang, Yesai Wu, Yankai Lin, Huadong Wang, Xiaojiang Liu

    Abstract: Positional bias in large language models (LLMs) hinders their ability to effectively process long inputs. A prominent example is the "lost in the middle" phenomenon, where LLMs struggle to utilize relevant information situated in the middle of the input. While prior research primarily focuses on single pieces of relevant information, real-world applications often involve multiple relevant informat…

    Submitted 27 May, 2025; v1 submitted 18 October, 2024; originally announced October 2024.

    Comments: ACL 2025 Findings

  16. arXiv:2410.12361  [pdf, other]

    cs.AI cs.CL

    Proactive Agent: Shifting LLM Agents from Reactive Responses to Active Assistance

    Authors: Yaxi Lu, Shenzhi Yang, Cheng Qian, Guirong Chen, Qinyu Luo, Yesai Wu, Huadong Wang, Xin Cong, Zhong Zhang, Yankai Lin, Weiwen Liu, Yasheng Wang, Zhiyuan Liu, Fangming Liu, Maosong Sun

    Abstract: Agents powered by large language models have shown remarkable abilities in solving complex tasks. However, most agent systems remain reactive, limiting their effectiveness in scenarios requiring foresight and autonomous decision-making. In this paper, we tackle the challenge of developing proactive agents capable of anticipating and initiating tasks without explicit human instructions. We propose…

    Submitted 2 December, 2024; v1 submitted 16 October, 2024; originally announced October 2024.

    Comments: 9 pages, 4 figures

    ACM Class: I.2.7

  17. arXiv:2410.06617  [pdf, other]

    cs.CL cs.AI

    Learning Evolving Tools for Large Language Models

    Authors: Guoxin Chen, Zhong Zhang, Xin Cong, Fangda Guo, Yesai Wu, Yankai Lin, Wenzheng Feng, Yasheng Wang

    Abstract: Tool learning enables large language models (LLMs) to interact with external tools and APIs, greatly expanding the application scope of LLMs. However, due to the dynamic nature of external environments, these tools and APIs may become outdated over time, preventing LLMs from correctly invoking tools. Existing research primarily focuses on static environments and overlooks this issue, limiting the…

    Submitted 27 February, 2025; v1 submitted 9 October, 2024; originally announced October 2024.

    Comments: Camera ready version for ICLR 2025

  18. arXiv:2409.19685  [pdf, other]

    cs.CV

    Underwater Organism Color Enhancement via Color Code Decomposition, Adaptation and Interpolation

    Authors: Xiaofeng Cong, Jing Zhang, Yeying Jin, Junming Hou, Yu Zhao, Jie Gui, James Tin-Yau Kwok, Yuan Yan Tang

    Abstract: Underwater images often suffer from quality degradation due to absorption and scattering effects. Most existing underwater image enhancement algorithms produce a single, fixed-color image, limiting user flexibility and application. To address this limitation, we propose a method called ColorCode, which enhances underwater images while offering a range of controllable color outputs. Our ap…

    Submitted 29 September, 2024; originally announced September 2024.

  19. arXiv:2409.06420  [pdf, other]

    eess.IV cs.CV

    Unrevealed Threats: A Comprehensive Study of the Adversarial Robustness of Underwater Image Enhancement Models

    Authors: Siyu Zhai, Zhibo He, Xiaofeng Cong, Junming Hou, Jie Gui, Jian Wei You, Xin Gong, James Tin-Yau Kwok, Yuan Yan Tang

    Abstract: Learning-based methods for underwater image enhancement (UWIE) have undergone extensive exploration. However, learning-based models are usually vulnerable to adversarial examples so as the UWIE models. To the best of our knowledge, there is no comprehensive study on the adversarial robustness of UWIE models, which indicates that UWIE models are potentially under the threat of adversarial attacks.…

    Submitted 10 September, 2024; originally announced September 2024.

  20. arXiv:2406.10167  [pdf, other]

    cs.CV

    4DRecons: 4D Neural Implicit Deformable Objects Reconstruction from a single RGB-D Camera with Geometrical and Topological Regularizations

    Authors: Xiaoyan Cong, Haitao Yang, Liyan Chen, Kaifeng Zhang, Li Yi, Chandrajit Bajaj, Qixing Huang

    Abstract: This paper presents a novel approach 4DRecons that takes a single camera RGB-D sequence of a dynamic subject as input and outputs a complete textured deforming 3D model over time. 4DRecons encodes the output as a 4D neural implicit surface and presents an optimization procedure that combines a data term and two regularization terms. The data term fits the 4D implicit surface to the input partial o…

    Submitted 14 June, 2024; originally announced June 2024.

  21. arXiv:2405.19684  [pdf, other]

    cs.CV

    A Comprehensive Survey on Underwater Image Enhancement Based on Deep Learning

    Authors: Xiaofeng Cong, Yu Zhao, Jie Gui, Junming Hou, Dacheng Tao

    Abstract: Underwater image enhancement (UIE) presents a significant challenge within computer vision research. Despite the development of numerous UIE algorithms, a thorough and systematic review is still absent. To foster future advancements, we provide a detailed overview of the UIE task from several perspectives. Firstly, we introduce the physical models, data construction processes, evaluation metrics,…

    Submitted 25 June, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

    Comments: A survey on the underwater image enhancement task

  22. arXiv:2404.13830  [pdf, other]

    cs.CV

    Deep Learning-Based Point Cloud Registration: A Comprehensive Survey and Taxonomy

    Authors: Yu-Xin Zhang, Jie Gui, Baosheng Yu, Xiaofeng Cong, Xin Gong, Wenbing Tao, Dacheng Tao

    Abstract: Point cloud registration involves determining a rigid transformation to align a source point cloud with a target point cloud. This alignment is fundamental in applications such as autonomous driving, robotics, and medical imaging, where precise spatial correspondence is essential. Deep learning has greatly advanced point cloud registration by providing robust and efficient methods that address the…

    Submitted 1 February, 2025; v1 submitted 21 April, 2024; originally announced April 2024.

  23. arXiv:2404.12804  [pdf, other]

    cs.CV eess.IV

    Linearly-evolved Transformer for Pan-sharpening

    Authors: Junming Hou, Zihan Cao, Naishan Zheng, Xuan Li, Xiaoyu Chen, Xinyang Liu, Xiaofeng Cong, Man Zhou, Danfeng Hong

    Abstract: Vision transformer family has dominated the satellite pan-sharpening field driven by the global-wise spatial information modeling mechanism from the core self-attention ingredient. The standard modeling rules within these promising pan-sharpening methods are to roughly stack the transformer variants in a cascaded manner. Despite the remarkable advancement, their success may be at the huge cost of…

    Submitted 19 April, 2024; originally announced April 2024.

    Comments: 10 pages

  24. arXiv:2404.08364  [pdf, other]

    cs.DC

    FlowWalker: A Memory-efficient and High-performance GPU-based Dynamic Graph Random Walk Framework

    Authors: Junyi Mei, Shixuan Sun, Chao Li, Cheng Xu, Cheng Chen, Yibo Liu, Jing Wang, Cheng Zhao, Xiaofeng Hou, Minyi Guo, Bingsheng He, Xiaoliang Cong

    Abstract: Dynamic graph random walk (DGRW) emerges as a practical tool for capturing structural relations within a graph. Effectively executing DGRW on GPU presents certain challenges. First, existing sampling methods demand a pre-processing buffer, causing substantial space complexity. Moreover, the power-law distribution of graph vertex degrees introduces workload imbalance issues, rendering DGRW embarras…

    Submitted 26 April, 2024; v1 submitted 12 April, 2024; originally announced April 2024.

  25. arXiv:2404.05661  [pdf, other]

    cs.CV

    Automatic Controllable Colorization via Imagination

    Authors: Xiaoyan Cong, Yue Wu, Qifeng Chen, Chenyang Lei

    Abstract: We propose a framework for automatic colorization that allows for iterative editing and modifications. The core of our framework lies in an imagination module: by understanding the content within a grayscale image, we utilize a pre-trained image generation model to generate multiple images that contain the same content. These images serve as references for coloring, mimicking the process of human…

    Submitted 8 April, 2024; originally announced April 2024.

    Comments: CVPR 2024. Project page: https://xy-cong.github.io/imagine-colorization

  26. arXiv:2404.00695  [pdf]

    cond-mat.mes-hall cond-mat.mtrl-sci

    Even-integer Quantum Hall Effect in an Oxide Caused by Hidden Rashba Effect

    Authors: Jingyue Wang, Junwei Huang, Daniel Kaplan, Xuehan Zhou, Congwei Tan, Jing Zhang, Gangjian Jin, Xuzhong Cong, Yongchao Zhu, Xiaoyin Gao, Yan Liang, Huakun Zuo, Zengwei Zhu, Ruixue Zhu, Ady Stern, Hongtao Liu, Peng Gao, Binghai Yan, Hongtao Yuan, Hailin Peng

    Abstract: In the presence of high magnetic field, quantum Hall systems usually host both even- and odd-integer quantized states because of lifted band degeneracies. Selective control of these quantized states is challenging but essential to understand the exotic ground states and manipulate the spin textures. Here, we study the quantum Hall effect in Bi2O2Se thin films. In magnetic fields as high as 50 T, w…

    Submitted 28 June, 2024; v1 submitted 31 March, 2024; originally announced April 2024.

    Comments: 6 Figures, 23 pages

    Journal ref: Nature Nanotechnology 19, 1452-1459 (2024)

  27. arXiv:2403.18548  [pdf, other]

    cs.CV

    A Semi-supervised Nighttime Dehazing Baseline with Spatial-Frequency Aware and Realistic Brightness Constraint

    Authors: Xiaofeng Cong, Jie Gui, Jing Zhang, Junming Hou, Hao Shen

    Abstract: Existing research based on deep learning has extensively explored the problem of daytime image dehazing. However, few studies have considered the characteristics of nighttime hazy scenes. There are two distinctions between nighttime and daytime haze. First, there may be multiple active colored light sources with lower illumination intensity in nighttime scenes, which may cause haze, glow and noise…

    Submitted 27 March, 2024; originally announced March 2024.

    Comments: This paper is accepted by CVPR2024

  28. arXiv:2402.16667  [pdf, other]

    cs.CL cs.AI

    RepoAgent: An LLM-Powered Open-Source Framework for Repository-level Code Documentation Generation

    Authors: Qinyu Luo, Yining Ye, Shihao Liang, Zhong Zhang, Yujia Qin, Yaxi Lu, Yesai Wu, Xin Cong, Yankai Lin, Yingli Zhang, Xiaoyin Che, Zhiyuan Liu, Maosong Sun

    Abstract: Generative models have demonstrated considerable potential in software engineering, particularly in tasks such as code generation and debugging. However, their utilization in the domain of code documentation generation remains underexplored. To this end, we introduce RepoAgent, a large language model powered open-source framework aimed at proactively generating, maintaining, and updating code docu…

    Submitted 26 February, 2024; originally announced February 2024.

    ACM Class: I.2.7; F.2.2

  29. arXiv:2402.11453  [pdf, other]

    cs.CL

    MatPlotAgent: Method and Evaluation for LLM-Based Agentic Scientific Data Visualization

    Authors: Zhiyu Yang, Zihan Zhou, Shuo Wang, Xin Cong, Xu Han, Yukun Yan, Zhenghao Liu, Zhixing Tan, Pengyuan Liu, Dong Yu, Zhiyuan Liu, Xiaodong Shi, Maosong Sun

    Abstract: Scientific data visualization plays a crucial role in research by enabling the direct display of complex information and assisting researchers in identifying implicit patterns. Despite its importance, the use of Large Language Models (LLMs) for scientific data visualization remains rather unexplored. In this study, we introduce MatPlotAgent, an efficient model-agnostic LLM agent framework designed…

    Submitted 19 March, 2024; v1 submitted 17 February, 2024; originally announced February 2024.

    Comments: Work in Progress

  30. arXiv:2402.09205  [pdf, other]

    cs.CL cs.AI cs.HC

    Tell Me More! Towards Implicit User Intention Understanding of Language Model Driven Agents

    Authors: Cheng Qian, Bingxiang He, Zhong Zhuang, Jia Deng, Yujia Qin, Xin Cong, Zhong Zhang, Jie Zhou, Yankai Lin, Zhiyuan Liu, Maosong Sun

    Abstract: Current language model-driven agents often lack mechanisms for effective user participation, which is crucial given the vagueness commonly found in user instructions. Although adept at devising strategies and performing tasks, these agents struggle with seeking clarification and grasping precise user intentions. To bridge this gap, we introduce Intention-in-Interaction (IN3), a novel benchmark des…

    Submitted 15 February, 2024; v1 submitted 14 February, 2024; originally announced February 2024.

    Comments: 26 pages, 5 tables, 6 figures

  31. arXiv:2402.03009  [pdf, other]

    cs.CL cs.AI

    UniMem: Towards a Unified View of Long-Context Large Language Models

    Authors: Junjie Fang, Likai Tang, Hongzhe Bi, Yujia Qin, Si Sun, Zhenyu Li, Haolun Li, Yongjian Li, Xin Cong, Yankai Lin, Yukun Yan, Xiaodong Shi, Sen Song, Zhiyuan Liu, Maosong Sun

    Abstract: Long-context processing is a critical ability that constrains the applicability of large language models (LLMs). Although there exist various methods devoted to enhancing the long-context processing ability of LLMs, they are developed in an isolated manner and lack systematic analysis and integration of their strengths, hindering further developments. In this paper, we introduce UniMem, a Unified…

    Submitted 19 August, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

    Comments: COLM 2024

  32. arXiv:2401.13996  [pdf, other]

    cs.CL cs.AI

    Investigate-Consolidate-Exploit: A General Strategy for Inter-Task Agent Self-Evolution

    Authors: Cheng Qian, Shihao Liang, Yujia Qin, Yining Ye, Xin Cong, Yankai Lin, Yesai Wu, Zhiyuan Liu, Maosong Sun

    Abstract: This paper introduces Investigate-Consolidate-Exploit (ICE), a novel strategy for enhancing the adaptability and flexibility of AI agents through inter-task self-evolution. Unlike existing methods focused on intra-task learning, ICE promotes the transfer of knowledge between tasks for genuine self-evolution, similar to human experience learning. The strategy dynamically investigates planning and e…

    Submitted 25 January, 2024; originally announced January 2024.

    Comments: 18 pages, 5 figures

  33. arXiv:2401.07358  [pdf, other]

    cs.CV

    Harnessing Machine Learning for Discerning AI-Generated Synthetic Images

    Authors: Yuyang Wang, Yizhi Hao, Amando Xu Cong

    Abstract: In the realm of digital media, the advent of AI-generated synthetic images has introduced significant challenges in distinguishing between real and fabricated visual content. These images, often indistinguishable from authentic ones, pose a threat to the credibility of digital media, with potential implications for disinformation and fraud. Our research addresses this challenge by employing machin…

    Submitted 23 May, 2024; v1 submitted 14 January, 2024; originally announced January 2024.

  34. arXiv:2401.04621  [pdf, other]

    cs.SE cs.AI cs.CL

    DebugBench: Evaluating Debugging Capability of Large Language Models

    Authors: Runchu Tian, Yining Ye, Yujia Qin, Xin Cong, Yankai Lin, Yinxu Pan, Yesai Wu, Haotian Hui, Weichuan Liu, Zhiyuan Liu, Maosong Sun

    Abstract: Large Language Models (LLMs) have demonstrated exceptional coding capability. However, as another critical component of programming proficiency, the debugging capability of LLMs remains relatively unexplored. Previous evaluations of LLMs' debugging ability are significantly limited by the risk of data leakage, the scale of the dataset, and the variety of tested bugs. To overcome these deficiencies…

    Submitted 6 June, 2024; v1 submitted 9 January, 2024; originally announced January 2024.

    Comments: Accepted as Findings of ACL 2024

  35. arXiv:2312.17294  [pdf, ps, other]

    cs.SE cs.AI cs.IR

    Enhancing Open-Domain Task-Solving Capability of LLMs via Autonomous Tool Integration from GitHub

    Authors: Bohan Lyu, Xin Cong, Heyang Yu, Pan Yang, Yujia Qin, Yining Ye, Yaxi Lu, Zhong Zhang, Yukun Yan, Yankai Lin, Zhiyuan Liu, Maosong Sun

    Abstract: Large Language Models (LLMs) excel in traditional natural language processing tasks but struggle with problems that require complex domain-specific calculations or simulations. While equipping LLMs with external tools to build LLM-based agents can enhance their capabilities, existing approaches lack the flexibility to address diverse and ever-evolving user queries in open domains. Currently, there…

    Submitted 9 June, 2025; v1 submitted 28 December, 2023; originally announced December 2023.

    Comments: Accepted by ACL 2025 Main Conference

  36. arXiv:2312.17025  [pdf, other]

    cs.CL cs.AI cs.LG cs.SE

    Experiential Co-Learning of Software-Developing Agents

    Authors: Chen Qian, Yufan Dang, Jiahao Li, Wei Liu, Zihao Xie, Yifei Wang, Weize Chen, Cheng Yang, Xin Cong, Xiaoyin Che, Zhiyuan Liu, Maosong Sun

    Abstract: Recent advancements in large language models (LLMs) have brought significant changes to various domains, especially through LLM-driven autonomous agents. A representative scenario is in software development, where LLM agents demonstrate efficient collaboration, task division, and assurance of software quality, markedly reducing the need for manual involvement. However, these agents frequently perf…

    Submitted 5 June, 2024; v1 submitted 28 December, 2023; originally announced December 2023.

    Comments: Accepted to ACL 2024, https://github.com/OpenBMB/ChatDev

  37. arXiv:2311.10751  [pdf, other]

    cs.RO cs.AI cs.CL

    ProAgent: From Robotic Process Automation to Agentic Process Automation

    Authors: Yining Ye, Xin Cong, Shizuo Tian, Jiannan Cao, Hao Wang, Yujia Qin, Yaxi Lu, Heyang Yu, Huadong Wang, Yankai Lin, Zhiyuan Liu, Maosong Sun

    Abstract: From ancient water wheels to robotic process automation (RPA), automation technology has evolved throughout history to liberate human beings from arduous tasks. Yet, RPA struggles with tasks needing human-like intelligence, especially in elaborate design of workflow construction and dynamic decision-making in workflow execution. As Large Language Models (LLMs) have emerged human-like intelligence,…

    Submitted 23 November, 2023; v1 submitted 2 November, 2023; originally announced November 2023.

    Comments: Work in progress

  38. arXiv:2310.00310  [pdf, other]

    cs.CV

    An easy zero-shot learning combination: Texture Sensitive Semantic Segmentation IceHrNet and Advanced Style Transfer Learning Strategy

    Authors: Zhiyong Yang, Yuelong Zhu, Xiaoqin Zeng, Jun Zong, Xiuheng Liu, Ran Tao, Xiaofei Cong, Yufeng Yu

    Abstract: We proposed an easy method of Zero-Shot semantic segmentation by using style transfer. In this case, we successfully used a medical imaging dataset (Blood Cell Imagery) to train a model for river ice semantic segmentation. First, we built a river ice semantic segmentation dataset IPC_RI_SEG using a fixed camera and covering the entire ice melting process of the river. Second, a high-resolution tex…

    Submitted 30 September, 2023; originally announced October 2023.

    Comments: 12 pages, 11 figures, submitted to Journal of Hydrology

  39. arXiv:2308.12519  [pdf, ps, other]

    cs.CL

    Rational Decision-Making Agent with Internalized Utility Judgment

    Authors: Yining Ye, Xin Cong, Shizuo Tian, Yujia Qin, Chong Liu, Yankai Lin, Zhiyuan Liu, Maosong Sun

    Abstract: Large language models (LLMs) have demonstrated remarkable advancements and have attracted significant efforts to develop LLMs into agents capable of executing intricate multi-step decision-making tasks beyond traditional NLP applications. Existing approaches to LLM-based decision-making predominantly build upon the manually-designed external performance metrics to guide the decision-making process…

    Submitted 8 June, 2025; v1 submitted 23 August, 2023; originally announced August 2023.

    Comments: Published as a conference paper at ICLR 2025

  40. arXiv:2308.10848  [pdf, other]

    cs.CL

    AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors

    Authors: Weize Chen, Yusheng Su, Jingwei Zuo, Cheng Yang, Chenfei Yuan, Chi-Min Chan, Heyang Yu, Yaxi Lu, Yi-Hsin Hung, Chen Qian, Yujia Qin, Xin Cong, Ruobing Xie, Zhiyuan Liu, Maosong Sun, Jie Zhou

    Abstract: Autonomous agents empowered by Large Language Models (LLMs) have undergone significant improvements, enabling them to generalize across a broad spectrum of tasks. However, in real-world scenarios, cooperation among individuals is often required to enhance the efficiency and effectiveness of task accomplishment. Hence, inspired by human group dynamics, we propose a multi-agent framework AgentVerse…

    Submitted 23 October, 2023; v1 submitted 21 August, 2023; originally announced August 2023.

    Comments: Under review. Code at https://github.com/OpenBMB/AgentVerse/

  41. arXiv:2308.04974  [pdf]

    cond-mat.mes-hall

    Interplay of valley polarized dark trion and dark exciton-polaron in monolayer WSe2

    Authors: Xin Cong, Parisa Ali Mohammadi, Mingyang Zheng, Kenji Watanabe, Takashi Taniguchi, Daniel Rhodes, Xiao-Xiao Zhang

    Abstract: The interactions between charges and excitons involve complex many-body interactions at high densities. The exciton-polaron model has been adopted to understand the Fermi sea screening of charged excitons in monolayer transition metal dichalcogenides. The results provide good agreement with absorption measurements, which are dominated by dilute bright exciton responses. Here we investigate the Fer…

    Submitted 9 August, 2023; originally announced August 2023.

  42. arXiv:2308.02103  [pdf, other]

    cs.CL

    Prompt2Gaussia: Uncertain Prompt-learning for Script Event Prediction

    Authors: Shiyao Cui, Xin Cong, Jiawei Sheng, Xuebin Wang, Tingwen Liu, Jinqiao Shi

    Abstract: Script Event Prediction (SEP) aims to predict the subsequent event for a given event chain from a candidate list. Prior research has achieved great success by integrating external knowledge to enhance the semantics, but it is laborious to acquire the appropriate knowledge resources and retrieve the script-related knowledge. In this paper, we regard public pre-trained language models as knowledge…

    Submitted 3 August, 2023; originally announced August 2023.

    Comments: 16 pages

  43. arXiv:2308.01857  [pdf, other]

    cs.AR

    iEDA: An Open-Source Intelligent Physical Implementation Toolkit and Library

    Authors: Xingquan Li, Simin Tao, Zengrong Huang, Shijian Chen, Zhisheng Zeng, Liwei Ni, Zhipeng Huang, Chunan Zhuang, Hongxi Wu, Weiguo Li, Xueyan Zhao, He Liu, Shuaiying Long, Wei He, Bojun Liu, Sifeng Gan, Zihao Yu, Tong Liu, Yuchi Miao, Zhiyuan Yan, Hao Wang, Jie Zhao, Yifan Li, Ruizhi Liu, Xiaoze Lin, et al. (31 additional authors not shown)

    Abstract: Open-source EDA shows promising potential in unleashing EDA innovation and lowering the cost of chip design. This paper presents an open-source EDA project, iEDA, which aims to build a basic infrastructure for EDA technology evolution and to close the industrial-academic gap in the EDA area. iEDA now covers the whole flow of physical design (including Floorplan, Placement, CTS, Routing, Timing Opti…

    Submitted 3 August, 2023; originally announced August 2023.

  44. arXiv:2307.16789  [pdf, other]

    cs.AI cs.CL cs.LG

    ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs

    Authors: Yujia Qin, Shihao Liang, Yining Ye, Kunlun Zhu, Lan Yan, Yaxi Lu, Yankai Lin, Xin Cong, Xiangru Tang, Bill Qian, Sihan Zhao, Lauren Hong, Runchu Tian, Ruobing Xie, Jie Zhou, Mark Gerstein, Dahai Li, Zhiyuan Liu, Maosong Sun

    Abstract: Despite the advancements of open-source large language models (LLMs), e.g., LLaMA, they remain significantly limited in tool-use capabilities, i.e., using external tools (APIs) to fulfill human instructions. The reason is that current instruction tuning largely focuses on basic language tasks but ignores the tool-use domain. This is in contrast to the excellent tool-use capabilities of state-of-th…

    Submitted 3 October, 2023; v1 submitted 31 July, 2023; originally announced July 2023.

  45. arXiv:2307.15504  [pdf, other]

    cs.CL cs.AI

    Exploring Format Consistency for Instruction Tuning

    Authors: Shihao Liang, Runchu Tian, Kunlun Zhu, Yujia Qin, Huadong Wang, Xin Cong, Zhiyuan Liu, Xiaojiang Liu, Maosong Sun

    Abstract: Instruction tuning has emerged as a promising approach to enhancing large language models in following human instructions. It is shown that increasing the diversity and number of instructions in the training data can consistently enhance generalization performance, which facilitates a recent endeavor to collect various instructions and integrate existing instruction tuning datasets into larger col…

    Submitted 8 January, 2024; v1 submitted 28 July, 2023; originally announced July 2023.

  46. arXiv:2307.07924  [pdf, other]

    cs.SE cs.CL cs.MA

    ChatDev: Communicative Agents for Software Development

    Authors: Chen Qian, Wei Liu, Hongzhang Liu, Nuo Chen, Yufan Dang, Jiahao Li, Cheng Yang, Weize Chen, Yusheng Su, Xin Cong, Juyuan Xu, Dahai Li, Zhiyuan Liu, Maosong Sun

    Abstract: Software development is a complex task that necessitates cooperation among multiple members with diverse skills. Numerous studies used deep learning to improve specific phases in a waterfall model, such as design, coding, and testing. However, the deep learning model in each phase requires unique designs, leading to technical inconsistencies across various phases, which results in a fragmented and…

    Submitted 5 June, 2024; v1 submitted 15 July, 2023; originally announced July 2023.

    Comments: Accepted to ACL 2024; https://github.com/OpenBMB/ChatDev

  47. arXiv:2306.05675  [pdf, other]

    cs.CV

    Illumination Controllable Dehazing Network based on Unsupervised Retinex Embedding

    Authors: Jie Gui, Xiaofeng Cong, Lei He, Yuan Yan Tang, James Tin-Yau Kwok

    Abstract: On the one hand, the dehazing task is an ill-posed problem, which means that no unique solution exists. On the other hand, the dehazing task should take into account the subjective factor, which is to give the user selectable dehazed images rather than a single result. Therefore, this paper proposes a multi-output dehazing network by introducing an illumination-controllable ability, called IC-Deha…

    Submitted 9 June, 2023; originally announced June 2023.

  48. arXiv:2305.14895  [pdf, other]

    astro-ph.IM hep-ex physics.ins-det

    The Lobster Eye Imager for Astronomy Onboard the SATech-01 Satellite

    Authors: Z. X. Ling, X. J. Sun, C. Zhang, S. L. Sun, G. Jin, S. N. Zhang, X. F. Zhang, J. B. Chang, F. S. Chen, Y. F. Chen, Z. W. Cheng, W. Fu, Y. X. Han, H. Li, J. F. Li, Y. Li, Z. D. Li, P. R. Liu, Y. H. Lv, X. H. Ma, Y. J. Tang, C. B. Wang, R. J. Xie, Y. L. Xue, A. L. Yan, et al. (101 additional authors not shown)

    Abstract: The Lobster Eye Imager for Astronomy (LEIA), a pathfinder of the Wide-field X-ray Telescope of the Einstein Probe (EP) mission, was successfully launched onboard the SATech-01 satellite of the Chinese Academy of Sciences on 27 July 2022. In this paper, we introduce the design and on-ground test results of the LEIA instrument. Using state-of-the-art Micro-Pore Optics (MPO), a wide field-of-view (Fo…

    Submitted 24 May, 2023; originally announced May 2023.

    Comments: Accepted by RAA

  49. arXiv:2305.04469  [pdf, other]

    cs.GR

    HACK: Learning a Parametric Head and Neck Model for High-fidelity Animation

    Authors: Longwen Zhang, Zijun Zhao, Xinzhou Cong, Qixuan Zhang, Shuqi Gu, Yuchong Gao, Rui Zheng, Wei Yang, Lan Xu, Jingyi Yu

    Abstract: Significant advancements have been made in developing parametric models for digital humans, with various approaches concentrating on parts such as the human body, hand, or face. Nevertheless, connectors such as the neck have been overlooked in these models, with rich anatomical priors often unutilized. In this paper, we introduce HACK (Head-And-neCK), a novel parametric model for constructing the…

    Submitted 8 May, 2023; originally announced May 2023.

    Comments: Find HACK model on https://github.com/ZoneLikeWonderland/HACK-Model

  50. arXiv:2304.09322  [pdf, other]

    eess.IV cs.CV cs.LG

    Multi-Modality Multi-Scale Cardiovascular Disease Subtypes Classification Using Raman Image and Medical History

    Authors: Bo Yu, Hechang Chen, Chengyou Jia, Hongren Zhou, Lele Cong, Xiankai Li, Jianhui Zhuang, Xianling Cong

    Abstract: Raman spectroscopy (RS) has been widely used for disease diagnosis, e.g., cardiovascular disease (CVD), owing to its efficiency and component-specific testing capabilities. A series of popular deep learning methods have recently been introduced to learn nuanced features from RS for binary classification and have achieved better performance than conventional machine learning methods. However, these…

    Submitted 18 April, 2023; originally announced April 2023.

    Journal ref: Expert Systems with Applications, 2023: 119965