Showing 1–50 of 183 results for author: Zeng, M

Searching in archive cs.

  1. arXiv:2507.05750

    cs.CL

    DocTalk: Scalable Graph-based Dialogue Synthesis for Enhancing LLM Conversational Capabilities

    Authors: Jing Yang Lee, Hamed Bonab, Nasser Zalmout, Ming Zeng, Sanket Lokegaonkar, Colin Lockard, Binxuan Huang, Ritesh Sarkhel, Haodong Wang

    Abstract: Large Language Models (LLMs) are increasingly employed in multi-turn conversational tasks, yet their pre-training data predominantly consists of continuous prose, creating a potential mismatch between required capabilities and training paradigms. We introduce a novel approach to address this discrepancy by synthesizing conversational data from existing text corpora. We present a pipeline that tran…

    Submitted 8 July, 2025; originally announced July 2025.

    Comments: Accepted at SIGDIAL 2025

  2. arXiv:2506.06156

    cs.IT eess.SP

    Resource Allocation for Pinching-Antenna Systems: State-of-the-Art, Key Techniques and Open Issues

    Authors: Ming Zeng, Ji Wang, Octavia A. Dobre, Zhiguo Ding, George K. Karagiannidis, Robert Schober, H. Vincent Poor

    Abstract: Pinching antennas have emerged as a promising technology for reconfiguring wireless propagation environments, particularly in high-frequency communication systems operating in the millimeter-wave and terahertz bands. By enabling dynamic activation at arbitrary positions along a dielectric waveguide, pinching antennas offer unprecedented channel reconfigurability and the ability to provide line-of-…

    Submitted 6 June, 2025; originally announced June 2025.

    Comments: submitted to IEEE WCM, 8 pages, 5 figures

  3. arXiv:2506.04463

    cs.CL

    Aligning Large Language Models with Implicit Preferences from User-Generated Content

    Authors: Zhaoxuan Tan, Zheng Li, Tianyi Liu, Haodong Wang, Hyokun Yun, Ming Zeng, Pei Chen, Zhihan Zhang, Yifan Gao, Ruijie Wang, Priyanka Nigam, Bing Yin, Meng Jiang

    Abstract: Learning from preference feedback is essential for aligning large language models (LLMs) with human values and improving the quality of generated responses. However, existing preference learning methods rely heavily on curated data from humans or advanced LLMs, which is costly and difficult to scale. In this work, we present PUGC, a novel framework that leverages implicit human Preferences in unla…

    Submitted 4 June, 2025; originally announced June 2025.

    Comments: Accepted to ACL 2025 Main Conference

  4. arXiv:2506.00355

    cs.NI

    Sum Rate Maximization for Wireless Powered Pinching-Antenna Systems (PASS)

    Authors: Yixuan Li, Ji Wang, Ming Zeng, Yuanwei Liu

    Abstract: In this letter, we investigate a novel wireless powered communication network (WPCN) enabled by a pinching-antenna system (PASS), in which multiple pinching antennas (PAs) are activated on a waveguide to establish strong line-of-sight (LoS) links with multiple devices. In this system, time division multiple access (TDMA) and non-orthogonal multiple access (NOMA) protocols are adopted to fully expl…

    Submitted 30 May, 2025; originally announced June 2025.

  5. arXiv:2505.21928

    eess.IV cs.AI cs.CV cs.LG

    Subspecialty-Specific Foundation Model for Intelligent Gastrointestinal Pathology

    Authors: Lianghui Zhu, Xitong Ling, Minxi Ouyang, Xiaoping Liu, Tian Guan, Mingxi Fu, Zhiqiang Cheng, Fanglei Fu, Maomao Zeng, Liming Liu, Song Duan, Qiang Huang, Ying Xiao, Jianming Li, Shanming Lu, Zhenghua Piao, Mingxi Zhu, Yibo Jin, Shan Xu, Qiming He, Yizhi Wang, Junru Cheng, Xuanyu Wang, Luxi Xie, Houqiang Li , et al. (2 additional authors not shown)

    Abstract: Gastrointestinal (GI) diseases represent a clinically significant burden, necessitating precise diagnostic approaches to optimize patient outcomes. Conventional histopathological diagnosis suffers from limited reproducibility and diagnostic variability. To overcome these limitations, we develop Digepath, a specialized foundation model for GI pathology. Our framework introduces a dual-phase iterati…

    Submitted 6 June, 2025; v1 submitted 27 May, 2025; originally announced May 2025.

  6. arXiv:2505.20636

    cs.IT

    Frequency-Selective Modeling and Analysis for OFDM-Integrated Wideband Pinching-Antenna Systems

    Authors: Jian Xiao, Ji Wang, Ming Zeng, Yuanwei Liu, George K. Karagiannidis

    Abstract: This letter investigates the integration of pinching-antenna systems (PASS) with orthogonal frequency division multiplexing (OFDM) to ensure their compatibility and to explore the frequency-selective behavior inherent to PASS. First, an end-to-end channel model for OFDM PASS is proposed based on electromagnetic-compliant modeling of waveguides and coupled-mode theory, which includes frequency-depe…

    Submitted 7 July, 2025; v1 submitted 26 May, 2025; originally announced May 2025.

  7. arXiv:2505.14904

    cs.IT eess.SP

    Energy-Efficient Design for Downlink Pinching-Antenna Systems with QoS Guarantee

    Authors: Ming Zeng, Ji Wang, Gui Zhou, Fang Fang, Xianbin Wang

    Abstract: Pinching antennas have recently garnered significant attention due to their ability to dynamically reconfigure wireless propagation environments. Despite notable advancements in this area, the exploration of energy efficiency (EE) maximization in pinching-antenna systems remains relatively underdeveloped. In this paper, we address the EE maximization problem in a downlink time-division multiple ac…

    Submitted 20 May, 2025; originally announced May 2025.

    Comments: submitted to IEEE TVT, 5 pages, 4 figures

  8. arXiv:2505.12654

    cs.CL cs.AI

    Predicting Turn-Taking and Backchannel in Human-Machine Conversations Using Linguistic, Acoustic, and Visual Signals

    Authors: Yuxin Lin, Yinglin Zheng, Ming Zeng, Wangzheng Shi

    Abstract: This paper addresses the gap in predicting turn-taking and backchannel actions in human-machine conversations using multi-modal signals (linguistic, acoustic, and visual). To overcome the limitation of existing datasets, we propose an automatic data collection pipeline that allows us to collect and annotate over 210 hours of human conversation videos. From this, we construct a Multi-Modal Face-to-…

    Submitted 20 May, 2025; v1 submitted 18 May, 2025; originally announced May 2025.

    Comments: Accepted by ACL 2025

  9. arXiv:2505.07555

    cs.IT eess.SP

    Energy-Efficient Resource Allocation for NOMA-Assisted Uplink Pinching-Antenna Systems

    Authors: Ming Zeng, Xingwang Li, Ji Wang, Gaojian Huang, Octavia A. Dobre, Zhiguo Ding

    Abstract: The pinching-antenna architecture has emerged as a promising solution for reconfiguring wireless propagation environments and enhancing system performance. While prior research has primarily focused on sum-rate maximization or transmit power minimization of pinching-antenna systems, the critical aspect of energy efficiency (EE) has received limited attention. Given the increasing importance of EE…

    Submitted 12 May, 2025; originally announced May 2025.

    Comments: submitted to IEEE WCL; 4 figures; 5 pages

  10. arXiv:2505.06307

    cs.CR cs.AI

    Large Language Model-driven Security Assistant for Internet of Things via Chain-of-Thought

    Authors: Mingfei Zeng, Ming Xie, Xixi Zheng, Chunhai Li, Chuan Zhang, Liehuang Zhu

    Abstract: The rapid development of Internet of Things (IoT) technology has transformed people's way of life and has a profound impact on both production and daily activities. However, with the rapid advancement of IoT technology, the security of IoT devices has become an unavoidable issue in both research and applications. Although some efforts have been made to detect or mitigate IoT security vulnerabiliti…

    Submitted 8 May, 2025; originally announced May 2025.

  11. arXiv:2505.00549

    cs.IT eess.SP

    Sum Rate Maximization for NOMA-Assisted Uplink Pinching-Antenna Systems

    Authors: Ming Zeng, Ji Wang, Xingwang Li, Gongpu Wang, Octavia A. Dobre, Zhiguo Ding

    Abstract: In this paper, we investigate an uplink communication scenario in which multiple users communicate with an access point (AP) employing non-orthogonal multiple access (NOMA). A pinching antenna, which can be activated at an arbitrary point along a dielectric waveguide, is deployed at the AP to dynamically reconfigure user channels. The objective is to maximize the system sum rate by jointly optimiz…

    Submitted 1 May, 2025; originally announced May 2025.

    Comments: 4 figures; submitted to IEEE COMML

  12. arXiv:2504.05578

    cs.IT eess.SP

    Recent Advances in Near-Field Beam Training and Channel Estimation for XL-MIMO Systems

    Authors: Ming Zeng, Ji Wang, Xingwang Li, Wanming Hao, Zheng Chu, Wenwu Xie, Xianbin Wang, Quoc-Viet Pham

    Abstract: Extremely large-scale multiple-input multiple-output (XL-MIMO) is a key technology for next-generation wireless communication systems. By deploying significantly more antennas than conventional massive MIMO systems, XL-MIMO promises substantial improvements in spectral efficiency. However, due to the drastically increased array size, the conventional planar wave channel model is no longer accurate…

    Submitted 7 April, 2025; originally announced April 2025.

    Comments: Submitted to IEEE Wireless Communications; 8 pages; 6 figures

  13. arXiv:2504.00889

    math.AG cs.SC

    Brackets and Projective Geometry in Macaulay2

    Authors: Dalton Bidleman, Timothy Duff, Jack Kendrick, Michael Zeng

    Abstract: We introduce the Brackets package for the computer algebra system Macaulay2, which provides convenient syntax for computations involving the classical invariants of the special linear group. We describe our implementation of bracket rings and Grassmann-Cayley algebras, and illustrate basic functionality such as the straightening algorithm on examples from projective and enumerative geometry.

    Submitted 1 April, 2025; originally announced April 2025.

    Comments: 9 pages

    MSC Class: 14-04

  14. arXiv:2503.23224

    cs.SE

    SmartFL: Semantics Based Probabilistic Fault Localization

    Authors: Yiqian Wu, Yujie Liu, Yi Yin, Muhan Zeng, Zhentao Ye, Xin Zhang, Yingfei Xiong, Lu Zhang

    Abstract: Testing-based fault localization has been a research focus in software engineering in the past decades. It localizes faulty program elements based on a set of passing and failing test executions. Since whether a fault could be triggered and detected by a test is related to program semantics, it is crucial to model program semantics in fault localization approaches. Existing approaches either consi…

    Submitted 3 April, 2025; v1 submitted 29 March, 2025; originally announced March 2025.

    Comments: Submitted to IEEE Transactions on Software Engineering. Code: https://github.com/toledosakasa/SMARTFL. This update corrects the author's name.

  15. arXiv:2503.06624

    cs.CV

    Chameleon: On the Scene Diversity and Domain Variety of AI-Generated Videos Detection

    Authors: Meiyu Zeng, Xingming Liao, Canyu Chen, Nankai Lin, Zhuowei Wang, Chong Chen, Aimin Yang

    Abstract: Artificial intelligence generated content (AIGC), known as DeepFakes, has emerged as a growing concern because it is being utilized as a tool for spreading disinformation. While much research exists on identifying AI-generated text and images, research on detecting AI-generated videos is limited. Existing datasets for AI-generated videos detection exhibit limitations in terms of diversity, complex…

    Submitted 9 March, 2025; originally announced March 2025.

    Comments: 17 pages

  16. arXiv:2503.05995

    cs.RO

    ReJSHand: Efficient Real-Time Hand Pose Estimation and Mesh Reconstruction Using Refined Joint and Skeleton Features

    Authors: Shan An, Shipeng Dai, Mahrukh Ansari, Yu Liang, Ming Zeng, Konstantinos A. Tsintotas, Changhong Fu, Hong Zhang

    Abstract: Accurate hand pose estimation is vital in robotics, advancing dexterous manipulation in human-computer interaction. Toward this goal, this paper presents ReJSHand (which stands for Refined Joint and Skeleton Features), a cutting-edge network formulated for real-time hand pose estimation and mesh reconstruction. The proposed framework is designed to accurately predict 3D hand gestures under real-ti…

    Submitted 7 March, 2025; originally announced March 2025.

  17. arXiv:2502.18523

    eess.IV cs.AI cs.CV

    End-to-End Deep Learning for Structural Brain Imaging: A Unified Framework

    Authors: Yao Su, Keqi Han, Mingjie Zeng, Lichao Sun, Liang Zhan, Carl Yang, Lifang He, Xiangnan Kong

    Abstract: Brain imaging analysis is fundamental in neuroscience, providing valuable insights into brain structure and function. Traditional workflows follow a sequential pipeline (brain extraction, registration, segmentation, parcellation, network generation, and classification), treating each step as an independent task. These methods rely heavily on task-specific training data and expert intervention to corr…

    Submitted 23 February, 2025; originally announced February 2025.

  18. arXiv:2501.09465

    cs.CV cs.AI cs.DC

    RE-POSE: Synergizing Reinforcement Learning-Based Partitioning and Offloading for Edge Object Detection

    Authors: Jianrui Shi, Yong Zhao, Zeyang Cui, Xiaoming Shen, Minhang Zeng, Xiaojie Liu

    Abstract: Object detection plays a crucial role in smart video analysis, with applications ranging from autonomous driving and security to smart cities. However, achieving real-time object detection on edge devices presents significant challenges due to their limited computational resources and the high demands of deep neural network (DNN)-based detection models, particularly when processing high-resolution…

    Submitted 16 January, 2025; originally announced January 2025.

  19. arXiv:2412.14838

    cs.CL

    DynamicKV: Task-Aware Adaptive KV Cache Compression for Long Context LLMs

    Authors: Xiabin Zhou, Wenbin Wang, Minyan Zeng, Jiaxian Guo, Xuebo Liu, Li Shen, Min Zhang, Liang Ding

    Abstract: Efficient KV cache management in LLMs is crucial for long-context tasks like RAG and summarization. Existing KV cache compression methods enforce a fixed pattern, neglecting task-specific characteristics and reducing the retention of essential information. However, we observe distinct activation patterns across layers in various tasks, highlighting the need for adaptive strategies tailored to each…

    Submitted 26 May, 2025; v1 submitted 19 December, 2024; originally announced December 2024.

  20. arXiv:2412.13522

    cs.CR

    Privacy-Preserving Cyberattack Detection in Blockchain-Based IoT Systems Using AI and Homomorphic Encryption

    Authors: Bui Duc Manh, Chi-Hieu Nguyen, Dinh Thai Hoang, Diep N. Nguyen, Ming Zeng, Quoc-Viet Pham

    Abstract: This work proposes a novel privacy-preserving cyberattack detection framework for blockchain-based Internet-of-Things (IoT) systems. In our approach, artificial intelligence (AI)-driven detection modules are strategically deployed at blockchain nodes to identify real-time attacks, ensuring high accuracy and minimal delay. To achieve this efficiency, the model training is conducted by a cloud servi…

    Submitted 18 December, 2024; originally announced December 2024.

  21. arXiv:2412.06575

    cs.CL

    Data Quality Enhancement on the Basis of Diversity with Large Language Models for Text Classification: Uncovered, Difficult, and Noisy

    Authors: Min Zeng, Caiquan Liu, Shiqi Zhang, Li Xie, Chen Sang, Xiaoxin Chen

    Abstract: In recent years, the use of large language models (LLMs) for text classification has attracted widespread attention. Despite this, the classification accuracy of LLMs has not yet universally surpassed that of smaller models. LLMs can enhance their performance in text classification through fine-tuning. However, existing data quality research based on LLMs is challenging to apply directly to solve…

    Submitted 9 December, 2024; v1 submitted 9 December, 2024; originally announced December 2024.

    Comments: Accepted by COLING 2025 (main, long paper)

  22. arXiv:2411.15500

    cs.ET cs.CL

    MolMetaLM: a Physicochemical Knowledge-Guided Molecular Meta Language Model

    Authors: Yifan Wu, Min Zeng, Yang Li, Yang Zhang, Min Li

    Abstract: Most current molecular language models transfer the masked language model or image-text generation model from natural language processing to the molecular field. However, molecules are not solely characterized by atom/bond symbols; they encapsulate important physical/chemical properties. Moreover, normal language models bring grammar rules that are irrelevant for understanding molecules. In this study…

    Submitted 23 November, 2024; originally announced November 2024.

  23. arXiv:2411.15066

    cs.CV cs.LG

    SPAC-Net: Rethinking Point Cloud Completion with Structural Prior

    Authors: Zizhao Wu, Jian Shi, Xuan Deng, Cheng Zhang, Genfu Yang, Ming Zeng, Yunhai Wang

    Abstract: Point cloud completion aims to infer a complete shape from its partial observation. Many approaches utilize a pure encoder-decoder paradigm in which complete shape can be directly predicted by shape priors learned from partial scans; however, these methods suffer from the loss of details inevitably due to the feature abstraction issues. In this paper, we propose a novel framework, termed SPAC-Net, t…

    Submitted 22 November, 2024; originally announced November 2024.

  24. arXiv:2410.21276

    cs.CL cs.AI cs.CV cs.CY cs.LG cs.SD eess.AS

    GPT-4o System Card

    Authors: OpenAI, Aaron Hurst, Adam Lerer, Adam P. Goucher, Adam Perelman, Aditya Ramesh, Aidan Clark, AJ Ostrow, Akila Welihinda, Alan Hayes, Alec Radford, Aleksander Mądry, Alex Baker-Whitcomb, Alex Beutel, Alex Borzunov, Alex Carney, Alex Chow, Alex Kirillov, Alex Nichol, Alex Paino, Alex Renzin, Alex Tachard Passos, Alexander Kirillov, Alexi Christakis , et al. (395 additional authors not shown)

    Abstract: GPT-4o is an autoregressive omni model that accepts as input any combination of text, audio, image, and video, and generates any combination of text, audio, and image outputs. It's trained end-to-end across text, vision, and audio, meaning all inputs and outputs are processed by the same neural network. GPT-4o can respond to audio inputs in as little as 232 milliseconds, with an average of 320 mil…

    Submitted 25 October, 2024; originally announced October 2024.

  25. arXiv:2410.20838

    cs.CL

    A Simple Yet Effective Corpus Construction Framework for Indonesian Grammatical Error Correction

    Authors: Nankai Lin, Meiyu Zeng, Wentao Huang, Shengyi Jiang, Lixian Xiao, Aimin Yang

    Abstract: Currently, the majority of research in grammatical error correction (GEC) is concentrated on universal languages, such as English and Chinese. Many low-resource languages lack accessible evaluation corpora. How to efficiently construct high-quality evaluation corpora for GEC in low-resource languages has become a significant challenge. To fill these gaps, in this paper, we present a framework for…

    Submitted 28 October, 2024; originally announced October 2024.

  26. arXiv:2410.19723

    cs.LG cs.AI

    Sparse Decomposition of Graph Neural Networks

    Authors: Yaochen Hu, Mai Zeng, Ge Zhang, Pavel Rumiantsev, Liheng Ma, Yingxue Zhang, Mark Coates

    Abstract: Graph Neural Networks (GNN) exhibit superior performance in graph representation learning, but their inference cost can be high, due to an aggregation operation that can require a memory fetch for a very large number of nodes. This inference cost is the major obstacle to deploying GNN models with online prediction to reflect the potentially dynamic node features. To address this, we propose…

    Submitted 15 March, 2025; v1 submitted 25 October, 2024; originally announced October 2024.

  27. arXiv:2410.11404

    cs.CV

    MoChat: Joints-Grouped Spatio-Temporal Grounding LLM for Multi-Turn Motion Comprehension and Description

    Authors: Jiawei Mo, Yixuan Chen, Rifen Lin, Yongkang Ni, Min Zeng, Xiping Hu, Min Li

    Abstract: Despite continuous advancements in deep learning for understanding human motion, existing models often struggle to accurately identify action timing and specific body parts, typically supporting only single-round interaction. Such limitations in capturing fine-grained motion details reduce their effectiveness in motion understanding tasks. In this paper, we propose MoChat, a multimodal large langu…

    Submitted 15 October, 2024; originally announced October 2024.

  28. arXiv:2409.04016

    cs.SD eess.AS

    Investigating Neural Audio Codecs for Speech Language Model-Based Speech Generation

    Authors: Jiaqi Li, Dongmei Wang, Xiaofei Wang, Yao Qian, Long Zhou, Shujie Liu, Midia Yousefi, Canrun Li, Chung-Hsien Tsai, Zhen Xiao, Yanqing Liu, Junkun Chen, Sheng Zhao, Jinyu Li, Zhizheng Wu, Michael Zeng

    Abstract: Neural audio codec tokens serve as the fundamental building blocks for speech language model (SLM)-based speech generation. However, there is no systematic understanding of how the codec system affects the speech generation performance of the SLM. In this work, we examine codec tokens within the SLM framework for speech generation to provide insights for effective codec design. We retrain existing hig…

    Submitted 6 September, 2024; originally announced September 2024.

    Comments: Accepted by SLT-2024

  29. arXiv:2408.15556

    cs.CV

    Divide, Conquer and Combine: A Training-Free Framework for High-Resolution Image Perception in Multimodal Large Language Models

    Authors: Wenbin Wang, Liang Ding, Minyan Zeng, Xiabin Zhou, Li Shen, Yong Luo, Dacheng Tao

    Abstract: Multimodal large language models (MLLMs) have experienced significant advancements recently, but still struggle to recognize and interpret intricate details in high-resolution (HR) images effectively. While state-of-the-art (SOTA) MLLMs claim to process images at 4K resolution, existing MLLM benchmarks only support up to 2K, leaving the capabilities of SOTA models on true HR images largely unteste…

    Submitted 28 August, 2024; originally announced August 2024.

  30. arXiv:2408.14354

    cs.SE cs.AI cs.CL

    SWE-bench-java: A GitHub Issue Resolving Benchmark for Java

    Authors: Daoguang Zan, Zhirong Huang, Ailun Yu, Shaoxin Lin, Yifan Shi, Wei Liu, Dong Chen, Zongshuai Qi, Hao Yu, Lei Yu, Dezhi Ran, Muhan Zeng, Bo Shen, Pan Bian, Guangtai Liang, Bei Guan, Pengjie Huang, Tao Xie, Yongji Wang, Qianxiang Wang

    Abstract: GitHub issue resolving is a critical task in software engineering, recently gaining significant attention in both industry and academia. Within this task, SWE-bench has been released to evaluate issue resolving capabilities of large language models (LLMs), but has so far focused only on Python. However, supporting more programming languages is also important, as there is a strong demand in…

    Submitted 26 August, 2024; originally announced August 2024.

    Comments: This work is in progress

  31. Guiding the Last Centimeter: Novel Anatomy-Aware Probe Servoing for Standardized Imaging Plane Navigation in Robotic Lung Ultrasound

    Authors: Xihan Ma, Mingjie Zeng, Jeffrey C. Hill, Beatrice Hoffmann, Ziming Zhang, Haichong K. Zhang

    Abstract: Navigating the ultrasound (US) probe to the standardized imaging plane (SIP) for image acquisition is a critical but operator-dependent task in conventional freehand diagnostic US. Robotic US systems (RUSS) offer the potential to enhance imaging consistency by leveraging real-time US image feedback to optimize the probe pose, thereby reducing reliance on operator expertise. However, determining th…

    Submitted 17 June, 2024; originally announced June 2024.

  32. arXiv:2406.01304

    cs.CL cs.AI cs.SE

    CodeR: Issue Resolving with Multi-Agent and Task Graphs

    Authors: Dong Chen, Shaoxin Lin, Muhan Zeng, Daoguang Zan, Jian-Gang Wang, Anton Cheshkov, Jun Sun, Hao Yu, Guoliang Dong, Artem Aliev, Jie Wang, Xiao Cheng, Guangtai Liang, Yuchi Ma, Pan Bian, Tao Xie, Qianxiang Wang

    Abstract: GitHub issue resolving recently has attracted significant attention from academia and industry. SWE-bench is proposed to measure the performance in resolving issues. In this paper, we propose CodeR, which adopts a multi-agent framework and pre-defined task graphs to Repair & Resolve reported bugs and add new features within code Repository. On SWE-bench lite, CodeR is able to solve 28.33% of issue…

    Submitted 10 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

    Comments: https://github.com/NL2Code/CodeR

  33. arXiv:2405.17809

    cs.CL cs.AI cs.SD eess.AS

    TransVIP: Speech to Speech Translation System with Voice and Isochrony Preservation

    Authors: Chenyang Le, Yao Qian, Dongmei Wang, Long Zhou, Shujie Liu, Xiaofei Wang, Midia Yousefi, Yanmin Qian, Jinyu Li, Sheng Zhao, Michael Zeng

    Abstract: There is a rising interest and trend in research towards directly translating speech from one language to another, known as end-to-end speech-to-speech translation. However, most end-to-end models struggle to outperform cascade models, i.e., a pipeline framework by concatenating speech recognition, machine translation and text-to-speech models. The primary challenges stem from the inherent complex…

    Submitted 30 October, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

    Comments: Neural Information Processing Systems, poster

  34. arXiv:2405.16041

    cs.LG cs.AI

    Explainable Molecular Property Prediction: Aligning Chemical Concepts with Predictions via Language Models

    Authors: Zhenzhong Wang, Zehui Lin, Wanyu Lin, Ming Yang, Minggang Zeng, Kay Chen Tan

    Abstract: Providing explainable molecular property predictions is critical for many scientific domains, such as drug discovery and material science. Though transformer-based language models have shown great potential in accurate molecular property prediction, they neither provide chemically meaningful explanations nor faithfully reveal the molecular structure-property relationships. In this work, we develop…

    Submitted 1 October, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

  35. arXiv:2405.13082

    cs.LG cs.AI cs.CV

    A Survey of Artificial Intelligence in Gait-Based Neurodegenerative Disease Diagnosis

    Authors: Haocong Rao, Minlin Zeng, Xuejiao Zhao, Chunyan Miao

    Abstract: Recent years have witnessed an increasing global population affected by neurodegenerative diseases (NDs), which traditionally require extensive healthcare resources and human effort for medical diagnosis and monitoring. As a crucial disease-related motor symptom, human gait can be exploited to characterize different NDs. The current advances in artificial intelligence (AI) models enable automatic…

    Submitted 6 February, 2025; v1 submitted 21 May, 2024; originally announced May 2024.

    Comments: Accepted by Neurocomputing journal. Article: 57 pages, citing 290 papers. Appendix: 30 pages. An up-to-date resource (papers, data, etc.) of this survey (AI4NDD) is provided at https://github.com/minlinzeng/AI4NDD-Survey

  36. arXiv:2405.08423

    eess.IV cs.CV

    NAFRSSR: a Lightweight Recursive Network for Efficient Stereo Image Super-Resolution

    Authors: Yihong Chen, Zhen Fan, Shuai Dong, Zhiwei Chen, Wenjie Li, Minghui Qin, Min Zeng, Xubing Lu, Guofu Zhou, Xingsen Gao, Jun-Ming Liu

    Abstract: Stereo image super-resolution (SR) refers to the reconstruction of a high-resolution (HR) image from a pair of low-resolution (LR) images as typically captured by a dual-camera device. To enhance the quality of SR images, most previous studies focused on increasing the number and size of feature maps and introducing complex and computationally intensive structures, resulting in models with high co…

    Submitted 14 May, 2024; originally announced May 2024.

  37. arXiv:2405.00716

    cs.CL cs.AI

    Large Language Models in the Clinic: A Comprehensive Benchmark

    Authors: Fenglin Liu, Zheng Li, Hongjian Zhou, Qingyu Yin, Jingfeng Yang, Xianfeng Tang, Chen Luo, Ming Zeng, Haoming Jiang, Yifan Gao, Priyanka Nigam, Sreyashi Nag, Bing Yin, Yining Hua, Xuan Zhou, Omid Rohanian, Anshul Thakur, Lei Clifton, David A. Clifton

    Abstract: The adoption of large language models (LLMs) to assist clinicians has attracted remarkable attention. Existing works mainly adopt the close-ended question-answering (QA) task with answer options for evaluation. However, many clinical decisions involve answering open-ended questions without pre-set options. To better understand LLMs in the clinic, we construct a benchmark ClinicBench. We first coll…

    Submitted 16 October, 2024; v1 submitted 25 April, 2024; originally announced May 2024.

    Comments: Accepted at EMNLP 2024 Main Conference

  38. arXiv:2404.13984

    cs.CV

    RHanDS: Refining Malformed Hands for Generated Images with Decoupled Structure and Style Guidance

    Authors: Chengrui Wang, Pengfei Liu, Min Zhou, Ming Zeng, Xubin Li, Tiezheng Ge, Bo Zheng

    Abstract: Although diffusion models can generate high-quality human images, their applications are limited by the instability in generating hands with correct structures. In this paper, we introduce RHanDS, a conditional diffusion-based framework designed to refine malformed hands by utilizing decoupled structure and style guidance. The hand mesh reconstructed from the malformed hand offers structure guidan…

    Submitted 14 April, 2025; v1 submitted 22 April, 2024; originally announced April 2024.

  39. arXiv:2404.06690

    eess.AS cs.AI cs.CL cs.LG cs.SD

    CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations

    Authors: Leying Zhang, Yao Qian, Long Zhou, Shujie Liu, Dongmei Wang, Xiaofei Wang, Midia Yousefi, Yanmin Qian, Jinyu Li, Lei He, Sheng Zhao, Michael Zeng

    Abstract: Recent advancements in zero-shot text-to-speech (TTS) modeling have led to significant strides in generating high-fidelity and diverse speech. However, dialogue generation, along with achieving human-like naturalness in speech, continues to be a challenge. In this paper, we introduce CoVoMix: Conversational Voice Mixture Generation, a novel model for zero-shot, human-like, multi-speaker, multi-rou…

    Submitted 15 December, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

    Comments: Neural Information Processing Systems 2024, poster

  40. arXiv:2402.17613

    cs.CL

    Neural Automated Writing Evaluation with Corrective Feedback

    Authors: Izia Xiaoxiao Wang, Xihan Wu, Edith Coates, Min Zeng, Jiexin Kuang, Siliang Liu, Mengyang Qiu, Jungyeul Park

    Abstract: The utilization of technology in second language learning and teaching has become ubiquitous. For the assessment of writing specifically, automated writing evaluation (AWE) and grammatical error correction (GEC) have become immensely popular and effective methods for enhancing writing proficiency and delivering instant and individualized feedback to learners. By leveraging the power of natural lan…

    Submitted 6 May, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

    Comments: Supported by the SoTL Seed Program at UBC

  41. arXiv:2402.15930  [pdf, ps, other]

    cs.CL

    Evaluating Prompting Strategies for Grammatical Error Correction Based on Language Proficiency

    Authors: Min Zeng, Jiexin Kuang, Mengyang Qiu, Jayoung Song, Jungyeul Park

    Abstract: The writing examples of English language learners may be different from those of native speakers. Given that there are significant differences in second language (L2) learners' error types by their proficiency levels, this paper attempts to reduce overcorrection by examining the interaction between LLM's performance and L2 language proficiency. Our method focuses on zero-shot and few-shot prompti…

    Submitted 24 February, 2024; originally announced February 2024.

    Comments: To appear in LREC-COLING 2024, short paper (preprint)

  42. arXiv:2402.07383  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Making Flow-Matching-Based Zero-Shot Text-to-Speech Laugh as You Like

    Authors: Naoyuki Kanda, Xiaofei Wang, Sefik Emre Eskimez, Manthan Thakker, Hemin Yang, Zirun Zhu, Min Tang, Canrun Li, Chung-Hsien Tsai, Zhen Xiao, Yufei Xia, Jinzhu Li, Yanqing Liu, Sheng Zhao, Michael Zeng

    Abstract: Laughter is one of the most expressive and natural aspects of human speech, conveying emotions, social cues, and humor. However, most text-to-speech (TTS) systems lack the ability to produce realistic and appropriate laughter sounds, limiting their applications and user experience. While there have been prior works to generate natural laughter, they fell short in terms of controlling the timing an…

    Submitted 4 March, 2024; v1 submitted 11 February, 2024; originally announced February 2024.

    Comments: See https://aka.ms/elate/ for demo samples, v2: subjective evaluation has been added

  43. arXiv:2311.06242  [pdf, other]

    cs.CV

    Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks

    Authors: Bin Xiao, Haiping Wu, Weijian Xu, Xiyang Dai, Houdong Hu, Yumao Lu, Michael Zeng, Ce Liu, Lu Yuan

    Abstract: We introduce Florence-2, a novel vision foundation model with a unified, prompt-based representation for a variety of computer vision and vision-language tasks. While existing large vision models excel in transfer learning, they struggle to perform a diversity of tasks with simple instructions, a capability that implies handling the complexity of various spatial hierarchy and semantic granularity.…

    Submitted 10 November, 2023; originally announced November 2023.

  44. arXiv:2311.03282  [pdf, ps, other]

    cs.IT eess.SP

    Resource Allocation for RIS-Empowered Wireless Communications: Low-Complexity and Robust Designs

    Authors: Ming Zeng, Wanming Hao, Zhangjie Peng, Zheng Chu, Xingwang Li, Changsheng You, Cunhua Pan

    Abstract: This article delves into advancements in resource allocation techniques tailored for systems utilizing reconfigurable intelligent surfaces (RIS), with a primary focus on achieving low-complexity and resilient solutions. The investigation of low-complexity approaches for RIS holds significant relevance, primarily owing to the intricate characteristics inherent in RIS-based systems and the need of d…

    Submitted 6 November, 2023; originally announced November 2023.

    Comments: submitted to IEEE WCM

  45. arXiv:2310.17800  [pdf, other]

    cs.LG

    Interacting Diffusion Processes for Event Sequence Forecasting

    Authors: Mai Zeng, Florence Regol, Mark Coates

    Abstract: Neural Temporal Point Processes (TPPs) have emerged as the primary framework for predicting sequences of events that occur at irregular time intervals, but their sequential nature can hamper performance for long-horizon forecasts. To address this, we introduce a novel approach that incorporates a diffusion generative model. The model facilitates sequence-to-sequence prediction, allowing multi-step…

    Submitted 19 July, 2024; v1 submitted 26 October, 2023; originally announced October 2023.

    Comments: camera ready version for ICML

  46. arXiv:2309.06917  [pdf, other]

    cs.CL cs.AI cs.LG

    Continual Learning with Dirichlet Generative-based Rehearsal

    Authors: Min Zeng, Wei Xue, Qifeng Liu, Yike Guo

    Abstract: Recent advancements in data-driven task-oriented dialogue systems (ToDs) struggle with incremental learning due to computational constraints and time-consuming issues. Continual Learning (CL) attempts to solve this by avoiding intensive pre-training, but it faces the problem of catastrophic forgetting (CF). While generative-based rehearsal CL methods have made significant strides, generating pseud…

    Submitted 13 September, 2023; originally announced September 2023.

  47. arXiv:2307.14936  [pdf, other]

    cs.CL cs.AI cs.LG cs.PL cs.SE

    PanGu-Coder2: Boosting Large Language Models for Code with Ranking Feedback

    Authors: Bo Shen, Jiaxin Zhang, Taihong Chen, Daoguang Zan, Bing Geng, An Fu, Muhan Zeng, Ailun Yu, Jichuan Ji, Jingyang Zhao, Yuenan Guo, Qianxiang Wang

    Abstract: Large Language Models for Code (Code LLM) are flourishing. New and powerful models are released on a weekly basis, demonstrating remarkable performance on the code generation task. Various approaches have been proposed to boost the code generation performance of pre-trained Code LLMs, such as supervised fine-tuning, instruction tuning, reinforcement learning, etc. In this paper, we propose a novel…

    Submitted 27 July, 2023; originally announced July 2023.

    Comments: Preprint

  48. arXiv:2307.06123  [pdf, other]

    cs.CR cs.LG

    SoK: Comparing Different Membership Inference Attacks with a Comprehensive Benchmark

    Authors: Jun Niu, Xiaoyan Zhu, Moxuan Zeng, Ge Zhang, Qingyang Zhao, Chunhui Huang, Yangming Zhang, Suyu An, Yangzhong Wang, Xinghui Yue, Zhipeng He, Weihao Guo, Kuo Shen, Peng Liu, Yulong Shen, Xiaohong Jiang, Jianfeng Ma, Yuqing Zhang

    Abstract: Membership inference (MI) attacks threaten user privacy through determining if a given data example has been used to train a target model. However, it has been increasingly recognized that the "comparing different MI attacks" methodology used in the existing works has serious limitations. Due to these limitations, we found (through the experiments in this work) that some comparison results reporte…

    Submitted 12 July, 2023; originally announced July 2023.

    Comments: 21 pages, 15 figures

  49. arXiv:2307.04024  [pdf, other]

    cs.LG cs.CR

    Robust Ranking Explanations

    Authors: Chao Chen, Chenghua Guo, Guixiang Ma, Ming Zeng, Xi Zhang, Sihong Xie

    Abstract: Robust explanations of machine learning models are critical to establish human trust in the models. Due to limited cognition capability, most humans can only interpret the top few salient features. It is critical to make top salient features robust to adversarial attacks, especially those against the more vulnerable gradient-based explanations. Existing defense measures robustness using $\ell_p$-n…

    Submitted 8 July, 2023; originally announced July 2023.

    Comments: Accepted to IMLH (Interpretable ML in Healthcare) workshop at ICML 2023. arXiv admin note: substantial text overlap with arXiv:2212.14106

  50. arXiv:2307.00252  [pdf, other]

    cs.LG cs.AI cs.SC math.AG

    An ML approach to resolution of singularities

    Authors: Gergely Bérczi, Honglu Fan, Mingcong Zeng

    Abstract: The solution set of a system of polynomial equations typically contains ill-behaved, singular points. Resolution is a fundamental process in geometry in which we replace singular points with smooth points, while keeping the rest of the solution set unchanged. Resolutions are not unique: the usual way to describe them involves repeatedly performing a fundamental operation known as "blowing-up", and…

    Submitted 22 August, 2023; v1 submitted 1 July, 2023; originally announced July 2023.

    Comments: To appear in Proceedings of the 40th International Conference on Machine Learning TAG Workshop (ICML-TAG 2023)