Showing 1–50 of 97 results for author: Chu, Y

Searching in archive cs.
  1. arXiv:2504.05276  [pdf, other]

    cs.CL

    Enhancing LLM-Based Short Answer Grading with Retrieval-Augmented Generation

    Authors: Yucheng Chu, Peng He, Hang Li, Haoyu Han, Kaiqi Yang, Yu Xue, Tingting Li, Joseph Krajcik, Jiliang Tang

    Abstract: Short answer assessment is a vital component of science education, allowing evaluation of students' complex three-dimensional understanding. Large language models (LLMs) that possess human-like ability in linguistic tasks are increasingly popular in assisting human graders to reduce their workload. However, LLMs' limitations in domain knowledge restrict their understanding in task-specific require…

    Submitted 7 April, 2025; originally announced April 2025.

  2. arXiv:2504.05239  [pdf, other]

    cs.CL

    LLM-based Automated Grading with Human-in-the-Loop

    Authors: Hang Li, Yucheng Chu, Kaiqi Yang, Yasemin Copur-Gencturk, Jiliang Tang

    Abstract: The rise of artificial intelligence (AI) technologies, particularly large language models (LLMs), has brought significant advancements to the field of education. Among various applications, automatic short answer grading (ASAG), which focuses on evaluating open-ended textual responses, has seen remarkable progress with the introduction of LLMs. These models not only enhance grading performance com…

    Submitted 28 April, 2025; v1 submitted 7 April, 2025; originally announced April 2025.

  3. arXiv:2503.20215  [pdf, other]

    cs.CL cs.CV cs.SD eess.AS

    Qwen2.5-Omni Technical Report

    Authors: Jin Xu, Zhifang Guo, Jinzheng He, Hangrui Hu, Ting He, Shuai Bai, Keqin Chen, Jialin Wang, Yang Fan, Kai Dang, Bin Zhang, Xiong Wang, Yunfei Chu, Junyang Lin

    Abstract: In this report, we present Qwen2.5-Omni, an end-to-end multimodal model designed to perceive diverse modalities, including text, images, audio, and video, while simultaneously generating text and natural speech responses in a streaming manner. To enable the streaming of multimodal information inputs, both audio and visual encoders utilize a block-wise processing approach. To synchronize the timest…

    Submitted 26 March, 2025; originally announced March 2025.

  4. arXiv:2503.19248  [pdf]

    cond-mat.mtrl-sci cs.CV

    Limited-angle x-ray nano-tomography with machine-learning enabled iterative reconstruction engine

    Authors: Chonghang Zhao, Mingyuan Ge, Xiaogang Yang, Yong S. Chu, Hanfei Yan

    Abstract: A long-standing challenge in tomography is the 'missing wedge' problem, which arises when the acquisition of projection images within a certain angular range is restricted due to geometrical constraints. This incomplete dataset results in significant artifacts and poor resolution in the reconstructed image. To tackle this challenge, we propose an approach dubbed Perception Fused Iterative Tomograp…

    Submitted 24 March, 2025; originally announced March 2025.

  5. arXiv:2503.16919  [pdf, ps, other]

    cs.IT

    Gaussian Blahut-Arimoto Algorithm for Capacity Region Calculation of Gaussian Vector Broadcast Channels

    Authors: Tian Jiao, Yanlin Geng, Yonghui Chu, Anthony Man-Cho So, Zai Yang

    Abstract: This paper is concerned with the computation of the capacity region of a continuous, Gaussian vector broadcast channel (BC) with covariance matrix constraints. Since the decision variables of the corresponding optimization problem are Gaussian distributed, they can be characterized by a finite number of parameters. Consequently, we develop new Blahut-Arimoto (BA)-type algorithms that can compute t…

    Submitted 21 March, 2025; originally announced March 2025.

    Comments: 16 pages, 3 figures

  6. arXiv:2503.03282  [pdf, other]

    cs.RO

    Supervised Visual Docking Network for Unmanned Surface Vehicles Using Auto-labeling in Real-world Water Environments

    Authors: Yijie Chu, Ziniu Wu, Yong Yue, Eng Gee Lim, Paolo Paoletti, Xiaohui Zhu

    Abstract: Unmanned Surface Vehicles (USVs) are increasingly applied to water operations such as environmental monitoring and river-map modeling. They face a significant challenge in achieving precise autonomous docking at ports or stations, still relying on remote human control or external positioning systems for accuracy and safety, which limits the full potential of human-out-of-loop deployment for USVs. Thi…

    Submitted 5 March, 2025; originally announced March 2025.

  7. arXiv:2502.15040  [pdf, other]

    cs.CL cs.AI

    Reducing Hallucinations of Medical Multimodal Large Language Models with Visual Retrieval-Augmented Generation

    Authors: Yun-Wei Chu, Kai Zhang, Christopher Malon, Martin Renqiang Min

    Abstract: Multimodal Large Language Models (MLLMs) have shown impressive performance in vision and text tasks. However, hallucination remains a major challenge, especially in fields like healthcare where details are critical. In this work, we show how MLLMs may be enhanced to support Visual RAG (V-RAG), a retrieval-augmented generation framework that incorporates both text and visual data from retrieved ima…

    Submitted 20 February, 2025; originally announced February 2025.

    Comments: GenAI4Health - AAAI '25

  8. arXiv:2502.11229  [pdf, other]

    math.OC cs.LG

    Provable and Practical Online Learning Rate Adaptation with Hypergradient Descent

    Authors: Ya-Chi Chu, Wenzhi Gao, Yinyu Ye, Madeleine Udell

    Abstract: This paper investigates the convergence properties of the hypergradient descent method (HDM), a 25-year-old heuristic originally proposed for adaptive stepsize selection in stochastic first-order methods. We provide the first rigorous convergence analysis of HDM using the online learning framework of [Gao24] and apply this analysis to develop new state-of-the-art adaptive gradient methods with emp…

    Submitted 16 March, 2025; v1 submitted 16 February, 2025; originally announced February 2025.

  9. arXiv:2502.08529  [pdf]

    cs.NI eess.SY

    Testbed Development: An Intelligent O-RAN based Cell-Free MIMO Network

    Authors: Yi Chu, Mostafa Rahmani, Josh Shackleton, David Grace, Kanapathippillai Cumanan, Hamed Ahmadi, Alister Burr

    Abstract: Cell-free multiple input multiple output (CF-MIMO) systems improve spectral and energy efficiencies using distributed access points (APs) to provide reliable service across an area equivalent to multiple conventional cells. This paper presents a novel design and implementation of a CF-MIMO network leveraging the open radio access network (O-RAN) architecture based testbed to enhance the performanc…

    Submitted 12 February, 2025; originally announced February 2025.

  10. arXiv:2502.00734  [pdf, other]

    cs.SD cs.AI eess.AS

    CycleGuardian: A Framework for Automatic RespiratorySound classification Based on Improved Deep clustering and Contrastive Learning

    Authors: Yun Chu, Qiuhao Wang, Enze Zhou, Ling Fu, Qian Liu, Gang Zheng

    Abstract: Auscultation plays a pivotal role in early respiratory and pulmonary disease diagnosis. Despite the emergence of deep learning-based methods for automatic respiratory sound classification post-Covid-19, limited datasets impede performance enhancement. Distinguishing between normal and abnormal respiratory sounds poses challenges due to the coexistence of normal respiratory components and noise com…

    Submitted 2 February, 2025; originally announced February 2025.

    Journal ref: Complex Intell. Syst. 11, 200 (2025)

  11. arXiv:2501.15275  [pdf]

    physics.app-ph cond-mat.mes-hall cs.AR

    A Tale of Two Sides of Wafer: Physical Implementation and Block-Level PPA on Flip FET with Dual-sided Signals

    Authors: Haoran Lu, Xun Jiang, Yanbang Chu, Ziqiao Xu, Rui Guo, Wanyue Peng, Yibo Lin, Runsheng Wang, Heng Wu, Ru Huang

    Abstract: As the conventional scaling of logic devices comes to an end, functional wafer backside and 3D transistor stacking are consensus for next-generation logic technology, offering considerable design space extension for powers, signals or even devices on the wafer backside. The Flip FET (FFET), a novel transistor architecture combining 3D transistor stacking and fully functional wafer backside, was re…

    Submitted 25 January, 2025; originally announced January 2025.

    Comments: Accepted by DATE 2025

    Journal ref: Proc. of DATE 2025

  12. arXiv:2501.01349  [pdf, other]

    cs.AI

    Rethinking Relation Extraction: Beyond Shortcuts to Generalization with a Debiased Benchmark

    Authors: Liang He, Yougang Chu, Zhen Wu, Jianbing Zhang, Xinyu Dai, Jiajun Chen

    Abstract: Benchmarks are crucial for evaluating machine learning algorithm performance, facilitating comparison and identifying superior solutions. However, biases within datasets can lead models to learn shortcut patterns, resulting in inaccurate assessments and hindering real-world applicability. This paper addresses the issue of entity bias in relation extraction tasks, where models tend to rely on entit…

    Submitted 2 January, 2025; originally announced January 2025.

  13. arXiv:2412.16838  [pdf, other]

    cs.CL

    Ask-Before-Detection: Identifying and Mitigating Conformity Bias in LLM-Powered Error Detector for Math Word Problem Solutions

    Authors: Hang Li, Tianlong Xu, Kaiqi Yang, Yucheng Chu, Yanling Chen, Yichi Song, Qingsong Wen, Hui Liu

    Abstract: The rise of large language models (LLMs) offers new opportunities for automatic error detection in education, particularly for math word problems (MWPs). While prior studies demonstrate the promise of LLMs as error detectors, they overlook the presence of multiple valid solutions for a single MWP. Our preliminary analysis reveals a significant performance gap between conventional and alternative s…

    Submitted 21 December, 2024; originally announced December 2024.

    Comments: 12 pages, 4 figures

  14. arXiv:2412.16746  [pdf]

    cs.CY cs.AI

    Beyond Partisan Leaning: A Comparative Analysis of Political Bias in Large Language Models

    Authors: Tai-Quan Peng, Kaiqi Yang, Sanguk Lee, Hang Li, Yucheng Chu, Yuping Lin, Hui Liu

    Abstract: As large language models (LLMs) become increasingly embedded in civic, educational, and political information environments, concerns about their potential political bias have grown. Prior research often evaluates such bias through simulated personas or predefined ideological typologies, which may introduce artificial framing effects or overlook how models behave in general use scenarios. This stud…

    Submitted 10 May, 2025; v1 submitted 21 December, 2024; originally announced December 2024.

  15. arXiv:2411.13577  [pdf, other]

    eess.AS cs.CL cs.LG cs.MM cs.SD

    WavChat: A Survey of Spoken Dialogue Models

    Authors: Shengpeng Ji, Yifu Chen, Minghui Fang, Jialong Zuo, Jingyu Lu, Hanting Wang, Ziyue Jiang, Long Zhou, Shujie Liu, Xize Cheng, Xiaoda Yang, Zehan Wang, Qian Yang, Jian Li, Yidi Jiang, Jingzhen He, Yunfei Chu, Jin Xu, Zhou Zhao

    Abstract: Recent advancements in spoken dialogue models, exemplified by systems like GPT-4o, have captured significant attention in the speech domain. Compared to traditional three-tier cascaded spoken dialogue models that comprise speech recognition (ASR), large language models (LLMs), and text-to-speech (TTS), modern spoken dialogue models exhibit greater intelligence. These advanced spoken dialogue model…

    Submitted 26 November, 2024; v1 submitted 14 November, 2024; originally announced November 2024.

    Comments: 60 pages, work in progress

  16. arXiv:2411.12520  [pdf]

    cs.RO cs.CV

    VMGNet: A Low Computational Complexity Robotic Grasping Network Based on VMamba with Multi-Scale Feature Fusion

    Authors: Yuhao Jin, Qizhong Gao, Xiaohui Zhu, Yong Yue, Eng Gee Lim, Yuqing Chen, Prudence Wong, Yijie Chu

    Abstract: While deep learning-based robotic grasping technology has demonstrated strong adaptability, its computational complexity has also significantly increased, making it unsuitable for scenarios with high real-time requirements. Therefore, we propose a low computational complexity and high accuracy model named VMGNet for robotic grasping. For the first time, we introduce the Visual State Space into the…

    Submitted 19 November, 2024; originally announced November 2024.

  17. arXiv:2411.01803  [pdf, other]

    math.OC cs.LG

    Gradient Methods with Online Scaling

    Authors: Wenzhi Gao, Ya-Chi Chu, Yinyu Ye, Madeleine Udell

    Abstract: We introduce a framework to accelerate the convergence of gradient-based methods with online learning. The framework learns to scale the gradient at each iteration through an online learning algorithm and provably accelerates gradient-based methods asymptotically. In contrast with previous literature, where convergence is established based on worst-case analysis, our framework provides a strong co…

    Submitted 5 November, 2024; v1 submitted 4 November, 2024; originally announced November 2024.

  18. arXiv:2410.02165  [pdf, other]

    cs.AI cs.CL

    A LLM-Powered Automatic Grading Framework with Human-Level Guidelines Optimization

    Authors: Yucheng Chu, Hang Li, Kaiqi Yang, Harry Shomer, Hui Liu, Yasemin Copur-Gencturk, Jiliang Tang

    Abstract: Open-ended short-answer questions (SAGs) have been widely recognized as a powerful tool for providing deeper insights into learners' responses in the context of learning analytics (LA). However, SAGs often present challenges in practice due to the high grading workload and concerns about inconsistent assessments. With recent advancements in natural language processing (NLP), automatic short-answer…

    Submitted 2 October, 2024; originally announced October 2024.

  19. Real-time Diverse Motion In-betweening with Space-time Control

    Authors: Yuchen Chu, Zeshi Yang

    Abstract: In this work, we present a data-driven framework for generating diverse in-betweening motions for kinematic characters. Our approach injects dynamic conditions and explicit motion controls into the procedure of motion transitions. Notably, this integration enables a finer-grained spatial-temporal control by allowing users to impart additional conditions, such as duration, path, style, etc., into t…

    Submitted 30 September, 2024; originally announced October 2024.

    Comments: Presented at The 16th ACM SIGGRAPH Conference on Motion, Interaction, and Games (MIG '24)

  20. arXiv:2409.19283  [pdf, other]

    eess.AS cs.SD

    Analyzing and Mitigating Inconsistency in Discrete Audio Tokens for Neural Codec Language Models

    Authors: Wenrui Liu, Zhifang Guo, Jin Xu, Yuanjun Lv, Yunfei Chu, Zhou Zhao, Junyang Lin

    Abstract: Building upon advancements in Large Language Models (LLMs), the field of audio processing has seen increased interest in training audio generation tasks with discrete audio token sequences. However, directly discretizing audio by neural audio codecs often results in sequences that fundamentally differ from text sequences. Unlike text, where text token sequences are deterministic, discrete audio to…

    Submitted 4 October, 2024; v1 submitted 28 September, 2024; originally announced September 2024.

    Comments: 15 pages, 4 figures

  21. arXiv:2409.16203  [pdf, other]

    cs.SD cs.AI eess.AS

    Facial Expression-Enhanced TTS: Combining Face Representation and Emotion Intensity for Adaptive Speech

    Authors: Yunji Chu, Yunseob Shim, Unsang Park

    Abstract: We propose FEIM-TTS, an innovative zero-shot text-to-speech (TTS) model that synthesizes emotionally expressive speech, aligned with facial images and modulated by emotion intensity. Leveraging deep learning, FEIM-TTS transcends traditional TTS systems by interpreting facial cues and adjusting to emotional nuances without dependence on labeled datasets. To address sparse audio-visual-emotional dat…

    Submitted 24 September, 2024; originally announced September 2024.

    Comments: 13 pages, 3 figures, accepted to ECCV Workshop ABAW (Affective Behavior Analysis in-the-wild) 7 (to appear)

  22. arXiv:2409.09828  [pdf, other]

    cs.LG cs.AI q-bio.QM

    Latent Diffusion Models for Controllable RNA Sequence Generation

    Authors: Kaixuan Huang, Yukang Yang, Kaidi Fu, Yanyi Chu, Le Cong, Mengdi Wang

    Abstract: This work presents RNAdiffusion, a latent diffusion model for generating and optimizing discrete RNA sequences of variable lengths. RNA is a key intermediary between DNA and protein, exhibiting high sequence diversity and complex three-dimensional structures to support a wide range of functions. We utilize pretrained BERT-type models to encode raw RNA sequences into token-level, biologically meani…

    Submitted 2 October, 2024; v1 submitted 15 September, 2024; originally announced September 2024.

  23. arXiv:2409.04901  [pdf, other]

    cs.LG

    Unlocking the Potential of Model Calibration in Federated Learning

    Authors: Yun-Wei Chu, Dong-Jun Han, Seyyedali Hosseinalipour, Christopher Brinton

    Abstract: Over the past several years, various federated learning (FL) methodologies have been developed to improve model accuracy, a primary performance metric in machine learning. However, to utilize FL in practical decision-making scenarios, beyond considering accuracy, the trained model must also have a reliable confidence in each of its predictions, an aspect that has been largely overlooked in existin…

    Submitted 22 January, 2025; v1 submitted 7 September, 2024; originally announced September 2024.

    Comments: ICLR 2025

  24. arXiv:2408.09851  [pdf, other]

    cs.NI eess.SY

    ISAC-Fi: Enabling Full-fledged Monostatic Sensing over Wi-Fi Communication

    Authors: Zhe Chen, Chao Hu, Tianyue Zheng, Hangcheng Cao, Yanbing Yang, Yen Chu, Hongbo Jiang, Jun Luo

    Abstract: Whereas Wi-Fi communications have been exploited for sensing purpose for over a decade, the bistatic or multistatic nature of Wi-Fi still poses multiple challenges, hampering real-life deployment of integrated sensing and communication (ISAC) within Wi-Fi framework. In this paper, we aim to re-design WiFi so that monostatic sensing (mimicking radar) can be achieved over the multistatic communicati…

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: 14 pages, 22 figures

  25. arXiv:2408.03650  [pdf, other]

    cs.MM

    Towards Multimodal Emotional Support Conversation Systems

    Authors: Yuqi Chu, Lizi Liao, Zhiyuan Zhou, Chong-Wah Ngo, Richang Hong

    Abstract: The integration of conversational artificial intelligence (AI) into mental health care promises a new horizon for therapist-client interactions, aiming to closely emulate the depth and nuance of human conversations. Despite the potential, the current landscape of conversational AI is markedly limited by its reliance on single-modal data, constraining the systems' ability to empathize and provide e…

    Submitted 19 October, 2024; v1 submitted 7 August, 2024; originally announced August 2024.

    Comments: This work has been submitted to the IEEE for possible publication

  26. arXiv:2407.14651  [pdf, other]

    eess.IV cs.AI cs.CV

    Improving Representation of High-frequency Components for Medical Visual Foundation Models

    Authors: Yuetan Chu, Yilan Zhang, Zhongyi Han, Changchun Yang, Longxi Zhou, Gongning Luo, Chao Huang, Xin Gao

    Abstract: Foundation models have recently attracted significant attention for their impressive generalizability across diverse downstream tasks. However, these models are demonstrated to exhibit great limitations in representing high-frequency components and fine-grained details. In many medical imaging tasks, the precise representation of such information is crucial due to the inherently intricate anatomic…

    Submitted 3 March, 2025; v1 submitted 19 July, 2024; originally announced July 2024.

    Journal ref: IEEE Transactions on Medical Imaging (2025)

  27. arXiv:2407.11840  [pdf, other]

    cs.CV

    MVG-Splatting: Multi-View Guided Gaussian Splatting with Adaptive Quantile-Based Geometric Consistency Densification

    Authors: Zhuoxiao Li, Shanliang Yao, Yijie Chu, Angel F. Garcia-Fernandez, Yong Yue, Eng Gee Lim, Xiaohui Zhu

    Abstract: In the rapidly evolving field of 3D reconstruction, 3D Gaussian Splatting (3DGS) and 2D Gaussian Splatting (2DGS) represent significant advancements. Although 2DGS compresses 3D Gaussian primitives into 2D Gaussian surfels to effectively enhance mesh extraction quality, this compression can potentially lead to a decrease in rendering quality. Additionally, unreliable densification processes and th…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: https://mvgsplatting.github.io

  28. arXiv:2407.11280  [pdf, other]

    cs.AI cs.CE cs.DB cs.LG

    Intelligent Cross-Organizational Process Mining: A Survey and New Perspectives

    Authors: Yiyuan Yang, Zheshun Wu, Yong Chu, Zhenghua Chen, Zenglin Xu, Qingsong Wen

    Abstract: Process mining, as a high-level field in data mining, plays a crucial role in enhancing operational efficiency and decision-making across organizations. In this survey paper, we delve into the growing significance and ongoing trends in the field of process mining, advocating a specific viewpoint on its contents, application, and development in modern businesses and process management, particularly…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: Under review; 13 pages, 7 figures, 2 tables

  29. arXiv:2407.10759  [pdf, other]

    eess.AS cs.CL cs.LG

    Qwen2-Audio Technical Report

    Authors: Yunfei Chu, Jin Xu, Qian Yang, Haojie Wei, Xipin Wei, Zhifang Guo, Yichong Leng, Yuanjun Lv, Jinzheng He, Junyang Lin, Chang Zhou, Jingren Zhou

    Abstract: We introduce the latest progress of Qwen-Audio, a large-scale audio-language model called Qwen2-Audio, which is capable of accepting various audio signal inputs and performing audio analysis or direct textual responses with regard to speech instructions. In contrast to complex hierarchical tags, we have simplified the pre-training process by utilizing natural language prompts for different data an…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: https://github.com/QwenLM/Qwen2-Audio. Checkpoints, code, and scripts will be open-sourced soon

  30. arXiv:2407.10671  [pdf, other]

    cs.CL cs.AI

    Qwen2 Technical Report

    Authors: An Yang, Baosong Yang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Zhou, Chengpeng Li, Chengyuan Li, Dayiheng Liu, Fei Huang, Guanting Dong, Haoran Wei, Huan Lin, Jialong Tang, Jialin Wang, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Ma, Jianxin Yang, Jin Xu, Jingren Zhou, Jinze Bai, Jinzheng He, Junyang Lin , et al. (37 additional authors not shown)

    Abstract: This report introduces the Qwen2 series, the latest addition to our large language models and large multimodal models. We release a comprehensive suite of foundational and instruction-tuned language models, encompassing a parameter range from 0.5 to 72 billion, featuring dense models and a Mixture-of-Experts model. Qwen2 surpasses most prior open-weight models, including its predecessor Qwen1.5, a…

    Submitted 10 September, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

    Comments: 26 pages, 1 figure

  31. arXiv:2407.00657  [pdf, other]

    cs.SD cs.LG eess.AS

    Improving Real-Time Music Accompaniment Separation with MMDenseNet

    Authors: Chun-Hsiang Wang, Chung-Che Wang, Jun-You Wang, Jyh-Shing Roger Jang, Yen-Hsun Chu

    Abstract: Music source separation aims to separate polyphonic music into different types of sources. Most existing methods focus on enhancing the quality of separated results by using a larger model structure, rendering them unsuitable for deployment on edge devices. Moreover, these methods may produce low-quality output when the input duration is short, making them impractical for real-time applications. T…

    Submitted 30 June, 2024; originally announced July 2024.

  32. arXiv:2406.19043  [pdf]

    eess.IV cs.AI cs.CV cs.DB

    CMRxRecon2024: A Multi-Modality, Multi-View K-Space Dataset Boosting Universal Machine Learning for Accelerated Cardiac MRI

    Authors: Zi Wang, Fanwen Wang, Chen Qin, Jun Lyu, Cheng Ouyang, Shuo Wang, Yan Li, Mengyao Yu, Haoyu Zhang, Kunyuan Guo, Zhang Shi, Qirong Li, Ziqiang Xu, Yajing Zhang, Hao Li, Sha Hua, Binghua Chen, Longyu Sun, Mengting Sun, Qin Li, Ying-Hua Chu, Wenjia Bai, Jing Qin, Xiahai Zhuang, Claudia Prieto , et al. (7 additional authors not shown)

    Abstract: Cardiac magnetic resonance imaging (MRI) has emerged as a clinically gold-standard technique for diagnosing cardiac diseases, thanks to its ability to provide diverse information with multiple modalities and anatomical views. Accelerated cardiac MRI is highly expected to achieve time-efficient and patient-friendly imaging, and then advanced image reconstruction approaches are required to recover h…

    Submitted 16 January, 2025; v1 submitted 27 June, 2024; originally announced June 2024.

    Comments: 23 pages, 3 figures, 2 tables

  33. arXiv:2406.12646  [pdf, other]

    eess.IV cs.AI cs.CV

    An Empirical Study on the Fairness of Foundation Models for Multi-Organ Image Segmentation

    Authors: Qin Li, Yizhe Zhang, Yan Li, Jun Lyu, Meng Liu, Longyu Sun, Mengting Sun, Qirong Li, Wenyue Mao, Xinran Wu, Yajing Zhang, Yinghua Chu, Shuo Wang, Chengyan Wang

    Abstract: The segmentation foundation model, e.g., Segment Anything Model (SAM), has attracted increasing interest in the medical image community. Early pioneering studies primarily concentrated on assessing and improving SAM's performance from the perspectives of overall accuracy and efficiency, yet little attention was given to the fairness considerations. This oversight raises questions about the potenti…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: Accepted to MICCAI-2024

  34. Content Knowledge Identification with Multi-Agent Large Language Models (LLMs)

    Authors: Kaiqi Yang, Yucheng Chu, Taylor Darwin, Ahreum Han, Hang Li, Hongzhi Wen, Yasemin Copur-Gencturk, Jiliang Tang, Hui Liu

    Abstract: Teachers' mathematical content knowledge (CK) is of vital importance and need in teacher professional development (PD) programs. Computer-aided asynchronous PD systems are the most recent proposed PD techniques, which aim to help teachers improve their PD equally with fewer concerns about costs and limitations of time or location. However, current automatic CK identification methods, which serve a…

    Submitted 21 March, 2024; originally announced April 2024.

    Journal ref: AIED 2024. Lecture Notes in Computer Science, vol 14830. Springer, Cham

  35. Deep learning-driven pulmonary artery and vein segmentation reveals demography-associated vasculature anatomical differences

    Authors: Yuetan Chu, Gongning Luo, Longxi Zhou, Shaodong Cao, Guolin Ma, Xianglin Meng, Juexiao Zhou, Changchun Yang, Dexuan Xie, Dan Mu, Ricardo Henao, Gianluca Setti, Xigang Xiao, Lianming Wu, Zhaowen Qiu, Xin Gao

    Abstract: Pulmonary artery-vein segmentation is crucial for disease diagnosis and surgical planning and is traditionally achieved by Computed Tomography Pulmonary Angiography (CTPA). However, concerns regarding adverse health effects from contrast agents used in CTPA have constrained its clinical utility. In contrast, identifying arteries and veins using non-contrast CT, a conventional and low-cost clinical…

    Submitted 1 December, 2024; v1 submitted 11 April, 2024; originally announced April 2024.

    Journal ref: Nat Commun 16, 2262 (2025)

  36. arXiv:2403.15696  [pdf, other]

    cs.AI cs.CL

    MixRED: A Mix-lingual Relation Extraction Dataset

    Authors: Lingxing Kong, Yougang Chu, Zheng Ma, Jianbing Zhang, Liang He, Jiajun Chen

    Abstract: Relation extraction is a critical task in the field of natural language processing with numerous real-world applications. Existing research primarily focuses on monolingual relation extraction or cross-lingual enhancement for relation extraction. Yet, there remains a significant gap in understanding relation extraction in the mix-lingual (or code-switching) scenario, where individuals intermix con…

    Submitted 22 March, 2024; originally announced March 2024.

  37. Rapidly Deployable Intelligent 5G Aerial Neutral Host Networks: an O-RAN-Based Approach

    Authors: Yi Chu, David Grace, Josh Shackleton, Andy White, David Hunter, Hamed Ahmadi

    Abstract: Arxiv is acting weird and throwing error: "Bad character(s) in field Abstract." for no reason. Please refer to the manuscript.

    Submitted 18 March, 2024; originally announced March 2024.

  38. arXiv:2402.07729  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    AIR-Bench: Benchmarking Large Audio-Language Models via Generative Comprehension

    Authors: Qian Yang, Jin Xu, Wenrui Liu, Yunfei Chu, Ziyue Jiang, Xiaohuan Zhou, Yichong Leng, Yuanjun Lv, Zhou Zhao, Chang Zhou, Jingren Zhou

    Abstract: Recently, instruction-following audio-language models have received broad attention for human-audio interaction. However, the absence of benchmarks capable of evaluating audio-centric interaction capabilities has impeded advancements in this field. Previous models primarily focus on assessing different fundamental tasks, such as Automatic Speech Recognition (ASR), and lack an assessment of the ope…

    Submitted 26 July, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

    Comments: Code and Data: https://github.com/OFA-Sys/AIR-Bench. Accepted by ACL 2024

  39. arXiv:2402.02225  [pdf, other]

    cs.LG

    Rethinking the Starting Point: Collaborative Pre-Training for Federated Downstream Tasks

    Authors: Yun-Wei Chu, Dong-Jun Han, Seyyedali Hosseinalipour, Christopher G. Brinton

    Abstract: A few recent studies have demonstrated that leveraging centrally pre-trained models can offer advantageous initializations for federated learning (FL). However, existing pre-training methods do not generalize well when faced with an arbitrary set of downstream FL tasks. Specifically, they often (i) achieve limited average accuracy, particularly when there are unseen downstream labels, and (ii) res…

    Submitted 11 December, 2024; v1 submitted 3 February, 2024; originally announced February 2024.

    Comments: AAAI 2025

  40. Supervised Contrastive Learning based Dual-Mixer Model for Remaining Useful Life Prediction

    Authors: En Fu, Yanyan Hu, Kaixiang Peng, Yuxin Chu

    Abstract: The problem of the Remaining Useful Life (RUL) prediction, aiming at providing an accurate estimate of the remaining time from the current predicting moment to the complete failure of the device, has gained significant attention from researchers in recent years. In this paper, to overcome the shortcomings of rigid combination for temporal and spatial features in most existing RUL prediction approa…

    Submitted 29 January, 2024; originally announced January 2024.

    Journal ref: Reliability Engineering & System Safety, 251, 110398

  41. arXiv:2401.10935  [pdf, other]

    cs.HC cs.AI

    SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents

    Authors: Kanzhi Cheng, Qiushi Sun, Yougang Chu, Fangzhi Xu, Yantao Li, Jianbing Zhang, Zhiyong Wu

    Abstract: Graphical User Interface (GUI) agents are designed to automate complex tasks on digital devices, such as smartphones and desktops. Most existing GUI agents interact with the environment through extracted structured data, which can be notably lengthy (e.g., HTML) and occasionally inaccessible (e.g., on desktops). To alleviate this issue, we propose a novel visual GUI agent -- SeeClick, which only r…

    Submitted 22 February, 2024; v1 submitted 17 January, 2024; originally announced January 2024.

  42. arXiv:2401.07456  [pdf, other]

    cs.CL cs.AI

    Only Send What You Need: Learning to Communicate Efficiently in Federated Multilingual Machine Translation

    Authors: Yun-Wei Chu, Dong-Jun Han, Christopher G. Brinton

    Abstract: Federated learning (FL) is a promising distributed machine learning paradigm that enables multiple clients to collaboratively train a global model. In this paper, we focus on a practical federated multilingual learning setup where clients with their own language-specific data aim to collaboratively construct a high-quality neural machine translation (NMT) model. However, communication constraints…

    Submitted 18 April, 2025; v1 submitted 14 January, 2024; originally announced January 2024.

  43. arXiv:2312.17072  [pdf, other]

    cs.IR cs.LG

    An Adaptive Framework of Geographical Group-Specific Network on O2O Recommendation

    Authors: Luo Ji, Jiayu Mao, Hailong Shi, Qian Li, Yunfei Chu, Hongxia Yang

    Abstract: Online to offline recommendation strongly correlates with the user and service's spatiotemporal information, therefore calling for a higher degree of model personalization. The traditional methodology is based on a uniform model structure trained by collected centralized data, which is unlikely to capture all user patterns over different geographical areas or time periods. To tackle this challenge…

    Submitted 28 December, 2023; originally announced December 2023.

    Comments: 7 pages, 4 figures, Accepted by ECIR 2024

  44. arXiv:2311.14925  [pdf, other]

    cs.CV eess.IV

    Coordinate-based Neural Network for Fourier Phase Retrieval

    Authors: Tingyou Li, Zixin Xu, Yong S. Chu, Xiaojing Huang, Jizhou Li

    Abstract: Fourier phase retrieval is essential for high-definition imaging of nanoscale structures across diverse fields, notably coherent diffraction imaging. This study presents the Single impliCit neurAl Network (SCAN), a tool built upon coordinate neural networks meticulously designed for enhanced phase retrieval performance. Remedying the drawbacks of conventional iterative methods which are easily tr…

    Submitted 8 January, 2024; v1 submitted 24 November, 2023; originally announced November 2023.

  45. arXiv:2311.07919  [pdf, other]

    eess.AS cs.CL cs.LG

    Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models

    Authors: Yunfei Chu, Jin Xu, Xiaohuan Zhou, Qian Yang, Shiliang Zhang, Zhijie Yan, Chang Zhou, Jingren Zhou

    Abstract: Recently, instruction-following audio-language models have received broad attention for audio interaction with humans. However, the absence of pre-trained audio models capable of handling diverse audio types and tasks has hindered progress in this field. Consequently, most existing works have only been able to support a limited range of interaction capabilities. In this paper, we develop the Qwen-…

    Submitted 21 December, 2023; v1 submitted 14 November, 2023; originally announced November 2023.

    Comments: The code, checkpoints and demo are released at https://github.com/QwenLM/Qwen-Audio

  46. arXiv:2310.04673  [pdf, other]

    cs.SD cs.AI cs.LG cs.MM eess.AS

    LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT

    Authors: Zhihao Du, Jiaming Wang, Qian Chen, Yunfei Chu, Zhifu Gao, Zerui Li, Kai Hu, Xiaohuan Zhou, Jin Xu, Ziyang Ma, Wen Wang, Siqi Zheng, Chang Zhou, Zhijie Yan, Shiliang Zhang

    Abstract: Generative Pre-trained Transformer (GPT) models have achieved remarkable performance on various natural language processing tasks, and have shown great potential as backbones for audio-and-text large language models (LLMs). Previous mainstream audio-and-text LLMs use discrete audio tokens to represent both input and output audio; however, they suffer from performance degradation on tasks such as a…

    Submitted 2 July, 2024; v1 submitted 6 October, 2023; originally announced October 2023.

    Comments: 10 pages, work in progress

  47. arXiv:2310.03281   

    cs.LG cs.AI

    A 5' UTR Language Model for Decoding Untranslated Regions of mRNA and Function Predictions

    Authors: Yanyi Chu, Dan Yu, Yupeng Li, Kaixuan Huang, Yue Shen, Le Cong, Jason Zhang, Mengdi Wang

    Abstract: The 5' UTR, a regulatory region at the beginning of an mRNA molecule, plays a crucial role in regulating the translation process and impacts the protein expression level. Language models have showcased their effectiveness in decoding the functions of protein and genome sequences. Here, we introduced a language model for 5' UTR, which we refer to as the UTR-LM. The UTR-LM is pre-trained on endogeno…

    Submitted 6 October, 2023; v1 submitted 4 October, 2023; originally announced October 2023.

    Comments: Sorry for withdrawing this manuscript. Because we want to major revised this manuscript, and it need some time

  48. arXiv:2309.16609  [pdf, other]

    cs.CL

    Qwen Technical Report

    Authors: Jinze Bai, Shuai Bai, Yunfei Chu, Zeyu Cui, Kai Dang, Xiaodong Deng, Yang Fan, Wenbin Ge, Yu Han, Fei Huang, Binyuan Hui, Luo Ji, Mei Li, Junyang Lin, Runji Lin, Dayiheng Liu, Gao Liu, Chengqiang Lu, Keming Lu, Jianxin Ma, Rui Men, Xingzhang Ren, Xuancheng Ren, Chuanqi Tan, Sinan Tan, et al. (23 additional authors not shown)

    Abstract: Large language models (LLMs) have revolutionized the field of artificial intelligence, enabling natural language processing tasks that were previously thought to be exclusive to humans. In this work, we introduce Qwen, the first installment of our large language model series. Qwen is a comprehensive language model series that encompasses distinct models with varying parameter counts. It includes Q…

    Submitted 28 September, 2023; originally announced September 2023.

    Comments: 59 pages, 5 figures

  49. arXiv:2309.10836  [pdf, other]

    cs.CV

    CMRxRecon: An open cardiac MRI dataset for the competition of accelerated image reconstruction

    Authors: Chengyan Wang, Jun Lyu, Shuo Wang, Chen Qin, Kunyuan Guo, Xinyu Zhang, Xiaotong Yu, Yan Li, Fanwen Wang, Jianhua Jin, Zhang Shi, Ziqiang Xu, Yapeng Tian, Sha Hua, Zhensen Chen, Meng Liu, Mengting Sun, Xutong Kuang, Kang Wang, Haoran Wang, Hao Li, Yinghua Chu, Guang Yang, Wenjia Bai, Xiahai Zhuang, et al. (3 additional authors not shown)

    Abstract: Cardiac magnetic resonance imaging (CMR) has emerged as a valuable diagnostic tool for cardiac diseases. However, a limitation of CMR is its slow imaging speed, which causes patient discomfort and introduces artifacts in the images. There has been growing interest in deep learning-based CMR imaging algorithms that can reconstruct high-quality images from highly under-sampled k-space data. However,…

    Submitted 19 September, 2023; originally announced September 2023.

    Comments: 14 pages, 8 figures

  50. arXiv:2307.13220  [pdf]

    eess.IV cs.AI physics.med-ph

    One for Multiple: Physics-informed Synthetic Data Boosts Generalizable Deep Learning for Fast MRI Reconstruction

    Authors: Zi Wang, Xiaotong Yu, Chengyan Wang, Weibo Chen, Jiazheng Wang, Ying-Hua Chu, Hongwei Sun, Rushuai Li, Peiyong Li, Fan Yang, Haiwei Han, Taishan Kang, Jianzhong Lin, Chen Yang, Shufu Chang, Zhang Shi, Sha Hua, Yan Li, Juan Hu, Liuhong Zhu, Jianjun Zhou, Meijing Lin, Jiefeng Guo, Congbo Cai, Zhong Chen, et al. (3 additional authors not shown)

    Abstract: Magnetic resonance imaging (MRI) is a widely used radiological modality renowned for its radiation-free, comprehensive insights into the human body, facilitating medical diagnoses. However, the drawback of prolonged scan times hinders its accessibility. The k-space undersampling offers a solution, yet the resultant artifacts necessitate meticulous removal during image reconstruction. Although Deep…

    Submitted 28 February, 2024; v1 submitted 24 July, 2023; originally announced July 2023.

    Comments: 38 pages, 19 figures, 5 tables