Showing 1–50 of 134 results for author: Yan, P

  1. arXiv:2507.05108

    cs.CV cs.AI cs.CL

    Reviving Cultural Heritage: A Novel Approach for Comprehensive Historical Document Restoration

    Authors: Yuyi Zhang, Peirong Zhang, Zhenhua Yang, Pengyu Yan, Yongxin Shi, Pengwei Liu, Fengjun Guo, Lianwen Jin

    Abstract: Historical documents represent an invaluable cultural heritage, yet have undergone significant degradation over time through tears, water erosion, and oxidation. Existing Historical Document Restoration (HDR) methods primarily focus on single modality or limited-size restoration, failing to meet practical needs. To fill this gap, we present a full-page HDR dataset (FPHDR) and a novel automated HDR…

    Submitted 7 July, 2025; originally announced July 2025.

    Journal ref: ACL 2025 main

  2. arXiv:2507.04284

    eess.SP cs.IT

    High-Availability Integrity Monitoring for Multi-Constellation GNSS Navigation with Non-Gaussian Errors

    Authors: Penggao Yan, Ronghe Jin, Junyi Zhang, Cheng-Wei Wang, Li-Ta Hsu

    Abstract: Global navigation satellite systems (GNSS) are essential for aviation, requiring strict integrity monitoring to alert users to hazardously misleading information. Conventional receiver autonomous integrity monitoring (RAIM) and advanced RAIM (ARAIM) rely heavily on Gaussian models in bounding nominal errors, which can be overly conservative with real-world non-Gaussian errors with heavy tails, suc…

    Submitted 6 July, 2025; originally announced July 2025.

    Comments: Submitted to IEEE Transactions on Instrumentation and Measurement

  3. arXiv:2507.03892

    cs.HC

    Is AI mingling or bullying me? Exploring User Interactions with a Chatbot in China

    Authors: Nuo Chen, Pu Yan, Jia Li, Qixuan Zhao

    Abstract: Since its viral emergence in early 2024, Comment Robert, a Weibo-launched social chatbot, has gained widespread attention on the Chinese Internet for its unsolicited and unpredictable comments on user posts. Unlike conventional chatbots that respond only to user prompts, Robert autonomously intervenes in public discourse, representing a novel form of AI-driven social media engagement. This study exa…

    Submitted 5 July, 2025; originally announced July 2025.

  4. arXiv:2507.00472

    cs.CV

    ARIG: Autoregressive Interactive Head Generation for Real-time Conversations

    Authors: Ying Guo, Xi Liu, Cheng Zhen, Pengfei Yan, Xiaoming Wei

    Abstract: Face-to-face communication, as a common human activity, motivates the research on interactive head generation. A virtual agent can generate motion responses with both listening and speaking capabilities based on the audio or motion signals of the other user and itself. However, previous clip-wise generation paradigms or explicit listener/speaker generator-switching methods have limitations in futur…

    Submitted 1 July, 2025; originally announced July 2025.

    Comments: ICCV 2025. Homepage: https://jinyugy21.github.io/ARIG/

  5. arXiv:2506.19055

    eess.IV cs.CV

    Xray2Xray: World Model from Chest X-rays with Volumetric Context

    Authors: Zefan Yang, Xinrui Song, Xuanang Xu, Yongyi Shi, Ge Wang, Mannudeep K. Kalra, Pingkun Yan

    Abstract: Chest X-rays (CXRs) are the most widely used medical imaging modality and play a pivotal role in diagnosing diseases. However, as 2D projection images, CXRs are limited by structural superposition, which constrains their effectiveness in precise disease diagnosis and risk prediction. To address the limitations of 2D CXRs, this study introduces Xray2Xray, a novel World Model that learns latent repr…

    Submitted 17 June, 2025; originally announced June 2025.

  6. arXiv:2506.18679

    cs.CV

    MARL-MambaContour: Unleashing Multi-Agent Deep Reinforcement Learning for Active Contour Optimization in Medical Image Segmentation

    Authors: Ruicheng Zhang, Yu Sun, Zeyu Zhang, Jinai Li, Xiaofan Liu, Au Hoi Fan, Haowei Guo, Puxin Yan

    Abstract: We introduce MARL-MambaContour, the first contour-based medical image segmentation framework based on Multi-Agent Reinforcement Learning (MARL). Our approach reframes segmentation as a multi-agent cooperation task focused on generating topologically consistent object-level contours, addressing the limitations of traditional pixel-based methods which could lack topological constraints and holistic st…

    Submitted 23 June, 2025; originally announced June 2025.

  7. arXiv:2506.04755

    cs.CV cs.AI cs.MM

    Truth in the Few: High-Value Data Selection for Efficient Multi-Modal Reasoning

    Authors: Shenshen Li, Kaiyuan Deng, Lei Wang, Hao Yang, Chong Peng, Peng Yan, Fumin Shen, Heng Tao Shen, Xing Xu

    Abstract: While multi-modal large language models (MLLMs) have made significant progress in complex reasoning tasks via reinforcement learning, it is commonly believed that extensive training data is necessary for improving multi-modal reasoning ability, inevitably leading to data redundancy and substantial computational costs. However, can smaller high-value datasets match or outperform full corpora for mu…

    Submitted 5 June, 2025; originally announced June 2025.

  8. arXiv:2504.14582

    cs.CV

    NTIRE 2025 Challenge on Image Super-Resolution ($\times$4): Methods and Results

    Authors: Zheng Chen, Kai Liu, Jue Gong, Jingkai Wang, Lei Sun, Zongwei Wu, Radu Timofte, Yulun Zhang, Xiangyu Kong, Xiaoxuan Yu, Hyunhee Park, Suejin Han, Hakjae Jeon, Dafeng Zhang, Hyung-Ju Chun, Donghun Ryou, Inju Ha, Bohyung Han, Lu Zhao, Yuyi Zhang, Pengyu Yan, Jiawei Hu, Pengwei Liu, Fengjun Guo, Hongyuan Yu, et al. (86 additional authors not shown)

    Abstract: This paper presents the NTIRE 2025 image super-resolution ($\times$4) challenge, one of the associated competitions of the 10th NTIRE Workshop at CVPR 2025. The challenge aims to recover high-resolution (HR) images from low-resolution (LR) counterparts generated through bicubic downsampling with a $\times$4 scaling factor. The objective is to develop effective network designs or solutions that ach…

    Submitted 28 April, 2025; v1 submitted 20 April, 2025; originally announced April 2025.

    Comments: NTIRE 2025 webpage: https://www.cvlai.net/ntire/2025. Code: https://github.com/zhengchen1999/NTIRE2025_ImageSR_x4

  9. arXiv:2504.08291

    cs.CV

    DreamFuse: Adaptive Image Fusion with Diffusion Transformer

    Authors: Junjia Huang, Pengxiang Yan, Jiyang Liu, Jie Wu, Zhao Wang, Yitong Wang, Liang Lin, Guanbin Li

    Abstract: Image fusion seeks to seamlessly integrate foreground objects with background scenes, producing realistic and harmonious fused images. Unlike existing methods that directly insert objects into the background, adaptive and interactive fusion remains a challenging yet appealing task. It requires the foreground to adjust or interact with the background context, enabling more coherent integration. To…

    Submitted 11 April, 2025; originally announced April 2025.

    Comments: under review

  10. arXiv:2503.23179

    eess.IV cs.CV

    OncoReg: Medical Image Registration for Oncological Challenges

    Authors: Wiebke Heyer, Yannic Elser, Lennart Berkel, Xinrui Song, Xuanang Xu, Pingkun Yan, Xi Jia, Jinming Duan, Zi Li, Tony C. W. Mok, BoWen LI, Christian Staackmann, Christoph Großbröhmer, Lasse Hansen, Alessa Hering, Malte M. Sieren, Mattias P. Heinrich

    Abstract: In modern cancer research, the vast volume of medical data generated is often underutilised due to challenges related to patient privacy. The OncoReg Challenge addresses this issue by enabling researchers to develop and validate image registration methods through a two-phase framework that ensures patient privacy while fostering the development of more generalisable AI models. Phase one involves w…

    Submitted 1 April, 2025; v1 submitted 29 March, 2025; originally announced March 2025.

    Comments: 26 pages, 6 figures

  11. arXiv:2503.12838

    cs.CV

    DreamLayer: Simultaneous Multi-Layer Generation via Diffusion Mode

    Authors: Junjia Huang, Pengxiang Yan, Jinhang Cai, Jiyang Liu, Zhao Wang, Yitong Wang, Xinglong Wu, Guanbin Li

    Abstract: Text-driven image generation using diffusion models has recently gained significant attention. To enable more flexible image manipulation and editing, recent research has expanded from single image generation to transparent layer generation and multi-layer compositions. However, existing approaches often fail to provide a thorough exploration of multi-layer structures, leading to inconsistent inte…

    Submitted 17 March, 2025; originally announced March 2025.

    Comments: Under submission

  12. arXiv:2503.11792

    cs.CV

    StyleMorpheus: A Style-Based 3D-Aware Morphable Face Model

    Authors: Peizhi Yan, Rabab K. Ward, Dan Wang, Qiang Tang, Shan Du

    Abstract: For 3D face modeling, the recently developed 3D-aware neural rendering methods are able to render photorealistic face images with arbitrary viewing directions. The training of the parametric controllable 3D-aware face models, however, still relies on a large-scale dataset that is lab-collected. To address this issue, this paper introduces "StyleMorpheus", the first style-based neural 3D Morphable…

    Submitted 14 May, 2025; v1 submitted 14 March, 2025; originally announced March 2025.

    Comments: 13 pages, work was completed in 2023

  13. arXiv:2503.01411

    cs.LG cs.AI eess.SY

    Learning Actionable World Models for Industrial Process Control

    Authors: Peng Yan, Ahmed Abdulkadir, Gerrit A. Schatte, Giulia Aguzzi, Joonsu Gha, Nikola Pascher, Matthias Rosenthal, Yunlong Gao, Benjamin F. Grewe, Thilo Stadelmann

    Abstract: To go from (passive) process monitoring to active process control, an effective AI system must learn about the behavior of the complex system from very limited training data, forming an ad-hoc digital twin with respect to process inputs and outputs that captures the consequences of actions on the process's world. We propose a novel methodology based on learning world models that disentangles proce…

    Submitted 25 April, 2025; v1 submitted 3 March, 2025; originally announced March 2025.

    Comments: Accepted by SDS 2025

    ACM Class: I.2.0; I.2.4

  14. arXiv:2502.05142

    eess.IV cs.CV

    Chest X-ray Foundation Model with Global and Local Representations Integration

    Authors: Zefan Yang, Xuanang Xu, Jiajin Zhang, Ge Wang, Mannudeep K. Kalra, Pingkun Yan

    Abstract: Chest X-ray (CXR) is the most frequently ordered imaging test, supporting diverse clinical tasks from thoracic disease detection to postoperative monitoring. However, task-specific classification models are limited in scope, require costly labeled data, and lack generalizability to out-of-distribution datasets. To address these challenges, we introduce CheXFound, a self-supervised vision foundatio…

    Submitted 19 June, 2025; v1 submitted 7 February, 2025; originally announced February 2025.

    Comments: Accepted by IEEE Transactions on Medical Imaging (TMI)

  15. arXiv:2501.16150

    cs.AI cs.HC eess.SY

    A Comprehensive Survey of Agents for Computer Use: Foundations, Challenges, and Future Directions

    Authors: Pascal J. Sager, Benjamin Meyer, Peng Yan, Rebekka von Wartburg-Kottler, Layan Etaiwi, Aref Enayati, Gabriel Nobel, Ahmed Abdulkadir, Benjamin F. Grewe, Thilo Stadelmann

    Abstract: Agents for computer use (ACUs) are an emerging class of systems capable of executing complex tasks on digital devices, such as desktops, mobile phones, and web platforms, given instructions in natural language. These agents can automate tasks by controlling software via low-level actions like mouse clicks and touchscreen gestures. However, despite rapid progress, ACUs are not yet mature for ever…

    Submitted 4 June, 2025; v1 submitted 27 January, 2025; originally announced January 2025.

  16. arXiv:2501.12844

    cs.CV cs.AI

    GAMED-Snake: Gradient-aware Adaptive Momentum Evolution Deep Snake Model for Multi-organ Segmentation

    Authors: Ruicheng Zhang, Haowei Guo, Zeyu Zhang, Puxin Yan, Shen Zhao

    Abstract: Multi-organ segmentation is a critical yet challenging task due to complex anatomical backgrounds, blurred boundaries, and diverse morphologies. This study introduces the Gradient-aware Adaptive Momentum Evolution Deep Snake (GAMED-Snake) model, which establishes a novel paradigm for contour-based segmentation by integrating gradient-based learning with adaptive momentum evolution mechanisms. The…

    Submitted 2 March, 2025; v1 submitted 22 January, 2025; originally announced January 2025.

  17. arXiv:2501.10609

    eess.SP cs.IT

    Universal Discrete Filtering with Lookahead or Delay

    Authors: Pumiao Yan, Jiwon Jeong, Naomi Sagan, Tsachy Weissman

    Abstract: We consider the universal discrete filtering problem, where an input sequence generated by an unknown source passes through a discrete memoryless channel, and the goal is to estimate its components based on the output sequence with limited lookahead or delay. We propose and establish the universality of a family of schemes for this setting. These schemes are induced by universal Sequential Probabi…

    Submitted 17 January, 2025; originally announced January 2025.

  18. arXiv:2501.05109

    cs.LG physics.chem-ph q-bio.BM

    EquiBoost: An Equivariant Boosting Approach to Molecular Conformation Generation

    Authors: Yixuan Yang, Xingyu Fang, Zhaowen Cheng, Pengju Yan, Xiaolin Li

    Abstract: Molecular conformation generation plays key roles in computational drug design. Recently developed deep learning methods, particularly diffusion models, have reached competitive performance over traditional cheminformatical approaches. However, these methods are often time-consuming or require extra support from traditional methods. We propose EquiBoost, a boosting model that stacks several equivar…

    Submitted 9 January, 2025; originally announced January 2025.

  19. arXiv:2501.00216

    cs.DC

    FedCod: An Efficient Communication Protocol for Cross-Silo Federated Learning with Coding

    Authors: Peishen Yan, Jun Li, Hao Wang, Tao Song, Yang Hua, Lu Peng, Haihui Zhou, Haibing Guan

    Abstract: Federated Learning (FL) is an innovative distributed machine learning paradigm that enables multiple parties to collaboratively train a model without sharing their raw data, thereby preserving data privacy. Communication efficiency concerns arise in cross-silo FL, particularly due to the network heterogeneity and fluctuations associated with geo-distributed silos. Most existing solutions to these…

    Submitted 30 December, 2024; originally announced January 2025.

  20. arXiv:2412.11082

    cs.LG physics.chem-ph q-bio.BM

    EquiFlow: Equivariant Conditional Flow Matching with Optimal Transport for 3D Molecular Conformation Prediction

    Authors: Qingwen Tian, Yuxin Xu, Yixuan Yang, Zhen Wang, Ziqi Liu, Pengju Yan, Xiaolin Li

    Abstract: Molecular 3D conformations play a key role in determining how molecules interact with other molecules or protein surfaces. Recent deep learning advancements have improved conformation prediction, but slow training speeds and difficulties in utilizing high-degree features limit performance. We propose EquiFlow, an equivariant conditional flow matching model with optimal transport. EquiFlow uniquely…

    Submitted 15 December, 2024; originally announced December 2024.

    Comments: 11 pages, 5 figures

  21. arXiv:2412.02177

    cs.CV cs.AI

    Anatomically-Grounded Fact Checking of Automated Chest X-ray Reports

    Authors: R. Mahmood, K. C. L. Wong, D. M. Reyes, N. D'Souza, L. Shi, J. Wu, P. Kaviani, M. Kalra, G. Wang, P. Yan, T. Syeda-Mahmood

    Abstract: With the emergence of large-scale vision-language models, realistic radiology reports may be generated using only medical images as input guided by simple prompts. However, their practical utility has been limited due to the factual errors in their description of findings. In this paper, we propose a novel model for explainable fact-checking that identifies errors in findings and their locations i…

    Submitted 3 December, 2024; originally announced December 2024.

    Report number: RPI12

  22. arXiv:2412.01031

    cs.CL cs.AI cs.CV

    Evaluating Automated Radiology Report Quality through Fine-Grained Phrasal Grounding of Clinical Findings

    Authors: Razi Mahmood, Pingkun Yan, Diego Machado Reyes, Ge Wang, Mannudeep K. Kalra, Parisa Kaviani, Joy T. Wu, Tanveer Syeda-Mahmood

    Abstract: Several evaluation metrics have been developed recently to automatically assess the quality of generative AI reports for chest radiographs based only on textual information using lexical, semantic, or clinical named entity recognition methods. In this paper, we develop a new method of report quality evaluation by first extracting fine-grained finding patterns capturing the location, laterality, an…

    Submitted 22 May, 2025; v1 submitted 1 December, 2024; originally announced December 2024.

  23. arXiv:2411.18281

    cs.CV

    MotionCharacter: Identity-Preserving and Motion Controllable Human Video Generation

    Authors: Haopeng Fang, Di Qiu, Binjie Mao, Pengfei Yan, He Tang

    Abstract: Recent advancements in personalized Text-to-Video (T2V) generation highlight the importance of integrating character-specific identities and actions. However, previous T2V models struggle with identity consistency and controllable motion dynamics, mainly due to limited fine-grained facial and action-based textual prompts, and datasets that overlook key human attributes and actions. To address thes…

    Submitted 30 November, 2024; v1 submitted 27 November, 2024; originally announced November 2024.

  24. arXiv:2411.16809

    cs.CR cs.AI

    Blockchain Meets LLMs: A Living Survey on Bidirectional Integration

    Authors: Jianghao Gong, Peiqi Yan, Yue Zhang, Hongli An, Logan Liu

    Abstract: In the domain of large language models, considerable advancements have been attained in multimodal large language models and explainability research, propelled by the continuous technological progress and innovation. Nonetheless, security and privacy concerns continue to pose as prominent challenges in this field. The emergence of blockchain technology, marked by its decentralized nature, tamper-p…

    Submitted 25 November, 2024; originally announced November 2024.

  25. arXiv:2411.14001

    cs.CV

    Graph Domain Adaptation with Dual-branch Encoder and Two-level Alignment for Whole Slide Image-based Survival Prediction

    Authors: Yuntao Shou, Peiqiang Yan, Xingjian Yuan, Xiangyong Cao, Qian Zhao, Deyu Meng

    Abstract: In recent years, histopathological whole slide image (WSI)-based survival analysis has attracted much attention in medical image analysis. In practice, WSIs usually come from different hospitals or laboratories, which can be seen as different domains, and thus may have significant differences in imaging equipment, processing procedures, and sample sources. These differences generally result in la…

    Submitted 21 November, 2024; originally announced November 2024.

    Comments: 12 pages, 6 figures

  26. arXiv:2411.13775

    cs.CL cs.AI

    Benchmarking GPT-4 against Human Translators: A Comprehensive Evaluation Across Languages, Domains, and Expertise Levels

    Authors: Jianhao Yan, Pingchuan Yan, Yulong Chen, Jing Li, Xianchao Zhu, Yue Zhang

    Abstract: This study presents a comprehensive evaluation of GPT-4's translation capabilities compared to human translators of varying expertise levels. Through systematic human evaluation using the MQM schema, we assess translations across three language pairs (Chinese$\longleftrightarrow$English, Russian$\longleftrightarrow$English, and Chinese$\longleftrightarrow$Hindi) and three domains (News, Technology…

    Submitted 20 November, 2024; originally announced November 2024.

    Comments: Work in progress

  27. arXiv:2410.15135

    cs.CL

    TrendFact: A Benchmark for Explainable Hotspot Perception in Fact-Checking with Natural Language Explanation

    Authors: Xiaocheng Zhang, Xi Wang, Yifei Lu, Jianing Wang, Zhuangzhuang Ye, Mengjiao Bao, Peng Yan, Xiaohong Su

    Abstract: Although fact verification remains fundamental, explanation generation serves as a critical enabler for trustworthy fact-checking systems by producing interpretable rationales and facilitating comprehensive verification processes. However, current benchmarks have limitations that include the lack of impact assessment, insufficient high-quality explanatory annotations, and an English-centric bias.…

    Submitted 23 May, 2025; v1 submitted 19 October, 2024; originally announced October 2024.

  28. arXiv:2410.14961

    cs.LG cs.AI cs.SI

    LangGFM: A Large Language Model Alone Can be a Powerful Graph Foundation Model

    Authors: Tianqianjin Lin, Pengwei Yan, Kaisong Song, Zhuoren Jiang, Yangyang Kang, Jun Lin, Weikang Yuan, Junjie Cao, Changlong Sun, Xiaozhong Liu

    Abstract: Graph foundation models (GFMs) have recently gained significant attention. However, the unique data processing and evaluation setups employed by different studies hinder a deeper understanding of their progress. Additionally, current research tends to focus on specific subsets of graph learning tasks, such as structural tasks, node-level tasks, or classification tasks. As a result, they often inco…

    Submitted 18 October, 2024; originally announced October 2024.

    Comments: under review

  29. arXiv:2410.14170

    cs.IR cs.AI cs.MM

    Personalized Image Generation with Large Multimodal Models

    Authors: Yiyan Xu, Wenjie Wang, Yang Zhang, Biao Tang, Peng Yan, Fuli Feng, Xiangnan He

    Abstract: Personalized content filtering, such as recommender systems, has become a critical infrastructure to alleviate information overload. However, these systems merely filter existing content and are constrained by its limited diversity, making it difficult to meet users' varied content needs. To address this limitation, personalized content generation has emerged as a promising direction with broad ap…

    Submitted 2 February, 2025; v1 submitted 18 October, 2024; originally announced October 2024.

    Comments: Accepted for publication in WWW'25

  30. arXiv:2410.02507

    cs.AI cs.CL

    Can Large Language Models Grasp Legal Theories? Enhance Legal Reasoning with Insights from Multi-Agent Collaboration

    Authors: Weikang Yuan, Junjie Cao, Zhuoren Jiang, Yangyang Kang, Jun Lin, Kaisong Song, Tianqianjin Lin, Pengwei Yan, Changlong Sun, Xiaozhong Liu

    Abstract: Large Language Models (LLMs) could struggle to fully understand legal theories and perform complex legal reasoning tasks. In this study, we introduce a challenging task (confusing charge prediction) to better evaluate LLMs' understanding of legal theories and reasoning capabilities. We also propose a novel framework: Multi-Agent framework for improving complex Legal Reasoning capability (MALR). MA…

    Submitted 3 October, 2024; originally announced October 2024.

    ACM Class: I.2.7

  31. arXiv:2409.16147

    cs.CV

    Gaussian Deja-vu: Creating Controllable 3D Gaussian Head-Avatars with Enhanced Generalization and Personalization Abilities

    Authors: Peizhi Yan, Rabab Ward, Qiang Tang, Shan Du

    Abstract: Recent advancements in 3D Gaussian Splatting (3DGS) have unlocked significant potential for modeling 3D head avatars, providing greater flexibility than mesh-based methods and more efficient rendering compared to NeRF-based approaches. Despite these advancements, the creation of controllable 3DGS-based head avatars remains time-intensive, often requiring tens of minutes to hours. To expedite this…

    Submitted 6 November, 2024; v1 submitted 22 September, 2024; originally announced September 2024.

    Comments: 11 pages, Accepted by WACV 2025 in Round 1

  32. arXiv:2409.11212

    cs.CL

    Self-Evolutionary Large Language Models through Uncertainty-Enhanced Preference Optimization

    Authors: Jianing Wang, Yang Zhou, Xiaocheng Zhang, Mengjiao Bao, Peng Yan

    Abstract: Iterative preference optimization has recently become one of the de-facto training paradigms for large language models (LLMs), but the performance is still underwhelming due to too much noisy preference data yielded in the loop. To combat this issue, we present an Uncertainty-enhanced Preference Optimization (UPO) framework to make the LLM self-evolve with reliable feedb…

    Submitted 17 September, 2024; originally announced September 2024.

    Comments: 17 pages

  33. arXiv:2409.06135

    cs.SD cs.CV cs.MM eess.AS

    Draw an Audio: Leveraging Multi-Instruction for Video-to-Audio Synthesis

    Authors: Qi Yang, Binjie Mao, Zili Wang, Xing Nie, Pengfei Gao, Ying Guo, Cheng Zhen, Pengfei Yan, Shiming Xiang

    Abstract: Foley is a term commonly used in filmmaking, referring to the addition of daily sound effects to silent films or videos to enhance the auditory experience. Video-to-Audio (V2A), as a particular type of automatic foley task, presents inherent challenges related to audio-visual synchronization. These challenges encompass maintaining the content consistency between the input video and the generated a…

    Submitted 9 September, 2024; originally announced September 2024.

    Comments: 14 pages, 11 figures

  34. OFL-W3: A One-shot Federated Learning System on Web 3.0

    Authors: Linshan Jiang, Moming Duan, Bingsheng He, Yulin Sun, Peishen Yan, Yang Hua, Tao Song

    Abstract: Federated Learning (FL) addresses the challenges posed by data silos, which arise from privacy, security regulations, and ownership concerns. Despite these barriers, FL enables these isolated data repositories to participate in collaborative learning without compromising privacy or security. Concurrently, the advancement of blockchain technology and decentralized applications (DApps) within Web 3.…

    Submitted 12 August, 2024; originally announced August 2024.

    Comments: VLDB 24 demo paper

  35. arXiv:2407.17267

    cs.CV

    M4: Multi-Proxy Multi-Gate Mixture of Experts Network for Multiple Instance Learning in Histopathology Image Analysis

    Authors: Junyu Li, Ye Zhang, Wen Shu, Xiaobing Feng, Yingchun Wang, Pengju Yan, Xiaolin Li, Chulin Sha, Min He

    Abstract: Multiple instance learning (MIL) has been successfully applied for whole slide images (WSIs) analysis in computational pathology, enabling a wide range of prediction tasks from tumor subtyping to inferring genetic mutations and multi-omics biomarkers. However, existing MIL methods predominantly focus on single-task learning, resulting in not only overall low efficiency but also the overlook of int…

    Submitted 24 July, 2024; originally announced July 2024.

    Comments: 25 pages, 5 figures

  36. arXiv:2407.05810

    cs.AI cs.HC

    Integrating AI in College Education: Positive yet Mixed Experiences with ChatGPT

    Authors: Xinrui Song, Jiajin Zhang, Pingkun Yan, Juergen Hahn, Uwe Kruger, Hisham Mohamed, Ge Wang

    Abstract: The integration of artificial intelligence (AI) chatbots into higher education marks a shift towards a new generation of pedagogical tools, mirroring the arrival of milestones like the internet. With the launch of ChatGPT-4 Turbo in November 2023, we developed a ChatGPT-based teaching application (https://chat.openai.com/g/g-1imx1py4K-chatge-medical-imaging) and integrated it into our undergraduat…

    Submitted 8 July, 2024; originally announced July 2024.

  37. arXiv:2407.03658

    cs.CL

    GPT-4 vs. Human Translators: A Comprehensive Evaluation of Translation Quality Across Languages, Domains, and Expertise Levels

    Authors: Jianhao Yan, Pingchuan Yan, Yulong Chen, Judy Li, Xianchao Zhu, Yue Zhang

    Abstract: This study comprehensively evaluates the translation quality of Large Language Models (LLMs), specifically GPT-4, against human translators of varying expertise levels across multiple language pairs and domains. Through carefully designed annotation rounds, we find that GPT-4 performs comparably to junior translators in terms of total errors made but lags behind medium and senior translators. We a…

    Submitted 4 July, 2024; originally announced July 2024.

  38. arXiv:2407.00557

    cs.CV

    Explaining Chest X-ray Pathology Models using Textual Concepts

    Authors: Vijay Sadashivaiah, Pingkun Yan, James A. Hendler

    Abstract: Deep learning models have revolutionized medical imaging and diagnostics, yet their opaque nature poses challenges for clinical adoption and trust. Amongst approaches to improve model interpretability, concept-based explanations aim to provide concise and human-understandable explanations of any arbitrary classifier. However, such methods usually require a large amount of manually collected data w…

    Submitted 22 October, 2024; v1 submitted 29 June, 2024; originally announced July 2024.

    Comments: Accepted at NeurIPS'24 workshop on Advancements In Medical Foundation Models: Explainability, Robustness, Security, and Beyond (AIM-FM)

  39. arXiv:2407.00541

    cs.CL cs.AI cs.IR

    Answering real-world clinical questions using large language model based systems

    Authors: Yen Sia Low, Michael L. Jackson, Rebecca J. Hyde, Robert E. Brown, Neil M. Sanghavi, Julian D. Baldwin, C. William Pike, Jananee Muralidharan, Gavin Hui, Natasha Alexander, Hadeel Hassan, Rahul V. Nene, Morgan Pike, Courtney J. Pokrzywa, Shivam Vedak, Adam Paul Yan, Dong-han Yao, Amy R. Zipursky, Christina Dinh, Philip Ballentine, Dan C. Derieg, Vladimir Polony, Rehan N. Chawdry, Jordan Davies, Brigham B. Hyde, et al. (2 additional authors not shown)

    Abstract: Evidence to guide healthcare decisions is often limited by a lack of relevant and trustworthy literature as well as difficulty in contextualizing existing research for a specific patient. Large language models (LLMs) could potentially address both challenges by either summarizing published literature or generating new studies based on real-world data (RWD). We evaluated the ability of five LLM-bas…

    Submitted 29 June, 2024; originally announced July 2024.

    Comments: 28 pages (2 figures, 3 tables) inclusive of 8 pages of supplemental materials (4 supplemental figures and 4 supplemental tables)

  40. arXiv:2407.00514

    cs.PL

    Combining Classical and Probabilistic Independence Reasoning to Verify the Security of Oblivious Algorithms (Extended Version)

    Authors: Pengbo Yan, Toby Murray, Olga Ohrimenko, Van-Thuan Pham, Robert Sison

    Abstract: We consider the problem of how to verify the security of probabilistic oblivious algorithms formally and systematically. Unfortunately, prior program logics fail to support a number of complexities that feature in the semantics and invariant needed to verify the security of many practical probabilistic oblivious algorithms. We propose an approach based on reasoning over perfectly oblivious approxi…

    Submitted 29 June, 2024; originally announced July 2024.

  41. arXiv:2406.19631  [pdf, other]

    cs.LG cs.DC

    Personalized Interpretation on Federated Learning: A Virtual Concepts approach

    Authors: Peng Yan, Guodong Long, Jing Jiang, Michael Blumenstein

    Abstract: Tackling non-IID data is an open challenge in federated learning research. Existing FL methods, including robust FL and personalized FL, are designed to improve model performance without consideration of interpreting non-IID across clients. This paper aims to design a novel FL method to robust and interpret the non-IID data across clients. Specifically, we interpret each client's dataset as a mixt…

    Submitted 27 June, 2024; originally announced June 2024.

  42. arXiv:2406.00258  [pdf, other]

    cs.CV cs.AI

    Artemis: Towards Referential Understanding in Complex Videos

    Authors: Jihao Qiu, Yuan Zhang, Xi Tang, Lingxi Xie, Tianren Ma, Pengyu Yan, David Doermann, Qixiang Ye, Yunjie Tian

    Abstract: Videos carry rich visual information including object description, action, interaction, etc., but the existing multimodal large language models (MLLMs) fell short in referential understanding scenarios such as video-based referring. In this paper, we present Artemis, an MLLM that pushes video-based referential understanding to a finer level. Given a video, Artemis receives a natural-language quest…

    Submitted 31 May, 2024; originally announced June 2024.

    Comments: 19 pages, 14 figures. Code and data are available at https://github.com/qiujihao19/Artemis

  43. arXiv:2405.18533  [pdf, other]

    eess.IV cs.CV

    Cardiovascular Disease Detection from Multi-View Chest X-rays with BI-Mamba

    Authors: Zefan Yang, Jiajin Zhang, Ge Wang, Mannudeep K. Kalra, Pingkun Yan

    Abstract: Accurate prediction of Cardiovascular disease (CVD) risk in medical imaging is central to effective patient health management. Previous studies have demonstrated that imaging features in computed tomography (CT) can help predict CVD risk. However, CT entails notable radiation exposure, which may result in adverse health effects for patients. In contrast, chest X-ray emits significantly lower level…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: Early accepted paper for MICCAI 2024

  44. arXiv:2405.15728  [pdf, other]

    cs.CV

    Disease-informed Adaptation of Vision-Language Models

    Authors: Jiajin Zhang, Ge Wang, Mannudeep K. Kalra, Pingkun Yan

    Abstract: In medical image analysis, the expertise scarcity and the high cost of data annotation limits the development of large artificial intelligence models. This paper investigates the potential of transfer learning with pre-trained vision-language models (VLMs) in this domain. Currently, VLMs still struggle to transfer to the underrepresented diseases with minimal presence and new diseases entirely abs…

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: Early Accepted by MICCAI 2024

  45. arXiv:2405.13467  [pdf, other]

    cs.CV

    AdaFedFR: Federated Face Recognition with Adaptive Inter-Class Representation Learning

    Authors: Di Qiu, Xinyang Lin, Kaiye Wang, Xiangxiang Chu, Pengfei Yan

    Abstract: With the growing attention on data privacy and communication security in face recognition applications, federated learning has been introduced to learn a face recognition model with decentralized datasets in a privacy-preserving manner. However, existing works still face challenges such as unsatisfying performance and additional communication costs, limiting their applicability in real-world scena…

    Submitted 22 May, 2024; originally announced May 2024.

  46. arXiv:2405.11344  [pdf]

    cs.LG cs.AI

    LiPost: Improved Content Understanding With Effective Use of Multi-task Contrastive Learning

    Authors: Akanksha Bindal, Sudarshan Ramanujam, Dave Golland, TJ Hazen, Tina Jiang, Fengyu Zhang, Peng Yan

    Abstract: In enhancing LinkedIn core content recommendation models, a significant challenge lies in improving their semantic understanding capabilities. This paper addresses the problem by leveraging multi-task learning, a method that has shown promise in various domains. We fine-tune a pre-trained, transformer-based LLM using multi-task contrastive learning with data from a diverse set of semantic labeling…

    Submitted 13 July, 2024; v1 submitted 18 May, 2024; originally announced May 2024.

  47. arXiv:2404.08450  [pdf, other]

    cs.CV

    Joint Physical-Digital Facial Attack Detection Via Simulating Spoofing Clues

    Authors: Xianhua He, Dashuang Liang, Song Yang, Zhanlong Hao, Hui Ma, Binjie Mao, Xi Li, Yao Wang, Pengfei Yan, Ajian Liu

    Abstract: Face recognition systems are frequently subjected to a variety of physical and digital attacks of different types. Previous methods have achieved satisfactory performance in scenarios that address physical attacks and digital attacks, respectively. However, few methods are considered to integrate a model that simultaneously addresses both physical and digital attacks, implying the necessity to dev…

    Submitted 12 April, 2024; originally announced April 2024.

    Comments: 10 pages with 6 figures, Accepted by CVPRW 2024

  48. arXiv:2404.08361  [pdf, other]

    cs.IR cs.AI

    Large-Scale Multi-Domain Recommendation: an Automatic Domain Feature Extraction and Personalized Integration Framework

    Authors: Dongbo Xi, Zhen Chen, Yuexian Wang, He Cui, Chong Peng, Fuzhen Zhuang, Peng Yan

    Abstract: Feed recommendation is currently the mainstream mode for many real-world applications (e.g., TikTok, Dianping), it is usually necessary to model and predict user interests in multiple scenarios (domains) within and even outside the application. Multi-domain learning is a typical solution in this regard. While considerable efforts have been made in this regard, there are still two long-standing cha…

    Submitted 14 April, 2024; v1 submitted 12 April, 2024; originally announced April 2024.

    Comments: 8 pages

  49. arXiv:2404.03181  [pdf, other]

    cs.CV

    MonoCD: Monocular 3D Object Detection with Complementary Depths

    Authors: Longfei Yan, Pei Yan, Shengzhou Xiong, Xuanyu Xiang, Yihua Tan

    Abstract: Monocular 3D object detection has attracted widespread attention due to its potential to accurately obtain object 3D localization from a single image at a low cost. Depth estimation is an essential but challenging subtask of monocular 3D object detection due to the ill-posedness of 2D to 3D mapping. Many methods explore multiple local depth clues such as object heights and keypoints and then formu…

    Submitted 3 April, 2024; originally announced April 2024.

    Comments: Accepted to CVPR 2024

  50. arXiv:2404.02655  [pdf, other]

    cs.CL

    Calibrating the Confidence of Large Language Models by Eliciting Fidelity

    Authors: Mozhi Zhang, Mianqiu Huang, Rundong Shi, Linsen Guo, Chong Peng, Peng Yan, Yaqian Zhou, Xipeng Qiu

    Abstract: Large language models optimized with techniques like RLHF have achieved good alignment in being helpful and harmless. However, post-alignment, these language models often exhibit overconfidence, where the expressed confidence does not accurately calibrate with their correctness rate. In this paper, we decompose the language model confidence into the \textit{Uncertainty} about the question and the…

    Submitted 9 October, 2024; v1 submitted 3 April, 2024; originally announced April 2024.

    Comments: EMNLP 2024