Showing 1–50 of 253 results for author: Long, Y

Searching in archive cs.
  1. arXiv:2507.04923  [pdf, ps, other]

    cs.SI

    Advancement of Circular Economy Through Interdisciplinary Collaboration: A Bibliometric Approach

    Authors: Keita Nishimoto, Koji Kimita, Shinsuke Murakami, Yin Long, Kimitaka Asatani, Ichiro Sakata

    Abstract: Since the European Union introduced its Circular Economy (CE) Action Plan in 2015, CE research has expanded rapidly. However, the structure of this emerging field - both in terms of its constituent disciplines and researcher dynamics - remains poorly understood. To address this gap, we analyze over 25,000 CE-related publications from Scopus by combining conventional bibliometric approaches with ad…

    Submitted 7 July, 2025; originally announced July 2025.

  2. arXiv:2507.04661  [pdf, ps, other]

    cs.RO

    DRAE: Dynamic Retrieval-Augmented Expert Networks for Lifelong Learning and Task Adaptation in Robotics

    Authors: Yayu Long, Kewei Chen, Long Jin, Mingsheng Shang

    Abstract: We introduce Dynamic Retrieval-Augmented Expert Networks (DRAE), a groundbreaking architecture that addresses the challenges of lifelong learning, catastrophic forgetting, and task adaptation by combining the dynamic routing capabilities of Mixture-of-Experts (MoE); leveraging the knowledge-enhancement power of Retrieval-Augmented Generation (RAG); incorporating a novel hierarchical reinforcement…

    Submitted 7 July, 2025; originally announced July 2025.

    Comments: Accepted to the main conference of the Annual Meeting of the Association for Computational Linguistics (ACL 2025)

    ACM Class: I.2.9; I.2.6

  3. arXiv:2507.03343  [pdf, ps, other]

    cs.CL eess.AS

    SHNU Multilingual Conversational Speech Recognition System for INTERSPEECH 2025 MLC-SLM Challenge

    Authors: Yuxiang Mei, Yuang Zheng, Dongxing Xu, Yanhua Long

    Abstract: This paper describes the SHNU multilingual conversational speech recognition system (SHNU-mASR, team name "maybe"), submitted to Track 1 of the INTERSPEECH 2025 MLC-SLM Challenge. Our system integrates a parallel-speech-encoder architecture with a large language model (LLM) to form a unified multilingual ASR framework. The parallel-speech-encoder consists of two pre-trained encoders, the Whisper-large…

    Submitted 4 July, 2025; originally announced July 2025.

    Comments: Accepted by Interspeech 2025 MLC-SLM workshop

  4. ARMOUR US: Android Runtime Zero-permission Sensor Usage Monitoring from User Space

    Authors: Yan Long, Jiancong Cui, Yuqing Yang, Tobias Alam, Zhiqiang Lin, Kevin Fu

    Abstract: This work investigates how to monitor access to Android zero-permission sensors which could cause privacy leakage to users. Moreover, monitoring such sensitive access allows security researchers to characterize potential sensor abuse patterns. Zero-permission sensors such as accelerometers have become an indispensable part of Android devices. The critical information they provide has attracted ext…

    Submitted 2 July, 2025; originally announced July 2025.

    ACM Class: K.6.5; D.4.6

    Journal ref: WiSec 2025: 18th ACM Conference on Security and Privacy in Wireless and Mobile Networks

  5. arXiv:2507.00802  [pdf, ps, other]

    cs.CV

    TRACE: Temporally Reliable Anatomically-Conditioned 3D CT Generation with Enhanced Efficiency

    Authors: Minye Shao, Xingyu Miao, Haoran Duan, Zeyu Wang, Jingkun Chen, Yawen Huang, Xian Wu, Jingjing Deng, Yang Long, Yefeng Zheng

    Abstract: 3D medical image generation is essential for data augmentation and patient privacy, calling for reliable and efficient models suited for clinical practice. However, current methods suffer from limited anatomical fidelity, restricted axial length, and substantial computational cost, placing them beyond reach for regions with limited resources and infrastructure. We introduce TRACE, a framework that…

    Submitted 1 July, 2025; originally announced July 2025.

    Comments: Accepted to MICCAI 2025 (this version is not peer-reviewed; it is the preprint version). MICCAI proceedings DOI will appear here

  6. Rethinking Brain Tumor Segmentation from the Frequency Domain Perspective

    Authors: Minye Shao, Zeyu Wang, Haoran Duan, Yawen Huang, Bing Zhai, Shizheng Wang, Yang Long, Yefeng Zheng

    Abstract: Precise segmentation of brain tumors, particularly contrast-enhancing regions visible in post-contrast MRI (areas highlighted by contrast agent injection), is crucial for accurate clinical diagnosis and treatment planning but remains challenging. However, current methods exhibit notable performance degradation in segmenting these enhancing brain tumor areas, largely due to insufficient considerati…

    Submitted 11 June, 2025; originally announced June 2025.

    Comments: Accepted by IEEE Transactions on Medical Imaging

  7. arXiv:2506.09343  [pdf, other]

    cs.CV cs.RO

    CheckManual: A New Challenge and Benchmark for Manual-based Appliance Manipulation

    Authors: Yuxing Long, Jiyao Zhang, Mingjie Pan, Tianshu Wu, Taewhan Kim, Hao Dong

    Abstract: Correct use of electrical appliances has significantly improved human life quality. Unlike simple tools that can be manipulated with common sense, different parts of electrical appliances have specific functions defined by manufacturers. If we want the robot to heat bread by microwave, we should enable them to review the microwave manual first. From the manual, it can learn about component functio…

    Submitted 10 June, 2025; originally announced June 2025.

    Comments: CVPR 2025 Highlight

  8. arXiv:2506.07637  [pdf, ps, other]

    cs.CV cs.LG

    HieraEdgeNet: A Multi-Scale Edge-Enhanced Framework for Automated Pollen Recognition

    Authors: Yuchong Long, Wen Sun, Ningxiao Sun, Wenxiao Wang, Chao Li, Shan Yin

    Abstract: Automated pollen recognition is vital to paleoclimatology, biodiversity monitoring, and public health, yet conventional methods are hampered by inefficiency and subjectivity. Existing deep learning models often struggle to achieve the requisite localization accuracy for microscopic targets like pollen, which are characterized by their minute size, indistinct edges, and complex backgrounds. To over…

    Submitted 9 June, 2025; originally announced June 2025.

    Comments: 16 pages, 5 figures, 2 tables. The dataset is at https://www.kaggle.com/datasets/ayinven/hieraedgenetintegratesdatasets. The models are at https://huggingface.co/datasets/AyinMostima/HieraEdgeNetintegratesdatasets. The source code is at https://github.com/AyinMostima/PalynoKit

    MSC Class: 68T07; 68T45 ACM Class: I.2.10; I.4.9; I.5.4

  9. Learning dissection trajectories from expert surgical videos via imitation learning with equivariant diffusion

    Authors: Hongyu Wang, Yonghao Long, Yueyao Chen, Hon-Chi Yip, Markus Scheppach, Philip Wai-Yan Chiu, Yeung Yam, Helen Mei-Ling Meng, Qi Dou

    Abstract: Endoscopic Submucosal Dissection (ESD) is a well-established technique for removing epithelial lesions. Predicting dissection trajectories in ESD videos offers significant potential for enhancing surgical skill training and simplifying the learning process, yet this area remains underexplored. While imitation learning has shown promise in acquiring skills from expert demonstrations, challenges per…

    Submitted 5 June, 2025; originally announced June 2025.

  10. arXiv:2505.12439  [pdf, ps, other]

    cs.CL

    Learning to Play Like Humans: A Framework for LLM Adaptation in Interactive Fiction Games

    Authors: Jinming Zhang, Yunfei Long

    Abstract: Interactive Fiction games (IF games) are where players interact through natural language commands. While recent advances in Artificial Intelligence agents have reignited interest in IF games as a domain for studying decision-making, existing approaches prioritize task-specific performance metrics over human-like comprehension of narrative context and gameplay logic. This work presents a cognitivel…

    Submitted 18 May, 2025; originally announced May 2025.

  11. arXiv:2505.12288  [pdf, ps, other]

    eess.AS cs.SD

    Unified Architecture and Unsupervised Speech Disentanglement for Speaker Embedding-Free Enrollment in Personalized Speech Enhancement

    Authors: Ziling Huang, Haixin Guan, Yanhua Long

    Abstract: Conventional speech enhancement (SE) aims to improve speech perception and intelligibility by suppressing noise without requiring enrollment speech as reference, whereas personalized SE (PSE) addresses the cocktail party problem by extracting a target speaker's speech using enrollment speech. While these two tasks tackle different yet complementary challenges in speech signal processing, they ofte…

    Submitted 18 May, 2025; originally announced May 2025.

    Comments: Submitted to the IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP)

  12. arXiv:2505.11889  [pdf, other]

    eess.AS cs.AI cs.SD

    Exploring the Potential of SSL Models for Sound Event Detection

    Authors: Hanfang Cui, Longfei Song, Li Li, Dongxing Xu, Yanhua Long

    Abstract: Self-supervised learning (SSL) models offer powerful representations for sound event detection (SED), yet their synergistic potential remains underexplored. This study systematically evaluates state-of-the-art SSL models to guide optimal model selection and integration for SED. We propose a framework that combines heterogeneous SSL representations (e.g., BEATs, HuBERT, WavLM) through three fusion…

    Submitted 17 May, 2025; originally announced May 2025.

    Comments: 27 pages, 5 figures, submitted to the Journal of King Saud University - Computer and Information Sciences (under review)

    ACM Class: I.5.4; I.2.10; H.5.5

  13. arXiv:2505.01888  [pdf, ps, other]

    cs.CV

    Rethinking Score Distilling Sampling for 3D Editing and Generation

    Authors: Xingyu Miao, Haoran Duan, Yang Long, Jungong Han

    Abstract: Score Distillation Sampling (SDS) has emerged as a prominent method for text-to-3D generation by leveraging the strengths of 2D diffusion models. However, SDS is limited to generation tasks and lacks the capability to edit existing 3D assets. Conversely, variants of SDS that introduce editing capabilities often cannot generate new 3D assets effectively. In this work, we observe that the processes…

    Submitted 3 May, 2025; originally announced May 2025.

  14. arXiv:2504.15211  [pdf, ps, other]

    cs.AI stat.AP

    Position: Bayesian Statistics Facilitates Stakeholder Participation in Evaluation of Generative AI

    Authors: Yanan Long

    Abstract: The evaluation of Generative AI (GenAI) systems plays a critical role in public policy and decision-making, yet existing methods are often limited by reliance on benchmark-driven, point-estimate comparisons that fail to capture uncertainty and broader societal impacts. This paper argues for the use of Bayesian statistics as a principled framework to address these challenges. Bayesian methods enabl…

    Submitted 21 April, 2025; originally announced April 2025.

    Comments: To be presented at ACM CHI 2025 workshop STAIG

  15. arXiv:2504.12204  [pdf, other]

    cs.CV cs.MM

    Towards Realistic Low-Light Image Enhancement via ISP Driven Data Modeling

    Authors: Zhihua Wang, Yu Long, Qinghua Lin, Kai Zhang, Yazhu Zhang, Yuming Fang, Li Liu, Xiaochun Cao

    Abstract: Deep neural networks (DNNs) have recently become the leading method for low-light image enhancement (LLIE). However, despite significant progress, their outputs may still exhibit issues such as amplified noise, incorrect white balance, or unnatural enhancements when deployed in real world applications. A key challenge is the lack of diverse, large scale training data that captures the complexities…

    Submitted 16 April, 2025; originally announced April 2025.

    Comments: 17 pages, 11 tables, 10 figures

  16. arXiv:2504.09086  [pdf, other]

    cs.CV

    RICCARDO: Radar Hit Prediction and Convolution for Camera-Radar 3D Object Detection

    Authors: Yunfei Long, Abhinav Kumar, Xiaoming Liu, Daniel Morris

    Abstract: Radar hits reflect from points on both the boundary and internal to object outlines. This results in a complex distribution of radar hits that depends on factors including object category, size, and orientation. Current radar-camera fusion methods implicitly account for this with a black-box neural network. In this paper, we explicitly utilize a radar hit distribution model to assist fusion. First…

    Submitted 12 April, 2025; originally announced April 2025.

    Comments: CVPR 2025

  17. arXiv:2504.05411  [pdf, other]

    cs.CL cs.LG

    Less but Better: Parameter-Efficient Fine-Tuning of Large Language Models for Personality Detection

    Authors: Lingzhi Shen, Yunfei Long, Xiaohao Cai, Guanming Chen, Imran Razzak, Shoaib Jameel

    Abstract: Personality detection automatically identifies an individual's personality from various data sources, such as social media texts. However, as the parameter scale of language models continues to grow, the computational cost becomes increasingly difficult to manage. Fine-tuning also grows more complex, making it harder to justify the effort and reliably predict outcomes. We introduce a novel paramet…

    Submitted 7 April, 2025; originally announced April 2025.

  18. arXiv:2504.03738  [pdf, other]

    cs.LG cs.AI cs.CV

    Attention in Diffusion Model: A Survey

    Authors: Litao Hua, Fan Liu, Jie Su, Xingyu Miao, Zizhou Ouyang, Zeyu Wang, Runze Hu, Zhenyu Wen, Bing Zhai, Yang Long, Haoran Duan, Yuan Zhou

    Abstract: Attention mechanisms have become a foundational component in diffusion models, significantly influencing their capacity across a wide range of generative and discriminative tasks. This paper presents a comprehensive survey of attention within diffusion models, systematically analysing its roles, design patterns, and operations across different modalities and tasks. We propose a unified taxonomy th…

    Submitted 1 April, 2025; originally announced April 2025.

  19. arXiv:2504.02146  [pdf, other]

    cs.CL cs.LG

    LL4G: Self-Supervised Dynamic Optimization for Graph-Based Personality Detection

    Authors: Lingzhi Shen, Yunfei Long, Xiaohao Cai, Guanming Chen, Yuhan Wang, Imran Razzak, Shoaib Jameel

    Abstract: Graph-based personality detection constructs graph structures from textual data, particularly social media posts. Current methods often struggle with sparse or noisy data and rely on static graphs, limiting their ability to capture dynamic changes between nodes and relationships. This paper introduces LL4G, a self-supervised framework leveraging large language models (LLMs) to optimize graph neura…

    Submitted 2 April, 2025; originally announced April 2025.

  20. arXiv:2504.00061  [pdf]

    cs.CL cs.AI

    Evaluating the Feasibility and Accuracy of Large Language Models for Medical History-Taking in Obstetrics and Gynecology

    Authors: Dou Liu, Ying Long, Sophia Zuoqiu, Tian Tang, Rong Yin

    Abstract: Effective physician-patient communications in pre-diagnostic environments, and most specifically in complex and sensitive medical areas such as infertility, are critical but consume a lot of time and, therefore, cause clinic workflows to become inefficient. Recent advancements in Large Language Models (LLMs) offer a potential solution for automating conversational medical history-taking and improv…

    Submitted 31 March, 2025; originally announced April 2025.

    Comments: Accepted by IISE 2025 annual conference

  21. arXiv:2503.23217  [pdf, other]

    cs.DS

    Length-Constrained Directed Expander Decomposition and Length-Constrained Vertex-Capacitated Flow Shortcuts

    Authors: Bernhard Haeupler, Yaowei Long, Thatchaphol Saranurak, Shengzhe Wang

    Abstract: We show the existence of length-constrained expander decomposition in directed graphs and undirected vertex-capacitated graphs. Previously, its existence was shown only in undirected edge-capacitated graphs [Haeupler-Räcke-Ghaffari, STOC 2022; Haeupler-Hershkowitz-Tan, FOCS 2024]. Along the way, we prove the multi-commodity maxflow-mincut theorems for length-constrained expansion in both directed…

    Submitted 29 March, 2025; originally announced March 2025.

  22. arXiv:2503.21080  [pdf, other]

    cs.CL

    EQ-Negotiator: An Emotion-Reasoning LLM Agent in Credit Dialogues

    Authors: Yuhan Liu, Yunbo Long

    Abstract: While large language model (LLM)-based chatbots have been applied for effective engagement in credit dialogues, their capacity for dynamic emotional expression remains limited. Current agents primarily rely on passive empathy rather than affective reasoning. For instance, when faced with persistent client negativity, the agent should employ strategic emotional adaptation by expressing measured ang…

    Submitted 31 March, 2025; v1 submitted 26 March, 2025; originally announced March 2025.

  23. arXiv:2503.17994  [pdf, other]

    cs.CL cs.AI

    Instructing the Architecture Search for Spatial-temporal Sequence Forecasting with LLM

    Authors: Xin Xue, Haoyi Zhou, Tianyu Chen, Shuai Zhang, Yizhou Long, Jianxin Li

    Abstract: Spatial-temporal sequence forecasting (STSF) is a long-standing research problem with widespread real-world applications. Neural architecture search (NAS), which automates the neural network design, has been shown effective in tackling the STSF problem. However, the existing NAS methods for STSF focus on generating architectures in a time-consuming data-driven fashion, which heavily limits their a…

    Submitted 23 March, 2025; originally announced March 2025.

  24. arXiv:2503.17530  [pdf, other]

    cs.CV

    FMDConv: Fast Multi-Attention Dynamic Convolution via Speed-Accuracy Trade-off

    Authors: Tianyu Zhang, Fan Wan, Haoran Duan, Kevin W. Tong, Jingjing Deng, Yang Long

    Abstract: Spatial convolution is fundamental in constructing deep Convolutional Neural Networks (CNNs) for visual recognition. While dynamic convolution enhances model accuracy by adaptively combining static kernels, it incurs significant computational overhead, limiting its deployment in resource-constrained environments such as federated edge computing. To address this, we propose Fast Multi-Attention Dyn…

    Submitted 21 March, 2025; originally announced March 2025.

  25. arXiv:2503.12220  [pdf, other]

    cs.LG cs.CR

    PA-CFL: Privacy-Adaptive Clustered Federated Learning for Transformer-Based Sales Forecasting on Heterogeneous Retail Data

    Authors: Yunbo Long, Liming Xu, Ge Zheng, Alexandra Brintrup

    Abstract: Federated learning (FL) enables retailers to share model parameters for demand forecasting while maintaining privacy. However, heterogeneous data across diverse regions, driven by factors such as varying consumer behavior, poses challenges to the effectiveness of federated learning. To tackle this challenge, we propose Privacy-Adaptive Clustered Federated Learning (PA-CFL) tailored for demand fore…

    Submitted 21 March, 2025; v1 submitted 15 March, 2025; originally announced March 2025.

  26. arXiv:2503.12156  [pdf, other]

    cs.LG cs.SI

    Efficient and Privacy-Preserved Link Prediction via Condensed Graphs

    Authors: Yunbo Long, Liming Xu, Alexandra Brintrup

    Abstract: Link prediction is crucial for uncovering hidden connections within complex networks, enabling applications such as identifying potential customers and products. However, this research faces significant challenges, including concerns about data privacy, as well as high computational and storage costs, especially when dealing with large-scale networks. Condensed graphs, which are much smaller than…

    Submitted 15 March, 2025; originally announced March 2025.

  27. arXiv:2503.07539  [pdf, other]

    cs.CL

    XIFBench: Evaluating Large Language Models on Multilingual Instruction Following

    Authors: Zhenyu Li, Kehai Chen, Yunfei Long, Xuefeng Bai, Yaoyin Zhang, Xuchen Wei, Juntao Li, Min Zhang

    Abstract: Large Language Models (LLMs) have demonstrated remarkable instruction-following capabilities across various applications. However, their performance in multilingual settings remains poorly understood, as existing evaluations lack fine-grained constraint analysis. We introduce XIFBench, a comprehensive constraint-based benchmark for assessing multilingual instruction-following abilities of LLMs, fe…

    Submitted 10 March, 2025; originally announced March 2025.

  28. arXiv:2503.04469  [pdf]

    physics.med-ph cs.LG

    An artificially intelligent magnetic resonance spectroscopy quantification method: Comparison between QNet and LCModel on the cloud computing platform CloudBrain-MRS

    Authors: Meijin Lin, Lin Guo, Dicheng Chen, Jianshu Chen, Zhangren Tu, Xu Huang, Jianhua Wang, Ji Qi, Yuan Long, Zhiguo Huang, Di Guo, Xiaobo Qu, Haiwei Han

    Abstract: Objectives: This work aimed to statistically compare the metabolite quantification of human brain magnetic resonance spectroscopy (MRS) between the deep learning method QNet and the classical method LCModel through an easy-to-use intelligent cloud computing platform CloudBrain-MRS. Materials and Methods: In this retrospective study, two 3 T MRI scanners Philips Ingenia and Achieva collected 61 and…

    Submitted 6 March, 2025; originally announced March 2025.

  29. arXiv:2503.04453  [pdf]

    stat.ML cs.LG physics.med-ph

    Reproducibility Assessment of Magnetic Resonance Spectroscopy of Pregenual Anterior Cingulate Cortex across Sessions and Vendors via the Cloud Computing Platform CloudBrain-MRS

    Authors: Runhan Chen, Meijin Lin, Jianshu Chen, Liangjie Lin, Jiazheng Wang, Xiaoqing Li, Jianhua Wang, Xu Huang, Ling Qian, Shaoxing Liu, Yuan Long, Di Guo, Xiaobo Qu, Haiwei Han

    Abstract: Given the need to elucidate the mechanisms underlying illnesses and their treatment, as well as the lack of harmonization of acquisition and post-processing protocols among different magnetic resonance system vendors, this work is to determine if metabolite concentrations obtained from different sessions, machine models and even different vendors of 3 T scanners can be highly reproducible and be p…

    Submitted 6 March, 2025; originally announced March 2025.

  30. arXiv:2503.04153  [pdf, other]

    cs.AI

    KidneyTalk-open: No-code Deployment of a Private Large Language Model with Medical Documentation-Enhanced Knowledge Database for Kidney Disease

    Authors: Yongchao Long, Chao Yang, Gongzheng Tang, Jinwei Wang, Zhun Sui, Yuxi Zhou, Shenda Hong, Luxia Zhang

    Abstract: Privacy-preserving medical decision support for kidney disease requires localized deployment of large language models (LLMs) while maintaining clinical reasoning capabilities. Current solutions face three challenges: 1) Cloud-based LLMs pose data security risks; 2) Local model deployment demands technical expertise; 3) General LLMs lack mechanisms to integrate medical knowledge. Retrieval-augmente…

    Submitted 6 March, 2025; originally announced March 2025.

  31. arXiv:2503.02519  [pdf, ps, other]

    cs.CL

    Generator-Assistant Stepwise Rollback Framework for Large Language Model Agent

    Authors: Xingzuo Li, Kehai Chen, Yunfei Long, Xuefeng Bai, Yong Xu, Min Zhang

    Abstract: Large language model (LLM) agents typically adopt a step-by-step reasoning framework, in which they interleave the processes of thinking and acting to accomplish the given task. However, this paradigm faces a deep-rooted one-pass issue whereby each generated intermediate thought is plugged into the trajectory regardless of its correctness, which can cause irreversible error propagation. To address…

    Submitted 3 June, 2025; v1 submitted 4 March, 2025; originally announced March 2025.

  32. arXiv:2503.02161  [pdf, other]

    cs.LG

    LLM-TabFlow: Synthetic Tabular Data Generation with Inter-column Logical Relationship Preservation

    Authors: Yunbo Long, Liming Xu, Alexandra Brintrup

    Abstract: Synthetic tabular data have widespread applications in industrial domains such as healthcare, finance, and supply chains, owing to their potential to protect privacy and mitigate data scarcity. However, generating realistic synthetic tabular data while preserving inter-column logical relationships remains a significant challenge for the existing generative models. To address these challenges, we p…

    Submitted 3 March, 2025; originally announced March 2025.

  33. arXiv:2503.00407  [pdf]

    cs.LG cs.DC

    Asynchronous Personalized Federated Learning through Global Memorization

    Authors: Fan Wan, Yuchen Li, Xueqi Qiu, Rui Sun, Leyuan Zhang, Xingyu Miao, Tianyu Zhang, Haoran Duan, Yang Long

    Abstract: The proliferation of Internet of Things devices and advances in communication technology have unleashed an explosion of personal data, amplifying privacy concerns amid stringent regulations like GDPR and CCPA. Federated Learning offers a privacy preserving solution by enabling collaborative model training across decentralized devices without centralizing sensitive data. However, statistical hetero…

    Submitted 1 March, 2025; originally announced March 2025.

  34. arXiv:2502.05320  [pdf, other]

    cs.CV

    Towards Fine-grained Renal Vasculature Segmentation: Full-Scale Hierarchical Learning with FH-Seg

    Authors: Yitian Long, Zhongze Wu, Xiu Su, Lining Yu, Ruining Deng, Haichun Yang, Yuankai Huo

    Abstract: Accurate fine-grained segmentation of the renal vasculature is critical for nephrological analysis, yet it faces challenges due to diverse and insufficiently annotated images. Existing methods struggle to accurately segment intricate regions of the renal vasculature, such as the inner and outer walls, arteries and lesions. In this paper, we introduce FH-Seg, a Full-scale Hierarchical Learning Fram…

    Submitted 7 February, 2025; originally announced February 2025.

  35. arXiv:2502.04055  [pdf, ps, other]

    cs.LG

    Evaluating Inter-Column Logical Relationships in Synthetic Tabular Data Generation

    Authors: Yunbo Long, Liming Xu, Alexandra Brintrup

    Abstract: Current evaluations of synthetic tabular data mainly focus on how well joint distributions are modeled, often overlooking the assessment of their effectiveness in preserving realistic event sequences and coherent entity relationships across columns. This paper proposes three evaluation metrics designed to assess the preservation of logical relationships among columns in synthetic tabular data. We v…

    Submitted 6 February, 2025; originally announced February 2025.

  36. Laser: Efficient Language-Guided Segmentation in Neural Radiance Fields

    Authors: Xingyu Miao, Haoran Duan, Yang Bai, Tejal Shah, Jun Song, Yang Long, Rajiv Ranjan, Ling Shao

    Abstract: In this work, we propose a method that leverages CLIP feature distillation, achieving efficient 3D segmentation through language guidance. Unlike previous methods that rely on multi-scale CLIP features and are limited by processing speed and storage requirements, our approach aims to streamline the workflow by directly and effectively distilling dense CLIP features, thereby achieving precise segme…

    Submitted 31 January, 2025; originally announced January 2025.

    Comments: Accepted by IEEE Transactions on Pattern Analysis and Machine Intelligence

  37. arXiv:2501.16925  [pdf, other]

    cs.CL

    Detecting harassment and defamation in cyberbullying with emotion-adaptive training

    Authors: Peiling Yi, Arkaitz Zubiaga, Yunfei Long

    Abstract: Existing research on detecting cyberbullying incidents on social media has primarily concentrated on harassment and is typically approached as a binary classification task. However, cyberbullying encompasses various forms, such as denigration and harassment, which celebrities frequently face. Furthermore, suitable training data for these diverse forms of cyberbullying remains scarce. In this study…

    Submitted 28 January, 2025; originally announced January 2025.

  38. arXiv:2501.16884  [pdf, ps, other]

    cs.CL cs.AI

    Irony Detection, Reasoning and Understanding in Zero-shot Learning

    Authors: Peiling Yi, Yuhan Xia, Yunfei Long

    Abstract: The generalisation of irony detection faces significant challenges, leading to substantial performance deviations when detection models are applied to diverse real-world scenarios. In this study, we find that irony-focused prompts, as generated from our IDADP framework for LLMs, can not only overcome dataset-specific limitations but also generate coherent, human-readable reasoning, transforming ir…

    Submitted 11 June, 2025; v1 submitted 28 January, 2025; originally announced January 2025.

  39. arXiv:2501.15696  [pdf, other]

    cs.LG

    Random Walk Guided Hyperbolic Graph Distillation

    Authors: Yunbo Long, Liming Xu, Stefan Schoepf, Alexandra Brintrup

    Abstract: Graph distillation (GD) is an effective approach to extract useful information from large-scale network structures. However, existing methods, which operate in Euclidean space to generate condensed graphs, struggle to capture the inherent tree-like geometry of real-world networks, resulting in distilled graphs with limited task-specific information for downstream tasks. Furthermore, these methods…

    Submitted 26 January, 2025; originally announced January 2025.

  40. arXiv:2501.12380  [pdf, other]

    cs.CV cs.AI cs.CL

    MMVU: Measuring Expert-Level Multi-Discipline Video Understanding

    Authors: Yilun Zhao, Lujing Xie, Haowei Zhang, Guo Gan, Yitao Long, Zhiyuan Hu, Tongyan Hu, Weiyuan Chen, Chuhan Li, Junyang Song, Zhijian Xu, Chengye Wang, Weifeng Pan, Ziyao Shangguan, Xiangru Tang, Zhenwen Liang, Yixin Liu, Chen Zhao, Arman Cohan

    Abstract: We introduce MMVU, a comprehensive expert-level, multi-discipline benchmark for evaluating foundation models in video understanding. MMVU includes 3,000 expert-annotated questions spanning 27 subjects across four core disciplines: Science, Healthcare, Humanities & Social Sciences, and Engineering. Compared to prior benchmarks, MMVU features three key advancements. First, it challenges models to ap…

    Submitted 21 January, 2025; originally announced January 2025.

  41. arXiv:2501.11274  [pdf, other]

    eess.AS cs.SD

    SEF-PNet: Speaker Encoder-Free Personalized Speech Enhancement with Local and Global Contexts Aggregation

    Authors: Ziling Huang, Haixin Guan, Haoran Wei, Yanhua Long

    Abstract: Personalized speech enhancement (PSE) methods typically rely on pre-trained speaker verification models or self-designed speaker encoders to extract target speaker clues, guiding the PSE model in isolating the desired speech. However, these approaches suffer from significant model complexity and often underutilize enrollment speaker information, limiting the potential performance of the PSE model.…

    Submitted 20 January, 2025; originally announced January 2025.

    Comments: Accepted by ICASSP 2025

  42. arXiv:2501.04529  [pdf, other]

    cs.LG

    A Plug-and-Play Bregman ADMM Module for Inferring Event Branches in Temporal Point Processes

    Authors: Qingmei Wang, Yuxin Wu, Yujie Long, Jing Huang, Fengyuan Ran, Bing Su, Hongteng Xu

    Abstract: An event sequence generated by a temporal point process is often associated with a hidden and structured event branching process that captures the triggering relations between its historical and current events. In this study, we design a new plug-and-play module based on the Bregman ADMM (BADMM) algorithm, which infers event branches associated with event sequences in the maximum likelihood estima…

    Submitted 8 January, 2025; originally announced January 2025.

    Comments: Accepted at AAAI 2025

    MSC Class: 60G55; 62M10

  43. arXiv:2412.19085  [pdf, other]

    cs.LG

    Assessing Pre-Trained Models for Transfer Learning Through Distribution of Spectral Components

    Authors: Tengxue Zhang, Yang Shu, Xinyang Chen, Yifei Long, Chenjuan Guo, Bin Yang

    Abstract: Pre-trained model assessment for transfer learning aims to identify the optimal candidate for the downstream tasks from a model hub, without the need of time-consuming fine-tuning. Existing advanced works mainly focus on analyzing the intrinsic characteristics of the entire features extracted by each pre-trained model or how well such features fit the target labels. This paper proposes a novel per…

    Submitted 6 March, 2025; v1 submitted 26 December, 2024; originally announced December 2024.

    Comments: Accepted by AAAI 2025

  44. arXiv:2412.12685  [pdf, other]

    cs.CV

    SemStereo: Semantic-Constrained Stereo Matching Network for Remote Sensing

    Authors: Chen Chen, Liangjin Zhao, Yuanchun He, Yingxuan Long, Kaiqiang Chen, Zhirui Wang, Yanfeng Hu, Xian Sun

    Abstract: Semantic segmentation and 3D reconstruction are two fundamental tasks in remote sensing, typically treated as separate or loosely coupled tasks. Despite attempts to integrate them into a unified network, the constraints between the two heterogeneous tasks are not explicitly modeled, since the pioneering studies either utilize a loosely coupled parallel structure or engage in only implicit interact…

    Submitted 17 December, 2024; originally announced December 2024.

    Comments: 9 pages, 6 figures, AAAI 2025

  45. arXiv:2412.12164  [pdf, other]

    cs.LG cs.AI

    GAMED: Knowledge Adaptive Multi-Experts Decoupling for Multimodal Fake News Detection

    Authors: Lingzhi Shen, Yunfei Long, Xiaohao Cai, Imran Razzak, Guanming Chen, Kang Liu, Shoaib Jameel

    Abstract: Multimodal fake news detection often involves modelling heterogeneous data sources, such as vision and language. Existing detection methods typically rely on fusion effectiveness and cross-modal consistency to model the content, complicating understanding how each modality affects prediction accuracy. Additionally, these methods are primarily based on static feature modelling, making it difficult…

    Submitted 2 March, 2025; v1 submitted 11 December, 2024; originally announced December 2024.

  46. arXiv:2412.03603  [pdf, other]

    cs.CV

    HunyuanVideo: A Systematic Framework For Large Video Generative Models

    Authors: Weijie Kong, Qi Tian, Zijian Zhang, Rox Min, Zuozhuo Dai, Jin Zhou, Jiangfeng Xiong, Xin Li, Bo Wu, Jianwei Zhang, Kathrina Wu, Qin Lin, Junkun Yuan, Yanxin Long, Aladdin Wang, Andong Wang, Changlin Li, Duojun Huang, Fang Yang, Hao Tan, Hongmei Wang, Jacob Song, Jiawang Bai, Jianbing Wu, Jinbao Xue, et al. (27 additional authors not shown)

    Abstract: Recent advancements in video generation have significantly impacted daily life for both individuals and industries. However, the leading video generation models remain closed-source, resulting in a notable performance gap between industry capabilities and those available to the public. In this report, we introduce HunyuanVideo, an innovative open-source video foundation model that demonstrates per…

    Submitted 11 March, 2025; v1 submitted 3 December, 2024; originally announced December 2024.

  47. arXiv:2412.02929  [pdf, other]

    cs.CV cs.AI

    Panoptic Diffusion Models: co-generation of images and segmentation maps

    Authors: Yinghan Long, Kaushik Roy

    Abstract: Recently, diffusion models have demonstrated impressive capabilities in text-guided and image-conditioned image generation. However, existing diffusion models cannot simultaneously generate an image and a panoptic segmentation of objects and stuff from the prompt. Incorporating an inherent understanding of shapes and scene layouts can improve the creativity and realism of diffusion models. To addr…

    Submitted 22 February, 2025; v1 submitted 3 December, 2024; originally announced December 2024.

  48. arXiv:2412.02897  [pdf, other]

    cs.CL cs.AI

    MLD-EA: Check and Complete Narrative Coherence by Introducing Emotions and Actions

    Authors: Jinming Zhang, Yunfei Long

    Abstract: Narrative understanding and story generation are critical challenges in natural language processing (NLP), with much of the existing research focused on summarization and question-answering tasks. While previous studies have explored predicting plot endings and generating extended narratives, they often neglect the logical coherence within stories, leaving a significant gap in the field. To addres…

    Submitted 3 December, 2024; originally announced December 2024.

  49. arXiv:2411.11934  [pdf, other]

    cs.CV cs.AI

    SpatialDreamer: Self-supervised Stereo Video Synthesis from Monocular Input

    Authors: Zhen Lv, Yangqi Long, Congzhentao Huang, Cao Li, Chengfei Lv, Hao Ren, Dian Zheng

    Abstract: Stereo video synthesis from a monocular input is a demanding task in the fields of spatial computing and virtual reality. The main challenges of this task lie on the insufficiency of high-quality paired stereo videos for training and the difficulty of maintaining the spatio-temporal consistency between frames. Existing methods primarily address these issues by directly applying novel view synthesi…

    Submitted 27 April, 2025; v1 submitted 18 November, 2024; originally announced November 2024.

    Comments: Project website: https://spatialdreamer.github.io

  50. arXiv:2411.08418  [pdf]

    cs.AI

    Enhanced Classroom Dialogue Sequences Analysis with a Hybrid AI Agent: Merging Expert Rule-Base with Large Language Models

    Authors: Yun Long, Yu Zhang

    Abstract: Classroom dialogue plays a crucial role in fostering student engagement and deeper learning. However, analysing dialogue sequences has traditionally relied on either theoretical frameworks or empirical descriptions of practice, with limited integration between the two. This study addresses this gap by developing a comprehensive rule base of dialogue sequences and an Artificial Intelligence (AI) ag…

    Submitted 13 November, 2024; originally announced November 2024.