Showing 1–50 of 89 results for author: Zou, B

Searching in archive cs.
  1. arXiv:2507.03315  [pdf, ps, other]

    eess.IV cs.CV

    Towards Interpretable PolSAR Image Classification: Polarimetric Scattering Mechanism Informed Concept Bottleneck and Kolmogorov-Arnold Network

    Authors: Jinqi Zhang, Fangzhou Han, Di Zhuang, Lamei Zhang, Bin Zou, Li Yuan

    Abstract: In recent years, Deep Learning (DL) based methods have received extensive attention in the field of PolSAR image classification and show excellent performance. However, due to the "black-box" nature of DL methods, the interpretation of the high-dimensional features extracted and the backtracking of the decision-making process based on the features are still unresolved problems.…

    Submitted 4 July, 2025; originally announced July 2025.

  2. arXiv:2506.16371  [pdf, ps, other]

    cs.CV

    AGC-Drive: A Large-Scale Dataset for Real-World Aerial-Ground Collaboration in Driving Scenarios

    Authors: Yunhao Hou, Bochao Zou, Min Zhang, Ran Chen, Shangdong Yang, Yanmei Zhang, Junbao Zhuo, Siheng Chen, Jiansheng Chen, Huimin Ma

    Abstract: By sharing information across multiple agents, collaborative perception helps autonomous vehicles mitigate occlusions and improve overall perception accuracy. While most previous work focuses on vehicle-to-vehicle and vehicle-to-infrastructure collaboration, with limited attention to aerial perspectives provided by UAVs, which uniquely offer dynamic, top-down views to alleviate occlusions and monito…

    Submitted 19 June, 2025; originally announced June 2025.

  3. arXiv:2506.14224  [pdf, ps, other]

    cs.AI

    From Black Boxes to Transparent Minds: Evaluating and Enhancing the Theory of Mind in Multimodal Large Language Models

    Authors: Xinyang Li, Siqi Liu, Bochao Zou, Jiansheng Chen, Huimin Ma

    Abstract: As large language models evolve, there is growing anticipation that they will emulate human-like Theory of Mind (ToM) to assist with routine tasks. However, existing methods for evaluating machine ToM focus primarily on unimodal models and largely treat these models as black boxes, lacking an interpretative exploration of their internal mechanisms. In response, this study adopts an approach based…

    Submitted 17 June, 2025; originally announced June 2025.

    Comments: 24 pages, 22 figures, accepted at ICML 2025, project page: see https://annaisavailable.github.io/GridToM/

  4. arXiv:2506.12696  [pdf, ps, other]

    cs.LG

    TFKAN: Time-Frequency KAN for Long-Term Time Series Forecasting

    Authors: Xiaoyan Kui, Canwei Liu, Qinsong Li, Zhipeng Hu, Yangyang Shi, Weixin Si, Beiji Zou

    Abstract: Kolmogorov-Arnold Networks (KANs) are highly effective in long-term time series forecasting due to their ability to efficiently represent nonlinear relationships and exhibit local plasticity. However, prior research on KANs has predominantly focused on the time domain, neglecting the potential of the frequency domain. The frequency domain of time series data reveals recurring patterns and periodic…

    Submitted 14 June, 2025; originally announced June 2025.

    Comments: 11 pages, 5 figures

  5. arXiv:2505.23037  [pdf, ps, other]

    cs.CL

    Improving Multilingual Social Media Insights: Aspect-based Comment Analysis

    Authors: Longyin Zhang, Bowei Zou, Ai Ti Aw

    Abstract: The inherent nature of social media posts, characterized by the freedom of language use with a disjointed array of diverse opinions and topics, poses significant challenges to downstream NLP tasks such as comment clustering, comment summarization, and social media opinion analysis. To address this, we propose a granular level of identifying and generating aspect terms from individual comments to g…

    Submitted 28 May, 2025; originally announced May 2025.

    Comments: The paper was peer-reviewed

  6. arXiv:2505.19220  [pdf, other]

    cs.AI cs.CY

    DeCoDe: Defer-and-Complement Decision-Making via Decoupled Concept Bottleneck Models

    Authors: Chengbo He, Bochao Zou, Junliang Xing, Jiansheng Chen, Yuanchun Shi, Huimin Ma

    Abstract: In human-AI collaboration, a central challenge is deciding whether the AI should handle a task, be deferred to a human expert, or be addressed through collaborative effort. Existing Learning to Defer approaches typically make binary choices between AI and humans, neglecting their complementary strengths. They also lack interpretability, a critical property in high-stakes scenarios where users must…

    Submitted 25 May, 2025; originally announced May 2025.

  7. arXiv:2505.18996  [pdf, ps, other]

    cs.LG stat.ML

    Automatic and Structure-Aware Sparsification of Hybrid Neural ODEs

    Authors: Bob Junyi Zou, Lu Tian

    Abstract: Hybrid neural ordinary differential equations (neural ODEs) integrate mechanistic models with neural ODEs, offering strong inductive bias and flexibility, and are particularly advantageous in data-scarce healthcare settings. However, excessive latent states and interactions from mechanistic models can lead to training inefficiency and over-fitting, limiting practical effectiveness of hybrid neural…

    Submitted 25 May, 2025; originally announced May 2025.

  8. arXiv:2505.14725  [pdf, ps, other]

    q-bio.GN cs.LG stat.AP

    HR-VILAGE-3K3M: A Human Respiratory Viral Immunization Longitudinal Gene Expression Dataset for Systems Immunity

    Authors: Xuejun Sun, Yiran Song, Xiaochen Zhou, Ruilie Cai, Yu Zhang, Xinyi Li, Rui Peng, Jialiu Xie, Yuanyuan Yan, Muyao Tang, Prem Lakshmanane, Baiming Zou, James S. Hagood, Raymond J. Pickles, Didong Li, Fei Zou, Xiaojing Zheng

    Abstract: Respiratory viral infections pose a global health burden, yet the cellular immune responses driving protection or pathology remain unclear. Natural infection cohorts often lack pre-exposure baseline data and structured temporal sampling. In contrast, inoculation and vaccination trials generate insightful longitudinal transcriptomic data. However, the scattering of these datasets across platforms,…

    Submitted 19 May, 2025; originally announced May 2025.

  9. arXiv:2505.14671  [pdf, other]

    cs.CV

    UniCTokens: Boosting Personalized Understanding and Generation via Unified Concept Tokens

    Authors: Ruichuan An, Sihan Yang, Renrui Zhang, Zijun Shen, Ming Lu, Gaole Dai, Hao Liang, Ziyu Guo, Shilin Yan, Yulin Luo, Bocheng Zou, Chaoqun Yang, Wentao Zhang

    Abstract: Personalized models have demonstrated remarkable success in understanding and generating concepts provided by users. However, existing methods use separate concept tokens for understanding and generation, treating these tasks in isolation. This may result in limitations for generating images with complex prompts. For example, given the concept $\langle bo\rangle$, generating "$\langle bo\rangle$ w…

    Submitted 22 May, 2025; v1 submitted 20 May, 2025; originally announced May 2025.

  10. arXiv:2504.21281  [pdf, other]

    cs.CV

    Mamba Based Feature Extraction And Adaptive Multilevel Feature Fusion For 3D Tumor Segmentation From Multi-modal Medical Image

    Authors: Zexin Ji, Beiji Zou, Xiaoyan Kui, Hua Li, Pierre Vera, Su Ruan

    Abstract: Multi-modal 3D medical image segmentation aims to accurately identify tumor regions across different modalities, facing challenges from variations in image intensity and tumor morphology. Traditional convolutional neural network (CNN)-based methods struggle with capturing global features, while Transformers-based methods, despite effectively capturing global context, encounter high computational c…

    Submitted 29 April, 2025; originally announced April 2025.

  11. arXiv:2504.10105  [pdf, other]

    cs.CV

    Global and Local Mamba Network for Multi-Modality Medical Image Super-Resolution

    Authors: Zexin Ji, Beiji Zou, Xiaoyan Kui, Sebastien Thureau, Su Ruan

    Abstract: Convolutional neural networks and Transformers have made significant progress in multi-modality medical image super-resolution. However, these methods either have a fixed receptive field for local learning or significant computational burdens for global learning, limiting the super-resolution performance. To solve this problem, State Space Models, notably Mamba, are introduced to efficiently model…

    Submitted 14 April, 2025; originally announced April 2025.

  12. arXiv:2504.04784  [pdf, other]

    cs.CV

    Disentangling Instruction Influence in Diffusion Transformers for Parallel Multi-Instruction-Guided Image Editing

    Authors: Hui Liu, Bin Zou, Suiyun Zhang, Kecheng Chen, Rui Liu, Haoliang Li

    Abstract: Instruction-guided image editing enables users to specify modifications using natural language, offering more flexibility and control. Among existing frameworks, Diffusion Transformers (DiTs) outperform U-Net-based diffusion models in scalability and performance. However, while real-world scenarios often require concurrent execution of multiple instructions, step-by-step editing suffers from accum…

    Submitted 7 April, 2025; originally announced April 2025.

    Comments: 14 pages, 8 figures

  13. arXiv:2503.21330  [pdf, other]

    cs.CE

    Large Language Models for Traffic and Transportation Research: Methodologies, State of the Art, and Future Opportunities

    Authors: Yimo Yan, Yejia Liao, Guanhao Xu, Ruili Yao, Huiying Fan, Jingran Sun, Xia Wang, Jonathan Sprinkle, Ziyan An, Meiyi Ma, Xi Cheng, Tong Liu, Zemian Ke, Bo Zou, Matthew Barth, Yong-Hong Kuo

    Abstract: The rapid rise of Large Language Models (LLMs) is transforming traffic and transportation research, with significant advancements emerging between the years 2023 and 2025 -- a period marked by the inception and swift growth of adopting and adapting LLMs for various traffic and transportation applications. However, despite these significant advancements, a systematic review and synthesis of the exi…

    Submitted 27 March, 2025; originally announced March 2025.

  14. arXiv:2503.07097  [pdf, other]

    eess.IV cs.CV

    A Comprehensive Survey on Magnetic Resonance Image Reconstruction

    Authors: Xiaoyan Kui, Zijie Fan, Zexin Ji, Qinsong Li, Chengtao Liu, Weixin Si, Beiji Zou

    Abstract: Magnetic resonance imaging (MRI) reconstruction is a fundamental task aimed at recovering high-quality images from undersampled or low-quality MRI data. This process enhances diagnostic accuracy and optimizes clinical applications. In recent years, deep learning-based MRI reconstruction has made significant progress. Advancements include single-modality feature extraction using different network a…

    Submitted 10 March, 2025; originally announced March 2025.

  15. arXiv:2503.06919  [pdf, other]

    eess.IV cs.CV

    CAFusion: Controllable Anatomical Synthesis of Perirectal Lymph Nodes via SDF-guided Diffusion

    Authors: Weidong Guo, Hantao Zhang, Shouhong Wan, Bingbing Zou, Wanqin Wang, Chenyang Qiu, Peiquan Jin

    Abstract: Lesion synthesis methods have made significant progress in generating large-scale synthetic datasets. However, existing approaches predominantly focus on texture synthesis and often fail to accurately model masks for anatomically complex lesions. Additionally, these methods typically lack precise control over the synthesis process. For example, perirectal lymph nodes, which range in diameter from…

    Submitted 10 March, 2025; originally announced March 2025.

  16. arXiv:2502.14795  [pdf, other]

    cs.RO cs.CV

    Humanoid-VLA: Towards Universal Humanoid Control with Visual Integration

    Authors: Pengxiang Ding, Jianfei Ma, Xinyang Tong, Binghong Zou, Xinxin Luo, Yiguo Fan, Ting Wang, Hongchao Lu, Panzhong Mo, Jinxin Liu, Yuefan Wang, Huaicheng Zhou, Wenshuo Feng, Jiacheng Liu, Siteng Huang, Donglin Wang

    Abstract: This paper addresses the limitations of current humanoid robot control frameworks, which primarily rely on reactive mechanisms and lack autonomous interaction capabilities due to data scarcity. We propose Humanoid-VLA, a novel framework that integrates language understanding, egocentric scene perception, and motion control, enabling universal humanoid control. Humanoid-VLA begins with language-mot…

    Submitted 21 February, 2025; v1 submitted 20 February, 2025; originally announced February 2025.

  17. arXiv:2501.16154  [pdf, other]

    cs.CL cs.AI

    AdaCoT: Rethinking Cross-Lingual Factual Reasoning through Adaptive Chain-of-Thought

    Authors: Xin Huang, Tarun Kumar Vangani, Zhengyuan Liu, Bowei Zou, Ai Ti Aw

    Abstract: Large language models have shown impressive multilingual capabilities through pretraining on diverse corpora. While these models show strong reasoning abilities, their performance varies significantly across languages due to imbalanced training data distribution. Existing approaches using sample-level translation for extensive multilingual pretraining and cross-lingual tuning face scalability chal…

    Submitted 9 May, 2025; v1 submitted 27 January, 2025; originally announced January 2025.

  18. arXiv:2501.00430  [pdf, other]

    cs.CL

    Enhancing LLM Reasoning with Multi-Path Collaborative Reactive and Reflection agents

    Authors: Chengbo He, Bochao Zou, Xin Li, Jiansheng Chen, Junliang Xing, Huimin Ma

    Abstract: Agents have demonstrated their potential in scientific reasoning tasks through large language models. However, they often face challenges such as insufficient accuracy and degeneration of thought when handling complex reasoning tasks, which impede their performance. To overcome these issues, we propose the Reactive and Reflection agents with Multi-Path Reasoning (RR-MP) Framework, aimed at enhanci…

    Submitted 2 January, 2025; v1 submitted 31 December, 2024; originally announced January 2025.

  19. arXiv:2412.09954  [pdf, other]

    cs.CV

    $\textrm{A}^{\textrm{2}}$RNet: Adversarial Attack Resilient Network for Robust Infrared and Visible Image Fusion

    Authors: Jiawei Li, Hongwei Yu, Jiansheng Chen, Xinlong Ding, Jinlong Wang, Jinyuan Liu, Bochao Zou, Huimin Ma

    Abstract: Infrared and visible image fusion (IVIF) is a crucial technique for enhancing visual performance by integrating unique information from different modalities into one fused image. Existing methods pay more attention to conducting fusion with undisturbed data, while overlooking the impact of deliberate interference on the effectiveness of fusion results. To investigate the robustness of fusion models…

    Submitted 13 February, 2025; v1 submitted 13 December, 2024; originally announced December 2024.

    Comments: 9 pages, 8 figures, The 39th Annual AAAI Conference on Artificial Intelligence

  20. arXiv:2411.13162  [pdf, other]

    cs.GT

    IC Mechanisms for Risk-Averse Advertisers in the Online Advertising System

    Authors: Bingzhe Wang, Ruohan Qian, Yuejia Dou, Qi Qi, Bo Shen, Changyuan Li, Yixuan Zhang, Yixin Su, Xin Yuan, Wenqiang liu, Bin Zou, Wen Yi, Zhi Guo, Shuanglong Li, Liu Lin

    Abstract: The autobidding system generates huge revenue for advertising platforms, garnering substantial research attention. Existing studies in autobidding systems focus on designing Autobidding Incentive Compatible (AIC) mechanisms, where the mechanism is Incentive Compatible (IC) under ex ante expectations. However, upon deploying AIC mechanisms in advertising platforms, we observe a notable deviation be…

    Submitted 20 November, 2024; originally announced November 2024.

  21. arXiv:2410.10818  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models

    Authors: Mu Cai, Reuben Tan, Jianrui Zhang, Bocheng Zou, Kai Zhang, Feng Yao, Fangrui Zhu, Jing Gu, Yiwu Zhong, Yuzhang Shang, Yao Dou, Jaden Park, Jianfeng Gao, Yong Jae Lee, Jianwei Yang

    Abstract: Understanding fine-grained temporal dynamics is crucial for multimodal video comprehension and generation. Due to the lack of fine-grained temporal annotations, existing video benchmarks mostly resemble static image benchmarks and are incompetent at evaluating models for temporal understanding. In this paper, we introduce TemporalBench, a new benchmark dedicated to evaluating fine-grained temporal…

    Submitted 15 October, 2024; v1 submitted 14 October, 2024; originally announced October 2024.

    Comments: Project Page: https://temporalbench.github.io/

  22. arXiv:2409.16767  [pdf, other]

    cs.LG

    Exploring Information-Theoretic Metrics Associated with Neural Collapse in Supervised Training

    Authors: Kun Song, Zhiquan Tan, Bochao Zou, Jiansheng Chen, Huimin Ma, Weiran Huang

    Abstract: In this paper, we introduce matrix entropy as an analytical tool for studying supervised learning, investigating the information content of data representations and classification head vectors, as well as the dynamic interactions between them during the supervised learning process. Our experimental results reveal that matrix entropy effectively captures the variations in information content of dat…

    Submitted 28 February, 2025; v1 submitted 25 September, 2024; originally announced September 2024.

    Comments: arXiv admin note: text overlap with arXiv:2406.03999

  23. arXiv:2409.09707  [pdf, other]

    cs.CV

    Synergistic Spotting and Recognition of Micro-Expression via Temporal State Transition

    Authors: Bochao Zou, Zizheng Guo, Wenfeng Qin, Xin Li, Kangsheng Wang, Huimin Ma

    Abstract: Micro-expressions are involuntary facial movements that cannot be consciously controlled, conveying subtle cues with substantial real-world applications. The analysis of micro-expressions generally involves two main tasks: spotting micro-expression intervals in long videos and recognizing the emotions associated with these intervals. Previous deep learning methods have primarily relied on classifi…

    Submitted 15 September, 2024; originally announced September 2024.

  24. arXiv:2408.16343  [pdf, other]

    cs.CV cs.AI

    Toward Robust Early Detection of Alzheimer's Disease via an Integrated Multimodal Learning Approach

    Authors: Yifei Chen, Shenghao Zhu, Zhaojie Fang, Chang Liu, Binfeng Zou, Yuhe Wang, Shuo Chang, Fan Jia, Feiwei Qin, Jin Fan, Yong Peng, Changmiao Wang

    Abstract: Alzheimer's Disease (AD) is a complex neurodegenerative disorder marked by memory loss, executive dysfunction, and personality changes. Early diagnosis is challenging due to subtle symptoms and varied presentations, often leading to misdiagnosis with traditional unimodal diagnostic methods due to their limited scope. This study introduces an advanced multimodal classification model that integrates…

    Submitted 3 January, 2025; v1 submitted 29 August, 2024; originally announced August 2024.

    Comments: 5 pages, 2 figures

    Journal ref: ICASSP 2025

  25. arXiv:2408.14977  [pdf, other]

    eess.IV cs.CV

    LN-Gen: Rectal Lymph Nodes Generation via Anatomical Features

    Authors: Weidong Guo, Hantao Zhang, Shouhong Wan, Bingbing Zou, Wanqin Wang, Peiquan Jin

    Abstract: Accurate segmentation of rectal lymph nodes is crucial for the staging and treatment planning of rectal cancer. However, the complexity of the surrounding anatomical structures and the scarcity of annotated data pose significant challenges. This study introduces a novel lymph node synthesis technique aimed at generating diverse and realistic synthetic rectal lymph node samples to mitigate the reli…

    Submitted 27 August, 2024; originally announced August 2024.

    Comments: 8 pages

  26. arXiv:2407.15719  [pdf, other]

    cs.CV cs.AI

    GFE-Mamba: Mamba-based AD Multi-modal Progression Assessment via Generative Feature Extraction from MCI

    Authors: Zhaojie Fang, Shenghao Zhu, Yifei Chen, Binfeng Zou, Fan Jia, Chang Liu, Xiang Feng, Linwei Qiu, Feiwei Qin, Jin Fan, Changbiao Chu, Changmiao Wang

    Abstract: Alzheimer's Disease (AD) is a progressive, irreversible neurodegenerative disorder that often originates from Mild Cognitive Impairment (MCI). This progression results in significant memory loss and severely affects patients' quality of life. Clinical trials have consistently shown that early and targeted interventions for individuals with MCI may slow or even prevent the advancement of AD. Resear…

    Submitted 29 January, 2025; v1 submitted 22 July, 2024; originally announced July 2024.

    Comments: 13 pages, 9 figures

  27. arXiv:2407.10972  [pdf, other]

    cs.CV cs.AI cs.LG

    VGBench: Evaluating Large Language Models on Vector Graphics Understanding and Generation

    Authors: Bocheng Zou, Mu Cai, Jianrui Zhang, Yong Jae Lee

    Abstract: In the realm of vision models, the primary mode of representation is using pixels to rasterize the visual world. Yet this is not always the best or unique way to represent visual content, especially for designers and artists who depict the world using geometry primitives such as polygons. Vector graphics (VG), on the other hand, offer a textual representation of visual content, which can be more c…

    Submitted 29 August, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

    Comments: Project Page: https://vgbench.github.io

  28. arXiv:2407.05993  [pdf, other]

    cs.CV

    Self-Prior Guided Mamba-UNet Networks for Medical Image Super-Resolution

    Authors: Zexin Ji, Beiji Zou, Xiaoyan Kui, Pierre Vera, Su Ruan

    Abstract: In this paper, we propose a self-prior guided Mamba-UNet network (SMamba-UNet) for medical image super-resolution. Existing methods are primarily based on convolutional neural networks (CNNs) or Transformers. CNNs-based methods fail to capture long-range dependencies, while Transformer-based approaches face heavy calculation challenges due to their quadratic computational complexity. Recently, Sta…

    Submitted 8 July, 2024; originally announced July 2024.

  29. arXiv:2407.05969  [pdf, other]

    cs.CV

    Deform-Mamba Network for MRI Super-Resolution

    Authors: Zexin Ji, Beiji Zou, Xiaoyan Kui, Pierre Vera, Su Ruan

    Abstract: In this paper, we propose a new architecture, called Deform-Mamba, for MR image super-resolution. Unlike conventional CNN or Transformer-based super-resolution approaches which encounter challenges related to the local receptive field or heavy computational cost, our approach aims to effectively explore the local and global information of images. Specifically, we develop a Deform-Mamba encoder wh…

    Submitted 8 July, 2024; originally announced July 2024.

  30. arXiv:2407.03647  [pdf, other]

    math.OC cs.AI

    WANCO: Weak Adversarial Networks for Constrained Optimization problems

    Authors: Gang Bao, Dong Wang, Boyi Zou

    Abstract: This paper focuses on integrating the networks and adversarial training into constrained optimization problems to develop a framework algorithm for constrained optimization problems. For such problems, we first transform them into minimax problems using the augmented Lagrangian method and then use two (or several) deep neural networks (DNNs) to represent the primal and dual variables respectively.…

    Submitted 4 July, 2024; originally announced July 2024.

    Comments: 24 pages, 18 figures

  31. arXiv:2406.09931  [pdf, other]

    eess.IV cs.CV cs.LG

    SCKansformer: Fine-Grained Classification of Bone Marrow Cells via Kansformer Backbone and Hierarchical Attention Mechanisms

    Authors: Yifei Chen, Zhu Zhu, Shenghao Zhu, Linwei Qiu, Binfeng Zou, Fan Jia, Yunpeng Zhu, Chenyan Zhang, Zhaojie Fang, Feiwei Qin, Jin Fan, Changmiao Wang, Yu Gao, Gang Yu

    Abstract: The incidence and mortality rates of malignant tumors, such as acute leukemia, have risen significantly. Clinically, hospitals rely on cytological examination of peripheral blood and bone marrow smears to diagnose malignant tumors, with accurate blood cell counting being crucial. Existing automated methods face challenges such as low feature expression capability, poor interpretability, and redund…

    Submitted 11 October, 2024; v1 submitted 14 June, 2024; originally announced June 2024.

    Comments: 14 pages, 6 figures

    Journal ref: IEEE Journal of Biomedical and Health Informatics 2024

  32. arXiv:2406.03999  [pdf, other]

    cs.LG cs.CV

    Unveiling the Dynamics of Information Interplay in Supervised Learning

    Authors: Kun Song, Zhiquan Tan, Bochao Zou, Huimin Ma, Weiran Huang

    Abstract: In this paper, we use matrix information theory as an analytical tool to analyze the dynamics of the information interplay between data representations and classification head vectors in the supervised learning process. Specifically, inspired by the theory of Neural Collapse, we introduce matrix mutual information ratio (MIR) and matrix entropy difference ratio (HDR) to assess the interactions of…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Accepted by ICML 2024

  33. arXiv:2405.02363  [pdf, other]

    cs.CV cs.CL

    LLM as Dataset Analyst: Subpopulation Structure Discovery with Large Language Model

    Authors: Yulin Luo, Ruichuan An, Bocheng Zou, Yiming Tang, Jiaming Liu, Shanghang Zhang

    Abstract: The distribution of subpopulations is an important property hidden within a dataset. Uncovering and analyzing the subpopulation distribution within datasets provides a comprehensive understanding of the datasets, standing as a powerful tool beneficial to various downstream tasks, including Dataset Subpopulation Organization, Subpopulation Shift, and Slice Discovery. Despite its importance, there h…

    Submitted 23 July, 2024; v1 submitted 3 May, 2024; originally announced May 2024.

    Comments: ECCV24 Camera Ready

  34. arXiv:2404.16362  [pdf, other]

    cs.CR

    Feature graph construction with static features for malware detection

    Authors: Binghui Zou, Chunjie Cao, Longjuan Wang, Yinan Cheng, Chenxi Dang, Ying Liu, Jingzhang Sun

    Abstract: Malware can greatly compromise the integrity and trustworthiness of information and is in a constant state of evolution. Existing feature fusion-based detection methods generally overlook the correlation between features. And mere concatenation of features will reduce the model's characterization ability, leading to low detection accuracy. Moreover, these methods are susceptible to concept drift and…

    Submitted 22 November, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

  35. arXiv:2404.08916  [pdf, other]

    cs.CV cs.LG

    Meply: A Large-scale Dataset and Baseline Evaluations for Metastatic Perirectal Lymph Node Detection and Segmentation

    Authors: Weidong Guo, Hantao Zhang, Shouhong Wan, Bingbing Zou, Wanqin Wang, Chenyang Qiu, Jun Li, Peiquan Jin

    Abstract: Accurate segmentation of metastatic lymph nodes in rectal cancer is crucial for the staging and treatment of rectal cancer. However, existing segmentation approaches face challenges due to the absence of pixel-level annotated datasets tailored for lymph nodes around the rectum. Additionally, metastatic lymph nodes are characterized by their relatively small size, irregular shapes, and lower contra…

    Submitted 13 April, 2024; originally announced April 2024.

    Comments: 13 pages

  36. arXiv:2404.06483  [pdf, other]

    cs.CV

    RhythmMamba: Fast, Lightweight, and Accurate Remote Physiological Measurement

    Authors: Bochao Zou, Zizheng Guo, Xiaocheng Hu, Huimin Ma

    Abstract: Remote photoplethysmography (rPPG) is a method for non-contact measurement of physiological signals from facial videos, holding great potential in various applications such as healthcare, affective computing, and anti-spoofing. Existing deep learning methods struggle to address two core issues of rPPG simultaneously: understanding the periodic pattern of rPPG among long contexts and addressing lar…

    Submitted 23 February, 2025; v1 submitted 9 April, 2024; originally announced April 2024.

  37. arXiv:2404.01013  [pdf, other]

    cs.CV cs.AI

    Teeth-SEG: An Efficient Instance Segmentation Framework for Orthodontic Treatment based on Anthropic Prior Knowledge

    Authors: Bo Zou, Shaofeng Wang, Hao Liu, Gaoyue Sun, Yajie Wang, FeiFei Zuo, Chengbin Quan, Youjian Zhao

    Abstract: Teeth localization, segmentation, and labeling in 2D images have great potential in modern dentistry to enhance dental diagnostics, treatment planning, and population-based studies on oral health. However, general instance segmentation frameworks are incompetent due to 1) the subtle differences between some teeth's shapes (e.g., maxillary first premolar and second premolar), 2) the teeth's position…

    Submitted 1 April, 2024; originally announced April 2024.

    Comments: This paper has been accepted by CVPR 2024

  38. arXiv:2404.00973  [pdf, other]

    cs.CV

    VideoDistill: Language-aware Vision Distillation for Video Question Answering

    Authors: Bo Zou, Chao Yang, Yu Qiao, Chengbin Quan, Youjian Zhao

    Abstract: Significant advancements in video question answering (VideoQA) have been made thanks to thriving large image-language pretraining frameworks. Although these image-language models can efficiently represent both video and language branches, they typically employ a goal-free vision perception process and do not interact vision with language well during the answer generation, thus omitting crucial vis…

    Submitted 1 April, 2024; originally announced April 2024.

    Comments: This paper is accepted by CVPR2024

  39. arXiv:2404.00913  [pdf, other]

    cs.CV cs.AI cs.CL

    LLaMA-Excitor: General Instruction Tuning via Indirect Feature Interaction

    Authors: Bo Zou, Chao Yang, Yu Qiao, Chengbin Quan, Youjian Zhao

    Abstract: Existing methods to fine-tune LLMs, like Adapter, Prefix-tuning, and LoRA, which introduce extra modules or additional input sequences to inject new skills or knowledge, may compromise the innate abilities of LLMs. In this paper, we propose LLaMA-Excitor, a lightweight method that stimulates the LLMs' potential to better follow instructions by gradually paying more attention to worthwhile informat…

    Submitted 1 April, 2024; originally announced April 2024.

    Comments: This paper is accepted by CVPR 2024

  40. arXiv:2403.20271  [pdf, other]

    cs.CV

    Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want

    Authors: Weifeng Lin, Xinyu Wei, Ruichuan An, Peng Gao, Bocheng Zou, Yulin Luo, Siyuan Huang, Shanghang Zhang, Hongsheng Li

    Abstract: In this paper, we present the Draw-and-Understand framework, exploring how to integrate visual prompting understanding capabilities into Multimodal Large Language Models (MLLMs). Visual prompts allow users to interact through multi-modal instructions, enhancing the models' interactivity and fine-grained image comprehension. In this framework, we propose a general architecture adaptable to differen…

    Submitted 22 February, 2025; v1 submitted 29 March, 2024; originally announced March 2024.

    Comments: 30 pages, 8 figures, 15 tables

  41. arXiv:2403.16358  [pdf, other]

    cs.CV

    ChebMixer: Efficient Graph Representation Learning with MLP Mixer

    Authors: Xiaoyan Kui, Haonan Yan, Qinsong Li, Liming Chen, Beiji Zou

    Abstract: Graph neural networks have achieved remarkable success in learning graph representations, especially graph Transformer, which has recently shown superior performance on various graph mining tasks. However, graph Transformer generally treats nodes as tokens, which results in quadratic complexity regarding the number of nodes during self-attention computation. The graph MLP Mixer addresses this chal…

    Submitted 4 June, 2024; v1 submitted 24 March, 2024; originally announced March 2024.

  42. arXiv:2402.17233  [pdf, other]

    cs.LG stat.AP stat.ME

    Hybrid$^2$ Neural ODE Causal Modeling and an Application to Glycemic Response

    Authors: Bob Junyi Zou, Matthew E. Levine, Dessi P. Zaharieva, Ramesh Johari, Emily B. Fox

    Abstract: Hybrid models composing mechanistic ODE-based dynamics with flexible and expressive neural network components have grown rapidly in popularity, especially in scientific domains where such ODE-based modeling offers important interpretability and validated causal grounding (e.g., for counterfactual reasoning). The incorporation of mechanistic models also provides inductive bias in standard blackbox…

    Submitted 11 June, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

  43. arXiv:2402.12788  [pdf, other]

    cs.CV

    RhythmFormer: Extracting Patterned rPPG Signals based on Periodic Sparse Attention

    Authors: Bochao Zou, Zizheng Guo, Jiansheng Chen, Junbao Zhuo, Weiran Huang, Huimin Ma

    Abstract: Remote photoplethysmography (rPPG) is a non-contact method for detecting physiological signals based on facial videos, holding high potential in various applications. Due to the periodic nature of rPPG signals, the long-range dependency capturing capacity of the transformer was assumed to be advantageous for such signals. However, existing methods have not conclusively demonstrated the superior…

    Submitted 20 February, 2025; v1 submitted 20 February, 2024; originally announced February 2024.

  44. arXiv:2401.00496  [pdf, other]

    cs.CV cs.AI cs.LG

    SAR-RARP50: Segmentation of surgical instrumentation and Action Recognition on Robot-Assisted Radical Prostatectomy Challenge

    Authors: Dimitrios Psychogyios, Emanuele Colleoni, Beatrice Van Amsterdam, Chih-Yang Li, Shu-Yu Huang, Yuchong Li, Fucang Jia, Baosheng Zou, Guotai Wang, Yang Liu, Maxence Boels, Jiayu Huo, Rachel Sparks, Prokar Dasgupta, Alejandro Granados, Sebastien Ourselin, Mengya Xu, An Wang, Yanan Wu, Long Bai, Hongliang Ren, Atsushi Yamada, Yuriko Harai, Yuto Ishikawa, Kazuyuki Hayashi, et al. (25 additional authors not shown)

    Abstract: Surgical tool segmentation and action recognition are fundamental building blocks in many computer-assisted intervention applications, ranging from surgical skills assessment to decision support systems. Nowadays, learning-based action recognition and segmentation approaches outperform classical methods, relying, however, on large, annotated datasets. Furthermore, action recognition and tool segme…

    Submitted 23 January, 2024; v1 submitted 31 December, 2023; originally announced January 2024.

  45. arXiv:2312.14705  [pdf, other]

    eess.IV cs.CV cs.LG

    SCUNet++: Swin-UNet and CNN Bottleneck Hybrid Architecture with Multi-Fusion Dense Skip Connection for Pulmonary Embolism CT Image Segmentation

    Authors: Yifei Chen, Binfeng Zou, Zhaoxin Guo, Yiyu Huang, Yifan Huang, Feiwei Qin, Qinhai Li, Changmiao Wang

    Abstract: Pulmonary embolism (PE) is a prevalent lung disease that can lead to right ventricular hypertrophy and failure in severe cases, ranking second in severity only to myocardial infarction and sudden death. Pulmonary artery CT angiography (CTPA) is a widely used diagnostic method for PE. However, PE detection presents challenges in clinical practice due to limitations in imaging technology. CTPA can p…

    Submitted 2 January, 2024; v1 submitted 22 December, 2023; originally announced December 2023.

    Comments: 10 pages, 7 figures, accepted at WACV 2024

    Journal ref: WACV 2024

  46. arXiv:2312.05834  [pdf, other]

    cs.CL cs.AI

    Evidence-based Interpretable Open-domain Fact-checking with Large Language Models

    Authors: Xin Tan, Bowei Zou, Ai Ti Aw

    Abstract: Universal fact-checking systems for real-world claims face significant challenges in gathering valid and sufficient real-time evidence and making reasoned decisions. In this work, we introduce the Open-domain Explainable Fact-checking (OE-Fact) system for claim-checking in real-world scenarios. The OE-Fact system can leverage the powerful understanding and reasoning capabilities of large language…

    Submitted 10 December, 2023; originally announced December 2023.

  47. arXiv:2312.02923  [pdf, other]

    cs.CV

    MoSA: Mixture of Sparse Adapters for Visual Efficient Tuning

    Authors: Qizhe Zhang, Bocheng Zou, Ruichuan An, Jiaming Liu, Shanghang Zhang

    Abstract: With the rapid growth in the scale of pre-trained foundation models, parameter-efficient fine-tuning techniques have gained significant attention, among which Adapter Tuning is the most widely used. Despite achieving efficiency, it still underperforms full fine-tuning, and the performance improves at the cost of an increase in parameters. Recent efforts have either focused on training multiple ada…

    Submitted 23 March, 2024; v1 submitted 5 December, 2023; originally announced December 2023.

    Comments: 16 pages, 7 figures. Official code: https://github.com/Theia-4869/MoSA

  48. arXiv:2310.15105  [pdf, other]

    cs.CV

    FD-Align: Feature Discrimination Alignment for Fine-tuning Pre-Trained Models in Few-Shot Learning

    Authors: Kun Song, Huimin Ma, Bochao Zou, Huishuai Zhang, Weiran Huang

    Abstract: Due to the limited availability of data, existing few-shot learning methods trained from scratch fail to achieve satisfactory performance. In contrast, large-scale pre-trained models such as CLIP demonstrate remarkable few-shot and zero-shot capabilities. To enhance the performance of pre-trained models for downstream tasks, fine-tuning the model on downstream data is frequently necessary. However…

    Submitted 17 November, 2023; v1 submitted 23 October, 2023; originally announced October 2023.

    Comments: Accepted by NeurIPS 2023

  49. arXiv:2308.08283  [pdf, other]

    eess.IV cs.CV cs.LG

    CARE: A Large Scale CT Image Dataset and Clinical Applicable Benchmark Model for Rectal Cancer Segmentation

    Authors: Hantao Zhang, Weidong Guo, Chenyang Qiu, Shouhong Wan, Bingbing Zou, Wanqin Wang, Peiquan Jin

    Abstract: Rectal cancer segmentation of CT images plays a crucial role in timely clinical diagnosis, radiotherapy treatment, and follow-up. Although current segmentation methods have shown promise in delineating cancerous tissues, they still encounter challenges in achieving high segmentation precision. These obstacles arise from the intricate anatomical structures of the rectum and the difficulties in perfo…

    Submitted 16 August, 2023; originally announced August 2023.

    Comments: 8 pages

  50. arXiv:2305.16048  [pdf, other]

    cs.CL cs.AI

    UFO: Unified Fact Obtaining for Commonsense Question Answering

    Authors: Zhifeng Li, Yifan Fan, Bowei Zou, Yu Hong

    Abstract: Leveraging external knowledge to enhance reasoning ability is crucial for commonsense question answering. However, existing knowledge bases rely heavily on manual annotation, which unavoidably causes deficiencies in coverage of worldwide commonsense knowledge. Accordingly, the knowledge bases fail to be flexible enough to support reasoning over diverse questions. Recently, large-scale la…

    Submitted 25 May, 2023; originally announced May 2023.

    Comments: IJCNN 2023