
Showing 1–50 of 110 results for author: Razzak, I

  1. arXiv:2510.22630  [pdf, ps, other]

    cs.CV

    Robust Atypical Mitosis Classification with DenseNet121: Stain-Aware Augmentation and Hybrid Loss for Domain Generalization

    Authors: Adinath Dukre, Ankan Deria, Yutong Xie, Imran Razzak

    Abstract: Atypical mitotic figures are important biomarkers of tumor aggressiveness in histopathology, yet reliable recognition remains challenging due to severe class imbalance and variability across imaging domains. We present a DenseNet-121-based framework tailored for atypical mitosis classification in the MIDOG 2025 (Track 2) setting. Our method integrates stain-aware augmentation (Macenko), geometric…

    Submitted 26 October, 2025; originally announced October 2025.

    Comments: MIDOG 2025 MICCAI Workshop accepted

  2. arXiv:2510.20933  [pdf, ps, other]

    cs.CV cs.AI

    Focal Modulation and Bidirectional Feature Fusion Network for Medical Image Segmentation

    Authors: Moin Safdar, Shahzaib Iqbal, Mehwish Mehmood, Mubeen Ghafoor, Tariq M. Khan, Imran Razzak

    Abstract: Medical image segmentation is essential for clinical applications such as disease diagnosis, treatment planning, and disease development monitoring because it provides precise morphological and spatial information on anatomical structures that directly influence treatment decisions. Convolutional neural networks significantly impact image segmentation; however, since convolution operations are loc…

    Submitted 23 October, 2025; originally announced October 2025.

  3. arXiv:2510.20505  [pdf, ps, other]

    cs.CL cs.AI

    Hierarchical Sequence Iteration for Heterogeneous Question Answering

    Authors: Ruiyi Yang, Hao Xue, Imran Razzak, Hakim Hacid, Flora D. Salim

    Abstract: Retrieval-augmented generation (RAG) remains brittle on multi-step questions and heterogeneous evidence sources, trading accuracy against latency and token/tool budgets. This paper introduces Hierarchical Sequence (HSEQ) Iteration for Heterogeneous Question Answering, a unified framework that (i) linearizes documents, tables, and knowledge graphs into a reversible hierarchical sequence with lightwei…

    Submitted 23 October, 2025; originally announced October 2025.

    Comments: 22 pages, 3 figures

  4. arXiv:2510.16536  [pdf, ps, other]

    q-bio.QM cs.AI cs.LG

    Few-Label Multimodal Modeling of SNP Variants and ECG Phenotypes Using Large Language Models for Cardiovascular Risk Stratification

    Authors: Niranjana Arun Menon, Yulong Li, Iqra Farooq, Sara Ahmed, Muhammad Awais, Imran Razzak

    Abstract: Cardiovascular disease (CVD) risk stratification remains a major challenge due to its multifactorial nature and limited availability of high-quality labeled datasets. While genomic and electrophysiological data such as SNP variants and ECG phenotypes are increasingly accessible, effectively integrating these modalities in low-label settings is non-trivial. This challenge arises from the scarcity o…

    Submitted 18 October, 2025; originally announced October 2025.

  5. arXiv:2510.12384  [pdf, ps, other]

    q-bio.GN cs.AI

    Phenome-Wide Multi-Omics Integration Uncovers Distinct Archetypes of Human Aging

    Authors: Huifa Li, Feilong Tang, Haochen Xue, Yulong Li, Xinlin Zhuang, Bin Zhang, Eran Segal, Imran Razzak

    Abstract: Aging is a highly complex and heterogeneous process that progresses at different rates across individuals, making biological age (BA) a more accurate indicator of physiological decline than chronological age. While previous studies have built aging clocks using single-omics data, they often fail to capture the full molecular complexity of human aging. In this work, we leveraged the Human Phenotype…

    Submitted 23 October, 2025; v1 submitted 14 October, 2025; originally announced October 2025.

  6. arXiv:2510.09893  [pdf, ps, other]

    cs.CL cs.LG

    HIPPD: Brain-Inspired Hierarchical Information Processing for Personality Detection

    Authors: Guanming Chen, Lingzhi Shen, Xiaohao Cai, Imran Razzak, Shoaib Jameel

    Abstract: Personality detection from text aims to infer an individual's personality traits based on linguistic patterns. However, existing machine learning approaches often struggle to capture contextual information spanning multiple posts and tend to fall short in extracting representative and robust features in semantically sparse environments. This paper presents HIPPD, a brain-inspired framework for per…

    Submitted 10 October, 2025; originally announced October 2025.

  7. arXiv:2509.16886  [pdf, ps, other]

    cs.CV

    SAM-DCE: Addressing Token Uniformity and Semantic Over-Smoothing in Medical Segmentation

    Authors: Yingzhen Hu, Yiheng Zhong, Ruobing Li, Yingxue Su, Jiabao An, Feilong Tang, Jionglong Su, Imran Razzak

    Abstract: The Segment Anything Model (SAM) demonstrates impressive zero-shot segmentation ability on natural images but encounters difficulties in medical imaging due to domain shifts, anatomical variability, and its reliance on user-provided prompts. Recent prompt-free adaptations alleviate the need for expert intervention, yet still suffer from limited robustness and adaptability, often overlooking the is…

    Submitted 23 September, 2025; v1 submitted 20 September, 2025; originally announced September 2025.

  8. arXiv:2509.16011  [pdf, ps, other]

    cs.CV

    Towards Robust Visual Continual Learning with Multi-Prototype Supervision

    Authors: Xiwei Liu, Yulong Li, Yichen Li, Xinlin Zhuang, Haolin Yang, Huifa Li, Imran Razzak

    Abstract: Language-guided supervision, which utilizes a frozen semantic target from a Pretrained Language Model (PLM), has emerged as a promising paradigm for visual Continual Learning (CL). However, relying on a single target introduces two critical limitations: 1) semantic ambiguity, where a polysemous category name results in conflicting visual representations, and 2) intra-class visual diversity, where…

    Submitted 19 September, 2025; originally announced September 2025.

  9. arXiv:2509.14998  [pdf, ps, other]

    cs.AI cs.CV

    A Knowledge-driven Adaptive Collaboration of LLMs for Enhancing Medical Decision-making

    Authors: Xiao Wu, Ting-Zhu Huang, Liang-Jian Deng, Yanyuan Qiao, Imran Razzak, Yutong Xie

    Abstract: Medical decision-making often involves integrating knowledge from multiple clinical specialties, typically achieved through multidisciplinary teams. Inspired by this collaborative process, recent work has leveraged large language models (LLMs) in multi-agent collaboration frameworks to emulate expert teamwork. While these approaches improve reasoning through agent interaction, they are limited by…

    Submitted 18 September, 2025; originally announced September 2025.

    Comments: The paper has been accepted to the EMNLP 2025 Main Conference

  10. arXiv:2509.02450  [pdf, ps, other]

    cs.CL cs.LG

    EmoPerso: Enhancing Personality Detection with Self-Supervised Emotion-Aware Modelling

    Authors: Lingzhi Shen, Xiaohao Cai, Yunfei Long, Imran Razzak, Guanming Chen, Shoaib Jameel

    Abstract: Personality detection from text is commonly performed by analysing users' social media posts. However, existing methods heavily rely on large-scale annotated datasets, making it challenging to obtain high-quality personality labels. Moreover, most studies treat emotion and personality as independent variables, overlooking their interactions. In this paper, we propose a novel self-supervised framew…

    Submitted 2 September, 2025; originally announced September 2025.

  11. arXiv:2508.21148  [pdf, ps, other]

    cs.CL cs.AI

    A Survey of Scientific Large Language Models: From Data Foundations to Agent Frontiers

    Authors: Ming Hu, Chenglong Ma, Wei Li, Wanghan Xu, Jiamin Wu, Jucheng Hu, Tianbin Li, Guohang Zhuang, Jiaqi Liu, Yingzhou Lu, Ying Chen, Chaoyang Zhang, Cheng Tan, Jie Ying, Guocheng Wu, Shujian Gao, Pengcheng Chen, Jiashi Lin, Haitao Wu, Lulu Chen, Fengxiang Wang, Yuanyuan Zhang, Xiangyu Zhao, Feilong Tang, Encheng Su, et al. (95 additional authors not shown)

    Abstract: Scientific Large Language Models (Sci-LLMs) are transforming how knowledge is represented, integrated, and applied in scientific research, yet their progress is shaped by the complex nature of scientific data. This survey presents a comprehensive, data-centric synthesis that reframes the development of Sci-LLMs as a co-evolution between models and their underlying data substrate. We formulate a un…

    Submitted 18 October, 2025; v1 submitted 28 August, 2025; originally announced August 2025.

  12. arXiv:2508.20688  [pdf, ps, other]

    cs.RO cs.AI

    Task Allocation for Autonomous Machines using Computational Intelligence and Deep Reinforcement Learning

    Authors: Thanh Thi Nguyen, Quoc Viet Hung Nguyen, Jonathan Kua, Imran Razzak, Dung Nguyen, Saeid Nahavandi

    Abstract: Enabling multiple autonomous machines to perform reliably requires the development of efficient cooperative control algorithms. This paper presents a survey of algorithms that have been developed for controlling and coordinating autonomous machines in complex environments. We especially focus on task allocation methods using computational intelligence (CI) and deep reinforcement learning (RL). The…

    Submitted 28 August, 2025; originally announced August 2025.

    Comments: Accepted for publication in the Proceedings of the 2025 IEEE International Conference on Systems, Man, and Cybernetics (SMC)

  13. arXiv:2508.13401  [pdf, ps, other]

    cs.CV

    AIM 2025 Rip Current Segmentation (RipSeg) Challenge Report

    Authors: Andrei Dumitriu, Florin Miron, Florin Tatui, Radu Tudor Ionescu, Radu Timofte, Aakash Ralhan, Florin-Alexandru Vasluianu, Shenyang Qian, Mitchell Harley, Imran Razzak, Yang Song, Pu Luo, Yumei Li, Cong Xu, Jinming Chai, Kexin Zhang, Licheng Jiao, Lingling Li, Siqi Yu, Chao Zhang, Kehuan Song, Fang Liu, Puhua Chen, Xu Liu, Jin Hu, et al. (2 additional authors not shown)

    Abstract: This report presents an overview of the AIM 2025 RipSeg Challenge, a competition designed to advance techniques for automatic rip current segmentation in still images. Rip currents are dangerous, fast-moving flows that pose a major risk to beach safety worldwide, making accurate visual detection an important and underexplored research task. The challenge builds on RipVIS, the largest available rip…

    Submitted 3 September, 2025; v1 submitted 18 August, 2025; originally announced August 2025.

    Comments: Challenge report paper from AIM2025 Workshop at ICCVW 2025

    MSC Class: cs.AI ACM Class: I.4.0; I.4.9

  14. arXiv:2508.09747  [pdf, ps, other]

    cs.LG

    A Machine Learning Approach to Predict Biological Age and its Longitudinal Drivers

    Authors: Nazira Dunbayeva, Yulong Li, Yutong Xie, Imran Razzak

    Abstract: Predicting an individual's aging trajectory is a central challenge in preventative medicine and bioinformatics. While machine learning models can predict chronological age from biomarkers, they often fail to capture the dynamic, longitudinal nature of the aging process. In this work, we developed and validated a machine learning pipeline to predict age using a longitudinal cohort with data from tw…

    Submitted 13 August, 2025; originally announced August 2025.

  15. arXiv:2508.09180  [pdf, ps, other]

    cs.LG cs.AI

    scAGC: Learning Adaptive Cell Graphs with Contrastive Guidance for Single-Cell Clustering

    Authors: Huifa Li, Jie Fu, Xinlin Zhuang, Haolin Yang, Xinpeng Ling, Tong Cheng, Haochen Xue, Imran Razzak, Zhili Chen

    Abstract: Accurate cell type annotation is a crucial step in analyzing single-cell RNA sequencing (scRNA-seq) data, which provides valuable insights into cellular heterogeneity. However, due to the high dimensionality and prevalence of zero elements in scRNA-seq data, traditional clustering methods face significant statistical and computational challenges. While some advanced methods use graph neural networ…

    Submitted 7 August, 2025; originally announced August 2025.

  16. arXiv:2508.07127  [pdf, ps, other]

    cs.LG q-bio.GN

    How Effectively Can Large Language Models Connect SNP Variants and ECG Phenotypes for Cardiovascular Risk Prediction?

    Authors: Niranjana Arun Menon, Iqra Farooq, Yulong Li, Sara Ahmed, Yutong Xie, Muhammad Awais, Imran Razzak

    Abstract: Cardiovascular disease (CVD) prediction remains a tremendous challenge due to its multifactorial etiology and global burden of morbidity and mortality. Despite the growing availability of genomic and electrophysiological data, extracting biologically meaningful insights from such high-dimensional, noisy, and sparsely annotated datasets remains a non-trivial task. Recently, LLMs have been applied ef…

    Submitted 9 August, 2025; originally announced August 2025.

  17. arXiv:2508.03538  [pdf, ps, other]

    cs.CV cs.AI

    Retinal Lipidomics Associations as Candidate Biomarkers for Cardiovascular Health

    Authors: Inamullah, Imran Razzak, Shoaib Jameel

    Abstract: Retinal microvascular imaging is increasingly recognised as a non-invasive method for evaluating systemic vascular and metabolic health. However, the association between lipidomics and retinal vasculature remains inadequate. This study investigates the relationships between serum lipid subclasses, free fatty acids (FA), diacylglycerols (DAG), triacylglycerols (TAG), and cholesteryl esters (CE), an…

    Submitted 5 August, 2025; originally announced August 2025.

  18. arXiv:2508.01875  [pdf, ps, other]

    cs.CV

    StreamAgent: Towards Anticipatory Agents for Streaming Video Understanding

    Authors: Haolin Yang, Feilong Tang, Lingxiao Zhao, Xiang An, Ming Hu, Huifa Li, Xinlin Zhuang, Yifan Lu, Xiaofeng Zhang, Abdalla Swikir, Junjun He, Zongyuan Ge, Imran Razzak

    Abstract: Real-time streaming video understanding in domains such as autonomous driving and intelligent surveillance poses challenges beyond conventional offline video processing, requiring continuous perception, proactive decision making, and responsive interaction based on dynamically evolving visual content. However, existing methods rely on alternating perception-reaction or asynchronous triggers, lacki…

    Submitted 13 October, 2025; v1 submitted 3 August, 2025; originally announced August 2025.

  19. arXiv:2508.01450  [pdf, ps, other]

    cs.CL

    Towards Efficient Medical Reasoning with Minimal Fine-Tuning Data

    Authors: Xinlin Zhuang, Feilong Tang, Haolin Yang, Ming Hu, Huifa Li, Haochen Xue, Yichen Li, Junjun He, Zongyuan Ge, Ying Qian, Imran Razzak

    Abstract: Supervised Fine-Tuning (SFT) plays a pivotal role in adapting Large Language Models (LLMs) to specialized domains such as medical reasoning. However, existing SFT practices often rely on unfiltered datasets that contain redundant and low-quality samples, leading to substantial computational costs and suboptimal performance. Although existing methods attempt to alleviate this problem by selecting d…

    Submitted 2 August, 2025; originally announced August 2025.

    Comments: preprint, under review

  20. arXiv:2507.15469  [pdf, ps, other]

    cs.RO cs.AI

    The Emergence of Deep Reinforcement Learning for Path Planning

    Authors: Thanh Thi Nguyen, Saeid Nahavandi, Imran Razzak, Dung Nguyen, Nhat Truong Pham, Quoc Viet Hung Nguyen

    Abstract: The increasing demand for autonomous systems in complex and dynamic environments has driven significant research into intelligent path planning methodologies. For decades, graph-based search algorithms, linear programming techniques, and evolutionary computation methods have served as foundational approaches in this domain. Recently, deep reinforcement learning (DRL) has emerged as a powerful meth…

    Submitted 21 July, 2025; originally announced July 2025.

    Comments: Accepted for publication in the Proceedings of the 2025 IEEE International Conference on Systems, Man, and Cybernetics (SMC)

  21. arXiv:2507.12663  [pdf, ps, other]

    cs.CV

    Integrated Oculomics and Lipidomics Reveal Microvascular Metabolic Signatures Associated with Cardiovascular Health in a Healthy Cohort

    Authors: Inamullah, Ernesto Elias Vidal Rosas, Imran Razzak, Shoaib Jameel

    Abstract: Cardiovascular disease (CVD) remains the leading global cause of mortality, yet current risk stratification methods often fail to detect early, subclinical changes. Previous studies have generally not integrated retinal microvasculature characteristics with comprehensive serum lipidomic profiles as potential indicators of CVD risk. In this study, an innovative imaging omics framework was introduce…

    Submitted 16 July, 2025; originally announced July 2025.

  22. arXiv:2506.15649  [pdf, ps, other]

    cs.CV cs.LG

    Dual-Stage Value-Guided Inference with Margin-Based Reward Adjustment for Fast and Faithful VLM Captioning

    Authors: Ankan Deria, Adinath Madhavrao Dukre, Feilong Tang, Sara Atito, Sudipta Roy, Muhammad Awais, Muhammad Haris Khan, Imran Razzak

    Abstract: Despite significant advances in inference-time search for vision-language models (VLMs), existing approaches remain both computationally expensive and prone to unpenalized, low-confidence generations which often lead to persistent hallucinations. We introduce Value-guided Inference with Margin-based Reward (ViMaR), a two-stage inference framework that improves both efficiency and output f…

    Submitted 18 June, 2025; originally announced June 2025.

  23. arXiv:2506.10423  [pdf, ps, other]

    cs.SD cs.AI cs.CL eess.AS

    PAL: Probing Audio Encoders via LLMs - Audio Information Transfer into LLMs

    Authors: Tony Alex, Wish Suharitdamrong, Sara Atito, Armin Mustafa, Philip J. B. Jackson, Imran Razzak, Muhammad Awais

    Abstract: Integration of audio perception into large language models (LLMs) is an emerging research area for enabling machine listening applications, yet efficient transfer of rich audio semantics from audio encoders to LLMs remains underexplored. The most widely used integration paradigm projects the audio encoder output tokens into the LLM input space (e.g., via an MLP or a Q-Former), then prepends or ins…

    Submitted 14 October, 2025; v1 submitted 12 June, 2025; originally announced June 2025.

    Comments: 17 pages, 3 figures

  24. arXiv:2506.10292  [pdf, other]

    cs.CL cs.AI

    Flick: Few Labels Text Classification using K-Aware Intermediate Learning in Multi-Task Low-Resource Languages

    Authors: Ali Almutairi, Abdullah Alsuhaibani, Shoaib Jameel, Usman Naseem, Gelareh Mohammadi, Imran Razzak

    Abstract: Training deep learning networks with minimal supervision has gained significant research attention due to its potential to reduce reliance on extensive labelled data. While self-training methods have proven effective in semi-supervised learning, they remain vulnerable to errors from noisy pseudo labels. Moreover, most recent approaches to the few-label classification problem are either designed fo…

    Submitted 11 June, 2025; originally announced June 2025.

  25. arXiv:2506.05221  [pdf, ps, other]

    cs.CV

    SAM-aware Test-time Adaptation for Universal Medical Image Segmentation

    Authors: Jianghao Wu, Yicheng Wu, Yutong Xie, Wenjia Bai, You Zhang, Feilong Tang, Yulong Li, Yasmeen George, Imran Razzak

    Abstract: Universal medical image segmentation using the Segment Anything Model (SAM) remains challenging due to its limited adaptability to medical domains. Existing adaptations, such as MedSAM, enhance SAM's performance in medical imaging but at the cost of reduced generalization to unseen data. Therefore, in this paper, we propose SAM-aware Test-Time Adaptation (SAM-TTA), a fundamentally different pipeli…

    Submitted 5 June, 2025; originally announced June 2025.

    Comments: 10 pages, 4 figures

  26. arXiv:2505.23595  [pdf]

    cs.CV cs.AI

    DeepChest: Dynamic Gradient-Free Task Weighting for Effective Multi-Task Learning in Chest X-ray Classification

    Authors: Youssef Mohamed, Noran Mohamed, Khaled Abouhashad, Feilong Tang, Sara Atito, Shoaib Jameel, Imran Razzak, Ahmed B. Zaky

    Abstract: While Multi-Task Learning (MTL) offers inherent advantages in complex domains such as medical imaging by enabling shared representation learning, effectively balancing task contributions remains a significant challenge. This paper addresses this critical issue by introducing DeepChest, a novel, computationally efficient and effective dynamic task-weighting framework specifically designed for multi…

    Submitted 29 May, 2025; originally announced May 2025.

  27. arXiv:2505.20810  [pdf, other]

    eess.IV cs.CV

    The Role of AI in Early Detection of Life-Threatening Diseases: A Retinal Imaging Perspective

    Authors: Tariq M Khan, Toufique Ahmed Soomro, Imran Razzak

    Abstract: Retinal imaging has emerged as a powerful, non-invasive modality for detecting and quantifying biomarkers of systemic diseases, ranging from diabetes and hypertension to Alzheimer's disease and cardiovascular disorders, but current insights remain dispersed across platforms and specialties. Recent technological advances in optical coherence tomography (OCT/OCTA) and adaptive optics (AO) now deliver…

    Submitted 27 May, 2025; originally announced May 2025.

  28. arXiv:2505.19527  [pdf, ps, other]

    cs.LG cs.AI math.OC

    Rolling Ball Optimizer: Learning by ironing out loss landscape wrinkles

    Authors: Mohammed Djameleddine Belgoumri, Mohamed Reda Bouadjenek, Hakim Hacid, Imran Razzak, Sunil Aryal

    Abstract: Training large neural networks (NNs) requires optimizing high-dimensional data-dependent loss functions. The optimization landscape of these functions is often highly complex and textured, even fractal-like, with many spurious local minima, ill-conditioned valleys, degenerate points, and saddle points. Complicating things further is the fact that these landscape characteristics are a function of t…

    Submitted 24 October, 2025; v1 submitted 26 May, 2025; originally announced May 2025.

    Comments: Submitted for review to ICLR 2026

  29. arXiv:2505.18685  [pdf, ps, other]

    cs.CL

    From Generation to Detection: A Multimodal Multi-Task Dataset for Benchmarking Health Misinformation

    Authors: Zhihao Zhang, Yiran Zhang, Xiyue Zhou, Liting Huang, Imran Razzak, Preslav Nakov, Usman Naseem

    Abstract: Infodemics and health misinformation have a significant negative impact on individuals and society, exacerbating confusion and increasing hesitancy in adopting recommended health measures. Recent advancements in generative AI, capable of producing realistic, human-like text and images, have significantly accelerated the spread and expanded the reach of health misinformation, resulting in an alarming…

    Submitted 24 May, 2025; originally announced May 2025.

    Comments: Preprint

  30. arXiv:2505.18283  [pdf, ps, other]

    cs.CL cs.AI cs.MA

    TAGS: A Test-Time Generalist-Specialist Framework with Retrieval-Augmented Reasoning and Verification

    Authors: Jianghao Wu, Feilong Tang, Yulong Li, Ming Hu, Haochen Xue, Shoaib Jameel, Yutong Xie, Imran Razzak

    Abstract: Recent advances such as Chain-of-Thought prompting have significantly improved large language models (LLMs) in zero-shot medical reasoning. However, prompting-based methods often remain shallow and unstable, while fine-tuned medical LLMs suffer from poor generalization under distribution shifts and limited adaptability to unseen clinical scenarios. To address these limitations, we present TAGS, a…

    Submitted 23 May, 2025; originally announced May 2025.

    Comments: 16 pages including references, 2 figures

    ACM Class: I.2.7

  31. arXiv:2505.17677  [pdf, ps, other]

    cs.CV

    Towards Dynamic 3D Reconstruction of Hand-Instrument Interaction in Ophthalmic Surgery

    Authors: Ming Hu, Zhengdi Yu, Feilong Tang, Kaiwen Chen, Yulong Li, Imran Razzak, Junjun He, Tolga Birdal, Kaijing Zhou, Zongyuan Ge

    Abstract: Accurate 3D reconstruction of hands and instruments is critical for vision-based analysis of ophthalmic microsurgery, yet progress has been hampered by the lack of realistic, large-scale datasets and reliable annotation tools. In this work, we introduce OphNet-3D, the first extensive RGB-D dynamic 3D reconstruction dataset for ophthalmic surgery, comprising 41 sequences from 40 surgeons and totali…

    Submitted 30 May, 2025; v1 submitted 23 May, 2025; originally announced May 2025.

  32. arXiv:2505.16652  [pdf, ps, other]

    cs.CV cs.LG

    Seeing Far and Clearly: Mitigating Hallucinations in MLLMs with Attention Causal Decoding

    Authors: Feilong Tang, Chengzhi Liu, Zhongxing Xu, Ming Hu, Zelin Peng, Zhiwei Yang, Jionglong Su, Minquan Lin, Yifan Peng, Xuelian Cheng, Imran Razzak, Zongyuan Ge

    Abstract: Recent advancements in multimodal large language models (MLLMs) have significantly improved performance in visual question answering. However, they often suffer from hallucinations. In this work, hallucinations are categorized into two main types: initial hallucinations and snowball hallucinations. We argue that adequate contextual information can be extracted directly from the token interaction p…

    Submitted 7 June, 2025; v1 submitted 22 May, 2025; originally announced May 2025.

    Comments: Clarification note for the CVPR 2025 paper (FarSight). Prepared by a subset of the original authors; remaining co-authors are acknowledged in the text

  33. arXiv:2505.13994  [pdf, ps, other]

    cs.AI cs.IR cs.MA

    Divide by Question, Conquer by Agent: SPLIT-RAG with Question-Driven Graph Partitioning

    Authors: Ruiyi Yang, Hao Xue, Imran Razzak, Hakim Hacid, Flora D. Salim

    Abstract: Retrieval-Augmented Generation (RAG) systems empower large language models (LLMs) with external knowledge, yet struggle with efficiency-accuracy trade-offs when scaling to large knowledge graphs. Existing approaches often rely on monolithic graph retrieval, incurring unnecessary latency for simple queries and fragmented reasoning for complex multi-hop questions. To address these challenges, this p…

    Submitted 20 May, 2025; originally announced May 2025.

    Comments: 20 pages, 4 figures

  34. arXiv:2505.04006  [pdf, other]

    eess.IV cs.CV

    The Eye as a Window to Systemic Health: A Survey of Retinal Imaging from Classical Techniques to Oculomics

    Authors: Inamullah, Imran Razzak, Shoaib Jameel

    Abstract: The unique vascularized anatomy of the human eye, encased in the retina, provides an opportunity to act as a window for human health. The retinal structure assists in assessing the early detection, monitoring of disease progression and intervention for both ocular and non-ocular diseases. The advancement in imaging technology leveraging Artificial Intelligence has seized this opportunity to bridge…

    Submitted 6 May, 2025; originally announced May 2025.

  35. arXiv:2504.15072  [pdf, other]

    cs.SI cs.CL

    Rhythm of Opinion: A Hawkes-Graph Framework for Dynamic Propagation Analysis

    Authors: Yulong Li, Zhixiang Lu, Feilong Tang, Simin Lai, Ming Hu, Yuxuan Zhang, Haochen Xue, Zhaodong Wu, Imran Razzak, Qingxia Li, Jionglong Su

    Abstract: The rapid development of social media has significantly reshaped the dynamics of public opinion, resulting in complex interactions that traditional models fail to effectively capture. To address this challenge, we propose an innovative approach that integrates multi-dimensional Hawkes processes with Graph Neural Networks, modeling opinion propagation dynamics among nodes in a social network while c…

    Submitted 21 April, 2025; originally announced April 2025.

  36. arXiv:2504.05411  [pdf, other]

    cs.CL cs.LG

    Less but Better: Parameter-Efficient Fine-Tuning of Large Language Models for Personality Detection

    Authors: Lingzhi Shen, Yunfei Long, Xiaohao Cai, Guanming Chen, Imran Razzak, Shoaib Jameel

    Abstract: Personality detection automatically identifies an individual's personality from various data sources, such as social media texts. However, as the parameter scale of language models continues to grow, the computational cost becomes increasingly difficult to manage. Fine-tuning also grows more complex, making it harder to justify the effort and reliably predict outcomes. We introduce a novel paramet…

    Submitted 7 April, 2025; originally announced April 2025.

  37. arXiv:2504.02146  [pdf, other]

    cs.CL cs.LG

    LL4G: Self-Supervised Dynamic Optimization for Graph-Based Personality Detection

    Authors: Lingzhi Shen, Yunfei Long, Xiaohao Cai, Guanming Chen, Yuhan Wang, Imran Razzak, Shoaib Jameel

    Abstract: Graph-based personality detection constructs graph structures from textual data, particularly social media posts. Current methods often struggle with sparse or noisy data and rely on static graphs, limiting their ability to capture dynamic changes between nodes and relationships. This paper introduces LL4G, a self-supervised framework leveraging large language models (LLMs) to optimize graph neura…

    Submitted 2 April, 2025; originally announced April 2025.

  38. arXiv:2503.24164  [pdf, ps, other]

    cs.MM

    SVLA: A Unified Speech-Vision-Language Assistant with Multimodal Reasoning and Speech Generation

    Authors: Ngoc Dung Huynh, Mohamed Reda Bouadjenek, Imran Razzak, Hakim Hacid, Sunil Aryal

    Abstract: Large vision and language models show strong performance in tasks like image captioning, visual question answering, and retrieval. However, challenges remain in integrating speech, text, and vision into a unified model, especially for spoken tasks. Speech generation methods vary: some produce speech directly, others through text, but their impact on quality is unclear. Evaluation often relies on…

    Submitted 7 July, 2025; v1 submitted 31 March, 2025; originally announced March 2025.

    Comments: 21 pages

  39. arXiv:2503.18227  [pdf, other]

    cs.CV cs.AI

    PG-SAM: Prior-Guided SAM with Medical for Multi-organ Segmentation

    Authors: Yiheng Zhong, Zihong Luo, Chengzhi Liu, Feilong Tang, Zelin Peng, Ming Hu, Yingzhen Hu, Jionglong Su, Zongyuan Ge, Imran Razzak

    Abstract: Segment Anything Model (SAM) demonstrates powerful zero-shot capabilities; however, its accuracy and robustness significantly decrease when applied to medical image segmentation. Existing methods address this issue through modality fusion, integrating textual and image information to provide more detailed priors. In this study, we argue that the granularity of text and the domain gap affect the ac…

    Submitted 26 March, 2025; v1 submitted 23 March, 2025; originally announced March 2025.

  40. arXiv:2503.16473  [pdf, other]

    cs.HC cs.RO

    PERCY: Personal Emotional Robotic Conversational System

    Authors: Zhijin Meng, Mohammed Althubyani, Shengyuan Xie, Imran Razzak, Eduardo B. Sandoval, Mahdi Bamdad, Francisco Cruz

    Abstract: Traditional rule-based conversational robots, constrained by predefined scripts and static response mappings, fundamentally lack adaptability for personalized, long-term human interaction. While Large Language Models (LLMs) like GPT-4 have revolutionized conversational AI through open-domain capabilities, current social robots implementing LLMs still lack emotional awareness and continuous persona…

    Submitted 3 March, 2025; originally announced March 2025.

    Comments: 6 pages, 4 figures

  41. arXiv:2503.16400  [pdf, other]

    cs.LG

    ScalingNoise: Scaling Inference-Time Search for Generating Infinite Videos

    Authors: Haolin Yang, Feilong Tang, Ming Hu, Qingyu Yin, Yulong Li, Yexin Liu, Zelin Peng, Peng Gao, Junjun He, Zongyuan Ge, Imran Razzak

    Abstract: Video diffusion models (VDMs) facilitate the generation of high-quality videos, with current research predominantly concentrated on scaling efforts during training through improvements in data quality, computational resources, and model complexity. However, inference-time scaling has received less attention, with most approaches restricting models to a single generation attempt. Recent studies hav…

    Submitted 23 May, 2025; v1 submitted 20 March, 2025; originally announced March 2025.

  42. arXiv:2503.15566  [pdf, other]

    cs.LG

    Enforcing Consistency and Fairness in Multi-level Hierarchical Classification with a Mask-based Output Layer

    Authors: Shijing Chen, Shoaib Jameel, Mohamed Reda Bouadjenek, Feilong Tang, Usman Naseem, Basem Suleiman, Hakim Hacid, Flora D. Salim, Imran Razzak

    Abstract: Traditional Multi-level Hierarchical Classification (MLHC) classifiers often rely on backbone models with $n$ independent output layers. This structure tends to overlook the hierarchical relationships between classes, leading to inconsistent predictions that violate the underlying taxonomy. Additionally, once a backbone architecture for an MLHC classifier is selected, adapting the model to accommo…

    Submitted 19 March, 2025; originally announced March 2025.

    Comments: 14 pages, 14 figures. arXiv admin note: text overlap with arXiv:2501.06827

  43. arXiv:2503.14800  [pdf, ps, other]

    cs.IR cs.AI cs.LG

    Long Context Modeling with Ranked Memory-Augmented Retrieval

    Authors: Ghadir Alselwi, Hao Xue, Shoaib Jameel, Basem Suleiman, Hakim Hacid, Flora D. Salim, Imran Razzak

    Abstract: Effective long-term memory management is crucial for language models handling extended contexts. We introduce a novel framework that dynamically ranks memory entries based on relevance. Unlike previous works, our model introduces a novel relevance scoring and a pointwise re-ranking model for key-value embeddings, inspired by learning-to-rank techniques in information retrieval. Enhanced Ranked Mem…

    Submitted 6 July, 2025; v1 submitted 18 March, 2025; originally announced March 2025.

  44. arXiv:2503.14234  [pdf, other]

    cs.AI cs.MA

    Beyond Single Pass, Looping Through Time: KG-IRAG with Iterative Knowledge Retrieval

    Authors: Ruiyi Yang, Hao Xue, Imran Razzak, Hakim Hacid, Flora D. Salim

    Abstract: Graph Retrieval-Augmented Generation (GraphRAG) has proven highly effective in enhancing the performance of Large Language Models (LLMs) on tasks that require external knowledge. By leveraging Knowledge Graphs (KGs), GraphRAG improves information retrieval for complex reasoning tasks, providing more precise and comprehensive retrieval and generating more accurate responses to QAs. However, most RA…

    Submitted 19 May, 2025; v1 submitted 18 March, 2025; originally announced March 2025.

    Comments: 15 pages, 3 figures

  45. arXiv:2503.13560  [pdf, other]

    eess.IV cs.CV

    MSWAL: 3D Multi-class Segmentation of Whole Abdominal Lesions Dataset

    Authors: Zhaodong Wu, Qiaochu Zhao, Ming Hu, Yulong Li, Haochen Xue, Kang Dang, Zhengyong Jiang, Angelos Stefanidis, Qiufeng Wang, Imran Razzak, Zongyuan Ge, Junjun He, Yu Qiao, Zhong Zheng, Feilong Tang, Jionglong Su

    Abstract: With the significantly increasing incidence and prevalence of abdominal diseases, there is a need to embrace greater use of new innovations and technology for the diagnosis and treatment of patients. Although deep-learning methods have notably been developed to assist radiologists in diagnosing abdominal diseases, existing models have a restricted ability to segment common lesions in the abdomen…

    Submitted 17 March, 2025; originally announced March 2025.

  46. arXiv:2503.05319  [pdf, ps, other]

    cs.CV cs.AI

    Robust Multimodal Learning for Ophthalmic Disease Grading via Disentangled Representation

    Authors: Xinkun Wang, Yifang Wang, Senwei Liang, Feilong Tang, Chengzhi Liu, Ming Hu, Chao Hu, Junjun He, Zongyuan Ge, Imran Razzak

    Abstract: This paper discusses how ophthalmologists often rely on multimodal data to improve diagnostic accuracy. However, complete multimodal data is rare in real-world applications due to a lack of medical equipment and concerns about data privacy. Traditional deep learning methods typically address these issues by learning representations in latent space. However, the paper highlights two key limitations…

    Submitted 24 June, 2025; v1 submitted 7 March, 2025; originally announced March 2025.

    Comments: 10 pages

  47. arXiv:2503.03313  [pdf, ps, other]

    cs.LG cs.CL

    LLM as GNN: Graph Vocabulary Learning for Text-Attributed Graph Foundation Models

    Authors: Xi Zhu, Haochen Xue, Ziwei Zhao, Wujiang Xu, Jingyuan Huang, Minghao Guo, Qifan Wang, Kaixiong Zhou, Imran Razzak, Yongfeng Zhang

    Abstract: Text-Attributed Graphs (TAGs), where each node is associated with text descriptions, are ubiquitous in real-world scenarios. They typically exhibit distinctive structure and domain-specific knowledge, motivating the development of a Graph Foundation Model (GFM) that generalizes across diverse graphs and tasks. Despite large efforts to integrate Large Language Models (LLMs) and Graph Neural Network…

    Submitted 20 October, 2025; v1 submitted 5 March, 2025; originally announced March 2025.

  48. arXiv:2502.11903  [pdf, other]

    cs.CL

    MMRC: A Large-Scale Benchmark for Understanding Multimodal Large Language Model in Real-World Conversation

    Authors: Haochen Xue, Feilong Tang, Ming Hu, Yexin Liu, Qidong Huang, Yulong Li, Chengzhi Liu, Zhongxing Xu, Chong Zhang, Chun-Mei Feng, Yutong Xie, Imran Razzak, Zongyuan Ge, Jionglong Su, Junjun He, Yu Qiao

    Abstract: Recent multimodal large language models (MLLMs) have demonstrated significant potential in open-ended conversation, generating more accurate and personalized responses. However, their abilities to memorize, recall, and reason in sustained interactions within real-world scenarios remain underexplored. This paper introduces MMRC, a Multi-Modal Real-world Conversation benchmark for evaluating six cor…

    Submitted 8 March, 2025; v1 submitted 17 February, 2025; originally announced February 2025.

  49. arXiv:2501.06827  [pdf, other]

    cs.AI

    Leveraging Taxonomy and LLMs for Improved Multimodal Hierarchical Classification

    Authors: Shijing Chen, Mohamed Reda Bouadjenek, Shoaib Jameel, Usman Naseem, Basem Suleiman, Flora D. Salim, Hakim Hacid, Imran Razzak

    Abstract: Multi-level Hierarchical Classification (MLHC) tackles the challenge of categorizing items within a complex, multi-layered class structure. However, traditional MLHC classifiers often rely on a backbone model with independent output layers, which tend to ignore the hierarchical relationships between classes. This oversight can lead to inconsistent predictions that violate the underlying taxonomy.…

    Submitted 12 January, 2025; originally announced January 2025.

    Comments: 11 pages, 7 figures, 2 tables, and accepted by COLING 2025

  50. arXiv:2501.03939  [pdf, other]

    cs.CV cs.MM

    Visual question answering: from early developments to recent advances -- a survey

    Authors: Ngoc Dung Huynh, Mohamed Reda Bouadjenek, Sunil Aryal, Imran Razzak, Hakim Hacid

    Abstract: Visual Question Answering (VQA) is an evolving research field aimed at enabling machines to answer questions about visual content by integrating image and language processing techniques such as feature extraction, object detection, text embedding, natural language understanding, and language generation. With the growth of multimodal data research, VQA has gained significant attention due to its br…

    Submitted 11 January, 2025; v1 submitted 7 January, 2025; originally announced January 2025.

    Comments: 20 papers