Showing 1–7 of 7 results for author: Hannaan, A

  1. arXiv:2507.07954

    cs.SD cs.CV eess.AS

    Input Conditioned Layer Dropping in Speech Foundation Models

    Authors: Abdul Hannan, Daniele Falavigna, Alessio Brutti

    Abstract: Curating foundation speech models for edge and IoT settings, where computational resources vary over time, requires dynamic architectures featuring adaptable reduction strategies. One emerging approach is layer dropping ($\mathcal{LD}$), which skips a fraction of the layers of a backbone network during inference to reduce the computational load. This allows transforming static models into dynamic one…

    Submitted 10 July, 2025; originally announced July 2025.

    Comments: Accepted at IEEE MLSP 2025

  2. arXiv:2506.12208

    cs.CV

    InceptionMamba: Efficient Multi-Stage Feature Enhancement with Selective State Space Model for Microscopic Medical Image Segmentation

    Authors: Daniya Najiha Abdul Kareem, Abdul Hannan, Mubashir Noman, Jean Lahoud, Mustansar Fiaz, Hisham Cholakkal

    Abstract: Accurate microscopic medical image segmentation plays a crucial role in diagnosing various cancerous cells and identifying tumors. Driven by advancements in deep learning, convolutional neural networks (CNNs) and transformer-based models have been extensively studied to enhance receptive fields and improve medical image segmentation. However, they often struggle to capture complex cellular an…

    Submitted 13 June, 2025; originally announced June 2025.

  3. arXiv:2506.09375

    cs.CL cs.SD eess.AS

    CoLMbo: Speaker Language Model for Descriptive Profiling

    Authors: Massa Baali, Shuo Han, Syed Abdul Hannan, Purusottam Samal, Karanveer Singh, Soham Deshmukh, Rita Singh, Bhiksha Raj

    Abstract: Speaker recognition systems are often limited to classification tasks and struggle to generate detailed speaker characteristics or provide context-rich descriptions. These models primarily extract embeddings for speaker identification but fail to capture demographic attributes such as dialect, gender, and age in a structured manner. This paper introduces CoLMbo, a Speaker Language Model (SLM) that…

    Submitted 10 June, 2025; originally announced June 2025.

  4. arXiv:2505.17002

    cs.CV cs.AI

    PAEFF: Precise Alignment and Enhanced Gated Feature Fusion for Face-Voice Association

    Authors: Abdul Hannan, Muhammad Arslan Manzoor, Shah Nawaz, Muhammad Irzam Liaqat, Markus Schedl, Mubashir Noman

    Abstract: We study the task of learning associations between faces and voices, which has recently been gaining interest in the multimodal community. These methods suffer from the deliberate crafting of negative mining procedures as well as the reliance on the distant margin parameter. These issues are addressed by learning a joint embedding space in which orthogonality constraints are applied to the fused embeddings…

    Submitted 28 May, 2025; v1 submitted 22 May, 2025; originally announced May 2025.

    Comments: Accepted at InterSpeech 2025

  5. arXiv:2505.16991

    cs.CV

    An Effective Training Framework for Light-Weight Automatic Speech Recognition Models

    Authors: Abdul Hannan, Alessio Brutti, Shah Nawaz, Mubashir Noman

    Abstract: Recent advances in deep learning have encouraged the development of large automatic speech recognition (ASR) models that achieve promising results while ignoring computational and memory constraints. However, deploying such models on low-resource devices is impractical despite their favorable performance. Existing approaches (pruning, distillation, layer skipping, etc.) transform the large models into smaller…

    Submitted 28 May, 2025; v1 submitted 22 May, 2025; originally announced May 2025.

    Comments: Accepted at InterSpeech 2025

  6. arXiv:2504.16515

    cs.CV cs.AI

    Federated Learning of Low-Rank One-Shot Image Detection Models in Edge Devices with Scalable Accuracy and Compute Complexity

    Authors: Abdul Hannaan, Zubair Shah, Aiman Erbad, Amr Mohamed, Ali Safa

    Abstract: This paper introduces a novel federated learning framework termed LoRa-FL designed for training low-rank one-shot image detection models deployed on edge devices. By incorporating low-rank adaptation techniques into one-shot detection architectures, our method significantly reduces both computational and communication overhead while maintaining scalable accuracy. The proposed framework leverages f…

    Submitted 23 April, 2025; originally announced April 2025.

    Comments: Accepted for publication at IEEE IWCMC 2025

  7. arXiv:1407.2019

    cs.CL

    Assamese-English Bilingual Machine Translation

    Authors: Kalyanee Kanchan Baruah, Pranjal Das, Abdul Hannan, Shikhar Kr. Sarma

    Abstract: Machine translation is the process of translating text from one language to another. In this paper, statistical machine translation is performed between Assamese and English using a parallel corpus of the two languages. The statistical phrase-based translation toolkit Moses is used. To build the language model and to align the words, we used two other tools, IRSTLM and GIZA, respectively. BLEU score…

    Submitted 8 July, 2014; originally announced July 2014.

    Comments: In the proceedings of International Conference of Natural Language Processing and Cognitive Computing (ICONACC)-2014, pp. 227-231

    Journal ref: International Journal on Natural Language Computing (IJNLC) Vol. 3, No.3, June 2014