Showing 1–22 of 22 results for author: Gogate, M

  1. arXiv:2508.19483

    eess.AS

    Audio-Visual Feature Synchronization for Robust Speech Enhancement in Hearing Aids

    Authors: Nasir Saleem, Mandar Gogate, Kia Dashtipour, Adeel Hussain, Usman Anwar, Adewale Adetomi, Tughrul Arslan, Amir Hussain

    Abstract: Audio-visual feature synchronization for real-time speech enhancement in hearing aids represents a progressive approach to improving speech intelligibility and user experience, particularly in strong noisy backgrounds. This approach integrates auditory signals with visual cues, utilizing the complementary description of these modalities to improve speech intelligibility. Audio-visual feature synch…

    Submitted 26 August, 2025; originally announced August 2025.

    Comments: Preprint of the paper presented at Euronoise 2025 Malaga, Spain

  2. arXiv:2402.16757

    cs.SD eess.AS

    Towards Environmental Preference Based Speech Enhancement For Individualised Multi-Modal Hearing Aids

    Authors: Jasper Kirton-Wingate, Shafique Ahmed, Adeel Hussain, Mandar Gogate, Kia Dashtipour, Jen-Cheng Hou, Tassadaq Hussain, Yu Tsao, Amir Hussain

    Abstract: Since the advent of Deep Learning (DL), Speech Enhancement (SE) models have performed well under a variety of noise conditions. However, such systems may still introduce sonic artefacts, sound unnatural, and restrict the ability for a user to hear ambient sound which may be of importance. Hearing Aid (HA) users may wish to customise their SE systems to suit their personal preferences and day-to-da…

    Submitted 26 February, 2024; originally announced February 2024.

    Comments: This has been submitted to the Trends in Hearing journal

  3. arXiv:2307.07748

    eess.AS

    Audio-Visual Speech Enhancement Using Self-supervised Learning to Improve Speech Intelligibility in Cochlear Implant Simulations

    Authors: Richard Lee Lai, Jen-Cheng Hou, I-Chun Chern, Kuo-Hsuan Hung, Yi-Ting Chen, Mandar Gogate, Tughrul Arslan, Amir Hussain, Yu Tsao

    Abstract: Individuals with hearing impairments face challenges in their ability to comprehend speech, particularly in noisy environments. The aim of this study is to explore the effectiveness of audio-visual speech enhancement (AVSE) in enhancing the intelligibility of vocoded speech in cochlear implant (CI) simulations. Notably, the study focuses on a challenged scenario where there is limited availability…

    Submitted 19 March, 2025; v1 submitted 15 July, 2023; originally announced July 2023.

  4. arXiv:2210.17456

    eess.AS cs.SD

    Audio-Visual Speech Enhancement and Separation by Utilizing Multi-Modal Self-Supervised Embeddings

    Authors: I-Chun Chern, Kuo-Hsuan Hung, Yi-Ting Chen, Tassadaq Hussain, Mandar Gogate, Amir Hussain, Yu Tsao, Jen-Cheng Hou

    Abstract: AV-HuBERT, a multi-modal self-supervised learning model, has been shown to be effective for categorical problems such as automatic speech recognition and lip-reading. This suggests that useful audio-visual speech representations can be obtained via utilizing multi-modal self-supervised embeddings. Nevertheless, it is unclear if such representations can be generalized to solve real-world multi-moda…

    Submitted 31 May, 2023; v1 submitted 31 October, 2022; originally announced October 2022.

    Comments: ICASSP AMHAT 2023

  5. arXiv:2210.13127

    eess.AS cs.SD eess.SP

    A Novel Frame Structure for Cloud-Based Audio-Visual Speech Enhancement in Multimodal Hearing-aids

    Authors: Abhijeet Bishnu, Ankit Gupta, Mandar Gogate, Kia Dashtipour, Ahsan Adeel, Amir Hussain, Mathini Sellathurai, Tharmalingam Ratnarajah

    Abstract: In this paper, we design a first of its kind transceiver (PHY layer) prototype for cloud-based audio-visual (AV) speech enhancement (SE) complying with high data rate and low latency requirements of future multimodal hearing assistive technology. The innovative design needs to meet multiple challenging constraints including up/down link communications, delay of transmission and signal processing,…

    Submitted 24 October, 2022; originally announced October 2022.

  6. arXiv:2202.05756

    cs.SD cs.LG eess.AS

    A Novel Speech Intelligibility Enhancement Model based on Canonical Correlation and Deep Learning

    Authors: Tassadaq Hussain, Muhammad Diyan, Mandar Gogate, Kia Dashtipour, Ahsan Adeel, Yu Tsao, Amir Hussain

    Abstract: Current deep learning (DL) based approaches to speech intelligibility enhancement in noisy environments are often trained to minimise the feature distance between noise-free speech and enhanced speech signals. Despite improving the speech quality, such approaches do not deliver required levels of speech intelligibility in everyday noisy environments. Intelligibility-oriented (I-O) loss functions…

    Submitted 11 February, 2022; originally announced February 2022.

    Comments: arXiv admin note: substantial text overlap with arXiv:2202.04172

  7. arXiv:2202.05662

    cs.CR cs.SD eess.AS

    A Novel Chaos-based Light-weight Image Encryption Scheme for Multi-modal Hearing Aids

    Authors: Awais Aziz Shah, Ahsan Adeel, Jawad Ahmad, Ahmed Al-Dubai, Mandar Gogate, Abhijeet Bishnu, Muhammad Diyan, Tassadaq Hussain, Kia Dashtipour, Tharm Ratnarajah, Amir Hussain

    Abstract: Multimodal hearing aids (HAs) aim to deliver more intelligible audio in noisy environments by contextually sensing and processing data in the form of not only audio but also visual information (e.g. lip reading). Machine learning techniques can play a pivotal role in the contextual processing of multimodal data. However, since the computational power of HA devices is low, therefore this data mu…

    Submitted 11 February, 2022; originally announced February 2022.

  8. arXiv:2202.04172   

    eess.AS cs.SD

    A Speech Intelligibility Enhancement Model based on Canonical Correlation and Deep Learning for Hearing-Assistive Technologies

    Authors: Tassadaq Hussain, Muhammad Diyan, Mandar Gogate, Kia Dashtipour, Ahsan Adeel, Yu Tsao, Amir Hussain

    Abstract: Current deep learning (DL) based approaches to speech intelligibility enhancement in noisy environments are generally trained to minimise the distance between clean and enhanced speech features. These often result in improved speech quality; however, they suffer from a lack of generalisation and may not deliver the required speech intelligibility in everyday noisy situations. In an attempt to addres…

    Submitted 15 February, 2022; v1 submitted 8 February, 2022; originally announced February 2022.

    Comments: We would like to withdraw this article because we have accidentally uploaded the revised version of the same article from another account. The updated version is titled "A Novel Speech Intelligibility Enhancement Model based on Canonical Correlation and Deep Learning" (arXiv:2202.05756)

  9. arXiv:2201.09913

    eess.AS cs.SD

    A Novel Temporal Attentive-Pooling based Convolutional Recurrent Architecture for Acoustic Signal Enhancement

    Authors: Tassadaq Hussain, Wei-Chien Wang, Mandar Gogate, Kia Dashtipour, Yu Tsao, Xugang Lu, Adeel Ahsan, Amir Hussain

    Abstract: In acoustic signal processing, the target signals usually carry semantic information, which is encoded in a hierarchical structure of short and long-term contexts. However, the background noise distorts these structures in a nonuniform way. The existing deep acoustic signal enhancement (ASE) architectures ignore this kind of local and global effect. To address this problem, we propose to integrate a…

    Submitted 24 January, 2022; originally announced January 2022.

  10. arXiv:2112.09060

    cs.SD cs.CV cs.LG eess.AS

    Towards Robust Real-time Audio-Visual Speech Enhancement

    Authors: Mandar Gogate, Kia Dashtipour, Amir Hussain

    Abstract: The human brain contextually exploits heterogeneous sensory information to efficiently perform cognitive tasks including vision and hearing. For example, during the cocktail party situation, the human auditory cortex contextually integrates audio-visual (AV) cues in order to better perceive speech. Recent studies have shown that AV speech enhancement (SE) models can significantly improve speech qu…

    Submitted 16 December, 2021; originally announced December 2021.

  11. arXiv:2111.09642

    cs.SD cs.CV cs.LG eess.AS

    Towards Intelligibility-Oriented Audio-Visual Speech Enhancement

    Authors: Tassadaq Hussain, Mandar Gogate, Kia Dashtipour, Amir Hussain

    Abstract: Existing deep learning (DL) based speech enhancement approaches are generally optimised to minimise the distance between clean and enhanced speech features. These often result in improved speech quality; however, they suffer from a lack of generalisation and may not deliver the required speech intelligibility in real noisy situations. In an attempt to address these challenges, researchers have explo…

    Submitted 18 November, 2021; originally announced November 2021.

    Comments: 6 pages, 4 figures

  12. A Novel Context-Aware Multimodal Framework for Persian Sentiment Analysis

    Authors: Kia Dashtipour, Mandar Gogate, Erik Cambria, Amir Hussain

    Abstract: Most recent works on sentiment analysis have exploited the text modality. However, millions of hours of video recordings posted on social media platforms everyday hold vital unstructured information that can be exploited to more effectively gauge public perception. Multimodal sentiment analysis offers an innovative solution to computationally understand and harvest sentiments from videos by contex…

    Submitted 3 March, 2021; originally announced March 2021.

    Comments: Accepted in Neurocomputing

  13. An Experimental Analysis of Attack Classification Using Machine Learning in IoT Networks

    Authors: Andrew Churcher, Rehmat Ullah, Jawad Ahmad, Sadaqat ur Rehman, Fawad Masood, Mandar Gogate, Fehaid Alqahtani, Boubakr Nour, William J. Buchanan

    Abstract: In recent years, there has been a massive increase in the amount of Internet of Things (IoT) devices as well as the data generated by such devices. The participating devices in IoT networks can be problematic due to their resource-constrained nature, and integrating security on these devices is often overlooked. This has resulted in attackers having an increased incentive to target IoT devices. As…

    Submitted 10 January, 2021; originally announced January 2021.

    Journal ref: Sensors. 2021; 21(2):446

  14. arXiv:1910.00424

    cs.SD cs.LG eess.AS

    AV Speech Enhancement Challenge using a Real Noisy Corpus

    Authors: Mandar Gogate, Ahsan Adeel, Kia Dashtipour, Peter Derleth, Amir Hussain

    Abstract: This paper presents a first of its kind audio-visual (AV) speech enhancement challenge in real-noisy settings. A detailed description of the AV challenge, a novel real noisy AV corpus (ASPIRE), benchmark speech enhancement task, and baseline performance results are outlined. The latter are based on training a deep neural architecture on a synthetic mixture of Grid corpus and ChiME3 noises (consis…

    Submitted 30 September, 2019; originally announced October 2019.

    Comments: arXiv admin note: substantial text overlap with arXiv:1909.10407

  15. arXiv:1909.13568

    cs.CL cs.LG

    A Hybrid Persian Sentiment Analysis Framework: Integrating Dependency Grammar Based Rules and Deep Neural Networks

    Authors: Kia Dashtipour, Mandar Gogate, Jingpeng Li, Fengling Jiang, Bin Kong, Amir Hussain

    Abstract: Social media hold valuable, vast and unstructured information on public opinion that can be utilized to improve products and services. The automatic analysis of such data, however, requires a deep understanding of natural language. Current sentiment analysis approaches are mainly based on word co-occurrence frequencies, which are inadequate in most practical cases. In this work, we propose a novel…

    Submitted 30 September, 2019; originally announced September 2019.

    Comments: Accepted in Neurocomputing, Demo available at: https://cogbid.napier.ac.uk/demo/persian-sentiment-analysis/

  16. arXiv:1909.10407

    cs.SD cs.CV cs.LG eess.AS

    CochleaNet: A Robust Language-independent Audio-Visual Model for Speech Enhancement

    Authors: Mandar Gogate, Kia Dashtipour, Ahsan Adeel, Amir Hussain

    Abstract: Noisy situations cause huge problems for sufferers of hearing loss as hearing aids often make the signal more audible but do not always restore the intelligibility. In noisy settings, humans routinely exploit the audio-visual (AV) nature of the speech to selectively suppress the background noise and to focus on the target speaker. In this paper, we present a causal, language, noise and speaker indep…

    Submitted 23 September, 2019; originally announced September 2019.

    Comments: 34 pages, 11 figures, Submitted to Information Fusion

  17. A Survey on the Role of Wireless Sensor Networks and IoT in Disaster Management

    Authors: Ahsan Adeel, Mandar Gogate, Saadullah Farooq, Cosimo Ieracitano, Kia Dashtipour, Hadi Larijani, Amir Hussain

    Abstract: Extreme events and disasters resulting from climate change or other ecological factors are difficult to predict and manage. Current limitations of state-of-the-art approaches to disaster prediction and management could be addressed by adopting new unorthodox risk assessment and management strategies. The next generation Internet of Things (IoT), Wireless Sensor Networks (WSNs), 5G wireless communi…

    Submitted 23 September, 2019; originally announced September 2019.

    Comments: Accepted in Springer Natural Hazards book series

    Journal ref: 2019 Geological Disaster Monitoring Based on Sensor Networks

  18. Contextual Audio-Visual Switching For Speech Enhancement in Real-World Environments

    Authors: Ahsan Adeel, Mandar Gogate, Amir Hussain

    Abstract: Human speech processing is inherently multimodal, where visual cues (lip movements) help to better understand the speech in noise. Lip-reading driven speech enhancement significantly outperforms benchmark audio-only approaches at low signal-to-noise ratios (SNRs). However, at high SNRs or low levels of background noise, visual cues become considerably less effective for speech enhancement. Therefore, a…

    Submitted 28 August, 2018; originally announced August 2018.

    Comments: 16 pages, 7 figures. arXiv admin note: substantial text overlap with arXiv:1808.00046

    Report number: ISSN 1566-2535

    Journal ref: Information Fusion, 2019

  19. arXiv:1808.05633

    cs.CR

    Statistical Analysis Driven Optimized Deep Learning System for Intrusion Detection

    Authors: Cosimo Ieracitano, Ahsan Adeel, Mandar Gogate, Kia Dashtipour, Francesco Carlo Morabito, Hadi Larijani, Ali Raza, Amir Hussain

    Abstract: Attackers have developed ever more sophisticated and intelligent ways to hack information and communication technology systems. The extent of damage an individual hacker can carry out upon infiltrating a system is well understood. A potentially catastrophic scenario can be envisaged where a nation-state intercepting encrypted financial data gets hacked. Thus, intelligent cybersecurity systems have…

    Submitted 16 August, 2018; originally announced August 2018.

    Comments: To appear in the 9th International Conference on Brain Inspired Cognitive Systems (BICS 2018)

    ACM Class: K.6.5; I.2.1; I.5.1

  20. arXiv:1808.05077

    cs.CL

    Exploiting Deep Learning for Persian Sentiment Analysis

    Authors: Kia Dashtipour, Mandar Gogate, Ahsan Adeel, Cosimo Ieracitano, Hadi Larijani, Amir Hussain

    Abstract: The rise of social media is enabling people to freely express their opinions about products and services. The aim of sentiment analysis is to automatically determine a subject's sentiment (e.g., positive, negative, or neutral) towards a particular aspect such as topic, product, movie, news etc. Deep learning has recently emerged as a powerful machine learning technique to tackle a growing demand of…

    Submitted 15 August, 2018; originally announced August 2018.

    Comments: To appear in the 9th International Conference on Brain Inspired Cognitive Systems (BICS 2018)

    ACM Class: I.2.7; I.5.0

  21. arXiv:1808.00060

    cs.SD cs.CV cs.LG eess.AS

    DNN driven Speaker Independent Audio-Visual Mask Estimation for Speech Separation

    Authors: Mandar Gogate, Ahsan Adeel, Ricard Marxer, Jon Barker, Amir Hussain

    Abstract: Human auditory cortex excels at selectively suppressing background noise to focus on a target speaker. The process of selective attention in the brain is known to contextually exploit the available audio and visual cues to better focus on target speaker while filtering out other noises. In this study, we propose a novel deep neural network (DNN) based audiovisual (AV) mask estimation model. The pr…

    Submitted 31 July, 2018; originally announced August 2018.

    Comments: Accepted for Interspeech 2018, 5 pages, 4 figures

    ACM Class: I.5; I.4; I.2

  22. arXiv:1808.00046

    cs.CV cs.LG cs.SD eess.AS

    Lip-Reading Driven Deep Learning Approach for Speech Enhancement

    Authors: Ahsan Adeel, Mandar Gogate, Amir Hussain, William M. Whitmer

    Abstract: This paper proposes a novel lip-reading driven deep learning framework for speech enhancement. The proposed approach leverages the complementary strengths of both deep learning and analytical acoustic modelling (filtering based approach) as compared to recently published, comparatively simpler benchmark approaches that rely only on deep learning. The proposed audio-visual (AV) speech enhancement f…

    Submitted 31 July, 2018; originally announced August 2018.

    Comments: 11 pages, 13 figures

    ACM Class: I.4; I.5; I.2