Showing 1–24 of 24 results for author: Botteldooren, D

Searching in archive cs.
  1. arXiv:2503.12509

    q-bio.NC cs.AI

    A Reservoir-based Model for Human-like Perception of Complex Rhythm Pattern

    Authors: Zhongju Yuan, Geraint Wiggins, Dick Botteldooren

    Abstract: Rhythm is a fundamental aspect of human behaviour, present from infancy and deeply embedded in cultural practices. Rhythm anticipation is a spontaneous cognitive process that typically occurs before the onset of actual beats. While most research in both neuroscience and artificial intelligence has focused on metronome-based rhythm tasks, studies investigating the perception of complex musical rhyt…

    Submitted 16 March, 2025; originally announced March 2025.

  2. arXiv:2503.12506

    cs.SD cs.AI eess.AS

    A General Close-loop Predictive Coding Framework for Auditory Working Memory

    Authors: Zhongju Yuan, Geraint Wiggins, Dick Botteldooren

    Abstract: Auditory working memory is essential for various daily activities, such as language acquisition and conversation. It involves the temporary storage and manipulation of information that is no longer present in the environment. While extensively studied in neuroscience and cognitive science, research on its modeling within neural networks remains limited. To address this gap, we propose a general frame…

    Submitted 16 March, 2025; originally announced March 2025.

  3. arXiv:2501.00038

    cs.HC cs.RO cs.SD eess.AS

    Sound-Based Recognition of Touch Gestures and Emotions for Enhanced Human-Robot Interaction

    Authors: Yuanbo Hou, Qiaoqiao Ren, Wenwu Wang, Dick Botteldooren

    Abstract: Emotion recognition and touch gesture decoding are crucial for advancing human-robot interaction (HRI), especially in social environments where emotional cues and tactile perception play important roles. However, many humanoid robots, such as Pepper, Nao, and Furhat, lack full-body tactile skin, limiting their ability to engage in touch-based emotional and gesture interactions. In addition, vision…

    Submitted 24 December, 2024; originally announced January 2025.

    Comments: ICASSP 2025

  4. arXiv:2409.06580

    eess.AS cs.SD

    Exploring Differences between Human Perception and Model Inference in Audio Event Recognition

    Authors: Yizhou Tan, Yanru Wu, Yuanbo Hou, Xin Xu, Hui Bu, Shengchen Li, Dick Botteldooren, Mark D. Plumbley

    Abstract: Audio Event Recognition (AER) traditionally focuses on detecting and identifying audio events. Most existing AER models tend to detect all potential events without considering their varying significance across different contexts. As a result, the AER results of existing models often show a large discrepancy with human auditory perception. Although this is a critical and significant issue, i…

    Submitted 10 September, 2024; originally announced September 2024.

    Comments: Dataset homepage: https://github.com/Voltmeter00/MAFAR

  5. arXiv:2407.09538

    q-bio.NC cs.AI cs.HC cs.NE

    A Dynamic Systems Approach to Modelling Human-Machine Rhythm Interaction

    Authors: Zhongju Yuan, Wannes Van Ransbeeck, Geraint Wiggins, Dick Botteldooren

    Abstract: In exploring the simulation of human rhythmic perception and synchronization capabilities, this study introduces a computational model inspired by the physical and biological processes underlying rhythm processing. Utilizing a reservoir computing framework that simulates the function of the cerebellum, the model features a dual-neuron classification and incorporates parameters to modulate information…

    Submitted 26 June, 2024; originally announced July 2024.

    Journal ref: 2025

  6. arXiv:2406.05914

    eess.AS cs.SD eess.SP

    Soundscape Captioning using Sound Affective Quality Network and Large Language Model

    Authors: Yuanbo Hou, Qiaoqiao Ren, Andrew Mitchell, Wenwu Wang, Jian Kang, Tony Belpaeme, Dick Botteldooren

    Abstract: We live in a rich and varied acoustic world, which is experienced by individuals or communities as a soundscape. Computational auditory scene analysis, disentangling acoustic scenes by detecting and classifying events, focuses on objective attributes of sounds, such as their category and temporal characteristics, ignoring their effects on people, such as the emotions they evoke within a context. T…

    Submitted 29 November, 2024; v1 submitted 9 June, 2024; originally announced June 2024.

    Comments: Code: https://github.com/Yuanbo2020/SoundSCaper

  7. arXiv:2405.10102

    cs.NE cs.AI cs.LG eess.AS

    A novel Reservoir Architecture for Periodic Time Series Prediction

    Authors: Zhongju Yuan, Geraint Wiggins, Dick Botteldooren

    Abstract: This paper introduces a novel approach to predicting periodic time series using reservoir computing. The model is tailored to deliver precise forecasts of rhythms, a crucial aspect for tasks such as generating musical rhythm. Leveraging reservoir computing, our proposed method is ultimately oriented towards predicting human perception of rhythm. Our network accurately predicts rhythmic signals wit…

    Submitted 16 May, 2024; originally announced May 2024.

  8. arXiv:2405.09708

    cs.RO cs.AI stat.CO

    No More Mumbles: Enhancing Robot Intelligibility through Speech Adaptation

    Authors: Qiaoqiao Ren, Yuanbo Hou, Dick Botteldooren, Tony Belpaeme

    Abstract: Spoken language interaction is at the heart of interpersonal communication, and people flexibly adapt their speech to different individuals and environments. It is surprising that robots, and by extension other digital devices, are not equipped to adapt their speech and instead rely on fixed speech parameters, which often hinder comprehension by the user. We conducted a speech comprehension study…

    Submitted 15 May, 2024; originally announced May 2024.

    Comments: IEEE Robotics and Automation Letters (IEEE RAL)

  9. arXiv:2403.15489

    eess.SP cs.AI cs.HC cs.LG

    EEG decoding with conditional identification information

    Authors: Pengfei Sun, Jorg De Winne, Paul Devos, Dick Botteldooren

    Abstract: Decoding EEG signals is crucial for unraveling the human brain and advancing brain-computer interfaces. Traditional machine learning algorithms have been hindered by the high noise levels and inherent inter-person variations in EEG signals. Recent advances in deep neural networks (DNNs) have shown promise, owing to their advanced nonlinear modeling capabilities. However, DNNs still face challenges in d…

    Submitted 21 March, 2024; originally announced March 2024.

    Comments: Accepted by 6th International Conference on Advances in Signal Processing and Artificial Intelligence (ASPAI' 2024)

  10. arXiv:2312.09952

    eess.AS cs.SD

    Multi-level graph learning for audio event classification and human-perceived annoyance rating prediction

    Authors: Yuanbo Hou, Qiaoqiao Ren, Siyang Song, Yuxin Song, Wenwu Wang, Dick Botteldooren

    Abstract: WHO's report on environmental noise estimates that 22 M people suffer from chronic annoyance related to noise caused by audio events (AEs) from various sources. Annoyance may lead to health issues and adverse effects on metabolic and cognitive systems. In cities, monitoring noise levels does not provide insights into noticeable AEs, let alone their relations to annoyance. To create annoyance-relat…

    Submitted 15 December, 2023; originally announced December 2023.

    Comments: Accepted by ICASSP 2024

  11. arXiv:2311.09030

    eess.AS cs.SD

    AI-based soundscape analysis: Jointly identifying sound sources and predicting annoyance

    Authors: Yuanbo Hou, Qiaoqiao Ren, Huizhong Zhang, Andrew Mitchell, Francesco Aletta, Jian Kang, Dick Botteldooren

    Abstract: Soundscape studies typically attempt to capture the perception and understanding of sonic environments by surveying users. However, for long-term monitoring or assessing interventions, sound-signal-based approaches are required. To this end, most previous research focused on psycho-acoustic quantities or automatic sound recognition. Few attempts were made to include appraisal (e.g., in circumplex…

    Submitted 15 November, 2023; originally announced November 2023.

    Comments: The Journal of the Acoustical Society of America, 154 (5), 3145

    Journal ref: The Journal of the Acoustical Society of America, 154, 3145 (2023)

  12. arXiv:2310.14982

    cs.NE cs.LG eess.AS eess.SP

    Delayed Memory Unit: Modelling Temporal Dependency Through Delay Gate

    Authors: Pengfei Sun, Jibin Wu, Malu Zhang, Paul Devos, Dick Botteldooren

    Abstract: Recurrent Neural Networks (RNNs) are widely recognized for their proficiency in modeling temporal dependencies, making them highly prevalent in sequential data processing applications. Nevertheless, vanilla RNNs are confronted with the well-known issue of gradient vanishing and exploding, posing a significant challenge for learning and establishing long-range dependencies. Additionally, gated RNNs…

    Submitted 10 November, 2024; v1 submitted 23 October, 2023; originally announced October 2023.

    Comments: Accepted for publication in IEEE Transactions on Neural Networks and Learning Systems, 2024

    Journal ref: IEEE Transactions on Neural Networks and Learning Systems, 2024

  13. Audio Event-Relational Graph Representation Learning for Acoustic Scene Classification

    Authors: Yuanbo Hou, Siyang Song, Chuang Yu, Wenwu Wang, Dick Botteldooren

    Abstract: Most deep learning-based acoustic scene classification (ASC) approaches identify scenes based on acoustic features converted from audio clips containing mixed information entangled by polyphonic audio events (AEs). However, these approaches have difficulties in explaining what cues they use to identify scenes. This paper conducts the first study on disclosing the relationship between real-life aco…

    Submitted 5 October, 2023; originally announced October 2023.

    Comments: IEEE Signal Processing Letters, doi: 10.1109/LSP.2023.3319233

  14. arXiv:2308.11980

    eess.AS cs.SD

    Joint Prediction of Audio Event and Annoyance Rating in an Urban Soundscape by Hierarchical Graph Representation Learning

    Authors: Yuanbo Hou, Siyang Song, Cheng Luo, Andrew Mitchell, Qiaoqiao Ren, Weicheng Xie, Jian Kang, Wenwu Wang, Dick Botteldooren

    Abstract: Sound events in daily life carry rich information about the objective world. The composition of these sounds affects the mood of people in a soundscape. Most previous approaches only focus on classifying and detecting audio events and scenes, but may ignore their perceptual quality that may impact humans' listening mood for the environment, e.g. annoyance. To this end, this paper proposes a novel…

    Submitted 23 August, 2023; originally announced August 2023.

    Comments: INTERSPEECH 2023, Code and models: https://github.com/Yuanbo2020/HGRL

  15. arXiv:2302.08607

    cs.NE cs.SD eess.AS

    Adaptive Axonal Delays in feedforward spiking neural networks for accurate spoken word recognition

    Authors: Pengfei Sun, Ehsan Eqlimi, Yansong Chua, Paul Devos, Dick Botteldooren

    Abstract: Spiking neural networks (SNN) are a promising research avenue for building accurate and efficient automatic speech recognition systems. Recent advances in audio-to-spike encoding and training algorithms enable SNN to be applied in practical tasks. Biologically-inspired SNN communicates using sparse asynchronous events. Therefore, spike-timing is critical to SNN performance. In this aspect, most wo…

    Submitted 16 February, 2023; originally announced February 2023.

    Comments: Accepted by ICASSP 2023

  16. arXiv:2210.15366

    eess.AS cs.SD

    Multi-dimensional Edge-based Audio Event Relational Graph Representation Learning for Acoustic Scene Classification

    Authors: Yuanbo Hou, Siyang Song, Chuang Yu, Yuxin Song, Wenwu Wang, Dick Botteldooren

    Abstract: Most existing deep learning-based acoustic scene classification (ASC) approaches directly utilize representations extracted from spectrograms to identify target scenes. However, these approaches pay little attention to the audio events occurring in the scene, even though they provide crucial semantic information. This paper conducts the first study that investigates whether real-life acoustic scenes ca…

    Submitted 1 November, 2022; v1 submitted 27 October, 2022; originally announced October 2022.

  17. arXiv:2210.12541

    cs.SD eess.AS

    GCT: Gated Contextual Transformer for Sequential Audio Tagging

    Authors: Yuanbo Hou, Yun Wang, Wenwu Wang, Dick Botteldooren

    Abstract: Audio tagging aims to assign predefined tags to audio clips to indicate the class information of audio events. Sequential audio tagging (SAT) means detecting both the class information of audio events, and the order in which they occur within the audio clip. Most existing methods for SAT are based on connectionist temporal classification (CTC). However, CTC cannot effectively capture connections b…

    Submitted 22 October, 2022; originally announced October 2022.

  18. arXiv:2208.02086

    cs.SD cs.MM eess.AS eess.IV

    Audio-visual scene classification via contrastive event-object alignment and semantic-based fusion

    Authors: Yuanbo Hou, Bo Kang, Dick Botteldooren

    Abstract: Previous works on scene classification are mainly based on audio or visual signals, while humans perceive the environmental scenes through multiple senses. Recent studies on audio-visual scene classification separately fine-tune the large-scale audio and image pre-trained models on the target dataset, then either fuse the intermediate representations of the audio model and the visual model, or fuse…

    Submitted 3 August, 2022; originally announced August 2022.

    Comments: IEEE MMSP 2022

  19. arXiv:2206.08233

    cs.SD eess.AS

    Event-related data conditioning for acoustic event classification

    Authors: Yuanbo Hou, Dick Botteldooren

    Abstract: Models based on diverse attention mechanisms have recently shined in tasks related to acoustic event classification (AEC). Among them, self-attention is often used in audio-only tasks to help the model recognize different acoustic events. Self-attention relies on the similarity between time frames, and uses global information from the whole segment to highlight specific features within a frame. In…

    Submitted 16 June, 2022; originally announced June 2022.

    Comments: Accepted by INTERSPEECH 2022

  20. arXiv:2205.02115

    cs.NE cs.AI cs.LG q-bio.NC

    Axonal Delay As a Short-Term Memory for Feed Forward Deep Spiking Neural Networks

    Authors: Pengfei Sun, Longwei Zhu, Dick Botteldooren

    Abstract: Information in spiking neural networks (SNNs) is propagated between adjacent biological neurons by spikes, which provides a computing paradigm with the promise of simulating the human brain. Recent studies have found that the time delay of neurons plays an important role in the learning process. Therefore, configuring the precise timing of the spike is a promising direction for understandi…

    Submitted 20 April, 2022; originally announced May 2022.

    Comments: Accepted at IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Singapore, 2022

  21. arXiv:2205.00499

    cs.SD eess.AS

    Relation-guided acoustic scene classification aided with event embeddings

    Authors: Yuanbo Hou, Bo Kang, Wout Van Hauwermeiren, Dick Botteldooren

    Abstract: In real life, acoustic scenes and audio events are naturally correlated. Humans instinctively rely on fine-grained audio events as well as the overall sound characteristics to distinguish diverse acoustic scenes. Yet, most previous approaches treat acoustic scene classification (ASC) and audio event classification (AEC) as two independent tasks. A few studies on scene and event joint classificatio…

    Submitted 1 May, 2022; originally announced May 2022.

    Comments: International Joint Conference on Neural Networks (IJCNN) 2022

  22. arXiv:2203.11573

    cs.SD eess.AS

    CT-SAT: Contextual Transformer for Sequential Audio Tagging

    Authors: Yuanbo Hou, Zhaoyi Liu, Bo Kang, Yun Wang, Dick Botteldooren

    Abstract: Sequential audio event tagging can provide not only the type information of audio events, but also the order information between events and the number of events that occur in an audio clip. Most previous works on audio event sequence analysis rely on connectionist temporal classification (CTC). However, CTC's conditional independence assumption prevents it from effectively learning correlations be…

    Submitted 22 March, 2022; originally announced March 2022.

    Comments: Submitted to INTERSPEECH 2022

  23. arXiv:2106.11411

    cs.SD eess.AS

    Attention-based cross-modal fusion for audio-visual voice activity detection in musical video streams

    Authors: Yuanbo Hou, Zhesong Yu, Xia Liang, Xingjian Du, Bilei Zhu, Zejun Ma, Dick Botteldooren

    Abstract: Many previous audio-visual voice-related works focus on speech, ignoring the singing voice in the growing number of musical video streams on the Internet. For processing diverse musical video data, voice activity detection is a necessary step. This paper attempts to detect the speech and singing voices of target performers in musical video streams using audiovisual information. To integrate inform…

    Submitted 21 June, 2021; originally announced June 2021.

    Comments: Accepted by INTERSPEECH 2021

  24. arXiv:2010.14168

    cs.SD cs.MM eess.AS

    Rule-embedded network for audio-visual voice activity detection in live musical video streams

    Authors: Yuanbo Hou, Yi Deng, Bilei Zhu, Zejun Ma, Dick Botteldooren

    Abstract: Detecting the anchor's voice in live musical streams is an important preprocessing step for music and speech signal processing. Existing approaches to voice activity detection (VAD) primarily rely on audio; however, audio-based VAD struggles to focus on the target voice in noisy environments. With the help of visual information, this paper proposes a rule-embedded network to fuse the audio-v…

    Submitted 31 October, 2020; v1 submitted 27 October, 2020; originally announced October 2020.

    Comments: Submitted to ICASSP 2021