-
Multimodal Fusion Balancing Through Game-Theoretic Regularization
Authors:
Konstantinos Kontras,
Thomas Strypsteen,
Christos Chatzichristos,
Paul Pu Liang,
Matthew Blaschko,
Maarten De Vos
Abstract:
Multimodal learning can complete the picture of information extraction by uncovering key dependencies between data sources. However, current systems fail to fully leverage multiple modalities for optimal performance. This has been attributed to modality competition, where modalities strive for training resources, leaving some underoptimized. We show that current balancing methods struggle to train multimodal models that surpass even simple baselines, such as ensembles. This raises the question: how can we ensure that all modalities in multimodal training are sufficiently trained, and that learning from new modalities consistently improves performance? This paper proposes the Multimodal Competition Regularizer (MCR), a new loss component inspired by mutual information (MI) decomposition designed to prevent the adverse effects of competition in multimodal training. Our key contributions are: 1) Introducing game-theoretic principles in multimodal learning, where each modality acts as a player competing to maximize its influence on the final outcome, enabling automatic balancing of the MI terms. 2) Refining lower and upper bounds for each MI term to enhance the extraction of task-relevant unique and shared information across modalities. 3) Suggesting latent space permutations for conditional MI estimation, significantly improving computational efficiency. MCR outperforms all previously suggested training strategies and is the first to consistently improve multimodal learning beyond the ensemble baseline, clearly demonstrating that combining modalities leads to significant performance gains on both synthetic and large real-world datasets.
Submitted 7 December, 2024; v1 submitted 11 November, 2024;
originally announced November 2024.
-
Improving Multimodal Learning with Multi-Loss Gradient Modulation
Authors:
Konstantinos Kontras,
Christos Chatzichristos,
Matthew Blaschko,
Maarten De Vos
Abstract:
Learning from multiple modalities, such as audio and video, offers opportunities for leveraging complementary information, enhancing robustness, and improving contextual understanding and performance. However, combining such modalities presents challenges, especially when modalities differ in data structure, predictive contribution, and the complexity of their learning processes. It has been observed that one modality can potentially dominate the learning process, hindering the effective utilization of information from other modalities and leading to sub-optimal model performance. To address this issue, the vast majority of previous works suggest assessing the unimodal contributions and dynamically adjusting the training to equalize them. We improve upon previous work by introducing a multi-loss objective and further refining the balancing process, allowing it to dynamically adjust the learning pace of each modality in both directions, acceleration and deceleration, with the ability to phase out balancing effects upon convergence. We achieve superior results across three audio-video datasets: on CREMA-D, models with ResNet backbone encoders surpass the previous best by 1.9% to 12.4%, and Conformer backbone models deliver improvements ranging from 2.8% to 14.1% across different fusion methods. On AVE, improvements range from 2.7% to 7.7%, while on UCF101, gains reach up to 6.1%.
Submitted 14 October, 2024; v1 submitted 13 May, 2024;
originally announced May 2024.
-
CoRe-Sleep: A Multimodal Fusion Framework for Time Series Robust to Imperfect Modalities
Authors:
Konstantinos Kontras,
Christos Chatzichristos,
Huy Phan,
Johan Suykens,
Maarten De Vos
Abstract:
Sleep abnormalities can have severe health consequences. Automated sleep staging, i.e. labelling the sequence of sleep stages from the patient's physiological recordings, could simplify the diagnostic process. Previous work on automated sleep staging has achieved great results, mainly relying on the EEG signal. However, multiple sources of information are often available beyond EEG. This can be particularly beneficial when the EEG recordings are noisy or even missing completely. In this paper, we propose CoRe-Sleep, a Coordinated Representation multimodal fusion network that is particularly focused on improving the robustness of signal analysis on imperfect data. We demonstrate how appropriately handling multimodal information can be the key to achieving such robustness. CoRe-Sleep tolerates noisy or missing modality segments, allowing training on incomplete data. Additionally, it shows state-of-the-art performance when testing on both multimodal and unimodal data using a single model on SHHS-1, the largest publicly available study that includes sleep stage labels. The results indicate that training the model on multimodal data does positively influence performance when tested on unimodal data. This work aims at bridging the gap between automated analysis tools and their clinical utility.
Submitted 27 March, 2023;
originally announced April 2023.
-
Early soft and flexible fusion of EEG and fMRI via tensor decompositions
Authors:
Christos Chatzichristos,
Eleftherios Kofidis,
Lieven De Lathauwer,
Sergios Theodoridis,
Sabine Van Huffel
Abstract:
Data fusion refers to the joint analysis of multiple datasets which provide complementary views of the same task. In this preprint, the problem of jointly analyzing electroencephalography (EEG) and functional Magnetic Resonance Imaging (fMRI) data is considered. Jointly analyzing EEG and fMRI measurements is highly beneficial for studying brain function because these modalities have complementary spatiotemporal resolution: EEG offers good temporal resolution while fMRI is better in its spatial resolution. The fusion methods reported so far ignore the underlying multi-way nature of the data in at least one of the modalities and/or rely on very strong assumptions about the relation of the two datasets. In this preprint, these two points are addressed by adopting for the first time tensor models in the two modalities while also exploring double coupled tensor decompositions and by following soft and flexible coupling approaches to implement the multi-modal analysis. To cope with the Event Related Potential (ERP) variability in EEG, the PARAFAC2 model is adopted. The results obtained are compared against those of parallel Independent Component Analysis (ICA) and hard coupling alternatives in both simulated and real data. Our results confirm the superiority of tensorial methods over methods based on ICA. In scenarios that do not meet the assumptions underlying hard coupling, the advantage of soft and flexible coupled decompositions is clearly demonstrated.
Submitted 12 May, 2020;
originally announced May 2020.
-
Joint Channel Estimation / Data Detection in MIMO-FBMC/OQAM Systems - A Tensor-Based Approach
Authors:
Eleftherios Kofidis,
Christos Chatzichristos,
Andre L. F. de Almeida
Abstract:
Filter bank-based multicarrier (FBMC) systems are currently being considered as a prevalent candidate for replacing the long-established cyclic prefix (CP)-based orthogonal frequency division multiplexing (CP-OFDM) in the physical layer of next-generation communications systems. In particular, FBMC/OQAM has received increasing attention due to, among other features, its potential for maximum spectral efficiency. It suffers, however, from an intrinsic self-interference effect, which complicates signal processing tasks at the receiver, including synchronization, channel estimation and equalization. In a multiple-input multiple-output (MIMO) configuration, the multi-antenna interference must also be taken into account. (Semi-)blind FBMC/OQAM receivers have been little studied so far, and mainly for single-antenna systems. The problem of joint channel estimation and data detection in a MIMO-FBMC/OQAM system, given limited or no training information, is studied in this paper through a tensor-based approach, in light of the success of such techniques in OFDM applications. Simulation-based comparisons with CP-OFDM are included, for realistic transmission models.
Submitted 30 September, 2016;
originally announced September 2016.