Search | arXiv e-print repository

SPD Learning for Covariance-Based Neuroimaging Analysis: Perspectives, Methods, and Challenges

Authors: Ce Ju, Reinmar J. Kobler, Antoine Collas, Motoaki Kawanabe, Cuntai Guan, Bertrand Thirion

Abstract: Neuroimaging provides a critical framework for characterizing brain activity by quantifying connectivity patterns and functional architecture across modalities. While modern machine learning has significantly advanced our understanding of neural processing mechanisms through these datasets, decoding task-specific signatures must contend with inherent neuroimaging constraints, for example, low sign… ▽ More Neuroimaging provides a critical framework for characterizing brain activity by quantifying connectivity patterns and functional architecture across modalities. While modern machine learning has significantly advanced our understanding of neural processing mechanisms through these datasets, decoding task-specific signatures must contend with inherent neuroimaging constraints, for example, low signal-to-noise ratios in raw electrophysiological recordings, cross-session non-stationarity, and limited sample sizes. This review focuses on machine learning approaches for covariance-based neuroimaging data, where often symmetric positive definite (SPD) matrices under full-rank conditions encode inter-channel relationships. By equipping the space of SPD matrices with Riemannian metrics (e.g., affine-invariant or log-Euclidean), their space forms a Riemannian manifold enabling geometric analysis. We unify methodologies operating on this manifold under the SPD learning framework, which systematically leverages the SPD manifold's geometry to process covariance features, thereby advancing brain imaging analytics. △ Less

Submitted 26 April, 2025; originally announced April 2025.

Comments: 20 pages, 3 figures, 2 tables; This paper has been submitted for possible publication, and currently under review

ACM Class: I.2.0

arXiv:2411.07249 [pdf, other]

SPDIM: Source-Free Unsupervised Conditional and Label Shift Adaptation in EEG

Authors: Shanglin Li, Motoaki Kawanabe, Reinmar J. Kobler

Abstract: The non-stationary nature of electroencephalography (EEG) introduces distribution shifts across domains (e.g., days and subjects), posing a significant challenge to EEG-based neurotechnology generalization. Without labeled calibration data for target domains, the problem is a source-free unsupervised domain adaptation (SFUDA) problem. For scenarios with constant label distribution, Riemannian geom… ▽ More The non-stationary nature of electroencephalography (EEG) introduces distribution shifts across domains (e.g., days and subjects), posing a significant challenge to EEG-based neurotechnology generalization. Without labeled calibration data for target domains, the problem is a source-free unsupervised domain adaptation (SFUDA) problem. For scenarios with constant label distribution, Riemannian geometry-aware statistical alignment frameworks on the symmetric positive definite (SPD) manifold are considered state-of-the-art. However, many practical scenarios, including EEG-based sleep staging, exhibit label shifts. Here, we propose a geometric deep learning framework for SFUDA problems under specific distribution shifts, including label shifts. We introduce a novel, realistic generative model and show that prior Riemannian statistical alignment methods on the SPD manifold can compensate for specific marginal and conditional distribution shifts but hurt generalization under label shifts. As a remedy, we propose a parameter-efficient manifold optimization strategy termed SPDIM. SPDIM uses the information maximization principle to learn a single SPD-manifold-constrained parameter per target domain. In simulations, we demonstrate that SPDIM can compensate for the shifts under our generative model. Moreover, using public EEG-based brain-computer interface and sleep staging datasets, we show that SPDIM outperforms prior approaches. △ Less

Submitted 28 November, 2024; v1 submitted 26 October, 2024; originally announced November 2024.

ACM Class: I.5.1; I.5.4; J.3

arXiv:2407.18497 [pdf, other]

Answerability Fields: Answerable Location Estimation via Diffusion Models

Authors: Daichi Azuma, Taiki Miyanishi, Shuhei Kurita, Koya Sakamoto, Motoaki Kawanabe

Abstract: In an era characterized by advancements in artificial intelligence and robotics, enabling machines to interact with and understand their environment is a critical research endeavor. In this paper, we propose Answerability Fields, a novel approach to predicting answerability within complex indoor environments. Leveraging a 3D question answering dataset, we construct a comprehensive Answerability Fi… ▽ More In an era characterized by advancements in artificial intelligence and robotics, enabling machines to interact with and understand their environment is a critical research endeavor. In this paper, we propose Answerability Fields, a novel approach to predicting answerability within complex indoor environments. Leveraging a 3D question answering dataset, we construct a comprehensive Answerability Fields dataset, encompassing diverse scenes and questions from ScanNet. Using a diffusion model, we successfully infer and evaluate these Answerability Fields, demonstrating the importance of objects and their locations in answering questions within a scene. Our results showcase the efficacy of Answerability Fields in guiding scene-understanding tasks, laying the foundation for their application in enhancing interactions between intelligent agents and their environments. △ Less

Submitted 26 July, 2024; originally announced July 2024.

Comments: IROS2024

arXiv:2405.16559 [pdf, other]

Map-based Modular Approach for Zero-shot Embodied Question Answering

Authors: Koya Sakamoto, Daichi Azuma, Taiki Miyanishi, Shuhei Kurita, Motoaki Kawanabe

Abstract: Embodied Question Answering (EQA) serves as a benchmark task to evaluate the capability of robots to navigate within novel environments and identify objects in response to human queries. However, existing EQA methods often rely on simulated environments and operate with limited vocabularies. This paper presents a map-based modular approach to EQA, enabling real-world robots to explore and map unkn… ▽ More Embodied Question Answering (EQA) serves as a benchmark task to evaluate the capability of robots to navigate within novel environments and identify objects in response to human queries. However, existing EQA methods often rely on simulated environments and operate with limited vocabularies. This paper presents a map-based modular approach to EQA, enabling real-world robots to explore and map unknown environments. By leveraging foundation models, our method facilitates answering a diverse range of questions using natural language. We conducted extensive experiments in both virtual and real-world settings, demonstrating the robustness of our approach in navigating and comprehending queries within unknown environments. △ Less

Submitted 12 October, 2024; v1 submitted 26 May, 2024; originally announced May 2024.

Comments: IROS 2024

arXiv:2310.18773 [pdf, other]

CityRefer: Geography-aware 3D Visual Grounding Dataset on City-scale Point Cloud Data

Authors: Taiki Miyanishi, Fumiya Kitamori, Shuhei Kurita, Jungdae Lee, Motoaki Kawanabe, Nakamasa Inoue

Abstract: City-scale 3D point cloud is a promising way to express detailed and complicated outdoor structures. It encompasses both the appearance and geometry features of segmented city components, including cars, streets, and buildings, that can be utilized for attractive applications such as user-interactive navigation of autonomous vehicles and drones. However, compared to the extensive text annotations… ▽ More City-scale 3D point cloud is a promising way to express detailed and complicated outdoor structures. It encompasses both the appearance and geometry features of segmented city components, including cars, streets, and buildings, that can be utilized for attractive applications such as user-interactive navigation of autonomous vehicles and drones. However, compared to the extensive text annotations available for images and indoor scenes, the scarcity of text annotations for outdoor scenes poses a significant challenge for achieving these applications. To tackle this problem, we introduce the CityRefer dataset for city-level visual grounding. The dataset consists of 35k natural language descriptions of 3D objects appearing in SensatUrban city scenes and 5k landmarks labels synchronizing with OpenStreetMap. To ensure the quality and accuracy of the dataset, all descriptions and labels in the CityRefer dataset are manually verified. We also have developed a baseline system that can learn encoded language descriptions, 3D object instances, and geographical information about the city's landmarks to perform visual grounding on the CityRefer dataset. To the best of our knowledge, the CityRefer dataset is the largest city-level visual grounding dataset for localizing specific 3D objects. △ Less

Submitted 28 October, 2023; originally announced October 2023.

Comments: NeurIPS D&B 2023. The first two authors are equally contributed

arXiv:2305.13876 [pdf, other]

Cross3DVG: Cross-Dataset 3D Visual Grounding on Different RGB-D Scans

Authors: Taiki Miyanishi, Daichi Azuma, Shuhei Kurita, Motoki Kawanabe

Abstract: We present a novel task for cross-dataset visual grounding in 3D scenes (Cross3DVG), which overcomes limitations of existing 3D visual grounding models, specifically their restricted 3D resources and consequent tendencies of overfitting a specific 3D dataset. We created RIORefer, a large-scale 3D visual grounding dataset, to facilitate Cross3DVG. It includes more than 63k diverse descriptions of 3… ▽ More We present a novel task for cross-dataset visual grounding in 3D scenes (Cross3DVG), which overcomes limitations of existing 3D visual grounding models, specifically their restricted 3D resources and consequent tendencies of overfitting a specific 3D dataset. We created RIORefer, a large-scale 3D visual grounding dataset, to facilitate Cross3DVG. It includes more than 63k diverse descriptions of 3D objects within 1,380 indoor RGB-D scans from 3RScan, with human annotations. After training the Cross3DVG model using the source 3D visual grounding dataset, we evaluate it without target labels using the target dataset with, e.g., different sensors, 3D reconstruction methods, and language annotators. Comprehensive experiments are conducted using established visual grounding models and with CLIP-based multi-view 2D and 3D integration designed to bridge gaps among 3D datasets. For Cross3DVG tasks, (i) cross-dataset 3D visual grounding exhibits significantly worse performance than learning and evaluation with a single dataset because of the 3D data and language variants across datasets. Moreover, (ii) better object detector and localization modules and fusing 3D data and multi-view CLIP-based image features can alleviate this lower performance. Our Cross3DVG task can provide a benchmark for developing robust 3D visual grounding models to handle diverse 3D scenes while leveraging deep language understanding. △ Less

Submitted 7 February, 2024; v1 submitted 23 May, 2023; originally announced May 2023.

Comments: 3DV 2024

arXiv:2206.01323 [pdf, other]

SPD domain-specific batch normalization to crack interpretable unsupervised domain adaptation in EEG

Authors: Reinmar J Kobler, Jun-ichiro Hirayama, Qibin Zhao, Motoaki Kawanabe

Abstract: Electroencephalography (EEG) provides access to neuronal dynamics non-invasively with millisecond resolution, rendering it a viable method in neuroscience and healthcare. However, its utility is limited as current EEG technology does not generalize well across domains (i.e., sessions and subjects) without expensive supervised re-calibration. Contemporary methods cast this transfer learning (TL) pr… ▽ More Electroencephalography (EEG) provides access to neuronal dynamics non-invasively with millisecond resolution, rendering it a viable method in neuroscience and healthcare. However, its utility is limited as current EEG technology does not generalize well across domains (i.e., sessions and subjects) without expensive supervised re-calibration. Contemporary methods cast this transfer learning (TL) problem as a multi-source/-target unsupervised domain adaptation (UDA) problem and address it with deep learning or shallow, Riemannian geometry aware alignment methods. Both directions have, so far, failed to consistently close the performance gap to state-of-the-art domain-specific methods based on tangent space mapping (TSM) on the symmetric positive definite (SPD) manifold. Here, we propose a theory-based machine learning framework that enables, for the first time, learning domain-invariant TSM models in an end-to-end fashion. To achieve this, we propose a new building block for geometric deep learning, which we denote SPD domain-specific momentum batch normalization (SPDDSMBN). A SPDDSMBN layer can transform domain-specific SPD inputs into domain-invariant SPD outputs, and can be readily applied to multi-source/-target and online UDA scenarios. In extensive experiments with 6 diverse EEG brain-computer interface (BCI) datasets, we obtain state-of-the-art performance in inter-session and -subject TL with a simple, intrinsically interpretable network architecture, which we denote TSMNet. △ Less

Submitted 12 October, 2022; v1 submitted 2 June, 2022; originally announced June 2022.

Comments: 10 pages, accepted at NeurIPS 2022

ACM Class: I.5.1; I.5.4; J.3

arXiv:2112.10482 [pdf, other]

ScanQA: 3D Question Answering for Spatial Scene Understanding

Authors: Daichi Azuma, Taiki Miyanishi, Shuhei Kurita, Motoaki Kawanabe

Abstract: We propose a new 3D spatial understanding task of 3D Question Answering (3D-QA). In the 3D-QA task, models receive visual information from the entire 3D scene of the rich RGB-D indoor scan and answer the given textual questions about the 3D scene. Unlike the 2D-question answering of VQA, the conventional 2D-QA models suffer from problems with spatial understanding of object alignment and direction… ▽ More We propose a new 3D spatial understanding task of 3D Question Answering (3D-QA). In the 3D-QA task, models receive visual information from the entire 3D scene of the rich RGB-D indoor scan and answer the given textual questions about the 3D scene. Unlike the 2D-question answering of VQA, the conventional 2D-QA models suffer from problems with spatial understanding of object alignment and directions and fail the object identification from the textual questions in 3D-QA. We propose a baseline model for 3D-QA, named ScanQA model, where the model learns a fused descriptor from 3D object proposals and encoded sentence embeddings. This learned descriptor correlates the language expressions with the underlying geometric features of the 3D scan and facilitates the regression of 3D bounding boxes to determine described objects in textual questions and outputs correct answers. We collected human-edited question-answer pairs with free-form answers that are grounded to 3D objects in each 3D scene. Our new ScanQA dataset contains over 40K question-answer pairs from the 800 indoor scenes drawn from the ScanNet dataset. To the best of our knowledge, the proposed 3D-QA task is the first large-scale effort to perform object-grounded question-answering in 3D environments. △ Less

Submitted 7 May, 2022; v1 submitted 20 December, 2021; originally announced December 2021.

Comments: CVPR2022. The first three authors are equally contributed. Project page: https://github.com/ATR-DBI/ScanQA

arXiv:2107.14398 [pdf, other]

On the interpretation of linear Riemannian tangent space model parameters in M/EEG

Authors: Reinmar J. Kobler, Jun-Ichiro Hirayama, Lea Hehenberger Catarina Lopes-Dias, Gernot R. Müller-Putz, Motoaki Kawanabe

Abstract: Riemannian tangent space methods offer state-of-the-art performance in magnetoencephalography (MEG) and electroencephalography (EEG) based applications such as brain-computer interfaces and biomarker development. One limitation, particularly relevant for biomarker development, is limited model interpretability compared to established component-based methods. Here, we propose a method to transform… ▽ More Riemannian tangent space methods offer state-of-the-art performance in magnetoencephalography (MEG) and electroencephalography (EEG) based applications such as brain-computer interfaces and biomarker development. One limitation, particularly relevant for biomarker development, is limited model interpretability compared to established component-based methods. Here, we propose a method to transform the parameters of linear tangent space models into interpretable patterns. Using typical assumptions, we show that this approach identifies the true patterns of latent sources, encoding a target signal. In simulations and two real MEG and EEG datasets, we demonstrate the validity of the proposed approach and investigate its behavior when the model assumptions are violated. Our results confirm that Riemannian tangent space methods are robust to differences in the source patterns across observations. We found that this robustness property also transfers to the associated patterns. △ Less

Submitted 29 July, 2021; originally announced July 2021.

arXiv:1908.01555 [pdf, other]

doi 10.1371/journal.pone.0232296

Interpretable brain age prediction using linear latent variable models of functional connectivity

Authors: Ricardo Pio Monti, Alex Gibberd, Sandipan Roy, Matt Nunes, Romy Lorenz, Robert Leech, Takeshi Ogawa, Motoaki Kawanabe, Aapo Hyvarinen

Abstract: Neuroimaging-driven prediction of brain age, defined as the predicted biological age of a subject using only brain imaging data, is an exciting avenue of research. In this work we seek to build models of brain age based on functional connectivity while prioritizing model interpretability and understanding. This way, the models serve to both provide accurate estimates of brain age as well as allow… ▽ More Neuroimaging-driven prediction of brain age, defined as the predicted biological age of a subject using only brain imaging data, is an exciting avenue of research. In this work we seek to build models of brain age based on functional connectivity while prioritizing model interpretability and understanding. This way, the models serve to both provide accurate estimates of brain age as well as allow us to investigate changes in functional connectivity which occur during the ageing process. The methods proposed in this work consist of a two-step procedure: first, linear latent variable models, such as PCA and its extensions, are employed to learn reproducible functional connectivity networks present across a cohort of subjects. The activity within each network is subsequently employed as a feature in a linear regression model to predict brain age. The proposed framework is employed on the data from the CamCAN repository and the inferred brain age models are further demonstrated to generalize using data from two open-access repositories: the Human Connectome Project and the ATR Wide-Age-Range. △ Less

Submitted 5 August, 2019; originally announced August 2019.

Comments: 21 pages, 11 figures

arXiv:1112.3697 [pdf, ps, other]

doi 10.1371/journal.pone.0038897

Insights from Classifying Visual Concepts with Multiple Kernel Learning

Authors: Alexander Binder, Shinichi Nakajima, Marius Kloft, Christina Müller, Wojciech Samek, Ulf Brefeld, Klaus-Robert Müller, Motoaki Kawanabe

Abstract: Combining information from various image features has become a standard technique in concept recognition tasks. However, the optimal way of fusing the resulting kernel functions is usually unknown in practical applications. Multiple kernel learning (MKL) techniques allow to determine an optimal linear combination of such similarity matrices. Classical approaches to MKL promote sparse mixtures. Unf… ▽ More Combining information from various image features has become a standard technique in concept recognition tasks. However, the optimal way of fusing the resulting kernel functions is usually unknown in practical applications. Multiple kernel learning (MKL) techniques allow to determine an optimal linear combination of such similarity matrices. Classical approaches to MKL promote sparse mixtures. Unfortunately, so-called 1-norm MKL variants are often observed to be outperformed by an unweighted sum kernel. The contribution of this paper is twofold: We apply a recently developed non-sparse MKL variant to state-of-the-art concept recognition tasks within computer vision. We provide insights on benefits and limits of non-sparse MKL and compare it against its direct competitors, the sum kernel SVM and the sparse MKL. We report empirical results for the PASCAL VOC 2009 Classification and ImageCLEF2010 Photo Annotation challenge data sets. About to be submitted to PLoS ONE. △ Less

Submitted 15 December, 2011; originally announced December 2011.

Comments: 18 pages, 8 tables, 4 figures, format deviating from plos one submission format requirements for aesthetic reasons

Journal ref: PLoS ONE 7(8): e38897, 2012

arXiv:0912.2412 [pdf, other]

doi 10.1109/TBME.2010.2046325

Modeling sparse connectivity between underlying brain sources for EEG/MEG

Authors: Stefan Haufe, Ryota Tomioka, Guido Nolte, Klaus-Robert Mueller, Motoaki Kawanabe

Abstract: We propose a novel technique to assess functional brain connectivity in EEG/MEG signals. Our method, called Sparsely-Connected Sources Analysis (SCSA), can overcome the problem of volume conduction by modeling neural data innovatively with the following ingredients: (a) the EEG is assumed to be a linear mixture of correlated sources following a multivariate autoregressive (MVAR) model, (b) the d… ▽ More We propose a novel technique to assess functional brain connectivity in EEG/MEG signals. Our method, called Sparsely-Connected Sources Analysis (SCSA), can overcome the problem of volume conduction by modeling neural data innovatively with the following ingredients: (a) the EEG is assumed to be a linear mixture of correlated sources following a multivariate autoregressive (MVAR) model, (b) the demixing is estimated jointly with the source MVAR parameters, (c) overfitting is avoided by using the Group Lasso penalty. This approach allows to extract the appropriate level cross-talk between the extracted sources and in this manner we obtain a sparse data-driven model of functional connectivity. We demonstrate the usefulness of SCSA with simulated data, and compare to a number of existing algorithms with excellent results. △ Less

Submitted 12 December, 2009; originally announced December 2009.

Comments: 9 pages, 6 figures

Journal ref: IEEE Trans. Biomed. Eng. 57(8) (2010) 1954 - 1963;

arXiv:0912.1128 [pdf, ps, other]

How to Explain Individual Classification Decisions

Authors: David Baehrens, Timon Schroeter, Stefan Harmeling, Motoaki Kawanabe, Katja Hansen, Klaus-Robert Mueller

Abstract: After building a classifier with modern tools of machine learning we typically have a black box at hand that is able to predict well for unseen data. Thus, we get an answer to the question what is the most likely label of a given unseen data point. However, most methods will provide no answer why the model predicted the particular label for a single instance and what features were most influenti… ▽ More After building a classifier with modern tools of machine learning we typically have a black box at hand that is able to predict well for unseen data. Thus, we get an answer to the question what is the most likely label of a given unseen data point. However, most methods will provide no answer why the model predicted the particular label for a single instance and what features were most influential for that particular instance. The only method that is currently able to provide such explanations are decision trees. This paper proposes a procedure which (based on a set of assumptions) allows to explain the decisions of any classification method. △ Less

Submitted 6 December, 2009; originally announced December 2009.

Comments: 31 pages, 14 figures

Showing 1–13 of 13 results for author: Kawanabe, M