Search | arXiv e-print repository

doi 10.1016/j.neunet.2024.106774

Modelling multivariate spatio-temporal data with identifiable variational autoencoders

Authors: Mika Sipilä, Claudia Cappello, Sandra De Iaco, Klaus Nordhausen, Sara Taskinen

Abstract: Modelling multivariate spatio-temporal data with complex dependency structures is a challenging task but can be simplified by assuming that the original variables are generated from independent latent components. If these components are found, they can be modelled univariately. Blind source separation aims to recover the latent components by estimating the unmixing transformation based on the observed data only. The current methods for spatio-temporal blind source separation are restricted to linear unmixing, and nonlinear variants have not been implemented. In this paper, we extend the identifiable variational autoencoder to the nonlinear nonstationary spatio-temporal blind source separation setting and demonstrate its performance using comprehensive simulation studies. Additionally, we introduce two alternative methods for latent dimension estimation, which is a crucial task in order to obtain the correct latent representation. Finally, we illustrate the proposed methods using a meteorological application, where we estimate the latent dimension and the latent components, interpret the components, and show how nonstationarity can be accounted for and prediction accuracy can be improved by using the proposed nonlinear blind source separation method as a preprocessing method.

Submitted 1 November, 2024; v1 submitted 6 September, 2024; originally announced September 2024.

Journal ref: Neural Networks 2025, 181, 106774

arXiv:2405.15290 [pdf, other]

Question Answering models for information extraction from perovskite materials science literature

Authors: M. Sipilä, F. Mehryary, S. Pyysalo, F. Ginter, Milica Todorović

Abstract: Scientific text is a promising source of data in materials science, with ongoing research into utilising textual data for materials discovery. In this study, we developed and tested a novel approach to extract material-property relationships from scientific publications using the Question Answering (QA) method. QA performance was evaluated for information extraction of perovskite bandgaps based on a human query. We observed considerable variation in results with five different large language models fine-tuned for the QA task. Best extraction accuracy was achieved with the QA MatBERT and F1-scores improved on the current state-of-the-art. This work demonstrates the QA workflow and paves the way towards further applications. The simplicity, versatility and accuracy of the QA approach all point to its considerable potential for text-driven discoveries in materials research.

Submitted 13 September, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

Comments: The following article has been submitted to npj Computational Materials

arXiv:2311.08004 [pdf, other]

doi 10.1016/j.ins.2024.120365

Nonlinear blind source separation exploiting spatial nonstationarity

Authors: Mika Sipilä, Klaus Nordhausen, Sara Taskinen

Abstract: In spatial blind source separation the observed multivariate random fields are assumed to be mixtures of latent spatially dependent random fields. The objective is to recover latent random fields by estimating the unmixing transformation. Currently, the algorithms for spatial blind source separation can only estimate linear unmixing transformations. Nonlinear blind source separation methods for spatial data are scarce. In this paper we extend an identifiable variational autoencoder that can estimate nonlinear unmixing transformations to spatially dependent data and demonstrate its performance for both stationary and nonstationary spatial data using simulations. In addition, we introduce scaled mean absolute Shapley additive explanations for interpreting the latent components through nonlinear mixing transformation. The spatial identifiable variational autoencoder is applied to a geochemical dataset to find the latent random fields, which are then interpreted by using the scaled mean absolute Shapley additive explanations. Finally, we illustrate how the proposed method can be used as a pre-processing method when making multivariate predictions.

Submitted 20 December, 2023; v1 submitted 14 November, 2023; originally announced November 2023.

Journal ref: Information Sciences 2024, 665, 120365

arXiv:1509.03474 [pdf]

doi 10.1073/pnas.1404853111

Neutral molecular cluster formation of sulfuric acid-dimethylamine observed in real time under atmospheric conditions

Authors: Andreas Kürten, Tuija Jokinen, Mario Simon, Mikko Sipilä, Nina Sarnela, Heikki Junninen, Alexey Adamov, João Almeida, Antonio Amorim, Federico Bianchi, Martin Breitenlechner, Josef Dommen, Neil M. Donahue, Jonathan Duplissy, Sebastian Ehrhart, Richard C. Flagan, Alessandro Franchin, Jani Hakala, Armin Hansel, Martin Heinritzi, Manuel Hutterli, Juha Kangasluoma, Jasper Kirkby, Ari Laaksonen, Katrianne Lehtipalo, et al. (23 additional authors not shown)

Abstract: For atmospheric sulfuric acid (SA) concentrations the presence of dimethylamine (DMA) at mixing ratios of several parts per trillion by volume can explain observed boundary layer new particle formation rates. However, the concentration and molecular composition of the neutral (uncharged) clusters have not been reported so far due to the lack of suitable instrumentation. Here we report on experiments from the Cosmics Leaving Outdoor Droplets chamber at the European Organization for Nuclear Research revealing the formation of neutral particles containing up to 14 SA and 16 DMA molecules, corresponding to a mobility diameter of about 2 nm, under atmospherically relevant conditions. These measurements bridge the gap between the molecular and particle perspectives of nucleation, revealing the fundamental processes involved in particle formation and growth. The neutral clusters are found to form at or close to the kinetic limit where particle formation is limited only by the collision rate of SA molecules. Even though the neutral particles are stable against evaporation from the SA dimer onward, the formation rates of particles at 1.7-nm size, which contain about 10 SA molecules, are up to 4 orders of magnitude smaller compared with those of the dimer due to coagulation and wall loss of particles before they reach 1.7 nm in diameter. This demonstrates that neither the atmospheric particle formation rate nor its dependence on SA can simply be interpreted in terms of cluster evaporation or the molecular composition of a critical nucleus.

Submitted 11 September, 2015; originally announced September 2015.

Comments: Main text plus SI

Journal ref: Proceedings of the National Academy of Sciences of the United States of America, 2014

Showing 1–4 of 4 results for author: Sipilä, M