-
Modelling multivariate spatio-temporal data with identifiable variational autoencoders
Authors:
Mika Sipilä,
Claudia Cappello,
Sandra De Iaco,
Klaus Nordhausen,
Sara Taskinen
Abstract:
Modelling multivariate spatio-temporal data with complex dependency structures is a challenging task but can be simplified by assuming that the original variables are generated from independent latent components. If these components are found, they can be modelled univariately. Blind source separation aims to recover the latent components by estimating the unmixing transformation based on the observed data only. The current methods for spatio-temporal blind source separation are restricted to linear unmixing, and nonlinear variants have not been implemented. In this paper, we extend the identifiable variational autoencoder to the nonlinear nonstationary spatio-temporal blind source separation setting and demonstrate its performance using comprehensive simulation studies. Additionally, we introduce two alternative methods for latent dimension estimation, which is a crucial task in order to obtain the correct latent representation. Finally, we illustrate the proposed methods using a meteorological application, where we estimate the latent dimension and the latent components, interpret the components, and show how nonstationarity can be accounted for and prediction accuracy can be improved by using the proposed nonlinear blind source separation method as a preprocessing method.
Submitted 1 November, 2024; v1 submitted 6 September, 2024;
originally announced September 2024.
-
Question Answering models for information extraction from perovskite materials science literature
Authors:
M. Sipilä,
F. Mehryary,
S. Pyysalo,
F. Ginter,
Milica Todorović
Abstract:
Scientific text is a promising source of data in materials science, with ongoing research into utilising textual data for materials discovery. In this study, we developed and tested a novel approach to extract material-property relationships from scientific publications using the Question Answering (QA) method. QA performance was evaluated for information extraction of perovskite bandgaps based on a human query. We observed considerable variation in results with five different large language models fine-tuned for the QA task. Best extraction accuracy was achieved with the QA MatBERT and F1-scores improved on the current state-of-the-art. This work demonstrates the QA workflow and paves the way towards further applications. The simplicity, versatility and accuracy of the QA approach all point to its considerable potential for text-driven discoveries in materials research.
Submitted 13 September, 2024; v1 submitted 24 May, 2024;
originally announced May 2024.
-
Nonlinear blind source separation exploiting spatial nonstationarity
Authors:
Mika Sipilä,
Klaus Nordhausen,
Sara Taskinen
Abstract:
In spatial blind source separation, the observed multivariate random fields are assumed to be mixtures of latent spatially dependent random fields. The objective is to recover the latent random fields by estimating the unmixing transformation. Currently, the algorithms for spatial blind source separation can only estimate linear unmixing transformations. Nonlinear blind source separation methods for spatial data are scarce. In this paper, we extend an identifiable variational autoencoder that can estimate nonlinear unmixing transformations to spatially dependent data and demonstrate its performance for both stationary and nonstationary spatial data using simulations. In addition, we introduce scaled mean absolute Shapley additive explanations for interpreting the latent components through the nonlinear mixing transformation. The spatial identifiable variational autoencoder is applied to a geochemical dataset to find the latent random fields, which are then interpreted by using the scaled mean absolute Shapley additive explanations. Finally, we illustrate how the proposed method can be used as a pre-processing method when making multivariate predictions.
Submitted 20 December, 2023; v1 submitted 14 November, 2023;
originally announced November 2023.
-
Neutral molecular cluster formation of sulfuric acid dimethylamine observed in real time under atmospheric conditions
Authors:
Andreas Kürten,
Tuija Jokinen,
Mario Simon,
Mikko Sipilä,
Nina Sarnela,
Heikki Junninen,
Alexey Adamov,
João Almeida,
Antonio Amorim,
Federico Bianchi,
Martin Breitenlechner,
Josef Dommen,
Neil M. Donahue,
Jonathan Duplissy,
Sebastian Ehrhart,
Richard C. Flagan,
Alessandro Franchin,
Jani Hakala,
Armin Hansel,
Martin Heinritzi,
Manuel Hutterli,
Juha Kangasluoma,
Jasper Kirkby,
Ari Laaksonen,
Katrianne Lehtipalo
, et al. (23 additional authors not shown)
Abstract:
For atmospheric sulfuric acid (SA) concentrations, the presence of dimethylamine (DMA) at mixing ratios of several parts per trillion by volume can explain observed boundary layer new particle formation rates. However, the concentration and molecular composition of the neutral (uncharged) clusters have not been reported so far due to the lack of suitable instrumentation. Here we report on experiments from the Cosmics Leaving Outdoor Droplets chamber at the European Organization for Nuclear Research revealing the formation of neutral particles containing up to 14 SA and 16 DMA molecules, corresponding to a mobility diameter of about 2 nm, under atmospherically relevant conditions. These measurements bridge the gap between the molecular and particle perspectives of nucleation, revealing the fundamental processes involved in particle formation and growth. The neutral clusters are found to form at or close to the kinetic limit, where particle formation is limited only by the collision rate of SA molecules. Even though the neutral particles are stable against evaporation from the SA dimer onward, the formation rates of particles at 1.7-nm size, which contain about 10 SA molecules, are up to 4 orders of magnitude smaller compared with those of the dimer due to coagulation and wall loss of particles before they reach 1.7 nm in diameter. This demonstrates that neither the atmospheric particle formation rate nor its dependence on SA can simply be interpreted in terms of cluster evaporation or the molecular composition of a critical nucleus.
Submitted 11 September, 2015;
originally announced September 2015.