-
Unsupervised hyperspectral data mining and bioimaging by information entropy and self-modeling curve resolution
Authors:
Simon Vilms Pedersen,
Anders R. Walther,
Anthony Callanan,
Molly M. Stevens,
Martin A. B. Hedegaard,
Eva C. Arnspang
Abstract:
Unsupervised estimation of the dimensionality of hyperspectral microspectroscopy datasets containing pure and mixed spectral features, and extraction of their representative endmember spectra, remains a challenge in biochemical data mining. We report a new versatile algorithm, building on semi-nonnegativity constrained self-modeling curve resolution and information entropy, that estimates the number of separable biochemical species in hyperspectral microspectroscopy data and extracts their representative spectra. The algorithm is benchmarked against established methods from satellite remote sensing, spectral unmixing, and clustering. To demonstrate the broad applicability of the developed algorithm, we collected spontaneous Raman, Coherent Anti-Stokes Raman Scattering and Fourier Transform IR hyperspectral datasets of seven reference compounds, an oil-in-water emulsion, and tissue-engineered extracellular matrices on poly-L-lactic acid and porcine jejunum-derived small intestine submucosa scaffolds seeded with bovine chondrocytes. We show the potential of the developed algorithm by consolidating hyperspectral molecular information with sample microstructure, pertinent to fields ranging from gastrophysics to regenerative medicine.
Submitted 6 October, 2022;
originally announced October 2022.
-
Inferring exemplar discriminability in brain representations
Authors:
Hamed Nili,
Alexander Walther,
Arjen Alink,
Nikolaus Kriegeskorte
Abstract:
Representational distinctions within categories are important in all perceptual modalities and also in cognitive and motor representations. Recent pattern-information studies of brain activity have used condition-rich designs to sample the stimulus space more densely. To test whether brain response patterns discriminate among a set of stimuli (e.g. exemplars within a category) with good sensitivity, we can pool statistical evidence over all pairwise comparisons. Here we describe a wide range of statistical tests of exemplar discriminability and assess the validity (specificity) and power (sensitivity) of each test. The tests include previously used and novel, parametric and nonparametric tests, which treat subject as a random or fixed effect, and are based on different dissimilarity measures, different test statistics, and different inference procedures. We use simulated and real data to determine which tests are valid and which are most sensitive. A popular test statistic reflecting exemplar information is the exemplar discriminability index (EDI), which is defined as the average of the pattern dissimilarity estimates between different exemplars minus the average of the pattern dissimilarity estimates between repetitions of identical exemplars. The popular across-subject t test of the EDI (typically using correlation distance as the pattern dissimilarity measure) requires the assumption that the EDI is zero-mean normal under H0. Although this assumption is not strictly true, our simulations suggest that the test controls the false-positive rate at the nominal level, and is thus valid, in practice. However, test statistics based on average Mahalanobis distances or average linear-discriminant t values (both accounting for the multivariate error covariance among responses) are substantially more powerful for both random- and fixed-effects inference.
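A minimal NumPy sketch of the EDI as defined above. The two-run layout (each exemplar's response pattern measured twice, so that the diagonal of the cross-run dissimilarity matrix holds repetitions of identical exemplars), the function names and the simulated data are illustrative assumptions. Correlation distance is used because the abstract names it as the typical choice; the across-subject t test and the Mahalanobis or linear-discriminant t alternatives are not implemented here.

```python
import numpy as np


def correlation_distance_matrix(run_a, run_b):
    """Correlation distance (1 - Pearson r) between every row of run_a and
    every row of run_b; both arrays are (exemplars x response channels).
    The two-run layout is an illustrative assumption."""
    n = run_a.shape[0]
    # np.corrcoef stacks the rows of both arrays; the upper-right block
    # holds the run_a-versus-run_b correlations.
    r = np.corrcoef(run_a, run_b)[:n, n:]
    return 1.0 - r


def exemplar_discriminability_index(d):
    """EDI: mean dissimilarity between different exemplars (off-diagonal)
    minus mean dissimilarity between repetitions of the same exemplar (diagonal)."""
    d = np.asarray(d, dtype=float)
    different = ~np.eye(d.shape[0], dtype=bool)
    return d[different].mean() - np.diag(d).mean()


# Usage with simulated data: 8 exemplars x 50 channels, measured twice.
rng = np.random.default_rng(0)
signal = rng.normal(size=(8, 50))
run_a = signal + 0.5 * rng.normal(size=signal.shape)
run_b = signal + 0.5 * rng.normal(size=signal.shape)
edi = exemplar_discriminability_index(correlation_distance_matrix(run_a, run_b))
print(f"EDI = {edi:.3f}")  # positive when repetitions are more similar than different exemplars
```

An EDI near zero means repetitions of the same exemplar are no more similar than different exemplars; the statistical tests compared in the paper decide whether an observed positive EDI is significant.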
Submitted 13 May, 2022;
originally announced May 2022.
-
Visualizing the geometry of labeled high-dimensional data with spheres
Authors:
Andrew D Zaharia,
Anish S Potnis,
Alexander Walther,
Nikolaus Kriegeskorte
Abstract:
Data visualizations summarize high-dimensional distributions in two or three dimensions. Dimensionality reduction entails a loss of information, and what is preserved differs between methods. Existing methods preserve the local or the global geometry of the points, and most techniques do not consider labels. Here we introduce "hypersphere2sphere" (H2S), a new method that aims to visualize not the points, but the relationships between the labeled distributions. H2S fits a hypersphere to each labeled set of points in a high-dimensional space and visualizes each hypersphere as a sphere in 3D (or circle in 2D). H2S perfectly captures the geometry of up to 4 hyperspheres in 3D (or 3 in 2D), and approximates the geometry for larger numbers of distributions, matching the sizes (radii), and the pairwise separations (between-center distances) and overlaps (along the center-connection line). The resulting visualizations are robust to sampling imbalances. Leveraging labels and the sphere as the simplest geometrical primitive, H2S provides an important addition to the toolbox of visualization techniques.
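The core geometric idea can be sketched in NumPy under clearly stated assumptions. Each labeled set is summarized by a hypothetical hypersphere whose center is the class centroid and whose radius is the mean distance of its points to that centroid (this radius rule is an assumption, not necessarily the fitting rule H2S uses), and the centers are placed in 3D with classical multidimensional scaling. Because any four points span at most three dimensions, this reproduces the radii, between-center distances and overlaps exactly for up to four classes, consistent with the claim above. It does not implement the approximation H2S uses for larger numbers of distributions, and all function names are illustrative.

```python
import numpy as np


def fit_hyperspheres(points, labels):
    """Hypothetical hypersphere fit per label: centroid and mean radius."""
    classes = np.unique(labels)
    centers = np.array([points[labels == c].mean(axis=0) for c in classes])
    radii = np.array([
        np.linalg.norm(points[labels == c] - centers[k], axis=1).mean()
        for k, c in enumerate(classes)
    ])
    return classes, centers, radii


def sphere_geometry(centers, radii):
    """Between-center distances and overlaps along each center-connection
    line (positive = spheres overlap, negative = gap between surfaces)."""
    dist = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=-1)
    overlap = radii[:, None] + radii[None, :] - dist
    return dist, overlap


def classical_mds(dist, dim=3):
    """Place points in `dim` dimensions from their pairwise distances."""
    n = dist.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    gram = -0.5 * j @ (dist ** 2) @ j            # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(gram)            # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:dim]         # keep the largest `dim`
    coords = vecs[:, order] * np.sqrt(np.clip(vals[order], 0.0, None))
    # Pad with zero columns if there are fewer points than dimensions.
    return np.pad(coords, ((0, 0), (0, dim - coords.shape[1])))


# Usage: four labeled clouds in 10-D, shown as four spheres in 3-D.
rng = np.random.default_rng(1)
labels = np.repeat(np.arange(4), 50)
points = rng.normal(size=(200, 10)) + rng.normal(scale=5.0, size=(4, 10))[labels]
classes, centers, radii = fit_hyperspheres(points, labels)
dist, overlap = sphere_geometry(centers, radii)
centers_3d = classical_mds(dist, dim=3)   # radii are reused unchanged in 3-D
```

Drawing a sphere of radius `radii[k]` at each row of `centers_3d` then gives the 3D view; with five or more classes the centers generally cannot be placed exactly, which is where a method like H2S has to approximate.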
Submitted 1 July, 2021;
originally announced July 2021.