-
clustra: A multi-platform k-means clustering algorithm for analysis of longitudinal trajectories in large electronic health records data
Authors:
Nimish Adhikari,
Hanna Gerlovin,
George Ostrouchov,
Rachel Ehrbar,
Alyssa B. Dufour,
Brian R. Ferolito,
Serkalem Demissie,
Lauren Costa,
Yuk-Lam Ho,
Laura Tarko,
Edmon Begoli,
Kelly Cho,
David R. Gagnon
Abstract:
Background and Objective: Variables collected over time, or longitudinally, such as biologic measurements in electronic health records data, are not simple to summarize with a single time-point, and thus can be more holistically conceptualized as trajectories over time. Cluster analysis with longitudinal data further allows for clinical representation of groups of subjects with similar trajectories and identification of unique characteristics, or phenotypes, that can be investigated as risk factors or disease outcomes. Some of the challenges in estimating these clustered trajectories lie in the handling of observations at inconsistent time intervals and the usability of algorithms across programming languages.
Methods: We propose longitudinal trajectory clustering using a k-means algorithm with thin-plate regression splines, implemented across multiple platforms: the R package clustra and corresponding SAS macros. The SAS macros accommodate flexible clustering approaches and also include cluster visualization and silhouette plots for diagnostic evaluation of the appropriate number of clusters. The R package, designed in parallel, has similar functionality, with additional multi-core processing and Rand-index-based diagnostics.
Results: The package and macros achieve comparable results when applied to an example of simulated blood pressure measurements based on real data from Veterans Affairs Healthcare recipients who were initiated on anti-hypertensive medication.
Conclusion: The R package clustra and the SAS macros implement a k-means clustering algorithm for longitudinal trajectories that works with large electronic health record data. The implementations provide comparable results on both platforms, satisfying the needs of investigators familiar with, or constrained by access to, one or the other platform.
Submitted 1 July, 2025;
originally announced July 2025.
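A minimal sketch of the general idea behind k-means for irregularly sampled trajectories, assuming each cluster center is a smooth curve fitted to the pooled observations of its members. For brevity, a straight least-squares line stands in for the thin-plate regression splines used by clustra, and a deterministic farthest-first start replaces whatever initialization the package uses; all function names are hypothetical and are not the clustra or SAS macro API. Because each subject is scored by its squared error at its own observation times, no alignment of time points is needed.

```python
def fit_line(points):
    """Least-squares line y = a + b*t through (t, y) pairs (stand-in for a spline)."""
    n = len(points)
    mt = sum(t for t, _ in points) / n
    my = sum(y for _, y in points) / n
    stt = sum((t - mt) ** 2 for t, _ in points)
    b = sum((t - mt) * (y - my) for t, y in points) / stt if stt > 0 else 0.0
    return my - b * mt, b


def sse(curve, obs):
    """Squared error of one subject's (t, y) observations against a curve."""
    a, b = curve
    return sum((y - (a + b * t)) ** 2 for t, y in obs)


def assign(subjects, curves):
    """Index of the curve with the smallest squared error, for each subject."""
    return [min(range(len(curves)), key=lambda c: sse(curves[c], s)) for s in subjects]


def trajectory_kmeans(subjects, k, iters=50):
    """k-means over trajectories; each subject is a list of (t, y) at its own times."""
    # Assumption: deterministic farthest-first initialization, not clustra's.
    curves = [fit_line(subjects[0])]
    while len(curves) < k:
        far = max(subjects, key=lambda s: min(sse(c, s) for c in curves))
        curves.append(fit_line(far))
    labels = assign(subjects, curves)
    for _ in range(iters):
        for c in range(k):
            pooled = [p for s, lab in zip(subjects, labels) if lab == c for p in s]
            if pooled:  # keep the previous curve if a cluster empties
                curves[c] = fit_line(pooled)
        new_labels = assign(subjects, curves)
        if new_labels == labels:
            break
        labels = new_labels
    return labels, curves


# Illustrative data: rising and falling trajectories, different times per subject.
subjects = []
for i in range(20):
    times = [0.3 * i + 1.7 * j for j in range(3 + i % 4)]
    slope = 2.0 if i < 10 else -2.0
    subjects.append([(t, 100 + slope * t) for t in times])
labels, curves = trajectory_kmeans(subjects, 2)
print(labels)
```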
-
Bayesian Predictive Coding
Authors:
Alexander Tschantz,
Magnus Koudahl,
Hampus Linander,
Lancelot Da Costa,
Conor Heins,
Jeff Beck,
Christopher Buckley
Abstract:
Predictive coding (PC) is an influential theory of information processing in the brain, providing a biologically plausible alternative to backpropagation. It is motivated in terms of Bayesian inference, as hidden states and parameters are optimised via gradient descent on variational free energy. However, implementations of PC rely on maximum a posteriori (MAP) estimates of hidden states and maximum likelihood (ML) estimates of parameters, limiting their ability to quantify epistemic uncertainty. In this work, we investigate a Bayesian extension to PC that estimates a posterior distribution over network parameters. This approach, termed Bayesian Predictive Coding (BPC), preserves the locality of PC and results in closed-form Hebbian weight updates. Compared to PC, our BPC algorithm converges in fewer epochs in the full-batch setting and remains competitive in the mini-batch setting. Additionally, we demonstrate that BPC offers uncertainty quantification comparable to existing methods in Bayesian deep learning, while also improving convergence properties. Together, these results suggest that BPC provides a biologically plausible method for Bayesian learning in the brain, as well as an attractive approach to uncertainty quantification in deep learning.
Submitted 31 March, 2025;
originally announced March 2025.
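For orientation, a minimal sketch of standard (non-Bayesian) predictive coding for a single linear Gaussian layer, i.e. the MAP/ML baseline that BPC extends, not the BPC algorithm itself. Assuming unit variances, the hidden state z is inferred by gradient descent on the free energy F = (x - w*z)^2/2 + (z - mu)^2/2, and the weight step is Hebbian: the product of the local prediction error and the presynaptic activity z. The function names are hypothetical.

```python
def free_energy(x, z, w, mu):
    """F for unit-variance Gaussians: squared sensory plus prior prediction errors."""
    return 0.5 * (x - w * z) ** 2 + 0.5 * (z - mu) ** 2


def infer_state(x, w, mu, lr=0.1, steps=200):
    """MAP estimate of the hidden state z by gradient descent on F."""
    z = mu
    for _ in range(steps):
        eps_x = x - w * z  # sensory prediction error
        eps_z = z - mu     # prior prediction error
        z += lr * (w * eps_x - eps_z)  # step along -dF/dz
    return z


def hebbian_update(x, z, w, lr=0.05):
    """ML-style weight step: -dF/dw = (prediction error) * (presynaptic activity)."""
    return w + lr * (x - w * z) * z


# The fixed point of inference is the analytic MAP z* = (w*x + mu) / (w**2 + 1).
z_map = infer_state(3.0, 2.0, 0.5)
w_new = hebbian_update(3.0, z_map, 2.0)
print(z_map, w_new)
```

The update uses only quantities available at the connection (error and activity), which is the locality property that BPC is said to preserve.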
-
Modeling sparsity in count-weighted networks
Authors:
Andressa Cerqueira,
Laila L. S. Costa
Abstract:
Community detection methods have been extensively studied to recover community structures in network data. While many models and methods focus on binary data, real-world networks also carry the strength of connections, which can be taken into account in the network analysis. We propose a probabilistic model for generating weighted networks that allows us to control network sparsity and incorporates degree corrections for each node. We also propose a community detection method based on the Variational Expectation-Maximization (VEM) algorithm. We show that the proposed method works well in practice on simulated networks. We analyze the Brazilian airport network to compare its community structures before and during the COVID-19 pandemic.
Submitted 5 November, 2024;
originally announced November 2024.
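As an illustration only, a generic generator for a sparse, count-weighted network with communities. It assumes a zero-inflated, degree-corrected Poisson stochastic block model: each node pair is a structural zero with probability 1 - rho and otherwise carries a count drawn from Poisson(theta_i * theta_j * B[z_i][z_j]). This is a common construction chosen here as an assumption, not necessarily the model proposed in the paper, and the names are hypothetical.

```python
import math
import random


def poisson(lam, rng):
    """Knuth's multiplication method; adequate for small rates."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def sample_network(z, theta, B, rho, seed=0):
    """Symmetric count matrix with zero-inflation (sparsity) and degree correction.

    z[i]: community of node i; theta[i]: degree-correction factor of node i;
    B[a][b]: rate between communities a and b; rho: chance a pair is not a structural zero.
    """
    rng = random.Random(seed)
    n = len(z)
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < rho:
                lam = theta[i] * theta[j] * B[z[i]][z[j]]
                A[i][j] = A[j][i] = poisson(lam, rng)
    return A


z = [0, 0, 0, 1, 1, 1]
theta = [1.0] * 6
B = [[5.0, 0.1], [0.1, 5.0]]
A = sample_network(z, theta, B, rho=1.0, seed=3)
print(A)
```

Lowering rho thins the network without changing the rates of the pairs that remain, which is one simple way to decouple sparsity from connection strength.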
-
T-JEPA: Augmentation-Free Self-Supervised Learning for Tabular Data
Authors:
Hugo Thimonier,
José Lucas De Melo Costa,
Fabrice Popineau,
Arpad Rimmel,
Bich-Liên Doan
Abstract:
Self-supervision is often used for pre-training to foster performance on a downstream task by constructing meaningful representations of samples. Self-supervised learning (SSL) generally involves generating different views of the same sample and thus requires data augmentations that are challenging to construct for tabular data. This constitutes one of the main challenges of self-supervision for structured data. In the present work, we propose a novel augmentation-free SSL method for tabular data. Our approach, T-JEPA, relies on a Joint Embedding Predictive Architecture (JEPA) and is akin to mask reconstruction in the latent space. It involves predicting the latent representation of one subset of features from the latent representation of a different subset within the same sample, thereby learning rich representations without augmentations. We use our method as a pre-training technique and train several deep classifiers on the obtained representation. Our experimental results demonstrate a substantial improvement in both classification and regression tasks, outperforming models trained directly on samples in their original data space. Moreover, T-JEPA enables some methods to consistently outperform or match the performance of traditional methods like Gradient Boosted Decision Trees. To understand why, we extensively characterize the obtained representations and show that T-JEPA effectively identifies relevant features for downstream tasks without access to the labels. Additionally, we introduce regularization tokens, a novel regularization method critical for training of JEPA-based models on structured data.
Submitted 3 May, 2025; v1 submitted 7 October, 2024;
originally announced October 2024.
-
A theory of generalised coordinates for stochastic differential equations
Authors:
Lancelot Da Costa,
Nathaël Da Costa,
Conor Heins,
Johan Medrano,
Grigorios A. Pavliotis,
Thomas Parr,
Ajith Anil Meera,
Karl Friston
Abstract:
Stochastic differential equations are ubiquitous modelling tools in physics and the sciences. In most modelling scenarios, random fluctuations driving dynamics or motion have some non-trivial temporal correlation structure, which renders the SDE non-Markovian; a phenomenon commonly known as "colored" noise. Thus, an important objective is to develop effective tools for mathematically and numerically studying (possibly non-Markovian) SDEs. In this report, we formalise a mathematical theory for analysing and numerically studying SDEs based on so-called "generalised coordinates of motion". Like the theory of rough paths, we analyse SDEs pathwise for any given realisation of the noise, not solely probabilistically. Like the established theory of Markovian realisation, we realise non-Markovian SDEs as a Markov process in an extended space. Unlike the established theory of Markovian realisation however, the Markovian realisations here are accurate on short timescales and may be exact globally in time, when flows and fluctuations are analytic. This theory is exact for SDEs with analytic flows and fluctuations, and is approximate when flows and fluctuations are differentiable. It provides useful analysis tools, which we employ to solve linear SDEs with analytic fluctuations. It may also be useful for studying rougher SDEs, as these may be identified as the limit of smoother ones. This theory supplies effective, computationally straightforward methods for simulation, filtering and control of SDEs; amongst others, we re-derive generalised Bayesian filtering, a state-of-the-art method for time-series analysis. Looking forward, this report suggests that generalised coordinates have far-reaching applications throughout stochastic differential equations.
Submitted 18 April, 2025; v1 submitted 23 September, 2024;
originally announced September 2024.
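A minimal deterministic illustration, assuming the linear scalar flow dx/dt = a*x with no noise term. Generalised coordinates of motion stack the state and its successive time derivatives (x, x', x'', ...); for a linear flow each one follows by applying the flow again, so the k-th coordinate is a^k * x. Because this flow is analytic, the Taylor expansion assembled from the generalised coordinates converges to the exact solution x0 * exp(a*t) as the order grows. The stochastic setting of the report is not shown, and the function names are hypothetical.

```python
import math


def generalised_coordinates(x0, a, order):
    """(x, x', x'', ...) at t = 0 for dx/dt = a*x: the k-th derivative is a**k * x0."""
    return [a ** k * x0 for k in range(order + 1)]


def taylor_predict(coords, t):
    """Reconstruct x(t) from generalised coordinates by Taylor expansion."""
    return sum(c * t ** k / math.factorial(k) for k, c in enumerate(coords))


exact = 2.0 * math.exp(-0.5 * 1.5)
for order in (2, 5, 20):
    approx = taylor_predict(generalised_coordinates(2.0, -0.5, order), 1.5)
    print(order, abs(approx - exact))
```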
-
Sample Path Regularity of Gaussian Processes from the Covariance Kernel
Authors:
Nathaël Da Costa,
Marvin Pförtner,
Lancelot Da Costa,
Philipp Hennig
Abstract:
Gaussian processes (GPs) are the most common formalism for defining probability distributions over spaces of functions. While applications of GPs are myriad, a comprehensive understanding of GP sample paths, i.e. the function spaces over which they define a probability measure, is lacking. In practice, GPs are not constructed through a probability measure, but instead through a mean function and a covariance kernel. In this paper we provide necessary and sufficient conditions on the covariance kernel for the sample paths of the corresponding GP to attain a given regularity. We use the framework of Hölder regularity as it grants particularly straightforward conditions, which simplify further in the cases of stationary and isotropic GPs. We then demonstrate that our results allow for novel and unusually tight characterisations of the sample path regularities of the GPs commonly used in machine learning applications, such as the Matérn GPs.
Submitted 16 February, 2024; v1 submitted 22 December, 2023;
originally announced December 2023.
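For intuition, a sketch built on a classical sufficient condition (via the Kolmogorov-Chentsov theorem), not on the paper's necessary-and-sufficient characterisation. For a stationary GP on the real line with kernel k, if k(0) - k(h) <= C|h|^(2*alpha) near h = 0 for some alpha in (0, 1], the process has a version whose sample paths are Hölder continuous of every order below alpha. The code estimates the local exponent 2*alpha from a log-log slope for three common kernels, assuming unit length scale and unit variance; the helper names are hypothetical.

```python
import math


def matern12(h):
    """Matérn-1/2 (exponential) kernel."""
    return math.exp(-abs(h))


def matern32(h):
    """Matérn-3/2 kernel."""
    s = math.sqrt(3.0) * abs(h)
    return (1.0 + s) * math.exp(-s)


def squared_exponential(h):
    return math.exp(-0.5 * h * h)


def local_exponent(kernel, h=1e-3):
    """Slope of log(k(0) - k(h)) against log(h), estimated between h and 2h."""
    d1 = kernel(0.0) - kernel(h)
    d2 = kernel(0.0) - kernel(2.0 * h)
    return math.log(d2 / d1) / math.log(2.0)


for name, k in [("Matern-1/2", matern12), ("Matern-3/2", matern32),
                ("SE", squared_exponential)]:
    e = local_exponent(k)
    print(f"{name}: exponent ~ {e:.3f}, Holder order below {min(e / 2, 1.0):.2f}")
```

The sufficient condition saturates at exponent 2, so it cannot distinguish the once-differentiable Matérn-3/2 paths from the smooth squared-exponential paths; finer statements of that kind are what a sharper characterisation provides.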
-
Geometric Methods for Sampling, Optimisation, Inference and Adaptive Agents
Authors:
Alessandro Barp,
Lancelot Da Costa,
Guilherme França,
Karl Friston,
Mark Girolami,
Michael I. Jordan,
Grigorios A. Pavliotis
Abstract:
In this chapter, we identify fundamental geometric structures that underlie the problems of sampling, optimisation, inference and adaptive decision-making. Based on this identification, we derive algorithms that exploit these geometric structures to solve these problems efficiently. We show that a wide range of geometric theories emerge naturally in these fields, including measure-preserving processes, information divergences, Poisson geometry, and geometric integration. Specifically, we explain how (i) leveraging the symplectic geometry of Hamiltonian systems enables us to construct (accelerated) sampling and optimisation methods, (ii) the theory of Hilbertian subspaces and Stein operators provides a general methodology to obtain robust estimators, and (iii) preserving the information geometry of decision-making yields adaptive agents that perform active inference. Throughout, we emphasise the rich connections between these fields; e.g., inference draws on sampling and optimisation, and adaptive decision-making assesses decisions by inferring their counterfactual consequences. Our exposition provides a conceptual overview of underlying ideas, rather than a technical discussion, which can be found in the references herein.
Submitted 25 July, 2022; v1 submitted 20 March, 2022;
originally announced March 2022.
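To make item (i) concrete, a minimal sketch of the leapfrog (Störmer-Verlet) integrator, a standard symplectic scheme used in Hamiltonian Monte Carlo, applied to a separable Hamiltonian H(q, p) = U(q) + p^2/2 with an assumed harmonic potential U(q) = q^2/2. Being symplectic, it keeps the energy error bounded over long runs rather than letting it drift, and it is exactly time-reversible; the chapter's own constructions are more general than this example.

```python
def leapfrog(q, p, grad_u, step, n_steps):
    """Störmer-Verlet for H(q, p) = U(q) + p**2 / 2 (unit mass)."""
    p -= 0.5 * step * grad_u(q)        # initial half step in momentum
    for i in range(n_steps):
        q += step * p                  # full step in position
        if i < n_steps - 1:
            p -= step * grad_u(q)      # merged pair of momentum half steps
    p -= 0.5 * step * grad_u(q)        # final half step in momentum
    return q, p


def energy(q, p):
    """Harmonic-oscillator Hamiltonian (assumed example potential)."""
    return 0.5 * q * q + 0.5 * p * p


grad = lambda q: q  # gradient of U(q) = q**2 / 2
q1, p1 = leapfrog(1.0, 0.0, grad, 0.1, 10000)
print(energy(q1, p1))  # stays close to the initial energy 0.5

# Reversibility: flip the momentum and integrate back to the start.
q2, p2 = leapfrog(q1, -p1, grad, 0.1, 10000)
print(q2, p2)
```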
-
The Classic Cross-Correlation and the Real-Valued Jaccard and Coincidence Indices
Authors:
Luciano da F. Costa
Abstract:
In this work we describe and compare the classic inner product and Pearson correlation coefficient, as well as the recently introduced real-valued Jaccard and coincidence indices. Special attention is given to diverse schemes for taking into account the signs of the operands, as well as to the study of the geometry of the scalar field surface related to the generalized multiset binary operations underlying the considered similarity indices. The possibility of splitting the classic inner product, cross-correlation, and Pearson correlation coefficient is also described.
Submitted 25 November, 2021;
originally announced December 2021.
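A minimal sketch contrasting two of the indices for nonnegative real-valued vectors: the real-valued (weighted) Jaccard index, the sum of elementwise minima over the sum of elementwise maxima, and the Pearson correlation coefficient. Restricting to nonnegative operands sidesteps the sign-handling schemes the paper compares, and the coincidence index is omitted; returning 1.0 for two all-zero vectors is our own convention.

```python
import math


def real_jaccard(x, y):
    """sum(min) / sum(max) for nonnegative vectors; 1.0 if both are all zero (assumed)."""
    num = sum(min(a, b) for a, b in zip(x, y))
    den = sum(max(a, b) for a, b in zip(x, y))
    return num / den if den > 0 else 1.0


def pearson(x, y):
    """Pearson correlation coefficient (undefined for constant vectors)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


x, y = [1, 2, 3], [2, 4, 6]
print(real_jaccard(x, y), pearson(x, y))
```

The example shows the key difference: Pearson correlation ignores a change of scale, whereas the real-valued Jaccard index penalises the magnitude mismatch between proportional vectors.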
-
Active inference, Bayesian optimal design, and expected utility
Authors:
Noor Sajid,
Lancelot Da Costa,
Thomas Parr,
Karl Friston
Abstract:
Active inference, a corollary of the free energy principle, is a formal way of describing the behavior of certain kinds of random dynamical systems that have the appearance of sentience. In this chapter, we describe how active inference combines Bayesian decision theory and optimal Bayesian design principles under a single imperative to minimize expected free energy. It is this aspect of active inference that allows for the natural emergence of information-seeking behavior. When prior preferences over outcomes are removed from expected free energy, active inference reduces to optimal Bayesian design, i.e., information gain maximization. Conversely, active inference reduces to Bayesian decision theory in the absence of ambiguity and relative risk, i.e., expected utility maximization. Using these limiting cases, we illustrate how behaviors differ when agents select actions that optimize expected utility, expected information gain, and expected free energy. Our T-maze simulations show that optimizing expected free energy produces goal-directed information-seeking behavior, while optimizing expected utility induces purely exploitative behavior and maximizing information gain engenders intrinsically motivated behavior.
Submitted 21 September, 2021;
originally announced October 2021.
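A minimal discrete sketch, assuming one common decomposition of expected free energy for a single action: G = -(expected utility) - (expected information gain), where the utility is the log prior preference over outcomes and the information gain is the mutual information between hidden states and outcomes under the action's predicted state distribution. With flat preferences the utility term is constant, so minimizing G reduces to maximizing information gain, which is the first limiting case described above. The T-maze model is not reproduced; the numbers and names are illustrative.

```python
import math


def expected_free_energy(prior_s, likelihood, log_pref):
    """Return (G, expected utility, information gain) for one action.

    prior_s[s]: predicted state distribution under the action
    likelihood[o][s]: p(o | s)
    log_pref[o]: log prior preference over outcomes, acting as utility
    """
    n_o, n_s = len(likelihood), len(prior_s)
    q_o = [sum(likelihood[o][s] * prior_s[s] for s in range(n_s)) for o in range(n_o)]
    utility = sum(q_o[o] * log_pref[o] for o in range(n_o))
    info_gain = 0.0
    for o in range(n_o):
        for s in range(n_s):
            joint = likelihood[o][s] * prior_s[s]
            if joint > 0:
                # Accumulates the mutual information I(S; O)
                info_gain += joint * math.log(joint / (q_o[o] * prior_s[s]))
    return -utility - info_gain, utility, info_gain


prior = [0.5, 0.5]
cue = [[1.0, 0.0], [0.0, 1.0]]      # outcomes reveal the hidden state
blind = [[0.5, 0.5], [0.5, 0.5]]    # outcomes carry no information
flat = [0.0, 0.0]                   # no outcome preferences
g_cue, _, ig_cue = expected_free_energy(prior, cue, flat)
g_blind, _, ig_blind = expected_free_energy(prior, blind, flat)
print(g_cue, g_blind)  # the informative action has lower G
```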
-
Power laws in the Roman Empire: a survival analysis
Authors:
Pedro L. Ramos,
Luciano da F. Costa,
Francisco Louzada,
Francisco A. Rodrigues
Abstract:
The Roman Empire shaped Western civilization, and many Roman principles are embodied in modern institutions. Although its political institutions proved both resilient and adaptable, allowing it to incorporate diverse populations, the Empire suffered from many internal conflicts. Indeed, most emperors died violently, from assassination, suicide, or in battle. These internal conflicts produced patterns in the length of time that can be identified by statistical analysis. In this paper, we study the underlying patterns associated with the reign of the Roman emperors by using statistical tools of survival data analysis. We consider all 175 Roman emperors and propose a new power-law model with change points to predict the time-to-violent-death of the Roman emperors. This model accommodates censored data and long-term survivors, providing more accurate predictions than previous models. Our results show that power-law distributions can also occur in survival data, as verified in other data types from natural and artificial systems, reinforcing the ubiquity of power-law distributions. The generality of our approach paves the way to further related investigations not only in other ancient civilizations but also in applications in engineering and medicine.
Submitted 20 August, 2020;
originally announced August 2020.
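A minimal sketch of fitting a single power-law (Pareto) survival curve S(t) = (t/t_min)^(-alpha) to right-censored data, assuming t_min is known. Each observed death contributes log f(t) and each censored time contributes log S(t); setting the derivative of the log-likelihood to zero gives the closed form alpha = d / sum_i log(t_i / t_min), where d is the number of observed events and the sum runs over all observations. The paper's model additionally includes change points and long-term survivors, so this is only a basic building block, and the simulated data below are synthetic.

```python
import math
import random


def pareto_mle(times, events, t_min):
    """Censored-data MLE of alpha for S(t) = (t / t_min) ** -alpha, t >= t_min."""
    d = sum(events)  # number of uncensored observations
    total_log = sum(math.log(t / t_min) for t in times)
    return d / total_log


rng = random.Random(42)
true_alpha, t_min = 1.5, 1.0
times, events = [], []
for _ in range(20000):
    t = t_min * rng.paretovariate(true_alpha)  # survival (t / t_min) ** -alpha
    c = rng.uniform(1.0, 10.0)                 # independent censoring time
    times.append(min(t, c))
    events.append(1 if t <= c else 0)
alpha_hat = pareto_mle(times, events, t_min)
print(round(alpha_hat, 3))
```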
-
Revisiting Agglomerative Clustering
Authors:
Eric K. Tokuda,
Cesar H. Comin,
Luciano da F. Costa
Abstract:
An important issue in clustering concerns the avoidance of false positives while searching for clusters. This work addressed this problem for agglomerative methods, namely the single, average, median, complete, centroid, and Ward's approaches, applied to unimodal and bimodal datasets obeying uniform, Gaussian, exponential, and power-law distributions. A model of clusters was also adopted, involving a higher-density nucleus surrounded by a transition region, followed by outliers. This paved the way to defining an objective means for identifying the clusters from dendrograms. The adopted model also allowed the relevance of the clusters to be quantified in terms of the heights of their subtrees. The obtained results include the verification that many methods detect two clusters in unimodal data. The single-linkage method was found to be more resilient to false positives. Also, several methods detected clusters not corresponding directly to the nucleus. The possibility of identifying the type of distribution was also investigated.
Submitted 26 June, 2020; v1 submitted 16 May, 2020;
originally announced May 2020.
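For reference, a minimal single-linkage sketch (Kruskal-style, with union-find) that returns the dendrogram merge heights, plus a generic heuristic that sets the number of clusters at the largest jump between consecutive merge heights. The heuristic and the function names are assumptions for illustration; the paper instead derives its criterion from its nucleus-transition-outlier cluster model and subtree heights.

```python
from itertools import combinations


def single_linkage_heights(points, dist):
    """Merge heights of the single-linkage dendrogram, in merge order."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = sorted((dist(points[i], points[j]), i, j)
                   for i, j in combinations(range(len(points)), 2))
    heights = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # this edge merges two clusters
            parent[ri] = rj
            heights.append(d)
    return heights


def clusters_by_largest_gap(heights):
    """Cut the dendrogram just below the largest jump in merge height (needs >= 3 points)."""
    n = len(heights) + 1  # number of points
    gaps = [heights[k + 1] - heights[k] for k in range(len(heights) - 1)]
    k = max(range(len(gaps)), key=gaps.__getitem__)
    return n - (k + 1)


pts = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
heights = single_linkage_heights(pts, lambda a, b: abs(a - b))
print(heights, clusters_by_largest_gap(heights))
```

Note that this heuristic always reports at least two clusters, even for unimodal data, which is exactly the kind of false positive the study is concerned with.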
-
Pattern Recognition Approach to Violin Shapes of MIMO database
Authors:
Thomas Peron,
Francisco A. Rodrigues,
Luciano da F. Costa
Abstract:
Since the landmarks established by the Cremonese school in the 16th century, the history of violin design has been marked by experimentation. While the scientific community has invested great effort since the early 19th century in researching violin acoustics, substantially less attention has been given to the statistical characterization of how the violin shape evolved over time. In this paper we study the morphology of violins retrieved from the Musical Instrument Museums Online (MIMO) database, the largest freely accessible platform providing information about instruments held in public museums. From the violin images, we derive a set of measurements that reflect relevant geometrical features of the instruments. The application of Principal Component Analysis (PCA) uncovered similarities between violin makers and their respective copyists, as well as among luthiers belonging to the same family lineage, in the context of historical narrative. Combined with a time-windowed approach, thin-plate spline visualizations revealed that the average violin outline has remained mostly stable over time, not adhering to any particular design trend across different periods in music history.
Submitted 8 August, 2018;
originally announced August 2018.
-
Principal Component Analysis: A Natural Approach to Data Exploration
Authors:
Felipe L. Gewers,
Gustavo R. Ferreira,
Henrique F. de Arruda,
Filipi N. Silva,
Cesar H. Comin,
Diego R. Amancio,
Luciano da F. Costa
Abstract:
Principal component analysis (PCA) is often used for analyzing data in highly diverse areas. In this work, we report an integrated approach to several theoretical and practical aspects of PCA. We start by providing, in an intuitive and accessible manner, the basic principles underlying PCA and its applications. Next, we present a systematic, though not exhaustive, survey of some representative works illustrating the potential of PCA applications in a wide range of areas. An experimental investigation of the ability of PCA to explain variance and reduce dimensionality is also developed, which confirms the efficacy of PCA and also shows that whether or not the original data are standardized can have important effects on the obtained results. Overall, we believe the issues covered can assist researchers from many different areas in using and interpreting PCA.
Submitted 19 June, 2018; v1 submitted 6 April, 2018;
originally announced April 2018.
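A minimal two-variable sketch of the standardization effect, using the closed-form eigenvalues of a 2x2 covariance matrix. When one variable is measured on a much larger scale, it dominates the first principal component of the raw covariance; after standardization, the share of variance on the first component depends only on the correlation r, namely (1 + |r|)/2. The data and the function name are made up for illustration.

```python
import math


def explained_ratio_2d(x, y, standardize=False):
    """Share of total variance on the first principal component of two variables."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    if standardize:
        sx = math.sqrt(sum(a * a for a in dx) / (n - 1))
        sy = math.sqrt(sum(b * b for b in dy) / (n - 1))
        dx = [a / sx for a in dx]
        dy = [b / sy for b in dy]
    a = sum(u * u for u in dx) / (n - 1)
    d = sum(v * v for v in dy) / (n - 1)
    b = sum(u * v for u, v in zip(dx, dy)) / (n - 1)
    # Largest eigenvalue of [[a, b], [b, d]]; the eigenvalues sum to a + d.
    half_trace = (a + d) / 2
    radius = math.sqrt(((a - d) / 2) ** 2 + b * b)
    return (half_trace + radius) / (a + d)


x = [1, 2, 3, 4, 5]
y = [200, 100, 400, 300, 500]  # correlation with x is 0.8, on a much larger scale
raw = explained_ratio_2d(x, y)
std = explained_ratio_2d(x, y, standardize=True)
print(raw, std)
```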
-
Clustering Algorithms: A Comparative Approach
Authors:
Mayra Z. Rodriguez,
Cesar H. Comin,
Dalcimar Casanova,
Odemir M. Bruno,
Diego R. Amancio,
Francisco A. Rodrigues,
Luciano da F. Costa
Abstract:
Many real-world systems can be studied in terms of pattern recognition tasks, so that proper use (and understanding) of machine learning methods in practical applications becomes essential. While a myriad of classification methods have been proposed, there is no consensus on which methods are more suitable for a given dataset. As a consequence, it is important to comprehensively compare methods in many possible scenarios. In this context, we performed a systematic comparison of 7 well-known clustering methods available in the R language. In order to account for the many possible variations of data, we considered artificial datasets with several tunable properties (number of classes, separation between classes, etc.). In addition, we also evaluated the sensitivity of the clustering methods with regard to their parameter configuration. The results revealed that, when considering the default configurations of the adopted methods, the spectral approach usually outperformed the other clustering algorithms. We also found that the default configuration of the adopted implementations was not always accurate. In these cases, a simple approach based on random selection of parameter values proved to be a good alternative to improve the performance. All in all, the reported approach provides guidance for the choice of clustering algorithms.
Submitted 26 December, 2016;
originally announced December 2016.
-
Complex systems: features, similarity and connectivity
Authors:
Cesar H. Comin,
Thomas K. DM. Peron,
Filipi N. Silva,
Diego R. Amancio,
Francisco A. Rodrigues,
Luciano da F. Costa
Abstract:
The increasing interest in complex networks research has been a consequence of several intrinsic features of this area, such as the generality of the approach to represent and model virtually any discrete system, and the incorporation of concepts and methods deriving from many areas, from statistical physics to sociology, which are often used in an independent way. Yet, for this same reason, it would be desirable to integrate these various aspects into a more coherent and organic framework, which would bring several benefits normally allowed by systematization in science, including the identification of new types of problems and the cross-fertilization between fields. More specifically, the identification of the main areas to which the concepts frequently used in complex networks can be applied paves the way to adopting and applying a larger set of concepts and methods deriving from those respective areas. Among the several areas that have been used in complex networks research, pattern recognition, optimization, linear algebra, and time series analysis seem to play a more basic and recurrent role. In the present manuscript, we propose a systematic way to integrate the concepts from these diverse areas regarding complex networks research. In order to do so, we start by grouping the multidisciplinary concepts into three main groups, namely features, similarity, and network connectivity. Then we show that several of the analysis and modeling approaches to complex networks can be thought of as a composition of maps between these three groups, with emphasis on nine main types of mappings, which are presented and illustrated. Such a systematization of principles and approaches also provides an opportunity to review some of the most closely related works in the literature, which is also done in this article.
Submitted 16 June, 2016;
originally announced June 2016.
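One illustrative composition of such maps, sketched under our own choices (cosine similarity and a fixed threshold tau, both assumptions rather than the manuscript's prescriptions): a features-to-similarity map followed by a similarity-to-connectivity map that links two nodes when their similarity exceeds tau. The function names are hypothetical.

```python
import math


def cosine_similarity_matrix(features):
    """features -> similarity: pairwise cosine similarity (assumes nonzero vectors)."""
    norms = [math.sqrt(sum(v * v for v in f)) for f in features]
    n = len(features)
    return [[sum(a * b for a, b in zip(features[i], features[j])) / (norms[i] * norms[j])
             for j in range(n)] for i in range(n)]


def threshold_network(sim, tau):
    """similarity -> connectivity: undirected adjacency matrix without self-loops."""
    n = len(sim)
    return [[1 if i != j and sim[i][j] > tau else 0 for j in range(n)] for i in range(n)]


features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
A = threshold_network(cosine_similarity_matrix(features), tau=0.9)
print(A)
```

Swapping either stage (for example, a distance-based similarity or a k-nearest-neighbour rule instead of a threshold) yields a different map of the same two-step type.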
-
Searching Multiregression Dynamic Models of Resting-State fMRI Networks Using Integer Programming
Authors:
Lilia Costa,
Jim Smith,
Thomas Nichols,
James Cussens,
Eugene P. Duff,
Tamar R. Makin
Abstract:
A Multiregression Dynamic Model (MDM) is a class of multivariate time series models that represents various dynamic causal processes graphically. One of the advantages of this class is that, in contrast to many other Dynamic Bayesian Networks, the hypothesised relationships accommodate conditional conjugate inference. We demonstrate for the first time how straightforward it is to search over all possible connectivity networks with dynamically changing intensity of transmission to find the Maximum a Posteriori Probability (MAP) model within this class. This search method is made feasible by a novel application of an Integer Programming algorithm. We show the efficacy of applying this particular class of dynamic models to this domain and, more specifically, the computational efficiency of a corresponding search of the 11-node Directed Acyclic Graph (DAG) model space. We proceed to show how diagnostic methods, analogous to those defined for static Bayesian Networks, can be used to suggest embellishment of the model class to extend the process of model selection. All methods are illustrated using simulated and real resting-state functional Magnetic Resonance Imaging (fMRI) data.
Submitted 26 May, 2015;
originally announced May 2015.
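What makes exact structure search tractable is that the model score decomposes into a sum of per-node terms, one for each node given its parents. A minimal sketch of exact MAP structure search for a handful of nodes, assuming such a decomposable score: it enumerates node orderings and, for each, picks every node's best parent set among its predecessors, which guarantees acyclicity. This brute-force search stands in for the integer-programming formulation used in the paper, and the local scores below are made-up numbers, not MDM scores.

```python
from itertools import combinations, permutations


def best_dag(nodes, local_score, max_parents=2):
    """Exact search over DAGs for a decomposable score (feasible only for few nodes).

    local_score(node, parents) -> float, with parents a frozenset; higher is better.
    """
    best_total, best_parents = float("-inf"), None
    for order in permutations(nodes):
        total, parents = 0.0, {}
        for pos, node in enumerate(order):
            preds = order[:pos]  # parents must precede the node, so no cycles
            candidates = [frozenset(c)
                          for r in range(min(max_parents, len(preds)) + 1)
                          for c in combinations(preds, r)]
            ps = max(candidates, key=lambda c: local_score(node, c))
            parents[node] = ps
            total += local_score(node, ps)
        if total > best_total:
            best_total, best_parents = total, parents
    return best_total, best_parents


# Hypothetical local scores favouring A -> B -> C; other parents cost 0.5 each.
table = {("B", frozenset({"A"})): 2.0, ("C", frozenset({"B"})): 1.5}


def score(node, ps):
    return table.get((node, ps), -0.5 * len(ps))


total, parents = best_dag(["A", "B", "C"], score)
print(total, parents)
```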
-
Towards a Multi-Subject Analysis of Neural Connectivity
Authors:
Chris J. Oates,
Lilia Carneiro da Costa,
Tom Nichols
Abstract:
Directed acyclic graphs (DAGs) and associated probability models are widely used to model neural connectivity and communication channels. In many experiments, data are collected from multiple subjects whose connectivities may differ but are likely to share many features. In such circumstances it is natural to leverage similarity between subjects to improve statistical efficiency. The first exact algorithm for estimation of multiple related DAGs was recently proposed by Oates et al. 2014; in this letter we present examples and discuss implications of the methodology as applied to the analysis of fMRI data from a multi-subject experiment. Elicitation of tuning parameters requires care, and we illustrate how this may proceed retrospectively based on technical replicate data. In addition to joint learning of subject-specific connectivity, we allow for heterogeneous collections of subjects and simultaneously estimate relationships between the subjects themselves. This letter aims to highlight the potential for exact estimation in the multi-subject setting.
Submitted 14 November, 2014; v1 submitted 4 April, 2014;
originally announced April 2014.
-
A quantitative approach to evolution of music and philosophy
Authors:
Vilson Vieira,
Renato Fabbri,
Gonzalo Travieso,
Osvaldo N. Oliveira Jr.,
Luciano da Fontoura Costa
Abstract:
The development of new statistical and computational methods is increasingly making it possible to bridge the gap between the hard sciences and the humanities. In this study, we propose an approach based on a quantitative evaluation of attributes of objects in fields of the humanities, from which concepts such as dialectics and opposition are formally defined mathematically. As case studies, we analyzed the temporal evolution of classical music and philosophy by obtaining data for 8 features characterizing the corresponding fields for 7 well-known composers and philosophers, which were treated with multivariate statistics and pattern recognition methods. A bootstrap method was applied to avoid statistical bias caused by the small sample, generating hundreds of artificial composers and philosophers influenced by the 7 names originally chosen. Upon defining indices for opposition, skewness and counter-dialectics, we confirmed the intuitive analysis of historians that classical music evolved according to a master-apprentice tradition, while in philosophy changes were driven by opposition. Though these case studies were meant only to show the possibility of treating phenomena in the humanities quantitatively, including quantitative measures of concepts such as dialectics and opposition, the results are encouraging for further application of the approach presented here to many other areas, since it is entirely generic.
Submitted 13 November, 2013;
originally announced March 2014.
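The abstract does not say how the artificial composers and philosophers are generated. One common way to enlarge a very small sample while keeping each synthetic point tied to a real one is a smoothed bootstrap. The sketch below assumes that scheme, which is not necessarily the authors' method. The function name and the noise level `noise_sd` are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def artificial_members(
    grades: Sequence[Sequence[float]],
    n: int,
    noise_sd: float,
    seed: int = 0,
) -> List[Tuple[int, List[float]]]:
    """Smoothed bootstrap (an assumed scheme, not the paper's).

    Each artificial member copies the feature vector of a randomly drawn
    original member and perturbs every feature with Gaussian noise.
    Returns pairs of (index of the original it was drawn from, new vector),
    so each synthetic member stays "influenced by" one real name.
    """
    rng = random.Random(seed)
    members = []
    for _ in range(n):
        src = rng.randrange(len(grades))
        vec = [g + rng.gauss(0.0, noise_sd) for g in grades[src]]
        members.append((src, vec))
    return members
```

Keeping the source index lets later indices be computed per original name. With `noise_sd=0`, the sketch reduces to an ordinary resampling of the original rows.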
-
A Quantitative Approach to Painting Styles
Authors:
Vilson Vieira,
Renato Fabbri,
David Sbrissa,
Luciano da Fontoura Costa,
Gonzalo Travieso
Abstract:
This research extends a method previously applied to music and philosophy, representing the evolution of art as a time series in which relations such as dialectics are measured quantitatively. To that end, a corpus of paintings by 12 well-known artists from baroque and modern art is analyzed. A set of 93 features is extracted, and the features that contribute most to the classification of painters are selected. The resulting projection space provides the basis for the analysis of measurements. These quantitative measures underlie revealing observations about the evolution of painting styles, especially when compared with other fields of the humanities already analyzed: while music evolved along a master-apprentice tradition (high dialectics) and philosophy by opposition, painting presents another pattern: constantly increasing skewness, low opposition between members of the same movement, and opposition peaks in the transition between movements. Differences between baroque and modern movements are also observed in the projected "painting space": while baroque paintings appear as an overlapping cluster, modern paintings overlap less and are spread more widely in the projection than their baroque counterparts. This finding suggests that baroque painters shared aesthetics while modern painters tend to "break rules" and develop their own styles.
Submitted 13 November, 2013;
originally announced March 2014.
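The abstract does not name the criterion used to pick the features that contribute most to separating painters. As a stand-in, the minimal sketch below ranks features by the Fisher discriminant ratio and keeps the top k. The ratio is the between-painter spread of the class means over the pooled within-painter variance. Both the function names and the choice of criterion are assumptions.

```python
import numpy as np


def fisher_scores(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Per-feature Fisher ratio: between-class scatter of the class means
    divided by the pooled within-class scatter (rows = paintings, y = painter)."""
    overall = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]
        mean_c = Xc.mean(axis=0)
        between += Xc.shape[0] * (mean_c - overall) ** 2
        within += ((Xc - mean_c) ** 2).sum(axis=0)
    # Guard against division by zero for features constant within every class.
    return between / np.maximum(within, 1e-12)


def select_top_features(X: np.ndarray, y: np.ndarray, k: int) -> list:
    """Indices of the k features with the largest Fisher ratio, best first."""
    return np.argsort(fisher_scores(X, y))[::-1][:k].tolist()
```

A univariate ranking like this ignores correlations between features. A projection method such as linear discriminant analysis would account for them, at the cost of mixing features together.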
-
Can the evolution of music be analyzed in a quantitative manner?
Authors:
Vilson Vieira,
Renato Fabbri,
Gonzalo Travieso,
Luciano da Fontoura Costa
Abstract:
We propose a methodology to study music development by applying multivariate statistics to composers' characteristics. Seven representative composers were considered in terms of eight main musical features. Grades were assigned to each characteristic and their correlations were analyzed. A bootstrap method was applied to simulate hundreds of artificial composers influenced by the seven representatives chosen. We then quantify non-numeric relations such as dialectics, opposition and innovation. Composers' differences in style and technique were represented as geometrical distances in the feature space, making it possible to quantify, for example, how much Bach and Stockhausen differ from other composers or how much Beethoven influenced Brahms. In addition, we compared the results with a prior investigation of philosophy. Opposition, strong in philosophy, was not remarkable in music. Supporting an observation already made by music theorists, strong influences between composers were identified through the quantification of dialectics, implying inheritance and suggesting a stronger master-disciple evolution than in the philosophy analysis.
Submitted 4 March, 2012; v1 submitted 21 September, 2011;
originally announced September 2011.
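The sketch below represents composers as points in a feature space, with differences measured as geometrical distances. It assumes Euclidean distance on z-scored grades and a principal-component projection for a low-dimensional view. The abstract does not specify the metric or the projection, and the function names are hypothetical. Rows would be composers and columns the graded features. No grades from the study are reproduced here.

```python
import numpy as np


def standardize(grades: np.ndarray) -> np.ndarray:
    """Z-score each feature column; zero-variance columns are only centred."""
    mu = grades.mean(axis=0)
    sd = grades.std(axis=0)
    sd[sd == 0] = 1.0
    return (grades - mu) / sd


def pairwise_distances(Z: np.ndarray) -> np.ndarray:
    """Euclidean distance between every pair of rows (composers)."""
    diff = Z[:, None, :] - Z[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def pca_project(Z: np.ndarray, k: int = 2) -> np.ndarray:
    """Coordinates of the centred rows on their first k principal components."""
    Zc = Z - Z.mean(axis=0)
    _, _, Vt = np.linalg.svd(Zc, full_matrices=False)
    return Zc @ Vt[:k].T
```

Standardizing first keeps a feature graded on a wider scale from dominating the distances. Keeping all components preserves every pairwise distance. A two-component projection trades some of that fidelity for a plot.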
-
Unveiling the Relationship Between Structure and Dynamics in Complex Networks
Authors:
Cesar H. Comin,
João B. Bunoro,
Matheus P. Viana,
Luciano da F. Costa
Abstract:
In recent years, a great deal of attention has been focused on complex networked systems, characterized by intricate structure and dynamics. The latter has often been represented in terms of overall statistics (e.g. averages and standard deviations) of the time signals. While such approaches have led to many insights, they have failed to take into account that signals at different parts of the system can undergo distinct evolutions, which cannot be properly represented by average values. This work proposes a novel framework for identifying the principal aspects of the dynamics and how they are influenced by the network structure. The potential of this approach is illustrated with respect to three important models (Integrate-and-Fire, SIS and Kuramoto), allowing the identification of highly structured dynamics, in the sense that different groups of nodes not only presented specific dynamics but also felt the structure of the network in different ways.
Submitted 13 September, 2011;
originally announced September 2011.
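The abstract contrasts node-level signals with overall statistics. The minimal sketch below uses the Kuramoto model, one of the three models named, in its standard network form dθ_i/dt = ω_i + K Σ_j A_ij sin(θ_j − θ_i). It keeps every node's phase trajectory alongside the global order parameter r, which is the kind of single summary that averages away node differences. The unnormalized coupling, the Euler integrator and the function names are assumptions, not the paper's settings.

```python
import math
from typing import List, Sequence


def kuramoto(adj: Sequence[Sequence[int]], omega: Sequence[float],
             coupling: float, dt: float, steps: int,
             theta0: Sequence[float]) -> List[List[float]]:
    """Euler integration of d(theta_i)/dt = omega_i + K * sum_j A_ij sin(theta_j - theta_i).

    Returns the full per-node phase history, one row per time step
    (steps + 1 rows including the initial state).
    """
    n = len(omega)
    theta = list(theta0)
    history = [theta[:]]
    for _ in range(steps):
        dtheta = [
            omega[i] + coupling * sum(adj[i][j] * math.sin(theta[j] - theta[i])
                                      for j in range(n))
            for i in range(n)
        ]
        theta = [theta[i] + dt * dtheta[i] for i in range(n)]
        history.append(theta[:])
    return history


def order_parameter(theta: Sequence[float]) -> float:
    """Global synchrony r = |mean_j exp(i * theta_j)|, between 0 and 1."""
    n = len(theta)
    re = sum(math.cos(t) for t in theta) / n
    im = sum(math.sin(t) for t in theta) / n
    return math.hypot(re, im)
```

Each column of the returned history is one node's signal. Grouping nodes by how those columns behave, rather than tracking r alone, is the node-level view the abstract argues for.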