Showing 1–21 of 21 results for author: Costa, L

Searching in archive stat.
  1. arXiv:2507.00962  [pdf, ps, other]

    stat.CO stat.AP

    clustra: A multi-platform k-means clustering algorithm for analysis of longitudinal trajectories in large electronic health records data

    Authors: Nimish Adhikari, Hanna Gerlovin, George Ostrouchov, Rachel Ehrbar, Alyssa B. Dufour, Brian R. Ferolito, Serkalem Demissie, Lauren Costa, Yuk-Lam Ho, Laura Tarko, Edmon Begoli, Kelly Cho, David R. Gagnon

    Abstract: Background and Objective: Variables collected over time, or longitudinally, such as biologic measurements in electronic health records data, are not simple to summarize with a single time-point, and thus can be more holistically conceptualized as trajectories over time. Cluster analysis with longitudinal data further allows for clinical representation of groups of subjects with similar trajectorie…

    Submitted 1 July, 2025; originally announced July 2025.

    Comments: 15 pages, 11 figures, clustra package available in https://cran.r-project.org/web/packages/clustra/index.html, SAS macros available in https://github.com/MVP-CHAMPION/clustra-SAS

  2. arXiv:2503.24016  [pdf, other]

    cs.LG cs.AI stat.ML

    Bayesian Predictive Coding

    Authors: Alexander Tschantz, Magnus Koudahl, Hampus Linander, Lancelot Da Costa, Conor Heins, Jeff Beck, Christopher Buckley

    Abstract: Predictive coding (PC) is an influential theory of information processing in the brain, providing a biologically plausible alternative to backpropagation. It is motivated in terms of Bayesian inference, as hidden states and parameters are optimised via gradient descent on variational free energy. However, implementations of PC rely on maximum a posteriori (MAP) estimates of hidden states…

    Submitted 31 March, 2025; originally announced March 2025.

  3. arXiv:2411.03100  [pdf, other]

    stat.ME cs.SI

    Modeling sparsity in count-weighted networks

    Authors: Andressa Cerqueira, Laila L. S. Costa

    Abstract: Community detection methods have been extensively studied to recover community structures in network data. While many models and methods focus on binary data, real-world networks also present the strength of connections, which could be considered in the network analysis. We propose a probabilistic model for generating weighted networks that allows us to control network sparsity and incorporates…

    Submitted 5 November, 2024; originally announced November 2024.

    MSC Class: 62Fxx

  4. arXiv:2410.05016  [pdf, other]

    cs.LG stat.ML

    T-JEPA: Augmentation-Free Self-Supervised Learning for Tabular Data

    Authors: Hugo Thimonier, José Lucas De Melo Costa, Fabrice Popineau, Arpad Rimmel, Bich-Liên Doan

    Abstract: Self-supervision is often used for pre-training to foster performance on a downstream task by constructing meaningful representations of samples. Self-supervised learning (SSL) generally involves generating different views of the same sample and thus requires data augmentations that are challenging to construct for tabular data. This constitutes one of the main challenges of self-supervision for s…

    Submitted 3 May, 2025; v1 submitted 7 October, 2024; originally announced October 2024.

    Comments: Accepted at ICLR 2025: https://openreview.net/forum?id=gx3LMRB15C

  5. arXiv:2409.15532  [pdf, other]

    math.PR math.DS stat.ME

    A theory of generalised coordinates for stochastic differential equations

    Authors: Lancelot Da Costa, Nathaël Da Costa, Conor Heins, Johan Medrano, Grigorios A. Pavliotis, Thomas Parr, Ajith Anil Meera, Karl Friston

    Abstract: Stochastic differential equations are ubiquitous modelling tools in physics and the sciences. In most modelling scenarios, random fluctuations driving dynamics or motion have some non-trivial temporal correlation structure, which renders the SDE non-Markovian; a phenomenon commonly known as "colored" noise. Thus, an important objective is to develop effective tools for mathematically and numeric…

    Submitted 18 April, 2025; v1 submitted 23 September, 2024; originally announced September 2024.

    Comments: 38 pages of main; 47 pages including abstract, TOC, Appendix and references

  6. arXiv:2312.14886  [pdf, other]

    cs.LG math.PR math.ST stat.ML

    Sample Path Regularity of Gaussian Processes from the Covariance Kernel

    Authors: Nathaël Da Costa, Marvin Pförtner, Lancelot Da Costa, Philipp Hennig

    Abstract: Gaussian processes (GPs) are the most common formalism for defining probability distributions over spaces of functions. While applications of GPs are myriad, a comprehensive understanding of GP sample paths, i.e. the function spaces over which they define a probability measure, is lacking. In practice, GPs are not constructed through a probability measure, but instead through a mean function and a…

    Submitted 16 February, 2024; v1 submitted 22 December, 2023; originally announced December 2023.

  7. arXiv:2203.10592  [pdf, other]

    stat.ML cs.LG math.DG math.OC math.ST

    Geometric Methods for Sampling, Optimisation, Inference and Adaptive Agents

    Authors: Alessandro Barp, Lancelot Da Costa, Guilherme França, Karl Friston, Mark Girolami, Michael I. Jordan, Grigorios A. Pavliotis

    Abstract: In this chapter, we identify fundamental geometric structures that underlie the problems of sampling, optimisation, inference and adaptive decision-making. Based on this identification, we derive algorithms that exploit these geometric structures to solve these problems efficiently. We show that a wide range of geometric theories emerge naturally in these fields, ranging from measure-preserving pr…

    Submitted 25 July, 2022; v1 submitted 20 March, 2022; originally announced March 2022.

    Comments: 30 pages, 4 figures; 42 pages including table of contents and references

    Journal ref: Handbook of Statistics, vol. 46, pp. 21--78 (2022)

  8. arXiv:2112.01369  [pdf, other]

    stat.ME cs.IT

    The Classic Cross-Correlation and the Real-Valued Jaccard and Coincidence Indices

    Authors: Luciano da F. Costa

    Abstract: In this work we describe and compare the classic inner product and Pearson correlation coefficient as well as the recently introduced real-valued Jaccard and coincidence indices. Special attention is given to diverse schemes for taking into account the signs of the operands, as well as on the study of the geometry of the scalar field surface related to the generalized multiset binary operations un…

    Submitted 25 November, 2021; originally announced December 2021.

    Comments: 9 pages, 8 figures. A preprint

  9. arXiv:2110.04074  [pdf]

    stat.ML cs.AI cs.IT cs.LG

    Active inference, Bayesian optimal design, and expected utility

    Authors: Noor Sajid, Lancelot Da Costa, Thomas Parr, Karl Friston

    Abstract: Active inference, a corollary of the free energy principle, is a formal way of describing the behavior of certain kinds of random dynamical systems that have the appearance of sentience. In this chapter, we describe how active inference combines Bayesian decision theory and optimal Bayesian design principles under a single imperative to minimize expected free energy. It is this aspect of active in…

    Submitted 21 September, 2021; originally announced October 2021.

    Comments: 19 pages; 3 figures

  10. arXiv:2008.10344  [pdf, other]

    stat.AP

    Power laws in the Roman Empire: a survival analysis

    Authors: Pedro L. Ramos, Luciano da F. Costa, Francisco Louzada, Francisco A. Rodrigues

    Abstract: The Roman Empire shaped Western civilization, and many Roman principles are embodied in modern institutions. Although its political institutions proved both resilient and adaptable, allowing it to incorporate diverse populations, the Empire suffered from many internal conflicts. Indeed, most emperors died violently, from assassination, suicide, or in battle. These internal conflicts produced patte…

    Submitted 20 August, 2020; originally announced August 2020.

    Comments: 18 pages, 6 figures

  11. arXiv:2005.07995  [pdf, other]

    cs.LG cs.CV stat.ML

    Revisiting Agglomerative Clustering

    Authors: Eric K. Tokuda, Cesar H. Comin, Luciano da F. Costa

    Abstract: An important issue in clustering concerns the avoidance of false positives while searching for clusters. This work addressed this problem considering agglomerative methods, namely single, average, median, complete, centroid and Ward's approaches applied to unimodal and bimodal datasets obeying uniform, Gaussian, exponential and power-law distributions. A model of clusters was also adopted, involvi…

    Submitted 26 June, 2020; v1 submitted 16 May, 2020; originally announced May 2020.

  12. arXiv:1808.02848  [pdf, other]

    stat.AP cs.CV

    Pattern Recognition Approach to Violin Shapes of MIMO database

    Authors: Thomas Peron, Francisco A. Rodrigues, Luciano da F. Costa

    Abstract: Since the landmarks established by the Cremonese school in the 16th century, the history of violin design has been marked by experimentation. While great effort has been invested since the early 19th century by the scientific community on researching violin acoustics, substantially less attention has been given to the statistical characterization of how the violin shape evolved over time. In this…

    Submitted 8 August, 2018; originally announced August 2018.

  13. arXiv:1804.02502  [pdf, other]

    cs.CE stat.CO stat.ME

    Principal Component Analysis: A Natural Approach to Data Exploration

    Authors: Felipe L. Gewers, Gustavo R. Ferreira, Henrique F. de Arruda, Filipi N. Silva, Cesar H. Comin, Diego R. Amancio, Luciano da F. Costa

    Abstract: Principal component analysis (PCA) is often used for analyzing data in the most diverse areas. In this work, we report an integrated approach to several theoretical and practical aspects of PCA. We start by providing, in an intuitive and accessible manner, the basic principles underlying PCA and its applications. Next, we present a systematic, though not exclusive, survey of some representative wor…

    Submitted 19 June, 2018; v1 submitted 6 April, 2018; originally announced April 2018.

    Journal ref: ACM Computing Surveys (CSUR), 54(4), pp.1-34 (2021)

  14. arXiv:1612.08388  [pdf, other]

    cs.LG stat.ML

    Clustering Algorithms: A Comparative Approach

    Authors: Mayra Z. Rodriguez, Cesar H. Comin, Dalcimar Casanova, Odemir M. Bruno, Diego R. Amancio, Francisco A. Rodrigues, Luciano da F. Costa

    Abstract: Many real-world systems can be studied in terms of pattern recognition tasks, so that proper use (and understanding) of machine learning methods in practical applications becomes essential. While a myriad of classification methods have been proposed, there is no consensus on which methods are more suitable for a given dataset. As a consequence, it is important to comprehensively compare methods in…

    Submitted 26 December, 2016; originally announced December 2016.

  15. arXiv:1606.05400  [pdf, other]

    physics.soc-ph physics.data-an stat.ML

    Complex systems: features, similarity and connectivity

    Authors: Cesar H. Comin, Thomas K. DM. Peron, Filipi N. Silva, Diego R. Amancio, Francisco A. Rodrigues, Luciano da F. Costa

    Abstract: The increasing interest in complex networks research has been a consequence of several intrinsic features of this area, such as the generality of the approach to represent and model virtually any discrete system, and the incorporation of concepts and methods deriving from many areas, from statistical physics to sociology, which are often used in an independent way. Yet, for this same reason, it wo…

    Submitted 16 June, 2016; originally announced June 2016.

  16. arXiv:1505.06832  [pdf, ps, other]

    stat.ME math.ST

    Searching Multiregression Dynamic Models of Resting-State fMRI Networks Using Integer Programming

    Authors: Lilia Costa, Jim Smith, Thomas Nichols, James Cussens, Eugene P. Duff, Tamar R. Makin

    Abstract: A Multiregression Dynamic Model (MDM) is a class of multivariate time series that represents various dynamic causal processes in a graphical way. One of the advantages of this class is that, in contrast to many other Dynamic Bayesian Networks, the hypothesised relationships accommodate conditional conjugate inference. We demonstrate for the first time how straightforward it is to search over all p…

    Submitted 26 May, 2015; originally announced May 2015.

    Comments: Published at http://dx.doi.org/10.1214/14-BA913 in the Bayesian Analysis (http://projecteuclid.org/euclid.ba) by the International Society of Bayesian Analysis (http://bayesian.org/)

    Report number: VTeX-BA-BA913

    Journal ref: Bayesian Analysis 2015, Vol. 10, No. 2, 441-478

  17. arXiv:1404.1239  [pdf, other]

    stat.ME

    Towards a Multi-Subject Analysis of Neural Connectivity

    Authors: Chris J. Oates, Lilia Carneiro da Costa, Tom Nichols

    Abstract: Directed acyclic graphs (DAGs) and associated probability models are widely used to model neural connectivity and communication channels. In many experiments, data are collected from multiple subjects whose connectivities may differ but are likely to share many features. In such circumstances it is natural to leverage similarity between subjects to improve statistical efficiency. The first exact a…

    Submitted 14 November, 2014; v1 submitted 4 April, 2014; originally announced April 2014.

    Comments: to appear in Neural Computation 27:1-20

  18. A quantitative approach to evolution of music and philosophy

    Authors: Vilson Vieira, Renato Fabbri, Gonzalo Travieso, Osvaldo N. Oliveira Jr., Luciano da Fontoura Costa

    Abstract: The development of new statistical and computational methods is increasingly making it possible to bridge the gap between hard sciences and humanities. In this study, we propose an approach based on a quantitative evaluation of attributes of objects in fields of humanities, from which concepts such as dialectics and opposition are formally defined mathematically. As case studies, we analyzed the t…

    Submitted 13 November, 2013; originally announced March 2014.

    Comments: arXiv admin note: substantial text overlap with arXiv:1109.4653

    MSC Class: 62A01

    Journal ref: J. Stat. Mech. (2012) P08010

  19. arXiv:1403.4512  [pdf, other]

    stat.AP cs.OH

    A Quantitative Approach to Painting Styles

    Authors: Vilson Vieira, Renato Fabbri, David Sbrissa, Luciano da Fontoura Costa, Gonzalo Travieso

    Abstract: This research extends a method previously applied to music and philosophy, representing the evolution of art as a time-series where relations like dialectics are measured quantitatively. For that, a corpus of paintings of 12 well-known artists from baroque and modern art is analyzed. A set of 93 features is extracted and the features which most contributed to the classification of painters are sele…

    Submitted 13 November, 2013; originally announced March 2014.

  20. arXiv:1109.4653  [pdf, ps, other]

    physics.data-an cs.SD stat.ME

    Can the evolution of music be analyzed in a quantitative manner?

    Authors: Vilson Vieira, Renato Fabbri, Gonzalo Travieso, Luciano da Fontoura Costa

    Abstract: We propose a methodology to study music development by applying multivariate statistics on composers' characteristics. Seven representative composers were considered in terms of eight main musical features. Grades were assigned to each characteristic and their correlations were analyzed. A bootstrap method was applied to simulate hundreds of artificial composers influenced by the seven representati…

    Submitted 4 March, 2012; v1 submitted 21 September, 2011; originally announced September 2011.

    Comments: 8 pages, 6 figures, added references for sections 1 and 4.C, better mathematical description on section 2. New values and interpretation, now considering a bootstrap method

  21. arXiv:1109.2963  [pdf, other]

    physics.data-an cs.SI nlin.CD physics.soc-ph stat.ME

    Unveiling the Relationship Between Structure and Dynamics in Complex Networks

    Authors: Cesar H. Comin, João B. Bunoro, Matheus P. Viana, Luciano da F. Costa

    Abstract: Over the last years, a great deal of attention has been focused on complex networked systems, characterized by intricate structure and dynamics. The latter has been often represented in terms of overall statistics (e.g. average and standard deviations) of the time signals. While such approaches have led to many insights, they have failed to take into account that signals at different parts of the…

    Submitted 13 September, 2011; originally announced September 2011.

    Comments: 16 pages, 10 figures