Showing 1–50 of 65 results for author: Matteson, D

  1. arXiv:2506.21723  [pdf, ps, other]

    stat.AP cs.CY stat.ME

    Dynamic Bayesian Item Response Model with Decomposition (D-BIRD): Modeling Cohort and Individual Learning Over Time

    Authors: Hansol Lee, Jason B. Cho, David S. Matteson, Benjamin W. Domingue

    Abstract: We present D-BIRD, a Bayesian dynamic item response model for estimating student ability from sparse, longitudinal assessments. By decomposing ability into a cohort trend and individual trajectory, D-BIRD supports interpretable modeling of learning over time. We evaluate parameter recovery in simulation and demonstrate the model using real-world personalized learning data.

    Submitted 26 June, 2025; originally announced June 2025.

    Comments: Submitted to the NCME Special Conference: Artificial Intelligence in Measurement and Education Conference (AIME-Con)

  2. arXiv:2408.11315  [pdf, ps, other]

    stat.ME stat.AP stat.CO

    Smoothing Variances Across Time: Adaptive Stochastic Volatility

    Authors: Jason B. Cho, David S. Matteson

    Abstract: We introduce a novel Bayesian framework for estimating time-varying volatility by extending the Random Walk Stochastic Volatility (RWSV) model with Dynamic Shrinkage Processes (DSP) in log-variances. Unlike the classical Stochastic Volatility (SV) or GARCH-type models with restrictive parametric stationarity assumptions, our proposed Adaptive Stochastic Volatility (ASV) model provides smooth yet d…

    Submitted 4 June, 2025; v1 submitted 20 August, 2024; originally announced August 2024.

  3. arXiv:2407.17669  [pdf]

    cond-mat.mtrl-sci

    Atomic Resolution Observations of Nanoparticle Surface Dynamics and Instabilities Enabled by Artificial Intelligence

    Authors: Peter A. Crozier, Matan Leibovich, Piyush Haluai, Mai Tan, Andrew M. Thomas, Joshua Vincent, Sreyas Mohan, Adria Marcos Morales, Shreyas A. Kulkarni, David S. Matteson, Yifan Wang, Carlos Fernandez-Granda

    Abstract: Nanoparticle surface structural dynamics is believed to play a significant role in regulating functionalities such as diffusion, reactivity, and catalysis but the atomic-level processes are not well understood. Atomic resolution characterization of nanoparticle surface dynamics is challenging since it requires both high spatial and temporal resolution. Though ultrafast transmission electron micros…

    Submitted 2 August, 2024; v1 submitted 24 July, 2024; originally announced July 2024.

  4. arXiv:2406.19702  [pdf, ps, other]

    stat.ME econ.EM

    Vector AutoRegressive Moving Average Models: A Review

    Authors: Marie-Christine Düker, David S. Matteson, Ruey S. Tsay, Ines Wilms

    Abstract: Vector AutoRegressive Moving Average (VARMA) models form a powerful and general model class for analyzing dynamics among multiple time series. While VARMA models encompass the Vector AutoRegressive (VAR) models, their popularity in empirical applications is dominated by the latter. Can this phenomenon be explained fully by the simplicity of VAR models? Perhaps many users of VAR models have not ful…

    Submitted 28 June, 2024; originally announced June 2024.

  5. arXiv:2401.02917  [pdf, other]

    stat.ME

    Bayesian changepoint detection via logistic regression and the topological analysis of image series

    Authors: Andrew M. Thomas, Michael Jauch, David S. Matteson

    Abstract: We present a Bayesian method for multivariate changepoint detection that allows for simultaneous inference on the location of a changepoint and the coefficients of a logistic regression model for distinguishing pre-changepoint data from post-changepoint data. In contrast to many methods for multivariate changepoint detection, the proposed method is applicable to data of mixed type and avoids stric…

    Submitted 7 March, 2025; v1 submitted 5 January, 2024; originally announced January 2024.

    Comments: 39 pages (main), 24 pages (supplementary), 13 figures, and 11 tables

  6. arXiv:2309.00080  [pdf, other]

    stat.ME

    Locally Adaptive Shrinkage Priors for Trends and Breaks in Count Time Series

    Authors: Toryn L. J. Schafer, David S. Matteson

    Abstract: Non-stationary count time series characterized by features such as abrupt changes and fluctuations about the trend arise in many scientific domains including biophysics, ecology, energy, epidemiology, and social science domains. Current approaches for integer-valued time series lack the flexibility to capture local transient features while more flexible models for continuous data types are inadequ…

    Submitted 31 August, 2023; originally announced September 2023.

    Comments: 31 pages, 6 figures

  7. Dynamic Atomic Column Detection in Transmission Electron Microscopy Videos via Ridge Estimation

    Authors: Yuchen Xu, Andrew M. Thomas, Peter A. Crozier, David S. Matteson

    Abstract: Ridge detection is a classical tool to extract curvilinear features in image processing. As such, it has great promise in applications to material science problems; specifically, for trend filtering relatively stable atom-shaped objects in image sequences, such as Transmission Electron Microscopy (TEM) videos. Standard analysis of TEM videos is limited to frame-by-frame object recognition. We inst…

    Submitted 1 February, 2023; originally announced February 2023.

    Comments: 27 pages, 11 figures

    Journal ref: IEEE Transactions on Image Processing (2025)

  8. Non-fungible token transactions: data and challenges

    Authors: Jason B. Cho, Sven Serneels, David S. Matteson

    Abstract: Non-fungible tokens (NFT) have recently emerged as a novel blockchain hosted financial asset class that has attracted major transaction volumes. Investment decisions rely on data and adequate preprocessing and application of analytics to them. Both owing to the non-fungible nature of the tokens and to a blockchain being the primary data source, NFT transaction data pose several challenges not comm…

    Submitted 13 October, 2022; originally announced October 2022.

    Journal ref: Data Science in Science 2:1 (2023)

  9. arXiv:2209.13584  [pdf, other]

    stat.AP eess.IV

    Feature detection and hypothesis testing for extremely noisy nanoparticle images using topological data analysis

    Authors: Andrew M. Thomas, Peter A. Crozier, Yuchen Xu, David S. Matteson

    Abstract: We propose a flexible algorithm for feature detection and hypothesis testing in images with ultra low signal-to-noise ratio using cubical persistent homology. Our main application is in the identification of atomic columns and other features in transmission electron microscopy (TEM). Cubical persistent homology is used to identify local minima and their size in subregions in the frames of nanopart…

    Submitted 18 January, 2023; v1 submitted 27 September, 2022; originally announced September 2022.

    Comments: 42 pages, 21 figures, 9 tables

    MSC Class: 62R40; 62P35

  10. arXiv:2207.00039  [pdf, other]

    stat.ME stat.CO stat.ML

    K-ARMA Models for Clustering Time Series Data

    Authors: Derek O. Hoare, David S. Matteson, Martin T. Wells

    Abstract: We present an approach to clustering time series data using a model-based generalization of the K-Means algorithm which we call K-Models. We prove the convergence of this general algorithm and relate it to the hard-EM algorithm for mixture modeling. We then apply our method first with an AR($p$) clustering example and show how the clustering algorithm can be made robust to outliers using a least-a…

    Submitted 30 June, 2022; originally announced July 2022.

    Comments: 24 pages, 8 figures

  11. arXiv:2203.02057  [pdf, other]

    stat.ML cs.LG

    Interpretable Latent Variables in Deep State Space Models

    Authors: Haoxuan Wu, David S. Matteson, Martin T. Wells

    Abstract: We introduce a new version of deep state-space models (DSSMs) that combines a recurrent neural network with a state-space framework to forecast time series data. The model estimates the observed series as functions of latent variables that evolve non-linearly through time. Due to the complexity and non-linearity inherent in DSSMs, previous works on DSSMs typically produced latent variables that ar…

    Submitted 19 May, 2022; v1 submitted 3 March, 2022; originally announced March 2022.

  12. arXiv:2203.01912  [pdf, other]

    stat.ME cs.LG stat.AP stat.ML

    Bayesian Spillover Graphs for Dynamic Networks

    Authors: Grace Deng, David S. Matteson

    Abstract: We present Bayesian Spillover Graphs (BSG), a novel method for learning temporal relationships, identifying critical nodes, and quantifying uncertainty for multi-horizon spillover effects in a dynamic system. BSG leverages both an interpretable framework via forecast error variance decompositions (FEVD) and comprehensive uncertainty quantification via Bayesian time series models to contextualize t…

    Submitted 16 June, 2022; v1 submitted 3 March, 2022; originally announced March 2022.

  13. arXiv:2201.06606  [pdf, other]

    stat.ME

    Drift vs Shift: Decoupling Trends and Changepoint Analysis

    Authors: Haoxuan Wu, Toryn L. J. Schafer, Sean Ryan, David S. Matteson

    Abstract: We introduce a new approach for decoupling trends (drift) and changepoints (shifts) in time series. Our locally adaptive model-based approach for robustly decoupling combines Bayesian trend filtering and machine learning based regularization. An over-parameterized Bayesian dynamic linear model (DLM) is first applied to characterize drift. Then a weighted penalized likelihood estimator is paired wi…

    Submitted 6 January, 2024; v1 submitted 17 January, 2022; originally announced January 2022.

  14. arXiv:2112.12791  [pdf, other]

    q-bio.OT

    Analysis of animal-related electric outages using species distribution models and community science data

    Authors: Mei-Ling E. Feng, Olukunle O. Owolabi, Toryn L. J. Schafer, Sanhita Sengupta, Lan Wang, David S. Matteson, Judy P. Che-Castaldo, Deborah A. Sunter

    Abstract: Animal-related outages (AROs) are a prevalent form of outages in electrical distribution systems. Animal-infrastructure interactions vary across focal species and regions, underlining the need to study the animal-outage relationship in more species and diverse systems. Animal activity has been used as an indicator of reliability in the electrical grid system and to describe temporal patterns in AR…

    Submitted 22 December, 2021; originally announced December 2021.

  15. arXiv:2112.11338  [pdf, other]

    q-fin.GN stat.AP

    Role of Variable Renewable Energy Penetration on Electricity Price and its Volatility Across Independent System Operators in the United States

    Authors: Olukunle O. Owolabi, Toryn L. J. Schafer, Georgia E. Smits, Sanhita Sengupta, Sean E. Ryan, Lan Wang, David S. Matteson, Mila Getmansky Sherman, Deborah A. Sunter

    Abstract: The U.S. electrical grid has undergone substantial transformation with increased penetration of wind and solar -- forms of variable renewable energy (VRE). Despite the benefits of VRE for decarbonization, it has garnered some controversy for inducing unwanted effects in regional electricity markets. In this study, the role of VRE penetration is examined on the system electricity price and price vo…

    Submitted 28 November, 2022; v1 submitted 10 November, 2021; originally announced December 2021.

  16. arXiv:2111.15670  [pdf, other]

    stat.AP stat.CO

    Log-Gaussian Cox Process Modeling of Large Spatial Lightning Data using Spectral and Laplace Approximations

    Authors: Megan L. Gelsinger, Maryclare Griffin, David S. Matteson, Joseph Guinness

    Abstract: Lightning is a destructive and highly visible product of severe storms, yet there is still much to be learned about the conditions under which lightning is most likely to occur. The GOES-16 and GOES-17 satellites, launched in 2016 and 2018 by NOAA and NASA, collect a wealth of data regarding individual lightning strike occurrence and potentially related atmospheric variables. The acute nature and…

    Submitted 30 November, 2021; originally announced November 2021.

  17. arXiv:2111.11381  [pdf, ps, other]

    stat.ME

    Spatial Correlation in Weather Forecast Accuracy: A Functional Time Series Approach

    Authors: Phillip A. Jang, David S. Matteson

    Abstract: A functional time series approach is proposed for investigating spatial correlation in daily maximum temperature forecast errors for 111 cities spread across the U.S. The modelling of spatial correlation is most fruitful for longer forecast horizons, and becomes less relevant as the forecast horizon shrinks towards zero. For 6-day-ahead forecasts, the functional approach uncovers interpretable reg…

    Submitted 22 November, 2021; originally announced November 2021.

  18. arXiv:2110.07460  [pdf, other]

    stat.ML cs.LG

    IB-GAN: A Unified Approach for Multivariate Time Series Classification under Class Imbalance

    Authors: Grace Deng, Cuize Han, Tommaso Dreossi, Clarence Lee, David S. Matteson

    Abstract: Classification of large multivariate time series with strong class imbalance is an important task in real-world applications. Standard methods of class weights, oversampling, or parametric data augmentation do not always yield significant improvements for predicting minority classes of interest. Non-parametric data augmentation with Generative Adversarial Networks (GANs) offers a promising solutio…

    Submitted 14 October, 2021; originally announced October 2021.

  19. arXiv:2110.04852  [pdf, other]

    stat.ME math.ST

    Mixture representations and Bayesian nonparametric inference for likelihood ratio ordered distributions

    Authors: Michael Jauch, Andrés F. Barrientos, Víctor Peña, David S. Matteson

    Abstract: In this article, we introduce mixture representations for likelihood ratio ordered distributions. Essentially, the ratio of two probability densities, or mass functions, is monotone if and only if one can be expressed as a mixture of one-sided truncations of the other. To illustrate the practical value of the mixture representations, we address the problem of density estimation for likelihood rati…

    Submitted 26 October, 2023; v1 submitted 10 October, 2021; originally announced October 2021.

  20. arXiv:2107.14754  [pdf, other]

    stat.ME

    A Survey of Estimation Methods for Sparse High-dimensional Time Series Models

    Authors: Sumanta Basu, David S. Matteson

    Abstract: High-dimensional time series datasets are becoming increasingly common in many areas of biological and social sciences. Some important applications include gene regulatory network reconstruction using time course gene expression data, brain connectivity analysis from neuroimaging data, structural analysis of a large panel of macroeconomic indicators, and studying linkages among financial firms for…

    Submitted 30 July, 2021; originally announced July 2021.

  21. arXiv:2107.10572  [pdf, other]

    stat.ME stat.AP stat.CO

    Graphical Influence Diagnostics for Changepoint Models

    Authors: Ines Wilms, Rebecca Killick, David S. Matteson

    Abstract: Changepoint models enjoy a wide appeal in a variety of disciplines to model the heterogeneity of ordered data. Graphical influence diagnostics to characterize the influence of single observations on changepoint models are, however, lacking. We address this gap by developing a framework for investigating instabilities in changepoint segmentations and assessing the influence of single observations o…

    Submitted 22 July, 2021; originally announced July 2021.

  22. arXiv:2105.08512  [pdf, other]

    q-bio.QM stat.AP

    Classifying Contaminated Cell Cultures using Time Series Features

    Authors: Laura L. Tupper, Charles R. Keese, David S. Matteson

    Abstract: We examine the use of time series data, derived from Electric Cell-substrate Impedance Sensing (ECIS), to differentiate between standard mammalian cell cultures and those infected with a mycoplasma organism. With the goal of interpretable results, we perform low-dimensional feature-based classification, extracting application-relevant features from the ECIS time courses. We can achieve very high c…

    Submitted 22 February, 2022; v1 submitted 14 May, 2021; originally announced May 2021.

    Comments: 30 pages, 7 figures

  23. Testing Simultaneous Diagonalizability

    Authors: Yuchen Xu, Marie-Christine Düker, David S. Matteson

    Abstract: This paper proposes novel methods to test for simultaneous diagonalization of possibly asymmetric matrices. Motivated by various applications, a two-sample test as well as a generalization for multiple matrices are proposed. A partial version of the test is also studied to check whether a partial set of eigenvectors is shared across samples. Additionally, a novel algorithm for the considered testi…

    Submitted 19 January, 2021; originally announced January 2021.

    Comments: 35 pages, 7 figures

    Journal ref: Journal of the American Statistical Association 119 (2024) 1513-1525

  24. arXiv:2101.07771  [pdf, other]

    stat.AP

    Critical Risk Indicators (CRIs) for the electric power grid: A survey and discussion of interconnected effects

    Authors: Judy P. Che-Castaldo, Rémi Cousin, Stefani Daryanto, Grace Deng, Mei-Ling E. Feng, Rajesh K. Gupta, Dezhi Hong, Ryan M. McGranaghan, Olukunle O. Owolabi, Tianyi Qu, Wei Ren, Toryn L. J. Schafer, Ashutosh Sharma, Chaopeng Shen, Mila Getmansky Sherman, Deborah A. Sunter, Lan Wang, David S. Matteson

    Abstract: The electric power grid is a critical societal resource connecting multiple infrastructural domains such as agriculture, transportation, and manufacturing. The electrical grid as an infrastructure is shaped by human activity and public policy in terms of demand and supply requirements. Further, the grid is subject to changes and stresses due to solar weather, climate, hydrology, and ecology. The e…

    Submitted 9 June, 2021; v1 submitted 19 January, 2021; originally announced January 2021.

  25. arXiv:2101.07770  [pdf]

    cond-mat.mtrl-sci eess.IV

    Developing and Evaluating Deep Neural Network-based Denoising for Nanoparticle TEM Images with Ultra-low Signal-to-Noise

    Authors: Joshua L. Vincent, Ramon Manzorro, Sreyas Mohan, Binh Tang, Dev Y. Sheth, Eero P. Simoncelli, David S. Matteson, Carlos Fernandez-Granda, Peter A. Crozier

    Abstract: A deep convolutional neural network has been developed to denoise atomic-resolution TEM image datasets of nanoparticles acquired using direct electron counting detectors, for applications where the image signal is severely limited by shot noise. The network was applied to a model system of CeO2-supported Pt nanoparticles. We leverage multislice image simulations to generate a large and flexible da…

    Submitted 17 March, 2021; v1 submitted 19 January, 2021; originally announced January 2021.

    Journal ref: Microscopy and Microanalysis, vol 27, no 6, pp 1431--1447, Dec 2021

  26. arXiv:2101.07408  [pdf, other]

    q-bio.QM stat.AP

    Clustering Future Scenarios Based on Predicted Range Maps

    Authors: Matthew Davidow, Cory Merow, Judy Che-Castaldo, Toryn Schafer, Marie-Christine Duker, Derek Corcoran, David Matteson

    Abstract: Predictions of biodiversity trajectories under climate change are crucial in order to act effectively in maintaining the diversity of species. In many ecological applications, future predictions are made under various global warming scenarios as described by a range of different climate models. The outputs of these various predictions call for a reliable interpretation. We propose an interpretable…

    Submitted 17 July, 2022; v1 submitted 18 January, 2021; originally announced January 2021.

    Comments: 26 pages, 10 figures

  27. arXiv:2101.04809  [pdf, other]

    stat.ME

    Group Linear non-Gaussian Component Analysis with Applications to Neuroimaging

    Authors: Yuxuan Zhao, David S. Matteson, Mary Beth Nebel, Stewart H. Mostofsky, Benjamin Risk

    Abstract: Independent component analysis (ICA) is an unsupervised learning method popular in functional magnetic resonance imaging (fMRI). Group ICA has been used to search for biomarkers in neurological disorders including autism spectrum disorder and dementia. However, current methods use a principal component analysis (PCA) step that may remove low-variance features. Linear non-Gaussian component analysi…

    Submitted 12 January, 2021; originally announced January 2021.

  28. arXiv:2101.02330  [pdf, other]

    stat.ML cs.LG

    Copula Quadrant Similarity for Anomaly Scores

    Authors: Matthew Davidow, David Matteson

    Abstract: Practical anomaly detection requires applying numerous approaches due to the inherent difficulty of unsupervised learning. Direct comparison between complex or opaque anomaly detection algorithms is intractable; we instead propose a framework for associating the scores of multiple methods. Our aim is to answer the question: how should one measure the similarity between anomaly scores generated by…

    Submitted 6 January, 2021; originally announced January 2021.

    Comments: 17 pages, 11 figures

  29. arXiv:2012.10030  [pdf, other]

    stat.ME

    Regularized Estimation in High-Dimensional Vector Auto-Regressive Models using Spatio-Temporal Information

    Authors: Zhenzhong Wang, Abolfazl Safikhani, Zhengyuan Zhu, David S. Matteson

    Abstract: A Vector Auto-Regressive (VAR) model is commonly used to model multivariate time series, and there are many penalized methods to handle high dimensionality. However in terms of spatio-temporal data, most methods do not take the spatial and temporal structure of the data into consideration, which may lead to unreliable network detection and inaccurate forecasts. This paper proposes a data-driven we…

    Submitted 17 December, 2020; originally announced December 2020.

  30. arXiv:2011.09437  [pdf, other]

    stat.ME

    Trend and Variance Adaptive Bayesian Changepoint Analysis & Local Outlier Scoring

    Authors: Haoxuan Wu, Toryn L. J. Schafer, David S. Matteson

    Abstract: We adaptively estimate both changepoints and local outlier processes in a Bayesian dynamic linear model with global-local shrinkage priors in a novel model we call Adaptive Bayesian Changepoints with Outliers (ABCO). We utilize a state-space approach to identify a dynamic signal in the presence of outliers and measurement error with stochastic volatility. We find that global state equation paramet…

    Submitted 13 March, 2024; v1 submitted 18 November, 2020; originally announced November 2020.

  31. arXiv:2011.04168  [pdf, other]

    stat.ME

    Likelihood Inference for Possibly Non-Stationary Processes via Adaptive Overdifferencing

    Authors: Maryclare Griffin, Gennady Samorodnitsky, David S. Matteson

    Abstract: We make an observation that facilitates exact likelihood-based inference for the parameters of the popular ARFIMA model without requiring stationarity by allowing the upper bound $\bar{d}$ for the memory parameter $d$ to exceed $0.5$: estimating the parameters of a single non-stationary ARFIMA model is equivalent to estimating the parameters of a sequence of stationary ARFIMA models. This allows f…

    Submitted 9 January, 2025; v1 submitted 8 November, 2020; originally announced November 2020.

  32. arXiv:2011.02089  [pdf, other]

    stat.ML cs.LG stat.AP

    Extended Missing Data Imputation via GANs for Ranking Applications

    Authors: Grace Deng, Cuize Han, David S. Matteson

    Abstract: We propose Conditional Imputation GAN, an extended missing data imputation method based on Generative Adversarial Networks (GANs). The motivating use case is learning-to-rank, the cornerstone of modern search, recommendation system, and information retrieval applications. Empirical ranking datasets do not always follow standard Gaussian distributions or Missing Completely At Random (MCAR) mechanis…

    Submitted 10 November, 2021; v1 submitted 3 November, 2020; originally announced November 2020.

  33. arXiv:2010.12970  [pdf, other]

    cs.CV cs.LG eess.IV

    Deep Denoising For Scientific Discovery: A Case Study In Electron Microscopy

    Authors: Sreyas Mohan, Ramon Manzorro, Joshua L. Vincent, Binh Tang, Dev Yashpal Sheth, Eero P. Simoncelli, David S. Matteson, Peter A. Crozier, Carlos Fernandez-Granda

    Abstract: Denoising is a fundamental challenge in scientific imaging. Deep convolutional neural networks (CNNs) provide the current state of the art in denoising natural images, where they produce impressive results. However, their potential has barely been explored in the context of scientific imaging. Denoising CNNs are typically trained on real natural images artificially corrupted with simulated noise.…

    Submitted 13 July, 2021; v1 submitted 24 October, 2020; originally announced October 2020.

    Comments: The dataset and the code used to train and evaluate our models are available at https://sreyas-mohan.github.io/electron-microscopy-denoising/

    Journal ref: IEEE Trans. Computational Imaging, vol.8 pp. 585--597, Jul 2022

  34. arXiv:2007.13012  [pdf, other]

    physics.comp-ph cond-mat.mtrl-sci

    PyXtal FF: a Python Library for Automated Force Field Generation

    Authors: Howard Yanxon, David Zagaceta, Binh Tang, David Matteson, Qiang Zhu

    Abstract: We present PyXtal FF, a package based on Python programming language, for developing machine learning potentials (MLPs). The aim of PyXtal FF is to promote the application of atomistic simulations by providing several choices of structural descriptors and machine learning regressions in one platform. Based on the given choice of structural descriptors (including the atom-centered symmetry function…

    Submitted 25 July, 2020; originally announced July 2020.

    Comments: 13 pages, 4 figures

    Journal ref: Machine Learning: Science and Technology, 2 027001, 2021

  35. arXiv:2007.09417  [pdf, ps, other]

    stat.AP q-bio.QM

    Modeling a Nonlinear Biophysical Trend Followed by Long-Memory Equilibrium with Unknown Change Point

    Authors: Wenyu Zhang, Maryclare Griffin, David S. Matteson

    Abstract: Measurements of many biological processes are characterized by an initial trend period followed by an equilibrium period. Scientists may wish to quantify features of the two periods, as well as the timing of the change point. Specifically, we are motivated by problems in the study of electrical cell-substrate impedance sensing (ECIS) data. ECIS is a popular new technology which measures cell behav…

    Submitted 19 September, 2020; v1 submitted 18 July, 2020; originally announced July 2020.

  36. arXiv:2007.04813  [pdf, other]

    stat.ML cs.LG

    Graph-Based Continual Learning

    Authors: Binh Tang, David S. Matteson

    Abstract: Despite significant advances, continual learning models still suffer from catastrophic forgetting when exposed to incrementally available data from non-stationary distributions. Rehearsal approaches alleviate the problem by maintaining and replaying a small episodic memory of previous samples, often implemented as an array of independent memory slots. In this work, we propose to augment such an ar…

    Submitted 28 February, 2021; v1 submitted 9 July, 2020; originally announced July 2020.

    Comments: Published as a conference paper at ICLR 2021

  37. arXiv:2005.12129  [pdf, other]

    stat.ML cs.LG

    Factor Analysis of Mixed Data for Anomaly Detection

    Authors: Matthew Davidow, David S. Matteson

    Abstract: Anomaly detection aims to identify observations that deviate from the typical pattern of data. Anomalous observations may correspond to financial fraud, health risks, or incorrectly measured data in practice. We show detecting anomalies in high-dimensional mixed data is enhanced through first embedding the data then assessing an anomaly scoring scheme. We focus on unsupervised detection and the co…

    Submitted 25 May, 2020; originally announced May 2020.

  38. arXiv:1810.06167  [pdf, other]

    stat.ME stat.AP stat.ML

    ABACUS: Unsupervised Multivariate Change Detection via Bayesian Source Separation

    Authors: Wenyu Zhang, Daniel Gilbert, David Matteson

    Abstract: Change detection involves segmenting sequential data such that observations in the same segment share some desired properties. Multivariate change detection continues to be a challenging problem due to the variety of ways change points can be correlated across channels and the potentially poor signal-to-noise ratio on individual channels. In this paper, we are interested in locating additive outli…

    Submitted 14 October, 2018; originally announced October 2018.

  39. arXiv:1805.06640  [pdf, other]

    math.ST stat.AP stat.CO stat.ME stat.ML

    Testing for Conditional Mean Independence with Covariates through Martingale Difference Divergence

    Authors: Ze Jin, Xiaohan Yan, David S. Matteson

    Abstract: A crucial problem in statistics is to decide whether additional variables are needed in a regression model. We propose a new multivariate test to investigate the conditional mean independence of Y given X conditioning on some known effect Z, i.e., E(Y|X, Z) = E(Y|Z). Assuming that E(Y|Z) and Z are linearly related, we reformulate an equivalent notion of conditional mean independence through tra…

    Submitted 17 May, 2018; originally announced May 2018.

    Comments: 10 pages, 3 figures

  40. arXiv:1805.06639  [pdf, other]

    stat.ME math.ST stat.AP stat.CO stat.ML

    Independent Component Analysis via Energy-based and Kernel-based Mutual Dependence Measures

    Authors: Ze Jin, David S. Matteson

    Abstract: We apply both distance-based (Jin and Matteson, 2017) and kernel-based (Pfister et al., 2016) mutual dependence measures to independent component analysis (ICA), and generalize dCovICA (Matteson and Tsay, 2017) to MDMICA, minimizing empirical dependence measures as an objective function in both deflation and parallel manners. Solving this minimization problem, we introduce Latin hypercube sampling…

    Submitted 17 May, 2018; originally announced May 2018.

    Comments: 11 pages, 4 figures

  41. arXiv:1712.08837  [pdf, other]

    stat.ME math.ST stat.AP stat.CO stat.ML

    Optimization and Testing in Linear Non-Gaussian Component Analysis

    Authors: Ze Jin, Benjamin B. Risk, David S. Matteson

    Abstract: Independent component analysis (ICA) decomposes multivariate data into mutually independent components (ICs). The ICA model is subject to a constraint that at most one of these components is Gaussian, which is required for model identifiability. Linear non-Gaussian component analysis (LNGCA) generalizes the ICA model to a linear latent factor model with any number of both non-Gaussian components (…

    Submitted 29 December, 2017; v1 submitted 23 December, 2017; originally announced December 2017.

    Comments: 33 pages, 3 tables, 8 figures

  42. arXiv:1711.03623  [pdf, other]

    stat.ML stat.AP

    Interpretable Vector AutoRegressions with Exogenous Time Series

    Authors: Ines Wilms, Sumanta Basu, Jacob Bien, David S. Matteson

    Abstract: The Vector AutoRegressive (VAR) model is fundamental to the study of multivariate time series. Although VAR models are intensively investigated by many researchers, practitioners often show more interest in analyzing VARX models that incorporate the impact of unmodeled exogenous variables (X) into the VAR. However, since the parameter space grows quadratically with the number of time series, estim…

    Submitted 9 November, 2017; originally announced November 2017.

    Comments: Presented at NIPS 2017 Symposium on Interpretable Machine Learning

  43. arXiv:1710.09821  [pdf, other]

    q-bio.QM stat.AP

    Cell Line Classification Using Electric Cell-substrate Impedance Sensing (ECIS)

    Authors: Megan L. Gelsinger, Laura L. Tupper, David S. Matteson

    Abstract: We consider cell line classification using multivariate time series data obtained from electric cell-substrate impedance sensing (ECIS) technology. The ECIS device, which monitors the attachment and spreading of mammalian cells in real time through the collection of electrical impedance data, has historically been used to study one cell line at a time. However, we show that if applied to data from…

    Submitted 20 November, 2019; v1 submitted 26 October, 2017; originally announced October 2017.

    Comments: 40 pages, 10 figures, 8 tables

  44. arXiv:1709.06421  [pdf, other]

    stat.ME

    Pruning and Nonparametric Multiple Change Point Detection

    Authors: Wenyu Zhang, Nicholas James, David Matteson

    Abstract: Change point analysis is a statistical tool to identify homogeneity within time series data. We propose a pruning approach for approximate nonparametric estimation of multiple change points. This general purpose change point detection procedure `cp3o' applies a pruning routine within a dynamic program to greatly reduce the search space and computational costs. Existing goodness-of-fit change point…

    Submitted 16 September, 2017; originally announced September 2017.

    Comments: 9 pages. arXiv admin note: text overlap with arXiv:1505.04302

  45. arXiv:1709.02532  [pdf, other]

    math.ST stat.AP stat.CO stat.ME stat.ML

    Generalizing Distance Covariance to Measure and Test Multivariate Mutual Dependence

    Authors: Ze Jin, David S. Matteson

    Abstract: We propose three measures of mutual dependence between multiple random vectors. All the measures are zero if and only if the random vectors are mutually independent. The first measure generalizes distance covariance from pairwise dependence to mutual dependence, while the other two measures are sums of squared distance covariance. All the measures share similar properties and asymptotic distributi…

    Submitted 25 February, 2018; v1 submitted 8 September, 2017; originally announced September 2017.

    Comments: 34 pages, 10 tables, 1 figure

  46. arXiv:1707.09208  [pdf, other]

    stat.ME

    Sparse Identification and Estimation of Large-Scale Vector AutoRegressive Moving Averages

    Authors: Ines Wilms, Sumanta Basu, Jacob Bien, David S. Matteson

    Abstract: The Vector AutoRegressive Moving Average (VARMA) model is fundamental to the theory of multivariate time series; however, identifiability issues have led practitioners to abandon it in favor of the simpler but more restrictive Vector AutoRegressive (VAR) model. We narrow this gap with a new optimization-based approach to VARMA identification built upon the principle of parsimony. Among all equival…

    Submitted 8 June, 2021; v1 submitted 28 July, 2017; originally announced July 2017.

  47. Dynamic Shrinkage Processes

    Authors: Daniel R. Kowal, David S. Matteson, David Ruppert

    Abstract: We propose a novel class of dynamic shrinkage processes for Bayesian time series and regression analysis. Building upon a global-local framework of prior construction, in which continuous scale mixtures of Gaussian distributions are employed for both desirable shrinkage properties and computational tractability, we model dependence among the local scale parameters. The resulting processes inherit…

    Submitted 23 February, 2018; v1 submitted 3 July, 2017; originally announced July 2017.

  48. arXiv:1702.07094  [pdf, other]

    stat.CO

    BigVAR: Tools for Modeling Sparse High-Dimensional Multivariate Time Series

    Authors: William Nicholson, David Matteson, Jacob Bien

    Abstract: The R package BigVAR allows for the simultaneous estimation of high-dimensional time series by applying structured penalties to the conventional vector autoregression (VAR) and vector autoregression with exogenous variables (VARX) frameworks. Our methods can be utilized in many forecasting applications that make use of time-dependent data such as macroeconomics, finance, and internet traffic. Our…

    Submitted 22 February, 2017; originally announced February 2017.

  49. arXiv:1611.04026  [pdf, other]

    stat.AP

    Mixed Data and Classification of Transit Stops

    Authors: Laura L. Tupper, David S. Matteson, John C. Handley

    Abstract: An analysis of the characteristics and behavior of individual bus stops can reveal clusters of similar stops, which can be of use in making routing and scheduling decisions, as well as determining what facilities to provide at each stop. This paper provides an exploratory analysis, including several possible clustering results, of a dataset provided by the Regional Transit Service of Rochester, NY…

    Submitted 12 November, 2016; originally announced November 2016.

  50. Functional Autoregression for Sparsely Sampled Data

    Authors: Daniel R. Kowal, David S. Matteson, David Ruppert

    Abstract: We develop a hierarchical Gaussian process model for forecasting and inference of functional time series data. Unlike existing methods, our approach is especially suited for sparsely or irregularly sampled curves and for curves sampled with non-negligible measurement error. The latent process is dynamically modeled as a functional autoregression (FAR) with Gaussian process innovations. We propose…

    Submitted 19 October, 2016; v1 submitted 9 March, 2016; originally announced March 2016.