Computation
Showing new listings for Friday, 23 May 2025
- [1] arXiv:2505.15930
Title: Simulating random variates from the Pearson IV and betaized Meixner-Morris distributions
Subjects: Computation (stat.CO)
We develop uniformly fast random variate generators for the Pearson IV distribution that can be used over the entire range of both shape parameters. Additionally, we derive an efficient algorithm for sampling from the betaized Meixner-Morris density, which is proportional to the product of two generalized hyperbolic secant densities.
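For comparison, a naive baseline sampler can be sketched as follows. It assumes the standard Pearson IV parametrization $f(x) \propto [1 + ((x-\lambda)/a)^2]^{-m} \exp(-\nu \arctan((x-\lambda)/a))$ with $m > 1$, substitutes $x = \lambda + a \tan\theta$, and applies plain rejection sampling from a uniform proposal on $\theta$. This is not the generator developed in the paper, and the function name and argument names are hypothetical.

```python
import numpy as np

def sample_pearson_iv(n, m, nu, a=1.0, lam=0.0, rng=None):
    """Naive rejection sampler for Pearson IV (this sketch requires m > 1)."""
    if m <= 1:
        raise ValueError("this sketch requires m > 1")
    rng = np.random.default_rng() if rng is None else rng
    r = 2.0 * m - 2.0

    def log_g(theta):
        # Unnormalized log-density of theta = arctan((x - lam) / a).
        return r * np.log(np.cos(theta)) - nu * theta

    # The density of theta is log-concave for m > 1; its mode bounds it.
    log_g_max = log_g(np.arctan(-nu / r))

    out = np.empty(0)
    while out.size < n:
        theta = rng.uniform(-np.pi / 2, np.pi / 2, size=2 * n)
        log_u = np.log(rng.uniform(size=2 * n))
        out = np.concatenate([out, theta[log_u <= log_g(theta) - log_g_max]])
    return lam + a * np.tan(out[:n])
```

The acceptance rate of this baseline degrades as $m$ or $|\nu|$ grows, which is the kind of parameter dependence a uniformly fast generator is designed to avoid.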
- [2] arXiv:2505.16788
Title: Interpretable contour level selection for heat maps for gridded data
Subjects: Computation (stat.CO); Applications (stat.AP)
Gridded data formats, where the observed multivariate data are aggregated into grid cells, ensure confidentiality and reduce storage requirements, with the trade-off that access to the underlying point data is lost. Heat maps are a highly pertinent visualisation for gridded data, and heat maps with a small number of well-selected contour levels offer improved interpretability over continuous contour levels. There are many possible contour level choices. Amongst them, density contour levels are highly suitable in many cases, and their probabilistic interpretation forms a rigorous statistical basis for further quantitative data analyses. Current methods for computing density contour levels require access to the observed point data, so they are not applicable to gridded data. To remedy this, we introduce an approximation of density contour levels for gridded data. We then compare our proposed method to existing contour level selection methods, and conclude that our proposal provides improved interpretability for synthetic and experimental gridded data.
New submissions (showing 2 of 2 entries)
- [3] arXiv:2505.16878 (cross-list from stat.ME)
Title: A monotonic MM-type algorithm for estimation of nonparametric finite mixture models with dependent marginals
Comments: 24 pages, 5 figures
Subjects: Methodology (stat.ME); Computation (stat.CO)
In this manuscript, we consider a finite nonparametric mixture model with non-independent marginal density functions. Dependence between the marginal densities is modeled using a copula device. Until recently, no deterministic algorithms capable of estimating components of such a model have been available. A deterministic algorithm that is capable of this has been proposed in \citet*{levine2024smoothed}. That algorithm seeks to maximize a smoothed nonparametric penalized log-likelihood; it seems to perform well in practice but does not possess the monotonicity property. In this manuscript, we introduce a deterministic MM (Minorization-Maximization) algorithm for estimating the components of this model that also maximizes a smoothed penalized nonparametric log-likelihood but is monotonic with respect to this objective functional. Besides the convergence of the objective functional, the convergence of a subsequence of arguments of this functional, generated by this algorithm, is also established. The behavior of this algorithm is illustrated using both simulated datasets and a real dataset. The results illustrate performance that is at least comparable to the earlier algorithm of \citet*{levine2024smoothed}. A discussion of the results and of possible future research directions makes up the last part of the manuscript.
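The monotonicity property mentioned here follows from the generic MM argument, sketched below. The symbols $\ell$ (objective), $Q$ (minorizing surrogate) and $\theta^{(t)}$ (current iterate) are illustrative notation and are not taken from the manuscript.

```latex
\begin{align*}
Q(\theta \mid \theta^{(t)}) &\le \ell(\theta) \quad \text{for all } \theta,
\qquad Q(\theta^{(t)} \mid \theta^{(t)}) = \ell(\theta^{(t)}), \\
\theta^{(t+1)} &= \arg\max_{\theta} \, Q(\theta \mid \theta^{(t)}), \\
\ell(\theta^{(t+1)}) &\ge Q(\theta^{(t+1)} \mid \theta^{(t)})
\ge Q(\theta^{(t)} \mid \theta^{(t)}) = \ell(\theta^{(t)}).
\end{align*}
```

Each iteration therefore cannot decrease the objective, provided the surrogate touches the objective at the current iterate and lies below it everywhere else.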
Cross submissions (showing 1 of 1 entries)
- [4] arXiv:2411.09225 (replaced)
Title: fdesigns: Bayesian Optimal Designs of Experiments for Functional Models in R
Subjects: Computation (stat.CO); Methodology (stat.ME)
This paper describes the R package fdesigns, which implements a methodology for identifying Bayesian optimal experimental designs for models whose factor settings are functions, known as profile factors. This type of experiment involves factors that vary dynamically over time, presenting unique challenges in both estimation and design due to the infinite-dimensional nature of functions. The package fdesigns implements a dimension reduction method leveraging basis functions of the B-spline basis system. It contains functions that effectively reduce the design problem to the optimisation of basis coefficients for functional linear and functional generalised linear models, and it accommodates various options. Applications of the fdesigns package are demonstrated through a series of examples that showcase its capabilities in identifying optimal designs for functional linear and generalised linear models. The examples highlight how the package's functions can be used to efficiently design experiments involving both profile and scalar factors, including interactions and polynomial effects.
- [5] arXiv:2209.03318 (replaced)
Title: On the Wasserstein median of probability measures
Comments: 40 pages, 16 figures
Subjects: Methodology (stat.ME); Computation (stat.CO)
The primary way to summarize a finite collection of random objects is through measures of central tendency, such as the mean and median. In the field of optimal transport, the Wasserstein barycenter corresponds to the Fréchet or geometric mean of a set of probability measures, which is defined as a minimizer of the sum of squared distances to each element in a given set with respect to the Wasserstein distance of order 2. We introduce the Wasserstein median as a robust alternative to the Wasserstein barycenter. The Wasserstein median corresponds to the Fréchet median under the 2-Wasserstein metric. The existence and consistency of the Wasserstein median are first established, along with its robustness property. In addition, we present a general computational pipeline that employs any recognized algorithm for the Wasserstein barycenter in an iterative fashion and demonstrate its convergence. The utility of the Wasserstein median as a robust measure of central tendency is demonstrated using real and simulated data.
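One way to picture an iterative pipeline built on barycenter computations is a Weiszfeld-type reweighting, sketched below for the simplest possible setting. The sketch assumes one-dimensional empirical measures with the same number of equally weighted atoms, where the 2-Wasserstein distance reduces to a root-mean-square difference of sorted samples and the weighted barycenter is a weighted average of sorted samples. The function name, the stopping rule (a fixed iteration count) and the `eps` safeguard are assumptions, not details from the paper.

```python
import numpy as np

def wasserstein_median_1d(samples, n_iter=100, eps=1e-12):
    """Weiszfeld-type iteration on sorted samples (1D quantile representation)."""
    Q = np.sort(np.asarray(samples, dtype=float), axis=1)  # one measure per row
    med = Q.mean(axis=0)  # start from the 2-Wasserstein barycenter
    for _ in range(n_iter):
        # 2-Wasserstein distance from the current estimate to each measure.
        dist = np.sqrt(np.mean((Q - med) ** 2, axis=1))
        w = 1.0 / np.maximum(dist, eps)  # eps avoids division by zero
        w /= w.sum()
        # Weighted barycenter step: a weighted average of quantiles in 1D.
        med = w @ Q
    return med
```

Measures far from the current estimate receive small weights, so a few outlying measures pull the result much less than they pull the unweighted barycenter.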
- [6] arXiv:2311.10001 (replaced)
Title: Fast return-level estimates for flood insurance via an improved Bennett inequality for random variables with differing upper bounds
Comments: To appear in The Annals of Applied Statistics
Subjects: Applications (stat.AP); Computation (stat.CO)
Insurance losses due to flooding can be estimated by simulating and then summing losses over a large number of locations and a large set of hypothetical years of flood events. Replicated realisations lead to Monte Carlo return-level estimates and associated uncertainty. The procedure, however, is highly computationally intensive. We develop and use a new, Bennett-like concentration inequality to provide conservative but relatively accurate estimates of return levels. Bennett's inequality accounts for the different variances of each of the variables in a sum but uses a uniform upper bound on their support. Motivated by the variability in the total insured value of risks within a portfolio, we incorporate both individual upper bounds and variances and obtain tractable concentration bounds. Simulation studies and application to a representative portfolio demonstrate a substantial tightening compared with Bennett's bound. We then develop an importance-sampling procedure that repeatedly samples annual losses from the distributions implied by each year's concentration inequality, leading to conservative estimates of the return levels and their uncertainty using orders of magnitude less computation. This enables a simulation study of the sensitivity of the predictions to perturbations in quantities that are usually assumed fixed and known but, in truth, are not.
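As a point of reference, the classical Bennett bound (with a single uniform upper bound $b$ on the centred variables) can be inverted numerically to give a conservative return level. The sketch below does this by bisection; it does not reproduce the paper's improved inequality with individual upper bounds, and the function names and the bracketing value `hi` are assumptions.

```python
import numpy as np

def bennett_bound(t, variances, b):
    """Classical Bennett upper bound on P(sum_i (X_i - E X_i) >= t).

    Assumes independent X_i with X_i - E X_i <= b almost surely.
    """
    sigma2 = float(np.sum(variances))
    u = b * t / sigma2
    h = (1.0 + u) * np.log1p(u) - u
    return float(np.exp(-(sigma2 / b**2) * h))

def conservative_return_level(p, variances, b, hi=1e9, tol=1e-6):
    """Bisection for a threshold t whose Bennett bound is at most p.

    Assumes the bound at the initial `hi` is already below p.
    """
    lo = 0.0
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if bennett_bound(mid, variances, b) > p:
            lo = mid
        else:
            hi = mid
    return hi
```

For a 1-in-$T$-year return level one would take $p = 1/T$; the returned threshold is an upper estimate of the excess over the expected total loss, because the bound overstates the true tail probability.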
- [7] arXiv:2406.15573 (replaced)
Title: Sparse Bayesian multidimensional scaling(s)
Subjects: Methodology (stat.ME); Computation (stat.CO)
Bayesian multidimensional scaling (BMDS) is a probabilistic dimension reduction tool that allows one to model and visualize data consisting of dissimilarities between pairs of objects. Although BMDS has proven useful within, e.g., Bayesian phylogenetic inference, its likelihood and gradient calculations require a burdensome order of $N^2$ floating-point operations, where $N$ is the number of data points. Thus, BMDS becomes impractical as $N$ grows large. We propose and compare two sparse versions of BMDS (sBMDS) that apply log-likelihood and gradient computations to subsets of the observed dissimilarity matrix data. Landmark sBMDS (L-sBMDS) extracts columns, while banded sBMDS (B-sBMDS) extracts diagonals of the data. These sparse variants let one specify a time complexity between $N^2$ and $N$. Under simplified settings, we prove posterior consistency for subsampled distance matrices. Through simulations, we examine the accuracy and computational efficiency across all models using both the Metropolis-Hastings and Hamiltonian Monte Carlo algorithms. We observe approximately 3-fold, 10-fold and 40-fold speedups with negligible loss of accuracy, when applying the sBMDS likelihoods and gradients to 500, 1,000 and 5,000 data points with 50 bands (landmarks); these speedups only increase with the size of data considered. Finally, we apply the sBMDS variants to: 1) the phylogeographic modeling of multiple influenza subtypes to better understand how these strains spread through global air transportation networks and 2) the clustering of ArXiv manuscripts based on low-dimensional representations of article abstracts. In the first application, sBMDS contributes to holistic uncertainty quantification within a larger Bayesian hierarchical model. In the second, sBMDS provides uncertainty quantification for a downstream modeling task.
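The two subsetting schemes can be illustrated by the index pairs they retain from an $N \times N$ dissimilarity matrix. The sketch below only selects pairs; it does not implement the BMDS likelihood. Taking the first columns as landmarks is an assumption made for illustration, as are the function names.

```python
import numpy as np

def banded_pairs(n, n_bands):
    """Pairs (i, j), i < j, lying on the first n_bands off-diagonals."""
    i, j = np.triu_indices(n, k=1)
    keep = (j - i) <= n_bands
    return i[keep], j[keep]

def landmark_pairs(n, n_landmarks):
    """Pairs (i, j), i < j, with at least one item among the first n_landmarks."""
    i, j = np.triu_indices(n, k=1)
    keep = i < n_landmarks  # since i < j, this covers every pair touching a landmark
    return i[keep], j[keep]
```

A pairwise log-likelihood or gradient sum would then run over the retained pairs only. With $N = 500$ and 50 bands or 50 landmarks, either scheme keeps 23,725 of the 124,750 distinct pairs, so the per-evaluation cost grows roughly linearly in $N$ for a fixed number of bands or landmarks.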
- [8] arXiv:2410.19236 (replaced)
Title: SHAP zero Explains Biological Sequence Models with Near-zero Marginal Cost for Future Queries
Subjects: Machine Learning (cs.LG); Computational Engineering, Finance, and Science (cs.CE); Genomics (q-bio.GN); Computation (stat.CO)
The growing adoption of machine learning models for biological sequences has intensified the need for interpretable predictions, with Shapley values emerging as a theoretically grounded standard for model explanation. While effective for local explanations of individual input sequences, scaling Shapley-based interpretability to extract global biological insights requires evaluating thousands of sequences, incurring exponential computational cost per query. We introduce SHAP zero, a novel algorithm that amortizes the cost of Shapley value computation across large-scale biological datasets. After a one-time model sketching step, SHAP zero enables near-zero marginal cost for future queries by uncovering an underexplored connection between Shapley values, high-order feature interactions, and the sparse Fourier transform of the model. Applied to models of guide RNA efficacy, DNA repair outcomes, and protein fitness, SHAP zero explains predictions orders of magnitude faster than existing methods, recovering rich combinatorial interactions previously inaccessible at scale. This work opens the door to principled, efficient, and scalable interpretability for black-box sequence models in biology.
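The exponential per-query cost can be seen in a direct implementation of the Shapley formula, sketched below for a generic set function. This is the brute-force baseline, not SHAP zero; `value_fn` is a hypothetical stand-in for "model output when only the features in the given set are revealed".

```python
from itertools import combinations
from math import factorial

def exact_shapley(value_fn, n):
    """Exact Shapley values of an n-player set function by full enumeration."""
    players = range(n)
    phi = [0.0] * n
    for i in players:
        others = [p for p in players if p != i]
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
    return phi
```

Each feature requires $2^{n-1}$ marginal contributions, so a single explanation of a length-$n$ input costs on the order of $n \, 2^{n-1}$ model evaluations.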
- [9] arXiv:2502.07993 (replaced)
Title: What is a Sketch-and-Precondition Derivation for Low-Rank Approximation? Inverse Power Error or Inverse Power Estimation?
Subjects: Numerical Analysis (math.NA); Computational Complexity (cs.CC); Machine Learning (cs.LG); Computation (stat.CO); Machine Learning (stat.ML)
Randomized sketching accelerates large-scale numerical linear algebra by reducing computational complexity. While the traditional sketch-and-solve approach reduces the problem size directly through sketching, the sketch-and-precondition method leverages sketching to construct a computationally friendly preconditioner. This preconditioner improves the convergence speed of iterative solvers applied to the original problem, maintaining accuracy in the full space. Furthermore, the convergence rate of the solver improves at least linearly with the sketch size. Despite its potential, developing a sketch-and-precondition framework for randomized algorithms in low-rank matrix approximation remains an open challenge. We introduce the Error-Powered Sketched Inverse Iteration (EPSI) Method, obtained by running a sketched Newton iteration on the Lagrange form, as a sketch-and-precondition variant for randomized low-rank approximation. Our method achieves theoretical guarantees, including a convergence rate that improves at least linearly with the sketch size.
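The sketch-and-precondition idea is easiest to see in its classical least-squares form, sketched below: a QR factorization of the sketched matrix supplies a triangular factor that nearly orthogonalizes the original columns. This is not the EPSI method; the Gaussian sketch, the sketch size of four times the column count, and the function name are assumptions chosen for illustration.

```python
import numpy as np

def sketch_preconditioner(A, sketch_size, rng):
    """Return the triangular factor R from a QR of the sketched matrix S @ A."""
    m, _ = A.shape
    S = rng.standard_normal((sketch_size, m)) / np.sqrt(sketch_size)
    _, R = np.linalg.qr(S @ A)
    return R

rng = np.random.default_rng(0)
m, n = 2000, 20
# Ill-conditioned tall matrix: column scales span six orders of magnitude.
A = rng.standard_normal((m, n)) * np.logspace(0, 6, n)
R = sketch_preconditioner(A, sketch_size=4 * n, rng=rng)
A_pre = np.linalg.solve(R.T, A.T).T  # A @ inv(R) without forming the inverse
print(np.linalg.cond(A), np.linalg.cond(A_pre))
```

An iterative solver applied to the preconditioned matrix sees a condition number close to one, so it needs far fewer iterations while still solving the original full-size problem.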