
Bayesian Graph Traversal

Authors: William N. Caballero, Phillip R. Jenkins, David Banks, Matthew Robbins

Abstract: This research considers Bayesian decision-analytic approaches toward the traversal of an uncertain graph. Namely, a traveler progresses over a graph in which rewards are gained upon a node's first visit and costs are incurred for every edge traversal. The traveler knows the graph's adjacency matrix and his starting position but does not know the rewards and costs. The traveler is a Bayesian who encodes his beliefs about these values using a Gaussian process prior and who seeks to maximize his expected utility over these beliefs. Adopting a decision-analytic perspective, we develop sequential decision-making solution strategies for this coupled information-collection and network-routing problem. We show that the problem is NP-hard and derive properties of the optimal walk. These properties provide heuristics for the traveler's problem that balance exploration and exploitation. We provide a practical case study focused on the use of unmanned aerial systems for public safety and empirically study policy performance in myriad Erdős-Rényi settings.

Submitted 7 March, 2025; originally announced March 2025.

Comments: 26 pages, 7 tables, 2 figures

MSC Class: 62C99; 68T20
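
The traveler's objective can be made concrete. The sketch below is a minimal illustration, not the authors' code: it scores a given walk by adding each node's reward on its first visit and subtracting the cost of every edge traversal. The function name `walk_utility`, the toy path graph and its numbers, and the convention that the starting node's reward also counts are our own assumptions.

```python
from typing import Sequence


def walk_utility(
    walk: Sequence[int],
    adjacency: Sequence[Sequence[int]],
    rewards: Sequence[float],
    costs: Sequence[Sequence[float]],
) -> float:
    """Realized utility of a walk: first-visit node rewards minus edge costs.

    Assumption (hypothetical convention): the start node's reward is also
    collected, since it is the traveler's first visit to that node.
    """
    visited: set[int] = set()
    total = 0.0
    for step, node in enumerate(walk):
        if step > 0:
            prev = walk[step - 1]
            # Only edges present in the known adjacency matrix may be traversed.
            if not adjacency[prev][node]:
                raise ValueError(f"no edge between {prev} and {node}")
            # Every traversal is charged, including repeated ones.
            total -= costs[prev][node]
        if node not in visited:
            # Rewards accrue only on a node's first visit.
            visited.add(node)
            total += rewards[node]
    return total


# Path graph 0 - 1 - 2 with unit edge costs (illustrative numbers only).
adjacency = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
rewards = [0.0, 5.0, 3.0]
costs = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
print(walk_utility([0, 1, 2], adjacency, rewards, costs))        # 6.0
print(walk_utility([0, 1, 0, 1, 2], adjacency, rewards, costs))  # 4.0
```

Because the utility of a fixed walk is linear in the rewards and costs, passing posterior means in place of the true values gives that walk's expected utility. The sequential policies studied in the paper go further by updating beliefs after each observation, which this sketch does not attempt.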

arXiv:2502.12279

Bayesian inference from time series of allele frequency data using exact simulation techniques

Authors: Jaromir Sant, Paul A. Jenkins, Jere Koskela, Dario Spano

Abstract: A central statistical problem in population genetics is to infer evolutionary and biological parameters such as the strength of natural selection and allele age from DNA samples extracted from a contemporary population. That all samples come only from the present day has long been known to limit statistical inference; there is potentially more information available if one also has access to ancient DNA, so that inference is based on a time series of historical changes in allele frequencies. We introduce a Markov chain Monte Carlo (MCMC) method for Bayesian inference from allele frequency time-series data based on an underlying Wright-Fisher diffusion model of evolution, through which one can infer the parameters of essentially any selection model, including those with frequency-dependent effects. The chief novelty is that we show this method to be exact in the sense that it is possible to augment the state space explored by MCMC with the unobserved diffusion trajectory, even though the transition function of this diffusion is intractable. Through careful design of a proposal distribution, we describe an efficient method in which updates to the trajectory and accept/reject decisions are calculated without error. We illustrate the method on data capturing changes in coat colour over the past 20,000 years, and find evidence to support previous findings that the mutant alleles at ASIP and MC1R responsible for changes in coat colour have experienced very strong, possibly overdominant, selection; we further provide estimates for the ages of these alleles.

Submitted 17 February, 2025; originally announced February 2025.

MSC Class: 92D25; 60J70; 65C40; 60J60; 62F15

arXiv:2412.12028

Heterogeneous Freeform Metasurfaces: A Platform for Advanced Broadband Dispersion Engineering

Authors: Zhaoyi Li, Sawyer D. Campbell, Joon-Suh Park, Ronald P. Jenkins, Soon Wei Daniel Lim, Douglas H. Werner, Federico Capasso

Abstract: Metasurfaces, with their ability to control electromagnetic waves, hold immense potential in optical device design, especially for applications requiring precise control over dispersion. This work introduces an approach to dispersion engineering using heterogeneous freeform metasurfaces, which overcomes the limitations of conventional metasurfaces that often suffer from poor transmission, narrow bandwidth, and restricted polarization responses. By transitioning from single-layer, canonical meta-atoms to bilayer architectures with non-intuitive geometries, our design decouples intrinsic material properties (refractive index and group index), enabling independent engineering of phase and group delays as well as higher-order dispersion properties, while achieving high efficiency under arbitrary polarization states. We implement a two-stage multi-objective optimization process to generate libraries of meta-atoms, which are then used for the rapid design of dispersion-engineered metasurfaces. Additionally, we present a bilayer metasurface stacking technique, paving the way for the realization of high-performance, dispersion-engineered optical devices. Our approach is validated through the demonstration of metasurfaces exhibiting superior chromatic aberration correction and broadband performance, with an average efficiency above 81% across a 420-nm visible-to-near-infrared bandwidth.
Our synergistic combination of advanced design physics, powerful freeform optimization methods, and bilayer nanofabrication techniques represents a significant breakthrough compared to the state of the art while opening new possibilities for broadband metasurface applications.

Submitted 16 December, 2024; originally announced December 2024.

arXiv:2410.15955

The mutual arrangement of Wright-Fisher diffusion path measures and its impact on parameter estimation

Authors: Paul A. Jenkins

Abstract: The Wright-Fisher diffusion is a fundamentally important model of evolution encompassing genetic drift, mutation, and natural selection. Suppose you want to infer the parameters associated with these processes from an observed sample path. Then, to write down the likelihood, one first needs to know the mutual arrangement of two path measures under different parametrizations; that is, whether they are absolutely continuous, equivalent, singular, and so on. In this paper we give a complete answer to this question by finding the separating times for the diffusion: the stopping time before which one measure is absolutely continuous with respect to the other and after which the pair is mutually singular. In one dimension this extends a classical result of Dawson on the local equivalence between neutral and non-neutral Wright-Fisher diffusion measures. Along the way we also develop new zero-one type laws for the diffusion on its approach to, and emergence from, the boundary. As an application we derive an explicit expression for the joint maximum likelihood estimator of the mutation and selection parameters and show that its convergence properties are closely related to the separating time.

Submitted 21 October, 2024; originally announced October 2024.

MSC Class: 60J60 (Primary) 92D10; 60H30; 62M05 (Secondary)
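
As a worked illustration of why the likelihood depends on the mutual arrangement of path measures, consider the simpler case in which only the selection parameter is unknown. The derivation below is our own sketch, not the paper's joint estimator: it assumes a one-dimensional diffusion with known mutation parameters θ1, θ2 and the genic-selection parametrisation shown (conventions differ across papers). Because the Girsanov kernel (σ/2)√(x(1−x)) is bounded by σ/4, Novikov's condition holds, so P_σ and the neutral P_0 with the same mutation parameters are equivalent on any [0, T]; setting the σ-derivative of the log-likelihood ratio to zero gives the estimator.

```latex
\begin{aligned}
&dX_t = \Big[b_0(X_t) + \tfrac{\sigma}{2}\,a(X_t)\Big]dt + \sqrt{a(X_t)}\,dW_t,
\qquad a(x) = x(1-x),\quad b_0(x) = \tfrac12\big[\theta_1(1-x) - \theta_2 x\big],\\[4pt]
&\log\frac{dP_\sigma}{dP_0}\bigg|_{\mathcal F_T}
 = \int_0^T \frac{\sigma}{2}\,dX_t
   - \frac12\int_0^T \Big[\sigma\, b_0(X_t) + \frac{\sigma^2}{4}\,a(X_t)\Big]dt\\
&\phantom{\log\frac{dP_\sigma}{dP_0}\bigg|_{\mathcal F_T}}
 = \frac{\sigma}{2}\,(X_T - X_0) - \frac{\sigma}{2}\int_0^T b_0(X_t)\,dt
   - \frac{\sigma^2}{8}\int_0^T X_t(1-X_t)\,dt,\\[4pt]
&\hat\sigma_T = \frac{2\big[X_T - X_0 - \int_0^T b_0(X_t)\,dt\big]}{\int_0^T X_t(1-X_t)\,dt}.
\end{aligned}
```

The paper treats the harder problem of estimating the mutation and selection parameters jointly.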

arXiv:2407.03453

On Large Language Models in National Security Applications

Authors: William N. Caballero, Phillip R. Jenkins

Abstract: The overwhelming success of GPT-4 in early 2023 highlighted the transformative potential of large language models (LLMs) across various sectors, including national security. This article explores the implications of LLM integration within national security contexts, analyzing their potential to revolutionize information processing, decision-making, and operational efficiency. Whereas LLMs offer substantial benefits, such as automating tasks and enhancing data analysis, they also pose significant risks, including hallucinations, data privacy concerns, and vulnerability to adversarial attacks. Through their coupling with decision-theoretic principles and Bayesian reasoning, LLMs can significantly improve decision-making processes within national security organizations. Namely, LLMs can facilitate the transition from data to actionable decisions, enabling decision-makers to quickly receive and distill available information with less manpower. Current applications within the US Department of Defense and beyond, e.g., the USAF's use of LLMs for wargaming and automatic summarization, are explored to illustrate their potential to streamline operations and support decision-making. However, these applications necessitate rigorous safeguards to ensure accuracy and reliability. The broader implications of LLM integration extend to strategic planning, international relations, and the geopolitical landscape, with adversarial nations leveraging LLMs for disinformation and cyber operations, emphasizing the need for robust countermeasures.
Despite exhibiting "sparks" of artificial general intelligence, LLMs are best suited for supporting roles rather than leading strategic decisions. Their use in training and wargaming can provide valuable insights and personalized learning experiences for military personnel, thereby improving operational readiness.

Submitted 3 July, 2024; originally announced July 2024.

Comments: 20 pages

MSC Class: 62P99

arXiv:2406.16465

Genealogical processes of sequential Monte Carlo methods and other non-neutral population models under rapid mutation

Authors: Jere Koskela, Paul A. Jenkins, Adam M. Johansen, Dario Spano

Abstract: We show that genealogical trees arising from a broad class of non-neutral models of population evolution converge to the Kingman coalescent under a suitable rescaling of time. As well as non-neutral biological evolution, our results apply to genetic algorithms encompassing the prominent class of sequential Monte Carlo (SMC) methods. The time rescaling we need differs slightly from that used in classical results for convergence to the Kingman coalescent, which has implications for the performance of different resampling schemes in SMC algorithms. In addition, our work substantially simplifies earlier proofs of convergence to the Kingman coalescent, and corrects an error common to several earlier results.

Submitted 8 April, 2025; v1 submitted 24 June, 2024; originally announced June 2024.

MSC Class: 60J90; 65C35; 92D15
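
The limiting object here, the Kingman coalescent, is easy to simulate. The sketch below draws the time to the most recent common ancestor of n lineages under the standard neutral Kingman coalescent in its usual coalescent time units; it does not implement the paper's non-neutral models or its time rescaling, and the function name, sample size and seed are illustrative assumptions.

```python
import numpy as np


def kingman_tmrca(n: int, rng: np.random.Generator) -> float:
    """Draw the time to the most recent common ancestor (TMRCA) of n lineages
    under the standard Kingman coalescent (time in coalescent units)."""
    t = 0.0
    for k in range(n, 1, -1):
        # With k lineages, each of the k(k-1)/2 pairs coalesces at rate 1,
        # so the waiting time to the next merger is Exponential(k(k-1)/2).
        rate = k * (k - 1) / 2
        t += rng.exponential(scale=1.0 / rate)
    return t


rng = np.random.default_rng(1)
draws = [kingman_tmrca(10, rng) for _ in range(20_000)]
# Theory: E[TMRCA] = 2 * (1 - 1/n) = 1.8 for n = 10.
print(np.mean(draws))
```

Summing the expected waiting times 2/(k(k-1)) over k = 2, ..., n telescopes to 2(1 - 1/n), which the Monte Carlo mean should approach.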

arXiv:2406.09262

Fully Heteroscedastic Count Regression with Deep Double Poisson Networks

Authors: Spencer Young, Porter Jenkins, Longchao Da, Jeff Dotson, Hua Wei

Abstract: Neural networks capable of accurate, input-conditional uncertainty representation are essential for real-world AI systems. Deep ensembles of Gaussian networks have proven highly effective for continuous regression due to their ability to flexibly represent aleatoric uncertainty via unrestricted heteroscedastic variance, which in turn enables accurate epistemic uncertainty estimation. However, no analogous approach exists for count regression, despite many important applications. To address this gap, we propose the Deep Double Poisson Network (DDPN), a novel neural discrete count regression model that outputs the parameters of the Double Poisson distribution, enabling arbitrarily high or low predictive aleatoric uncertainty for count data and improving epistemic uncertainty estimation when ensembled. We formalize and prove that DDPN exhibits robust regression properties similar to heteroscedastic Gaussian models via learnable loss attenuation, and introduce a simple loss modification to control this behavior. Experiments on diverse datasets demonstrate that DDPN outperforms current baselines in accuracy, calibration, and out-of-distribution detection, establishing a new state of the art in deep count regression.

Submitted 28 May, 2025; v1 submitted 13 June, 2024; originally announced June 2024.

Comments: Forty-Second International Conference on Machine Learning (ICML 2025)
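
To make the Double Poisson output concrete, the sketch below evaluates the log-density of Efron's (1986) double Poisson distribution with mean mu and dispersion phi, omitting its normalizing constant (which is close to 1). A network predicting (mu, phi) per input could be trained on the negative of this quantity, but we assume that form; we do not claim it is exactly DDPN's loss. The function names are hypothetical.

```python
import math


def double_poisson_logpmf(y: int, mu: float, phi: float) -> float:
    """Log of Efron's (1986) double Poisson density, omitting its normalizing
    constant (which is close to 1). mu > 0 is the mean, phi > 0 the dispersion:
    the variance is approximately mu / phi."""
    # Use the convention 0 * log(0) = 0 so that y = 0 is handled.
    y_log_y = y * math.log(y) if y > 0 else 0.0
    return (
        0.5 * math.log(phi)
        - phi * mu
        - y
        + y_log_y
        - math.lgamma(y + 1)
        + phi * (y + y * math.log(mu) - y_log_y)  # phi * y * (1 + log mu - log y)
    )


def mean_nll(ys, mus, phis) -> float:
    """Batch-averaged negative log-likelihood (one possible training objective)."""
    terms = [double_poisson_logpmf(y, m, p) for y, m, p in zip(ys, mus, phis)]
    return -sum(terms) / len(terms)


def poisson_logpmf(y: int, mu: float) -> float:
    return y * math.log(mu) - mu - math.lgamma(y + 1)


# With phi = 1 the double Poisson reduces to the ordinary Poisson.
for y in (0, 1, 4, 12):
    assert math.isclose(double_poisson_logpmf(y, 3.5, 1.0), poisson_logpmf(y, 3.5))
```

Values of phi below 1 widen the distribution (overdispersion) and values above 1 narrow it (underdispersion), which is what lets a single model express arbitrarily high or low aleatoric uncertainty for counts.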

arXiv:2406.07769

doi 10.1145/3637528.3671518

Personalized Product Assortment with Real-time 3D Perception and Bayesian Payoff Estimation

Authors: Porter Jenkins, Michael Selander, J. Stockton Jenkins, Andrew Merrill, Kyle Armstrong

Abstract: Product assortment selection is a critical challenge facing physical retailers. Effectively aligning inventory with the preferences of shoppers can increase sales and decrease out-of-stocks. However, in real-world settings the problem is challenging due to the combinatorial explosion of product assortment possibilities. Consumer preferences are typically heterogeneous across space and time, making inventory-preference alignment challenging. Additionally, existing strategies rely on syndicated data, which tends to be aggregated and low-resolution and to suffer from high latency. To solve these challenges, we introduce a real-time recommendation system, which we call EdgeRec3D. Our system utilizes recent advances in 3D computer vision for perception and automatic, fine-grained sales estimation. These perceptual components run on the edge of the network and facilitate real-time reward signals. Additionally, we develop a Bayesian payoff model to account for noisy estimates from 3D LIDAR data. We rely on spatial clustering to allow the system to adapt to heterogeneous consumer preferences, and on a graph-based candidate generation algorithm to address the combinatorial search problem. We test our system in real-world stores in two A/B tests of 6-8 weeks each with beverage products and demonstrate sales increases of 35% and 27%, respectively. Finally, we monitor the deployed system for a period of 28 weeks with an observational study and show a 9.4% increase in sales.

Submitted 13 June, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

Comments: Accepted to KDD 2024

arXiv:2406.00173

The effect of the trace operator on the duality of modular grids in genus zero levels

Authors: Archer Clayton, Paul Jenkins

Abstract: Griffin, the second author, and Molnar studied coefficient duality for canonical bases for a broad range of spaces of weakly holomorphic modular forms, showing that the Fourier coefficients of canonical basis elements appear as negatives of Fourier coefficients for elements of a canonical basis of a related space of forms. We investigate the effect of the trace operator on this duality for modular forms for $\Gamma_0(N)$ of genus zero and show exactly when duality still holds after applying the trace operator.

Submitted 31 May, 2024; originally announced June 2024.

Comments: 19 pages

MSC Class: 11F30; 11F37

arXiv:2405.12412

Beyond Calibration: Assessing the Probabilistic Fit of Neural Regressors via Conditional Congruence

Authors: Spencer Young, Cole Edgren, Riley Sinema, Andrew Hall, Nathan Dong, Porter Jenkins

Abstract: While significant progress has been made in specifying neural networks capable of representing uncertainty, deep networks still often suffer from overconfidence and misaligned predictive distributions. Existing approaches for addressing this misalignment are primarily developed under the framework of calibration, with common metrics such as Expected Calibration Error (ECE). However, calibration can only provide a strictly marginal assessment of probabilistic alignment. Consequently, calibration metrics such as ECE are distribution-wise measures and cannot diagnose the point-wise reliability of individual inputs, which is important for real-world decision-making. We propose a stronger condition, which we term conditional congruence, for assessing probabilistic fit. We also introduce a metric, Conditional Congruence Error (CCE), that uses conditional kernel mean embeddings to estimate the distance, at any point, between the learned predictive distribution and the empirical, conditional distribution in a dataset. We show that using CCE to measure congruence 1) accurately quantifies misalignment between distributions when the data-generating process is known, 2) effectively scales to real-world, high-dimensional image regression tasks, and 3) can be used to gauge model reliability on unseen instances.

Submitted 14 October, 2024; v1 submitted 20 May, 2024; originally announced May 2024.
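
The basic ingredient behind kernel-mean-embedding distances is the maximum mean discrepancy (MMD): the RKHS distance between the mean embeddings of two distributions. The sketch below computes a biased (V-statistic) estimate of the squared MMD between two one-dimensional samples with a Gaussian kernel. It is unconditional, so it is not the paper's CCE estimator, which uses conditional embeddings to compare distributions point by point; the bandwidth and sample sizes are assumptions.

```python
import numpy as np


def gaussian_mmd2(x: np.ndarray, y: np.ndarray, bandwidth: float = 1.0) -> float:
    """Biased (V-statistic) estimate of the squared MMD between 1-D samples x and y
    using a Gaussian kernel: the squared RKHS distance between the two samples'
    empirical kernel mean embeddings."""

    def gram(a: np.ndarray, b: np.ndarray) -> np.ndarray:
        diff = a[:, None] - b[None, :]
        return np.exp(-(diff**2) / (2.0 * bandwidth**2))

    return float(gram(x, x).mean() + gram(y, y).mean() - 2.0 * gram(x, y).mean())


rng = np.random.default_rng(0)
x = rng.normal(size=500)
print(gaussian_mmd2(x, x))        # 0.0 for identical samples
print(gaussian_mmd2(x, x + 0.1))  # small positive value
print(gaussian_mmd2(x, x + 3.0))  # much larger
```

Like ECE, a single unconditional MMD summarizes the whole dataset; conditioning the embeddings on the input is what allows a pointwise reliability assessment.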

arXiv:2312.17406

doi 10.1214/25-EJP1302

Sampling probabilities, diffusions, ancestral graphs, and duality under strong selection

Authors: Martina Favero, Paul A. Jenkins

Abstract: Wright-Fisher diffusions and their dual ancestral graphs occupy a central role in the study of allele frequency change and genealogical structure, and they provide expressions, explicit in some special cases but generally implicit, for the sampling probability, a crucial quantity in inference. Under a finite-allele mutation model, with possibly parent-dependent mutation, we consider the asymptotic regime where the selective advantage of one allele grows to infinity, while the other parameters remain fixed. In this regime, we show that the Wright-Fisher diffusion can be approximated either by a Gaussian process or by a process whose components are independent continuous-state branching processes with immigration, aligning with analogous results for Wright-Fisher models but employing different methods. While the first process becomes degenerate at stationarity, the latter does not and provides a simple, analytic approximation for the leading term of the sampling probability. Furthermore, using another approach based on a recursion formula, we characterise all remaining terms to provide a full asymptotic expansion for the sampling probability. Finally, we study the asymptotic behaviour of the rates of the block-counting process of the conditional ancestral selection graph and establish an asymptotic duality relationship between this and the diffusion.

Submitted 13 March, 2025; v1 submitted 28 December, 2023; originally announced December 2023.

Comments: 44 pages

MSC Class: Primary 60J90; Secondary 60J60; 92D15

Journal ref: Electronic Journal of Probability 30: 1-43 (2025)

arXiv:2312.11551

Probabilistic Offline Policy Ranking with Approximate Bayesian Computation

Authors: Longchao Da, Porter Jenkins, Trevor Schwantes, Jeffrey Dotson, Hua Wei

Abstract: In practice, it is essential to compare and rank candidate policies offline before real-world deployment for safety and reliability. Prior work seeks to solve this offline policy ranking (OPR) problem through value-based methods, such as off-policy evaluation (OPE). However, such methods cannot analyze performance in special cases (e.g., worst or best cases) because they lack a holistic characterization of policy performance. It is even more difficult to estimate precise policy values when the reward is not fully accessible, as in sparse-reward settings. In this paper, we present Probabilistic Offline Policy Ranking (POPR), a framework that addresses OPR problems by leveraging expert data to characterize the probability of a candidate policy behaving like experts, and by approximating its entire performance posterior distribution to help with ranking. POPR does not rely on value estimation, and the derived performance posterior can be used to distinguish candidates in worst, best, and average cases. To estimate the posterior, we propose POPR-EABC, an Energy-based Approximate Bayesian Computation (ABC) method that performs likelihood-free inference. POPR-EABC reduces the heuristic nature of ABC by using a smooth energy function and improves sampling efficiency through a pseudo-likelihood. We empirically demonstrate that POPR-EABC is adequate for evaluating policies in both discrete and continuous action spaces across various experimental environments, and facilitates probabilistic comparisons of candidate policies before deployment.

Submitted 17 December, 2023; originally announced December 2023.

Comments: 19 pages (7-page main paper, 10-page appendix). Accepted to AAAI 2024 main track

ACM Class: I.2.6
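
For readers unfamiliar with ABC, the sketch below shows classical rejection ABC, the baseline that POPR-EABC refines: draw a parameter from the prior, simulate data, and keep the draw if a summary statistic of the simulation is within epsilon of the observed one. It is not POPR-EABC, which replaces the hard accept/reject step with a smooth energy function and a pseudo-likelihood. The toy Poisson model, the Uniform(0, 20) prior, the sample-mean summary and epsilon are all illustrative assumptions.

```python
import numpy as np


def rejection_abc(
    observed: np.ndarray, n_draws: int, epsilon: float, rng: np.random.Generator
) -> np.ndarray:
    """Rejection ABC for the rate of i.i.d. Poisson counts.

    Toy setup (illustrative assumptions): Uniform(0, 20) prior on the rate and
    the sample mean as the summary statistic.
    """
    obs_mean = observed.mean()
    accepted = []
    for _ in range(n_draws):
        lam = rng.uniform(0.0, 20.0)                # 1. draw a parameter from the prior
        sim = rng.poisson(lam, size=observed.size)  # 2. simulate a synthetic data set
        if abs(sim.mean() - obs_mean) < epsilon:    # 3. keep it if summaries are close
            accepted.append(lam)
    return np.array(accepted)


rng = np.random.default_rng(0)
observed = rng.poisson(5.0, size=50)
posterior = rejection_abc(observed, n_draws=20_000, epsilon=0.2, rng=rng)
print(posterior.size, posterior.mean())
```

The accepted draws approximate the posterior without ever evaluating a likelihood; shrinking epsilon improves the approximation at the cost of a lower acceptance rate, which is the inefficiency that smoother, energy-based weighting aims to reduce.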

arXiv:2309.16271

Excursion theory for the Wright-Fisher diffusion

Authors: Paul A. Jenkins, Jere Koskela, Victor M. Rivero, Jaromir Sant, Dario Spano, Ivana Valentic

Abstract: In this work, we develop excursion theory for the Wright-Fisher diffusion with mutation. Our construction is intermediate between the classical excursion theory, where all excursions begin and end at a single point, and the more general approach considering excursions of processes from general sets. Since the Wright-Fisher diffusion has two boundary points, it is natural to construct excursions which start from a specified boundary point and end at one of the two boundary points, which determines the next starting point. In order to do this we study the killed Wright-Fisher diffusion, which is sent to a cemetery state whenever it hits either endpoint. We then construct a marked Poisson process of such killed paths which, when concatenated, produce a pathwise construction of the Wright-Fisher diffusion.

Submitted 4 November, 2024; v1 submitted 28 September, 2023; originally announced September 2023.

Comments: 29 pages, 3 figures

MSC Class: 60J70; 60J60; 92D25; 60J55

arXiv:2301.05459

doi 10.1093/bioinformatics/btad017

EWF: simulating exact paths of the Wright-Fisher diffusion

Authors: Jaromir Sant, Paul A. Jenkins, Jere Koskela, Dario Spanò

Abstract: The Wright-Fisher diffusion is important in population genetics for modelling the evolution of allele frequencies over time subject to the influence of biological phenomena such as selection, mutation, and genetic drift. Simulating paths of the process is challenging due to the form of the transition density. We present EWF, a robust and efficient sampler which returns exact draws for the diffusion and diffusion bridge processes, accounting for general models of selection, including those with frequency dependence. Given a configuration of selection, mutation, and endpoints, EWF returns draws at the requested sampling times from the law of the corresponding Wright-Fisher process. Output was validated by comparison with approximations of the transition density via the Kolmogorov-Smirnov test and QQ plots. All software is available at https://github.com/JaroSant/EWF

Submitted 13 January, 2023; originally announced January 2023.

Comments: 34 pages, 12 figures

MSC Class: 92D25; 60J70; 65C99; 60J60
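
For contrast with exact simulation, the sketch below is the naive approach that samplers such as EWF avoid: an Euler-Maruyama discretisation of a two-allele Wright-Fisher diffusion with mutation and genic selection, clipped to [0, 1]. Its output carries discretisation bias and the clipping distorts behaviour near the boundaries. The drift parametrisation in the docstring is our assumption (conventions vary), and this is not EWF's algorithm or interface.

```python
import numpy as np


def wf_euler_maruyama(
    x0: float,
    theta1: float,
    theta2: float,
    sigma: float,
    t_end: float,
    n_steps: int,
    rng: np.random.Generator,
) -> np.ndarray:
    """Approximate path of a two-allele Wright-Fisher diffusion with mutation
    (theta1, theta2) and genic selection (sigma) on [0, t_end].

    Assumed parametrisation:
        dX = [0.5*(theta1*(1 - X) - theta2*X) + 0.5*sigma*X*(1 - X)] dt
             + sqrt(X*(1 - X)) dW.
    """
    dt = t_end / n_steps
    path = np.empty(n_steps + 1)
    path[0] = x0
    for i in range(n_steps):
        x = path[i]
        drift = 0.5 * (theta1 * (1.0 - x) - theta2 * x) + 0.5 * sigma * x * (1.0 - x)
        noise = np.sqrt(max(x * (1.0 - x), 0.0)) * np.sqrt(dt) * rng.standard_normal()
        # Clipping keeps the frequency in [0, 1]; it is a crude fix that adds bias
        # near the boundaries, which exact samplers such as EWF avoid.
        path[i + 1] = min(max(x + drift * dt + noise, 0.0), 1.0)
    return path


rng = np.random.default_rng(42)
path = wf_euler_maruyama(0.2, 0.5, 0.5, 5.0, t_end=1.0, n_steps=1_000, rng=rng)
```

Refining the step size reduces but never removes the bias, which is why exact draws matter for validation and for inference methods built on simulated paths.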

arXiv:2212.07747

An estimator for the recombination rate from a continuously observed diffusion of haplotype frequencies

Authors: Robert C. Griffiths, Paul A. Jenkins

Abstract: Recombination is a fundamental evolutionary force, but it is difficult to quantify because the effect of a recombination event on patterns of variation in a sample of genetic data can be hard to discern. Estimators for the recombination rate, which are usually based on the idea of integrating over the unobserved possible evolutionary histories of a sample, can therefore be noisy. Here we consider a related question: how would an estimator behave if the evolutionary history were actually observed? This would offer an upper bound on the performance of estimators used in practice. In this paper we derive an expression for the maximum likelihood estimator for the recombination rate based on a continuously observed, multi-locus, Wright-Fisher diffusion of haplotype frequencies, complementing existing work for an estimator of selection. We show that, unlike the estimator for selection, the recombination estimator has unusual properties because the observed information matrix can explode in finite time, whereupon the recombination parameter is learned without error. We also show that the recombination estimator is robust to the presence of selection in the sense that incorporating selection into the model leaves the estimator unchanged. We study the properties of the estimator by simulation and show that its distribution can be quite sensitive to the underlying mutation rates.

Submitted 4 May, 2023; v1 submitted 15 December, 2022; originally announced December 2022.

Comments: 28 pages, 3 figures

MSC Class: 92D15 (Primary) 62M05 (Secondary)

arXiv:2207.14339

Contact tracing Inspired Efficient Computation by Energy Tracing

Authors: Wending Mai, Ronald P. Jenkins, Yifan Chen, Douglas H. Werner

Abstract: Inspired by the epidemic contact tracing technique, we propose a method to efficiently solve electromagnetics problems by tracing the energy distribution. The computational domain is adaptively decomposed, and the available computational resources are focused on energy-active ("infected") domains and their adjacent ("exposed") domains, while avoiding unnecessary computation in energy-null ("unexposed") domains. As an example, we employ this method to solve several optics problems. The proposed method shows high efficiency while maintaining good accuracy. The energy tracing method is based on the causality principle and is therefore potentially transferable to other areas of computational physics and their associated algorithms.

Submitted 8 August, 2022; v1 submitted 9 July, 2022; originally announced July 2022.

Comments: This article has been withdrawn due to an unresolvable internal author dispute

arXiv:2110.05356

Weak Convergence of Non-neutral Genealogies to Kingman's Coalescent

Authors: Suzie Brown, Paul A. Jenkins, Adam M. Johansen, Jere Koskela

Abstract: Interacting particle systems undergoing repeated mutation and selection steps model genetic evolution, and also describe a broad class of sequential Monte Carlo methods. The genealogical tree embedded into the system is important in both applications. Under neutrality, when fitnesses of particles are independent of those of their parents, rescaled genealogies are known to converge to Kingman's coalescent. Recent work has established convergence under non-neutrality, but only for finite-dimensional distributions. We prove weak convergence of non-neutral genealogies on the space of càdlàg paths under standard assumptions, enabling analysis of the whole genealogical tree.

Submitted 19 April, 2023; v1 submitted 11 October, 2021; originally announced October 2021.

Comments: 37 pages, 1 figure

MSC Class: 60J90; 65C35; 92D15

arXiv:2106.05820 [pdf, other]

Flexible Bayesian inference for diffusion processes using splines

Authors: Paul A. Jenkins, Murray Pollock, Gareth O. Roberts

Abstract: We introduce a flexible method to simultaneously infer both the drift and volatility functions of a discretely observed scalar diffusion. We introduce spline bases to represent these functions and develop a Markov chain Monte Carlo algorithm to infer, a posteriori, the coefficients of these functions in the spline basis. A key innovation is that we use spline bases to model transformed versions of the drift and volatility functions rather than the functions themselves. The output of the algorithm is a posterior sample of plausible drift and volatility functions that are not constrained to any particular parametric family. The flexibility of this approach provides practitioners with a powerful investigative tool, allowing them to posit a variety of parametric models to better capture the underlying dynamics of their processes of interest. We illustrate the versatility of our method by applying it to challenging datasets from finance, paleoclimatology, and astrophysics. In view of the parametric diffusion models widely employed in the literature for those examples, some of our results are surprising since they call into question some aspects of these models.

Submitted 29 September, 2023; v1 submitted 10 June, 2021; originally announced June 2021.

Comments: 24 pages, 8 figures

MSC Class: 65C05; 60H35; 60J60; 62G08; 65D07
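The idea of placing a spline basis on a transformed function, rather than on the function itself, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: it assumes a log transform for the volatility (so that sigma(x) is positive for any real coefficients), uses clamped cubic B-splines evaluated with the Cox-de Boor recursion, and uses hypothetical coefficient values; the MCMC over coefficients is omitted.

```python
import math

def bspline_basis(j, k, knots, x):
    """Value at x of the j-th B-spline of degree k (Cox-de Boor recursion)."""
    if k == 0:
        if knots[j] <= x < knots[j + 1]:
            return 1.0
        # Close the last non-empty interval so that x equal to the right end is covered.
        if x == knots[-1] and knots[j] < knots[j + 1] == knots[-1]:
            return 1.0
        return 0.0
    value = 0.0
    if knots[j + k] != knots[j]:
        value += (x - knots[j]) / (knots[j + k] - knots[j]) * bspline_basis(j, k - 1, knots, x)
    if knots[j + k + 1] != knots[j + 1]:
        value += ((knots[j + k + 1] - x) / (knots[j + k + 1] - knots[j + 1])
                  * bspline_basis(j + 1, k - 1, knots, x))
    return value

def clamped_knots(interior, lo, hi, degree):
    """Knot vector with degree + 1 repeated knots at each end of [lo, hi]."""
    return [lo] * (degree + 1) + list(interior) + [hi] * (degree + 1)

def log_volatility(coeffs, knots, degree, x):
    """Spline model for the transformed function log(sigma(x)) (assumed transform)."""
    assert len(coeffs) == len(knots) - degree - 1
    return sum(c * bspline_basis(j, degree, knots, x) for j, c in enumerate(coeffs))

def volatility(coeffs, knots, degree, x):
    """Back-transform: sigma(x) = exp(spline) is positive for any real coefficients."""
    return math.exp(log_volatility(coeffs, knots, degree, x))

# Hypothetical example: cubic splines on [0, 1] with one interior knot (5 basis functions).
knots = clamped_knots([0.5], 0.0, 1.0, degree=3)
coeffs = [-1.0, -0.5, 0.0, 0.2, -0.3]
grid = [i / 10 for i in range(11)]
sigmas = [volatility(coeffs, knots, 3, x) for x in grid]
```

Because the coefficients live on the log scale, an MCMC sampler could propose any real values without ever producing an invalid (non-positive) volatility, which is one practical motivation for working with a transformed function.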

arXiv:2103.13273 [pdf]

doi 10.2139/ssrn.3486496

Co-Creation of Innovative Gamification Based Learning: A Case of Synchronous Partnership

Authors: Nicholas Dacre, Vasilis Gkogkidis, Peter Jenkins

Abstract: In higher education, gamification offers the prospect of a pivotal shift from traditional asynchronous forms of engagement to methods that foster greater levels of synchronous interactivity and partnership between and amongst teaching and learning stakeholders. The small vein of research that focuses on gamification in teaching and learning contexts has mainly focused on the implementation of pre-determined game elements. This approach reflects a largely asynchronous approach to the development of learning practices in educational settings, thereby limiting stakeholder engagement in their design and adoption. Therefore, we draw on the theory of co-creation to examine the development process of gamification-based learning as a synchronous partnership between and amongst teaching and learning stakeholders. Empirical insights suggest that students gain a greater sense of partnership and inclusivity as part of a synchronous co-creation gamification-based learning development and implementation process.

Submitted 18 April, 2021; v1 submitted 24 March, 2021; originally announced March 2021.

Comments: Society for Research into Higher Education (SRHE)

arXiv:2012.14403 [pdf, ps, other]

The arithmetic of modular grids

Authors: Michael Griffin, Paul Jenkins, Grant Molnar

Abstract: A modular grid is a pair of sequences $(f_m)_m$ and $(g_n)_n$ of weakly holomorphic modular forms such that for almost all $m$ and $n$, the coefficient of $q^n$ in $f_m$ is the negative of the coefficient of $q^m$ in $g_n$. Zagier proved this coefficient duality in weights $1/2$ and $3/2$ in the Kohnen plus space, and such grids have appeared for Poincaré series, for modular forms of integral weight, and in many other situations. We give a general proof of coefficient duality for canonical row-reduced bases of spaces of weakly holomorphic modular forms of integral or half-integral weight for every group $\Gamma \subseteq {\text{SL}}_2(\mathbb{R})$ commensurable with ${\text{SL}}_2(\mathbb{Z})$. We construct bivariate generating functions that encode these modular forms, and study linear operations on the resulting modular grids.

Submitted 12 May, 2022; v1 submitted 28 December, 2020; originally announced December 2020.

Comments: Second revision

MSC Class: 11F30; 11F37

arXiv:2012.10316 [pdf, other]

Diffusion Limits at Small Times for Coalescent Processes with Mutation and Selection

Authors: Philip A. Hanson, Paul A. Jenkins, Jere Koskela, Dario Spanò

Abstract: The Ancestral Selection Graph (ASG) is an important genealogical process which extends the well-known Kingman coalescent to incorporate natural selection. We show that the number of lineages of the ASG with and without mutation is asymptotic to $2/t$ as $t\to 0$, in agreement with the limiting behaviour of the Kingman coalescent. We couple these processes on the same probability space using a Poisson random measure construction that allows us to precisely compare their hitting times. These comparisons enable us to characterise the speed of coming down from infinity of the ASG as well as its fluctuations in a functional central limit theorem. This extends similar results for the Kingman coalescent.

Submitted 22 December, 2020; v1 submitted 18 December, 2020; originally announced December 2020.

Comments: 22 pages, 1 figure

MSC Class: Primary 60J90; 60F05; secondary 60J80
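For intuition about the $2/t$ asymptotic, the sketch below simulates only the Kingman coalescent (no selection or mutation, so not the ASG itself): from $k$ lineages the next merger occurs after an exponential waiting time with rate $\binom{k}{2}$. Starting from a large sample, $t\,N(t)$ should be close to 2 for small $t$. The sample size, time point and random seed are arbitrary choices made for illustration.

```python
import random

def kingman_lineages(n, t, rng):
    """Number of lineages left at time t in a Kingman coalescent started from n lineages."""
    k, elapsed = n, 0.0
    while k > 1:
        rate = k * (k - 1) / 2.0           # each of the C(k, 2) pairs merges at rate 1
        elapsed += rng.expovariate(rate)   # exponential waiting time to the next merger
        if elapsed > t:
            break
        k -= 1
    return k

rng = random.Random(2020)
n, t = 100_000, 0.01
N_t = kingman_lineages(n, t, rng)
ratio = t * N_t   # expected to be near 2 when n is large and t is small
```

With a finite starting sample the deterministic approximation is $N(t) \approx 1/(1/n + t/2)$, so $n$ must be much larger than $2/t$ for the ratio to approach 2.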

arXiv:2012.09562 [pdf, other]

doi 10.1093/bioinformatics/btab351

KwARG: Parsimonious reconstruction of ancestral recombination graphs with recurrent mutation

Authors: Anastasia Ignatieva, Rune B. Lyngsø, Paul A. Jenkins, Jotun Hein

Abstract: The reconstruction of possible histories given a sample of genetic data in the presence of recombination and recurrent mutation is a challenging problem, but can provide key insights into the evolution of a population. We present KwARG, which implements a parsimony-based greedy heuristic algorithm for finding plausible genealogical histories (ancestral recombination graphs) that are minimal or near-minimal in the number of posited recombination and mutation events. Given an input dataset of aligned sequences, KwARG outputs a list of possible candidate solutions, each comprising a list of mutation and recombination events that could have generated the dataset; the relative proportion of recombinations and recurrent mutations in a solution can be controlled via specifying a set of 'cost' parameters. We demonstrate that the algorithm performs well when compared against existing methods. The software is made available on GitHub.

Submitted 13 May, 2021; v1 submitted 17 December, 2020; originally announced December 2020.

Comments: 18 pages, 12 figures; accepted for publication in Bioinformatics
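Parsimony methods of this kind rest on the observation that, under an infinite-sites model without recombination, two binary sites can never display all four haplotypes 00, 01, 10 and 11. The sketch below is the classical four-gamete test, not KwARG's own algorithm: it lists the incompatible site pairs, each of which forces at least one recombination or recurrent mutation in any history that explains the data. The input format (equal-length 0/1 strings, one per sequence) is an assumption.

```python
from itertools import combinations

def incompatible_pairs(haplotypes):
    """Return site pairs (i, j) that fail the four-gamete test.

    haplotypes: list of equal-length strings over {'0', '1'}, one per sequence.
    """
    n_sites = len(haplotypes[0])
    bad = []
    for i, j in combinations(range(n_sites), 2):
        gametes = {(h[i], h[j]) for h in haplotypes}
        if len(gametes) == 4:   # 00, 01, 10 and 11 all observed
            bad.append((i, j))
    return bad

sample = ["000", "011", "101", "110"]
conflicts = incompatible_pairs(sample)   # every pair of sites conflicts in this toy sample
```

A dataset with no incompatible pairs can be explained by a single tree with one mutation per site; each conflict signals where extra events (the quantities KwARG seeks to minimise) are needed.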

arXiv:2009.10440 [pdf, other]

The computational cost of blocking for sampling discretely observed diffusions

Authors: Marcin Mider, Paul A. Jenkins, Murray Pollock, Gareth O. Roberts

Abstract: Many approaches for conducting Bayesian inference on discretely observed diffusions involve imputing diffusion bridges between observations. This can be computationally challenging in settings in which the temporal horizon between subsequent observations is large, due to the poor scaling of algorithms for simulating bridges as observation distance increases. It is common in practical settings to use a blocking scheme, in which the path is split into a (user-specified) number of overlapping segments and a Gibbs sampler is employed to update segments in turn. Replacing the independent simulation of diffusion bridges with one obtained using blocking introduces an inherent trade-off: we are now imputing shorter bridges at the cost of introducing a dependency between subsequent iterations of the bridge sampler. This is further complicated by the fact that there are a number of possible ways to implement the blocking scheme, each of which introduces a different dependency structure between iterations. Although blocking schemes have had considerable empirical success in practice, there has been no analysis of this trade-off nor guidance to practitioners on the particular specifications that should be used to obtain a computationally efficient implementation.
In this article we conduct this analysis and demonstrate that the expected computational cost of a blocked path-space rejection sampler applied to Brownian bridges scales asymptotically at a cubic rate with respect to the observation distance and that this rate is linear in the case of the Ornstein-Uhlenbeck process. Numerical experiments suggest applicability both of the results of our paper and of the guidance we provide beyond the class of linear diffusions considered.

Submitted 6 April, 2022; v1 submitted 22 September, 2020; originally announced September 2020.

Comments: 15 pages, 3 figures

MSC Class: 60H35 (Primary); 60J22; 60J60; 60J65 (Secondary)
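To make the blocking scheme concrete, here is a minimal sketch for the simplest case, a discretised Brownian bridge with an assumed unit diffusion coefficient, where each block can be updated exactly by drawing a Brownian bridge between the current path values at its two ends. It is not the path-space rejection sampler analysed in the paper, and the grid size, block length and overlap are hypothetical choices; it only shows how overlapping segments are updated in turn, so that each sweep depends on the previous one.

```python
import math
import random

def fill_bridge(path, times, a, b, rng):
    """Resample path[a+1:b] as a Brownian bridge from path[a] to path[b] (unit diffusion)."""
    for i in range(a + 1, b):
        dt = times[i] - times[i - 1]
        remaining = times[b] - times[i - 1]
        mean = path[i - 1] + (path[b] - path[i - 1]) * dt / remaining
        var = dt * (times[b] - times[i]) / remaining
        path[i] = rng.gauss(mean, math.sqrt(var))

def block_boundaries(n_points, block_len, overlap):
    """Start/end grid indices of overlapping blocks covering indices 0..n_points-1."""
    # overlap >= 1 ensures every interior block endpoint is updated by some other block
    assert 0 < overlap < block_len
    step = block_len - overlap
    blocks, start = [], 0
    while start + block_len < n_points - 1:
        blocks.append((start, start + block_len))
        start += step
    blocks.append((start, n_points - 1))
    return blocks

def blocked_gibbs(x0, xT, T, n_points, block_len, overlap, sweeps, seed=0):
    """Gibbs sampler for a Brownian bridge from x0 at 0 to xT at T, one block at a time."""
    rng = random.Random(seed)
    times = [T * i / (n_points - 1) for i in range(n_points)]
    path = [x0 + (xT - x0) * t / T for t in times]   # linear initialisation
    blocks = block_boundaries(n_points, block_len, overlap)
    for _ in range(sweeps):
        for a, b in blocks:   # update each segment given its current endpoints
            fill_bridge(path, times, a, b, rng)
    return times, path

times, path = blocked_gibbs(0.0, 1.0, 1.0, n_points=101, block_len=20, overlap=5, sweeps=50)
```

Shorter blocks make each bridge cheaper to impute but make consecutive sweeps more strongly dependent, which is the trade-off the abstract describes.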

arXiv:2007.14987 [pdf, other]

Presentation and Analysis of a Multimodal Dataset for Grounded Language Learning

Authors: Patrick Jenkins, Rishabh Sachdeva, Gaoussou Youssouf Kebe, Padraig Higgins, Kasra Darvish, Edward Raff, Don Engel, John Winder, Francis Ferraro, Cynthia Matuszek

Abstract: Grounded language acquisition -- learning how language-based interactions refer to the world around them -- is a major area of research in robotics, NLP, and HCI. In practice the data used for learning consists almost entirely of textual descriptions, which tend to be cleaner, clearer, and more grammatical than actual human interactions. In this work, we present the Grounded Language Dataset (GoLD), a multimodal dataset of common household objects described by people using either spoken or written language. We analyze the differences and present an experiment showing how the different modalities affect language learning from human input. This will enable researchers studying the intersection of robotics, NLP, and HCI to better investigate how the multiple modalities of image, text, and speech interact, as well as how differences in the vernacular of these modalities impact results.

Submitted 28 September, 2020; v1 submitted 29 July, 2020; originally announced July 2020.

Comments: 11 pages, 6 figures

arXiv:2007.00096 [pdf, ps, other]

Simple conditions for convergence of sequential Monte Carlo genealogies with applications

Authors: Suzie Brown, Paul A. Jenkins, Adam M. Johansen, Jere Koskela

Abstract: We present simple conditions under which the limiting genealogical process associated with a class of interacting particle systems with non-neutral selection mechanisms, as the number of particles grows, is a time-rescaled Kingman coalescent. Sequential Monte Carlo algorithms are popular methods for approximating integrals in problems such as non-linear filtering and smoothing which employ this type of particle system. Their performance depends strongly on the properties of the induced genealogical process. We verify the conditions of our main result for standard sequential Monte Carlo algorithms with a broad class of low-variance resampling schemes, as well as for conditional sequential Monte Carlo with multinomial resampling.

Submitted 7 December, 2020; v1 submitted 30 June, 2020; originally announced July 2020.

Comments: 22 pages, 1 figure

MSC Class: 60J90; 60J95; 65C05; 65C35
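The genealogy induced by resampling can be traced directly. The toy sketch below is an illustration, not a construction from the paper: it runs multinomial resampling with user-supplied weights (a hypothetical stand-in for real importance weights), records each particle's parent, and counts how many generations back all $N$ final particles find a common ancestor. With equal weights this is the neutral case, where the Kingman approximation suggests a mean tree height of roughly $2N$ generations.

```python
import random

def smc_tree_height(n_particles, n_generations, weight_fn, seed=0):
    """Generations back until all final particles share one ancestor (None if never).

    weight_fn(generation, rng) -> list of n_particles non-negative weights; it is a
    hypothetical stand-in for the importance weights of a real SMC run.
    """
    rng = random.Random(seed)
    parents = []   # parents[g][i] = index of the parent of particle i at generation g
    for g in range(n_generations):
        w = weight_fn(g, rng)
        # Multinomial resampling: each offspring picks a parent with prob. proportional to w.
        parents.append(rng.choices(range(n_particles), weights=w, k=n_particles))
    lineages = set(range(n_particles))   # ancestors of the final generation
    if len(lineages) == 1:
        return 0
    for height, parent in enumerate(reversed(parents), start=1):
        lineages = {parent[i] for i in lineages}
        if len(lineages) == 1:
            return height
    return None

def equal_weights(g, rng):
    """Neutral case: every one of the 50 particles has the same weight."""
    return [1.0] * 50

heights = [smc_tree_height(50, 1000, equal_weights, seed=s) for s in range(10)]
```

Replacing `equal_weights` with a function returning uneven weights shortens the genealogy, which is one way to see why the induced genealogical process matters for algorithm performance.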

arXiv:2001.04337 [pdf, ps, other]

Congruences for coefficients of modular functions in levels 3, 5, and 7 with poles at 0

Authors: Paul Jenkins, Ryan Keck

Abstract: We give congruences modulo powers of $p \in \{3, 5,7\}$ for the Fourier coefficients of certain modular functions in level $p$ with poles only at 0, answering a question posed by Andersen and the first author and continuing work done by the authors and Moss. The congruences involve a modulus that depends on the base $p$ expansion of the modular form's order of vanishing at $\infty$.

Submitted 1 April, 2020; v1 submitted 10 January, 2020; originally announced January 2020.

Comments: arXiv admin note: text overlap with arXiv:1709.10189. Version 2: Corrected typos

MSC Class: 11F30; 11F37

arXiv:2001.03527 [pdf, ps, other]

Convergence of Likelihood Ratios and Estimators for Selection in non-neutral Wright-Fisher Diffusions

Authors: Jaromir Sant, Paul A. Jenkins, Jere Koskela, Dario Spano

Abstract: A number of discrete-time, finite-population-size models in genetics describing the dynamics of allele frequencies are known to converge (subject to suitable scaling) to a diffusion process in the infinite population limit, termed the Wright-Fisher diffusion. In this article we show that the diffusion is ergodic uniformly in the selection and mutation parameters, and that the measures induced by the solution to the stochastic differential equation are uniformly locally asymptotically normal. Subsequently these two results are used to analyse the statistical properties of the Maximum Likelihood and Bayesian estimators for the selection parameter, when both selection and mutation are acting on the population. In particular, it is shown that these estimators are consistent uniformly over compact sets, display asymptotic normality and convergence of moments uniformly in the selection parameter over compact sets, and are asymptotically efficient for a suitable class of loss functions.

Submitted 13 September, 2021; v1 submitted 10 January, 2020; originally announced January 2020.

MSC Class: 92D10; 60J60; 60J70

arXiv:2001.03210 [pdf, other]

A Probabilistic Simulator of Spatial Demand for Product Allocation

Authors: Porter Jenkins, Hua Wei, J. Stockton Jenkins, Zhenhui Li

Abstract: Connecting consumers with relevant products is a very important problem in both online and offline commerce. In physical retail, product placement is an effective way to connect consumers with products. However, selecting product locations within a store can be a tedious process. Moreover, learning important spatial patterns in offline retail is challenging due to the scarcity of data and the high cost of exploration and experimentation in the physical world. To address these challenges, we propose a stochastic model of spatial demand in physical retail. We show that the proposed model is more predictive of demand than existing baselines. We also perform a preliminary study into different automation techniques and show that an optimal product allocation policy can be learned through Deep Q-Learning.

Submitted 9 January, 2020; originally announced January 2020.

Comments: 8 pages, The AAAI-20 Workshop on Intelligent Process Automation

arXiv:1912.04861 [pdf, other]

A characterisation of the reconstructed birth-death process through time rescaling

Authors: Anastasia Ignatieva, Jotun Hein, Paul A. Jenkins

Abstract: The dynamics of a population exhibiting exponential growth can be modelled as a birth-death process, which naturally captures the stochastic variation in population size over time. In this article, we consider a supercritical birth-death process, started at a random time in the past, and conditioned to have $n$ sampled individuals at the present. The genealogy of individuals sampled at the present time is then described by the reversed reconstructed process (RRP), which traces the ancestry of the sample backwards from the present. We show that a simple, analytic, time rescaling of the RRP provides a straightforward way to derive its inter-event times. The same rescaling characterises other distributions underlying this process, obtained elsewhere in the literature via more cumbersome calculations. We also consider the case of incomplete sampling of the population, in which each leaf of the genealogy is retained independently with probability $\psi$ (a Bernoulli trial), and we show that corresponding results for Bernoulli-sampled RRPs can be derived using time rescaling, for any values of the underlying parameters. A central result is the derivation of a scaling limit as $\psi$ approaches 0, corresponding to the underlying population growing to infinity, using the time rescaling formalism. We show that in this setting, after a linear time rescaling, the event times are the order statistics of $n$ logistic random variables with mode $\log(1/\psi)$; moreover, we show that the inter-event times are approximately exponentially distributed.

Submitted 6 May, 2020; v1 submitted 10 December, 2019; originally announced December 2019.

Comments: 32 pages, 5 figures
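The limiting description of the event times is straightforward to simulate. The sketch below draws $n$ independent logistic variables with location (mode) $\log(1/\psi)$ via the inverse CDF and sorts them. The unit scale parameter is an assumption standing in for the paper's linear time rescaling, and the values of $n$ and $\psi$ are arbitrary.

```python
import math
import random

def limiting_event_times(n, psi, scale=1.0, seed=0):
    """Order statistics of n logistic variables with mode log(1/psi).

    scale=1.0 is an assumed normalisation, not a value taken from the paper.
    """
    rng = random.Random(seed)
    mode = math.log(1.0 / psi)
    draws = []
    for _ in range(n):
        u = rng.random()
        while u == 0.0:   # avoid log(0) in the inverse CDF
            u = rng.random()
        draws.append(mode + scale * math.log(u / (1.0 - u)))   # logistic inverse CDF
    return sorted(draws)

event_times = limiting_event_times(10, psi=1e-3)
gaps = [b - a for a, b in zip(event_times, event_times[1:])]   # inter-event times
```

Smaller values of $\psi$ shift all event times by the same amount, $\log(1/\psi)$, without changing their spacing, which matches the interpretation of $\psi \to 0$ as a growing underlying population.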

arXiv:1903.10184 [pdf, other]

Simulating bridges using confluent diffusions

Authors: Paul A. Jenkins, Murray Pollock, Gareth O. Roberts, Michael Sørensen

Abstract: Diffusions are a fundamental class of models in many fields, including finance, engineering, and biology. Simulating diffusions is challenging as their sample paths are infinite-dimensional and their transition functions are typically intractable. In statistical settings such as parameter inference for discretely observed diffusions, we require simulation techniques for diffusions conditioned on hitting a given endpoint, which introduces further complication. In this paper we introduce a Markov chain Monte Carlo algorithm for simulating bridges of ergodic diffusions which (i) is exact in the sense that there is no discretisation error, (ii) has computational cost that is linear in the duration of the bridges, and (iii) provides bounds on local maxima and minima of the simulated trajectory. Our approach works directly on diffusion path space, by constructing a proposal (which we term a confluence) that is then corrected with an accept/reject step in a pseudo-marginal algorithm. Our method requires only the simulation of unconditioned diffusion sample paths. We apply our approach to the simulation of Langevin diffusion bridges, a practical problem arising naturally in many situations, such as statistical inference in distributed settings.

Submitted 10 June, 2021; v1 submitted 25 March, 2019; originally announced March 2019.

Comments: Significant revision of prior submission, with an improved methodology which is far broader in its applicability. Updated author listing. 19 pages, 5 figures

MSC Class: 65C05; 60H35; 60J60

arXiv:1805.02835 [pdf, other]

Crossing points in survival analysis sensitively depend on system conditions

Authors: Thomas McAndrew, Bjorn Redfors, Yiran Zhang, Aaron Crowley, Shmuel Chen, Gregg Stone, Paul Jenkins

Abstract: Crossing survival curves complicate how we interpret results from a clinical trial's primary endpoint. We find that the function determining a crossing point's location depends exponentially on the individual survival curves. This exponential relationship between survival curves and the crossing point transforms small survival curve errors into large crossing point errors. In most cases, crossing points are sensitive to individual survival errors, which may prevent a crossing point from being located accurately. We argue that more complicated analyses for mitigating crossing points should be undertaken only after first exploring a crossing point's variability, or else that hypothesis tests should account for crossing point variability.

Submitted 8 May, 2018; originally announced May 2018.

Comments: 6 Pages: Survival Analysis, Crossing Survival Curves, Error Propagation
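The sensitivity can be seen in a simple parametric case. Assuming, purely for illustration, two Weibull survival curves $S_i(t)=\exp\{-(t/\lambda_i)^{k_i}\}$, they cross where $(t/\lambda_1)^{k_1}=(t/\lambda_2)^{k_2}$, that is, at $t^*=\exp\{(k_1\log\lambda_1-k_2\log\lambda_2)/(k_1-k_2)\}$. Because $t^*$ is the exponential of a ratio with $k_1-k_2$ in the denominator, small perturbations of the curve parameters can move it a long way when the shapes are similar. The parameter values below are hypothetical.

```python
import math

def weibull_survival(t, scale, shape):
    """S(t) = exp(-(t / scale) ** shape)."""
    return math.exp(-((t / scale) ** shape))

def crossing_time(scale1, shape1, scale2, shape2):
    """Positive time at which two Weibull survival curves cross (requires shape1 != shape2)."""
    if shape1 == shape2:
        raise ValueError("equal shapes: the curves never cross unless identical")
    log_t = (shape1 * math.log(scale1) - shape2 * math.log(scale2)) / (shape1 - shape2)
    return math.exp(log_t)

base = crossing_time(10.0, 1.2, 12.0, 1.0)
perturbed = crossing_time(10.0, 1.2, 12.5, 1.0)   # about a 4% change in one scale parameter
```

Here a roughly 4% change in one scale parameter moves the crossing point from about 4.0 to about 3.3, an error of nearly 20%, and the amplification grows as the two shape parameters approach each other.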

arXiv:1805.02821 [pdf, other]

How Cox models react to a study-specific confounder in a patient-level pooled dataset: Random-effects better cope with an imbalanced covariate across trials unless baseline hazards differ

Authors: Thomas McAndrew, Bjorn Redfors, Aaron Crowley, Yiran Zhang, Shmuel Chen, Mordechai Golomb, Maria Alu, Dominic Francese, Ori Ben-Yehuda, Akiko Maehara, Gary Mintz, Gregg Stone, Paul Jenkins

Abstract: Combining patient-level data from clinical trials can connect rare phenomena with clinical endpoints, but statistical techniques applied to a single trial may become problematic when trials are pooled. Estimating the hazard of a binary variable unevenly distributed across trials showcases a common pooled database issue. We studied how an unevenly distributed binary variable can compromise the integrity of fixed and random effects Cox proportional hazards models. We compared fixed effect and random effects Cox proportional hazards models on a set of simulated datasets inspired by a 17-trial pooled database of patients presenting with ST-segment elevation myocardial infarction (STEMI) and non-STEMI undergoing percutaneous coronary intervention. An unevenly distributed covariate can bias hazard ratio estimates, inflate standard errors, raise type I error, and reduce power. While unevenness causes problems for all Cox proportional hazards models, random effects models suffer least. Compared to fixed effect models, random effects models have lower bias and trade inflated type I errors for improved power. Contrasting hazard rates between trials prevent accurate estimates from both fixed and random effects models. When modeling a covariate unevenly distributed across pooled trials with similar baseline hazard rates, Cox proportional hazards models with a random trial effect estimate hazard ratios more accurately than fixed effect models. Differing between-trial baseline hazard rates bias both random and fixed effect models.
With an unevenly distributed covariate and similar baseline hazard rates across trials, a random effects Cox proportional hazards model outperforms a fixed effect model, but cannot overcome contrasting baseline hazard rates.

Submitted 7 May, 2018; originally announced May 2018.

Comments: 9 Pages: Cox-Proportional Hazards, Frailty, Fixed-Effects, Random-Effects, Pooling Data

arXiv:1804.07065 [pdf, other]

Bayesian nonparametric analysis of Kingman's coalescent

Authors: Stefano Favaro, Shui Feng, Paul A. Jenkins

Abstract: Kingman's coalescent is one of the most popular models in population genetics. It describes the genealogy of a population whose genetic composition evolves in time according to the Wright-Fisher model, or suitable approximations of it belonging to the broad class of Fleming-Viot processes. Ancestral inference under Kingman's coalescent has received much attention in the literature, both in practical data analysis, and from a theoretical and methodological point of view. Given a sample of individuals taken from the population at time $t>0$, most contributions have aimed at making frequentist or Bayesian parametric inference on quantities related to the genealogy of the sample. In this paper we propose a Bayesian nonparametric predictive approach to ancestral inference. That is, under the prior assumption that the composition of the population evolves in time according to a neutral Fleming-Viot process, and given the information contained in an initial sample of $m$ individuals taken from the population at time $t>0$, we estimate quantities related to the genealogy of an additional unobservable sample of size $m^{\prime}\geq1$. As a by-product of our analysis we introduce a class of Bayesian nonparametric estimators (predictors) which can be thought of as Good-Turing type estimators for ancestral inference. The proposed approach is illustrated through an application to genetic data.

Submitted 19 April, 2018; originally announced April 2018.

Comments: 37 pages, 2 figures. To appear in Annales de l'Institut Henri Poincaré - Probabilités et Statistiques

MSC Class: 62C10 (Primary) 62M05 (Secondary)
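For reference, the classical Good-Turing estimator (not the paper's ancestral-inference predictor) estimates the probability that the next draw is a previously unseen type as $N_1/n$, where $N_1$ is the number of types observed exactly once in a sample of size $n$. A minimal sketch, with a hypothetical list of allele labels as input:

```python
from collections import Counter

def good_turing_unseen_mass(sample):
    """Classical Good-Turing estimate of the probability that a new draw is an unseen type."""
    n = len(sample)
    if n == 0:
        raise ValueError("empty sample")
    counts = Counter(sample)
    singletons = sum(1 for c in counts.values() if c == 1)   # N_1: types seen exactly once
    return singletons / n

alleles = ["A", "B", "A", "C", "D", "A", "B", "E"]   # hypothetical sample
unseen = good_turing_unseen_mass(alleles)            # 3 singletons out of 8 draws
```

The estimators proposed in the paper play an analogous predictive role, but for quantities related to the genealogy of an unobserved additional sample rather than for the mass of unseen types.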

arXiv:1804.01811 [pdf, other]

doi 10.1214/19-AOS1823

Asymptotic genealogies of interacting particle systems with an application to sequential Monte Carlo

Authors: Jere Koskela, Paul A. Jenkins, Adam M. Johansen, Dario Spano

Abstract: We study weighted particle systems in which new generations are resampled from current particles with probabilities proportional to their weights. This covers a broad class of sequential Monte Carlo (SMC) methods, widely used in applied statistics and cognate disciplines. We consider the genealogical tree embedded into such particle systems, and identify conditions, as well as an appropriate time-scaling, under which they converge to the Kingman n-coalescent in the infinite system size limit in the sense of finite-dimensional distributions. Thus, the tractable n-coalescent can be used to predict the shape and size of SMC genealogies, as we illustrate by characterising the limiting mean and variance of the tree height. SMC genealogies are known to be connected to algorithm performance, so that our results are likely to have applications in the design of new methods as well. Our conditions for convergence are strong, but we show by simulation that they do not appear to be necessary.

Submitted 16 July, 2021; v1 submitted 5 April, 2018; originally announced April 2018.

Comments: 28 pages, 1 figure. An earlier version of this manuscript contained an error, which we have been able to correct and in so doing give a stronger result under cleaner conditions. v7: Added several technical lemmas which make the overall argument more explicit

MSC Class: Primary 60E15; secondary 60G99; 62E20

Journal ref: Annals of Statistics 48(1):560-583, 2020

arXiv:1802.06153 [pdf, other]

A Likelihood-Free Inference Framework for Population Genetic Data using Exchangeable Neural Networks

Authors: Jeffrey Chan, Valerio Perrone, Jeffrey P. Spence, Paul A. Jenkins, Sara Mathieson, Yun S. Song

Abstract: An explosion of high-throughput DNA sequencing in the past decade has led to a surge of interest in population-scale inference with whole-genome data. Recent work in population genetics has centered on designing inference methods for relatively simple model classes, and few scalable general-purpose inference techniques exist for more realistic, complex models. To build such techniques, two inferential challenges need to be addressed: (1) population data are exchangeable, calling for methods that efficiently exploit the symmetries of the data, and (2) computing likelihoods is intractable as it requires integrating over a set of correlated, extremely high-dimensional latent variables. These challenges are traditionally tackled by likelihood-free methods that use scientific simulators to generate datasets and reduce them to hand-designed, permutation-invariant summary statistics, often leading to inaccurate inference. In this work, we develop an exchangeable neural network that performs summary statistic-free, likelihood-free inference. Our framework can be applied in a black-box fashion across a variety of simulation-based tasks, both within and outside biology. We demonstrate the power of our approach on the recombination hotspot testing problem, outperforming the state of the art.

Submitted 5 November, 2018; v1 submitted 16 February, 2018; originally announced February 2018.

Comments: 9 pages, 8 figures
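
The key property the abstract relies on is exchangeability: the network's output should not change when the individuals (haplotypes) in a sample are reordered. A minimal sketch of one generic way to get this, assuming a DeepSets-style "embed each haplotype, pool with a symmetric function, then map" design (this is not necessarily the authors' architecture; `ExchangeableNet` and its random, untrained weights are hypothetical):

```python
import math
import random


def relu(v):
    return [max(0.0, x) for x in v]


def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


class ExchangeableNet:
    """Toy permutation-invariant network f(X) = rho(mean_i phi(x_i)).

    phi maps one haplotype (a 0/1 vector over SNP sites) to hidden features;
    averaging over haplotypes discards their order; rho maps the pooled
    features to outputs. Weights are random here, i.e. untrained.
    """

    def __init__(self, n_sites, hidden, n_out, seed=0):
        rng = random.Random(seed)
        self.w_phi = [[rng.gauss(0.0, 1.0) for _ in range(n_sites)] for _ in range(hidden)]
        self.w_rho = [[rng.gauss(0.0, 1.0) for _ in range(hidden)] for _ in range(n_out)]

    def __call__(self, haplotypes):
        feats = [relu(matvec(self.w_phi, h)) for h in haplotypes]  # phi per haplotype
        pooled = [sum(col) / len(feats) for col in zip(*feats)]    # symmetric pooling
        return matvec(self.w_rho, pooled)                          # rho


rng = random.Random(1)
sample = [[rng.randint(0, 1) for _ in range(10)] for _ in range(6)]  # 6 haplotypes x 10 sites
shuffled = sample[:]
rng.shuffle(shuffled)

net = ExchangeableNet(n_sites=10, hidden=8, n_out=2)
out_a, out_b = net(sample), net(shuffled)
print(all(math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-12) for a, b in zip(out_a, out_b)))
```

The comparison uses a tolerance because floating-point summation in a different order can differ in the last bits, even though the pooled mean is mathematically order-free.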

arXiv:1711.06239 [pdf, ps, other]

Divisibility properties of coefficients of modular functions in genus zero levels

Authors: Victoria Iba, Paul Jenkins, Merrill Warnick

Abstract: We prove divisibility results for the Fourier coefficients of canonical basis elements for the spaces of weakly holomorphic modular forms of weight $0$ and levels $6, 10, 12, 18$ with poles only at the cusp at infinity. In addition, we show that these Fourier coefficients satisfy Zagier duality in all weights, and give a general formula for the generating functions of such canonical bases for all genus zero levels.

Submitted 27 July, 2018; v1 submitted 16 November, 2017; originally announced November 2017.

MSC Class: 11F03; 11F33; 11F37

arXiv:1709.10189 [pdf, ps, other]

Congruences for coefficients of level 2 modular functions with poles at 0

Authors: Paul Jenkins, Ryan Keck, Eric Moss

Abstract: We give congruences modulo powers of 2 for the Fourier coefficients of certain level 2 modular functions with poles only at 0, answering a question posed by Andersen and the first author. The congruences involve a modulus that depends on the binary expansion of the modular form's order of vanishing at $\infty$.

Submitted 28 September, 2017; originally announced September 2017.

MSC Class: 11F30; 11F37

arXiv:1709.10023 [pdf, ps, other]

Zagier duality for level $p$ weakly holomorphic modular forms

Authors: Paul Jenkins, Grant Molnar

Abstract: We prove Zagier duality between the Fourier coefficients of canonical bases for spaces of weakly holomorphic modular forms of prime level $p$ with $11 \leq p \leq 37$ with poles only at the cusp at $\infty$, and special cases of duality for an infinite class of prime levels. We derive generating functions for the bases for genus 1 levels.

Submitted 8 February, 2018; v1 submitted 28 September, 2017; originally announced September 2017.

MSC Class: 11F30; 11F37

arXiv:1704.08649 [pdf, ps, other]

Differential operators on polar harmonic Maass forms and elliptic duality

Authors: Kathrin Bringmann, Paul Jenkins, Ben Kane

Abstract: In this paper, we study polar harmonic Maass forms of negative integral weight. Using work of Fay, we construct Poincaré series which span the space of such forms and show that their elliptic coefficients exhibit duality properties which are similar to the properties known for Fourier coefficients of harmonic Maass forms and weakly holomorphic modular forms.

Submitted 27 April, 2017; originally announced April 2017.

MSC Class: 11F25; 11F37

arXiv:1703.08145 [pdf, ps, other]

Weakly holomorphic modular forms in prime power levels of genus zero

Authors: Paul Jenkins, DJ Thornton

Abstract: Let $M_k^\sharp(N)$ be the space of weight $k$, level $N$ weakly holomorphic modular forms with poles only at the cusp at $\infty$. We explicitly construct a canonical basis for $M_k^\sharp(N)$ for $N\in\{8,9,16,25\}$, and show that many of the Fourier coefficients of the basis elements in $M_0^\sharp(N)$ are divisible by high powers of the prime dividing the level $N$. Additionally, we show that these basis elements satisfy a Zagier duality property, and extend Griffin's results on congruences in level 1 to levels 2, 3, 4, 5, 7, 8, 9, 16, and 25.

Submitted 23 March, 2017; originally announced March 2017.

MSC Class: 11F37; 11F33

arXiv:1703.00208 [pdf, other]

Wright-Fisher diffusion bridges

Authors: Robert Griffiths, Paul A. Jenkins, Dario Spanò

Abstract: The trajectory of the frequency of an allele which begins at $x$ at time $0$ and is known to have frequency $z$ at time $T$ can be modelled by the bridge process of the Wright-Fisher diffusion. Bridges when $x=z=0$ are particularly interesting because they model the trajectory of the frequency of an allele which appears at a time, then is lost by random drift or mutation after a time $T$. The coalescent genealogy back in time of a population in a neutral Wright-Fisher diffusion process is well understood. In this paper we obtain a new interpretation of the coalescent genealogy of the population in a bridge from a time $t\in (0,T)$. In a bridge with allele frequencies of 0 at times 0 and $T$, the coalescence structure is that the population coalesces in two directions, from $t$ to $0$ and from $t$ to $T$, such that there is just one lineage of the allele under consideration at times $0$ and $T$. The genealogy in Wright-Fisher diffusion bridges with selection is more complex than in the neutral model, but still has the property of the population branching and coalescing in two directions from time $t\in (0,T)$. The density of the frequency of an allele at time $t$ is expressed in a way that shows coalescence in the two directions. A new algorithm for exact simulation of a neutral Wright-Fisher bridge is derived. This follows from knowing the density of the frequency in a bridge and exact simulation from the Wright-Fisher diffusion.
The genealogy of the neutral Wright-Fisher bridge is also modelled by branching Pólya urns, extending a representation in a Wright-Fisher diffusion. This new representation relates Wright-Fisher bridges to classical urn models in a Bayesian setting.

Submitted 21 August, 2017; v1 submitted 1 March, 2017; originally announced March 2017.

MSC Class: 92D15; 60J60; 97K60

arXiv:1612.01872 [pdf, other]

Simulation from quasi-stationary distributions on reducible state spaces

Authors: Adam Griffin, Paul A. Jenkins, Gareth O. Roberts, Simon E. F. Spencer

Abstract: Quasi-stationary distributions (QSDs) arise from stochastic processes that exhibit transient equilibrium behaviour on the way to absorption. QSDs are often mathematically intractable, and even drawing samples from them is not straightforward. In this paper the framework of Sequential Monte Carlo samplers is utilized to simulate QSDs and several novel resampling techniques are proposed to accommodate models with reducible state spaces, with particular focus on preserving particle diversity on discrete spaces. Finally, an approach is considered to estimate eigenvalues associated with QSDs, such as the decay parameter.

Submitted 17 January, 2017; v1 submitted 6 December, 2016; originally announced December 2016.

Comments: 30 pages, 9 Figures

MSC Class: 60J27; 62G09
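
For intuition about what is being simulated: for a tiny finite absorbing chain, the QSD can be computed directly as the normalised left eigenvector of the substochastic matrix $Q$ restricted to the transient states, with $\nu Q = \lambda \nu$. The sketch below uses plain power iteration for that (this is not the paper's SMC method, and it assumes the transient states are irreducible and aperiodic, so the reducible case the paper targets is excluded; `qsd_power_iteration` is a hypothetical name):

```python
def qsd_power_iteration(Q, tol=1e-12, max_iter=100_000):
    """Quasi-stationary distribution of a finite absorbing discrete-time chain.

    Q[i][j] is the one-step probability of moving from transient state i to
    transient state j; each row may sum to less than 1, the deficit being the
    absorption probability. Repeating nu <- nu Q / (nu Q 1) converges to the
    QSD nu under the assumptions above, and the normaliser converges to the
    eigenvalue lambda in nu Q = lambda nu (the per-step survival probability
    when started from nu).
    """
    n = len(Q)
    nu = [1.0 / n] * n
    for _ in range(max_iter):
        nxt = [sum(nu[i] * Q[i][j] for i in range(n)) for j in range(n)]
        lam = sum(nxt)  # mass that survives one step
        if lam == 0.0:
            raise ValueError("all mass is absorbed in one step")
        nxt = [x / lam for x in nxt]  # condition on survival
        if max(abs(a - b) for a, b in zip(nxt, nu)) < tol:
            return nxt, lam
        nu = nxt
    raise RuntimeError("power iteration did not converge")


# Three transient states; only state 0 leaks to the absorbing state.
Q = [[0.5, 0.3, 0.0],
     [0.2, 0.5, 0.3],
     [0.0, 0.2, 0.8]]
nu, lam = qsd_power_iteration(Q)
print(nu, lam)
```

Started from $\nu$, the chain survives $t$ steps with probability $\lambda^t$, which is the kind of eigenvalue the abstract's final sentence refers to.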

arXiv:1611.07460 [pdf, other]

Poisson Random Fields for Dynamic Feature Models

Authors: Valerio Perrone, Paul A. Jenkins, Dario Spano, Yee Whye Teh

Abstract: We present the Wright-Fisher Indian buffet process (WF-IBP), a probabilistic model for time-dependent data assumed to have been generated by an unknown number of latent features. This model is suitable as a prior in Bayesian nonparametric feature allocation models in which the features underlying the observed data exhibit a dependency structure over time. More specifically, we establish a new framework for generating dependent Indian buffet processes, where the Poisson random field model from population genetics is used as a way of constructing dependent beta processes. Inference in the model is complex, and we describe a sophisticated Markov Chain Monte Carlo algorithm for exact posterior simulation. We apply our construction to develop a nonparametric focused topic model for collections of time-stamped text documents and test it on the full corpus of NIPS papers published from 1987 to 2015.

Submitted 22 November, 2016; originally announced November 2016.

arXiv:1604.04145 [pdf, other]

doi 10.1016/j.tpb.2016.08.007

A coalescent dual process for a Wright-Fisher diffusion with recombination and its application to haplotype partitioning

Authors: Robert C. Griffiths, Paul A. Jenkins, Sabin Lessard

Abstract: Duality plays an important role in population genetics. It can relate results from forwards-in-time models of allele frequency evolution with those of backwards-in-time genealogical models; a well-known example is the duality between the Wright-Fisher diffusion for genetic drift and its genealogical counterpart, the coalescent. There have been a number of articles extending this relationship to include other evolutionary processes such as mutation and selection, but little has been explored for models also incorporating crossover recombination. Here, we derive from first principles a new genealogical process which is dual to a Wright-Fisher diffusion model of drift, mutation, and recombination. Our approach is based on expressing a putative duality relationship between two models via their infinitesimal generators, and then seeking an appropriate test function to ensure the validity of the duality equation. This approach is quite general, and we use it to find dualities for several important variants, including both a discrete L-locus model of a gene and a continuous model in which mutation and recombination events are scattered along the gene according to continuous distributions. As an application of our results, we derive a series expansion for the transition function of the diffusion. Finally, we study in further detail the case in which mutation is absent. In that case, the dual process describes the dispersal of ancestral genetic material across the ancestors of a sample.
The stationary distribution of this process is of particular interest; we show how duality relates this distribution to haplotype fixation probabilities. We develop an efficient method for computing such probabilities in multilocus models.

Submitted 8 August, 2019; v1 submitted 14 April, 2016; originally announced April 2016.

Comments: This version corrects typographical errors in equations (25), (26), (27), (B.3), (B.4). 39 pages, 3 figures

Journal ref: Theoretical Population Biology, 112: 126-138 (2016)

arXiv:1603.02834 [pdf, other]

doi 10.1007/s11222-017-9722-1

Inference and rare event simulation for stopped Markov processes via reverse-time sequential Monte Carlo

Authors: Jere Koskela, Dario Spano, Paul A. Jenkins

Abstract: We present a sequential Monte Carlo algorithm for Markov chain trajectories with proposals constructed in reverse time, which is advantageous when paths are conditioned to end in a rare set. The reverse time proposal distribution is constructed by approximating the ratio of Green's functions in Nagasawa's formula. Conditioning arguments can be used to interpret these ratios as low-dimensional conditional sampling distributions of some coordinates of the process given the others. Hence the difficulty in designing SMC proposals in high dimension is greatly reduced. We illustrate our method on estimating an overflow probability in a queueing model, the probability that a diffusion follows a narrowing corridor, and the initial location of an infection in an epidemic model on a network.

Submitted 2 January, 2017; v1 submitted 9 March, 2016; originally announced March 2016.

Comments: 21 pages, 6 figures

MSC Class: Primary: 62M05; Secondary: 60J20; 60J22

Journal ref: Statistics and Computing 28(1):131-144, 2018

arXiv:1602.00589 [pdf, other]

Zeros of modular forms of half integral weight

Authors: Amanda Folsom, Paul Jenkins

Abstract: We study canonical bases for spaces of weakly holomorphic modular forms of level 4 and weights in $\mathbb{Z}+\frac{1}{2}$ and show that almost all modular forms in these bases have the property that many of their zeros in a fundamental domain for $Γ_0(4)$ lie on a lower boundary arc of the fundamental domain. Additionally, we show that at many places on this arc, the generating function for Hurwitz class numbers is equal to a particular mock modular Poincaré series, and show that for positive weights, a particular set of Fourier coefficients of cusp forms in this canonical basis cannot simultaneously vanish.

Submitted 2 February, 2016; v1 submitted 1 February, 2016; originally announced February 2016.

MSC Class: 11F37; 11F30

arXiv:1512.00982 [pdf, other]

doi 10.3150/16-BEJ923

Bayesian non-parametric inference for $Λ$-coalescents: consistency and a parametric method

Authors: Jere Koskela, Paul A. Jenkins, Dario Spanò

Abstract: We investigate Bayesian non-parametric inference of the $Λ$-measure of $Λ$-coalescent processes with recurrent mutation, parametrised by probability measures on the unit interval. We give verifiable criteria on the prior for posterior consistency when observations form a time series, and prove that any non-trivial prior is inconsistent when all observations are contemporaneous. We then show that the likelihood given a data set of size $n \in \mathbb{N}$ is constant across $Λ$-measures whose leading $n - 2$ moments agree, and focus on inferring truncated sequences of moments. We provide a large class of functionals which can be extremised using finite computation given a credible region of posterior truncated moment sequences, and a pseudo-marginal Metropolis-Hastings algorithm for sampling the posterior. Finally, we compare the efficiency of the exact and noisy pseudo-marginal algorithms with and without delayed acceptance acceleration using a simulation study.

Submitted 23 January, 2017; v1 submitted 3 December, 2015; originally announced December 2015.

Comments: 28 pages, 3 figures

MSC Class: Primary: 62M05; Secondary: 62G05; 92D15

Journal ref: Bernoulli 24(3):2122-2153, 2018
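
The moment statement in the abstract is consistent with how $Λ$-coalescent merger rates are built. The rate at which a given $k$ of $b$ lineages merge is $\lambda_{b,k} = \int_0^1 x^{k-2}(1-x)^{b-k}\,Λ(dx)$; expanding $(1-x)^{b-k}$ binomially gives $\lambda_{b,k} = \sum_{j=0}^{b-k} \binom{b-k}{j}(-1)^j m_{k-2+j}$ with $m_i = \int_0^1 x^i\,Λ(dx)$, so with at most $n$ lineages only moments up to order $n-2$ ever enter. A minimal sketch of that bookkeeping (`lambda_rate` is a hypothetical name; the paper's inference machinery is not reproduced):

```python
from fractions import Fraction
from math import comb


def lambda_rate(moments, b: int, k: int) -> Fraction:
    """Rate at which a given k of b lineages merge in a Lambda-coalescent.

    Uses lambda_{b,k} = sum_j C(b-k, j) (-1)^j m_{k-2+j}, where
    moments[i] = integral of x^i Lambda(dx). Only orders i <= b - 2 are read.
    """
    if not 2 <= k <= b:
        raise ValueError("need 2 <= k <= b")
    return sum(
        (Fraction(comb(b - k, j)) * (-1) ** j * moments[k - 2 + j]
         for j in range(b - k + 1)),
        Fraction(0),
    )


# Bolthausen-Sznitman coalescent: Lambda = Uniform(0, 1), so m_i = 1 / (i + 1).
n = 5
uniform_moments = [Fraction(1, i + 1) for i in range(n - 1)]  # orders 0..n-2 only
rates = {(b, k): lambda_rate(uniform_moments, b, k)
         for b in range(2, n + 1) for k in range(2, b + 1)}
print(rates[(3, 2)], rates[(5, 5)])  # prints: 1/2 1/4
```

Any other probability measure sharing these moments up to order $n-2$ would produce exactly the same table of rates for samples of size at most $n$.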

arXiv:1506.06998 [pdf, other]

doi 10.1214/16-AAP1236

Exact simulation of the Wright-Fisher diffusion

Authors: Paul A. Jenkins, Dario Spano

Abstract: The Wright-Fisher family of diffusion processes is a widely used class of evolutionary models. However, simulation is difficult because there is no known closed-form formula for its transition function. In this article we demonstrate that it is in fact possible to simulate exactly from a broad class of Wright-Fisher diffusion processes and their bridges. For those diffusions corresponding to reversible, neutral evolution, our key idea is to exploit an eigenfunction expansion of the transition function; this approach even applies to its infinite-dimensional analogue, the Fleming-Viot process. We then develop an exact rejection algorithm for processes with more general drift functions, including those modelling natural selection, using ideas from retrospective simulation. Our approach also yields methods for exact simulation of the moment dual of the Wright-Fisher diffusion, the ancestral process of an infinite-leaf Kingman coalescent tree. We believe our new perspective on diffusion simulation holds promise for other models admitting a transition eigenfunction expansion.

Submitted 29 September, 2023; v1 submitted 23 June, 2015; originally announced June 2015.

Comments: 36 pages, 2 figures, 2 tables. This version corrects minor errors in the statements of Propositions 6 and 7

Report number: CRiSM Working Paper 14-27

MSC Class: 65C05 (Primary); 60H35; 60J60; 92D15 (Secondary)

Journal ref: Annals of Applied Probability 27(3):1478-1509 (2017)

arXiv:1506.04709 [pdf, ps, other]

doi 10.3150/18-BEJ1050

Consistency of Bayesian nonparametric inference for discretely observed jump diffusions

Authors: Jere Koskela, Dario Spano, Paul A. Jenkins

Abstract: We introduce verifiable criteria for weak posterior consistency of identifiable Bayesian nonparametric inference for jump diffusions with unit diffusion coefficient and uniformly Lipschitz drift and jump coefficients in arbitrary dimension. The criteria are expressed in terms of coefficients of the SDEs describing the process, and do not depend on intractable quantities such as transition densities. We also show that products of discrete net and Dirichlet mixture model priors satisfy our conditions, again under an identifiability assumption. This generalises known results by incorporating jumps into previous work on unit diffusions with uniformly Lipschitz drift coefficients.

Submitted 14 September, 2018; v1 submitted 15 June, 2015; originally announced June 2015.

Comments: 20 pages

MSC Class: 62G20 (Primary) 60J25; 62M05 (Secondary)

Journal ref: Bernoulli 25(3):2183-2205, 2019

arXiv:1408.1083 [pdf, ps, other]

Coefficient Bounds for Level 2 Cusp Forms and Modular Functions

Authors: Paul Jenkins, Kyle Pratt

Abstract: We give explicit upper bounds for the coefficients of arbitrary weight $k$, level 2 cusp forms, making Deligne's well-known $O(n^{\frac{k-1}{2}+ε})$ bound precise. We also derive asymptotic formulas and explicit upper bounds for the coefficients of certain level 2 modular functions.

Submitted 5 August, 2014; originally announced August 2014.
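
As a numerical illustration of the bound being made precise (not the authors' explicit bounds), one can expand a concrete level 2 cusp form and check Deligne's inequality $|a(n)| \le d(n)\, n^{(k-1)/2}$, which holds for normalised Hecke newforms. The choice of $\eta(z)^8\eta(2z)^8 = q\prod_{n\ge1}(1-q^n)^8(1-q^{2n})^8$, the normalised form spanning the one-dimensional space $S_8(\Gamma_0(2))$, is our assumption for the example, as are the function names:

```python
from math import isqrt


def cusp_form_coeffs(num_terms: int) -> list[int]:
    """Coefficients a(0), ..., a(num_terms - 1) of
    f = q * prod_{n>=1} (1 - q^n)^8 (1 - q^{2n})^8 = eta(z)^8 eta(2z)^8,
    a weight 8 cusp form on Gamma_0(2).
    """
    series = [0] * num_terms  # truncated power series of the product
    series[0] = 1
    for n in range(1, num_terms):
        for step in (n, 2 * n):
            if step >= num_terms:
                continue
            for _ in range(8):
                # Multiply in place by (1 - q^step); descending i reads old values.
                for i in range(num_terms - 1, step - 1, -1):
                    series[i] -= series[i - step]
    return [0] + series[: num_terms - 1]  # multiply by q


def num_divisors(m: int) -> int:
    count = 0
    for d in range(1, isqrt(m) + 1):
        if m % d == 0:
            count += 1 if d * d == m else 2
    return count


a = cusp_form_coeffs(60)
k = 8
# Deligne: |a(n)| <= d(n) n^((k-1)/2); squared to stay in exact integers.
ok = all(a[n] ** 2 <= num_divisors(n) ** 2 * n ** (k - 1) for n in range(1, 60))
print(a[1:7], ok)  # prints: [1, -8, 12, 64, -210, -96] True
```

Because the form is a normalised Hecke eigenform, its coefficients are also multiplicative, e.g. $a(6) = a(2)a(3)$, which gives a quick sanity check on the expansion.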

Showing 1–50 of 75 results for author: Jenkins, P