Search | arXiv e-print repository

BACTA-GPT: An AI-Based Bayesian Adaptive Clinical Trial Architect

Authors: Krishna Padmanabhan, Danny Baker

Abstract: Bayesian adaptive clinical trials offer a flexible and efficient alternative to traditional fixed-design trials, but their implementation is often hindered by the complexity of Bayesian computations and the need for advanced statistical programming expertise. The authors introduce a custom fine-tuned LLM designed to assist with this and lower barriers to adoption of Bayesian methods for adaptive clinical trials. This paper describes the development and fine-tuning of BACTA-GPT, a Large Language Model (LLM)-based tool designed to assist in the implementation of Bayesian Adaptive Clinical Trials. This engine uses GPT-3.5 as the underlying model and takes in Natural Language input from the Statistician or the Trialist. The fine-tuned model demonstrates a viable proof-of-concept in its objectives. Test case evaluations show that the model is capable of generating a fit-for-purpose Bayesian model for an adaptive trial and evaluate its operating characteristics via simulations using R and JAGS. The integration of AI code generation has significant potential to lower technical barriers for the design and implementation of Bayesian Adaptive trials. But they also require attention to important considerations regarding validation and quality control.

Submitted 2 July, 2025; originally announced July 2025.

Comments: 15 pages plus 9 page appendix

arXiv:2206.04119 [pdf, other]

Diffusion probabilistic modeling of protein backbones in 3D for the motif-scaffolding problem

Authors: Brian L. Trippe, Jason Yim, Doug Tischer, David Baker, Tamara Broderick, Regina Barzilay, Tommi Jaakkola

Abstract: Construction of a scaffold structure that supports a desired motif, conferring protein function, shows promise for the design of vaccines and enzymes. But a general solution to this motif-scaffolding problem remains open. Current machine-learning techniques for scaffold design are either limited to unrealistically small scaffolds (up to length 20) or struggle to produce multiple diverse scaffolds. We propose to learn a distribution over diverse and longer protein backbone structures via an E(3)-equivariant graph neural network. We develop SMCDiff to efficiently sample scaffolds from this distribution conditioned on a given motif; our algorithm is the first to theoretically guarantee conditional samples from a diffusion model in the large-compute limit. We evaluate our designed backbones by how well they align with AlphaFold2-predicted structures. We show that our method can (1) sample scaffolds up to 80 residues and (2) achieve structurally diverse scaffolds for a fixed motif.

Submitted 19 March, 2023; v1 submitted 8 June, 2022; originally announced June 2022.

Comments: Appearing in ICLR 2023. Code available: github.com/blt2114/ProtDiff_SMCDiff

arXiv:2202.03253 [pdf, ps, other]

A useful family of fat-tailed distributions

Authors: Rose D Baker

Abstract: It is argued that there is a need for fat-tailed distributions that become thin in the extreme tail. A 3-parameter distribution is introduced that visually resembles the t-distribution and interpolates between the normal distribution and the Cauchy distribution. It is fat-tailed, but has all moments finite, and the moment-generating function exists. It would be useful as an alternative to the t-distribution for a sensitivity analysis to check the robustness of results or for computations where finite moments are needed, such as in option-pricing. It can be motivated probabilistically in at least two ways, either as the random thinning of a long-tailed distribution, or as random variation of the variance of a normal distribution. Its properties are described, algorithms for random-number generation are provided, and examples of its use in data-fitting given. Some related distributions are also discussed, including asymmetric and multivariate distributions.

Submitted 4 February, 2022; originally announced February 2022.

Comments: 20 pages, 3 figures, 1 table

MSC Class: 62

arXiv:2202.03072 [pdf, ps, other]

Mathematical models of confirmation bias

Authors: Rose D Baker

Abstract: Confirmation bias is a cognitive bias that adversely affects management decisions, and mathematical modelling is an aid to its detailed understanding. Bias in opinion update about the value of a parameter is modelled here assuming that observations are discounted depending on their distance from prior opinion. The models allow belief persistence, attitude polarization, and the irrational primacy effect to be explored. A general framework for exploring large-sample properties of these models is given, and an attempt made to classify the models. An interesting result is that in some models the influence of an observation always increases with distance from the prior opinion, whereas in others observations greatly at odds with prior opinion are given very little weight. The models could be useful to those exploring these phenomena in detail.

Submitted 7 February, 2022; originally announced February 2022.

Comments: 17 pages, 3 figures

MSC Class: 62 Statistics

arXiv:2101.04408 [pdf, other]

doi 10.51628/001c.27680

Statistical analysis of periodic data in neuroscience

Authors: Daniel H. Baker

Abstract: Many experimental paradigms in neuroscience involve driving the nervous system with periodic sensory stimuli. Neural signals recorded using a variety of techniques will then include phase-locked oscillations at the stimulation frequency. The analysis of such data often involves standard univariate statistics such as T-tests, conducted on the Fourier amplitude components (ignoring phase), either to test for the presence of a signal, or to compare signals across different conditions. However, the assumptions of these tests will sometimes be violated because amplitudes are not normally distributed, and furthermore weak signals might be missed if the phase information is discarded. An alternative approach is to conduct multivariate statistical tests using the real and imaginary Fourier components. Here the performance of two multivariate extensions of the T-test are compared: Hotelling's $T^2$ and a variant called $T^2_{circ}$. A novel test of the assumptions of $T^2_{circ}$ is developed, based on the condition index of the data (the square root of the ratio of eigenvalues of a bounding ellipse), and a heuristic for excluding outliers using the Mahalanobis distance is proposed. The $T^2_{circ}$ statistic is then extended to multi-level designs, resulting in a new statistical test termed $ANOVA^2_{circ}$. This has identical assumptions to $T^2_{circ}$, and is shown to be more sensitive than MANOVA when these assumptions are met. The use of these tests is demonstrated for two publicly available empirical data sets, and practical guidance is suggested for choosing which test to run. Implementations of these novel tools are provided as an R package and a Matlab toolbox, in the hope that their wider adoption will improve the sensitivity of statistical inferences involving periodic data.

Submitted 26 August, 2021; v1 submitted 12 January, 2021; originally announced January 2021.

Comments: 18 pages, 10 figures

Journal ref: Neurons, Behavior, Data Analysis and Theory (2021) 5(3): 1-18

arXiv:1902.06122 [pdf, other]

doi 10.1037/met0000337

Power contours: optimising sample size and precision in experimental psychology and human neuroscience

Authors: Daniel H. Baker, Greta Vilidaite, Freya A. Lygo, Anika K. Smith, Tessa R. Flack, Andre D. Gouws, Timothy J. Andrews

Abstract: When designing experimental studies with human participants, experimenters must decide how many trials each participant will complete, as well as how many participants to test. Most discussion of statistical power (the ability of a study design to detect an effect) has focussed on sample size, and assumed sufficient trials. Here we explore the influence of both factors on statistical power, represented as a two-dimensional plot on which iso-power contours can be visualised. We demonstrate the conditions under which the number of trials is particularly important, i.e. when the within-participant variance is large relative to the between-participants variance. We then derive power contour plots using existing data sets for eight experimental paradigms and methodologies (including reaction times, sensory thresholds, fMRI, MEG, and EEG), and provide example code to calculate estimates of the within- and between-participant variance for each method. In all cases, the within-participant variance was larger than the between-participants variance, meaning that the number of trials has a meaningful influence on statistical power in commonly used paradigms. An online tool is provided (https://shiny.york.ac.uk/powercontours/) for generating power contours, from which the optimal combination of trials and participants can be calculated when designing future studies.

Submitted 4 February, 2020; v1 submitted 16 February, 2019; originally announced February 2019.

Journal ref: Psychological Methods (2021), 26(3): 295-314

arXiv:1606.05203 [pdf, ps, other]

A new asymmetric generalisation of the t-distribution

Authors: Rose D. Baker

Abstract: A 6-parameter fat-tailed distribution is proposed that generalises the t-distribution and allows asymmetry of scale and also of tail power, whilst avoiding the discontinuity of the second derivative of the split-t (AST) distribution. With the sixth parameter set to unity and no asymmetry, the distribution reduces to a t-distribution, but with the sixth parameter reduced, fatter tails than those of the t-distribution are allowed (the tails start earlier) and the distribution generalises Johnson's $S_U$ distribution. Data fitting is illustrated with examples.

Submitted 16 June, 2016; originally announced June 2016.

Comments: 13 pages, 2 figures, Statistics, exact distribution theory

Showing 1–7 of 7 results for author: Baker, D