
Putting Skill as Nearly Indistinguishable from Noise: An Empirical Bayes Analysis of PGA Tour Performance

Authors: Ryan S. Brill, Abraham J. Wyner

Abstract: We revisit a foundational question in golf analytics: how important are the core components of performance--driving, approach play, and putting--in explaining success on the PGA Tour? Building on Mark Broadie's strokes gained analyses, we use an empirical Bayes approach to estimate latent golfer skill and assess statistical significance using a multiple testing procedure that controls the false discovery rate. While tee-to-green skill shows clear and substantial differences across players, putting skill is both less variable and far less reliably estimable. Indeed, putting performance appears nearly indistinguishable from noise.

Submitted 26 June, 2025; originally announced June 2025.

arXiv:2411.10400 [pdf, other]

The Loser's Curse and the Critical Role of the Utility Function

Authors: Ryan S. Brill, Abraham J. Wyner

Abstract: A longstanding question in the judgment and decision making literature is whether experts, even in high-stakes environments, exhibit the same cognitive biases observed in controlled experiments with inexperienced participants. Massey and Thaler (2013) claim to have found an example of bias and irrationality in expert decision making: general managers' behavior in the National Football League draft pick trade market. They argue that general managers systematically overvalue top draft picks, which generate less surplus value on average than later first-round picks, a phenomenon known as the loser's curse. Their conclusion hinges on the assumption that general managers should use expected surplus value as their utility function for evaluating draft picks. This assumption, however, is neither explicitly justified nor necessarily aligned with the strategic complexities of constructing a National Football League roster. In this paper, we challenge their framework by considering alternative utility functions, particularly those that emphasize the acquisition of transformational players--those capable of dramatically increasing a team's chances of winning the Super Bowl. Under a decision rule that prioritizes the probability of acquiring elite players, which we construct from a novel Bayesian hierarchical Beta regression model, general managers' draft trade behavior appears rational rather than systematically flawed. More broadly, our findings highlight the critical role of carefully specifying a utility function when evaluating the quality of decisions.

Submitted 23 April, 2025; v1 submitted 15 November, 2024; originally announced November 2024.

arXiv:2409.04889 [pdf, other]

Moving from Machine Learning to Statistics: the case of Expected Points in American football

Authors: Ryan S. Brill, Ryan Yee, Sameer K. Deshpande, Abraham J. Wyner

Abstract: Expected points is a value function fundamental to player evaluation and strategic in-game decision-making across sports analytics, particularly in American football. To estimate expected points, football analysts use machine learning tools, which are not equipped to handle certain challenges. They suffer from selection bias, display counter-intuitive artifacts of overfitting, do not quantify uncertainty in point estimates, and do not account for the strong dependence structure of observational football data. These issues are not unique to American football or even sports analytics; they are general problems analysts encounter across various statistical applications, particularly when using machine learning in lieu of traditional statistical models. We explore these issues in detail and devise expected points models that account for them. We also introduce a widely applicable novel methodological approach to mitigate overfitting, using a catalytic prior to smooth our machine learning models.

Submitted 7 September, 2024; originally announced September 2024.

Comments: version 0; still have editing to do in the body

arXiv:2406.16171 [pdf, other]

Exploring the difficulty of estimating win probability: a simulation study

Authors: Ryan S. Brill, Ronald Yurko, Abraham J. Wyner

Abstract: Estimating win probability is one of the classic modeling tasks of sports analytics. Many widely used win probability estimators use machine learning to fit the relationship between a binary win/loss outcome variable and certain game-state variables. To illustrate just how difficult it is to accurately fit such a model from noisy and highly correlated observational data, in this paper we conduct a simulation study. We create a simplified random walk version of football in which true win probability at each game-state is known, and we see how well a model recovers it. We find that the dependence structure of observational play-by-play data substantially inflates the bias and variance of estimators and lowers the effective sample size. Further, to achieve approximately valid marginal coverage, win probability confidence intervals need to be substantially wide. Concisely, these are high variance estimators subject to substantial uncertainty. Our findings are not unique to the particular application of estimating win probability; they are broadly applicable across sports analytics, as myriad other sports datasets are clustered into groups of observations that share the same outcome.

Submitted 2 March, 2025; v1 submitted 23 June, 2024; originally announced June 2024.

arXiv:2311.03490 [pdf, other]

Analytics, have some humility: a statistical view of fourth-down decision making

Authors: Ryan S. Brill, Ronald Yurko, Abraham J. Wyner

Abstract: The standard mathematical approach to fourth-down decision making in American football is to make the decision that maximizes estimated win probability. Win probability estimates arise from machine learning models fit from historical data. These models attempt to capture a nuanced relationship between a noisy binary outcome variable and game-state variables replete with interactions and non-linearities from a finite dataset of just a few thousand games. Thus, it is imperative to knit uncertainty quantification into the fourth-down decision procedure; we do so using bootstrapping. We find that uncertainty in the estimated optimal fourth-down decision is far greater than that currently expressed by sports analysts in popular sports media.

Submitted 31 January, 2025; v1 submitted 6 November, 2023; originally announced November 2023.

arXiv:2210.06724 [pdf, other]

doi 10.1515/jqas-2022-0116

A Bayesian analysis of the time through the order penalty in baseball

Authors: Ryan S. Brill, Sameer K. Deshpande, Abraham J. Wyner

Abstract: As a baseball game progresses, batters appear to perform better the more times they face a particular pitcher. The apparent drop-off in pitcher performance from one time through the order to the next, known as the Time Through the Order Penalty (TTOP), is often attributed to within-game batter learning. Although the TTOP has largely been accepted within baseball and influences many managers' in-game decision making, we argue that existing approaches of estimating the size of the TTOP cannot disentangle continuous evolution in pitcher performance over the course of the game from discontinuities between successive times through the order. Using a Bayesian multinomial regression model, we find that, after adjusting for confounders like batter and pitcher quality, handedness, and home field advantage, there is little evidence of strong discontinuity in pitcher performance between times through the order. Our analysis suggests that the start of the third time through the order should not be viewed as a special cutoff point in deciding whether to pull a starting pitcher.

Submitted 31 May, 2023; v1 submitted 13 October, 2022; originally announced October 2022.

Comments: Accepted to JQAS

arXiv:1812.05792 [pdf, other]

Making Sense of Random Forest Probabilities: a Kernel Perspective

Authors: Matthew A. Olson, Abraham J. Wyner

Abstract: A random forest is a popular tool for estimating probabilities in machine learning classification tasks. However, the means by which this is accomplished is unprincipled: one simply counts the fraction of trees in a forest that vote for a certain class. In this paper, we forge a connection between random forests and kernel regression. This places random forest probability estimation on more sound statistical footing. As part of our investigation, we develop a model for the proximity kernel and relate it to the geometry and sparsity of the estimation problem. We also provide intuition and recommendations for tuning a random forest to improve its probability estimates.

Submitted 14 December, 2018; originally announced December 2018.

arXiv:1704.00823 [pdf, other]

doi 10.1515/jqas-2015-0027

A Hierarchical Bayesian Model of Pitch Framing

Authors: Sameer K. Deshpande, Abraham J. Wyner

Abstract: Since the advent of high-resolution pitch tracking data (PITCHf/x), many in the sabermetrics community have attempted to quantify a Major League Baseball catcher's ability to "frame" a pitch (i.e. increase the chance that a pitch is called as a strike). Especially in the last three years, there has been an explosion of interest in the "art of pitch framing" in the popular press as well as signs that teams are considering framing when making roster decisions. We introduce a Bayesian hierarchical model to estimate each umpire's probability of calling a strike, adjusting for pitch participants, pitch location, and contextual information like the count. Using our model, we can estimate each catcher's effect on an umpire's chance of calling a strike. We are then able to translate these estimated effects into average runs saved across a season. We also introduce a new metric, analogous to Jensen, Shirley, and Wyner's Spatially Aggregate Fielding Evaluation metric, which provides a more honest assessment of the impact of framing.

Submitted 9 September, 2017; v1 submitted 3 April, 2017; originally announced April 2017.

Journal ref: Journal of Quantitative Analysis in Sports. 13(3): 95--112. (2017)

arXiv:1504.07676 [pdf, other]

Explaining the Success of AdaBoost and Random Forests as Interpolating Classifiers

Authors: Abraham J. Wyner, Matthew Olson, Justin Bleich, David Mease

Abstract: There is a large literature explaining why AdaBoost is a successful classifier. The literature on AdaBoost focuses on classifier margins and boosting's interpretation as the optimization of an exponential likelihood function. These existing explanations, however, have been pointed out to be incomplete. A random forest is another popular ensemble method for which there is substantially less explanation in the literature. We introduce a novel perspective on AdaBoost and random forests that proposes that the two algorithms work for similar reasons. While both classifiers achieve similar predictive accuracy, random forests cannot be conceived as a direct optimization procedure. Rather, random forests is a self-averaging, interpolating algorithm which creates what we denote as a "spikey-smooth" classifier, and we view AdaBoost in the same light. We conjecture that both AdaBoost and random forests succeed because of this mechanism. We provide a number of examples and some theoretical justification to support this explanation. In the process, we question the conventional wisdom that suggests that boosting algorithms for classification require regularization or early stopping and should be limited to low complexity classes of learners, such as decision stumps. We conclude that boosting should be used like random forests: with large decision trees and without direct regularization or early stopping.

Submitted 29 April, 2017; v1 submitted 28 April, 2015; originally announced April 2015.

Comments: 40 pages, 11 figures, 2 algorithms

arXiv:1105.2433 [pdf, ps, other]

doi 10.1214/10-AOAS398REJ

Rejoinder: A statistical analysis of multiple temperature proxies: Are reconstructions of surface temperatures over the last 1000 years reliable?

Authors: Blakeley B. McShane, Abraham J. Wyner

Abstract: Rejoinder to "A statistical analysis of multiple temperature proxies: Are reconstructions of surface temperatures over the last 1000 years reliable?" by B.B. McShane and A.J. Wyner [arXiv:1104.4002]

Submitted 12 May, 2011; originally announced May 2011.

Comments: Published in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org) at http://dx.doi.org/10.1214/10-AOAS398REJ

Report number: IMS-AOAS-AOAS398REJ

Journal ref: Annals of Applied Statistics 2011, Vol. 5, No. 1, 99-123

arXiv:1104.4002 [pdf, ps, other]

doi 10.1214/10-AOAS398

A statistical analysis of multiple temperature proxies: Are reconstructions of surface temperatures over the last 1000 years reliable?

Authors: Blakeley B. McShane, Abraham J. Wyner

Abstract: Predicting historic temperatures based on tree rings, ice cores, and other natural proxies is a difficult endeavor. The relationship between proxies and temperature is weak and the number of proxies is far larger than the number of target data points. Furthermore, the data contain complex spatial and temporal dependence structures which are not easily captured with simple models. In this paper, we assess the reliability of such reconstructions and their statistical significance against various null models. We find that the proxies do not predict temperature significantly better than random series generated independently of temperature. Furthermore, various model specifications that perform similarly at predicting temperature produce extremely different historical backcasts. Finally, the proxies seem unable to forecast the high levels of and sharp run-up in temperature in the 1990s either in-sample or from contiguous holdout blocks, thus casting doubt on their ability to predict such phenomena if in fact they occurred several hundred years ago. We propose our own reconstruction of Northern Hemisphere average annual land temperature over the last millennium, assess its reliability, and compare it to those from the climate science literature. Our model provides a similar reconstruction but has much wider standard errors, reflecting the weak signal and large uncertainty encountered in this setting.

Submitted 20 April, 2011; originally announced April 2011.

Comments: Published in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org) at http://dx.doi.org/10.1214/10-AOAS398

Report number: IMS-AOAS-AOAS398

Journal ref: Annals of Applied Statistics 2011, Vol. 5, No. 1, 5-44

arXiv:0902.1360 [pdf, ps, other]

doi 10.1214/09-BA424

Hierarchical Bayesian Modeling of Hitting Performance in Baseball

Authors: Shane T. Jensen, Blake McShane, Abraham J. Wyner

Abstract: We have developed a sophisticated statistical model for predicting the hitting performance of Major League baseball players. The Bayesian paradigm provides a principled method for balancing past performance with crucial covariates, such as player age and position. We share information across time and across players by using mixture distributions to control shrinkage for improved accuracy. We compare the performance of our model to current sabermetric methods on a held-out season (2006), and discuss both successes and limitations.

Submitted 8 February, 2009; originally announced February 2009.

Journal ref: Bayesian Analysis 2009, Vol. 4, No. 4, 631-652

arXiv:0804.2757 [pdf, ps, other]

doi 10.1214/07-STS242B

Comment: Boosting Algorithms: Regularization, Prediction and Model Fitting

Authors: Andreas Buja, David Mease, Abraham J. Wyner

Abstract: The authors are doing the readers of Statistical Science a true service with a well-written and up-to-date overview of boosting that originated with the seminal algorithms of Freund and Schapire. Equally, we are grateful for high-level software that will permit a larger readership to experiment with, or simply apply, boosting-inspired model fitting. The authors show us a world of methodology that illustrates how a fundamental innovation can penetrate every nook and cranny of statistical thinking and practice. They introduce the reader to one particular interpretation of boosting and then give a display of its potential with extensions from classification (where it all started) to least squares, exponential family models, survival analysis, to base-learners other than trees such as smoothing splines, to degrees of freedom and regularization, and to fascinating recent work in model selection. The uninitiated reader will find that the authors did a nice job of presenting a certain coherent and useful interpretation of boosting. The other reader, though, who has watched the business of boosting for a while, may have quibbles with the authors over details of the historic record and, more importantly, over their optimism about the current state of theoretical knowledge. In fact, as much as "the statistical view" has proven fruitful, it has also resulted in some ideas about why boosting works that may be misconceived, and in some recommendations that may be misguided. [arXiv:0804.2752]

Submitted 17 April, 2008; originally announced April 2008.

Comments: Published in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org) at http://dx.doi.org/10.1214/07-STS242B

Report number: IMS-STS-STS242B

Journal ref: Statistical Science 2007, Vol. 22, No. 4, 506-512

arXiv:0802.4317 [pdf, ps, other]

doi 10.1214/08-AOAS228

Bayesball: A Bayesian hierarchical model for evaluating fielding in major league baseball

Authors: Shane T. Jensen, Kenneth E. Shirley, Abraham J. Wyner

Abstract: The use of statistical modeling in baseball has received substantial attention recently in both the media and academic community. We focus on a relatively under-explored topic: the use of statistical models for the analysis of fielding based on high-resolution data consisting of on-field location of batted balls. We combine spatial modeling with a hierarchical Bayesian structure in order to evaluate the performance of individual fielders while sharing information between fielders at each position. We present results across four seasons of MLB data (2002--2005) and compare our approach to other fielding evaluation procedures.

Submitted 14 August, 2009; v1 submitted 28 February, 2008; originally announced February 2008.

Comments: Published in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org) at http://dx.doi.org/10.1214/08-AOAS228

Report number: IMS-AOAS-AOAS228

Journal ref: Annals of Applied Statistics 2009, Vol. 3, No. 2, 491-520

Showing 1–14 of 14 results for author: Wyner, A J