-
Adaptive, Personalized Diversity for Visual Discovery
Authors:
Choon Hui Teo,
Houssam Nassif,
Daniel Hill,
Sriram Srinavasan,
Mitchell Goodman,
Vijai Mohan,
SVN Vishwanathan
Abstract:
Search queries are appropriate when users have explicit intent, but they perform poorly when the intent is difficult to express or if the user is simply looking to be inspired. Visual browsing systems allow e-commerce platforms to address these scenarios while offering the user an engaging shopping experience. Here we explore extensions in the direction of adaptive personalization and item diversification within Stream, a new form of visual browsing and discovery by Amazon. Our system presents the user with a diverse set of interesting items while adapting to user interactions. Our solution consists of three components: (1) a Bayesian regression model for scoring the relevance of items while leveraging uncertainty, (2) a submodular diversification framework that re-ranks the top-scoring items based on category, and (3) personalized category preferences learned from the user's behavior. When tested on live traffic, our algorithms show a strong lift in click-through rate and session duration.
Submitted 2 October, 2018;
originally announced October 2018.
-
NFL Injuries Before and After the 2011 Collective Bargaining Agreement (CBA)
Authors:
Zachary O. Binney,
Kyle E. Hammond,
Mitchel Klein,
Michael Goodman,
A. Cecile J. W. Janssens
Abstract:
The National Football League's (NFL) 2011 collective bargaining agreement (CBA) with its players placed a number of contact and quantity limitations on practices and workouts. Some coaches and others have expressed a concern that this has led to poor conditioning and a subsequent increase in injuries. We sought to assess whether the 2011 CBA's practice restrictions affected the number of overall, conditioning-dependent, and/or non-conditioning-dependent injuries in the NFL or the number of games missed due to those injuries. The study population was player-seasons from 2007-2016. We included regular season, non-illness, non-head, game-loss injuries. Injuries were identified using a database from Football Outsiders. The primary outcomes were overall, conditioning-dependent and non-conditioning-dependent injury counts by season. We examined time trends in injury counts before (2007-2010) and after (2011-2016) the CBA using a Poisson interrupted time series model. The number of game-loss regular season, non-head, non-illness injuries grew from 701 in 2007 to 804 in 2016 (15% increase). The number of regular season weeks missed exhibited a similar increase. Conditioning-dependent injuries increased from 197 in 2007 to 271 in 2011 (38% rise), but were lower and remained relatively unchanged at 220-240 injuries per season thereafter. Non-conditioning injuries decreased by 37% in the first three years of the new CBA before returning to historic levels in 2014-2016. Poisson models for all, conditioning-dependent, and non-conditioning-dependent game-loss injury counts did not show statistically significant or meaningful detrimental changes associated with the CBA. We did not observe an increase in injuries following the 2011 CBA. Other concurrent injury-related rule and regulation changes limit specific causal inferences about the practice restrictions, however.
Submitted 3 May, 2018;
originally announced May 2018.
-
Variance Components Genetic Association Test for Zero-inflated Count Outcomes
Authors:
Matthew Goodman,
Lori Chibnik,
Tianxi Cai
Abstract:
Commonly in biomedical research, studies collect data in which an outcome measure contains informative excess zeros; for example, when observing the burden of neuritic plaques in brain pathology studies, those who show none contribute to our understanding of neurodegenerative disease. The outcome may be characterized by a mixture distribution with one component being the 'structural zero' and the other component being a Poisson distribution. We propose a novel variance components score test of genetic association between a set of genetic markers and a zero-inflated count outcome from a mixture distribution. This test shares advantageous properties with SNP-set tests that have previously been devised for standard continuous or binary outcomes, such as the Sequence Kernel Association Test (SKAT). In particular, our method has superior statistical power compared to competing methods, especially when there is correlation within the group of markers, and when the SNPs are associated with both the mixing proportion and the rate of the Poisson distribution. We apply the method to Alzheimer's data from the Rush University Religious Orders Study and Memory and Aging Project (ROSMAP), where as proof of principle we find highly significant associations with the APOE gene, in both the 'structural zero' and 'count' parameters, when applied to a zero-inflated neuritic plaques count outcome.
Submitted 17 January, 2018;
originally announced January 2018.
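The mixture named in the abstract above (a structural-zero component plus a Poisson component) is commonly written as a zero-inflated Poisson distribution. The sketch below shows that textbook probability mass function only; it is not the authors' score test. The names `zip_pmf`, `pi` (mixing proportion) and `lam` (Poisson rate) are illustrative assumptions and do not come from the listing.

```python
import math


def zip_pmf(y: int, pi: float, lam: float) -> float:
    """P(Y = y) for a zero-inflated Poisson mixture.

    pi  -- assumed name for the probability of a structural zero
    lam -- assumed name for the rate of the Poisson component
    """
    if y < 0:
        return 0.0
    poisson = math.exp(-lam) * lam ** y / math.factorial(y)
    if y == 0:
        # A zero can come from either component of the mixture.
        return pi + (1.0 - pi) * poisson
    # Positive counts can only come from the Poisson component.
    return (1.0 - pi) * poisson
```

Under this parameterization, a marker set can be associated with `pi`, with `lam`, or with both, which is the case the abstract highlights.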
-
High-dimensional cluster analysis with the Masked EM Algorithm
Authors:
Shabnam N. Kadir,
Dan F. M. Goodman,
Kenneth D. Harris
Abstract:
Cluster analysis faces two problems in high dimensions: first, the 'curse of dimensionality' that can lead to overfitting and poor generalization performance; and second, the sheer time taken for conventional algorithms to process large amounts of high-dimensional data. In many applications, only a small subset of features provides information about the cluster membership of any one data point; however, this informative feature subset may not be the same for all data points. Here we introduce a 'Masked EM' algorithm for fitting mixture of Gaussians models in such cases. We show that the algorithm performs close to optimally on simulated Gaussian data, and in an application of 'spike sorting' of high channel-count neuronal recordings.
Submitted 11 September, 2013;
originally announced September 2013.