Addressing Detection Limits with Semiparametric Cumulative Probability Models
Authors:
Yuqi Tian,
Chun Li,
Shengxin Tu,
Nathan T. James,
Frank E. Harrell,
Bryan E. Shepherd
Abstract:
Detection limits (DLs), where a variable cannot be measured outside a certain range, are common in research. Most approaches to handling DLs in the response variable implicitly make parametric assumptions about the distribution of data outside the DLs. We propose a new approach to deal with DLs based on a widely used ordinal regression model, the cumulative probability model (CPM). The CPM is a type of semiparametric linear transformation model. CPMs are rank-based and can handle mixed distributions of continuous and discrete outcome variables. These features are key for analyzing data with DLs because, while observations inside DLs are typically continuous, those outside DLs are censored and generally put into discrete categories. With a single lower DL, the CPM treats values below the DL as having the lowest rank. When there are multiple DLs, the CPM likelihood can be modified to appropriately distribute probability mass. We demonstrate the use of CPMs with simulations and two HIV data examples. The first example models a biomarker in which 15% of observations are below a DL. The second uses multi-cohort data to model viral load, where approximately 55% of observations are outside DLs that vary across sites and over time.
Submitted 6 July, 2022;
originally announced July 2022.
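The single-lower-DL case described in the abstract above (values below the DL receive the lowest rank) can be illustrated with a small sketch. This is not the authors' implementation: the function name `rank_with_lower_dl` and the convention of recording below-DL observations as `None` are assumptions made here for illustration. It shows only the data-coding step that would precede fitting an ordinal model, not the model fit itself.

```python
from typing import List, Optional


def rank_with_lower_dl(values: List[Optional[float]]) -> List[int]:
    """Map outcomes to ordinal category codes for a rank-based model.

    Assumption (illustrative): an observation below the lower detection
    limit is recorded as None. All such observations share code 0, the
    lowest rank. Observed values get codes 1, 2, ... in increasing
    order, with tied values sharing a code.
    """
    # Distinct observed (uncensored) values, sorted ascending.
    observed = sorted({v for v in values if v is not None})
    # Observed values start at code 1 so that code 0 is reserved for "< DL".
    code = {v: i + 1 for i, v in enumerate(observed)}
    return [0 if v is None else code[v] for v in values]


# Two observations below the DL, and one tie among the observed values.
codes = rank_with_lower_dl([None, 2.5, None, 7.0, 2.5])
print(codes)
```

Because only the ordering is retained, the actual value of the DL never enters the coding; the censored observations simply form one discrete category below every measured value.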
Bayesian Cumulative Probability Models for Continuous and Mixed Outcomes
Authors:
Nathan T. James,
Frank E. Harrell Jr.,
Bryan E. Shepherd
Abstract:
Ordinal cumulative probability models (CPMs) -- also known as cumulative link models -- such as the proportional odds regression model are typically used for discrete ordered outcomes, but can accommodate both continuous and mixed discrete/continuous outcomes since these are also ordered. Recent papers describe ordinal CPMs in this setting using non-parametric maximum likelihood estimation. We formulate a Bayesian CPM for continuous or mixed outcome data. Bayesian CPMs inherit many of the benefits of frequentist CPMs and have advantages with regard to interpretation, flexibility, and exact inference (within simulation error) for parameters and functions of parameters. We explore characteristics of the Bayesian CPM through simulations and a case study using HIV biomarker data. In addition, we provide the package 'bayesCPM', which implements Bayesian CPMs using the R interface to the Stan probabilistic programming language. The Bayesian CPM for continuous outcomes can be implemented with only minor modifications to the prior specification and, despite some limitations, has generally good statistical performance with moderate or large sample sizes.
Submitted 7 January, 2022; v1 submitted 30 January, 2021;
originally announced February 2021.