-
Predicting Distributions of Physical Activity Profiles in the NHANES Database Using a Partially Linear Fréchet Single Index Model
Authors:
Marcos Matabuena,
Aritra Ghosal,
Wendy Meiring,
Alexander Petersen
Abstract:
Object-oriented data analysis is a fascinating and evolving field in modern statistical science, with the potential to make significant contributions to biomedical applications. This statistical framework facilitates the development of new methods to analyze complex data objects that capture more information than traditional clinical biomarkers. This paper applies the object-oriented framework to analyze physical activity levels, measured by accelerometers, as response objects in a regression model. Unlike traditional summary metrics, we utilize a recently proposed representation of physical activity data as a distributional object, providing a more nuanced and complete profile of individual energy expenditure across all ranges of monitoring intensity. A novel hybrid Fréchet regression model is proposed and applied to US population accelerometer data from National Health and Nutrition Examination Survey (NHANES) 2011-2014. The semi-parametric nature of the model allows for the inclusion of nonlinear effects for critical variables, such as age, which are biologically known to have subtle impacts on physical activity. Simultaneously, the inclusion of linear effects preserves interpretability for other variables, particularly categorical covariates such as ethnicity and sex. The results obtained are valuable from a public health perspective and could lead to new strategies for optimizing physical activity interventions in specific American subpopulations.
Submitted 9 March, 2025; v1 submitted 15 February, 2023;
originally announced February 2023.
-
Fréchet single index models for object response regression
Authors:
Aritra Ghosal,
Wendy Meiring,
Alexander Petersen
Abstract:
With the increasing availability of non-Euclidean data objects, statisticians are faced with the task of developing appropriate statistical methods for their analysis. For regression models in which the predictors lie in $\mathbb{R}^p$ and the response variables are situated in a metric space, conditional Fréchet means can be used to define the Fréchet regression function. Global and local Fréchet methods have recently been developed for modeling and estimating this regression function as extensions of multiple and local linear regression, respectively. This paper expands on these methodologies by proposing the Fréchet Single Index model, in which the Fréchet regression function is assumed to depend only on a scalar projection of the multivariate predictor. Estimation is performed by combining local Fréchet along with M-estimation to estimate both the coefficient vector and the underlying regression function, and these estimators are shown to be consistent. The method is illustrated by simulations for response objects on the surface of the unit sphere and through an analysis of human mortality data in which lifetable data are represented by distributions of age-of-death, viewed as elements of the Wasserstein space of distributions.
Submitted 22 March, 2023; v1 submitted 13 August, 2021;
originally announced August 2021.
-
Multitask Learning for Citation Purpose Classification
Authors:
Alex Oesterling,
Angikar Ghosal,
Haoyang Yu,
Rui Xin,
Yasa Baig,
Lesia Semenova,
Cynthia Rudin
Abstract:
We present our entry into the 2021 3C Shared Task Citation Context Classification based on Purpose competition. The goal of the competition is to classify a citation in a scientific article based on its purpose. This task is important because it could potentially lead to more comprehensive ways of summarizing the purpose and uses of scientific articles, but it is also difficult, mainly due to the limited amount of available training data in which the purposes of each citation have been hand-labeled, along with the subjectivity of these labels. Our entry in the competition is a multi-task model that combines multiple modules designed to handle the problem from different perspectives, including hand-generated linguistic features, TF-IDF features, and an LSTM-with-attention model. We also provide an ablation study and feature analysis whose insights could lead to future work.
Submitted 24 June, 2021;
originally announced June 2021.