Showing 1–22 of 22 results for author: Shi, P

  1. arXiv:2504.11775  [pdf, other]

    stat.ML cs.CY cs.LG q-fin.RM

    Discrimination-free Insurance Pricing with Privatized Sensitive Attributes

    Authors: Tianhe Zhang, Suhan Liu, Peng Shi

    Abstract: Fairness has emerged as a critical consideration in the landscape of machine learning algorithms, particularly as AI continues to transform decision-making across societal domains. To ensure that these algorithms are free from bias and do not discriminate against individuals based on sensitive attributes such as gender and race, the field of algorithmic bias has introduced various fairness concept…

    Submitted 16 April, 2025; originally announced April 2025.

  2. arXiv:2410.03619  [pdf, other]

    stat.ME math.ST stat.AP stat.CO

    Functional Singular Value Decomposition

    Authors: Jianbin Tan, Pixu Shi, Anru R. Zhang

    Abstract: Heterogeneous functional data commonly arise in time series and longitudinal studies. To uncover the statistical structures of such data, we propose Functional Singular Value Decomposition (FSVD), a unified framework encompassing various tasks for the analysis of functional data with potential heterogeneity. We establish the mathematical foundation of FSVD by proving its existence and providing it…

    Submitted 16 February, 2025; v1 submitted 4 October, 2024; originally announced October 2024.

  3. arXiv:2409.13819  [pdf, other]

    stat.ME

    Supervised low-rank approximation of high-dimensional multivariate functional data via tensor decomposition

    Authors: Mohammad Samsul Alam, Ana-Maria Staicu, Pixu Shi

    Abstract: Motivated by the challenges of analyzing high-dimensional ($p \gg n$) sequencing data from longitudinal microbiome studies, where samples are collected at multiple time points from each subject, we propose supervised functional tensor singular value decomposition (SupFTSVD), a novel dimensionality reduction method that leverages auxiliary information in the dimensionality reduction of high-dimensi…

    Submitted 14 October, 2024; v1 submitted 20 September, 2024; originally announced September 2024.

  4. Estimating Conditional Average Treatment Effects with Heteroscedasticity by Model Averaging and Matching

    Authors: Pengfei Shi, Xinyu Zhang, Wei Zhong

    Abstract: We propose a model averaging approach, combined with a partition and matching method to estimate the conditional average treatment effects under heteroskedastic error settings. The proposed approach has asymptotic optimality and consistency of weights and estimator. Numerical studies show that our method has good finite-sample performances.

    Submitted 15 December, 2024; v1 submitted 14 April, 2023; originally announced April 2023.

    Journal ref: Economics Letters, 2024, 238: 111679

  5. arXiv:2208.04119  [pdf, other]

    stat.ML cond-mat.str-el cs.LG physics.class-ph

    Deep Machine Learning Reconstructing Lattice Topology with Strong Thermal Fluctuations

    Authors: Xiao-Han Wang, Pei Shi, Bin Xi, Jie Hu, Shi-Ju Ran

    Abstract: Applying artificial intelligence to scientific problems (namely AI for science) is currently under hot debate. However, the scientific problems differ much from the conventional ones with images, texts, etc., where new challenges emerge with the unbalanced scientific data and complicated effects from the physical setups. In this work, we demonstrate the validity of the deep convolutional neur…

    Submitted 8 August, 2022; originally announced August 2022.

    Comments: 5 pages, 4 figures

  6. arXiv:2201.10967  [pdf]

    cs.LG math.NA stat.ML

    Physics-informed ConvNet: Learning Physical Field from a Shallow Neural Network

    Authors: Pengpeng Shi, Zhi Zeng, Tianshou Liang

    Abstract: Big-data-based artificial intelligence (AI) supports profound evolution in almost all of science and technology. However, modeling and forecasting multi-physical systems remain a challenge due to unavoidable data scarcity and noise. Improving the generalization ability of neural networks by "teaching" domain knowledge and developing a new generation of models combined with the physical laws have b…

    Submitted 7 February, 2022; v1 submitted 26 January, 2022; originally announced January 2022.

    MSC Class: 68T07; 65N99; 35Gxx;

  7. arXiv:2108.07940  [pdf, ps, other]

    stat.ME

    Weak signal identification and inference in penalized likelihood models for categorical responses

    Authors: Yuexia Zhang, Peibei Shi, Zhongyi Zhu, Linbo Wang, Annie Qu

    Abstract: Penalized likelihood models are widely used to simultaneously select variables and estimate model parameters. However, the existence of weak signals can lead to inaccurate variable selection, biased parameter estimation, and invalid inference. Thus, identifying weak signals accurately and making valid inferences are crucial in penalized likelihood models. We develop a unified approach to identify…

    Submitted 11 December, 2022; v1 submitted 17 August, 2021; originally announced August 2021.

    MSC Class: 62F99; ACM Class: G.3

  8. arXiv:2108.04201  [pdf, other]

    stat.ME math.ST stat.AP

    Guaranteed Functional Tensor Singular Value Decomposition

    Authors: Rungang Han, Pixu Shi, Anru R. Zhang

    Abstract: This paper introduces the functional tensor singular value decomposition (FTSVD), a novel dimension reduction framework for tensors with one functional mode and several tabular modes. The problem is motivated by high-order longitudinal data analysis. Our model assumes the observed data to be a random realization of an approximate CP low-rank functional tensor measured on a discrete time grid. Inco…

    Submitted 25 October, 2023; v1 submitted 9 August, 2021; originally announced August 2021.

    Comments: Journal of the American Statistical Association, to appear

  9. arXiv:1910.05676  [pdf, ps, other]

    stat.AP stat.ME

    Regression for Copula-linked Compound Distributions with Applications in Modeling Aggregate Insurance Claims

    Authors: Peng Shi, Zifeng Zhao

    Abstract: In actuarial research, a task of particular interest and importance is to predict the loss cost for individual risks so that informative decisions are made in various insurance operations such as underwriting, ratemaking, and capital management. The loss cost is typically viewed to follow a compound distribution where the summation of the severity variables is stopped by the frequency variable. A…

    Submitted 12 October, 2019; originally announced October 2019.

  10. arXiv:1906.10407  [pdf]

    cs.LG stat.ML

    Traffic Flow Combination Forecasting Method Based on Improved LSTM and ARIMA

    Authors: Boyi Liu, Xiangyan Tang, Jieren Cheng, Pengchao Shi

    Abstract: Traffic flow forecasting is hot spot research of intelligent traffic system construction. The existing traffic flow prediction methods have problems such as poor stability, high data requirements, or poor adaptability. In this paper, we define the traffic data time singularity ratio in the dropout module and propose a combination prediction method based on the improved long short-term memory neura…

    Submitted 25 June, 2019; originally announced June 2019.

  11. A new perspective from a Dirichlet model for forecasting outstanding liabilities of nonlife insurers

    Authors: Karthik Sriram, Peng Shi

    Abstract: Forecasting the outstanding claim liabilities to set adequate reserves is critical for a nonlife insurer's solvency. Chain-Ladder and Bornhuetter-Ferguson are two prominent actuarial approaches used for this task. The selection between the two approaches is often ad hoc due to different underlying assumptions. We introduce a Dirichlet model that provides a common statistical framework for the two…

    Submitted 9 April, 2019; originally announced April 2019.

  12. arXiv:1903.05851  [pdf, ps, other]

    stat.AP

    Implementation of Frequency-Severity Association in BMS Ratemaking

    Authors: Rosy Oh, Peng Shi, Jae Youn Ahn

    Abstract: A Bonus-Malus System (BMS) in insurance is a premium adjustment mechanism widely used in a posteriori ratemaking process to set the premium for the next contract period based on a policyholder's claim history. The current practice in BMS implementation relies on the assumption of independence between claim frequency and severity, despite the fact that a series of recent studies report evidence of…

    Submitted 14 March, 2019; originally announced March 2019.

  13. arXiv:1901.04028  [pdf, other]

    cs.LG stat.ML

    Sales Demand Forecast in E-commerce using a Long Short-Term Memory Neural Network Methodology

    Authors: Kasun Bandara, Peibei Shi, Christoph Bergmeir, Hansika Hewamalage, Quoc Tran, Brian Seaman

    Abstract: Generating accurate and reliable sales forecasts is crucial in the E-commerce business. The current state-of-the-art techniques are typically univariate methods, which produce forecasts considering only the historical sales data of a single product. However, in a situation where large quantities of related time series are available, conditioning the forecast of an individual time series on past be…

    Submitted 11 August, 2019; v1 submitted 13 January, 2019; originally announced January 2019.

  14. arXiv:1811.11709  [pdf, other]

    stat.ME math.ST stat.AP

    High-dimensional Log-Error-in-Variable Regression with Applications to Microbial Compositional Data Analysis

    Authors: Pixu Shi, Yuchen Zhou, Anru R. Zhang

    Abstract: In microbiome and genomic studies, the regression of compositional data has been a crucial tool for identifying microbial taxa or genes that are associated with clinical phenotypes. To account for the variation in sequencing depth, the classic log-contrast model is often used where read counts are normalized into compositions. However, zero read counts and the randomness in covariates remain criti…

    Submitted 10 March, 2021; v1 submitted 28 November, 2018; originally announced November 2018.

  15. arXiv:1805.07301  [pdf, other]

    stat.ME

    Enhanced Pricing and Management of Bundled Insurance Risks with Dependence-aware Prediction using Pair Copula Construction

    Authors: Peng Shi, Zifeng Zhao

    Abstract: We propose a dependence-aware predictive modeling framework for multivariate risks stemmed from an insurance contract with bundling features - an important type of policy increasingly offered by major insurance companies. The bundling feature naturally leads to longitudinal measurements of multiple insurance risks, and correct pricing and management of such risks is of fundamental interest to fina…

    Submitted 15 October, 2023; v1 submitted 18 May, 2018; originally announced May 2018.

  16. arXiv:1805.03336  [pdf, other]

    stat.ME

    Modeling Multivariate Time Series with Copula-linked Univariate D-vines

    Authors: Zifeng Zhao, Peng Shi, Zhengjun Zhang

    Abstract: This paper proposes a novel multivariate time series model named Copula-linked univariate D-vines (CuDvine), which enables the simultaneous copula-based modeling of both temporal and cross-sectional dependence for multivariate time series. To construct CuDvine, we first build a semiparametric univariate D-vine time series model (uDvine) based on a D-vine. The uDvine generalizes the existing first-…

    Submitted 30 November, 2020; v1 submitted 8 May, 2018; originally announced May 2018.

  17. arXiv:1801.03238  [pdf, other]

    stat.ME

    Generalized Linear Models with Linear Constraints for Microbiome Compositional Data

    Authors: Jiarui Lu, Pixu Shi, Hongzhe Li

    Abstract: Motivated by regression analysis for microbiome compositional data, this paper considers generalized linear regression analysis with compositional covariates, where a group of linear constraints on regression coefficients are imposed to account for the compositional nature of the data and to achieve subcompositional coherence. A penalized likelihood estimation procedure using a generalized acceler…

    Submitted 9 January, 2018; originally announced January 2018.

  18. arXiv:1702.04808  [pdf, other]

    stat.AP

    A Model for Paired-Multinomial Data and Its Application to Analysis of Data on a Taxonomic Tree

    Authors: Pixu Shi, Hongzhe Li

    Abstract: In human microbiome studies, sequencing reads data are often summarized as counts of bacterial taxa at various taxonomic levels specified by a taxonomic tree. This paper considers the problem of analyzing two repeated measurements of microbiome data from the same subjects. Such data are often collected to assess the change of microbial composition after certain treatment, or the difference in micr…

    Submitted 15 February, 2017; originally announced February 2017.

  19. arXiv:1611.04638  [pdf, other]

    stat.ME

    Weak Signal Identification and Inference in Penalized Model Selection

    Authors: Peibei Shi, Annie Qu

    Abstract: Weak signal identification and inference are very important in the area of penalized model selection, yet they are under-developed and not well-studied. Existing inference procedures for penalized estimators are mainly focused on strong signals. In this paper, we propose an identification procedure for weak signals in finite samples, and provide a transition phase in-between noise and strong sig…

    Submitted 14 November, 2016; originally announced November 2016.

  20. arXiv:1603.00974  [pdf, other]

    stat.AP

    Regression Analysis for Microbiome Compositional Data

    Authors: Pixu Shi, Anru Zhang, Hongzhe Li

    Abstract: One important problem in microbiome analysis is to identify the bacterial taxa that are associated with a response, where the microbiome data are summarized as the composition of the bacterial taxa at different taxonomic levels. This paper considers regression analysis with such compositional data as covariates. In order to satisfy the subcompositional coherence of the results, linear models with…

    Submitted 3 March, 2016; originally announced March 2016.

  21. arXiv:1401.7359  [pdf, other]

    stat.AP

    Demand Modeling, Forecasting, and Counterfactuals, Part I

    Authors: Parag A. Pathak, Peng Shi

    Abstract: There are relatively few systematic comparisons of the ex ante counterfactual predictions from structural models to what occurs ex post. This paper uses a large-scale policy change in Boston in 2014 to investigate the performance of discrete choice models of demand compared to simpler alternatives. In 2013, Boston Public Schools (BPS) proposed alternative zone configurations in their school choice…

    Submitted 14 January, 2015; v1 submitted 28 January, 2014; originally announced January 2014.

    Comments: Also available as NBER Working Paper No. 19859

  22. arXiv:1305.4896  [pdf, other]

    stat.ME

    Methods to Calculate the Upper Bound of Gini Coefficient Based on Grouped Data and the Result for China

    Authors: Pixu Shi, Anru R. Zhang

    Abstract: Determining an upper bound, particularly the optimal upper bound of the Gini coefficient when dealing with grouped data without specified income brackets, remains an important and open question. In this paper, we introduce an efficient algorithm to calculate the exact optimal upper bound of the Gini coefficient with provable guarantees. To exemplify these methods, we also offer computed results fo…

    Submitted 14 January, 2025; v1 submitted 21 May, 2013; originally announced May 2013.