-
Orthogonality conditions for convex regression
Authors:
Sheng Dai,
Timo Kuosmanen,
Xun Zhou
Abstract:
Econometric identification generally relies on orthogonality conditions, which usually state that the random error term is uncorrelated with the explanatory variables. In convex regression, the orthogonality conditions for identification are unknown. Applying Lagrangian duality theory, we establish the sample orthogonality conditions for convex regression, including additive and multiplicative formulations of the regression model, with and without monotonicity and homogeneity constraints. We then propose a hybrid instrumental variable control function approach to mitigate the impact of potential endogeneity in convex regression. The superiority of the proposed approach is shown in a Monte Carlo study and examined in an empirical application to Chilean manufacturing data.
Submitted 26 June, 2025;
originally announced June 2025.
-
Economic growth of cities: Does resource allocation matter?
Authors:
Sheng Dai,
Timo Kuosmanen,
Zhiqiang Liao
Abstract:
We study how efficient resource reallocation across cities affects potential aggregate growth. Using optimal resource allocation models and data on 284 Chinese prefecture-level cities for the years 2003--2019, we quantitatively measure the cost of misallocation of resources. We show that average aggregate output gains from reallocating resources across nationwide cities to their efficient use are 1.349- and 1.287-fold in the perfect and imperfect allocation scenarios, respectively. We further provide evidence on the effects of administrative division adjustments and local allocation. This suggests that city-level adjustments can yield more aggregate gain and that the output gain from nationwide allocation is likely to be more substantial than that from local allocation. Policy implications are proposed to improve the resource allocation efficiency in China.
Submitted 7 October, 2024;
originally announced October 2024.
-
Overfitting Reduction in Convex Regression
Authors:
Zhiqiang Liao,
Sheng Dai,
Eunji Lim,
Timo Kuosmanen
Abstract:
Convex regression is a method for estimating a convex function from a data set. This method has played an important role in operations research, economics, machine learning, and many other areas. However, it has been empirically observed that convex regression produces inconsistent estimates of convex functions and extremely large subgradients near the boundary as the sample size increases. In this paper, we provide theoretical evidence of this overfitting behavior. To eliminate this behavior, we propose two new estimators by placing a bound on the subgradients of the convex function. We further show that our proposed estimators can reduce overfitting by proving that they converge to the underlying true convex function and that their subgradients converge to the gradient of the underlying function, both uniformly over the domain with probability one as the sample size increases to infinity. An application to Finnish electricity distribution firms confirms the superior performance of the proposed methods in predictive power over the existing methods.
Submitted 16 October, 2024; v1 submitted 15 April, 2024;
originally announced April 2024.
-
Modeling economies of scope in joint production: Convex regression of input distance function
Authors:
Timo Kuosmanen,
Sheng Dai
Abstract:
Modeling of joint production has proved a vexing problem. This paper develops a radial convex nonparametric least squares (CNLS) approach to estimate the input distance function with multiple outputs. We document the correct input distance function transformation and prove that the necessary orthogonality conditions can be satisfied in radial CNLS. A Monte Carlo study is performed to compare the finite sample performance of radial CNLS and other deterministic and stochastic frontier approaches in terms of the input distance function estimation. We apply our novel approach to the Finnish electricity distribution network regulation and empirically confirm that the input isoquants become more curved. In addition, we introduce the weight restriction to radial CNLS to mitigate the potential overfitting and increase the out-of-sample performance in energy regulation.
Submitted 20 November, 2023;
originally announced November 2023.
-
Optimal resource allocation: Convex quantile regression approach
Authors:
Sheng Dai,
Natalia Kuosmanen,
Timo Kuosmanen,
Juuso Liesiö
Abstract:
Optimal allocation of resources across sub-units in the context of centralized decision-making systems such as bank branches or supermarket chains is a classical application of operations research and management science. In this paper, we develop quantile allocation models to examine how much the output and productivity could potentially increase if the resources were efficiently allocated between units. We increase robustness to random noise and heteroscedasticity by utilizing the local estimation of multiple production functions using convex quantile regression. The quantile allocation models then rely on the estimated shadow prices instead of detailed data of units and allow the entry and exit of units. Our empirical results on Finland's business sector reveal a large potential for productivity gains through better allocation, keeping the current technology and resources fixed.
Submitted 11 November, 2023;
originally announced November 2023.
-
Stochastic Nonparametric Estimation of the Density-Flow Curve
Authors:
Iaroslav Kriuchkov,
Timo Kuosmanen
Abstract:
Recent advances in operations research and machine learning have revived interest in solving complex real-world, large-size traffic control problems. With the increasing availability of road sensor data, deterministic parametric models have proved inadequate in describing the variability of real-world data, especially in the congested area of the density-flow diagram. In this paper we estimate the stochastic density-flow relation by introducing a nonparametric method called convex quantile regression. The proposed method does not depend on any prior functional form assumptions, but thanks to the concavity constraints, the estimated function satisfies the theoretical properties of the density-flow curve. The second contribution is to develop the new convex quantile regression with bags (CQRb) approach to facilitate practical implementation of CQR on real-world data. We illustrate the CQRb estimation process using road sensor data from Finland for the years 2016-2018. Our third contribution is to demonstrate the excellent out-of-sample predictive power of the proposed CQRb method in comparison to the standard parametric deterministic approach.
Submitted 16 February, 2024; v1 submitted 27 May, 2023;
originally announced May 2023.
-
Convex Support Vector Regression
Authors:
Zhiqiang Liao,
Sheng Dai,
Timo Kuosmanen
Abstract:
Nonparametric regression subject to convexity or concavity constraints is increasingly popular in economics, finance, operations research, machine learning, and statistics. However, the conventional convex regression based on the least squares loss function often suffers from overfitting and outliers. This paper proposes to address these two issues by introducing the convex support vector regression (CSVR) method, which effectively combines the key elements of convex regression and support vector regression. Numerical experiments demonstrate the performance of CSVR in prediction accuracy and robustness that compares favorably with other state-of-the-art methods.
Submitted 26 September, 2022;
originally announced September 2022.
-
Partial frontiers are not quantiles
Authors:
Sheng Dai,
Timo Kuosmanen,
Xun Zhou
Abstract:
Quantile regression and partial frontier are two distinct approaches to nonparametric quantile frontier estimation. In this article, we demonstrate that partial frontiers are not quantiles. Both convex and nonconvex technologies are considered. To this end, we propose convexified order-$α$ as an alternative to convex quantile regression (CQR) and convex expectile regression (CER), and two new nonconvex estimators, isotonic CQR and isotonic CER, as alternatives to order-$α$. A Monte Carlo study shows that the partial frontier estimators perform relatively poorly and can even violate the quantile property, particularly at low quantiles. In addition, the simulation evidence shows that the indirect expectile approach to estimating quantiles generally outperforms the direct quantile estimations. We further find that the convex estimators outperform their nonconvex counterparts owing to their global shape constraints. An illustration of those estimators is provided using a real-world dataset of U.S. electric power plants.
Submitted 24 May, 2022;
originally announced May 2022.
-
Non-crossing convex quantile regression
Authors:
Sheng Dai,
Timo Kuosmanen,
Xun Zhou
Abstract:
Quantile crossing is a common phenomenon in shape constrained nonparametric quantile regression. A recent study by Wang et al. (2014) has proposed to address this problem by imposing non-crossing constraints on convex quantile regression. However, the non-crossing constraints may violate an intrinsic quantile property. This paper proposes a penalized convex quantile regression approach that can circumvent quantile crossing while better maintaining the quantile property. A Monte Carlo study demonstrates the superiority of the proposed penalized approach in addressing the quantile crossing problem.
Submitted 4 April, 2022;
originally announced April 2022.
-
pyStoNED: A Python Package for Convex Regression and Frontier Estimation
Authors:
Sheng Dai,
Yu-Hsueh Fang,
Chia-Yen Lee,
Timo Kuosmanen
Abstract:
Shape-constrained nonparametric regression is a growing area in econometrics, statistics, operations research, machine learning and related fields. In the field of productivity and efficiency analysis, recent developments in multivariate convex regression and related techniques such as convex quantile regression and convex expectile regression have bridged the long-standing gap between the conventional deterministic-nonparametric and stochastic-parametric methods. Unfortunately, the heavy computational burden and the lack of a powerful, reliable, and fully open-access computational package have slowed down the diffusion of these advanced estimation techniques to empirical practice. The purpose of the Python package pyStoNED is to address this challenge by providing a freely available and user-friendly tool for multivariate convex regression, convex quantile regression, convex expectile regression, isotonic regression, stochastic nonparametric envelopment of data, and related methods. This paper presents a tutorial of the pyStoNED package and illustrates its application, focusing on the estimation of frontier cost and production functions.
Submitted 27 September, 2021;
originally announced September 2021.
-
Shape constrained kernel-weighted least squares: Application to production function estimation for Chilean manufacturing industries
Authors:
Daisuke Yagi,
Yining Chen,
Andrew L. Johnson,
Timo Kuosmanen
Abstract:
In this paper we examine a novel way of imposing shape constraints on a local polynomial kernel estimator. The proposed approach is referred to as Shape Constrained Kernel-weighted Least Squares (SCKLS). We prove uniform consistency of the SCKLS estimator with monotonicity and convexity/concavity constraints and establish its convergence rate. The competitiveness of SCKLS is shown in a comprehensive simulation study. Finally, we analyze Chilean manufacturing data using the SCKLS estimator and quantify production in the plastics and wood industries. The results show that exporting firms have significantly higher productivity.
Submitted 17 January, 2018; v1 submitted 20 April, 2016;
originally announced April 2016.
-
A Multi-objective Exploratory Procedure for Regression Model Selection
Authors:
Ankur Sinha,
Pekka Malo,
Timo Kuosmanen
Abstract:
Variable selection is recognized as one of the most critical steps in statistical modeling. The problems encountered in engineering and social sciences are commonly characterized by an over-abundance of explanatory variables, non-linearities and unknown interdependencies between the regressors. An added difficulty is that the analysts may have little or no prior knowledge of the relative importance of the variables. To provide a robust method for model selection, this paper introduces the Multi-objective Genetic Algorithm for Variable Selection (MOGA-VS) that provides the user with an optimal set of regression models for a given dataset. The algorithm treats the regression problem as a two-objective task, and explores the Pareto-optimal (best subset) models by preferring models that have fewer regression coefficients and better goodness of fit. The model exploration can be performed based on in-sample or generalization error minimization. The model selection is proposed to be performed in two steps. First, we generate the frontier of Pareto-optimal regression models by eliminating the dominated models without any user intervention. Second, a decision-making process is executed which allows the user to choose the most preferred model using visualisations and simple metrics. The method has been evaluated on a recently published real dataset on Communities and Crime within the United States.
Submitted 13 July, 2016; v1 submitted 28 March, 2012;
originally announced March 2012.