TY - JOUR
T1 - Estimating non-overfitted convex production technologies
T2 - A stochastic machine learning approach
AU - Guillen, Maria D.
AU - Charles, Vincent
AU - Aparicio, Juan
N1 - Publisher Copyright:
© 2024 The Author(s)
PY - 2025/5/16
Y1 - 2025/5/16
N2 - Overfitting is a classical statistical issue that occurs when a model fits a particular observed data sample too closely, potentially limiting its generalizability. While Data Envelopment Analysis (DEA) is a powerful non-parametric method for assessing the relative efficiency of decision-making units (DMUs), its reliance on the minimal extrapolation principle can lead to concerns about overfitting, particularly when the goal extends beyond evaluating the specific DMUs in the sample to making broader inferences. In this paper, we propose an adaptation of Stochastic Gradient Boosting to estimate production possibility sets that mitigate overfitting while satisfying shape constraints such as convexity and free disposability. Our approach is not intended to replace DEA but to complement it, offering an additional tool for scenarios where generalization is important. Through simulation experiments, we demonstrate that the proposed method performs well compared to DEA, especially in high-dimensional settings. Furthermore, the new machine learning-based technique is compared to the Corrected Concave Non-parametric Least Squares (C2NLS), showing competitive performance. We also illustrate how the usual efficiency measures in DEA can be implemented under our approach. Finally, we provide an empirical example based on data from the Program for International Student Assessment (PISA) to demonstrate the applicability of the new method.
AB - Overfitting is a classical statistical issue that occurs when a model fits a particular observed data sample too closely, potentially limiting its generalizability. While Data Envelopment Analysis (DEA) is a powerful non-parametric method for assessing the relative efficiency of decision-making units (DMUs), its reliance on the minimal extrapolation principle can lead to concerns about overfitting, particularly when the goal extends beyond evaluating the specific DMUs in the sample to making broader inferences. In this paper, we propose an adaptation of Stochastic Gradient Boosting to estimate production possibility sets that mitigate overfitting while satisfying shape constraints such as convexity and free disposability. Our approach is not intended to replace DEA but to complement it, offering an additional tool for scenarios where generalization is important. Through simulation experiments, we demonstrate that the proposed method performs well compared to DEA, especially in high-dimensional settings. Furthermore, the new machine learning-based technique is compared to the Corrected Concave Non-parametric Least Squares (C2NLS), showing competitive performance. We also illustrate how the usual efficiency measures in DEA can be implemented under our approach. Finally, we provide an empirical example based on data from the Program for International Student Assessment (PISA) to demonstrate the applicability of the new method.
KW - Data Envelopment Analysis
KW - Machine learning
KW - Stochastic gradient boosting
KW - Technical efficiency measurement
UR - http://www.scopus.com/inward/record.url?scp=85211200206&partnerID=8YFLogxK
U2 - 10.1016/j.ejor.2024.11.030
DO - 10.1016/j.ejor.2024.11.030
M3 - Article
AN - SCOPUS:85211200206
SN - 0377-2217
VL - 323
SP - 224
EP - 240
JO - European Journal of Operational Research
JF - European Journal of Operational Research
IS - 1
ER -