-
MARS via LASSO
Authors:
Dohyeong Ki,
Billy Fang,
Adityanand Guntuboyina
Abstract:
Multivariate adaptive regression splines (MARS) is a popular method for nonparametric regression introduced by Friedman in 1991. MARS fits simple nonlinear and non-additive functions to regression data. We propose and study a natural lasso variant of the MARS method. Our method is based on least squares estimation over a convex class of functions obtained by considering infinite-dimensional linear combinations of functions in the MARS basis and imposing a variation-based complexity constraint. Although our estimator is defined as the solution to an infinite-dimensional optimization problem, it can be computed via finite-dimensional convex optimization. Under a few standard design assumptions, we prove that our estimator achieves a rate of convergence that depends only logarithmically on dimension and thus avoids the usual curse of dimensionality to some extent. We also show that our method is naturally connected to nonparametric estimation techniques based on smoothness constraints. We implement our method with a cross-validation scheme for selecting the tuning parameter and compare it to the usual MARS method in various simulation and real data settings.
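The MARS basis is built from products of univariate hinge functions $(x_j - t_j)_+$. As a rough sketch of the lasso idea, assuming knots on a fixed grid and an ordinary L1 penalty on the coefficients (the paper's infinite-dimensional function class and its variation-based constraint are not reproduced here), the Python below evaluates a product basis function and fits a 1-D hinge expansion by cyclic coordinate descent. All names (`hinge`, `product_basis`, `lasso_cd`) and the toy data are hypothetical.

```python
def hinge(x, t):
    """Positive-part hinge (x - t)_+, the building block of the MARS basis."""
    return max(x - t, 0.0)


def product_basis(x, knots):
    """Product of hinges (x[j] - t_j)_+ over the coordinates j listed in `knots`.

    `knots` maps a coordinate index j to a knot t_j; the number of entries
    is the interaction order of the basis function.
    """
    value = 1.0
    for j, t in knots.items():
        value *= hinge(x[j], t)
    return value


def soft_threshold(z, lam):
    """Proximal operator of lam * |b|."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(features, y, lam, n_iter=200):
    """Cyclic coordinate descent for (1/2n)||y - F b||^2 + lam * ||b||_1 (no intercept)."""
    n, p = len(y), len(features[0])
    beta = [0.0] * p
    residual = list(y)  # y - F @ beta, with beta = 0 initially
    col_sq = [sum(row[k] ** 2 for row in features) / n for k in range(p)]
    for _ in range(n_iter):
        for k in range(p):
            if col_sq[k] == 0.0:
                continue  # an all-zero feature never enters the fit
            # Correlation of feature k with the partial residual that excludes it.
            rho = sum(features[i][k] * residual[i] for i in range(n)) / n + col_sq[k] * beta[k]
            new_value = soft_threshold(rho, lam) / col_sq[k]
            delta = new_value - beta[k]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= features[i][k] * delta
                beta[k] = new_value
    return beta


# Toy 1-D example: the target is a single hinge at 0.5.
xs = [i / 20 for i in range(21)]
ys = [2.0 * hinge(x, 0.5) for x in xs]
knot_grid = [0.0, 0.25, 0.5, 0.75]
F = [[hinge(x, t) for t in knot_grid] for x in xs]
beta = lasso_cd(F, ys, lam=1e-3)
```

A large `lam` drives every coefficient to exactly zero, which is how the L1 penalty selects a sparse subset of hinge functions.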
Submitted 13 October, 2024; v1 submitted 23 November, 2021;
originally announced November 2021.
-
Neural Network Compression Via Sparse Optimization
Authors:
Tianyi Chen,
Bo Ji,
Yixin Shi,
Tianyu Ding,
Biyi Fang,
Sheng Yi,
Xiao Tu
Abstract:
The compression of deep neural networks (DNNs) to reduce inference cost has become increasingly important for meeting the deployment requirements of many real-world applications. There has been a significant amount of work on network compression, but most of it relies on heuristic rules or is not easily adapted to different scenarios. On the other hand, sparse optimization, which yields sparse solutions, naturally fits the compression requirement; however, because sparse optimization has received limited study in stochastic learning, its extension and application to model compression remain largely unexplored. In this work, we propose a model compression framework based on recent progress in sparse stochastic optimization. Compared to existing model compression techniques, our method is effective, requires less extra engineering effort to incorporate into different applications, and has been demonstrated numerically on benchmark compression tasks. In particular, compared to the baseline heavy models, we achieve up to 7.2 and 2.9 times FLOPs reduction with the same level of evaluation accuracy on VGG16 for CIFAR10 and ResNet50 for ImageNet, respectively.
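The abstract does not spell out the optimizer. As a swapped-in illustration of why sparse optimization produces compressible models, the sketch below performs one proximal stochastic gradient step for a loss plus an L1 penalty: a plain gradient step followed by soft-thresholding, which sets small weights exactly to zero so they can be pruned. The names `prox_sgd_step` and `sparsity`, and the toy weights and gradients, are hypothetical; the paper's actual method may use a different regularizer or update rule.

```python
def soft_threshold(w, tau):
    """Proximal operator of tau * |w|: shrink toward 0, exactly 0 inside [-tau, tau]."""
    if w > tau:
        return w - tau
    if w < -tau:
        return w + tau
    return 0.0


def prox_sgd_step(weights, grads, lr, lam):
    """One proximal SGD step for loss(w) + lam * ||w||_1."""
    return [soft_threshold(w - lr * g, lr * lam) for w, g in zip(weights, grads)]


def sparsity(weights):
    """Fraction of weights that are exactly zero (prunable)."""
    return sum(1 for w in weights if w == 0.0) / len(weights)


weights = [0.8, -0.03, 0.02, -1.2, 0.004]
grads = [0.1, 0.0, 0.0, -0.2, 0.0]  # stand-in for a minibatch gradient
new_weights = prox_sgd_step(weights, grads, lr=0.1, lam=0.5)
```

Here the threshold is `lr * lam = 0.05`, so the three small weights become exactly zero while the large ones are only shrunk.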
Submitted 11 November, 2020; v1 submitted 9 November, 2020;
originally announced November 2020.
-
TensorFI: A Flexible Fault Injection Framework for TensorFlow Applications
Authors:
Zitao Chen,
Niranjhana Narayanan,
Bo Fang,
Guanpeng Li,
Karthik Pattabiraman,
Nathan DeBardeleben
Abstract:
As machine learning (ML) has seen increasing adoption in safety-critical domains (e.g., autonomous vehicles), the reliability of ML systems has also grown in importance. While prior studies have proposed efficient error-resilience techniques (e.g., selective instruction duplication), a fundamental requirement for realizing these techniques is a detailed understanding of the application's resilience.
In this work, we present TensorFI, a high-level fault injection (FI) framework for TensorFlow-based applications. TensorFI is able to inject both hardware and software faults in general TensorFlow programs. TensorFI is a configurable FI tool that is flexible, easy to use, and portable. It can be integrated into existing TensorFlow programs to assess their resilience for different fault types (e.g., faults in particular operators). We use TensorFI to evaluate the resilience of 12 ML programs, including DNNs used in the autonomous vehicle domain. Our tool is publicly available at https://github.com/DependableSystemsLab/TensorFI.
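TensorFI's own configuration format and API are not described in this abstract. To illustrate what a single-bit hardware fault model typically does to a tensor value, here is a hypothetical helper `flip_bit` that flips one bit of an IEEE 754 float32 using Python's standard `struct` module, plus a hypothetical `inject_fault` that corrupts one element of a flat list of activations. Neither is part of TensorFI.

```python
import struct


def flip_bit(value, bit):
    """Return `value` (as float32) with bit `bit` flipped (0 = LSB, 31 = sign bit)."""
    if not 0 <= bit < 32:
        raise ValueError("bit must be in [0, 31]")
    # Reinterpret the float32 bit pattern as an unsigned 32-bit integer.
    (as_int,) = struct.unpack("<I", struct.pack("<f", value))
    (flipped,) = struct.unpack("<f", struct.pack("<I", as_int ^ (1 << bit)))
    return flipped


def inject_fault(tensor, index, bit):
    """Copy a flat list of activations and corrupt a single element."""
    corrupted = list(tensor)
    corrupted[index] = flip_bit(corrupted[index], bit)
    return corrupted


activations = [0.5, 1.0, 2.0]
faulty = inject_fault(activations, index=1, bit=30)  # high exponent bit
```

Flipping a high exponent bit turns `1.0` into infinity, which is why faults in exponent bits tend to dominate resilience studies, while low mantissa bits usually cause negligible changes.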
Submitted 3 April, 2020;
originally announced April 2020.
-
Drug dissemination strategy with an SEIR-based SUC model
Authors:
Boyue Fang,
Yutong Feng
Abstract:
Based on the characteristics of drug addiction, this paper constructs an SEIR-based SUC model to describe and predict the spread of drug addiction. The model predicts that the number of drug addiction cases will continue to fluctuate with decreasing amplitude and eventually stabilize. To trace the sources of heroin, we identified the most likely origins of drugs in Philadelphia, PA; Cuyahoga and Hamilton, OH; Jefferson, KY; Kanawha, WV; and Bedford, VA. Based on these findings, we advise concentrating on the spread of Oxycodone, Hydrocodone, Heroin, and Buprenorphine; in particular, drug transmission in Ohio and Pennsylvania requires attention. According to the propagation curve predicted by our model, transmission in KY is still in its early stage, VA and WV are at the midpoint, and OH and PA are in the later stages. As a result, the number of drug addiction cases in KY, OH, and VA is projected to increase within three years. Methodologically, principal component analysis was used to filter 22 socio-economic variables related to the continued use of opioid drugs, among which the 'Relationship' part deserves particular attention.
Based on these variables, 464 counties were grouped into three clusters using the K-means algorithm. Specific actions to combat the opioid crisis are discussed in the sensitivity analysis section. The modeling and analysis indicate that innovative measures are required to control addiction and to promote anti-drug news campaigns. This section also verifies the effectiveness of the model when $d_1 < 0.2$; $r_1, r_2, r_3 < 0.3$; and $15 < \beta_1, \beta_2, \beta_3 < 25$. If these bounds are exceeded, the number of drug addiction cases may surge and peak within a short period.
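The SUC compartments and the parameters $d_1$, $r_i$, $\beta_i$ are not defined in this abstract, so the sketch below uses the textbook SEIR model instead: forward-Euler integration of susceptible, exposed, infected, and recovered population fractions with hypothetical rates `beta` (transmission), `sigma` (E to I), and `gamma` (I to R). It only illustrates how a propagation curve of this kind is produced and where its peak falls; it is not the authors' model.

```python
def seir_step(s, e, i, r, beta, sigma, gamma, dt):
    """One forward-Euler step of the SEIR equations (population fractions)."""
    new_exposed = beta * s * i * dt   # S -> E via contact with infected
    new_infected = sigma * e * dt     # E -> I after incubation
    new_recovered = gamma * i * dt    # I -> R
    return (
        s - new_exposed,
        e + new_exposed - new_infected,
        i + new_infected - new_recovered,
        r + new_recovered,
    )


def simulate(days, beta=0.5, sigma=0.2, gamma=0.1, i0=0.01, dt=0.1):
    """Return one (S, E, I, R) tuple per day, starting from day 0."""
    s, e, i, r = 1.0 - i0, 0.0, i0, 0.0
    steps_per_day = round(1 / dt)
    history = [(s, e, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            s, e, i, r = seir_step(s, e, i, r, beta, sigma, gamma, dt)
        history.append((s, e, i, r))
    return history


curve = simulate(200)
peak_day = max(range(len(curve)), key=lambda d: curve[d][2])
```

Locating where a region currently sits on such a curve (rising, near the peak, or declining) is the kind of early/middle/late-stage reading the abstract describes.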
Submitted 19 January, 2020; v1 submitted 29 November, 2019;
originally announced December 2019.
-
Reinforcement Learning from Imperfect Demonstrations under Soft Expert Guidance
Authors:
Mingxuan Jing,
Xiaojian Ma,
Wenbing Huang,
Fuchun Sun,
Chao Yang,
Bin Fang,
Huaping Liu
Abstract:
In this paper, we study Reinforcement Learning from Demonstrations (RLfD), which improves the exploration efficiency of Reinforcement Learning (RL) by providing expert demonstrations. Most existing RLfD methods require demonstrations to be perfect and sufficient, which is unrealistic in practice. To handle imperfect demonstrations, we first formally define an imperfect expert setting for RLfD, and then point out that previous methods suffer from two issues, concerning optimality and convergence, respectively. Based on our theoretical findings, we tackle these two issues by treating the expert guidance as a soft constraint that regulates the agent's policy exploration, which leads to a constrained optimization problem. We further show that this problem can be solved efficiently by performing a local linear search on its dual form. Extensive empirical evaluations on a comprehensive collection of benchmarks indicate that our method attains consistent improvement over other RLfD counterparts.
Submitted 23 November, 2019; v1 submitted 16 November, 2019;
originally announced November 2019.
-
HM-NAS: Efficient Neural Architecture Search via Hierarchical Masking
Authors:
Shen Yan,
Biyi Fang,
Faen Zhang,
Yu Zheng,
Xiao Zeng,
Hui Xu,
Mi Zhang
Abstract:
The use of automatic methods, often referred to as Neural Architecture Search (NAS), in designing neural network architectures has recently drawn considerable attention. In this work, we present an efficient NAS approach, named HM-NAS, that generalizes existing weight-sharing-based NAS approaches. Existing weight-sharing-based NAS approaches still rely on hand-designed heuristics to generate architecture candidates. As a consequence, the space of architecture candidates is constrained to a subset of all possible architectures, making the architecture search results sub-optimal. HM-NAS addresses this limitation via two innovations. First, HM-NAS incorporates a multi-level architecture encoding scheme to enable searching for more flexible network architectures. Second, it discards the hand-designed heuristics and incorporates a hierarchical masking scheme that automatically learns and determines the optimal architecture. Compared to state-of-the-art weight-sharing-based approaches, HM-NAS achieves better architecture search performance and competitive model evaluation accuracy. Without the constraint imposed by the hand-designed heuristics, our searched networks contain more flexible and meaningful architectures that existing weight-sharing-based NAS approaches are unable to discover.
Submitted 7 September, 2019; v1 submitted 31 August, 2019;
originally announced September 2019.
-
Achieving Fairness in Determining Medicaid Eligibility through Fairgroup Construction
Authors:
Boli Fang,
Miao Jiang,
Jerry Shen
Abstract:
As effective complements to human judgment, artificial intelligence techniques have started to aid human decisions on complex social problems around the world. In the United States, for instance, automated ML/DL classification models complement human decisions in determining Medicaid eligibility. However, given the limitations in ML/DL model design, these algorithms may fail to leverage various factors for decision making, resulting in improper decisions that allocate resources to individuals who may not be most in need. To address this issue, we propose in this paper the method of \textit{fairgroup construction}, based on the legal doctrine of \textit{disparate impact}, to improve the fairness of regressive classifiers. Experiments on the American Community Survey dataset demonstrate that our method can be easily adapted to a variety of regressive classification models to boost their fairness in deciding Medicaid eligibility, while maintaining high levels of classification accuracy.
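The fairgroup construction itself is not detailed in the abstract. The disparate impact doctrine it builds on is commonly quantified by the disparate impact ratio: the positive-decision rate of the unprivileged group divided by that of the privileged group, with values below 0.8 often flagged under the "four-fifths rule". A minimal sketch, with hypothetical function names and toy data:

```python
def selection_rate(decisions):
    """Fraction of positive (1) decisions."""
    return sum(decisions) / len(decisions)


def disparate_impact_ratio(decisions, groups, unprivileged, privileged):
    """Selection rate of the unprivileged group over that of the privileged group."""
    unpriv = [d for d, g in zip(decisions, groups) if g == unprivileged]
    priv = [d for d, g in zip(decisions, groups) if g == privileged]
    priv_rate = selection_rate(priv)
    if priv_rate == 0.0:
        raise ValueError("privileged group has no positive decisions")
    return selection_rate(unpriv) / priv_rate


decisions = [1, 0, 0, 0, 1, 1, 1, 0]          # 1 = deemed eligible
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
ratio = disparate_impact_ratio(decisions, groups, unprivileged="a", privileged="b")
```

Here group "a" is approved at 25% and group "b" at 75%, giving a ratio of 1/3, well below the 0.8 threshold.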
Submitted 31 May, 2019;
originally announced June 2019.
-
Convergence Analyses of Online ADAM Algorithm in Convex Setting and Two-Layer ReLU Neural Network
Authors:
Biyi Fang,
Diego Klabjan
Abstract:
Online learning is an appealing learning paradigm of great practical interest due to the recent emergence of large-scale applications such as online advertising placement and online web ranking. Standard online learning assumes a finite number of samples, while in practice data is streamed infinitely. In such a setting, gradient descent with a diminishing learning rate does not work. We first introduce regret with rolling window, a new performance metric for online streaming learning, which measures the performance of an algorithm on every fixed number of contiguous samples. We then propose a family of algorithms based on gradient descent with a constant or adaptive learning rate and provide detailed technical analyses establishing regret bounds for these algorithms. In the convex setting, we show regret of the order of the square root of the window size in both the constant and dynamic learning rate scenarios. Our proof also applies to the standard online setting, where we provide the first analysis of the same regret order (the previous proofs have flaws). We also study a two-layer neural network setting with ReLU activation. In this case, we establish that if the initial weights are close to a stationary point, the same square-root regret bound is attainable. We conduct computational experiments demonstrating the superior performance of the proposed algorithms.
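The precise definition of regret with rolling window is in the paper. As a sketch under stated assumptions, take 1-D quadratic losses $f_t(x) = (x - a_t)^2/2$, run online gradient descent with a constant learning rate, and for each window of `w` contiguous rounds compare the algorithm's cumulative loss with that of the best fixed point for that window (the window mean, for these losses). Reporting the maximum over windows is our assumption, plain gradient descent stands in for the paper's ADAM-based algorithms, and the names `ogd` and `rolling_window_regret` are hypothetical.

```python
def loss(x, a):
    """Quadratic loss f_t(x) = (x - a_t)^2 / 2."""
    return 0.5 * (x - a) ** 2


def ogd(targets, lr, x0=0.0):
    """Online gradient descent: play x_t, observe a_t, then step on f_t."""
    played, x = [], x0
    for a in targets:
        played.append(x)
        x -= lr * (x - a)  # gradient of (x - a)^2 / 2
    return played


def rolling_window_regret(played, targets, w):
    """Max over windows of (algorithm loss - best fixed-point loss) on w contiguous rounds."""
    if not 1 <= w <= len(targets):
        raise ValueError("window size must be between 1 and the stream length")
    worst = float("-inf")
    for start in range(len(targets) - w + 1):
        window = targets[start:start + w]
        best = sum(window) / w  # minimizer of the summed quadratic losses
        alg_loss = sum(loss(x, a) for x, a in zip(played[start:start + w], window))
        best_loss = sum(loss(best, a) for a in window)
        worst = max(worst, alg_loss - best_loss)
    return worst


stream = [1.0, -1.0] * 50  # an unending stream, truncated for the example
regret = rolling_window_regret(ogd(stream, lr=0.1), stream, w=10)
```

Because the comparator is re-chosen per window, this metric stays meaningful on an unbounded stream, unlike cumulative regret over the whole horizon.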
Submitted 25 November, 2019; v1 submitted 22 May, 2019;
originally announced May 2019.
-
Multivariate extensions of isotonic regression and total variation denoising via entire monotonicity and Hardy-Krause variation
Authors:
Billy Fang,
Adityanand Guntuboyina,
Bodhisattva Sen
Abstract:
We consider the problem of nonparametric regression when the covariate is $d$-dimensional, where $d \geq 1$. In this paper we introduce and study two nonparametric least squares estimators (LSEs) in this setting: the entirely monotonic LSE and the constrained Hardy-Krause variation LSE. We show that these two LSEs are natural generalizations of univariate isotonic regression and univariate total variation denoising, respectively, to multiple dimensions. We discuss the characterization and computation of these two LSEs obtained from $n$ data points. We provide a detailed study of their risk properties under the squared error loss and fixed uniform lattice design. We show that the finite sample risk of these LSEs is always bounded from above by $n^{-2/3}$ modulo logarithmic factors depending on $d$; thus these nonparametric LSEs avoid the curse of dimensionality to some extent. We also prove nearly matching minimax lower bounds. Further, we illustrate that these LSEs are particularly useful in fitting rectangular piecewise constant functions. Specifically, we show that the risk of the entirely monotonic LSE is almost parametric (at most $1/n$ up to logarithmic factors) when the true function is well-approximable by a rectangular piecewise constant entirely monotone function with not too many constant pieces. A similar result is also shown to hold for the constrained Hardy-Krause variation LSE for a simple subclass of rectangular piecewise constant functions. We believe that the proposed LSEs yield a novel approach to estimating multivariate functions using convex optimization that avoids the curse of dimensionality to some extent.
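For reference, the univariate case that the entirely monotonic LSE generalizes, isotonic regression, can be computed exactly with the pool-adjacent-violators algorithm (PAVA). The sketch below is standard PAVA with unit weights on an ordered 1-D design; it does not implement the multivariate entirely monotonic or Hardy-Krause variation LSEs.

```python
def pava(y):
    """Least-squares nondecreasing fit to the sequence y (unit weights)."""
    blocks = []  # each block is [mean, size]
    for value in y:
        blocks.append([float(value), 1])
        # Pool adjacent blocks while they violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, n2 = blocks.pop()
            m1, n1 = blocks.pop()
            n = n1 + n2
            blocks.append([(m1 * n1 + m2 * n2) / n, n])
    fit = []
    for mean, size in blocks:
        fit.extend([mean] * size)
    return fit


print(pava([1, 3, 2, 4]))
```

The fitted values are piecewise constant, which is the univariate analogue of the rectangular piecewise constant fits discussed in the abstract.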
Submitted 9 June, 2020; v1 submitted 4 March, 2019;
originally announced March 2019.
-
Enhancing Stock Market Prediction with Extended Coupled Hidden Markov Model over Multi-Sourced Data
Authors:
Xi Zhang,
Yixuan Li,
Senzhang Wang,
Binxing Fang,
Philip S. Yu
Abstract:
Traditional stock market prediction methods commonly use only historical trading data, ignoring the fact that stock market fluctuations can be affected by various other information sources, such as stock-related events. Although some recent works propose event-driven prediction approaches that consider event data, how to leverage the joint impact of multiple data sources remains an open research problem. In this work, we study how to exploit multiple data sources to improve stock prediction performance. We introduce an Extended Coupled Hidden Markov Model that incorporates news events together with historical trading data. To address the sparsity of news events for each individual stock, we further study the fluctuation correlations between stocks and incorporate these correlations into the model to facilitate prediction. Evaluations on 2016 China A-share market data show the superior performance of our model over previous methods.
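The extended coupled HMM is not specified in the abstract. As background only, the standard discrete HMM forward algorithm, which computes the likelihood of an observation sequence by summing over all hidden-state paths, is sketched below; coupled variants extend this recursion across several interacting chains. The two-state parameters are made-up placeholders.

```python
def forward(obs, start, trans, emit):
    """Return P(obs) for a discrete HMM.

    start[i]    = P(z_1 = i)
    trans[i][j] = P(z_{t+1} = j | z_t = i)
    emit[i][o]  = P(x_t = o | z_t = i)
    """
    if not obs:
        raise ValueError("observation sequence must be non-empty")
    n = len(start)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        # alpha_t(j) = sum_i alpha_{t-1}(i) * trans[i][j] * emit[j][o]
        alpha = [sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][o] for j in range(n)]
    return sum(alpha)


# Hypothetical two-state model with binary observations (e.g. price down/up).
start = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.9, 0.1], [0.2, 0.8]]
likelihood = forward([0, 1, 1], start, trans, emit)
```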
Submitted 2 September, 2018;
originally announced September 2018.
-
Product-based Neural Networks for User Response Prediction over Multi-field Categorical Data
Authors:
Yanru Qu,
Bohui Fang,
Weinan Zhang,
Ruiming Tang,
Minzhe Niu,
Huifeng Guo,
Yong Yu,
Xiuqiang He
Abstract:
User response prediction is a crucial component of personalized information retrieval and filtering scenarios, such as recommender systems and web search. The data in user response prediction is mostly in a multi-field categorical format and is transformed into sparse representations via one-hot encoding. Due to sparsity problems in representation and optimization, most research focuses on feature engineering and shallow modeling. Recently, deep neural networks have attracted research attention on this problem for their high capacity and end-to-end training scheme. In this paper, we study user response prediction in the scenario of click prediction. We first analyze a coupled gradient issue in latent vector-based models and propose kernel product to learn field-aware feature interactions. Then we discuss an insensitive gradient issue in DNN-based models and propose the Product-based Neural Network (PNN), which adopts a feature extractor to explore feature interactions. Generalizing the kernel product to a net-in-net architecture, we further propose Product-network In Network (PIN), which can generalize previous models. Extensive experiments on 4 industrial datasets and 1 contest dataset demonstrate that our models consistently outperform 8 baselines on both AUC and log loss. In addition, PIN achieves a large CTR improvement (34.67% relative) in an online A/B test.
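As an illustration of the product-layer idea, here is a sketch under assumptions: each categorical field is mapped to a learned embedding, and an inner-product layer forms all pairwise inner products between field embeddings, which are concatenated with the flattened embeddings before the fully connected layers (roughly the inner-product variant of PNN). The toy embeddings and the names `inner_product_layer` and `product_layer_output` are hypothetical; the kernel-product and PIN variants are not shown.

```python
from itertools import combinations


def inner_product_layer(embeddings):
    """Pairwise inner products <e_i, e_j> for all field pairs i < j."""
    return [sum(a * b for a, b in zip(ei, ej)) for ei, ej in combinations(embeddings, 2)]


def product_layer_output(embeddings):
    """Concatenate the flattened embeddings (linear part) with pairwise products."""
    linear = [x for e in embeddings for x in e]
    return linear + inner_product_layer(embeddings)


# Three fields (e.g. user, ad, site) with 2-dimensional embeddings.
fields = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
features = product_layer_output(fields)
```

With $N$ fields of dimension $k$, the output has $Nk + N(N-1)/2$ entries, so the explicit pairwise interactions grow quadratically in the number of fields.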
Submitted 1 July, 2018;
originally announced July 2018.
-
A Stochastic Large-scale Machine Learning Algorithm for Distributed Features and Observations
Authors:
Biyi Fang,
Diego Klabjan
Abstract:
As the size of modern data sets exceeds the disk and memory capacities of a single computer, machine learning practitioners have resorted to parallel and distributed computing. Given that optimization is one of the pillars of machine learning and predictive modeling, distributed optimization methods have recently garnered ample attention, in particular when either observations or features are distributed, but not both. We propose a general stochastic algorithm in which observations, features, and gradient components can be sampled in a doubly distributed setting, i.e., with both features and observations distributed. Detailed technical analyses establish convergence properties of the algorithm under different conditions on the learning rate (diminishing to zero or constant). Computational experiments in Spark demonstrate superior performance of our algorithm versus a benchmark in early iterations, which is attributable to the stochastic components of the algorithm.
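A single-process sketch of the double sampling idea, under stated assumptions: a least-squares loss, a random subset of observations (rows) and of features (columns) drawn at every iteration, and a gradient step only on the sampled coordinates. The actual algorithm runs on Spark with data partitioned across machines; the names `doubly_stochastic_sgd`, `predict`, `mse`, and the toy data are hypothetical.

```python
import random


def predict(row, w):
    return sum(x * wk for x, wk in zip(row, w))


def mse(X, y, w):
    return sum((predict(row, w) - t) ** 2 for row, t in zip(X, y)) / len(y)


def doubly_stochastic_sgd(X, y, lr=0.05, rows_per_step=4, cols_per_step=2, steps=2000, seed=0):
    """Least-squares SGD that samples both observations (rows) and features (columns).

    Only the sampled coordinates of w are updated, using a gradient
    estimated from the sampled rows.
    """
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(steps):
        rows = rng.sample(range(n), rows_per_step)
        cols = rng.sample(range(d), cols_per_step)
        # Residuals use the full current w; in a doubly distributed setting these
        # would be assembled from partial inner products held by feature partitions.
        residual = {i: predict(X[i], w) - y[i] for i in rows}
        for k in cols:
            grad_k = 2.0 * sum(X[i][k] * residual[i] for i in rows) / rows_per_step
            w[k] -= lr * grad_k
    return w


data_rng = random.Random(1)
X = [[data_rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(40)]
w_true = [1.0, -2.0, 0.5]  # hypothetical ground truth for the toy data
y = [predict(row, w_true) for row in X]
w_hat = doubly_stochastic_sgd(X, y)
```

Each iteration touches only a few rows and columns, which is what keeps per-iteration communication low when both dimensions are partitioned.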
Submitted 8 December, 2019; v1 submitted 29 March, 2018;
originally announced March 2018.