-
A Theoretical Understanding of Chain-of-Thought: Coherent Reasoning and Error-Aware Demonstration
Authors:
Yingqian Cui,
Pengfei He,
Xianfeng Tang,
Qi He,
Chen Luo,
Jiliang Tang,
Yue Xing
Abstract:
Few-shot Chain-of-Thought (CoT) prompting has demonstrated strong performance in improving the reasoning capabilities of large language models (LLMs). While theoretical investigations have been conducted to understand CoT, the underlying transformer used in these studies isolates the CoT reasoning process into separated in-context learning steps (Stepwise ICL). In this work, we theoretically show that, compared to Stepwise ICL, the transformer gains better error correction ability and more accurate predictions if the reasoning from earlier steps (Coherent CoT) is integrated. Given that this coherent reasoning changes the behavior of the transformer, we further investigate the sensitivity of the transformer with Coherent CoT when the demonstration examples are corrupted at the inference stage. Our theoretical results indicate that the transformer is more sensitive to errors in intermediate reasoning steps than the final outcome. Building upon this observation, we propose an improvement on CoT by incorporating both correct and incorrect reasoning paths in the demonstration. Our experiments validate the effectiveness of the proposed approach.
Submitted 21 October, 2024;
originally announced October 2024.
-
Exploring Scaling Laws for Local SGD in Large Language Model Training
Authors:
Qiaozhi He,
Xiaomin Zhuang,
Zhihua Wu
Abstract:
This paper investigates scaling laws for local SGD in LLM training, a distributed optimization algorithm that facilitates training on loosely connected devices. Through extensive experiments, we show that local SGD achieves competitive results compared to conventional methods, given equivalent model parameters, datasets, and computational resources. Furthermore, we explore the application of local SGD in various practical scenarios, including multi-cluster setups and edge computing environments. Our findings elucidate the necessary conditions for effective multi-cluster LLM training and examine the potential and limitations of leveraging edge computing resources in the LLM training process. This demonstrates its viability as an alternative to single large-cluster training.
Submitted 20 September, 2024;
originally announced September 2024.
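The local SGD scheme named in the abstract above can be illustrated with a toy sketch: each worker takes several SGD steps on its own data, and the workers communicate only once per round by averaging their models. The scalar loss 0.5 * (w - x)^2, the function name `local_sgd`, and all hyperparameters are assumptions chosen for illustration, not the setup used in the paper.

```python
import random


def local_sgd(worker_data, rounds=20, local_steps=5, lr=0.1, seed=0):
    """Toy local SGD: estimate a scalar w minimizing the average of 0.5 * (w - x)^2."""
    rng = random.Random(seed)
    w_global = 0.0
    for _ in range(rounds):
        local_models = []
        for data in worker_data:
            w = w_global  # every worker starts the round from the shared model
            for _ in range(local_steps):
                x = rng.choice(data)   # sample one local data point
                w -= lr * (w - x)      # SGD step; the gradient of 0.5*(w-x)^2 is (w-x)
            local_models.append(w)
        # Communication happens once per round: average the local models.
        w_global = sum(local_models) / len(local_models)
    return w_global


# Two hypothetical workers whose data are centered at 1.0 and 3.0.
workers = [[0.9, 1.0, 1.1], [2.9, 3.0, 3.1]]
w_hat = local_sgd(workers)
print(w_hat)  # lands near the global mean of 2.0, up to sampling noise
```

Raising `local_steps` reduces how often the workers synchronize, which is the property that makes this family of methods attractive on loosely connected devices.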
-
Nonlinear Regression Analysis
Authors:
Hsin-Hsiung Huang,
Qing He
Abstract:
Nonlinear regression analysis is a popular and important tool for scientists and engineers. In this article, we introduce theories and methods of nonlinear regression and its statistical inference using frequentist and Bayesian statistical modeling and computation. Least squares with the Gauss-Newton method is the most widely used approach to parameter estimation. Under the assumption of normally distributed errors, maximum likelihood estimation is equivalent to least squares estimation. The Wald confidence regions for parameters in a nonlinear regression model are affected by the curvatures in the mean function. Furthermore, we introduce the Newton-Raphson method and the generalized least squares method to deal with variance heterogeneity. Examples of simulation data analysis are provided to illustrate important properties of confidence regions and the statistical inferences using the nonlinear least squares estimation and Bayesian inference.
Submitted 7 February, 2024;
originally announced February 2024.
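As a rough illustration of the Gauss-Newton least squares approach mentioned in the abstract above, the sketch below fits a two-parameter exponential mean function by repeatedly solving the 2x2 normal equations. The model y = a * exp(b * x), the function name `gauss_newton_exp`, the noise-free data, and the starting values are all assumptions made for this example; they are not taken from the article.

```python
import math


def gauss_newton_exp(xs, ys, a, b, max_iter=50, tol=1e-10):
    """Fit y = a * exp(b * x) by undamped Gauss-Newton, starting from (a, b)."""
    for _ in range(max_iter):
        e = [math.exp(b * x) for x in xs]
        r = [y - a * ei for y, ei in zip(ys, e)]    # residuals at (a, b)
        ja = e                                      # d f / d a
        jb = [a * x * ei for x, ei in zip(xs, e)]   # d f / d b
        # Normal equations (J^T J) delta = J^T r, solved in closed form (2x2).
        s11 = sum(v * v for v in ja)
        s12 = sum(u * v for u, v in zip(ja, jb))
        s22 = sum(v * v for v in jb)
        g1 = sum(u * v for u, v in zip(ja, r))
        g2 = sum(u * v for u, v in zip(jb, r))
        det = s11 * s22 - s12 * s12
        da = (s22 * g1 - s12 * g2) / det
        db = (s11 * g2 - s12 * g1) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < tol:
            break
    return a, b


# Hypothetical noise-free data generated from a = 2.0, b = -0.5.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * math.exp(-0.5 * x) for x in xs]
a_hat, b_hat = gauss_newton_exp(xs, ys, a=1.5, b=-0.4)
print(a_hat, b_hat)
```

This plain version has no step-size control, so it can fail from a poor starting point; practical implementations usually add damping or a line search.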
-
A Framework of Zero-Inflated Bayesian Negative Binomial Regression Models For Spatiotemporal Data
Authors:
Qing He,
Hsin-Hsiung Huang
Abstract:
Spatiotemporal data analysis with massive zeros is widely used in many areas such as epidemiology and public health. We use a Bayesian framework to fit zero-inflated negative binomial models and employ a set of latent variables from Pólya-Gamma distributions to derive an efficient Gibbs sampler. The proposed model accommodates varying spatial and temporal random effects through Gaussian process priors, which have both the simplicity and flexibility in modeling nonlinear relationships through a covariance function. To conquer the computation bottleneck that GPs may suffer when the sample size is large, we adopt the nearest-neighbor GP approach that approximates the covariance matrix using local experts. For the simulation study, we adopt multiple settings with varying sizes of spatial locations to evaluate the performance of the proposed model such as spatial and temporal random effects estimation and compare the result to other methods. We also apply the proposed model to the COVID-19 death counts in the state of Florida, USA from 3/25/2020 through 7/29/2020 to examine relationships between social vulnerability and COVID-19 deaths.
Submitted 6 February, 2024;
originally announced February 2024.
-
Generalizing the intention-to-treat effect of an active control against placebo from historical placebo-controlled trials to an active-controlled trial: A case study of the efficacy of daily oral TDF/FTC in the HPTN 084 study
Authors:
Qijia He,
Fei Gao,
Oliver Dukes,
Sinead Delany-Moretlwe,
Bo Zhang
Abstract:
In many clinical settings, an active-controlled trial design (e.g., a non-inferiority or superiority design) is often used to compare an experimental medicine to an active control (e.g., an FDA-approved, standard therapy). One prominent example is a recent phase 3 efficacy trial, HIV Prevention Trials Network Study 084 (HPTN 084), comparing long-acting cabotegravir, a new HIV pre-exposure prophylaxis (PrEP) agent, to the FDA-approved daily oral tenofovir disoproxil fumarate plus emtricitabine (TDF/FTC) in a population of heterosexual women in 7 African countries. One key complication of interpreting study results in an active-controlled trial like HPTN 084 is that the placebo arm is not present and the efficacy of the active control (and hence the experimental drug) compared to the placebo can only be inferred by leveraging other data sources. In this article, we study statistical inference for the intention-to-treat (ITT) effect of the active control using relevant historical placebo-controlled trials data under the potential outcomes (PO) framework. We highlight the role of adherence and unmeasured confounding, discuss in detail identification assumptions and two modes of inference (point versus partial identification), propose estimators under identification assumptions permitting point identification, and lay out sensitivity analyses needed to relax identification assumptions. We applied our framework to estimating the intention-to-treat effect of daily oral TDF/FTC versus placebo in HPTN 084 using data from an earlier Phase 3, placebo-controlled trial of daily oral TDF/FTC (Partners PrEP).
Submitted 29 December, 2023; v1 submitted 7 April, 2023;
originally announced April 2023.
-
Caught in the Crossfire: Fears of Chinese-American Scientists
Authors:
Yu Xie,
Xihong Lin,
Ju Li,
Qian He,
Junming Huang
Abstract:
The US leadership in science and technology has greatly benefitted from immigrants from other countries, most notably from China in the recent decades. However, feeling the pressure of potential federal investigation since the 2018 launch of the China Initiative under the Trump administration, Chinese-origin scientists in the US now face higher incentives to leave the US and lower incentives to apply for federal grants. Analyzing data pertaining to institutional affiliations of more than 2.3 million scientific papers, we find a steady increase in the return migration of Chinese-origin scientists from the US back to China. We also conducted a survey of Chinese-origin scientists employed by US universities in tenure or tenure-track positions (n=1300), with results revealing general feelings of fear and anxiety that lead them to consider leaving the US and/or stop applying for federal grants.
Submitted 23 September, 2022; v1 submitted 21 September, 2022;
originally announced September 2022.
-
Statistical computation methods for microbiome compositional data network inference
Authors:
Liang Chen,
Qiuyan He,
Hui Wan,
Shun He,
Minghua Deng
Abstract:
Microbes can affect processes from food production to human health. Such microbes are not isolated, but rather interact with each other and establish connections with their living environments. Understanding these interactions is essential to an understanding of the organization and complex interplay of microbial communities, as well as the structure and dynamics of various ecosystems. A common and essential approach toward this objective involves the inference of microbiome interaction networks. Although network inference methods in other fields have been studied before, applying these methods to estimate microbiome associations based on compositional data will not yield valid results. On the one hand, features of microbiome data such as compositionality, sparsity and high-dimensionality challenge the data normalization and the design of computational methods. On the other hand, several issues like microbial community heterogeneity, external environmental interference and biological concerns also make it more difficult to deal with the network inference. In this paper, we provide a comprehensive review of emerging microbiome interaction network inference methods. According to various assumptions and research targets, estimated networks are divided into four main categories: correlation networks, conditional correlation networks, mixture networks and differential networks. Their scope of applications, advantages and limitations are presented in this review. Since real microbial interactions can be complex and dynamic, no unifying method has captured all the aspects of interest to date. In addition, we discuss the challenges now confronting current microbial associations study and future prospects. Finally, we highlight that the research in microbial network inference requires the joint promotion of statistical computation methods and experimental techniques.
Submitted 5 September, 2021;
originally announced September 2021.
-
External Correlates of Adult Digital Problem-Solving Behavior: Log Data Analysis of a Large-Scale Assessment
Authors:
Susu Zhang,
Xueying Tang,
Qiwei He,
Jingchen Liu,
Zhiliang Ying
Abstract:
Using the action sequence data (i.e., log data) from the problem-solving in technology-rich environments assessment on the 2012 Programme for the International Assessment of Adult Competencies survey, the current study examines the associations between adult digital problem-solving behavior and several demographic and cognitive variables. Action sequence features extracted using multidimensional scaling (Tang, Wang, He, Liu, & Ying, 2019) and sequence-to-sequence autoencoders (Tang, Wang, Liu, & Ying, 2019) were used to predict test-taker external characteristics. Features extracted from action sequences were consistently found to contain more information on demographic and cognitive characteristics than final scores. Partial least squares analyses further revealed systematic associations between behavioral patterns and demographic/cognitive characteristics.
Submitted 27 March, 2021;
originally announced March 2021.
-
A Survey on Knowledge Graph-Based Recommender Systems
Authors:
Qingyu Guo,
Fuzhen Zhuang,
Chuan Qin,
Hengshu Zhu,
Xing Xie,
Hui Xiong,
Qing He
Abstract:
To solve the information explosion problem and enhance user experience in various online applications, recommender systems have been developed to model users' preferences. Although numerous efforts have been made toward more personalized recommendations, recommender systems still suffer from several challenges, such as data sparsity and cold start. In recent years, generating recommendations with the knowledge graph as side information has attracted considerable interest. Such an approach can not only alleviate the abovementioned issues for a more accurate recommendation, but also provide explanations for recommended items. In this paper, we conduct a systematic survey of knowledge graph-based recommender systems. We collect recently published papers in this field and summarize them from two perspectives. On the one hand, we investigate the proposed algorithms by focusing on how the papers utilize the knowledge graph for accurate and explainable recommendation. On the other hand, we introduce datasets used in these works. Finally, we propose several potential research directions in this field.
Submitted 27 February, 2020;
originally announced March 2020.
-
Physics-Informed Neural Networks for Multiphysics Data Assimilation with Application to Subsurface Transport
Authors:
QiZhi He,
David Brajas-Solano,
Guzel Tartakovsky,
Alexandre M. Tartakovsky
Abstract:
Data assimilation for parameter and state estimation in subsurface transport problems remains a significant challenge due to the sparsity of measurements, the heterogeneity of porous media, and the high computational cost of forward numerical models. We present a physics-informed deep neural networks (DNNs) machine learning method for estimating space-dependent hydraulic conductivity, hydraulic head, and concentration fields from sparse measurements. In this approach, we employ individual DNNs to approximate the unknown parameters (e.g., hydraulic conductivity) and states (e.g., hydraulic head and concentration) of a physical system, and jointly train these DNNs by minimizing the loss function that consists of the governing equations residuals in addition to the error with respect to measurement data. We apply this approach to assimilate conductivity, hydraulic head, and concentration measurements for joint inversion of the conductivity, hydraulic head, and concentration fields in a steady-state advection--dispersion problem. We study the accuracy of the physics-informed DNN approach with respect to data size, number of variables (conductivity and head versus conductivity, head, and concentration), DNNs size, and DNN initialization during training. We demonstrate that the physics-informed DNNs are significantly more accurate than standard data-driven DNNs when the training set consists of sparse data. We also show that the accuracy of parameter estimation increases as additional variables are inverted jointly.
Submitted 5 December, 2019;
originally announced December 2019.
-
Transfer Learning Toolkit: Primers and Benchmarks
Authors:
Fuzhen Zhuang,
Keyu Duan,
Tongjia Guo,
Yongchun Zhu,
Dongbo Xi,
Zhiyuan Qi,
Qing He
Abstract:
The transfer learning toolkit wraps the codes of 17 transfer learning models and provides integrated interfaces, allowing users to use those models by calling a simple function. It is easy for primary researchers to use this toolkit and to choose proper models for real-world applications. The toolkit is written in Python and distributed under MIT open source license. In this paper, the current state of this toolkit is described and the necessary environment setting and usage are introduced.
Submitted 20 November, 2019;
originally announced November 2019.
-
A Comprehensive Survey on Transfer Learning
Authors:
Fuzhen Zhuang,
Zhiyuan Qi,
Keyu Duan,
Dongbo Xi,
Yongchun Zhu,
Hengshu Zhu,
Hui Xiong,
Qing He
Abstract:
Transfer learning aims at improving the performance of target learners on target domains by transferring the knowledge contained in different but related source domains. In this way, the dependence on a large amount of target domain data can be reduced for constructing target learners. Due to the wide application prospects, transfer learning has become a popular and promising area in machine learning. Although there are already some valuable and impressive surveys on transfer learning, these surveys introduce approaches in a relatively isolated way and lack the recent advances in transfer learning. Due to the rapid expansion of the transfer learning area, it is both necessary and challenging to comprehensively review the relevant studies. This survey attempts to connect and systematize the existing transfer learning research, as well as to summarize and interpret the mechanisms and the strategies of transfer learning in a comprehensive way, which may help readers have a better understanding of the current research status and ideas. Unlike previous surveys, this survey paper reviews more than forty representative transfer learning approaches, especially homogeneous transfer learning approaches, from the perspectives of data and model. The applications of transfer learning are also briefly introduced. In order to show the performance of different transfer learning models, over twenty representative transfer learning models are used for experiments. The models are evaluated on three different datasets, i.e., Amazon Reviews, Reuters-21578, and Office-31. The experimental results demonstrate the importance of selecting appropriate transfer learning models for different applications in practice.
Submitted 23 June, 2020; v1 submitted 6 November, 2019;
originally announced November 2019.
-
Efficient and Adaptive Kernelization for Nonlinear Max-margin Multi-view Learning
Authors:
Changying Du,
Jia He,
Changde Du,
Fuzhen Zhuang,
Qing He,
Guoping Long
Abstract:
Existing multi-view learning methods based on kernel functions either require the user to select and tune a single predefined kernel or have to compute and store many Gram matrices to perform multiple kernel learning. Apart from the huge consumption of manpower, computation and memory resources, most of these models seek point estimation of their parameters, and are prone to overfitting to small training data. This paper presents an adaptive kernel nonlinear max-margin multi-view learning model under the Bayesian framework. Specifically, we regularize the posterior of an efficient multi-view latent variable model by explicitly mapping the latent representations extracted from multiple data views to a random Fourier feature space where max-margin classification constraints are imposed. Assuming these random features are drawn from Dirichlet process Gaussian mixtures, we can adaptively learn shift-invariant kernels from data according to Bochner's theorem. For inference, we employ the data augmentation idea for hinge loss, and design an efficient gradient-based MCMC sampler in the augmented space. Having no need to compute the Gram matrix, our algorithm scales linearly with the size of training set. Extensive experiments on real-world datasets demonstrate that our method has superior performance.
Submitted 11 October, 2019;
originally announced October 2019.
-
Learning beyond Predefined Label Space via Bayesian Nonparametric Topic Modelling
Authors:
Changying Du,
Fuzhen Zhuang,
Jia He,
Qing He,
Guoping Long
Abstract:
In real world machine learning applications, testing data may contain some meaningful new categories that have not been seen in labeled training data. To simultaneously recognize new data categories and assign most appropriate category labels to the data actually from known categories, existing models assume the number of unknown new categories is pre-specified, though it is difficult to determine in advance. In this paper, we propose a Bayesian nonparametric topic model to automatically infer this number, based on the hierarchical Dirichlet process and the notion of latent Dirichlet allocation. Exact inference in our model is intractable, so we provide an efficient collapsed Gibbs sampling algorithm for approximate posterior inference. Extensive experiments on various text data sets show that: (a) compared with parametric approaches that use pre-specified true number of new categories, the proposed nonparametric approach can yield comparable performance; and (b) when the exact number of new categories is unavailable, i.e. the parametric approaches only have a rough idea about the new categories, our approach has evident performance advantages.
Submitted 10 October, 2019;
originally announced October 2019.
-
Deep Learning Detection of Inaccurate Smart Electricity Meters: A Case Study
Authors:
Ming Liu,
Dongpeng Liu,
Guangyu Sun,
Yi Zhao,
Duolin Wang,
Fangxing Liu,
Xiang Fang,
Qing He,
Dong Xu
Abstract:
Detecting inaccurate smart meters and targeting them for replacement can save significant resources. For this purpose, a novel deep-learning method was developed based on long short-term memory (LSTM) and a modified convolutional neural network (CNN) to predict electricity usage trajectories based on historical data. From the significant difference between the predicted trajectory and the observed one, the meters that cannot measure electricity accurately are located. In a case study, a proof of principle was demonstrated in detecting inaccurate meters with high accuracy for practical usage to prevent unnecessary replacement and increase the service life span of smart meters.
Submitted 7 August, 2020; v1 submitted 26 July, 2019;
originally announced July 2019.
-
Field-aware Calibration: A Simple and Empirically Strong Method for Reliable Probabilistic Predictions
Authors:
Feiyang Pan,
Xiang Ao,
Pingzhong Tang,
Min Lu,
Dapeng Liu,
Lei Xiao,
Qing He
Abstract:
It is often observed that the probabilistic predictions given by a machine learning model can disagree with averaged actual outcomes on specific subsets of data, which is also known as the issue of miscalibration. It is responsible for the unreliability of practical machine learning systems. For example, in online advertising, an ad can receive a click-through rate prediction of 0.1 over some population of users where its actual click rate is 0.15. In such cases, the probabilistic predictions have to be fixed before the system can be deployed.
In this paper, we first introduce a new evaluation metric named field-level calibration error that measures the bias in predictions over the sensitive input field that concerns the decision-maker. We show that existing post-hoc calibration methods yield limited improvements on the new field-level metric and on non-calibration metrics such as the AUC score. To this end, we propose Neural Calibration, a simple yet powerful post-hoc calibration method that learns to calibrate by making full use of the field-aware information over the validation set. We present extensive experiments on five large-scale datasets. The results show that Neural Calibration significantly improves on uncalibrated predictions in common metrics such as the negative log-likelihood, Brier score and AUC, as well as the proposed field-level calibration error.
△ Less
Submitted 27 January, 2020; v1 submitted 25 May, 2019;
originally announced May 2019.
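One plausible reading of a field-level calibration error, sketched below, sums the signed gap between outcome and prediction within each value of a chosen field and then averages the absolute per-field gaps over all samples. This formula, the function name `field_level_calibration_error`, and the toy data are assumptions for illustration; the paper's exact definition may differ.

```python
from collections import defaultdict


def field_level_calibration_error(fields, preds, labels):
    """Sum (label - prediction) per field value, then average the absolute sums over all samples."""
    gaps = defaultdict(float)
    for z, p, y in zip(fields, preds, labels):
        gaps[z] += y - p
    return sum(abs(g) for g in gaps.values()) / len(preds)


# Hypothetical data echoing the abstract's example: field "A" is predicted at 0.1
# but clicks at a rate of 0.15 (3 of 20), while field "B" is well calibrated.
fields = ["A"] * 20 + ["B"] * 20
preds = [0.1] * 20 + [0.5] * 20
labels = [1] * 3 + [0] * 17 + [1] * 10 + [0] * 10
fce = field_level_calibration_error(fields, preds, labels)
print(fce)
```

Grouping by field before taking absolute values is the point of the design: over- and under-prediction can still cancel inside one field, but not across fields.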
-
Warm Up Cold-start Advertisements: Improving CTR Predictions via Learning to Learn ID Embeddings
Authors:
Feiyang Pan,
Shuokai Li,
Xiang Ao,
Pingzhong Tang,
Qing He
Abstract:
Click-through rate (CTR) prediction has been one of the most central problems in computational advertising. Lately, embedding techniques that produce low-dimensional representations of ad IDs drastically improve CTR prediction accuracies. However, such learning techniques are data demanding and work poorly on new ads with little logging data, which is known as the cold-start problem.
In this paper, we aim to improve CTR predictions during both the cold-start phase and the warm-up phase when a new ad is added to the candidate pool. We propose Meta-Embedding, a meta-learning-based approach that learns to generate desirable initial embeddings for new ad IDs. The proposed method trains an embedding generator for new ad IDs by making use of previously learned ads through gradient-based meta-learning. In other words, our method learns how to learn better embeddings. When a new ad comes, the trained generator initializes the embedding of its ID by feeding its contents and attributes. Next, the generated embedding can speed up the model fitting during the warm-up phase when a few labeled examples are available, compared to the existing initialization methods.
Experimental results on three real-world datasets showed that Meta-Embedding can significantly improve both the cold-start and warm-up performances for six existing CTR prediction models, ranging from lightweight models such as Factorization Machines to complicated deep models such as PNN and DeepFM. All of the above apply to conversion rate (CVR) predictions as well.
Submitted 25 April, 2019;
originally announced April 2019.
-
Latent Feature Extraction for Process Data via Multidimensional Scaling
Authors:
Xueying Tang,
Zhi Wang,
Qiwei He,
Jingchen Liu,
Zhiliang Ying
Abstract:
Computer-based interactive items have become prevalent in recent educational assessments. In such items, the entire human-computer interactive process is recorded in a log file and is known as the response process. This paper aims at extracting useful information from response processes. In particular, we consider an exploratory latent variable analysis for process data. Latent variables are extracted through a multidimensional scaling framework and are shown empirically to contain more information than classic binary responses in terms of out-of-sample prediction of many variables.
Submitted 21 April, 2019;
originally announced April 2019.
-
High-Dimensional Linear Regression via Implicit Regularization
Authors:
Peng Zhao,
Yun Yang,
Qiao-Chu He
Abstract:
Many statistical estimators for high-dimensional linear regression are M-estimators, formed through minimizing a data-dependent square loss function plus a regularizer. This work considers a new class of estimators implicitly defined through a discretized gradient dynamic system under overparameterization. We show that under suitable restricted isometry conditions, overparameterization leads to implicit regularization: if we directly apply gradient descent to the residual sum of squares with sufficiently small initial values, then under some proper early stopping rule, the iterates converge to a nearly sparse rate-optimal solution that improves over explicitly regularized approaches. In particular, the resulting estimator does not suffer from extra bias due to explicit penalties, and can achieve the parametric root-n rate when the signal-to-noise ratio is sufficiently high. We also perform simulations to compare our methods with high-dimensional linear regression with explicit regularization. Our results illustrate the advantages of using implicit regularization via gradient descent after overparameterization in sparse vector estimation.
Submitted 12 February, 2022; v1 submitted 22 March, 2019;
originally announced March 2019.
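The gradient-descent idea in the abstract above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' implementation: the Hadamard-style overparameterization beta = u*u - v*v, the function name `implicit_reg_gd`, the step size, the initialization scale, and the fixed iteration count standing in for an early stopping rule are all choices made here for illustration.

```python
import numpy as np


def implicit_reg_gd(X, y, alpha=1e-3, step=0.01, n_iters=600):
    """Gradient descent on the residual sum of squares under the
    (assumed) overparameterization beta = u*u - v*v.

    alpha   : small initial value for every entry of u and v
    n_iters : fixed iteration budget, used here as a crude stand-in
              for an early stopping rule
    """
    n, p = X.shape
    u = np.full(p, alpha)
    v = np.full(p, alpha)
    for _ in range(n_iters):
        beta = u * u - v * v
        # Gradient of (1 / 2n) * ||y - X beta||^2 with respect to beta.
        grad_beta = X.T @ (X @ beta - y) / n
        # Chain rule: d beta / d u = 2u and d beta / d v = -2v.
        u, v = (u - step * 2.0 * grad_beta * u,
                v + step * 2.0 * grad_beta * v)
    return u * u - v * v


# Synthetic sparse regression problem with more features than samples.
rng = np.random.default_rng(0)
n, p = 150, 300
beta_true = np.zeros(p)
beta_true[:3] = [2.0, -3.0, 1.5]
X = rng.standard_normal((n, p))
y = X @ beta_true + 0.1 * rng.standard_normal(n)

beta_hat = implicit_reg_gd(X, y)
top3 = set(np.argsort(-np.abs(beta_hat))[:3].tolist())
print("largest coefficients at indices:", sorted(top3))
print("estimation error:", np.linalg.norm(beta_hat - beta_true))
```

No penalty term appears anywhere in the loop. In this sketch, sparsity comes only from the small initialization and the limited number of iterations, because coordinates with large gradients grow multiplicatively while the rest stay near their initial scale.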
-
Policy Optimization with Model-based Explorations
Authors:
Feiyang Pan,
Qingpeng Cai,
An-Xiang Zeng,
Chun-Xiang Pan,
Qing Da,
Hualin He,
Qing He,
Pingzhong Tang
Abstract:
Model-free reinforcement learning methods such as the Proximal Policy Optimization algorithm (PPO) have been successfully applied to complex decision-making problems such as Atari games. However, these methods suffer from high variance and high sample complexity. On the other hand, model-based reinforcement learning methods that learn the transition dynamics are more sample efficient, but they often suffer from the bias of the transition estimation. How to make use of both model-based and model-free learning is a central problem in reinforcement learning. In this paper, we present a new technique to address the trade-off between exploration and exploitation, which regards the difference between model-free and model-based estimations as a measure of exploration value. We apply this new technique to the PPO algorithm and arrive at a new policy optimization method, named Policy Optimization with Model-based Explorations (POME). POME uses two components to predict the actions' target values: a model-free one estimated by Monte-Carlo sampling and a model-based one which learns a transition model and predicts the value of the next state. POME adds the error between these two target estimations as an additional exploration value for each state-action pair, i.e., it encourages the algorithm to explore states with larger target errors, which are hard to estimate. We compare POME with PPO on Atari 2600 games, and the results show that POME outperforms PPO on 33 of 49 games.
Submitted 18 November, 2018;
originally announced November 2018.
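The exploration value described in the POME abstract above can be sketched in a few lines. This is a hedged illustration rather than the paper's algorithm: the function names, the weight `alpha`, the use of an absolute difference, and the additive way the bonus enters the target are assumptions made here. The abstract only states that the error between the model-free and model-based targets serves as an additional exploration value.

```python
from typing import List, Tuple


def discounted_returns(rewards: List[float], gamma: float) -> List[float]:
    """Monte-Carlo (model-free) targets: discounted reward-to-go."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def exploration_targets(
    rewards: List[float],
    model_rewards: List[float],      # rewards predicted by a learned model (hypothetical input)
    model_next_values: List[float],  # value estimates at the model-predicted next states
    gamma: float = 0.99,
    alpha: float = 0.1,              # assumed weight on the exploration bonus
) -> Tuple[List[float], List[float]]:
    """Return (augmented targets, bonuses) for each step of a trajectory."""
    model_free = discounted_returns(rewards, gamma)
    targets, bonuses = [], []
    for g, r_hat, v_next in zip(model_free, model_rewards, model_next_values):
        model_based = r_hat + gamma * v_next  # one-step model-based target
        bonus = abs(g - model_based)          # disagreement between the two targets
        bonuses.append(bonus)
        targets.append(g + alpha * bonus)     # assumed additive combination
    return targets, bonuses


targets, bonuses = exploration_targets(
    rewards=[1.0, 0.0, 2.0],
    model_rewards=[1.0, 0.0, 2.0],
    model_next_values=[3.0, 2.0, 0.0],
    gamma=0.5,
    alpha=0.5,
)
print(bonuses)  # the two estimates disagree only at the first step
print(targets)
```

In this toy trajectory the model overestimates the value of the first next state, so only the first step receives a bonus. State-action pairs where the two estimates agree keep their plain Monte-Carlo targets.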
-
Estimation for bivariate quantile varying coefficient model
Authors:
Linglong Kong,
Haoxu Shu,
Giseon Heo,
Qianchuan Chad He
Abstract:
We propose a bivariate quantile regression method for the bivariate varying coefficient model through a directional approach. The varying coefficients are approximated by the B-spline basis and an $L_{2}$ type penalty is imposed to achieve desired smoothness. We develop a multistage estimation procedure based on the Propagation-Separation (PS) approach to borrow information from nearby directions. The PS method is capable of handling the computational complexity raised by simultaneously considering multiple directions to efficiently estimate varying coefficients while guaranteeing certain smoothness along directions. We reformulate the optimization problem and solve it by the Alternating Direction Method of Multipliers (ADMM), which is implemented using R while the core is written in C to speed it up. Simulation studies are conducted to confirm the finite sample performance of our proposed method. A real dataset on Diffusion Tensor Imaging (DTI) properties from a clinical study on neurodevelopment is analyzed.
Submitted 8 November, 2015;
originally announced November 2015.
-
A multi-functional analyzer uses parameter constraints to improve the efficiency of model-based gene-set analysis
Authors:
Zhishi Wang,
Qiuling He,
Bret Larget,
Michael A. Newton
Abstract:
We develop a model-based methodology for integrating gene-set information with an experimentally-derived gene list. The methodology uses a previously reported sampling model, but takes advantage of natural constraints in the high-dimensional discrete parameter space in order to work from a more structured prior distribution than is currently available. We show how the natural constraints are expressed in terms of linear inequality constraints within a set of binary latent variables. Further, the currently available prior gives low probability to these constraints in complex systems, such as Gene Ontology (GO), thus reducing the efficiency of statistical inference. We develop two computational advances to enable posterior inference within the constrained parameter space: one using integer linear programming for optimization and one using a penalized Markov chain sampler. Numerical experiments demonstrate the utility of the new methodology for a multivariate integration of genomic data with GO or related information systems. Compared to available methods, the proposed multi-functional analyzer covers more reported genes without mis-covering nonreported genes, as demonstrated on genome-wide data from association studies of type 2 diabetes and from RNA interference studies of influenza.
Submitted 1 June, 2015; v1 submitted 23 October, 2013;
originally announced October 2013.