-
Analyzing Classroom Interaction Data Using Prompt Engineering and Network Analysis
Authors:
Gwanghee Kim,
Ick Hoon Jin,
Minjeong Jeon
Abstract:
Classroom interactions play a vital role in developing critical thinking, collaborative problem-solving abilities, and enhanced learning outcomes. While analyzing these interactions is crucial for improving educational practices, the examination of classroom dialogues presents significant challenges due to the complexity and high-dimensionality of conversational data. This study presents an integrated framework that combines prompt engineering with network analysis to investigate classroom interactions comprehensively. Our approach automates utterance classification through prompt engineering, enabling efficient and scalable dialogue analysis without requiring pre-labeled datasets. The classified interactions are subsequently transformed into network representations, facilitating the analysis of classroom dynamics as structured social networks. To uncover complex interaction patterns and how underlying interaction structures relate to student learning, we utilize network mediation analysis. In this approach, latent interaction structures, derived from the additive and multiplicative effects network (AMEN) model that places students within a latent social space, act as mediators. In particular, we investigate how the gender gap in mathematics performance may be mediated by students' classroom interaction structures.
Submitted 31 January, 2025;
originally announced January 2025.
-
MoDeGPT: Modular Decomposition for Large Language Model Compression
Authors:
Chi-Heng Lin,
Shangqian Gao,
James Seale Smith,
Abhishek Patel,
Shikhar Tuli,
Yilin Shen,
Hongxia Jin,
Yen-Chang Hsu
Abstract:
Large Language Models (LLMs) have reshaped the landscape of artificial intelligence by demonstrating exceptional performance across various tasks. However, their substantial computational requirements make deployment challenging on devices with limited resources. Recently, compression methods using low-rank matrix techniques have shown promise, yet these often lead to degraded accuracy or introduce significant overhead in parameters and inference latency. This paper introduces Modular Decomposition (MoDeGPT), a novel structured compression framework that resolves these drawbacks without requiring recovery fine-tuning. MoDeGPT partitions the Transformer block into modules composed of matrix pairs and reduces the hidden dimensions by reconstructing the module-level outputs. MoDeGPT is built on a theoretical framework that applies three well-established matrix decomposition algorithms (Nyström approximation, CR decomposition, and SVD) to our redefined Transformer modules. Our comprehensive experiments show that MoDeGPT, without backward propagation, matches or surpasses previous structured compression methods that rely on gradient information, and saves 98% of compute costs when compressing a 13B model. On Llama-2/3 and OPT models, MoDeGPT maintains 90-95% of zero-shot performance at 25-30% compression rates. Moreover, the compression can be done on a single GPU within a few hours and increases inference throughput by up to 46%.
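The CR decomposition mentioned above approximates a product of two matrices by keeping a subset of columns of one factor and the matching rows of the other. For a two-matrix module such as an MLP, Y = W_out · relu(W_in · X), this amounts to dropping hidden units. The plain-Python sketch below scores each hidden unit by the norm of its output column times the norm of its activations on calibration inputs and keeps the top units. The scoring rule, the ReLU module, the toy matrices, and all function names are our own assumptions; this is not MoDeGPT's reconstruction procedure.

```python
import math

def matmul(A, B):
    """Plain-list matrix product: A is m x n, B is n x p."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def relu(M):
    return [[max(0.0, x) for x in row] for row in M]

def cr_select(W_out, H, keep):
    """Indices of the `keep` hidden units with the largest ||W_out[:, k]|| * ||H[k, :]||."""
    scores = []
    for k in range(len(H)):
        col_norm = math.sqrt(sum(row[k] ** 2 for row in W_out))
        row_norm = math.sqrt(sum(h ** 2 for h in H[k]))
        scores.append(col_norm * row_norm)
    ranked = sorted(range(len(H)), key=lambda k: -scores[k])
    return sorted(ranked[:keep])

def compress_module(W_in, W_out, X_calib, keep):
    """Shrink the hidden width of Y = W_out @ relu(W_in @ X) to `keep` units."""
    H = relu(matmul(W_in, X_calib))                          # activations on calibration inputs
    idx = cr_select(W_out, H, keep)
    W_in_small = [W_in[k][:] for k in idx]                   # selected rows of W_in
    W_out_small = [[row[k] for k in idx] for row in W_out]   # matching columns of W_out
    return idx, W_in_small, W_out_small

# Toy module: input dim 2, hidden width 4, output dim 1; columns of X are samples.
W_in = [[1.0, 0.0], [0.0, 1.0], [0.01, 0.01], [1.0, 1.0]]
W_out = [[1.0, 2.0, 0.01, 1.0]]
X = [[1.0, 0.5], [0.5, 1.0]]

idx, W_in_s, W_out_s = compress_module(W_in, W_out, X, keep=3)
full = matmul(W_out, relu(matmul(W_in, X)))
approx = matmul(W_out_s, relu(matmul(W_in_s, X)))
print(idx, full, approx)
```

Here the nearly inactive third hidden unit is dropped and the module output changes only slightly; keeping all units reproduces the original output exactly.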
Submitted 2 May, 2025; v1 submitted 18 August, 2024;
originally announced August 2024.
-
Conformal Diffusion Models for Individual Treatment Effect Estimation and Inference
Authors:
Hengrui Cai,
Huaqing Jin,
Lexin Li
Abstract:
Estimating treatment effects from observational data is of central interest across numerous application domains. The individual treatment effect offers the most granular measure of treatment effect, at the level of the individual, and is the most useful for facilitating personalized care. However, its estimation and inference remain underdeveloped due to several challenges. In this article, we propose a novel conformal diffusion model-based approach that addresses these intricate challenges. We integrate highly flexible diffusion modeling and the model-free statistical inference paradigm of conformal inference, along with propensity score and covariate local approximations that tackle distributional shifts. We unbiasedly estimate the distributions of potential outcomes for the individual treatment effect, construct an informative confidence interval, and establish rigorous theoretical guarantees. Extensive numerical studies demonstrate the competitive performance of the proposed method relative to existing solutions.
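For readers unfamiliar with conformal inference, the basic split-conformal interval for a point predictor takes a finite-sample quantile of absolute residuals on a held-out calibration set. The sketch below shows only that generic step; the method described above additionally uses diffusion models and propensity-score and local covariate adjustments for distributional shift, none of which is shown. The function name and residual values are hypothetical.

```python
import math

def split_conformal_interval(cal_residuals, prediction, alpha):
    """Split-conformal interval from residuals on a held-out calibration set.

    The half-width is the ceil((n + 1) * (1 - alpha))-th smallest absolute residual,
    which gives marginal coverage of at least 1 - alpha under exchangeability.
    """
    n = len(cal_residuals)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return (-math.inf, math.inf)  # too few calibration points for this alpha
    half_width = sorted(abs(r) for r in cal_residuals)[rank - 1]
    return (prediction - half_width, prediction + half_width)

# Hypothetical absolute residuals |y - f(x)| on 10 calibration points.
cal_residuals = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
lo, hi = split_conformal_interval(cal_residuals, prediction=5.0, alpha=0.25)
```

With 10 calibration points and alpha = 0.25 the ninth smallest residual sets the half-width; with too few points for the requested level the interval is unbounded.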
Submitted 2 August, 2024;
originally announced August 2024.
-
Exact Conversion of In-Context Learning to Model Weights in Linearized-Attention Transformers
Authors:
Brian K Chen,
Tianyang Hu,
Hui Jin,
Hwee Kuan Lee,
Kenji Kawaguchi
Abstract:
In-Context Learning (ICL) has been a powerful emergent property of large language models that has attracted increasing attention in recent years. In contrast to regular gradient-based learning, ICL is highly interpretable and does not require parameter updates. In this paper, we show that, for linearized transformer networks, ICL can be made explicit and permanent through the inclusion of bias terms. We mathematically demonstrate the equivalence between a model with ICL demonstration prompts and the same model with the additional bias terms. Our algorithm (ICLCA) allows for exact conversion in an inexpensive manner. Existing methods are not exact and require expensive parameter updates. We demonstrate the efficacy of our approach through experiments that show the exact incorporation of ICL tokens into a linear transformer. We further suggest how our method can be adapted to achieve cheap approximate conversion of ICL tokens, even in regular transformer networks that are not linearized. Our experiments on GPT-2 show that, even though the conversion is only approximate, the model still gains valuable context from the included bias terms.
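In a linearized (kernel) attention layer, the context enters the output only through summed key-value statistics, which is what makes an exact conversion plausible. The sketch below checks this numerically for a single non-causal, single-head layer with the elu(x)+1 feature map (our assumption): attending over demonstration and input tokens gives the same output as attending over the input tokens alone with the demonstrations' summed state added as two bias terms. It illustrates the idea only; it is not the authors' ICLCA algorithm, and all names are hypothetical.

```python
import math
import random

def phi(x):
    """Positive feature map elu(x) + 1, a common choice in linear attention."""
    return [xi + 1.0 if xi > 0 else math.exp(xi) for xi in x]

def kv_state(keys, values):
    """Summed state S = sum_j phi(k_j) v_j^T and normalizer z = sum_j phi(k_j)."""
    d_k, d_v = len(keys[0]), len(values[0])
    S = [[0.0] * d_v for _ in range(d_k)]
    z = [0.0] * d_k
    for k, v in zip(keys, values):
        fk = phi(k)
        for a in range(d_k):
            z[a] += fk[a]
            for b in range(d_v):
                S[a][b] += fk[a] * v[b]
    return S, z

def attend(q, S, z):
    """Linear attention output phi(q)^T S / (phi(q)^T z) for one query."""
    fq = phi(q)
    den = sum(f * zi for f, zi in zip(fq, z))
    return [sum(fq[a] * S[a][b] for a in range(len(fq))) / den for b in range(len(S[0]))]

def rand_vecs(rng, n, d):
    return [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]

rng = random.Random(0)
d = 3
demo_k, demo_v = rand_vecs(rng, 4, d), rand_vecs(rng, 4, d)  # demonstration tokens
inp_k, inp_v = rand_vecs(rng, 2, d), rand_vecs(rng, 2, d)    # actual input tokens
q = rand_vecs(rng, 1, d)[0]                                   # query of the current token

# (a) Demonstrations sit in the context alongside the input.
with_prompt = attend(q, *kv_state(demo_k + inp_k, demo_v + inp_v))

# (b) Demonstrations are folded once into bias terms and removed from the context.
S_bias, z_bias = kv_state(demo_k, demo_v)
S_in, z_in = kv_state(inp_k, inp_v)
S_tot = [[s + b for s, b in zip(row_in, row_b)] for row_in, row_b in zip(S_in, S_bias)]
z_tot = [zi + zb for zi, zb in zip(z_in, z_bias)]
with_bias = attend(q, S_tot, z_tot)
```

The two outputs agree up to floating-point rounding. With a causal mask the demonstrations form a prefix visible to every input position, so the same two bias terms would apply at each position; a multi-layer model would need such terms per layer.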
Submitted 6 June, 2024; v1 submitted 4 June, 2024;
originally announced June 2024.
-
High dimensional test for functional covariates
Authors:
Huaqing Jin,
Fei Jiang
Abstract:
As medical devices become more complex, they routinely collect extensive and complicated data. While classical regression typically examines the relationship between an outcome and a vector of predictors, it becomes imperative to identify relationships with predictors that possess functional structure. In this article, we introduce a novel inference procedure for examining the relationship between outcomes and large-scale functional predictors. We target testing the linear hypothesis on the functional parameters under the generalized functional linear regression framework, where the number of functional parameters grows with the sample size. We develop an estimation procedure for the high-dimensional generalized functional linear model that incorporates B-spline functional approximation and amenable regularization. Furthermore, we construct a procedure that can test the local alternative hypothesis on linear combinations of the functional parameters. We establish statistical guarantees in terms of the non-asymptotic convergence of the parameter estimates, the oracle property, and the asymptotic normality of the estimators. Moreover, we derive the asymptotic distribution of the test statistic. We carry out extensive simulations and illustrate the method with a new dataset from an Alzheimer's disease magnetoencephalography study.
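B-spline approximation turns each functional coefficient β(t) into a finite vector via β(t) ≈ Σ_k b_k B_k(t), so that a term ∫ X(t) β(t) dt becomes a linear combination of the b_k. A minimal sketch of evaluating a B-spline basis with the Cox-de Boor recursion; the cubic degree, the clamped knot vector on [0, 1], and the function names are assumptions for illustration, not the paper's settings.

```python
def bspline_basis(i, p, t, knots):
    """Value at t of the i-th B-spline basis function of degree p (Cox-de Boor recursion)."""
    if p == 0:
        return 1.0 if knots[i] <= t < knots[i + 1] else 0.0
    value = 0.0
    if knots[i + p] > knots[i]:  # skip 0/0 terms created by repeated knots
        value += (t - knots[i]) / (knots[i + p] - knots[i]) * bspline_basis(i, p - 1, t, knots)
    if knots[i + p + 1] > knots[i + 1]:
        value += ((knots[i + p + 1] - t) / (knots[i + p + 1] - knots[i + 1])
                  * bspline_basis(i + 1, p - 1, t, knots))
    return value

def design_row(t, knots, p):
    """All len(knots) - p - 1 basis values at t: one row of a spline design matrix."""
    return [bspline_basis(i, p, t, knots) for i in range(len(knots) - p - 1)]

# Clamped cubic basis on [0, 1] with interior knots 0.25, 0.5, 0.75 (7 basis functions).
knots = [0.0] * 4 + [0.25, 0.5, 0.75] + [1.0] * 4
row = design_row(0.3, knots, 3)
```

For t in [0, 1) the basis values are nonnegative and sum to 1; because the degree-0 case uses half-open intervals, the right endpoint t = 1 needs special handling in practice.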
Submitted 14 May, 2024;
originally announced May 2024.
-
Federated Control in Markov Decision Processes
Authors:
Hao Jin,
Yang Peng,
Liangyu Zhang,
Zhihua Zhang
Abstract:
We study problems of federated control in Markov Decision Processes (MDPs). To solve an MDP with a large state space, multiple learning agents are introduced to collaboratively learn its optimal policy without communicating locally collected experience. In our setting, these agents have limited capabilities, meaning that during training each is restricted to a different region of the overall state space. To account for the differences among restricted regions, we first introduce the concept of leakage probabilities to understand how such heterogeneity affects the learning process, and then propose a novel communication protocol, the Federated-Q protocol (FedQ), which periodically aggregates agents' knowledge of their restricted regions and modifies their learning problems accordingly for further training. In terms of theoretical analysis, we justify the correctness of FedQ as a communication protocol, then give a general result on the sample complexity of the derived algorithms FedQ-X with the RL oracle, and finally conduct a thorough study of the sample complexity of FedQ-SynQ. Specifically, FedQ-X is shown to enjoy linear speedup in sample complexity when the workload is uniformly distributed among agents. Moreover, we carry out experiments in various environments to demonstrate the efficiency of our methods.
Submitted 7 May, 2024;
originally announced May 2024.
-
Federated Reinforcement Learning with Constraint Heterogeneity
Authors:
Hao Jin,
Liangyu Zhang,
Zhihua Zhang
Abstract:
We study a Federated Reinforcement Learning (FedRL) problem with constraint heterogeneity. In our setting, we aim to solve a reinforcement learning problem with multiple constraints, where $N$ training agents are located in $N$ different environments with limited access to the constraint signals and are expected to collaboratively learn a policy satisfying all constraint signals. Such learning problems are prevalent in Large Language Model (LLM) fine-tuning and healthcare applications. To solve the problem, we propose federated primal-dual policy optimization methods based on traditional policy gradient methods. Specifically, we introduce $N$ local Lagrange functions for agents to perform local policy updates, and these agents are then scheduled to periodically communicate their local policies. Taking natural policy gradient (NPG) and proximal policy optimization (PPO) as policy optimization methods, we mainly focus on two instances of our algorithms, i.e., FedNPG and FedPPO. We show that FedNPG achieves global convergence at an $\tilde{O}(1/\sqrt{T})$ rate, and FedPPO efficiently solves complicated learning tasks using deep neural networks.
Submitted 6 May, 2024;
originally announced May 2024.
-
Analysis of Log Data from an International Online Educational Assessment System: A Multi-state Survival Modeling Approach to Reaction Time between and across Action Sequence
Authors:
Jina Park,
Ick Hoon Jin,
Minjeong Jeon
Abstract:
With computer-based and online assessments increasingly available, researchers have shown keen interest in analyzing log data to improve our understanding of test takers' problem-solving processes. In this paper, we propose applying a multi-state survival model (MSM) to action sequence data from log files, focusing on modeling test takers' reaction times between actions, in order to investigate which factors influence test takers' transition speed between actions and how. We specifically identify the key actions that differentiate correct and incorrect answers, compare transition probabilities between these groups, and analyze their distinct problem-solving patterns. Through simulation studies and sensitivity analyses, we evaluate the robustness of our proposed model. We demonstrate the proposed approach using problem-solving items from the Programme for the International Assessment of Adult Competencies (PIAAC).
Submitted 25 May, 2025; v1 submitted 21 March, 2024;
originally announced March 2024.
-
Network-based Topic Structure Visualization
Authors:
Yeseul Jeon,
Jina Park,
Ick Hoon Jin,
Dongjun Chung
Abstract:
In the real world, many topics are inter-correlated, making it challenging to investigate their structure and relationships. Understanding the interplay between topics and their relevance can provide valuable insights for researchers, guiding their studies and informing the direction of research. In this paper, we utilize the topic-words distribution, obtained from topic models, as item-response data to model the structure of topics using a latent space item response model. By estimating the latent positions of topics based on their distances toward words, we can capture the underlying topic structure and reveal their relationships. Visualizing the latent positions of topics in Euclidean space allows for an intuitive understanding of their proximity and associations. We interpret relationships among topics by characterizing each topic based on representative words selected using a newly proposed scoring scheme. Additionally, we assess the maturity of topics by tracking their latent positions using different word sets, providing insights into the robustness of topics. To demonstrate the effectiveness of our approach, we analyze the topic composition of COVID-19 studies during the early stage of its emergence using biomedical literature in the PubMed database. The software and data used in this paper are publicly available at https://github.com/jeon9677/gViz .
Submitted 31 January, 2024;
originally announced January 2024.
-
Impacts of Innovation School System in Korea: A Latent Space Item Response Model with Neyman-Scott Point Process
Authors:
Seorim Yi,
Minkyu Kim,
Jaewoo Park,
Minjeong Jeon,
Ick Hoon Jin
Abstract:
South Korea's educational system has faced criticism for its lack of focus on critical thinking and creativity, resulting in high levels of stress and anxiety among students. As part of the government's effort to improve the educational system, the innovation school system, introduced in 2009, aims to develop students' creativity as well as their non-cognitive skills. To better understand the differences between innovation and regular school systems in South Korea, we propose a novel method that combines the latent space item response model (LSIRM) with the Neyman-Scott (NS) point process model. Our method accounts for the heterogeneity of items and students, captures relationships between respondents and items, and identifies item and student clusters that can provide a comprehensive understanding of students' behaviors and perceptions regarding non-cognitive outcomes. Our analysis reveals that students in the innovation school system show a higher sense of citizenship, while those in the regular school system tend to associate confidence in appearance with social ability. A comparison with exploratory item factor analysis in terms of item clustering highlights our method's advantages: a more automated analysis, uncertainty quantification of the clustering process, and more detailed and nuanced clustering results. Our method is available in the existing R package lsirm12pl.
Submitted 27 May, 2024; v1 submitted 3 June, 2023;
originally announced June 2023.
-
Comparing multiple latent space embeddings using topological analysis
Authors:
Kisung You,
Ilmun Kim,
Ick Hoon Jin,
Minjeong Jeon,
Dennis Shung
Abstract:
The latent space model is one of the well-known methods for statistical inference on network data. While the model has been much studied for a single network, the collective analysis of multiple networks and their latent embeddings has received little attention. We adopt a topology-based representation of latent space embeddings to learn over a population of network model fits, which allows us to compare networks of potentially varying sizes in a manner invariant to label permutation and rigid motion. This approach enables us to propose algorithms for clustering and multi-sample hypothesis testing by adopting well-established theory for Hilbert space-valued analysis. After validating the proposed method on simulated examples, we apply the framework to analyze educational survey data from the Korean innovative school reform.
Submitted 26 August, 2022;
originally announced August 2022.
-
lsirm12pl: An R package for latent space item response modeling
Authors:
Dongyoung Go,
Gwanghee Kim,
Jina Park,
Junyong Park,
Minjeong Jeon,
Ick Hoon Jin
Abstract:
The item response model in latent space (LSIRM; Jeon et al., 2021) uncovers unobserved interactions between respondents and items in item response data by embedding both in a shared latent metric space. The R package lsirm12pl implements Bayesian estimation of the LSIRM and its extensions for various response types, base model specifications, and missing data handling. Furthermore, the lsirm12pl package provides methods to improve model utilization and interpretation, such as clustering item positions on an estimated interaction map. The package also offers convenient summary and plotting options to evaluate and process the estimated results. In this paper, we provide an overview of the LSIRM's methodological foundation and describe several extensions included in the package. We then demonstrate the use of the package with real data examples included in it.
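To make the latent space idea concrete, the sketch below evaluates a response probability under what we take to be the 1PL form of the LSIRM: logit P(Y_ji = 1) = θ_j + β_i - γ‖z_j - w_i‖, where θ_j is respondent j's intercept, β_i is item i's intercept, γ ≥ 0 weights the distance, and z_j, w_i are latent positions. It is written in Python for illustration only and does not reproduce the lsirm12pl R interface; the function name and parameter values are hypothetical.

```python
import math

def lsirm_prob(theta_j, beta_i, gamma, z_j, w_i):
    """P(Y_ji = 1) under an assumed 1PL latent space item response model:
    logit P = theta_j + beta_i - gamma * ||z_j - w_i||."""
    dist = math.dist(z_j, w_i)  # Euclidean distance in the latent space
    eta = theta_j + beta_i - gamma * dist
    return 1.0 / (1.0 + math.exp(-eta))

# Same respondent and item intercepts; only the respondent-item distance differs.
near = lsirm_prob(0.5, -0.2, 1.0, [0.0, 0.0], [0.3, 0.4])  # distance 0.5
far = lsirm_prob(0.5, -0.2, 1.0, [0.0, 0.0], [3.0, 4.0])   # distance 5.0
```

A respondent placed closer to an item has a higher success probability than an otherwise identical respondent placed farther away; with γ = 0 the distance term drops out and a Rasch-type model remains.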
Submitted 7 March, 2025; v1 submitted 14 May, 2022;
originally announced May 2022.
-
Orthogonal Gromov-Wasserstein Discrepancy with Efficient Lower Bound
Authors:
Hongwei Jin,
Zishun Yu,
Xinhua Zhang
Abstract:
Comparing structured data from possibly different metric-measure spaces is a fundamental task in machine learning, with applications in, e.g., graph classification. The Gromov-Wasserstein (GW) discrepancy formulates a coupling between the structured data based on optimal transportation, tackling the incomparability between different structures by aligning the intra-relational geometries. Although efficient local solvers such as conditional gradient and Sinkhorn are available, the inherent non-convexity still prevents a tractable evaluation, and the existing lower bounds are not tight enough for practical use. To address this issue, we take inspiration from the connection with the quadratic assignment problem, and propose the orthogonal Gromov-Wasserstein (OGW) discrepancy as a surrogate of GW. It admits an efficient and closed-form lower bound with $\mathcal{O}(n^3)$ complexity, and directly extends to the fused Gromov-Wasserstein (FGW) distance, incorporating node features into the coupling. Extensive experiments on both the synthetic and real-world datasets show the tightness of our lower bounds, and both OGW and its lower bounds efficiently deliver accurate predictions and satisfactory barycenters for graph sets.
Submitted 10 July, 2022; v1 submitted 11 May, 2022;
originally announced May 2022.
-
Federated Reinforcement Learning with Environment Heterogeneity
Authors:
Hao Jin,
Yang Peng,
Wenhao Yang,
Shusen Wang,
Zhihua Zhang
Abstract:
We study a Federated Reinforcement Learning (FedRL) problem in which $n$ agents collaboratively learn a single policy without sharing the trajectories they collected during agent-environment interaction. We stress the constraint of environment heterogeneity, which means the $n$ environments corresponding to these $n$ agents have different state transitions. To obtain a value function or a policy function that optimizes the overall performance in all environments, we propose two federated RL algorithms, QAvg and PAvg. We theoretically prove that these algorithms converge to suboptimal solutions, and that the degree of suboptimality depends on how heterogeneous these $n$ environments are. Moreover, we propose a heuristic that achieves personalization by embedding the $n$ environments into $n$ vectors. The personalization heuristic not only improves training but also allows for better generalization to new environments.
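A minimal sketch of the QAvg idea as we read it from the abstract: agents hold copies of a shared tabular Q-function, update them locally on their own transition dynamics, and a server periodically averages them. To keep it deterministic we assume a small tabular MDP with a shared reward table and replace sampled Q-learning with expected (model-based) Bellman updates; the function names and the toy MDP are hypothetical.

```python
def bellman(Q, P, R, gamma):
    """One expected Bellman-optimality sweep:
    Q'(s, a) = R(s, a) + gamma * sum_s' P(s' | s, a) * max_a' Q(s', a')."""
    V = [max(row) for row in Q]
    return [[R[s][a] + gamma * sum(p * v for p, v in zip(P[s][a], V))
             for a in range(len(Q[0]))] for s in range(len(Q))]

def qavg(Ps, R, gamma, rounds, local_steps):
    """Agents update copies of a shared Q-table on their own dynamics, then the server averages."""
    n_s, n_a = len(R), len(R[0])
    Q = [[0.0] * n_a for _ in range(n_s)]
    for _ in range(rounds):
        local_tables = []
        for P in Ps:                        # each agent's own transition model
            Qi = [row[:] for row in Q]      # start from the shared table
            for _ in range(local_steps):
                Qi = bellman(Qi, P, R, gamma)
            local_tables.append(Qi)
        Q = [[sum(T[s][a] for T in local_tables) / len(local_tables)
              for a in range(n_a)] for s in range(n_s)]  # server-side averaging
    return Q

# Toy 2-state, 2-action problem; the two environments differ only in their transitions.
R = [[1.0, 0.0], [0.0, 0.5]]
P1 = [[[0.9, 0.1], [0.2, 0.8]], [[0.5, 0.5], [0.0, 1.0]]]
P2 = [[[0.1, 0.9], [0.6, 0.4]], [[0.7, 0.3], [0.3, 0.7]]]
Q_fed = qavg([P1, P2], R, gamma=0.9, rounds=100, local_steps=5)
```

With local_steps=1 each round reduces to one value-iteration step on the averaged transition model, a handy sanity check; with more local steps the averaged tables need not converge to that model's optimum, which is consistent with the abstract's point that suboptimality depends on heterogeneity.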
Submitted 6 April, 2022;
originally announced April 2022.
-
A Latent Space Accumulator Model for Response Time: Applications to Cognitive Assessment Data
Authors:
Ick Hoon Jin,
Jonghyun Yun,
Hyunjoo Kim,
Minjeong Jeon
Abstract:
Response time has attracted increasing interest in educational and psychological assessment, e.g., for measuring test takers' processing speed, improving the measurement accuracy of ability, and understanding aberrant response behavior. Most models for response time analysis rely on a parametric assumption about the response time distribution. The Cox proportional hazard model has been used for response time analysis because it does not require a distributional assumption about response time and enables meaningful interpretations with respect to response processes. In this paper, we present a new version of the proportional hazard model, called a latent space accumulator model, for cognitive assessment data based on accumulators for two competing response outcomes, such as correct vs. incorrect responses. The proposed model extends a previous accumulator model by capturing dependencies between respondents and test items across accumulators in the form of distances in a two-dimensional Euclidean space. A fully Bayesian approach is developed to estimate the proposed model. The utility of the proposed model is illustrated with two real data examples.
Submitted 20 June, 2023; v1 submitted 27 March, 2022;
originally announced March 2022.
-
A Bayesian Precision Response-adaptive Phase II Clinical Trial Design for Radiotherapies with Competing Risk Survival Outcomes
Authors:
Jina Park,
Wenjing Hu,
Ick Hoon Jin,
Hao Liu,
Yong Zang
Abstract:
Many phase II clinical trials have used survival outcomes as the primary endpoints in recent decades. When radiotherapy is evaluated in a phase II trial using survival outcomes, a competing risk issue often arises because the time to disease progression can be censored by the time to normal tissue complications, and vice versa. In addition, many studies have shown that patients receiving the same radiotherapy dose may respond differently because of their heterogeneous radiation susceptibility statuses. Therefore, the "one-dose-fits-all" strategy often fails, and it is more relevant to evaluate the subgroup-specific treatment effect, with subgroups defined by radiation susceptibility status. In this paper, we propose a Bayesian precision phase II trial design evaluating the subgroup-specific treatment effects of radiotherapy. We use the cause-specific hazard approach to model the competing risk survival outcomes. We propose restricting the candidate radiation doses based on each patient's radiation susceptibility status. Only clinically feasible personalized doses are considered, which enhances the benefit for patients in the trial. In addition, we propose a stratified Bayesian adaptive randomization scheme under which more patients are randomized to the dose reporting more favorable survival outcomes. Numerical studies show that the proposed design performs well and outperforms the conventional design that ignores the competing risk issue.
Submitted 13 March, 2022;
originally announced March 2022.
-
Oncology Dose Finding Using Approximate Bayesian Computation Design
Authors:
Huaqing Jin,
Wenbin Du,
Guosheng Yin
Abstract:
In the development of new cancer treatments, an essential step is to determine the maximum tolerated dose (MTD) via phase I clinical trials. Generally speaking, phase I trial designs can be classified as either model-based or algorithm-based approaches. Model-based phase I designs are typically more efficient because they use all observed data, but there is a potential risk of model misspecification that may lead to unreliable dose assignment and incorrect MTD identification. In contrast, most algorithm-based designs use cumulative information less efficiently, because they tend to focus on the observed data in the neighborhood of the current dose level for dose movement. To use the data more efficiently without any model assumption, we propose a novel approximate Bayesian computation (ABC) approach for phase I trial design. Not only is the ABC design free of any dose-toxicity curve assumption, but it can also aggregate all the available information accrued in the trial for dose assignment. Extensive simulation studies demonstrate its robustness and efficiency compared with other phase I designs. We apply the ABC design to the MEK inhibitor selumetinib trial to demonstrate its satisfactory performance. The proposed design can be a useful addition to the family of phase I clinical trial designs due to its simplicity, efficiency, and robustness.
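The abstract does not spell out the ABC design's prior, distance measure, or dose-assignment rule. As a rough illustration only, the sketch below runs a generic ABC rejection step under our own assumptions: monotone toxicity probabilities drawn as sorted Uniform(0, 1) values (no parametric dose-toxicity curve), exact matching of simulated and observed dose-limiting toxicity (DLT) counts, and the MTD taken as the dose whose toxicity probability is closest to a target of 0.3. The function name and toy counts are hypothetical and are not data from the selumetinib trial.

```python
import random

def abc_mtd_probs(n_treated, n_dlt, target=0.3, draws=20000, tol=0, seed=0):
    """Generic ABC rejection sketch for phase I dose finding.

    Returns, for each dose, the share of accepted prior draws in which that dose has
    the toxicity probability closest to `target`, plus the number of accepted draws.
    """
    rng = random.Random(seed)
    n_doses = len(n_treated)
    counts = [0] * n_doses
    kept = 0
    for _ in range(draws):
        # Prior draw: toxicity probabilities that increase with dose.
        p = sorted(rng.random() for _ in range(n_doses))
        # Simulate DLT counts at each dose given the drawn probabilities.
        sim = [sum(rng.random() < p[j] for _ in range(n_treated[j])) for j in range(n_doses)]
        # Rejection step: keep the draw only if simulated data match the observed data.
        if sum(abs(s - y) for s, y in zip(sim, n_dlt)) <= tol:
            kept += 1
            mtd = min(range(n_doses), key=lambda j: abs(p[j] - target))
            counts[mtd] += 1
    if kept == 0:
        return None, 0
    return [c / kept for c in counts], kept

# Hypothetical trial state: patients treated and DLTs observed at four dose levels.
probs, kept = abc_mtd_probs(n_treated=[3, 3, 6, 3], n_dlt=[0, 0, 1, 3])
```

In a trial, the next cohort might be assigned to the dose with the largest such probability, subject to safety rules; that step is omitted here.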
Submitted 28 February, 2022;
originally announced March 2022.
-
Hyperparameter Importance for Machine Learning Algorithms
Authors:
Honghe Jin
Abstract:
Hyperparameters play an essential role in the fitting of supervised machine learning algorithms. However, it is computationally expensive to tune all the tunable hyperparameters simultaneously, especially for large data sets. In this paper, we give a definition of hyperparameter importance that can be estimated by subsampling procedures. According to the importance, hyperparameters can then be tuned on the entire data set more efficiently. We show theoretically that the proposed importance on subsets of data is consistent with the one on the population data under weak conditions. Numerical experiments show that the proposed importance is consistent and can save substantial computational resources.
Submitted 13 January, 2022;
originally announced January 2022.
-
Quantile Regression with Multiple Proxy Variables
Authors:
Dongyoung Go,
Jongho Im,
Ick Hoon Jin
Abstract:
Data integration has become increasingly popular owing to the availability of multiple data sources. This study considered quantile regression estimation when a key covariate had multiple proxies across several datasets. In a unified estimation procedure, the proposed method incorporates multiple proxies that have various relationships with the unobserved covariates. The proposed approach allows inference on both the quantile function and the unobserved covariates. Moreover, it does not require the quantile function to be linear and, simultaneously, accommodates both linear and nonlinear proxies. Simulation studies demonstrated that this methodology successfully integrates multiple proxies and reveals quantile relationships for a wide range of nonlinear data. The proposed method is applied to administrative data obtained from the Survey of Household Finances and Living Conditions provided by Statistics Korea to specify the relationship between assets and salary income in the presence of multiple income records.
Submitted 21 October, 2022; v1 submitted 23 December, 2021;
originally announced December 2021.
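Quantile regression of the kind studied above rests on the check (pinball) loss, rho_tau(u) = u * (tau - 1{u < 0}). The sketch below only shows that loss and the fact that a constant minimizing its sum recovers a sample quantile; it does not model the proxies or the nonlinear quantile function from the paper, and the function names are hypothetical.

```python
def pinball_loss(u, tau):
    """Check (pinball) loss rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def best_constant_quantile(ys, tau):
    """Brute-force the constant c minimizing sum_i rho_tau(y_i - c) over the data points."""
    return min(ys, key=lambda c: sum(pinball_loss(y - c, tau) for y in ys))
```

With `ys = [1, 2, ..., 9]` and `tau = 0.5`, the minimizer is the median, 5; replacing the constant `c` with a regression function of covariates gives quantile regression.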
-
How social networks influence human behavior: An integrated latent space approach for differential social influence
Authors:
Jina Park,
Ick Hoon Jin,
Minjeong Jeon
Abstract:
How social networks influence human behavior has been an interesting topic in applied research. Existing methods often utilized scale-level behavioral data to estimate the influence of a social network on human behavior. This study proposes a novel approach to studying social influence that utilizes item-level behavioral measures. Under the latent space modeling framework, we integrate the two interaction maps for respondents' social network data and item-level behavior measures. The interaction map visualizes the association between the latent homophily of the respondents and their behaviors measured at the item level in a low-dimensional latent space, revealing the potential, differential social influence effects across specific behaviors measured at the item level. We also measure overall social influence as the impact of the interaction map configuration contributed by the social network data on the behavior data. The performance and properties of the proposed approach are evaluated via simulation studies. We apply the proposed model to an empirical dataset to demonstrate how the students' friendship network influences their participation in school activities.
Submitted 27 February, 2023; v1 submitted 11 September, 2021;
originally announced September 2021.
-
Network-based Topic Interaction Map for Big Data Mining of COVID-19 Biomedical Literature
Authors:
Yeseul Jeon,
Dongjun Chung,
Jina Park,
Ick Hoon Jin
Abstract:
Since the emergence of the worldwide COVID-19 pandemic, relevant research has been published at a dazzling pace, yielding an abundant amount of big data in biomedical literature. Due to the high volume of relevant literature, it is practically impossible to follow the research manually. Topic modeling is a well-known unsupervised learning method that aims to reveal latent topics from text data. In this paper, we propose a novel analytical framework for estimating topic interactions and an effective visualization of the topics' relationships. We first estimate topic-word distributions using the biterm topic model and then estimate the topics' interactions based on the word distributions using the latent space item response model. We map these latent topics onto networks to visualize relationships among the topics. Moreover, in the proposed approach, we develop a score that is helpful in selecting meaningful words that characterize each topic. We examine how topics are related by tracking how their relationships change across different levels of word richness, using a "trajectory plot". These findings provide a thoroughly mined and intuitive representation of relationships between topics related to a specific research area. The application of this proposed framework to the PubMed literature demonstrates the utility of our approach in understanding the topic composition of COVID-19 studies at the stage of the disease's emergence.
Submitted 8 December, 2022; v1 submitted 7 June, 2021;
originally announced June 2021.
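The biterm topic model used in the first stage above treats a corpus as a collection of biterms, i.e., unordered pairs of words that co-occur in the same short text. The sketch below only extracts biterms from a tokenized document; the optional `window` parameter and the function name are assumptions for illustration, and the topic estimation and latent space item response stage are not shown.

```python
from itertools import combinations


def extract_biterms(tokens, window=None):
    """Return the unordered word pairs (biterms) co-occurring in a short text.

    If `window` is given, only pairs at most `window` positions apart are kept.
    """
    biterms = []
    for (i, w1), (j, w2) in combinations(enumerate(tokens), 2):
        # combinations() yields positions with i < j, so j - i is the gap.
        if window is not None and j - i > window:
            continue
        biterms.append(tuple(sorted((w1, w2))))
    return biterms
```

For a three-word title such as `["virus", "spike", "protein"]`, this yields three biterms; with `window=1`, only the two adjacent pairs remain.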
-
Unit Information Prior for Adaptive Information Borrowing from Multiple Historical Datasets
Authors:
Huaqing Jin,
Guosheng Yin
Abstract:
In clinical trials, there often exist multiple historical studies for the same or related treatment investigated in the current trial. Incorporating historical data in the analysis of the current study is of great importance, as it can help to gain more information, improve efficiency, and provide a more comprehensive evaluation of treatment. Inspired by the unit information prior (UIP) concept in the reference Bayesian test, we propose a new informative prior called UIP from an information perspective that can adaptively borrow information from multiple historical datasets. We consider both binary and continuous data and also extend the new UIP methods to linear regression settings. Extensive simulation studies demonstrate that our method is comparable to other commonly used informative priors, while the interpretation of UIP is intuitive and its implementation is relatively easy. One distinctive feature of UIP is that its construction only requires summary statistics commonly reported in the literature rather than the patient-level data. By applying our UIP methods to phase III clinical trials investigating the efficacy of memantine in Alzheimer's disease, we illustrate their ability to adaptively borrow information from multiple historical datasets in a real application.
Submitted 1 February, 2021;
originally announced February 2021.
-
Mapping unobserved item-respondent interactions: A latent space item response model with interaction map
Authors:
Minjeong Jeon,
Ick Hoon Jin,
Michael Schweinberger,
Samuel Baugh
Abstract:
Classic item response models assume that all items with the same difficulty have the same response probability among all respondents with the same ability. These assumptions, however, may very well be violated in practice, and it is not straightforward to assess whether these assumptions are violated, because neither the abilities of respondents nor the difficulties of items are observed. An example is an educational assessment where unobserved heterogeneity is present, arising from unobserved variables such as cultural background and upbringing of students, the quality of mentorship and other forms of emotional and professional support received by students, and other unobserved variables that may affect response probabilities. To address such violations of assumptions, we introduce a novel latent space model which assumes that both items and respondents are embedded in an unobserved metric space, with the probability of a correct response decreasing as a function of the distance between the respondent's and the item's position in the latent space. The resulting latent space approach provides an interaction map that represents interactions of respondents and items, and helps derive insightful diagnostic information on items as well as respondents. In practice, such interaction maps enable teachers to detect students from underrepresented groups who need more support than other students. We provide empirical evidence to demonstrate the usefulness of the proposed latent space approach, along with simulation results.
Submitted 15 November, 2020; v1 submitted 16 July, 2020;
originally announced July 2020.
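The latent space item response idea above, where the probability of a correct response decreases with the respondent-item distance, can be sketched as a logistic model. This assumes the form logit P = theta + beta - gamma * ||z - w|| with Euclidean distance, where `theta` is the respondent's ability, `beta` the item easiness, `gamma` a nonnegative distance weight, and `z`, `w` the respondent and item positions; the parameter and function names are our own, and estimation is not shown.

```python
import math


def lsirm_prob(theta, beta, gamma, z, w):
    """P(correct) = logistic(theta + beta - gamma * ||z - w||), Euclidean distance."""
    dist = math.dist(z, w)  # requires Python 3.8+
    return 1.0 / (1.0 + math.exp(-(theta + beta - gamma * dist)))
```

With `theta = beta = 0`, a respondent placed on top of the item gets probability 0.5, and moving the item farther away in the latent space lowers the probability; a positive `gamma` is what makes the interaction map interpretable as distances.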
-
AutoRec: An Automated Recommender System
Authors:
Ting-Hsiang Wang,
Qingquan Song,
Xiaotian Han,
Zirui Liu,
Haifeng Jin,
Xia Hu
Abstract:
Realistic recommender systems are often required to adapt to ever-changing data and tasks or to explore different models systematically. To address the need, we present AutoRec, an open-source automated machine learning (AutoML) platform extended from the TensorFlow ecosystem and, to our knowledge, the first framework to leverage AutoML for model search and hyperparameter tuning in deep recommendation models. AutoRec also supports a highly flexible pipeline that accommodates both sparse and dense inputs, rating prediction and click-through rate (CTR) prediction tasks, and an array of recommendation models. Lastly, AutoRec provides a simple, user-friendly API. Experiments conducted on the benchmark datasets reveal AutoRec is reliable and can identify models which resemble the best model without prior knowledge.
Submitted 26 June, 2020;
originally announced July 2020.
-
Bayesian Shrinkage for Functional Network Models, with Applications to Longitudinal Item Response Data
Authors:
Jaewoo Park,
Yeseul Jeon,
Minsuk Shin,
Minjeong Jeon,
Ick Hoon Jin
Abstract:
Longitudinal item response data are common in social science, educational science, and psychology, among other disciplines. Studying the time-varying relationships between items is crucial for educational assessment or designing marketing strategies from survey questions. Although dynamic network models have been widely developed, we cannot apply them directly to item response data because there are multiple systems of nodes with various types of local interactions among items, resulting in multiplex network structures. We propose a new model to study these temporal interactions among items by embedding the functional parameters within the exponential random graph model framework. Inference on such models is difficult because the likelihood functions contain intractable normalizing constants. Furthermore, the number of functional parameters grows exponentially as the number of items increases. Variable selection for such models is not trivial because standard shrinkage approaches do not consider temporal trends in functional parameters. To overcome these challenges, we develop a novel Bayesian approach by combining an auxiliary variable MCMC algorithm and a recently developed functional shrinkage method. We apply our algorithm to survey and review data sets, illustrating that the proposed approach avoids the evaluation of intractable normalizing constants and detects significant temporal interactions among items. Through a simulation study under different scenarios, we examine the performance of our algorithm. Our method is, to our knowledge, the first attempt to select functional variables for models with intractable normalizing constants.
Submitted 22 October, 2021; v1 submitted 24 June, 2020;
originally announced June 2020.
-
AutoOD: Automated Outlier Detection via Curiosity-guided Search and Self-imitation Learning
Authors:
Yuening Li,
Zhengzhang Chen,
Daochen Zha,
Kaixiong Zhou,
Haifeng Jin,
Haifeng Chen,
Xia Hu
Abstract:
Outlier detection is an important data mining task with numerous practical applications such as intrusion detection, credit card fraud detection, and video surveillance. However, given a specific complicated task with big data, the process of building a powerful deep learning based system for outlier detection still relies heavily on human expertise and laborious trials. Although Neural Architecture Search (NAS) has shown its promise in discovering effective deep architectures in various domains, such as image classification, object detection, and semantic segmentation, contemporary NAS methods are not suitable for outlier detection due to the lack of an intrinsic search space, an unstable search process, and low sample efficiency. To bridge the gap, in this paper, we propose AutoOD, an automated outlier detection framework, which aims to search for an optimal neural network model within a predefined search space. Specifically, we first design a curiosity-guided search strategy to overcome the curse of local optimality. A controller, which acts as a search agent, is encouraged to take actions to maximize the information gain about the controller's internal belief. We further introduce an experience replay mechanism based on self-imitation learning to improve the sample efficiency. Experimental results on various real-world benchmark datasets demonstrate that the deep model identified by AutoOD achieves the best performance compared with existing handcrafted models and traditional search methods.
Submitted 19 June, 2020;
originally announced June 2020.
-
Implicit Bias of Gradient Descent for Mean Squared Error Regression with Two-Layer Wide Neural Networks
Authors:
Hui Jin,
Guido Montúfar
Abstract:
We investigate gradient descent training of wide neural networks and the corresponding implicit bias in function space. For univariate regression, we show that the solution of training a width-$n$ shallow ReLU network is within $n^{-1/2}$ of the function which fits the training data and whose difference from the initial function has the smallest 2-norm of the second derivative weighted by a curvature penalty that depends on the probability distribution that is used to initialize the network parameters. We compute the curvature penalty function explicitly for various common initialization procedures. For instance, asymmetric initialization with a uniform distribution yields a constant curvature penalty, and thence the solution function is the natural cubic spline interpolation of the training data. For stochastic gradient descent we obtain the same implicit bias result. We obtain a similar result for different activation functions. For multivariate regression we show an analogous result, whereby the second derivative is replaced by the Radon transform of a fractional Laplacian. For initialization schemes that yield a constant penalty function, the solutions are polyharmonic splines. Moreover, we show that the training trajectories are captured by trajectories of smoothing splines with decreasing regularization strength.
Submitted 28 May, 2023; v1 submitted 12 June, 2020;
originally announced June 2020.
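The abstract above identifies the limiting function under asymmetric uniform initialization as the natural cubic spline interpolant of the training data. As a point of reference, that interpolant can be computed directly with the textbook construction: solve a tridiagonal system for the second derivatives with zero second derivative at both ends. This sketch does not train a network, and the function name is hypothetical.

```python
def natural_cubic_spline(xs, ys):
    """Return a callable natural cubic spline through (xs, ys); xs strictly increasing."""
    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]
    # Second derivatives m[0..n]; natural boundary conditions m[0] = m[n] = 0.
    m = [0.0] * (n + 1)
    if n >= 2:
        # Tridiagonal system for the interior m[1..n-1], solved by the Thomas algorithm.
        sub = [h[i - 1] for i in range(1, n)]
        diag = [2.0 * (h[i - 1] + h[i]) for i in range(1, n)]
        sup = [h[i] for i in range(1, n)]
        rhs = [6.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
               for i in range(1, n)]
        size = n - 1
        for k in range(1, size):  # forward elimination
            factor = sub[k] / diag[k - 1]
            diag[k] -= factor * sup[k - 1]
            rhs[k] -= factor * rhs[k - 1]
        sol = [0.0] * size
        sol[-1] = rhs[-1] / diag[-1]
        for k in range(size - 2, -1, -1):  # back substitution
            sol[k] = (rhs[k] - sup[k] * sol[k + 1]) / diag[k]
        m[1:n] = sol

    def spline(t):
        # Locate the interval [xs[i], xs[i + 1]] containing t (end pieces extrapolate).
        i = 0
        while i < n - 1 and t > xs[i + 1]:
            i += 1
        a, b, hi = xs[i + 1] - t, t - xs[i], h[i]
        return (m[i] * a ** 3 / (6 * hi) + m[i + 1] * b ** 3 / (6 * hi)
                + (ys[i] / hi - m[i] * hi / 6) * a
                + (ys[i + 1] / hi - m[i + 1] * hi / 6) * b)

    return spline
```

For the points (0, 0), (1, 1), (2, 0), the interior second derivative is -3 and the spline evaluates to 0.6875 at t = 0.5; for collinear data all second derivatives vanish and the spline reduces to the line itself.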
-
A Complex KBQA System using Multiple Reasoning Paths
Authors:
Kechen Qin,
Yu Wang,
Cheng Li,
Kalpa Gunaratna,
Hongxia Jin,
Virgil Pavlu,
Javed A. Aslam
Abstract:
Multi-hop knowledge-based question answering (KBQA) is a complex task for natural language understanding. Many KBQA approaches have been proposed in recent years, and most of them are trained on labeled reasoning paths. This hinders the system's performance, as many correct reasoning paths are not labeled as ground truth and thus cannot be learned. In this paper, we introduce an end-to-end KBQA system that can leverage information from multiple reasoning paths and only requires labeled answers as supervision. We conduct experiments on several benchmark datasets containing both single-hop simple questions and multi-hop complex questions, including WebQuestionSP (WQSP), ComplexWebQuestion-1.1 (CWQ), and PathQuestion-Large (PQL), and demonstrate strong performance.
Submitted 21 May, 2020;
originally announced May 2020.
-
Applying the Network Item Response Model to Student Assessment Data
Authors:
Alex Brodersen,
Ick Hoon Jin,
Ying Cheng,
Minjeong Jeon
Abstract:
This study discusses an alternative tool for modeling student assessment data. The model constructs networks from a matrix of item responses and attempts to represent these data in a low-dimensional Euclidean space. This procedure has advantages over common methods used for modeling student assessment data, such as Item Response Theory, because it relaxes the highly restrictive local-independence assumption. This article provides an in-depth discussion of the model and the steps one must take to estimate it. To enable extending a fitted model with new data, two methods for estimating the positions of new individuals in the network are discussed. Then, a real data analysis is provided as a case study on using the model and interpreting the results. Finally, the model is compared and contrasted with other popular models in psychological and educational measurement: item response theory (IRT) and the network psychometric Ising model for binary data.
Submitted 17 March, 2020;
originally announced March 2020.
-
Bayesian Model Selection for High-Dimensional Ising Models, With Applications to Educational Data
Authors:
Jaewoo Park,
Ick Hoon Jin,
Michael Schweinberger
Abstract:
Doubly-intractable posterior distributions arise in many applications of statistics concerned with discrete and dependent data, including physics, spatial statistics, machine learning, the social sciences, and other fields. A specific example is psychometrics, which has adapted high-dimensional Ising models from machine learning, with a view to studying the interactions among binary item responses in educational assessments. To estimate high-dimensional Ising models from educational assessment data, $\ell_1$-penalized nodewise logistic regressions have been used. Theoretical results in high-dimensional statistics show that $\ell_1$-penalized nodewise logistic regressions can recover the true interaction structure with high probability, provided that certain assumptions are satisfied. Those assumptions are hard to verify in practice and may be violated, and quantifying the uncertainty about the estimated interaction structure and parameter estimators is challenging. We propose a Bayesian approach that helps quantify the uncertainty about the interaction structure and parameters without requiring strong assumptions, and can be applied to Ising models with thousands of parameters. We demonstrate the advantages of the proposed Bayesian approach compared with $\ell_1$-penalized nodewise logistic regressions by simulation studies and applications to small and large educational data sets with up to 2,485 parameters. Among other things, the simulation studies suggest that the Bayesian approach is more robust against model misspecification due to omitted covariates than $\ell_1$-penalized nodewise logistic regressions.
Submitted 19 May, 2021; v1 submitted 16 November, 2019;
originally announced November 2019.
-
Deep Weakly-supervised Anomaly Detection
Authors:
Guansong Pang,
Chunhua Shen,
Huidong Jin,
Anton van den Hengel
Abstract:
Recent semi-supervised anomaly detection methods that are trained using a small number of labeled anomaly examples and large unlabeled data (mostly normal data) have shown substantially improved performance over unsupervised methods. However, these methods often focus on fitting abnormalities illustrated by the given anomaly examples only (i.e., seen anomalies), and consequently they fail to generalize to those that are not, i.e., new types/classes of anomaly unseen during training. To detect both seen and unseen anomalies, we introduce a novel deep weakly-supervised approach, namely Pairwise Relation prediction Network (PReNet), that learns pairwise relation features and anomaly scores by predicting the relation of any two randomly sampled training instances, in which the pairwise relation can be anomaly-anomaly, anomaly-unlabeled, or unlabeled-unlabeled. Since unlabeled instances are mostly normal, the relation prediction enforces a joint learning of anomaly-anomaly, anomaly-normal, and normal-normal pairwise discriminative patterns, respectively. PReNet can then detect any seen/unseen abnormalities that fit the learned pairwise abnormal patterns, or deviate from the normal patterns. Further, this pairwise approach also seamlessly and significantly augments the training anomaly data. Empirical results on 12 real-world datasets show that PReNet significantly outperforms nine competing methods in detecting seen and unseen anomalies. We also theoretically and empirically justify the robustness of our model w.r.t. anomaly contamination in the unlabeled data. The code is available at https://github.com/mala-lab/PReNet.
Submitted 5 June, 2023; v1 submitted 29 October, 2019;
originally announced October 2019.
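The pairwise training data described for PReNet can be illustrated by sampling instance pairs of the three kinds (anomaly-anomaly, anomaly-unlabeled, unlabeled-unlabeled) and attaching an ordinal target to each. This is a minimal sketch, not the code in the linked repository: the target values 2/1/0, the round-robin balancing of pair types, and the function name are hypothetical, and the relation network that regresses these targets is not shown.

```python
import random

# Hypothetical ordinal targets: a larger value means a "more anomalous" pair.
PAIR_TARGETS = {("a", "a"): 2, ("a", "u"): 1, ("u", "u"): 0}


def sample_pairs(anomalies, unlabeled, n_pairs, seed=0):
    """Sample (left, right, target) triples, cycling through the three pair types."""
    rng = random.Random(seed)
    kinds = [("a", "a"), ("a", "u"), ("u", "u")]
    pairs = []
    for t in range(n_pairs):
        kind = kinds[t % 3]  # keep the three relation types balanced
        left = rng.choice(anomalies if kind[0] == "a" else unlabeled)
        right = rng.choice(anomalies if kind[1] == "a" else unlabeled)
        pairs.append((left, right, PAIR_TARGETS[kind]))
    return pairs
```

Even with only a handful of labeled anomalies, the number of distinct anomaly-containing pairs grows with the product of the set sizes, which is one way to read the abstract's remark that the pairwise approach augments the training anomaly data.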
-
Network Mediation Analysis Using Model-based Eigenvalue Decomposition
Authors:
Chang Che,
Ick Hoon Jin,
Zhiyong Zhang
Abstract:
This paper proposes a new two-stage network mediation method based on the use of a latent network approach (model-based eigenvalue decomposition) for analyzing social network data with nodal covariates. In the decomposition stage of the observed network, no assumption on the metric of the latent space structure is required. In the mediation stage, the most important eigenvectors of a network are used as mediators. This method further offers an innovative way of controlling for the conditional covariates, and it considers only the information left in the network. We demonstrate this approach with detailed tutorial R code for four separate cases (unconditional and conditional model-based eigenvalue decompositions for either a continuous outcome or a binary outcome) to show its applicability to empirical network data.
Submitted 15 October, 2019;
originally announced October 2019.
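The mediation stage above uses the most important eigenvectors of a network as mediators. As a simplified stand-in for the model-based decomposition, the sketch below takes a plain spectral decomposition of the observed (symmetrized) adjacency matrix and keeps the eigenpairs with the largest absolute eigenvalues. This is an assumption for illustration only: the model-based eigenvalue decomposition estimates these quantities under a latent network model rather than reading them off the raw matrix, and the function name is hypothetical.

```python
import numpy as np


def leading_eigenvectors(adj, k):
    """Return the k eigenpairs of a symmetric adjacency matrix with largest |eigenvalue|."""
    a = np.asarray(adj, dtype=float)
    a = (a + a.T) / 2.0  # symmetrize in case the network is directed
    vals, vecs = np.linalg.eigh(a)  # eigh assumes a symmetric (Hermitian) matrix
    order = np.argsort(np.abs(vals))[::-1][:k]
    return vals[order], vecs[:, order]
```

Each returned column assigns one score per node; in a two-stage analysis these columns could then enter a mediation model between the nodal covariate and the outcome.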
-
Towards Better Generalization: BP-SVRG in Training Deep Neural Networks
Authors:
Hao Jin,
Dachao Lin,
Zhihua Zhang
Abstract:
Stochastic variance-reduced gradient (SVRG) is a classical optimization method. Although it is theoretically proved to have better convergence performance than stochastic gradient descent (SGD), the generalization performance of SVRG remains an open question. In this paper we investigate the effects of some training techniques, mini-batching and learning rate decay, on the generalization performance of SVRG, and verify the generalization performance of Batch-SVRG (B-SVRG). Regarding the relationship between optimization and generalization, we believe that the average norm of gradients on each training sample, as well as the norm of the average gradient, indicates how flat the landscape is and how well the model generalizes. Based on empirical observations of such metrics, we perform a sign switch on B-SVRG and derive a practical algorithm, BatchPlus-SVRG (BP-SVRG), which is numerically shown to enjoy better generalization performance than B-SVRG, and even than SGD in some scenarios involving deep neural networks.
Submitted 18 August, 2019;
originally announced August 2019.
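For readers unfamiliar with the baseline method, the core SVRG update replaces the stochastic gradient with grad_i(w) - grad_i(w_snap) + mu, where mu is the full gradient at a periodically refreshed snapshot w_snap. The sketch below applies plain SVRG (not B-SVRG or BP-SVRG) to a toy one-dimensional least-squares problem with losses 0.5 * (w * x_i - y_i)^2; the step size, loop lengths and function names are illustrative assumptions.

```python
import random


def grad_i(w, x, y):
    """Gradient of 0.5 * (w * x - y) ** 2 with respect to w."""
    return (w * x - y) * x


def svrg(xs, ys, lr=0.05, epochs=50, inner=None, seed=0):
    rng = random.Random(seed)
    n = len(xs)
    inner = inner or 2 * n
    w_snap = 0.0
    for _ in range(epochs):
        # Full gradient at the snapshot, recomputed once per outer loop.
        mu = sum(grad_i(w_snap, x, y) for x, y in zip(xs, ys)) / n
        w = w_snap
        for _ in range(inner):
            i = rng.randrange(n)
            # Variance-reduced direction: unbiased, and its variance shrinks near the optimum.
            v = grad_i(w, xs[i], ys[i]) - grad_i(w_snap, xs[i], ys[i]) + mu
            w -= lr * v
        w_snap = w
    return w_snap
```

For `xs = [1, 2, 3]` and `ys = [2, 5, 5]`, the least-squares solution is sum(x*y) / sum(x*x) = 27/14, and `svrg` converges to it with a constant step size, which plain SGD would not do exactly because the individual gradients do not vanish there.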
-
Multi-Label Adversarial Perturbations
Authors:
Qingquan Song,
Haifeng Jin,
Xiao Huang,
Xia Hu
Abstract:
Adversarial examples are delicately perturbed inputs, which aim to mislead machine learning models towards incorrect outputs. While most existing work focuses on generating adversarial perturbations in multi-class classification problems, many real-world applications fall into the multi-label setting, in which one instance can be associated with more than one label. For example, a spammer may generate adversarial spam with malicious advertising while keeping the other labels, such as topic labels, unchanged. To analyze the vulnerability and robustness of multi-label learning models, we investigate the generation of multi-label adversarial perturbations. This is a challenging task due to the uncertain number of positive labels associated with one instance, as well as the fact that multiple labels are usually not mutually exclusive. To bridge this gap, in this paper, we propose a general attacking framework targeting multi-label classification problems and conduct an initial analysis of the perturbations for deep neural networks. Leveraging the ranking relationships among labels, we further design a ranking-based framework to attack multi-label ranking algorithms. We specify the connection between the two proposed frameworks and separately design two specific methods grounded on each of them to generate targeted multi-label perturbations. Experiments on real-world multi-label image classification and ranking problems demonstrate the effectiveness of our proposed frameworks and provide insights into the vulnerability of multi-label deep learning models under diverse targeted attacking strategies. Several interesting findings, including an unpolished defensive strategy that could potentially enhance the interpretability and robustness of multi-label deep learning models, are further presented and discussed at the end.
Submitted 2 January, 2019;
originally announced January 2019.
-
A New Concept of Deep Reinforcement Learning based Augmented General Sequence Tagging System
Authors:
Yu Wang,
Abhishek Patel,
Hongxia Jin
Abstract:
In this paper, a new deep reinforcement learning based augmented general sequence tagging system is proposed. The new system contains two parts: a deep neural network (DNN) based sequence tagging model and a deep reinforcement learning (DRL) based augmented tagger. The augmented tagger helps improve system performance by modeling the data with minority tags. The new system is evaluated on SLU and NLU sequence tagging tasks using the ATIS and CoNLL-2003 benchmark datasets to demonstrate its outstanding performance on general tagging tasks. In terms of F1 score, the new system outperforms the current state-of-the-art model by 1.9% on the ATIS dataset and by 1.4% on the CoNLL-2003 dataset.
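The reported gains are F1 scores on tagging benchmarks. As an assumption for illustration, the sketch below computes span-level F1 over BIO tag sequences in the spirit of the usual CoNLL-style evaluation; it is a simplified stand-in, not the official scorer, and it leniently treats a stray `I-X` tag as the start of a new span. The function names `bio_spans` and `span_f1` are hypothetical.

```python
def bio_spans(tags):
    """Collect (label, start, end) spans from a BIO tag sequence; end is exclusive."""
    spans, label, start = set(), None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # trailing "O" closes the last span
        continues = tag.startswith("I-") and tag[2:] == label
        if not continues:
            if label is not None:
                spans.add((label, start, i))
            # B-X opens a span, a stray I-X is treated like B-X, and O closes it.
            label, start = (None, None) if tag == "O" else (tag[2:], i)
    return spans

def span_f1(gold, pred):
    """Micro F1 over exactly matching (label, start, end) spans."""
    g, p = bio_spans(gold), bio_spans(pred)
    tp = len(g & p)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(p), tp / len(g)
    return 2 * precision * recall / (precision + recall)
```

A span counts as correct only if both its boundaries and its label match, so mislabeling a single entity lowers precision and recall at once.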
Submitted 26 December, 2018;
originally announced December 2018.
-
A Variational Dirichlet Framework for Out-of-Distribution Detection
Authors:
Wenhu Chen,
Yilin Shen,
Hongxia Jin,
William Wang
Abstract:
With the recent rapid development of deep learning, deep neural networks have been widely adopted in many real-life applications. However, deep neural networks are also known to have very little control over their uncertainty for unseen examples, which can cause harmful and annoying consequences in practical scenarios. In this paper, we are particularly interested in designing a higher-order uncertainty metric for deep neural networks and investigate its effectiveness under the out-of-distribution detection task proposed by~\cite{hendrycks2016baseline}. Our method first assumes that there exists an underlying higher-order distribution $\mathbb{P}(z)$, which controls the label-wise categorical distribution $\mathbb{P}(y)$ over classes on the $K$-dimensional simplex; it then approximates this higher-order distribution via a parameterized posterior function $p_\theta(z|x)$ under a variational inference framework; finally, it uses the entropy of the learned posterior distribution $p_\theta(z|x)$ as an uncertainty measure to detect out-of-distribution examples. Further, we propose an auxiliary objective function that discriminates against synthesized adversarial examples to further increase the robustness of the proposed uncertainty measure. Through comprehensive experiments on various datasets, our proposed framework is demonstrated to consistently outperform competing algorithms.
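The uncertainty score is the entropy of the learned posterior over the simplex. Assuming, for illustration, that this posterior is a Dirichlet with concentration vector $\alpha$ (as the framework's name suggests), its differential entropy has the closed form $H = \log B(\alpha) + (\alpha_0 - K)\,\psi(\alpha_0) - \sum_k (\alpha_k - 1)\,\psi(\alpha_k)$, where $\alpha_0 = \sum_k \alpha_k$ and $\psi$ is the digamma function. The sketch below evaluates it with the standard library only; `digamma` uses the recurrence $\psi(x) = \psi(x+1) - 1/x$ plus an asymptotic expansion. How $\alpha$ is produced from an input $x$ is not shown and would depend on the paper's network.

```python
import math

def digamma(x):
    """Digamma for x > 0: shift x upward with psi(x) = psi(x + 1) - 1/x,
    then apply the asymptotic series ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6)."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    return result + math.log(x) - 0.5 * inv - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))

def dirichlet_entropy(alpha):
    """Differential entropy of Dir(alpha); higher means a flatter, less certain posterior."""
    a0, k = sum(alpha), len(alpha)
    log_beta = sum(math.lgamma(a) for a in alpha) - math.lgamma(a0)
    return log_beta + (a0 - k) * digamma(a0) - sum((a - 1.0) * digamma(a) for a in alpha)
```

Under this reading, an in-distribution input would yield a sharply concentrated $\alpha$ (low entropy), while an out-of-distribution input would yield a flat one (high entropy), so thresholding the entropy gives a detector.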
Submitted 20 April, 2019; v1 submitted 18 November, 2018;
originally announced November 2018.
-
Multilevel Network Item Response Modeling for Discovering Differences Between Innovation and Regular School Systems in Korea
Authors:
Ick Hoon Jin,
Minjeong Jeon,
Michael Schweinberger,
Jonghyun Yun,
Lizhen Lin
Abstract:
The innovation school system in South Korea was developed in response to the country's traditional high-pressure school system, with a view to cultivating a bottom-up and student-centered educational culture. Despite its ambitious goals, questions have been raised about the success of the innovation school system. Leveraging data from the Gyeonggi Education Panel Study (GEPS) along with advances in the statistical analysis of network data and educational data, we compare the two school systems in more depth. We find that some schools are indeed different from others, and those differences are not detected by conventional multilevel models. That said, we do not find much evidence that the innovation school system differs from the regular school system in terms of self-reported mental well-being, although we do detect differences among some schools that appear to be unrelated to the school system.
Submitted 20 January, 2022; v1 submitted 17 October, 2018;
originally announced October 2018.
-
Bayesian Hierarchical Spatial Model for Small Area Estimation with Non-ignorable Nonresponses and Its Applications to the NHANES Dental Caries Assessments
Authors:
Ick Hoon Jin,
Fang Liu,
Evercita C. Eugenio,
Kisung You,
Suyu Liu
Abstract:
The National Health and Nutrition Examination Survey (NHANES) is a major program of the National Center for Health Statistics, designed to assess the health and nutritional status of adults and children in the United States. The analysis of NHANES dental caries data faces several challenges, including (1) the data were collected using a complex, multistage, stratified, unequal-probability sampling design; (2) the sample size of some primary sampling units (PSU), e.g., counties, is very small; (3) the measures of dental caries have a complicated structure and correlation; and (4) there is a substantial percentage of nonresponses, for which the missing data are expected to be not missing at random, or non-ignorable. We propose a Bayesian hierarchical spatial model to address these analysis challenges. We develop a two-level Potts model that closely resembles the caries evolution process and captures complicated spatial correlations between teeth and surfaces of the teeth. By adding Bayesian hierarchies to the Potts model, we account for the multistage survey sampling design and also enable information borrowing across PSUs for small area estimation. We incorporate sampling weights by including them as a covariate in the model and adopt flexible B-splines to achieve robust inference. We account for non-ignorable missing outcomes and covariates using the selection model. We use data augmentation coupled with the noisy exchange sampler to obtain the posterior of model parameters that involve doubly-intractable normalizing constants. Our analysis results show strong spatial associations between teeth and tooth surfaces and that dental hygienic factors, fluorosis, and sealant reduce the risks of having dental diseases.
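To show why the Potts model leads to "doubly-intractable normalizing constants", here is a minimal sketch of a generic single-level Potts model, not the two-level model of this abstract. Each site (think of a tooth surface) takes one of `n_states` values, and neighbouring sites that agree raise the log-probability by `beta`. The normalizing constant $Z$ sums over every configuration, so brute force is only possible for toy graphs. The chain graph, `beta`, and all function names are hypothetical.

```python
import itertools
import math

def potts_log_potential(states, edges, beta):
    """Unnormalized log-probability of a Potts configuration: beta times the
    number of neighbouring site pairs that share the same state."""
    return beta * sum(1 for i, j in edges if states[i] == states[j])

def log_normalizer(n_sites, n_states, edges, beta):
    """Brute-force log Z over all n_states ** n_sites configurations.
    The cost grows exponentially with n_sites, so Z is intractable in practice."""
    total = sum(math.exp(potts_log_potential(s, edges, beta))
                for s in itertools.product(range(n_states), repeat=n_sites))
    return math.log(total)
```

Because $Z$ depends on `beta`, the likelihood itself cannot be evaluated for realistic graphs, which is the situation that samplers such as the exchange-type sampler named in the abstract are designed to handle. On a three-site chain with two states, $Z = 2(1 + e^{\beta})^2$, which gives a quick correctness check.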
Submitted 14 October, 2019; v1 submitted 11 October, 2018;
originally announced October 2018.
-
Stochastic Approximation Hamiltonian Monte Carlo
Authors:
Jonghyun Yun,
Minsuk Shin,
Ick Hoon Jin,
Faming Liang
Abstract:
Recently, Hamiltonian Monte Carlo (HMC) has become widespread as one of the more reliable approaches to efficient sample generation. However, HMC has difficulty sampling from a multimodal posterior distribution because, owing to the energy conservation property, the HMC chain cannot cross the energy barriers between modes. In this paper, we propose a Stochastic Approximation Hamiltonian Monte Carlo (SAHMC) algorithm for generating samples from multimodal densities under the Hamiltonian Monte Carlo (HMC) framework. SAHMC can adaptively lower the energy barrier so that the Hamiltonian trajectory moves between modes more frequently and more easily. Our simulation studies show that SAHMC has the potential to explore a multimodal target distribution more efficiently than HMC-based implementations.
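The energy-barrier problem can be seen in plain HMC. The sketch below is standard HMC with a leapfrog integrator, not SAHMC; the adaptive barrier lowering is not implemented. The hypothetical target is an equal mixture of $N(-3, 1)$ and $N(3, 1)$, whose potential $U(q) = -\log \pi(q)$ rises by about 3.8 between a mode and $q = 0$. Because leapfrog approximately conserves $H(q, p) = U(q) + p^2/2$, a trajectory that starts in the left mode with modest momentum cannot reach the other mode.

```python
import math
import random

# Hypothetical bimodal target: equal mixture of N(-3, 1) and N(3, 1).
# U(q) = -log density up to a constant, via log-sum-exp for numerical stability.
def U(q):
    x, y = -0.5 * (q + 3.0) ** 2, -0.5 * (q - 3.0) ** 2
    m = max(x, y)
    return -(m + math.log(math.exp(x - m) + math.exp(y - m)))

def grad_U(q):
    x, y = -0.5 * (q + 3.0) ** 2, -0.5 * (q - 3.0) ** 2
    m = max(x, y)
    a, b = math.exp(x - m), math.exp(y - m)
    return (a * (q + 3.0) + b * (q - 3.0)) / (a + b)

def hamiltonian(q, p):
    # Total energy: potential plus kinetic energy (unit mass).
    return U(q) + 0.5 * p * p

def leapfrog(q, p, step, n_steps):
    # Symplectic integrator: half momentum step, alternating full steps, half step.
    p -= 0.5 * step * grad_U(q)
    for i in range(n_steps):
        q += step * p
        if i < n_steps - 1:
            p -= step * grad_U(q)
    p -= 0.5 * step * grad_U(q)
    return q, p

def hmc(q0, n_samples, step=0.1, n_steps=20, seed=0):
    rng = random.Random(seed)
    q, samples = q0, []
    for _ in range(n_samples):
        p0 = rng.gauss(0.0, 1.0)  # fresh momentum each iteration
        q_new, p_new = leapfrog(q, p0, step, n_steps)
        # Metropolis correction for the small integration error in H.
        log_accept = hamiltonian(q, p0) - hamiltonian(q_new, p_new)
        if rng.random() < math.exp(min(0.0, log_accept)):
            q = q_new
        samples.append(q)
    return samples
```

Crossing requires a momentum draw with kinetic energy above the barrier (roughly $|p| > 2.76$ here), which is rare. SAHMC's stated remedy is to lower that barrier adaptively rather than wait for such draws.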
Submitted 19 June, 2020; v1 submitted 10 October, 2018;
originally announced October 2018.
-
Social Network Mediation Analysis: a Latent Space Approach
Authors:
Haiyan Liu,
Ick Hoon Jin,
Zhiyong Zhang,
Ying Yuan
Abstract:
Social networks contain data on both actor attributes and the social connections among actors. Such connections reflect the dependence among social actors, which is important for individuals' mental health and social development. To investigate the potential mediating role of a social network, we propose a mediation model with a social network as a mediator. In the model, dependence among actors is accounted for by a few mutually orthogonal latent dimensions. The scores on these dimensions are directly involved in the intervening process between an independent variable and a dependent variable. Because all the latent dimensions are equivalent in terms of their relationship to social networks, it is hard to name them. The intervening effect through an individual dimension is thus of little practical interest. Therefore, we focus instead on the mediation effect of the network as a whole. Although the scores are not unique, we rigorously show that the proposed network mediation effect is still well-defined. To estimate the model, we adopt a Bayesian estimation method. This modeling framework and the Bayesian estimation method are evaluated through a simulation study under representative conditions. Their usefulness is demonstrated through an empirical application to a college friendship network.
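Why can the network-level effect be well-defined when the individual dimensions are not? A simplified linear illustration, assumed here rather than taken from the paper: let $a_k$ be the effect of the independent variable on latent dimension $k$ and $b_k$ the effect of dimension $k$ on the outcome, and take the network mediation effect to be $a^\top b = \sum_k a_k b_k$. If the latent scores are rotated by an orthogonal $Q$, then $a$ becomes $Qa$, and keeping the outcome unchanged requires $b$ to become $Qb$ (since $b^\top z = (Qb)^\top (Qz)$), so $(Qa)^\top (Qb) = a^\top b$. The coefficients and function names below are hypothetical.

```python
import math

def mediation_effect(a, b):
    """Network mediation effect as sum_k a_k * b_k (linear simplification)."""
    return sum(ak * bk for ak, bk in zip(a, b))

def rotate(v, theta):
    """Apply a 2-D rotation, an orthogonal transform of the latent space."""
    c, s = math.cos(theta), math.sin(theta)
    return [c * v[0] - s * v[1], s * v[0] + c * v[1]]

a = [0.5, -0.2]  # hypothetical x -> z coefficients, one per latent dimension
b = [0.8, 0.3]   # hypothetical z -> y coefficients
theta = 1.2
# Rotating the latent scores rotates both coefficient vectors the same way.
ra, rb = rotate(a, theta), rotate(b, theta)
```

The per-dimension products $a_k b_k$ change under rotation, but their sum does not, which matches the abstract's reason for reporting the network effect rather than dimension-specific effects.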
Submitted 24 June, 2020; v1 submitted 8 October, 2018;
originally announced October 2018.
-
SySeVR: A Framework for Using Deep Learning to Detect Software Vulnerabilities
Authors:
Zhen Li,
Deqing Zou,
Shouhuai Xu,
Hai Jin,
Yawei Zhu,
Zhaoxuan Chen
Abstract:
The detection of software vulnerabilities (or vulnerabilities for short) is an important problem that has yet to be tackled, as manifested by the many vulnerabilities reported on a daily basis. This calls for machine learning methods for vulnerability detection. Deep learning is attractive for this purpose because it alleviates the requirement to manually define features. Despite the tremendous success of deep learning in other application domains, its applicability to vulnerability detection is not systematically understood. In order to fill this void, we propose the first systematic framework for using deep learning to detect vulnerabilities in C/C++ programs with source code. The framework, dubbed Syntax-based, Semantics-based, and Vector Representations (SySeVR), focuses on obtaining program representations that can accommodate syntax and semantic information pertinent to vulnerabilities. Our experiments with 4 software products demonstrate the usefulness of the framework: we detect 15 vulnerabilities that are not reported in the National Vulnerability Database. Among these 15 vulnerabilities, 7 are unknown and have been reported to the vendors, and the other 8 have been "silently" patched by the vendors when releasing newer versions of the pertinent software products.
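Before a code fragment can be fed to a neural network, it must be turned into a token sequence whose vocabulary does not depend on arbitrary naming. The sketch below shows one such normalization step, mapping user-defined identifiers to symbolic names (`VAR1`, `FUN1`, ...) while keeping keywords and library calls. This kind of step is common in this line of work, but the exact rules SySeVR uses may differ. The `KEEP` list is a hypothetical, deliberately incomplete stand-in, and the syntax/semantics-based slicing and the embedding into vectors are not shown.

```python
import re

# Identifiers, numbers, common two-character operators, then any other single character.
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|==|!=|<=|>=|->|\+\+|--|&&|\|\||\S")
# Hypothetical, incomplete set of C keywords and library functions to keep verbatim.
KEEP = {"if", "else", "for", "while", "return", "int", "char", "void",
        "sizeof", "strcpy", "memcpy", "malloc", "free"}

def normalize(code):
    """Tokenize a C fragment and rename user-defined identifiers consistently."""
    tokens = TOKEN.findall(code)
    names, counters, out = {}, {"VAR": 0, "FUN": 0}, []
    for i, tok in enumerate(tokens):
        if re.fullmatch(r"[A-Za-z_]\w*", tok) and tok not in KEEP:
            if tok not in names:
                # An identifier followed by "(" is treated as a function name.
                kind = "FUN" if i + 1 < len(tokens) and tokens[i + 1] == "(" else "VAR"
                counters[kind] += 1
                names[tok] = f"{kind}{counters[kind]}"
            tok = names[tok]
        out.append(tok)
    return out
```

For example, `char buf[10]; strcpy(buf, user_input);` becomes `char VAR1 [ 10 ] ; strcpy ( VAR1 , VAR2 ) ;`, so the same unsafe pattern looks identical regardless of the original variable names.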
Submitted 11 January, 2021; v1 submitted 17 July, 2018;
originally announced July 2018.
-
Auto-Keras: An Efficient Neural Architecture Search System
Authors:
Haifeng Jin,
Qingquan Song,
Xia Hu
Abstract:
Neural architecture search (NAS) has been proposed to automatically tune deep neural networks, but existing search algorithms, e.g., NASNet and PNAS, usually suffer from expensive computational cost. Network morphism, which keeps the functionality of a neural network while changing its neural architecture, could help NAS by enabling more efficient training during the search. In this paper, we propose a novel framework enabling Bayesian optimization to guide the network morphism for efficient neural architecture search. The framework develops a neural network kernel and a tree-structured acquisition function optimization algorithm to efficiently explore the search space. Extensive experiments on real-world benchmark datasets demonstrate the superior performance of the developed framework over state-of-the-art methods. Moreover, we build an open-source AutoML system based on our method, namely Auto-Keras. The system runs in parallel on CPU and GPU, with an adaptive search strategy for different GPU memory limits.
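"Keeps the functionality of a neural network while changing its neural architecture" can be made concrete with one morphism operation. The sketch below is a Net2Net-style widening of a two-layer ReLU network: a hidden unit is duplicated and its outgoing weights are split in half, so the wider network computes exactly the same function and can continue training from there instead of from scratch. It illustrates network morphism in general; Auto-Keras's own set of morphism operations and its Bayesian optimization (kernel and acquisition optimization) are not shown, and all names are hypothetical.

```python
def dense(W, b, x, relu=True):
    out = [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]
    return [max(0.0, o) for o in out] if relu else out

def two_layer(params, x):
    # ReLU hidden layer followed by a linear output layer.
    W1, b1, W2, b2 = params
    return dense(W2, b2, dense(W1, b1, x), relu=False)

def widen(params, unit):
    """Duplicate hidden `unit` and halve its outgoing weights (function-preserving)."""
    W1, b1, W2, b2 = params
    W1_new = [row[:] for row in W1] + [W1[unit][:]]  # copy the unit's incoming weights
    b1_new = b1 + [b1[unit]]
    W2_new = []
    for row in W2:
        half = row[unit] / 2.0  # original and copy each contribute half
        new_row = row[:]
        new_row[unit] = half
        W2_new.append(new_row + [half])
    return W1_new, b1_new, W2_new, b2[:]
```

Because the copy produces the same activation as the original unit, the two halved outgoing weights add back to the original contribution.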
Submitted 26 March, 2019; v1 submitted 26 June, 2018;
originally announced June 2018.
-
Deep Neural Network Approximation using Tensor Sketching
Authors:
Shiva Prasad Kasiviswanathan,
Nina Narodytska,
Hongxia Jin
Abstract:
Deep neural networks are powerful learning models that achieve state-of-the-art performance on many computer vision, speech, and language processing tasks. In this paper, we study a fundamental question that arises when designing deep network architectures: given a target network architecture, can we design a smaller network architecture that approximates the operation of the target network? The question is, in part, motivated by the challenge of parameter reduction (compression) in modern deep neural networks, as the ever-increasing storage and memory requirements of these networks pose a problem in resource-constrained environments.
In this work, we focus on deep convolutional neural network architectures, and propose a novel randomized tensor sketching technique that we utilize to develop a unified framework for approximating the operation of both the convolutional and fully connected layers. By applying the sketching technique along different tensor dimensions, we design changes to the convolutional and fully connected layers that substantially reduce the number of effective parameters in a network. We show that the resulting smaller network can be trained directly, and has a classification accuracy that is comparable to the original network.
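As a rough illustration of how a random sketch shrinks a layer, the code below applies a generic Gaussian random projection to a fully connected layer, not the paper's tensor-sketching construction, which sketches along different tensor dimensions and also covers convolutional layers. The input is compressed as $Sx$ with $S \in \mathbb{R}^{k \times n}$ and entries drawn from $N(0, 1/k)$, so that $\mathbb{E}[S^\top S] = I$. The layer keeps only a $m \times k$ matrix $V$, initialised here as $W S^\top$ but trainable directly. $S$ can be regenerated from a seed, so it need not be stored. `SketchedDense` and the helpers are hypothetical names.

```python
import random

def gaussian_sketch(k, n, seed=0):
    # Entries ~ N(0, 1/k), so E[S^T S] = I_n; reproducible from `seed`.
    rng = random.Random(seed)
    return [[rng.gauss(0.0, (1.0 / k) ** 0.5) for _ in range(n)] for _ in range(k)]

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def matmul_t(W, S):
    # W (m x n) times S^T (n x k) gives an m x k matrix.
    return [[sum(w * s for w, s in zip(wrow, srow)) for srow in S] for wrow in W]

class SketchedDense:
    """Hypothetical sketched fully connected layer computing y = V (S x)."""
    def __init__(self, W, k, seed=0):
        self.S = gaussian_sketch(k, len(W[0]), seed)
        self.V = matmul_t(W, self.S)  # start from W S^T; could also be trained directly
    def __call__(self, x):
        return matvec(self.V, matvec(self.S, x))
    def n_params(self):
        return len(self.V) * len(self.V[0])  # m * k instead of m * n
```

For a 4 x 256 layer with `k=32`, the trainable parameters drop from 1024 to 128. A fixed $W$ is only approximated well when $k$ is large, which is consistent with the abstract's point that the smaller network is trained directly rather than used as a frozen approximation.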
Submitted 21 October, 2017;
originally announced October 2017.
-
Predicting Cognitive Decline with Deep Learning of Brain Metabolism and Amyloid Imaging
Authors:
Hongyoon Choi,
Kyong Hwan Jin
Abstract:
For effective treatment of Alzheimer disease (AD), it is important to identify subjects who are most likely to exhibit rapid cognitive decline. Herein, we developed a novel framework based on a deep convolutional neural network (CNN) that can predict future cognitive decline in patients with mild cognitive impairment (MCI) using fluorodeoxyglucose and florbetapir positron emission tomography (PET). The network relies only on baseline PET studies of AD and normal subjects as the training dataset. Feature extraction and complicated image preprocessing, including nonlinear warping, are unnecessary for our approach. The prediction accuracy (84.2%) for conversion to AD in MCI patients outperformed conventional feature-based quantification approaches. ROC analyses revealed that the performance of the CNN-based approach was significantly higher than that of the conventional quantification methods (p < 0.05). Output scores of the network were strongly correlated with the longitudinal change in cognitive measurements. These results show the feasibility of deep learning as a tool for predicting disease outcome using brain images.
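The ROC comparison rests on the area under the ROC curve (AUC) of each method's output scores. As a sketch of how that number can be computed, AUC equals the probability that a randomly chosen positive case (e.g., an MCI patient who converts to AD, label 1) receives a higher score than a randomly chosen negative case (label 0), with ties counted as half. The scores and labels below are hypothetical, and the significance test behind the reported p < 0.05 is not reproduced.

```python
def roc_auc(scores, labels):
    """AUC via the Mann-Whitney formulation: fraction of (positive, negative)
    pairs ranked correctly, counting ties as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Unlike accuracy at a single threshold, AUC summarizes ranking quality over all thresholds, which is why it is a natural basis for comparing a CNN score with conventional quantification measures.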
Submitted 20 April, 2017;
originally announced April 2017.
-
Private Incremental Regression
Authors:
Shiva Prasad Kasiviswanathan,
Kobbi Nissim,
Hongxia Jin
Abstract:
Data is continuously generated by modern data sources, and a recent challenge in machine learning has been to develop techniques that perform well in an incremental (streaming) setting. In this paper, we investigate the problem of private machine learning where, as is common in practice, the data is not given at once but rather arrives incrementally over time.
We introduce the problems of private incremental ERM and private incremental regression, where the general goal is to always maintain a good empirical risk minimizer for the history observed under differential privacy. Our first contribution is a generic transformation of private batch ERM mechanisms into private incremental ERM mechanisms, based on the simple idea of invoking the private batch ERM procedure at regular time intervals. We take this construction as a baseline for comparison. We then provide two mechanisms for the private incremental regression problem. Our first mechanism is based on privately constructing a noisy incremental gradient function, which is then used in a modified projected gradient procedure at every timestep. This mechanism has an excess empirical risk of $\approx\sqrt{d}$, where $d$ is the dimensionality of the data. While the results of [Bassily et al. 2014] show that this bound is tight in the worst case, we show that certain geometric properties of the input and constraint set can be used to derive significantly better results for certain interesting regression problems.
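The baseline transformation, which re-invokes a private batch ERM procedure at regular intervals, can be sketched as follows under loudly stated assumptions. The "batch ERM" is a toy: the squared-loss minimizer over points clipped to $[-B, B]$ is their mean, whose sensitivity is $2B/n$, so adding Laplace noise with scale $2B/(n\varepsilon)$ makes one release $\varepsilon$-differentially private. The total budget is split evenly across releases by basic composition, which assumes the stream length is known in advance. This budget accounting is an illustrative choice, not necessarily the paper's. All function names are hypothetical.

```python
import random

def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_batch_mean(data, epsilon, rng, bound=1.0):
    """Toy private batch ERM: for squared loss on points in [-bound, bound],
    the minimizer is the mean, whose sensitivity is 2 * bound / n."""
    n = len(data)
    return sum(data) / n + laplace_noise(2.0 * bound / (n * epsilon), rng)

def private_incremental(stream, epsilon, interval, seed=0, bound=1.0):
    """Re-run the private batch procedure every `interval` steps and hold the
    last release in between; the budget is split across releases."""
    rng = random.Random(seed)
    eps_each = epsilon / max(1, len(stream) // interval)  # basic composition
    history, current, outputs = [], 0.0, []
    for t, x in enumerate(stream, start=1):
        history.append(max(-bound, min(bound, x)))  # clip so the sensitivity bound holds
        if t % interval == 0:
            current = private_batch_mean(history, eps_each, rng, bound)
        outputs.append(current)
    return outputs
```

The trade-off is visible in `interval`: frequent releases keep the estimate fresh but give each release less budget, hence more noise. The abstract's dedicated regression mechanisms aim to do better than this baseline.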
Submitted 4 January, 2017;
originally announced January 2017.
-
A Doubly Latent Space Joint Model for Local Item and Person Dependence in the Analysis of Item Response Data
Authors:
Ick Hoon Jin,
Minjeong Jeon
Abstract:
Item response theory (IRT) models explain an observed item response as a function of a respondent's latent trait and the item's properties. IRT is one of the most widely utilized tools for item response analysis; however, local item and person independence, a critical assumption of IRT, is often violated in real testing situations. In this article, we propose a new type of analytical approach for item response data that does not require standard local independence assumptions. By adapting a latent space joint modeling approach, our proposed model can estimate pairwise distances that represent the item and person dependence structures, from which item and person clusters in the latent spaces can be identified. We provide an empirical data analysis to illustrate an application of the proposed method. We also provide a simulation study to evaluate the performance of the proposed method in comparison to an existing method.
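To show how "pairwise distances" can encode dependence, here is a distance-based latent space item response formulation, assumed for illustration. The doubly latent space joint model in this abstract may parameterize distances differently. Respondent $j$ has trait $\theta_j$ and latent position $z_j$, item $i$ has easiness $\beta_i$ and position $w_i$, and the probability of a correct response falls as the respondent and item move apart: $P(Y_{ji} = 1) = \operatorname{logit}^{-1}(\theta_j + \beta_i - \gamma \lVert z_j - w_i \rVert)$. Parameter names are hypothetical.

```python
import math

def lsirm_prob(theta_j, beta_i, z_j, w_i, gamma=1.0):
    """P(Y_ji = 1) = logistic(theta_j + beta_i - gamma * ||z_j - w_i||)."""
    dist = math.dist(z_j, w_i)  # Euclidean distance in the latent space
    return 1.0 / (1.0 + math.exp(-(theta_j + beta_i - gamma * dist)))
```

Respondents whose positions lie close together respond similarly to the same items beyond what $\theta_j$ explains, which is the kind of residual dependence that clusters in the latent space can reveal.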
Submitted 1 June, 2018; v1 submitted 20 December, 2016;
originally announced December 2016.