-
GenAI-Powered Inference
Authors:
Kosuke Imai,
Kentaro Nakamura
Abstract:
We introduce GenAI-Powered Inference (GPI), a statistical framework for both causal and predictive inference using unstructured data, including text and images. GPI leverages open-source Generative Artificial Intelligence (GenAI) models - such as large language models and diffusion models - not only to generate unstructured data at scale but also to extract low-dimensional representations that cap…
▽ More
We introduce GenAI-Powered Inference (GPI), a statistical framework for both causal and predictive inference using unstructured data, including text and images. GPI leverages open-source Generative Artificial Intelligence (GenAI) models - such as large language models and diffusion models - not only to generate unstructured data at scale but also to extract low-dimensional representations that capture their underlying structure. Applying machine learning to these representations, GPI enables estimation of causal and predictive effects while quantifying associated estimation uncertainty. Unlike existing approaches to representation learning, GPI does not require fine-tuning of generative models, making it computationally efficient and broadly accessible. We illustrate the versatility of the GPI framework through three applications: (1) analyzing Chinese social media censorship, (2) estimating predictive effects of candidates' facial appearance on electoral outcomes, and (3) assessing the persuasiveness of political rhetoric. An open-source software package is available for implementing GPI.
△ Less
Submitted 5 July, 2025;
originally announced July 2025.
-
Causal Representation Learning with Generative Artificial Intelligence: Application to Texts as Treatments
Authors:
Kosuke Imai,
Kentaro Nakamura
Abstract:
In this paper, we demonstrate how to enhance the validity of causal inference with unstructured high-dimensional treatments like texts, by leveraging the power of generative Artificial Intelligence (GenAI). Specifically, we propose to use a deep generative model such as large language models (LLMs) to efficiently generate treatments and use their internal representation for subsequent causal effec…
▽ More
In this paper, we demonstrate how to enhance the validity of causal inference with unstructured high-dimensional treatments like texts, by leveraging the power of generative Artificial Intelligence (GenAI). Specifically, we propose to use a deep generative model such as large language models (LLMs) to efficiently generate treatments and use their internal representation for subsequent causal effect estimation. We show that the knowledge of this true internal representation helps disentangle the treatment features of interest, such as specific sentiments and certain topics, from other possibly unknown confounding features. Unlike existing methods, the proposed GenAI-Powered Inference (GPI) methodology eliminates the need to learn causal representation from the data, and hence produces more accurate and efficient estimates. We formally establish the conditions required for the nonparametric identification of the average treatment effect, propose an estimation strategy that avoids the violation of the overlap assumption, and derive the asymptotic properties of the proposed estimator through the application of double machine learning. Finally, using an instrumental variables approach, we extend the proposed methodology to the settings in which the treatment feature is based on human perception. The proposed GPI methodology is also applicable to text reuse where an LLM is used to regenerate existing texts. We conduct simulation and empirical studies, using the generated text data from an open-source LLM, Llama 3, to illustrate the advantages of our estimator over state-of-the-art causal representation learning algorithms.
△ Less
Submitted 2 July, 2025; v1 submitted 1 October, 2024;
originally announced October 2024.
-
Aortic root landmark localization with optimal transport loss for heatmap regression
Authors:
Tsuyoshi Ishizone,
Masaki Miyasaka,
Sae Ochi,
Norio Tada,
Kazuyuki Nakamura
Abstract:
Anatomical landmark localization is gaining attention to ease the burden on physicians. Focusing on aortic root landmark localization, the three hinge points of the aortic valve can reduce the burden by automatically determining the valve size required for transcatheter aortic valve implantation surgery. Existing methods for landmark prediction of the aortic root mainly use time-consuming two-step…
▽ More
Anatomical landmark localization is gaining attention to ease the burden on physicians. Focusing on aortic root landmark localization, the three hinge points of the aortic valve can reduce the burden by automatically determining the valve size required for transcatheter aortic valve implantation surgery. Existing methods for landmark prediction of the aortic root mainly use time-consuming two-step estimation methods. We propose a highly accurate one-step landmark localization method from even coarse images. The proposed method uses an optimal transport loss to break the trade-off between prediction precision and learning stability in conventional heatmap regression methods. We apply the proposed method to the 3D CT image dataset collected at Sendai Kousei Hospital and show that it significantly improves the estimation error over existing methods and other loss functions. Our code is available on GitHub.
△ Less
Submitted 5 July, 2024;
originally announced July 2024.
-
The Application of Zig-Zag Sampler in Sequential Markov Chain Monte Carlo
Authors:
Yu Han,
Kazuyuki Nakamura
Abstract:
Particle filtering methods are widely applied in sequential state estimation within nonlinear non-Gaussian state space model. However, the traditional particle filtering methods suffer the weight degeneracy in the high-dimensional state space model. Currently, there are many methods to improve the performance of particle filtering in high-dimensional state space model. Among these, the more advanc…
▽ More
Particle filtering methods are widely applied in sequential state estimation within nonlinear non-Gaussian state space model. However, the traditional particle filtering methods suffer the weight degeneracy in the high-dimensional state space model. Currently, there are many methods to improve the performance of particle filtering in high-dimensional state space model. Among these, the more advanced method is to construct the Sequential Makov chian Monte Carlo (SMCMC) framework by implementing the Composite Metropolis-Hasting (MH) Kernel. In this paper, we proposed to discrete the Zig-Zag Sampler and apply the Zig-Zag Sampler in the refinement stage of the Composite MH Kernel within the SMCMC framework which is implemented the invertible particle flow in the joint draw stage. We evaluate the performance of proposed method through numerical experiments of the challenging complex high-dimensional filtering examples. Nemurical experiments show that in high-dimensional state estimation examples, the proposed method improves estimation accuracy and increases the acceptance ratio compared with state-of-the-art filtering methods.
△ Less
Submitted 17 November, 2021;
originally announced November 2021.
-
Health improvement framework for planning actionable treatment process using surrogate Bayesian model
Authors:
Kazuki Nakamura,
Ryosuke Kojima,
Eiichiro Uchino,
Koichi Murashita,
Ken Itoh,
Shigeyuki Nakaji,
Yasushi Okuno
Abstract:
Clinical decision making regarding treatments based on personal characteristics leads to effective health improvements. Machine learning (ML) has been the primary concern of diagnosis support according to comprehensive patient information. However, the remaining prominent issue is the development of objective treatment processes in clinical situations. This study proposes a novel framework to plan…
▽ More
Clinical decision making regarding treatments based on personal characteristics leads to effective health improvements. Machine learning (ML) has been the primary concern of diagnosis support according to comprehensive patient information. However, the remaining prominent issue is the development of objective treatment processes in clinical situations. This study proposes a novel framework to plan treatment processes in a data-driven manner. A key point of the framework is the evaluation of the "actionability" for personal health improvements by using a surrogate Bayesian model in addition to a high-performance nonlinear ML model. We first evaluated the framework from the viewpoint of its methodology using a synthetic dataset. Subsequently, the framework was applied to an actual health checkup dataset comprising data from 3,132 participants, to improve systolic blood pressure values at the individual level. We confirmed that the computed treatment processes are actionable and consistent with clinical knowledge for lowering blood pressure. These results demonstrate that our framework could contribute toward decision making in the medical field, providing clinicians with deeper insights.
△ Less
Submitted 13 November, 2020; v1 submitted 30 October, 2020;
originally announced October 2020.
-
Ensemble Kalman Variational Objectives: Nonlinear Latent Trajectory Inference with A Hybrid of Variational Inference and Ensemble Kalman Filter
Authors:
Tsuyoshi Ishizone,
Tomoyuki Higuchi,
Kazuyuki Nakamura
Abstract:
Variational inference (VI) combined with Bayesian nonlinear filtering produces state-of-the-art results for latent time-series modeling. A body of recent work has focused on sequential Monte Carlo (SMC) and its variants, e.g., forward filtering backward simulation (FFBSi). Although these studies have succeeded, serious problems remain in particle degeneracy and biased gradient estimators. In this…
▽ More
Variational inference (VI) combined with Bayesian nonlinear filtering produces state-of-the-art results for latent time-series modeling. A body of recent work has focused on sequential Monte Carlo (SMC) and its variants, e.g., forward filtering backward simulation (FFBSi). Although these studies have succeeded, serious problems remain in particle degeneracy and biased gradient estimators. In this paper, we propose Ensemble Kalman Variational Objective (EnKO), a hybrid method of VI and the ensemble Kalman filter (EnKF), to infer state space models (SSMs). Our proposed method can efficiently identify latent dynamics because of its particle diversity and unbiased gradient estimators. We demonstrate that our EnKO outperforms SMC-based methods in terms of predictive ability and particle efficiency for three benchmark nonlinear system identification tasks.
△ Less
Submitted 9 November, 2021; v1 submitted 17 October, 2020;
originally announced October 2020.
-
Stochastic batch size for adaptive regularization in deep network optimization
Authors:
Kensuke Nakamura,
Stefano Soatto,
Byung-Woo Hong
Abstract:
We propose a first-order stochastic optimization algorithm incorporating adaptive regularization applicable to machine learning problems in deep learning framework. The adaptive regularization is imposed by stochastic process in determining batch size for each model parameter at each optimization iteration. The stochastic batch size is determined by the update probability of each parameter followi…
▽ More
We propose a first-order stochastic optimization algorithm incorporating adaptive regularization applicable to machine learning problems in deep learning framework. The adaptive regularization is imposed by stochastic process in determining batch size for each model parameter at each optimization iteration. The stochastic batch size is determined by the update probability of each parameter following a distribution of gradient norms in consideration of their local and global properties in the neural network architecture where the range of gradient norms may vary within and across layers. We empirically demonstrate the effectiveness of our algorithm using an image classification task based on conventional network models applied to commonly used benchmark datasets. The quantitative evaluation indicates that our algorithm outperforms the state-of-the-art optimization algorithms in generalization while providing less sensitivity to the selection of batch size which often plays a critical role in optimization, thus achieving more robustness to the selection of regularity.
△ Less
Submitted 14 April, 2020;
originally announced April 2020.
-
Real-time Linear Operator Construction and State Estimation with the Kalman Filter
Authors:
Tsuyoshi Ishizone,
Kazuyuki Nakamura
Abstract:
The Kalman filter is the most powerful tool for estimation of the states of a linear Gaussian system. In addition, using this method, an expectation maximization algorithm can be used to estimate the parameters of the model. However, this algorithm cannot function in real time. Thus, we propose a new method that can be used to estimate the transition matrices and the states of the system in real t…
▽ More
The Kalman filter is the most powerful tool for estimation of the states of a linear Gaussian system. In addition, using this method, an expectation maximization algorithm can be used to estimate the parameters of the model. However, this algorithm cannot function in real time. Thus, we propose a new method that can be used to estimate the transition matrices and the states of the system in real time. The proposed method uses three ideas: estimation in an observation space, a time-invariant interval, and an online learning framework. Applied to damped oscillation model, we have obtained extraordinary performance to estimate the matrices. In addition, by introducing localization and spatial uniformity to the proposed method, we have demonstrated that noise can be reduced in high-dimensional spatio-temporal data. Moreover, the proposed method has potential for use in areas such as weather forecasting and vector field analysis.
△ Less
Submitted 28 May, 2020; v1 submitted 30 January, 2020;
originally announced January 2020.
-
Adaptive Weight Decay for Deep Neural Networks
Authors:
Kensuke Nakamura,
Byung-Woo Hong
Abstract:
Regularization in the optimization of deep neural networks is often critical to avoid undesirable over-fitting leading to better generalization of model. One of the most popular regularization algorithms is to impose L-2 penalty on the model parameters resulting in the decay of parameters, called weight-decay, and the decay rate is generally constant to all the model parameters in the course of op…
▽ More
Regularization in the optimization of deep neural networks is often critical to avoid undesirable over-fitting leading to better generalization of model. One of the most popular regularization algorithms is to impose L-2 penalty on the model parameters resulting in the decay of parameters, called weight-decay, and the decay rate is generally constant to all the model parameters in the course of optimization. In contrast to the previous approach based on the constant rate of weight-decay, we propose to consider the residual that measures dissimilarity between the current state of model and observations in the determination of the weight-decay for each parameter in an adaptive way, called adaptive weight-decay (AdaDecay) where the gradient norms are normalized within each layer and the degree of regularization for each parameter is determined in proportional to the magnitude of its gradient using the sigmoid function. We empirically demonstrate the effectiveness of AdaDecay in comparison to the state-of-the-art optimization algorithms using popular benchmark datasets: MNIST, Fashion-MNIST, and CIFAR-10 with conventional neural network models ranging from shallow to deep. The quantitative evaluation of our proposed algorithm indicates that AdaDecay improves generalization leading to better accuracy across all the datasets and models.
△ Less
Submitted 7 August, 2019; v1 submitted 21 July, 2019;
originally announced July 2019.
-
Learning partially ranked data based on graph regularization
Authors:
Kento Nakamura,
Keisuke Yano,
Fumiyasu Komaki
Abstract:
Ranked data appear in many different applications, including voting and consumer surveys. There often exhibits a situation in which data are partially ranked. Partially ranked data is thought of as missing data. This paper addresses parameter estimation for partially ranked data under a (possibly) non-ignorable missing mechanism. We propose estimators for both complete rankings and missing mechani…
▽ More
Ranked data appear in many different applications, including voting and consumer surveys. There often exhibits a situation in which data are partially ranked. Partially ranked data is thought of as missing data. This paper addresses parameter estimation for partially ranked data under a (possibly) non-ignorable missing mechanism. We propose estimators for both complete rankings and missing mechanisms together with a simple estimation procedure. Our estimation procedure leverages a graph regularization in conjunction with the Expectation-Maximization algorithm. Our estimation procedure is theoretically guaranteed to have the convergence properties. We reduce a modeling bias by allowing a non-ignorable missing mechanism. In addition, we avoid the inherent complexity within a non-ignorable missing mechanism by introducing a graph regularization. The experimental results demonstrate that the proposed estimators work well under non-ignorable missing mechanisms.
△ Less
Submitted 28 February, 2019;
originally announced February 2019.