Assumption-Lean Post-Integrated Inference with Negative Control Outcomes

Authors: Jin-Hong Du, Kathryn Roeder, Larry Wasserman

Abstract: Data integration methods aim to extract low-dimensional embeddings from high-dimensional outcomes to remove unwanted variations, such as batch effects and unmeasured covariates, across heterogeneous datasets. However, multiple hypothesis testing after integration can be biased due to data-dependent processes. We introduce a robust post-integrated inference (PII) method that adjusts for latent heterogeneity using negative control outcomes. Leveraging causal interpretations, we derive nonparametric identifiability of the direct effects, which motivates our semiparametric inference method. Our method extends to projected direct effect estimands, accounting for hidden mediators, confounders, and moderators. These estimands remain statistically meaningful under model misspecifications and with error-prone embeddings. We provide bias quantifications and finite-sample linear expansions with uniform concentration bounds. The proposed doubly robust estimators are consistent and efficient under minimal assumptions and potential misspecification, facilitating data-adaptive estimation with machine learning algorithms. Our proposal is evaluated with random forests through simulations and analysis of single-cell CRISPR perturbed datasets with potential unmeasured confounders.

Submitted 24 November, 2024; v1 submitted 7 October, 2024; originally announced October 2024.

Comments: 22 pages for the main text, 27 pages for the appendix, 8 figures for the main text, 5 figures for the appendix

arXiv:2409.13997 [pdf, other]

Drift to Remember

Authors: Jin Du, Xinhe Zhang, Hao Shen, Xun Xian, Ganghua Wang, Jiawei Zhang, Yuhong Yang, Na Li, Jia Liu, Jie Ding

Abstract: Lifelong learning in artificial intelligence (AI) aims to mimic the biological brain's ability to continuously learn and retain knowledge, yet it faces challenges such as catastrophic forgetting. Recent neuroscience research suggests that neural activity in biological systems undergoes representational drift, where neural responses evolve over time, even with consistent inputs and tasks. We hypothesize that representational drift can alleviate catastrophic forgetting in AI during new task acquisition. To test this, we introduce DriftNet, a network designed to constantly explore various local minima in the loss landscape while dynamically retrieving relevant tasks. This approach ensures efficient integration of new information and preserves existing knowledge. Experimental studies in image classification and natural language processing demonstrate that DriftNet outperforms existing models in lifelong learning. Importantly, DriftNet is scalable in handling a sequence of tasks such as sentiment analysis and question answering using large language models (LLMs) with billions of parameters on a single Nvidia A100 GPU. DriftNet efficiently updates LLMs using only new data, avoiding the need for full dataset retraining. Tested on GPT-2 and RoBERTa, DriftNet is a robust, cost-effective solution for lifelong learning in LLMs. This study not only advances AI systems to emulate biological learning, but also provides insights into the adaptive mechanisms of biological neural systems, deepening our understanding of lifelong learning in nature.

Submitted 20 September, 2024; originally announced September 2024.

arXiv:2311.08255 [pdf]

Neural Dynamics of Delayed Feedback in Robot Teleoperation: Insights from fNIRS Analysis

Authors: Tianyu Zhou, Yang Ye, Qi Zhu, William Vann, Jing Du

Abstract: As robot teleoperation increasingly becomes integral in executing tasks in distant, hazardous, or inaccessible environments, the challenge of operational delays remains a significant obstacle. These delays are inherent in signal transmission and processing and can adversely affect the operator's performance, particularly in tasks requiring precision and timeliness. While current research has made strides in mitigating these delays through advanced control strategies and training methods, a crucial gap persists in understanding the neurofunctional impacts of these delays and the efficacy of countermeasures from a cognitive perspective. Our study narrows this gap by leveraging functional Near-Infrared Spectroscopy (fNIRS) to examine the neurofunctional implications of simulated haptic feedback on cognitive activity and motor coordination under delayed conditions. In a human-subject experiment (N=41), we manipulated sensory feedback to observe its influence on the responses of various brain regions of interest (ROIs) during teleoperation tasks. The fNIRS data provided a detailed assessment of cerebral activity, particularly in ROIs implicated in time perception and the execution of precise movements. Our results reveal that certain conditions, which provided immediate simulated haptic feedback, significantly optimized neural functions related to time perception and motor coordination, and improved motor performance. These findings provide empirical evidence about the neurofunctional basis of the enhanced motor performance with simulated synthetic force feedback in the presence of teleoperation delays.

Submitted 14 November, 2023; originally announced November 2023.

Comments: Submitted to Frontiers in Human Neuroscience

arXiv:2309.07261 [pdf, other]

Simultaneous inference for generalized linear models with unmeasured confounders

Authors: Jin-Hong Du, Larry Wasserman, Kathryn Roeder

Abstract: Tens of thousands of simultaneous hypothesis tests are routinely performed in genomic studies to identify differentially expressed genes. However, due to unmeasured confounders, many standard statistical approaches may be substantially biased. This paper investigates the large-scale hypothesis testing problem for multivariate generalized linear models in the presence of confounding effects. Under arbitrary confounding mechanisms, we propose a unified statistical estimation and inference framework that harnesses orthogonal structures and integrates linear projections into three key stages. It begins by disentangling marginal and uncorrelated confounding effects to recover the latent coefficients. Subsequently, latent factors and primary effects are jointly estimated through lasso-type optimization. Finally, we incorporate projected and weighted bias-correction steps for hypothesis testing. Theoretically, we establish the identification conditions of various effects and non-asymptotic error bounds. We show effective Type-I error control of asymptotic $z$-tests as sample and response sizes approach infinity. Numerical experiments demonstrate that the proposed method controls the false discovery rate by the Benjamini-Hochberg procedure and is more powerful than alternative methods. By comparing single-cell RNA-seq counts from two groups of samples, we demonstrate the suitability of adjusting confounding effects when significant covariates are absent from the model.

Submitted 15 March, 2025; v1 submitted 13 September, 2023; originally announced September 2023.

Comments: Main text: 23 pages and 6 figures; appendix: 50 pages and 12 figures

arXiv:2109.06123 [pdf, other]

Knowledge Graph-based Neurodegenerative Diseases and Diet Relationship Discovery

Authors: Yi Nian, Jingcheng Du, Larry Bu, Fang Li, Xinyue Hu, Yuji Zhang, Cui Tao

Abstract: To date, there are no effective treatments for most neurodegenerative diseases. However, certain foods may be associated with these diseases and bring an opportunity to prevent or delay neurodegenerative progression. Our objective is to construct a knowledge graph for neurodegenerative diseases using literature mining to study their relations with diet. We collected biomedical annotations (Disease, Chemical, Gene, Species, SNP&Mutation) in the abstracts from 4,300 publications relevant to both neurodegenerative diseases and diet using PubTator, an NIH-supported tool that can extract biomedical concepts from literature. A knowledge graph was created from these annotations. Graph embeddings were then trained with the node2vec algorithm to support potential concept clustering and similar concept identification. We found several food-related species and chemicals that might come from diet and have an impact on neurodegenerative diseases.

Submitted 25 October, 2021; v1 submitted 13 September, 2021; originally announced September 2021.

Comments: Accepted by CIBB 2021 (The 17th International Conference on Computational Intelligence Methods for Bioinformatics and Biostatistics)

arXiv:1810.00387 [pdf, other]

Aspiration dynamics generate robust predictions in structured populations

Authors: Lei Zhou, Bin Wu, Jinming Du, Long Wang

Abstract: Evolutionary game dynamics in structured populations are strongly affected by updating rules. Previous studies usually focus on imitation-based rules, which rely on payoff information of social peers. Recent behavioral experiments suggest that whether individuals use such social information for strategy updating may be crucial to the outcomes of social interactions. This hints at the importance of considering updating rules without dependence on social peers' payoff information, which, however, is rarely investigated. Here, we study aspiration-based self-evaluation rules, with which individuals self-assess the performance of strategies by comparing own payoffs with an imaginary value they aspire, called the aspiration level. We explore the fate of strategies on population structures represented by graphs or networks. Under weak selection, we analytically derive the condition for strategy dominance, which is found to coincide with the classical condition of risk-dominance. This condition holds for all networks and all distributions of aspiration levels, and for individualized ways of self-evaluation. Our condition can be intuitively interpreted: one strategy prevails over the other if the strategy brings more satisfaction to individuals than the other does. Our work thus sheds light on the intrinsic difference between evolutionary dynamics induced by aspiration-based and imitation-based rules.

Submitted 30 September, 2018; originally announced October 2018.

arXiv:1402.5270 [pdf, other]

doi 10.1098/rsif.2014.0077

Aspiration Dynamics of Multi-player Games in Finite Populations

Authors: Jinming Du, Bin Wu, Philipp M. Altrock, Long Wang

Abstract: Studying strategy update rules in the framework of evolutionary game theory, one can differentiate between imitation processes and aspiration-driven dynamics. In the former case, individuals imitate the strategy of a more successful peer. In the latter case, individuals adjust their strategies based on a comparison of their payoffs from the evolutionary game to a value they aspire, called the level of aspiration. Unlike imitation processes of pairwise comparison, aspiration-driven updates do not require additional information about the strategic environment and can thus be interpreted as being more spontaneous. Recent work has mainly focused on understanding how aspiration dynamics alter the evolutionary outcome in structured populations. However, the baseline case for understanding strategy selection is the well-mixed population case, which is still lacking sufficient understanding. We explore how aspiration-driven strategy-update dynamics under imperfect rationality influence the average abundance of a strategy in multi-player evolutionary games with two strategies. We analytically derive a condition under which a strategy is more abundant than the other in the weak selection limiting case. This approach has a long-standing history in evolutionary game theory and is mostly applied for its mathematical approachability. Hence, we also explore strong selection numerically, which shows that our weak selection condition is a robust predictor of the average abundance of a strategy. The condition turns out to differ from that of a wide class of imitation dynamics, as long as the game is not dyadic. Therefore a strategy favored under imitation dynamics can be disfavored under aspiration dynamics. This does not require any population structure and thus highlights the intrinsic difference between imitation and aspiration dynamics.

Submitted 21 February, 2014; originally announced February 2014.

Showing 1–7 of 7 results for author: Du, J