
Showing 1–27 of 27 results for author: Page, D

Searching in archive cs.
  1. arXiv:2503.12964  [pdf, other]

    cs.CV cs.AI cs.LG

    Training Video Foundation Models with NVIDIA NeMo

    Authors: Zeeshan Patel, Ethan He, Parth Mannan, Xiaowei Ren, Ryan Wolf, Niket Agarwal, Jacob Huffman, Zhuoyao Wang, Carl Wang, Jack Chang, Yan Bai, Tommy Huang, Linnan Wang, Sahil Jain, Shanmugam Ramasamy, Joseph Jennings, Ekaterina Sirazitdinova, Oleg Sudakov, Mingyuan Ma, Bobby Chen, Forrest Lin, Hao Wang, Vasanth Rao Naik Sabavat, Sriharsha Niverty, Rong Ou, et al. (4 additional authors not shown)

    Abstract: Video Foundation Models (VFMs) have recently been used to simulate the real world to train physical AI systems and develop creative visual experiences. However, there are significant challenges in training large-scale, high quality VFMs that can generate high-quality videos. We present a scalable, open-source VFM training pipeline with NVIDIA NeMo, providing accelerated video dataset curation, mul…

    Submitted 17 March, 2025; originally announced March 2025.

  2. arXiv:2501.03575  [pdf, other]

    cs.CV cs.AI cs.LG cs.RO

    Cosmos World Foundation Model Platform for Physical AI

    Authors: NVIDIA: Niket Agarwal, Arslan Ali, Maciej Bala, Yogesh Balaji, Erik Barker, Tiffany Cai, Prithvijit Chattopadhyay, Yongxin Chen, Yin Cui, Yifan Ding, Daniel Dworakowski, Jiaojiao Fan, Michele Fenzi, Francesco Ferroni, Sanja Fidler, Dieter Fox, Songwei Ge, Yunhao Ge, Jinwei Gu, Siddharth Gururani, Ethan He, Jiahui Huang, Jacob Huffman, et al. (54 additional authors not shown)

    Abstract: Physical AI needs to be trained digitally first. It needs a digital twin of itself, the policy model, and a digital twin of the world, the world model. In this paper, we present the Cosmos World Foundation Model Platform to help developers build customized world models for their Physical AI setups. We position a world foundation model as a general-purpose world model that can be fine-tuned into cu…

    Submitted 18 March, 2025; v1 submitted 7 January, 2025; originally announced January 2025.

  3. arXiv:2408.01869  [pdf, other]

    cs.CL cs.AI cs.IR cs.LG cs.MA q-bio.QM

    MALADE: Orchestration of LLM-powered Agents with Retrieval Augmented Generation for Pharmacovigilance

    Authors: Jihye Choi, Nils Palumbo, Prasad Chalasani, Matthew M. Engelhard, Somesh Jha, Anivarya Kumar, David Page

    Abstract: In the era of Large Language Models (LLMs), given their remarkable text understanding and generation abilities, there is an unprecedented opportunity to develop new, LLM-based methods for trustworthy medical knowledge synthesis, extraction and summarization. This paper focuses on the problem of Pharmacovigilance (PhV), where the significance and challenges lie in identifying Adverse Drug Events (A…

    Submitted 3 August, 2024; originally announced August 2024.

    Comments: Paper published at Machine Learning for Healthcare 2024 (MLHC'24)

  4. arXiv:2404.14870  [pdf, other]

    cs.CR cs.SE

    Super Mario in the Pernicious Kingdoms: Classifying glitches in old games

    Authors: Llewellyn Forward, Io Limmer, Joseph Hallett, Dan Page

    Abstract: In a case study spanning four classic Super Mario games and the analysis of 237 known glitches within them, we classify a variety of weaknesses that are exploited by speedrunners to enable them to beat games quickly and in surprising ways. Using the Seven Pernicious Kingdoms software defect taxonomy and the Common Weakness Enumeration, we categorize the glitches by the weaknesses that enable them.…

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: Presented at the 8th International Workshop on Games and Software Engineering (GAS), April 14 2024. Co-located with ICSE

  5. arXiv:2312.02017  [pdf, other]

    eess.IV cs.CV physics.med-ph

    A multi-channel cycleGAN for CBCT to CT synthesis

    Authors: Chelsea A. H. Sargeant, Edward G. A. Henderson, Dónal M. McSweeney, Aaron G. Rankin, Denis Page

    Abstract: Image synthesis is used to generate synthetic CTs (sCTs) from on-treatment cone-beam CTs (CBCTs) with a view to improving image quality and enabling accurate dose computation to facilitate a CBCT-based adaptive radiotherapy workflow. As this area of research gains momentum, developments in sCT generation methods are difficult to compare due to the lack of large public datasets and sizeable variati…

    Submitted 4 December, 2023; originally announced December 2023.

    Comments: RRRocket_Lollies submission for the Synthesizing computed tomography for radiotherapy (SynthRAD2023) Challenge at MICCAI 2023

  6. arXiv:2312.01521  [pdf, other]

    cs.AI cs.LG cs.PL

    Neural Markov Prolog

    Authors: Alexander Thomson, David Page

    Abstract: The recent rapid advance of AI has been driven largely by innovations in neural network architectures. A concomitant concern is how to understand these resulting systems. In this paper, we propose a tool to assist in both the design of further innovative architectures and the simple yet precise communication of their structure. We propose the language Neural Markov Prolog (NMP), based on both Mark…

    Submitted 27 November, 2023; originally announced December 2023.

    Comments: 13 pages, 4 figures

  7. arXiv:2310.06237  [pdf, other]

    cs.LG cs.CR

    Differentially Private Multi-Site Treatment Effect Estimation

    Authors: Tatsuki Koga, Kamalika Chaudhuri, David Page

    Abstract: Patient privacy is a major barrier to healthcare AI. For confidentiality reasons, most patient data remains in silo in separate hospitals, preventing the design of data-driven healthcare AI systems that need large volumes of patient data to make effective decisions. A solution to this is collective learning across multiple sites through federated learning with differential privacy. However, litera…

    Submitted 9 October, 2023; originally announced October 2023.

    Comments: 16 pages

  8. arXiv:2305.17583  [pdf, other]

    stat.ML cs.LG

    On Neural Networks as Infinite Tree-Structured Probabilistic Graphical Models

    Authors: Boyao Li, Alexander J. Thomson, Houssam Nassif, Matthew M. Engelhard, David Page

    Abstract: Deep neural networks (DNNs) lack the precise semantics and definitive probabilistic interpretation of probabilistic graphical models (PGMs). In this paper, we propose an innovative solution by constructing infinite tree-structured PGMs that correspond exactly to neural networks. Our research reveals that DNNs, during forward propagation, indeed perform approximations of PGM inference that are prec…

    Submitted 17 January, 2025; v1 submitted 27 May, 2023; originally announced May 2023.

    Comments: Accepted to NeurIPS 2024

    Journal ref: Conference on Neural Information Processing Systems (NeurIPS'24), Vancouver, BC, pp. 4598-4628, 2024

  9. arXiv:2302.11715  [pdf, other]

    stat.ME cs.LG econ.EM

    Variable Importance Matching for Causal Inference

    Authors: Quinn Lanners, Harsh Parikh, Alexander Volfovsky, Cynthia Rudin, David Page

    Abstract: Our goal is to produce methods for observational causal inference that are auditable, easy to troubleshoot, accurate for treatment effect estimation, and scalable to high-dimensional data. We describe a general framework called Model-to-Match that achieves these goals by (i) learning a distance metric via outcome modeling, (ii) creating matched groups using the distance metric, and (iii) using the…

    Submitted 28 June, 2023; v1 submitted 22 February, 2023; originally announced February 2023.

    Journal ref: Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence, PMLR 216:1174-1184, 2023

  10. arXiv:2103.10916  [pdf, other]

    cs.LG

    Predicting Drug-Drug Interactions from Heterogeneous Data: An Embedding Approach

    Authors: Devendra Singh Dhami, Siwen Yan, Gautam Kunapuli, David Page, Sriraam Natarajan

    Abstract: Predicting and discovering drug-drug interactions (DDIs) using machine learning has been studied extensively. However, most of the approaches have focused on text data or textual representation of the drug structures. We present the first work that uses multiple data sources such as drug structure images, drug structure string representation and relational representation of drug relationships as t…

    Submitted 19 March, 2021; originally announced March 2021.

    Comments: 10 pages, 6 figures, Accepted as a short paper to 'Artificial Intelligence in Medicine 2021'

  11. arXiv:2011.09497  [pdf]

    cs.LG

    High-Throughput Approach to Modeling Healthcare Costs Using Electronic Healthcare Records

    Authors: Alex Taylor, Ross Kleiman, Scott Hebbring, Peggy Peissig, David Page

    Abstract: Accurate estimation of healthcare costs is crucial for healthcare systems to plan and effectively negotiate with insurance companies regarding the coverage of patient-care costs. Greater accuracy in estimating healthcare costs would provide mutual benefit for both health systems and the insurers that support these systems by better aligning payment models with patient-care costs. This study presen…

    Submitted 1 June, 2022; v1 submitted 18 November, 2020; originally announced November 2020.

    Comments: 4 pages, 3 figures

  12. arXiv:2005.06462  [pdf, other]

    cs.LG stat.ML

    Temporal Poisson Square Root Graphical Models

    Authors: Sinong Geng, Zhaobin Kuang, Peggy Peissig, David Page

    Abstract: We propose temporal Poisson square root graphical models (TPSQRs), a generalization of Poisson square root graphical models (PSQRs) specifically designed for modeling longitudinal event data. By estimating the temporal relationships for all possible pairs of event types, TPSQRs can offer a holistic perspective about whether the occurrences of any given event type could excite or inhibit any other…

    Submitted 12 May, 2020; originally announced May 2020.

  13. arXiv:2005.06083  [pdf, other]

    cs.LG stat.ML

    Stochastic Learning for Sparse Discrete Markov Random Fields with Controlled Gradient Approximation Error

    Authors: Sinong Geng, Zhaobin Kuang, Jie Liu, Stephen Wright, David Page

    Abstract: We study the $L_1$-regularized maximum likelihood estimator/estimation (MLE) problem for discrete Markov random fields (MRFs), where efficient and scalable learning requires both sparse regularization and approximate inference. To address these challenges, we consider a stochastic learning framework called stochastic proximal gradient (SPG; Honorio 2012a, Atchade et al. 2014, Miasojedow and Rejchel…

    Submitted 12 May, 2020; originally announced May 2020.

  14. arXiv:2003.03626  [pdf]

    q-bio.NC cs.RO

    Discrimination Among Multiple Cutaneous and Proprioceptive Hand Percepts Evoked by Nerve Stimulation with Utah Slanted Electrode Arrays in Human Amputees

    Authors: David M. Page, Suzanne M. Wendelken, Tyler S. Davis, David T. Kluger, Douglas T. Hutchinson, Jacob A. George, Gregory A. Clark

    Abstract: Objective: This paper aims to demonstrate functional discriminability among restored hand sensations with different locations, qualities, and intensities that are evoked by microelectrode stimulation of residual afferent fibers in human amputees. Methods: We implanted a Utah Slanted Electrode Array (USEA) in the median and ulnar residual arm nerves of three transradial amputees and delivered stimu…

    Submitted 7 March, 2020; originally announced March 2020.

    Comments: 19 pages

  15. arXiv:2002.07906  [pdf, other]

    cs.LG stat.ML

    CAUSE: Learning Granger Causality from Event Sequences using Attribution Methods

    Authors: Wei Zhang, Thomas Kobber Panum, Somesh Jha, Prasad Chalasani, David Page

    Abstract: We study the problem of learning Granger causality between event types from asynchronous, interdependent, multi-type event sequences. Existing work suffers from either limited model flexibility or poor model explainability and thus fails to uncover Granger causality across a wide variety of event sequences with diverse event interdependency. To address these weaknesses, we propose CAUSE (Causality…

    Submitted 18 February, 2020; originally announced February 2020.

  16. AutoBlock: A Hands-off Blocking Framework for Entity Matching

    Authors: Wei Zhang, Hao Wei, Bunyamin Sisman, Xin Luna Dong, Christos Faloutsos, David Page

    Abstract: Entity matching seeks to identify data records over one or multiple data sources that refer to the same real-world entity. Virtually every entity matching task on large datasets requires blocking, a step that reduces the number of record pairs to be matched. However, most of the traditional blocking methods are learning-free and key-based, and their successes are largely built on laborious human e…

    Submitted 6 December, 2019; originally announced December 2019.

    Comments: In The Thirteenth ACM International Conference on Web Search and Data Mining (WSDM '20), February 3-7, 2020, Houston, TX, USA. ACM, Anchorage, Alaska, USA, 9 pages

  17. arXiv:1911.06356  [pdf, other]

    cs.LG stat.ML

    Beyond Textual Data: Predicting Drug-Drug Interactions from Molecular Structure Images using Siamese Neural Networks

    Authors: Devendra Singh Dhami, Siwen Yan, Gautam Kunapuli, David Page, Sriraam Natarajan

    Abstract: Predicting and discovering drug-drug interactions (DDIs) is an important problem and has been studied extensively both from medical and machine learning point of view. Almost all of the machine learning approaches have focused on text data or textual representation of the structural data of drugs. We present the first work that uses drug structure images as the input and utilizes a Siamese convolu…

    Submitted 29 June, 2020; v1 submitted 14 November, 2019; originally announced November 2019.

    Comments: 9 pages, 9 figures

  18. arXiv:1907.01901  [pdf, other]

    q-bio.QM cs.LG stat.ML

    High-Throughput Machine Learning from Electronic Health Records

    Authors: Ross S. Kleiman, Paul S. Bennett, Peggy L. Peissig, Richard L. Berg, Zhaobin Kuang, Scott J. Hebbring, Michael D. Caldwell, David Page

    Abstract: The widespread digitization of patient data via electronic health records (EHRs) has created an unprecedented opportunity to use machine learning algorithms to better predict disease risk at the patient level. Although predictive models have previously been constructed for a few important diseases, such as breast cancer and myocardial infarction, we currently know very little about how accurately…

    Submitted 3 July, 2019; originally announced July 2019.

  19. arXiv:1906.05255  [pdf, other]

    cs.IR q-bio.QM

    A Simple Text Mining Approach for Ranking Pairwise Associations in Biomedical Applications

    Authors: Finn Kuusisto, John Steill, Zhaobin Kuang, James Thomson, David Page, Ron Stewart

    Abstract: We present a simple text mining method that is easy to implement, requires minimal data collection and preparation, and is easy to use for proposing ranked associations between a list of target terms and a key phrase. We call this method KinderMiner, and apply it to two biomedical applications. The first application is to identify relevant transcription factors for cell reprogramming, and the seco…

    Submitted 12 June, 2019; originally announced June 2019.

    Journal ref: AMIA Joint Summits on Translational Science Proceedings (2017) 166-174

  20. arXiv:1905.02121  [pdf, other]

    q-bio.QM cs.LG q-bio.TO stat.ML

    Machine Learning to Predict Developmental Neurotoxicity with High-throughput Data from 2D Bio-engineered Tissues

    Authors: Finn Kuusisto, Vitor Santos Costa, Zhonggang Hou, James Thomson, David Page, Ron Stewart

    Abstract: There is a growing need for fast and accurate methods for testing developmental neurotoxicity across several chemical exposure sources. Current approaches, such as in vivo animal studies, and assays of animal and human primary cell cultures, suffer from challenges related to time, cost, and applicability to human physiology. We previously demonstrated success employing machine learning to predict…

    Submitted 6 May, 2019; originally announced May 2019.

  21. arXiv:1811.08695  [pdf, other]

    cs.LG stat.ML

    Privacy-Preserving Collaborative Prediction using Random Forests

    Authors: Irene Giacomelli, Somesh Jha, Ross Kleiman, David Page, Kyonghwan Yoon

    Abstract: We study the problem of privacy-preserving machine learning (PPML) for ensemble methods, focusing our effort on random forests. In collaborative analysis, PPML attempts to solve the conflict between the need for data sharing and privacy. This is especially important in privacy sensitive applications such as learning predictive models for clinical decision support from EHR data from different clini…

    Submitted 21 November, 2018; originally announced November 2018.

    Comments: Accepted at the AMIA Informatics Summit 2019

  22. An Inductive Logic Programming Approach to Validate Hexose Binding Biochemical Knowledge

    Authors: Houssam Nassif, Hassan Al-Ali, Sawsan Khuri, Walid Keirouz, David Page

    Abstract: Hexoses are simple sugars that play a key role in many cellular pathways, and in the regulation of development and disease mechanisms. Current protein-sugar computational models are based, at least partially, on prior biochemical findings and knowledge. They incorporate different parts of these findings in predictive black-box models. We investigate the empirical support for biochemical findings b…

    Submitted 2 October, 2018; originally announced October 2018.

    Journal ref: International Conference on Inductive Logic Programming (ILP'09), Leuven, Belgium, pp. 149-165, 2009

  23. arXiv:1212.2519  [pdf]

    cs.AI

    CLP(BN): Constraint Logic Programming for Probabilistic Knowledge

    Authors: Vitor Santos Costa, David Page, Maleeha Qazi, James Cussens

    Abstract: We present CLP(BN), a novel approach that aims at expressing Bayesian networks through the constraint logic programming framework. Arguably, an important limitation of traditional Bayesian networks is that they are propositional, and thus cannot represent relations between multiple similar objects in multiple contexts. Several researchers have thus proposed first-order language…

    Submitted 19 October, 2012; originally announced December 2012.

    Comments: Appears in Proceedings of the Nineteenth Conference on Uncertainty in Artificial Intelligence (UAI2003)

    Report number: UAI-P-2003-PG-517-524

  24. arXiv:1210.4868  [pdf]

    stat.ME cs.CE stat.AP

    Graphical-model Based Multiple Testing under Dependence, with Applications to Genome-wide Association Studies

    Authors: Jie Liu, Chunming Zhang, Catherine McCarty, Peggy Peissig, Elizabeth Burnside, David Page

    Abstract: Large-scale multiple testing tasks often exhibit dependence, and leveraging the dependence between individual tests is still one challenging and important problem in statistics. With recent advances in graphical models, it is feasible to use them to perform multiple testing under dependence. We propose a multiple testing procedure which is based on a Markov-random-field-coupled mixture model. The…

    Submitted 16 October, 2012; originally announced October 2012.

    Comments: Appears in Proceedings of the Twenty-Eighth Conference on Uncertainty in Artificial Intelligence (UAI2012)

    Report number: UAI-P-2012-PG-511-522

  25. arXiv:1206.6399  [pdf]

    cs.LG cs.AI stat.ML

    Demand-Driven Clustering in Relational Domains for Predicting Adverse Drug Events

    Authors: Jesse Davis, Vitor Santos Costa, Peggy Peissig, Michael Caldwell, Elizabeth Berg, David Page

    Abstract: Learning from electronic medical records (EMR) is challenging due to their relational nature and the uncertain dependence between a patient's past and future health status. Statistical relational learning is a natural fit for analyzing EMRs but is less adept at handling their inherent latent structure, such as connections between related medications or diseases. One way to capture the latent struc…

    Submitted 27 June, 2012; originally announced June 2012.

    Comments: Appears in Proceedings of the 29th International Conference on Machine Learning (ICML 2012)

  26. arXiv:1206.5271  [pdf]

    cs.AI

    Learning Bayesian Network Structure from Correlation-Immune Data

    Authors: Eric Lantz, Soumya Ray, David Page

    Abstract: Searching the complete space of possible Bayesian networks is intractable for problems of interesting size, so Bayesian network structure learning algorithms, such as the commonly used Sparse Candidate algorithm, employ heuristics. However, these heuristics also restrict the types of relationships that can be learned exclusively from data. They are unable to learn relationships that exhibit "corre…

    Submitted 20 June, 2012; originally announced June 2012.

    Comments: Appears in Proceedings of the Twenty-Third Conference on Uncertainty in Artificial Intelligence (UAI2007)

    Report number: UAI-P-2007-PG-235-242

  27. arXiv:1206.4667  [pdf, other]

    cs.LG cs.AI cs.IR

    Unachievable Region in Precision-Recall Space and Its Effect on Empirical Evaluation

    Authors: Kendrick Boyd, Vitor Santos Costa, Jesse Davis, David Page

    Abstract: Precision-recall (PR) curves and the areas under them are widely used to summarize machine learning results, especially for data sets exhibiting class skew. They are often used analogously to ROC curves and the area under ROC curves. It is known that PR curves vary as class skew changes. What was not recognized before this paper is that there is a region of PR space that is completely unachievable…

    Submitted 18 July, 2012; v1 submitted 18 June, 2012; originally announced June 2012.

    Comments: ICML2012, fixed citations to use correct tech report number