Question-Answering System Extracts Information on Injection Drug Use from Clinical Notes

Authors: Maria Mahbub, Ian Goethert, Ioana Danciu, Kathryn Knight, Sudarshan Srinivasan, Suzanne Tamang, Karine Rozenberg-Ben-Dror, Hugo Solares, Susana Martins, Jodie Trafton, Edmon Begoli, Gregory Peterson

Abstract: Background: Injection drug use (IDU) is a dangerous health behavior that increases mortality and morbidity. Identifying IDU early and initiating harm reduction interventions can benefit individuals at risk. However, extracting IDU behaviors from patients' electronic health records (EHR) is difficult because there is no International Classification of Disease (ICD) code and the only place IDU information can be indicated is unstructured free-text clinical notes. Although natural language processing can efficiently extract this information from unstructured data, there are no validated tools. Methods: To address this gap in clinical information, we design and demonstrate a question-answering (QA) framework to extract information on IDU from clinical notes. Our framework involves two main steps: (1) generating a gold-standard QA dataset and (2) developing and testing the QA model. We utilize 2323 clinical notes of 1145 patients sourced from the VA Corporate Data Warehouse to construct the gold-standard dataset for developing and evaluating the QA model. We also demonstrate the QA model's ability to extract IDU-related information on temporally out-of-distribution data. Results: Here we show that for a strict match between gold-standard and predicted answers, the QA model achieves 51.65% F1 score. For a relaxed match between the gold-standard and predicted answers, the QA model obtains 78.03% F1 score, along with 85.38% Precision and 79.02% Recall scores. Moreover, the QA model demonstrates consistent performance when subjected to temporally out-of-distribution data. Conclusions: Our study introduces a QA framework designed to extract IDU information from clinical notes, aiming to enhance the accurate and efficient detection of people who inject drugs, extract relevant information, and ultimately facilitate informed patient care.

Submitted 28 December, 2023; v1 submitted 15 May, 2023; originally announced May 2023.

Comments: 31 pages, 11 tables, 7 figures

arXiv:2202.13174 [pdf, other]

doi 10.1093/bioinformatics/btac508

BioADAPT-MRC: Adversarial Learning-based Domain Adaptation Improves Biomedical Machine Reading Comprehension Task

Authors: Maria Mahbub, Sudarshan Srinivasan, Edmon Begoli, Gregory D Peterson

Abstract: Biomedical machine reading comprehension (biomedical-MRC) aims to comprehend complex biomedical narratives and assist healthcare professionals in retrieving information from them. The high performance of modern neural network-based MRC systems depends on high-quality, large-scale, human-annotated training datasets. In the biomedical domain, a crucial challenge in creating such datasets is the requirement for domain knowledge, inducing the scarcity of labeled data and the need for transfer learning from the labeled general-purpose (source) domain to the biomedical (target) domain. However, there is a discrepancy in marginal distributions between the general-purpose and biomedical domains due to the variances in topics. Therefore, direct-transferring of learned representations from a model trained on a general-purpose domain to the biomedical domain can hurt the model's performance. We present an adversarial learning-based domain adaptation framework for the biomedical machine reading comprehension task (BioADAPT-MRC), a neural network-based method to address the discrepancies in the marginal distributions between the general and biomedical domain datasets. BioADAPT-MRC relaxes the need for generating pseudo labels for training a well-performing biomedical-MRC model. We extensively evaluate the performance of BioADAPT-MRC by comparing it with the best existing methods on three widely used benchmark biomedical-MRC datasets: BioASQ-7b, BioASQ-8b, and BioASQ-9b. Our results suggest that without using any synthetic or human-annotated data from the biomedical domain, BioADAPT-MRC can achieve state-of-the-art performance on these datasets. Availability: BioADAPT-MRC is freely available as an open-source project at https://github.com/mmahbub/BioADAPT-MRC.

Submitted 26 July, 2022; v1 submitted 26 February, 2022; originally announced February 2022.

Comments: 31 pages, 9 figures. This is the Authors' Original Version of the article, which has been accepted for publication in Bioinformatics 2022

arXiv:2012.14312 [pdf, other]

Panarchy: ripples of a boundary concept

Authors: Juan Rocha, Linda Luvuno, Jesse Rieb, Erin Crockett, Katja Malmborg, Michael Schoon, Garry Peterson

Abstract: How do social-ecological systems change over time? In 2002 Holling and colleagues proposed the concept of Panarchy, which presented social-ecological systems as an interacting set of adaptive cycles, each of which is produced by the dynamic tensions between novelty and efficiency at multiple scales. Initially introduced as a conceptual framework and set of metaphors, panarchy has gained the attention of scholars across many disciplines and its ideas continue to inspire further conceptual developments. Almost twenty years after this concept was introduced we review how it has been used, tested, extended and revised. We do this by combining qualitative methods and machine learning. Document analysis was used to code panarchy features that are commonly used in the scientific literature (N = 42), a qualitative analysis that was complemented with topic modeling of 2177 documents. We find that the adaptive cycle is the feature of panarchy that has attracted the most attention. Challenges remain in empirically grounding the metaphor, but recent theoretical and empirical work offers some avenues for future research.

Submitted 28 December, 2020; originally announced December 2020.

Comments: 11 pages, 5 figures

arXiv:1807.09632 [pdf, other]

doi 10.1145/3219104.3229289

PaPaS: A Portable, Lightweight, and Generic Framework for Parallel Parameter Studies

Authors: Eduardo Ponce, Brittany Stephenson, Suzanne Lenhart, Judy Day, Gregory D. Peterson

Abstract: The current landscape of scientific research is widely based on modeling and simulation, typically with complexity in the simulation's flow of execution and parameterization properties. Execution flows are not necessarily straightforward since they may need multiple processing tasks and iterations. Furthermore, parameter and performance studies are common approaches used to characterize a simulation, often requiring traversal of a large parameter space. High-performance computers offer practical resources at the expense of users handling the setup, submission, and management of jobs. This work presents the design of PaPaS, a portable, lightweight, and generic workflow framework for conducting parallel parameter and performance studies. Workflows are defined using parameter files based on keyword-value pairs syntax, thus removing from the user the overhead of creating complex scripts to manage the workflow. A parameter set consists of any combination of environment variables, files, partial file contents, and command line arguments. PaPaS is being developed in Python 3 with support for distributed parallelization using SSH, batch systems, and C++ MPI. The PaPaS framework will run as user processes, and can be used in single/multi-node and multi-tenant computing systems. An example simulation using the BehaviorSpace tool from NetLogo and a matrix multiply using OpenMP are presented as parameter and performance studies, respectively. The results demonstrate that the PaPaS framework offers a simple method for defining and managing parameter studies, while increasing resource utilization.

Submitted 25 July, 2018; originally announced July 2018.

Comments: 8 pages, 6 figures, PEARC '18: Practice and Experience in Advanced Research Computing, July 22–26, 2018, Pittsburgh, PA, USA

arXiv:1305.0305 [pdf]

Network-Centric Quantum Communications with Application to Critical Infrastructure Protection

Authors: Richard J. Hughes, Jane E. Nordholt, Kevin P. McCabe, Raymond T. Newell, Charles G. Peterson, Rolando D. Somma

Abstract: Network-centric quantum communications (NQC) - a new, scalable instantiation of quantum cryptography providing key management with forward security for lightweight encryption, authentication and digital signatures in optical networks - is briefly described. Results from a multi-node experimental test-bed utilizing integrated photonics quantum communications components, known as QKarDs, include: quantum identification; verifiable quantum secret sharing; multi-party authenticated key establishment, including group keying; and single-fiber quantum-secured communications that can be applied as a security retrofit/upgrade to existing optical fiber installations. A demonstration that NQC meets the challenging simultaneous latency and security requirements of electric grid control communications, which cannot be met without compromises using conventional cryptography, is described.

Submitted 1 May, 2013; originally announced May 2013.

Comments: 7 pages, 3 figures

Report number: LA-UR-13-22718 (version 2)

arXiv:1112.5906 [pdf, other]

Power-law distribution functions derived from maximum entropy and a symmetry relationship

Authors: G. J. Peterson, K. A. Dill

Abstract: Power-law distributions are common, particularly in social physics. Here, we explore whether power-laws might arise as a consequence of a general variational principle for stochastic processes. We describe communities of 'social particles', where the cost of adding a particle to the community is shared equally between the particle joining the cluster and the particles that are already members of the cluster. Power-law probability distributions of community sizes arise as a natural consequence of the maximization of entropy, subject to this 'equal cost sharing' rule. We also explore a generalization in which there is unequal sharing of the costs of joining a community. Distributions change smoothly from exponential to power-law as a function of a sharing-inequality quantity. This work gives an interpretation of power-law distributions in terms of shared costs.

Submitted 26 December, 2011; originally announced December 2011.

Comments: 5 pages, 1 figure

arXiv:1009.0574 [pdf, other]

doi 10.1073/pnas.1010757107

Nonuniversal power law scaling in the probability distribution of scientific citations

Authors: G. J. Peterson, S. Pressé, K. A. Dill

Abstract: We develop a model for the distribution of scientific citations. The model involves a dual mechanism: in the direct mechanism, the author of a new paper finds an old paper A and cites it. In the indirect mechanism, the author of a new paper finds an old paper A only via the reference list of a newer intermediary paper B, which has previously cited A. By comparison to citation databases, we find that papers having few citations are cited mainly by the direct mechanism. Papers already having many citations ('classics') are cited mainly by the indirect mechanism. The indirect mechanism gives a power-law tail. The 'tipping point' at which a paper becomes a classic is about 21 citations for papers published in the Institute for Scientific Information (ISI) Web of Science database in 1981, 29 for Physical Review D papers published from 1975-1994, and 39 for all publications from a list of high h-index chemists assembled in 2007. The power-law exponent is not universal. Individuals who are highly cited have a systematically smaller exponent than individuals who are less cited.

Submitted 7 March, 2011; v1 submitted 2 September, 2010; originally announced September 2010.

Comments: 7 pages, 3 figures, 2 tables

Journal ref: PNAS 107 (2010) 16023-16027

Showing 1–7 of 7 results for author: Peterson, G