-
Ensemble Riemannian Data Assimilation over the Wasserstein Space
Authors:
Sagar K. Tamang,
Ardeshir Ebtehaj,
Peter J. Van Leeuwen,
Dongmian Zou,
Gilad Lerman
Abstract:
In this paper, we present an ensemble data assimilation paradigm over a Riemannian manifold equipped with the Wasserstein metric. Unlike the Eulerian penalization of error in the Euclidean space, the Wasserstein metric can capture translation and difference between the shapes of square-integrable probability distributions of the background state and observations, making it possible to formally penalize geophysical biases in state space with non-Gaussian distributions. The new approach is applied to dissipative and chaotic evolutionary dynamics, and its potential advantages and limitations are highlighted in comparison with the classic variational and filtering data assimilation approaches under systematic and random errors.
Submitted 24 March, 2021; v1 submitted 7 September, 2020;
originally announced September 2020.
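As a rough illustration of the translation sensitivity mentioned in the abstract above, the following sketch computes the 2-Wasserstein distance between two one-dimensional empirical samples. It is not the authors' method: the function name `wasserstein2_1d`, the equal-sample-size restriction, and the toy data are assumptions made for this example. It relies only on the standard fact that, in one dimension, optimal transport between equal-size samples pairs their sorted values.

```python
import math


def wasserstein2_1d(xs, ys):
    """2-Wasserstein distance between two equal-size 1-D empirical samples.

    Hypothetical helper for illustration only; assumes uniform weights.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    # In 1-D the optimal plan matches order statistics (sorted values).
    pairs = zip(sorted(xs), sorted(ys))
    return math.sqrt(sum((x - y) ** 2 for x, y in pairs) / len(xs))


# Toy data (assumed): same shape, translated by a constant bias of 2.5.
background = [0.0, 1.0, 2.0, 3.0]
shifted = [x + 2.5 for x in background]

print(wasserstein2_1d(background, shifted))  # a pure shift of b gives |b|: 2.5
```

For a pure translation the distance equals the size of the shift, which is the sense in which the metric registers a systematic bias between two distributions of identical shape.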
-
Regularized Variational Data Assimilation for Bias Treatment using the Wasserstein Metric
Authors:
Sagar K. Tamang,
Ardeshir Ebtehaj,
Dongmian Zou,
Gilad Lerman
Abstract:
This paper presents a new variational data assimilation (VDA) approach for the formal treatment of bias in both model outputs and observations. This approach relies on the Wasserstein metric stemming from the theory of optimal mass transport to penalize the distance between the probability histograms of the analysis state and an a priori reference dataset, which is likely to be more uncertain but less biased than both model and observations. Unlike previous bias-aware VDA approaches, the new Wasserstein metric VDA (WM-VDA) dynamically treats systematic biases of unknown magnitude and sign in both model and observations through assimilation of the reference data in the probability domain and can fully recover the probability histogram of the analysis state. The performance of WM-VDA is compared with the classic three-dimensional VDA (3D-Var) scheme on first-order linear dynamics and the chaotic Lorenz attractor. Under positive systematic biases in both model and observations, we consistently demonstrate a significant reduction in the forecast bias and unbiased root mean squared error.
Submitted 4 March, 2020;
originally announced March 2020.
-
Snomed2Vec: Random Walk and Poincaré Embeddings of a Clinical Knowledge Base for Healthcare Analytics
Authors:
Khushbu Agarwal,
Tome Eftimov,
Raghavendra Addanki,
Sutanay Choudhury,
Suzanne Tamang,
Robert Rallo
Abstract:
Representation learning methods that transform encoded data (e.g., diagnosis and drug codes) into continuous vector spaces (i.e., vector embeddings) are critical for the application of deep learning in healthcare. Initial work in this area explored the use of variants of the word2vec algorithm to learn embeddings for medical concepts from electronic health records or medical claims datasets. We propose learning embeddings for medical concepts by using graph-based representation learning methods on SNOMED-CT, a widely used knowledge graph in the healthcare domain with numerous operational and research applications. The current work presents an empirical analysis of various embedding methods, including the evaluation of their performance on multiple tasks of biomedical relevance (node classification, link prediction, and patient state prediction). Our results show that concept embeddings derived from the SNOMED-CT knowledge graph significantly outperform state-of-the-art embeddings, showing a 5-6x improvement in "concept similarity" and a 6-20% improvement in patient diagnosis.
Submitted 19 July, 2019;
originally announced July 2019.
-
On Changes of Global Wet-bulb Temperature and Snowfall Regimes
Authors:
Sagar K. Tamang,
Ardeshir M. Ebtehaj,
Andreas F. Prein,
Andrew J. Heymsfield
Abstract:
To properly interpret the observed shrinkage of the Earth's cryosphere, it is important to understand global changes in snowfall-dominant regimes. To document these changes, three different reanalysis products of wet-bulb temperature, together with observationally based data sets, are processed from 1979 to 2017. It is found that over the Northern Hemisphere (NH), the annual mean wet-bulb temperature has increased at a rate of 0.34$^\circ$C per decade (pd) over land and 0.35$^\circ$C pd over ocean, resulting in a reduction of the annual mean potential areas of snowfall-dominant regimes by 0.52/0.34 million km$^2$ pd over land/ocean. However, the changes in the Southern Hemisphere (SH) are less conclusive and more uncertain. Among the Köppen-Geiger climate classes, the highest warming trend is observed over the NH polar climate regimes. Among the studied mountain regions, the Alps are warming at a faster rate than the Rockies, Andes and High Mountain Asia (HMA). Due to such warming, potential snowfall areas over the Alps are shrinking at 3.64% pd, followed by the Rockies at 2.81% pd and HMA at 1.85% pd. On average, these mountain ranges have lost 0.02 million km$^2$ pd of potential snowfall areas. The NH potential snowfall areas are retracting towards the North Pole over Central Asia and Europe at rates of 0.45 and 0.7 degrees pd, respectively. Furthermore, terrestrial regions over the NH, including the Great Plains in the United States, Canadian provinces around the Hudson Bay, and the Central Siberian and Tibetan Plateaus, are losing as much as 4% pd of the solid proportion of the annual precipitation amount.
Submitted 19 May, 2019;
originally announced May 2019.
-
A Semi-Supervised Machine Learning Approach to Detecting Recurrent Metastatic Breast Cancer Cases Using Linked Cancer Registry and Electronic Medical Record Data
Authors:
Albee Y. Ling,
Allison W. Kurian,
Jennifer L. Caswell-Jin,
George W. Sledge Jr.,
Nigam H. Shah,
Suzanne R. Tamang
Abstract:
Objectives: Most cancer data sources lack information on metastatic recurrence. Electronic medical records (EMRs) and population-based cancer registries contain complementary information on cancer treatment and outcomes, yet are rarely used synergistically. To enable detection of metastatic breast cancer (MBC), we applied a semi-supervised machine learning framework to linked EMR-California Cancer Registry (CCR) data. Materials and Methods: We studied 11,459 female patients treated at Stanford Health Care who received an incident breast cancer diagnosis from 2000-2014. The dataset consisted of structured data and unstructured free-text clinical notes from the EMR, linked to the CCR, a component of the Surveillance, Epidemiology and End Results (SEER) database. We extracted information on metastatic disease from patient notes to infer a class label and then trained a regularized logistic regression model for MBC classification. We evaluated model performance on a gold standard set of 146 patients. Results: There are 495 patients with de novo stage IV MBC, 1,374 patients initially diagnosed with Stage 0-III disease who had recurrent MBC, and 9,590 with no evidence of metastasis. The median follow-up time is 96.3 months (mean 97.8, standard deviation 46.7). The best-performing model incorporated both EMR and CCR features. The area under the receiver-operating characteristic curve=0.925 [95% confidence interval: 0.880-0.969], sensitivity=0.861, specificity=0.878 and overall accuracy=0.870. Discussion and Conclusion: A framework for MBC case detection combining EMR and CCR data achieved good sensitivity, specificity and discrimination without requiring expert-labeled examples. This approach enables population-based research on how patients die from cancer and may identify novel predictors of cancer recurrence.
Submitted 16 January, 2019;
originally announced January 2019.
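For readers unfamiliar with the evaluation metrics quoted in the abstract above, the sketch below shows how sensitivity, specificity and overall accuracy follow from confusion-matrix counts. The function name `classification_metrics` and the counts are hypothetical and chosen only for illustration; they are not taken from the paper and do not reproduce its results.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # share of true cases that were detected
    specificity = tn / (tn + fp)          # share of non-cases correctly rejected
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return sensitivity, specificity, accuracy


# Hypothetical counts, NOT the study's data.
sens, spec, acc = classification_metrics(tp=40, fp=10, tn=90, fn=10)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} accuracy={acc:.3f}")
```

Sensitivity and specificity are computed over different denominators (true cases versus non-cases), which is why overall accuracy can sit between the two when the classes are unbalanced.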
-
Opioid Atlas: Mapping Access to Pain Medication
Authors:
Kris Sankaran,
Suzanne Tamang,
Ami Bhatt
Abstract:
Opiates are some of the most effective pain relief medications available for patients suffering from cancer and surgery-related pain. Despite the affordability and effectiveness of these medications, access to opiates is highly geographically variable. Pain researchers have attributed geographic variation to various factors including the fear of opioid addiction, diversion of legal opioids to the underground market and pharmaceutical industry influences. However, the extent to which there is inequity in untreated cancer and surgery-related pain is unknown. To help opioid investigators study these questions, we designed a tool, the Opioid Atlas, for exploring data on legal opioid consumption, by country and time, collected by the International Narcotics Control Board. Our design borrows ideas from the data visualization and multivariate statistics communities, especially the principles of linking and dimensionality reduction. Our work is relevant to policymakers and pain researchers who wish to systematically assess country-level factors that contribute to differences in opioid access for patients with cancer and surgery-related pain. The Opioid Atlas, and the code behind it, is freely available with an open source license.
Submitted 1 December, 2016;
originally announced December 2016.