-
Developing a Natural Language Understanding Model to Characterize Cable News Bias
Authors:
Seth P. Benson,
Iain J. Cruickshank
Abstract:
Media bias has been extensively studied by both social and computational sciences. However, current work still relies heavily on human input and subjective assessment to label biases. This is especially true for cable news research. To address these issues, we develop an unsupervised machine learning method to characterize the bias of cable news programs without any human input. This method relies on the analysis of what topics are mentioned through Named Entity Recognition and how those topics are discussed through Stance Analysis in order to cluster programs with similar biases together. Applying our method to 2020 cable news transcripts, we find that program clusters are consistent over time and roughly correspond to the cable news network of the program. This method reveals the potential for future tools to objectively assess media bias and characterize unfamiliar media environments.
Submitted 17 October, 2023; v1 submitted 13 October, 2023;
originally announced October 2023.
-
Real-time high-resolution CO$_2$ geological storage prediction using nested Fourier neural operators
Authors:
Gege Wen,
Zongyi Li,
Qirui Long,
Kamyar Azizzadenesheli,
Anima Anandkumar,
Sally M. Benson
Abstract:
Carbon capture and storage (CCS) plays an essential role in global decarbonization. Scaling up CCS deployment requires accurate and high-resolution modeling of the storage reservoir pressure buildup and the gaseous plume migration. However, such modeling is very challenging at scale due to the high computational costs of existing numerical methods. This challenge leads to significant uncertainties in evaluating storage opportunities, which can delay the pace of large-scale CCS deployment. We introduce Nested Fourier Neural Operator (FNO), a machine-learning framework for high-resolution dynamic 3D CO2 storage modeling at a basin scale. Nested FNO produces forecasts at different refinement levels using a hierarchy of FNOs and speeds up flow prediction nearly 700,000 times compared to existing methods. By learning the solution operator for the family of governing partial differential equations, Nested FNO creates a general-purpose numerical simulator alternative for CO2 storage with diverse reservoir conditions, geological heterogeneity, and injection schemes. Our framework enables unprecedented real-time modeling and probabilistic simulations that can support the scale-up of global CCS deployment.
Submitted 1 June, 2023; v1 submitted 31 October, 2022;
originally announced October 2022.
-
Federated Learning Enables Big Data for Rare Cancer Boundary Detection
Authors:
Sarthak Pati,
Ujjwal Baid,
Brandon Edwards,
Micah Sheller,
Shih-Han Wang,
G Anthony Reina,
Patrick Foley,
Alexey Gruzdev,
Deepthi Karkada,
Christos Davatzikos,
Chiharu Sako,
Satyam Ghodasara,
Michel Bilello,
Suyash Mohan,
Philipp Vollmuth,
Gianluca Brugnara,
Chandrakanth J Preetha,
Felix Sahm,
Klaus Maier-Hein,
Maximilian Zenk,
Martin Bendszus,
Wolfgang Wick,
Evan Calabrese,
Jeffrey Rudie,
Javier Villanueva-Meyer
, et al. (254 additional authors not shown)
Abstract:
Although machine learning (ML) has shown promise in numerous domains, there are concerns about generalizability to out-of-sample data. This is currently addressed by centrally sharing ample, and importantly diverse, data from multiple sites. However, such centralization is challenging to scale (or even not feasible) due to various limitations. Federated ML (FL) provides an alternative to train accurate and generalizable ML models, by only sharing numerical model updates. Here we present findings from the largest FL study to date, involving data from 71 healthcare institutions across 6 continents, to generate an automatic tumor boundary detector for the rare disease of glioblastoma, utilizing the largest dataset of such patients ever used in the literature (25,256 MRI scans from 6,314 patients). We demonstrate a 33% improvement over a publicly trained model to delineate the surgically targetable tumor, and 23% improvement over the tumor's entire extent. We anticipate our study to: 1) enable more studies in healthcare informed by large and diverse data, ensuring meaningful results for rare diseases and underrepresented populations, 2) facilitate further quantitative analyses for glioblastoma via performance optimization of our consensus model for eventual public release, and 3) demonstrate the effectiveness of FL at such scale and task complexity as a paradigm shift for multi-site collaborations, alleviating the need for data sharing.
Submitted 25 April, 2022; v1 submitted 22 April, 2022;
originally announced April 2022.
-
U-FNO -- An enhanced Fourier neural operator-based deep-learning model for multiphase flow
Authors:
Gege Wen,
Zongyi Li,
Kamyar Azizzadenesheli,
Anima Anandkumar,
Sally M. Benson
Abstract:
Numerical simulation of multiphase flow in porous media is essential for many geoscience applications. Machine learning models trained with numerical simulation data can provide a faster alternative to traditional simulators. Here we present U-FNO, a novel neural network architecture for solving multiphase flow problems with superior accuracy, speed, and data efficiency. U-FNO is designed based on the newly proposed Fourier neural operator (FNO), which has shown excellent performance in single-phase flows. We extend the FNO-based architecture to a highly complex CO2-water multiphase problem with wide ranges of permeability and porosity heterogeneity, anisotropy, reservoir conditions, injection configurations, flow rates, and multiphase flow properties. The U-FNO architecture is more accurate in gas saturation and pressure buildup predictions than the original FNO and a state-of-the-art convolutional neural network (CNN) benchmark. Meanwhile, it has superior data utilization efficiency, requiring only a third of the training data to achieve the same accuracy as the CNN. U-FNO provides superior performance in highly heterogeneous geological formations and critically important applications such as gas saturation and pressure buildup "fronts" determination. The trained model can serve as a general-purpose alternative to routine numerical simulations of 2D-radial CO2 injection problems, with significant speed-ups over traditional simulators.
Submitted 4 May, 2022; v1 submitted 3 September, 2021;
originally announced September 2021.
-
CCSNet: a deep learning modeling suite for CO$_2$ storage
Authors:
Gege Wen,
Catherine Hay,
Sally M. Benson
Abstract:
Numerical simulation is an essential tool for many applications involving subsurface flow and transport, yet often suffers from computational challenges due to the multi-physics nature, highly non-linear governing equations, inherent parameter uncertainties, and the need for high spatial resolutions to capture multi-scale heterogeneity. We developed CCSNet, a general-purpose deep-learning modeling suite that can act as an alternative to conventional numerical simulators for carbon capture and storage (CCS) problems where CO$_2$ is injected into saline aquifers in 2D-radial systems. CCSNet consists of a sequence of deep learning models producing all the outputs that a numerical simulator typically provides, including saturation distributions, pressure buildup, dry-out, fluid densities, mass balance, solubility trapping, and sweep efficiency. The results are 10$^3$ to 10$^4$ times faster than conventional numerical simulators. As an application of CCSNet illustrating the value of its high computational efficiency, we developed rigorous estimation techniques for the sweep efficiency and solubility trapping.
Submitted 5 April, 2021;
originally announced April 2021.
-
Multiphase flow prediction with deep neural networks
Authors:
Gege Wen,
Meng Tang,
Sally M. Benson
Abstract:
This paper proposes a deep neural network approach for predicting multiphase flow in heterogeneous domains with high computational efficiency. The deep neural network model is able to handle permeability heterogeneity in high dimensional systems, and can learn the interplay of viscous, gravity, and capillary forces from small data sets. Using the example of carbon dioxide (CO2) storage, we demonstrate that the model can generate highly accurate predictions of a CO2 saturation distribution given a permeability field, injection duration, injection rate, and injection location. The trained neural network model has an excellent ability to interpolate and, to a limited extent, to extrapolate beyond the training data ranges. To improve the prediction accuracy when the neural network model needs to extrapolate, we propose a transfer learning (fine-tuning) procedure that can quickly teach the neural network model new information without going through massive data collection and retraining. Based on this trained neural network model, a web-based tool is provided that allows users to perform CO2-water multiphase flow calculations online. With the tools provided in this paper, the deep neural network approach can provide a computationally efficient substitute for repetitive forward multiphase flow simulations, which can be adapted to the context of history matching and uncertainty quantification.
Submitted 21 October, 2019;
originally announced October 2019.
-
GPCG: A Case Study in the Performance and Scalability of Optimization Algorithms
Authors:
Steven J. Benson,
Lois Curfman McInnes,
Jorge J. Moré
Abstract:
GPCG is an algorithm within the Toolkit for Advanced Optimization (TAO) for solving bound-constrained, convex quadratic problems. Originally developed by Moré and Toraldo, this algorithm was designed for large-scale problems but had been implemented only for a single processor. The TAO implementation is available for a wide range of high-performance architectures, and has been tested on up to 64 processors to solve problems with over 2.5 million variables.
Submitted 19 January, 2001;
originally announced January 2001.