Deep learning models for predicting RNA degradation via dual crowdsourcing
Authors:
Hannah K. Wayment-Steele,
Wipapat Kladwang,
Andrew M. Watkins,
Do Soon Kim,
Bojan Tunguz,
Walter Reade,
Maggie Demkin,
Jonathan Romano,
Roger Wellington-Oguri,
John J. Nicol,
Jiayang Gao,
Kazuki Onodera,
Kazuki Fujikawa,
Hanfei Mao,
Gilles Vandewiele,
Michele Tinti,
Bram Steenwinckel,
Takuya Ito,
Taiga Noumi,
Shujun He,
Keiichiro Ishi,
Youhan Lee,
Fatih Öztürk,
Anthony Chiu,
Emin Öztürk, et al. (4 additional authors not shown)
Abstract:
Messenger RNA-based medicines hold immense potential, as evidenced by their rapid deployment as COVID-19 vaccines. However, worldwide distribution of mRNA molecules has been limited by their thermostability, which is fundamentally constrained by the intrinsic instability of RNA molecules to a chemical degradation reaction called in-line hydrolysis. Predicting the degradation of an RNA molecule is a key task in designing more stable RNA-based therapeutics. Here, we describe a crowdsourced machine learning competition ("Stanford OpenVaccine") on Kaggle, involving single-nucleotide resolution measurements on 6043 diverse RNA constructs of 102-130 nucleotides that were themselves solicited through crowdsourcing on the RNA design platform Eterna. The entire experiment was completed in less than 6 months, and 41% of nucleotide-level predictions from the winning model were within experimental error of the ground truth measurement. Furthermore, these models generalized to blindly predicting orthogonal degradation data on much longer mRNA molecules (504-1588 nucleotides) with improved accuracy compared to previously published models. Top teams integrated natural language processing architectures and data augmentation techniques with predictions from previous dynamic programming models for RNA secondary structure. These results indicate that such models are capable of representing in-line hydrolysis with excellent accuracy, supporting their use for designing stabilized messenger RNAs. The integration of two crowdsourcing platforms, one for data set creation and another for machine learning, may be fruitful for other urgent problems that demand scientific discovery on rapid timescales.
Submitted 22 April, 2022; v1 submitted 14 October, 2021;
originally announced October 2021.
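Note: the "within experimental error" figure quoted in the abstract above can be illustrated with a minimal sketch. The abstract does not define the criterion, so this assumes a prediction counts as within error when its absolute deviation from the measurement is no larger than that nucleotide's reported experimental error. The function name `fraction_within_error` and the sample values are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def fraction_within_error(
    predicted: Sequence[float],
    measured: Sequence[float],
    error: Sequence[float],
) -> float:
    """Fraction of per-nucleotide predictions within experimental error.

    Assumption (not from the paper): a prediction is "within error" when
    |predicted - measured| <= error for that nucleotide.
    """
    if not (len(predicted) == len(measured) == len(error)):
        raise ValueError("all inputs must have the same length")
    if len(predicted) == 0:
        raise ValueError("inputs must be non-empty")
    hits = sum(
        1
        for p, m, e in zip(predicted, measured, error)
        if abs(p - m) <= e
    )
    return hits / len(predicted)


# Made-up illustrative values for four nucleotides.
example = fraction_within_error(
    predicted=[1.0, 2.0, 3.0, 4.0],
    measured=[1.1, 2.5, 3.0, 3.0],
    error=[0.2, 0.2, 0.1, 0.5],
)
print(example)
```

In this made-up example, two of the four predictions fall inside their error bars.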
PIC simulation of a shock tube: Implications for wave transmission in the heliospheric boundary region
Authors:
S. Matsukiyo,
T. Noumi,
G. P. Zank,
H. Washimi,
T. Hada
Abstract:
A shock tube problem is solved numerically by using one-dimensional full particle-in-cell simulations under the condition that a relatively tenuous and weakly magnetized plasma is continuously pushed by a relatively dense and strongly magnetized plasma having supersonic relative velocity. A forward and a reverse shock and a contact discontinuity are self-consistently reproduced. The spatial width of the contact discontinuity increases as the angle between the discontinuity normal and ambient magnetic field decreases. The inner structure of the discontinuity shows different profiles between magnetic field and plasma density, or pressure, which is caused by a non-MHD effect of the local plasma. The region between the two shocks is turbulent. The fluctuations in the relatively dense plasma are compressible and propagate away from the contact discontinuity, whereas the fluctuations in the relatively tenuous plasma contain both compressible and incompressible components. The source of the compressible fluctuations in the relatively dense plasma is in the relatively tenuous plasma. Only compressible fast mode fluctuations generated in the relatively tenuous plasma are transmitted through the contact discontinuity and propagate in the relatively dense plasma. These fast mode fluctuations are steepened when passing the contact discontinuity. This wave steepening and probably other effects may cause the broadening of the wave spectrum in the very local interstellar medium plasma. The results are discussed in the context of the heliospheric boundary region or heliopause.
Submitted 18 December, 2019;
originally announced January 2020.