-
Towards Foundational Models for Molecular Learning on Large-Scale Multi-Task Datasets
Authors:
Dominique Beaini,
Shenyang Huang,
Joao Alex Cunha,
Zhiyi Li,
Gabriela Moisescu-Pareja,
Oleksandr Dymov,
Samuel Maddrell-Mander,
Callum McLean,
Frederik Wenkel,
Luis Müller,
Jama Hussein Mohamud,
Ali Parviz,
Michael Craig,
Michał Koziarski,
Jiarui Lu,
Zhaocheng Zhu,
Cristian Gabellini,
Kerstin Klaser,
Josef Dean,
Cas Wognum,
Maciej Sypetkowski,
Guillaume Rabusseau,
Reihaneh Rabbany,
Jian Tang,
Christopher Morris, et al. (10 additional authors not shown)
Abstract:
Recently, pre-trained foundation models have enabled significant advancements in multiple fields. In molecular machine learning, however, where datasets are often hand-curated and hence typically small, the lack of datasets with labeled features, and of codebases to manage those datasets, has hindered the development of foundation models. In this work, we present seven novel datasets categorized by size into three distinct categories: ToyMix, LargeMix and UltraLarge. These datasets push the boundaries in both the scale and the diversity of supervised labels for molecular learning. They cover nearly 100 million molecules and over 3000 sparsely defined tasks, totaling more than 13 billion individual labels of both quantum and biological nature. In comparison, our datasets contain 300 times more data points than the widely used OGB-LSC PCQM4Mv2 dataset, and 13 times more than the quantum-only QM1B dataset. In addition, to support the development of foundational models based on our proposed datasets, we present the Graphium graph machine learning library, which simplifies the process of building and training molecular machine learning models for multi-task and multi-level molecular datasets. Finally, we present a range of baseline results as a starting point for multi-task and multi-level training on these datasets. Empirically, we observe that performance on low-resource biological datasets improves when the model is also trained on large amounts of quantum data. This indicates that there may be potential in multi-task and multi-level training of a foundation model and fine-tuning it on resource-constrained downstream tasks.
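The abstract describes over 3000 sparsely defined tasks, meaning most molecules carry labels for only a few of them. One common way to train on such data, assumed here and not taken from the Graphium API, is to encode missing labels as NaN and exclude them from the loss. A minimal NumPy sketch (the function name `masked_mse` is hypothetical):

```python
import numpy as np


def masked_mse(pred: np.ndarray, target: np.ndarray) -> float:
    """Mean squared error over observed labels only.

    pred, target: arrays of shape (n_molecules, n_tasks). Missing labels in
    `target` are encoded as NaN (a hypothetical convention for this sketch).
    """
    mask = ~np.isnan(target)            # True where a label exists
    if not mask.any():
        return 0.0                      # nothing labeled in this batch
    diff = pred[mask] - target[mask]    # compare observed entries only
    return float(np.mean(diff ** 2))


# Three molecules, three tasks; most (molecule, task) pairs are unlabeled.
target = np.array([[1.0, np.nan, np.nan],
                   [np.nan, 2.0, np.nan],
                   [np.nan, np.nan, np.nan]])
pred = np.array([[0.0, 5.0, 5.0],
                 [5.0, 4.0, 5.0],
                 [5.0, 5.0, 5.0]])
print(masked_mse(pred, target))  # (1**2 + 2**2) / 2 = 2.5
```

Averaging only over observed entries means unlabeled pairs contribute nothing to the loss; how Graphium itself weights tasks is not stated in this abstract.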
Submitted 18 October, 2023; v1 submitted 6 October, 2023;
originally announced October 2023.
-
Universal two-level quantum Otto machine under a squeezed reservoir
Authors:
Rogério J. de Assis,
José S. Sales,
Jefferson A. R. da Cunha,
Norton G. de Almeida
Abstract:
We study an Otto heat machine whose working substance is a single two-level system interacting with a cold thermal reservoir and with a squeezed hot thermal reservoir. By adjusting the squeezing or the adiabaticity parameter (the probability of transition), we show that our two-level system can function as a universal heat machine, either producing net work by consuming heat or consuming work that is used to cool or heat environments. Using our model, we study the performance of these machines in the finite-time regime of the isentropic strokes, a regime that makes them more useful from a practical point of view.
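To make the idea of one device switching between operating modes concrete, here is a minimal sketch of a two-level Otto cycle under simplifying assumptions that are ours, not the paper's: $H = (\omega/2)\sigma_z$ with $\hbar = k_B = 1$, each isentropic stroke flips the populations with probability `xi` (standing in for the adiabaticity parameter), and the hot reservoir is an ordinary thermal bath, so squeezing is not modeled. The function names are hypothetical.

```python
import math


def otto_two_level(omega_h: float, omega_c: float, T_h: float, T_c: float, xi: float = 0.0):
    """Heats and work per cycle for a two-level Otto cycle (hbar = k_B = 1).

    Assumed model: H = (omega/2) * sigma_z, gap omega_c on the cold stroke and
    omega_h on the hot stroke; each isentropic stroke flips the populations with
    probability xi (xi = 0 is the ideal adiabatic limit). Squeezing is NOT modeled.
    Sign convention: Q > 0 is heat absorbed by the system, W > 0 is work extracted.
    """
    p_h = -math.tanh(omega_h / (2 * T_h))  # <sigma_z> after thermalizing with the hot bath
    p_c = -math.tanh(omega_c / (2 * T_c))  # <sigma_z> after thermalizing with the cold bath
    flip = 1 - 2 * xi                      # factor a stroke applies to <sigma_z>
    q_h = 0.5 * omega_h * (p_h - flip * p_c)  # energy gained in contact with the hot bath
    q_c = 0.5 * omega_c * (p_c - flip * p_h)  # energy gained in contact with the cold bath
    return q_h, q_c, q_h + q_c             # first law over a closed cycle: W = Q_h + Q_c


def operating_mode(q_h: float, q_c: float, w: float) -> str:
    """Label the cycle by the signs of Q_h, Q_c and W."""
    if w > 0 and q_h > 0 and q_c < 0:
        return "engine"
    if w < 0 and q_h < 0 and q_c > 0:
        return "refrigerator"
    if w < 0 and q_h < 0 and q_c < 0:
        return "heater"
    if w < 0 and q_h > 0 and q_c < 0:
        return "accelerator"
    return "other"


for xi in (0.0, 1.0):
    q_h, q_c, w = otto_two_level(omega_h=2.0, omega_c=1.0, T_h=4.0, T_c=1.0, xi=xi)
    print(xi, operating_mode(q_h, q_c, w))
```

With `xi = 0` the engine efficiency $W/Q_h$ reduces to the ideal Otto value $1 - \omega_c/\omega_h$; raising `xi` toward 1 can flip the signs of the heats and work, turning the same two-level system into, for example, a heater.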
Submitted 25 June, 2020;
originally announced June 2020.
-
A GPU-based multi-criteria optimization algorithm for HDR brachytherapy
Authors:
Cédric Bélanger,
Songye Cui,
Yunzhi Ma,
Philippe Després,
J. Adam M. Cunha,
Luc Beaulieu
Abstract:
Currently, HDR brachytherapy planning requires manual fine-tuning of an objective function to obtain case-specific valid plans. This study intends to facilitate this process by proposing a patient-specific inverse planning algorithm for HDR prostate brachytherapy: GPU-based multi-criteria optimization (gMCO).
Two GPU-based optimization engines, simulated annealing (gSA) and a quasi-Newton optimizer (gL-BFGS), were implemented to compute multiple plans in parallel. After evaluating the equivalence and the computational performance of these two engines, the preferred one was selected for the gMCO algorithm. Five hundred sixty-two previously treated prostate HDR cases were divided into a validation set (100 cases) and a test set (462 cases). In the validation set, the number of Pareto-optimal plans needed to achieve the best plan quality was determined for the gMCO algorithm. In the test set, gMCO plans were compared with the physician-approved clinical plans.
Across the 462 test cases, the number of clinically valid plans was 428 (92.6%) for clinical plans and 461 (99.8%) for gMCO plans. The number of valid plans with target V100 coverage greater than 95% was 288 (62.3%) for clinical plans and 414 (89.6%) for gMCO plans. The mean planning time for the gMCO algorithm to generate 1000 Pareto-optimal plans was 9.4 s.
In conclusion, gL-BFGS is able to compute thousands of SA-equivalent treatment plans within a short time frame. Powered by gL-BFGS, an ultra-fast and robust multi-criteria optimization algorithm was implemented for HDR prostate brachytherapy. A large-scale comparison against physician-approved clinical plans showed that treatment plan quality could be improved and planning time significantly reduced with the proposed gMCO algorithm.
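The gMCO approach keeps only Pareto-optimal plans among many candidates. The generic dominance filter behind that notion can be sketched as follows, assuming every plan is summarized by a tuple of objective values to be minimized (the objectives and numbers are hypothetical). This is not the authors' GPU implementation and includes neither simulated annealing nor L-BFGS.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if plan a is no worse than b on every objective and strictly better on one (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(plans: List[Sequence[float]]) -> List[int]:
    """Indices of plans that no other plan dominates (pairwise O(n^2) check)."""
    return [
        i for i, p in enumerate(plans)
        if not any(dominates(q, p) for j, q in enumerate(plans) if j != i)
    ]


# Hypothetical (target underdose penalty, rectum overdose penalty) for four plans.
plans = [(1.0, 5.0), (2.0, 3.0), (2.5, 3.5), (4.0, 1.0)]
print(pareto_front(plans))  # plan 2 is dominated by plan 1 -> [0, 1, 3]
```

The pairwise check costs O(n^2) comparisons, which is fine for a sketch but not tuned for the thousand-plan scale reported in the abstract.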
Submitted 2 April, 2019;
originally announced April 2019.
-
A new radiobiology-based HDR brachytherapy treatment planning algorithm used to investigate the potential for hypofractionation in cervical cancer
Authors:
Kaelyn Seeley,
I-Chow J. Hsu,
Tae Min Hong,
J. Adam Cunha
Abstract:
Most commercially available treatment planning systems for brachytherapy operate on physical dose and do not incorporate fractionation or tissue-specific response. The purpose of this study is to investigate the potential for hypofractionation in HDR brachytherapy, thereby reducing the number of implants required. A new treatment planning algorithm was built to optimize plans based on tissue- and fractionation-specific parameters. Different fractionation schemes were considered for 6 patients, and plans were created using the new algorithm. A baseline scheme of 5 fractions was compared with hypofractionated plans of 1 to 4 fractions. The effectiveness of each plan was evaluated using radiobiological criteria taken from GEC-ESTRO guidelines. The results indicate that an optimization algorithm based on biological parameters has functionality similar to traditional planning methods, with the additional ability to account for fractionation effects. Using this algorithm, plans of 3 and 4 fractions were shown to achieve comparable target coverage with equivalent normal tissue exposure. In some specific cases, further hypofractionation may also provide acceptable target coverage.
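The abstract does not spell out its radiobiological model. A standard way to compare fractionation schemes, assumed here, is the linear-quadratic (LQ) model: for $n$ fractions of dose $d$, $\mathrm{BED} = nd\,(1 + d/(\alpha/\beta))$ and $\mathrm{EQD2} = \mathrm{BED}/(1 + 2/(\alpha/\beta))$. The sketch below computes both and solves for the dose per fraction that keeps BED fixed when the number of fractions changes; the 5 × 6 Gy schedule and the tumor value $\alpha/\beta = 10$ Gy are illustrative assumptions, not taken from the study.

```python
import math


def bed(n: int, d: float, ab: float) -> float:
    """Biologically effective dose (Gy) for n fractions of d Gy, alpha/beta = ab Gy."""
    return n * d * (1.0 + d / ab)


def eqd2(n: int, d: float, ab: float) -> float:
    """Equivalent dose in 2-Gy fractions (Gy)."""
    return bed(n, d, ab) / (1.0 + 2.0 / ab)


def dose_per_fraction_for_bed(target_bed: float, n: int, ab: float) -> float:
    """Solve (n/ab) d^2 + n d - target_bed = 0 for the positive root d."""
    a, b, c = n / ab, float(n), -target_bed
    return (-b + math.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)


# Hypothetical baseline: 5 fractions of 6 Gy, tumor alpha/beta = 10 Gy.
baseline_bed = bed(5, 6.0, 10.0)
print(f"baseline EQD2 = {eqd2(5, 6.0, 10.0):.1f} Gy")
for n in (4, 3, 2, 1):
    d = dose_per_fraction_for_bed(baseline_bed, n, 10.0)
    print(f"{n} fraction(s): {d:.2f} Gy each for the same tumor BED")
```

Repeating the calculation for organ-at-risk doses with a lower $\alpha/\beta$ (3 Gy is a common textbook choice for late-responding tissue) is how such a comparison would account for normal-tissue effects.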
Submitted 30 November, 2017;
originally announced December 2017.
-
Pattern transitions in a nonlocal logistic map for populations
Authors:
Fernando V. Barbosa,
André L. A. Penna,
Rogelma M. S. Ferreira,
Keila L. Novais,
Jefferson A. R. da Cunha,
Fernando A. Oliveira
Abstract:
In this work, we study the pattern solutions of a doubly nonlocal logistic map that includes spatial kernels in both the growth and competition terms. We show that this map includes the nonlocal Fisher-Kolmogorov equation as a particular case, and we demonstrate the existence of three kinds of stationary nonlinear solutions: a uniform one, a cosine-type one that we refer to as the wavelike solution, and a Gaussian one. We also obtain analytical expressions that describe the nonlinear pattern behavior of the system, and we establish the stability criterion. We define thermodynamic quantities such as entropy and the order parameter. Based on these, we analyze the pattern-no-pattern and pattern-pattern transitions. We show that these pattern solutions may be related to the recently observed peak-adding phenomenon in nonlinear optics.
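The abstract does not reproduce the map itself, so the following is only a generic illustration under our own assumptions: a periodic one-dimensional lattice, normalized top-hat kernels of half-widths `alpha` and `beta` (in lattice sites) for the growth and competition terms, and the update $u_{t+1} = u_t + a\,(G_\alpha * u_t) - b\,u_t\,(H_\beta * u_t)$. With zero-width kernels it reduces to the local logistic map $u_{t+1} = (1 + a)\,u_t - b\,u_t^2$, and the uniform state $u = a/b$ is stationary for any kernel widths.

```python
import numpy as np


def tophat_average(u: np.ndarray, half_width: int) -> np.ndarray:
    """Periodic average of u over 2*half_width + 1 neighbouring sites (normalized kernel)."""
    total = np.zeros_like(u)
    for shift in range(-half_width, half_width + 1):
        total += np.roll(u, shift)
    return total / (2 * half_width + 1)


def nonlocal_logistic_step(u: np.ndarray, a: float, b: float, alpha: int, beta: int) -> np.ndarray:
    """One step of an ASSUMED doubly nonlocal logistic map (illustrative form only)."""
    growth = a * tophat_average(u, alpha)          # nonlocal growth term
    competition = b * u * tophat_average(u, beta)  # nonlocal competition term
    return u + growth - competition


u_star = np.full(128, 0.5)  # uniform state a/b for a = 0.5, b = 1.0
print(np.allclose(nonlocal_logistic_step(u_star, 0.5, 1.0, alpha=2, beta=10), u_star))  # True
```

Iterating from a slightly perturbed uniform state while scanning `alpha` and `beta` is one way to explore pattern formation in this toy form; where transitions occur in it is not claimed here.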
Submitted 14 September, 2016; v1 submitted 26 January, 2016;
originally announced January 2016.
-
Pattern formation and coexistence domains for a nonlocal population dynamics
Authors:
J. A. R. da Cunha,
A. L. A. Penna,
F. A. Oliveira
Abstract:
In this communication, we propose a general equation to study pattern formation for a single-species population and its limiting domains in systems of length $L$. To accomplish this, we include nonlocality in the growth and competition terms, whose integral kernels now depend on characteristic length parameters $\alpha$ and $\beta$. We thereby derive a parameter space $(\alpha, \beta)$ in which it is possible to analyze a coexistence curve $\alpha^* = \alpha^*(\beta)$ that delimits the domains where pattern formation does or does not occur in population dynamics systems. We show that this curve is analogous to the coexistence curve in classical thermodynamics and critical phenomena physics. We have successfully compared this model with experimental data on the diffusion of Escherichia coli populations.
Submitted 21 February, 2011;
originally announced February 2011.
-
Dosimetric equivalence of non-standard high dose rate (HDR) brachytherapy catheter patterns
Authors:
J. Adam M. Cunha,
I-Chow Hsu,
Jean Pouliot
Abstract:
Purpose: To determine whether alternative HDR prostate brachytherapy catheter patterns can result in improved dose distributions while providing better access and reducing trauma.
Methods: Prostate HDR brachytherapy uses a grid of parallel needle positions to guide catheter insertion. This geometry does not easily allow the physician to avoid piercing the critical structures near the penile bulb, nor does it provide positional flexibility in the case of pubic arch interference. On CT data from ten previously treated patients, new catheters were digitized following three catheter patterns: conical, bi-conical, and fireworks. The conical patterns were used to accommodate robotic delivery through a single entry point. The bi-conical and fireworks patterns were specifically designed to avoid the critical structures near the penile bulb. For each catheter distribution, a plan was optimized with the inverse planning algorithm IPSA and compared with the plan used for treatment. Regardless of catheter geometry, a plan must fulfill the RTOG-0321 dose criteria for target dose coverage.
Results: Thirty plans from ten patients were optimized. All non-standard patterns fulfilled the RTOG criteria when the clinical plan did. In some cases, the dose distribution was improved by better sparing the organs-at-risk.
Conclusion: Alternative catheter patterns can provide the physician with additional ways to treat patients previously considered unsuitable for brachytherapy (e.g., because of pubic arch interference) and can facilitate robotic guidance of catheter insertion. In addition, alternative catheter patterns may decrease toxicity by avoiding the critical structures near the penile bulb while still fulfilling the RTOG criteria.
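Target dose-coverage criteria of this kind are commonly expressed as dose-volume metrics such as V100, the percentage of a structure's volume receiving at least 100% of the prescription dose. A minimal sketch of that metric, assuming the structure is represented by equal-volume voxels with hypothetical dose values (the specific RTOG-0321 thresholds are not reproduced here):

```python
import numpy as np


def v_percent(dose_gy: np.ndarray, prescription_gy: float, percent: float = 100.0) -> float:
    """Percentage of voxels receiving at least `percent`% of the prescription dose.

    Assumes equal-volume voxels: each entry of dose_gy is one voxel's dose in Gy.
    """
    threshold = prescription_gy * percent / 100.0
    return 100.0 * np.count_nonzero(dose_gy >= threshold) / dose_gy.size


target_doses = np.array([14.0, 15.5, 16.0, 9.0])  # hypothetical voxel doses (Gy)
print(v_percent(target_doses, prescription_gy=15.0))  # 2 of 4 voxels >= 15 Gy -> 50.0
```

Clinical planning systems weight such metrics by actual voxel volume within a dose grid; the equal-volume assumption only keeps the sketch short.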
Submitted 15 April, 2009;
originally announced April 2009.
-
Melting temperature of screened Wigner crystal on helium films by molecular dynamics
Authors:
J. A. R. da Cunha,
Ladir Candido
Abstract:
Using molecular dynamics (MD) simulation, we have calculated the melting temperature of two-dimensional electron systems on 240 Å to 500 Å helium films supported by substrates with dielectric constants $\varepsilon_{s} = 2.2$ to $11.9$, at areal densities $n$ ranging from $3\times 10^{9}$ cm$^{-2}$ to $1.3\times 10^{10}$ cm$^{-2}$. Our results are in good agreement with the available theoretical and experimental results.
Submitted 12 November, 2004;
originally announced November 2004.