-
Machine Learning-driven Multiscale MD Workflows: The Mini-MuMMI Experience
Authors:
Loïc Pottier,
Konstantia Georgouli,
Timothy S. Carpenter,
Fikret Aydin,
Jeremy O. B. Tempkin,
Dwight V. Nissley,
Frederick H. Streitz,
Thomas R. W. Scogland,
Peer-Timo Bremer,
Felice C. Lightstone,
Helgi I. Ingólfsson
Abstract:
Computational models have become one of the prevalent methods to model complex phenomena. To accurately model complex interactions, such as detailed biomolecular interactions, scientists often rely on multiscale models comprised of several internal models operating at difference scales, ranging from microscopic to macroscopic length and time scales. Bridging the gap between different time and leng…
▽ More
Computational models have become one of the prevalent methods to model complex phenomena. To accurately model complex interactions, such as detailed biomolecular interactions, scientists often rely on multiscale models comprised of several internal models operating at difference scales, ranging from microscopic to macroscopic length and time scales. Bridging the gap between different time and length scales has historically been challenging but the advent of newer machine learning (ML) approaches has shown promise for tackling that task. Multiscale models require massive amounts of computational power and a powerful workflow management system. Orchestrating ML-driven multiscale studies on parallel systems with thousands of nodes is challenging, the workflow must schedule, allocate and control thousands of simulations operating at different scales. Here, we discuss the massively parallel Multiscale Machine-Learned Modeling Infrastructure (MuMMI), a multiscale workflow management infrastructure, that can orchestrate thousands of molecular dynamics (MD) simulations operating at different timescales, spanning from millisecond to nanosecond. More specifically, we introduce a novel version of MuMMI called "mini-MuMMI". Mini-MuMMI is a curated version of MuMMI designed to run on modest HPC systems or even laptops whereas MuMMI requires larger HPC systems. We demonstrate mini-MuMMI utility by exploring RAS-RAF membrane interactions and discuss the different challenges behind the generalization of multiscale workflows and how mini-MuMMI can be leveraged to target a broader range of applications outside of MD and RAS-RAF interactions.
△ Less
Submitted 9 July, 2025;
originally announced July 2025.
-
Spectroscopy-Guided Discovery of Three-Dimensional Structures of Disordered Materials with Diffusion Models
Authors:
Hyuna Kwon,
Tim Hsu,
Wenyu Sun,
Wonseok Jeong,
Fikret Aydin,
James Chapman,
Xiao Chen,
Matthew R. Carbone,
Deyu Lu,
Fei Zhou,
Tuan Anh Pham
Abstract:
The ability to rapidly develop materials with desired properties has a transformative impact on a broad range of emerging technologies. In this work, we introduce a new framework based on the diffusion model, a recent generative machine learning method to predict 3D structures of disordered materials from a target property. For demonstration, we apply the model to identify the atomic structures of…
▽ More
The ability to rapidly develop materials with desired properties has a transformative impact on a broad range of emerging technologies. In this work, we introduce a new framework based on the diffusion model, a recent generative machine learning method to predict 3D structures of disordered materials from a target property. For demonstration, we apply the model to identify the atomic structures of amorphous carbons ($a$-C) as a representative material system from the target X-ray absorption near edge structure (XANES) spectra--a common experimental technique to probe atomic structures of materials. We show that conditional generation guided by XANES spectra reproduces key features of the target structures. Furthermore, we show that our model can steer the generative process to tailor atomic arrangements for a specific XANES spectrum. Finally, our generative model exhibits a remarkable scale-agnostic property, thereby enabling generation of realistic, large-scale structures through learning from a small-scale dataset (i.e., with small unit cells). Our work represents a significant stride in bridging the gap between materials characterization and atomic structure determination; in addition, it can be leveraged for materials discovery in exploring various material properties as targeted.
△ Less
Submitted 9 December, 2023;
originally announced December 2023.
-
JugglePAC: a Pipelined Accumulation Circuit
Authors:
Ahmad Houraniah,
H. Fatih Ugurdag,
Furkan Aydin
Abstract:
Reducing a set of numbers to a single value is a fundamental operation in applications such as signal processing, data compression, scientific computing, and neural networks. Accumulation, which involves summing a dataset to obtain a single result, is crucial for these tasks. Due to hardware constraints, large vectors or matrices often cannot be fully stored in memory and must be read sequentially…
▽ More
Reducing a set of numbers to a single value is a fundamental operation in applications such as signal processing, data compression, scientific computing, and neural networks. Accumulation, which involves summing a dataset to obtain a single result, is crucial for these tasks. Due to hardware constraints, large vectors or matrices often cannot be fully stored in memory and must be read sequentially, one item per clock cycle. For high-speed inputs, such as rapidly arriving floating-point numbers, pipelined adders are necessary to maintain performance. However, pipelining introduces multiple intermediate sums and requires delays between back-to-back datasets unless their processing is overlapped. In this paper, we present JugglePAC, a novel accumulation circuit designed to address these challenges. JugglePAC operates quickly, is area-efficient, and features a fully pipelined design. It effectively manages back-to-back variable-length datasets while consistently producing results in the correct input order. Compared to the state-of-the-art, JugglePAC achieves higher throughput and reduces area complexity, offering significant improvements in performance and efficiency.
△ Less
Submitted 16 September, 2024; v1 submitted 2 October, 2023;
originally announced October 2023.
-
Identifying Orientation-specific Lipid-protein Fingerprints using Deep Learning
Authors:
Fikret Aydin,
Konstantia Georgouli,
Gautham Dharuman,
James N. Glosli,
Felice C. Lightstone,
Helgi I. Ingólfsson,
Peer-Timo Bremer,
Harsh Bhatia
Abstract:
Improved understanding of the relation between the behavior of RAS and RAF proteins and the local lipid environment in the cell membrane is critical for getting insights into the mechanisms underlying cancer formation. In this work, we employ deep learning (DL) to learn this relationship by predicting protein orientational states of RAS and RAS-RAF protein complexes with respect to the lipid membr…
▽ More
Improved understanding of the relation between the behavior of RAS and RAF proteins and the local lipid environment in the cell membrane is critical for getting insights into the mechanisms underlying cancer formation. In this work, we employ deep learning (DL) to learn this relationship by predicting protein orientational states of RAS and RAS-RAF protein complexes with respect to the lipid membrane based on the lipid densities around the protein domains from coarse-grained (CG) molecular dynamics (MD) simulations. Our DL model can predict six protein states with an overall accuracy of over 80%. The findings of this work offer new insights into how the proteins modulate the lipid environment, which in turn may assist designing novel therapies to regulate such interactions in the mechanisms associated with cancer development.
△ Less
Submitted 13 July, 2022;
originally announced July 2022.
-
Emerging Patterns in the Continuum Representation of Protein-Lipid Fingerprints
Authors:
Konstantia Georgouli,
Helgi I Ingólfsson,
Fikret Aydin,
Mark Heimann,
Felice C Lightstone,
Peer-Timo Bremer,
Harsh Bhatia
Abstract:
Capturing intricate biological phenomena often requires multiscale modeling where coarse and inexpensive models are developed using limited components of expensive and high-fidelity models. Here, we consider such a multiscale framework in the context of cancer biology and address the challenge of evaluating the descriptive capabilities of a continuum model developed using 1-dimensional statistics…
▽ More
Capturing intricate biological phenomena often requires multiscale modeling where coarse and inexpensive models are developed using limited components of expensive and high-fidelity models. Here, we consider such a multiscale framework in the context of cancer biology and address the challenge of evaluating the descriptive capabilities of a continuum model developed using 1-dimensional statistics from a molecular dynamics model. Using deep learning, we develop a highly predictive classification model that identifies complex and emergent behavior from the continuum model. With over 99.9% accuracy demonstrated for two simulations, our approach confirms the existence of protein-specific "lipid fingerprints", i.e. spatial rearrangements of lipids in response to proteins of interest. Through this demonstration, our model also provides external validation of the continuum model, affirms the value of such multiscale modeling, and can foster new insights through further analysis of these fingerprints.
△ Less
Submitted 9 July, 2022;
originally announced July 2022.
-
Medical Multimodal Classifiers Under Scarce Data Condition
Authors:
Faik Aydin,
Maggie Zhang,
Michelle Ananda-Rajah,
Gholamreza Haffari
Abstract:
Data is one of the essential ingredients to power deep learning research. Small datasets, especially specific to medical institutes, bring challenges to deep learning training stage. This work aims to develop a practical deep multimodal that can classify patients into abnormal and normal categories accurately as well as assist radiologists to detect visual and textual anomalies by locating areas o…
▽ More
Data is one of the essential ingredients to power deep learning research. Small datasets, especially specific to medical institutes, bring challenges to deep learning training stage. This work aims to develop a practical deep multimodal that can classify patients into abnormal and normal categories accurately as well as assist radiologists to detect visual and textual anomalies by locating areas of interest. The detection of the anomalies is achieved through a novel technique which extends the integrated gradients methodology with an unsupervised clustering algorithm. This technique also introduces a tuning parameter which trades off true positive signals to denoise false positive signals in the detection process. To overcome the challenges of the small training dataset which only has 3K frontal X-ray images and medical reports in pairs, we have adopted transfer learning for the multimodal which concatenates the layers of image and text submodels. The image submodel was trained on the vast ChestX-ray14 dataset, while the text submodel transferred a pertained word embedding layer from a hospital-specific corpus. Experimental results show that our multimodal improves the accuracy of the classification by 4% and 7% on average of 50 epochs, compared to the individual text and image model, respectively.
△ Less
Submitted 23 February, 2019;
originally announced February 2019.