UniGenX: Unified Generation of Sequence and Structure with Autoregressive Diffusion
Authors: Gongbo Zhang, Yanting Li, Renqian Luo, Pipi Hu, Zeru Zhao, Lingbo Li, Guoqing Liu, Zun Wang, Ran Bi, Kaiyuan Gao, Liya Guo, Yu Xie, Chang Liu, Jia Zhang, Tian Xie, Robert Pinsler, Claudio Zeni, Ziheng Lu, Yingce Xia, Marwin Segler, Maik Riechert, Li Yuan, Lei Chen, Haiguang Liu, Tao Qin
Abstract:
Unified generation of sequence and structure for scientific data (e.g., materials, molecules, proteins) is a critical task. Existing approaches primarily rely on either autoregressive sequence models or diffusion models, each offering distinct advantages and facing notable limitations. Autoregressive models, such as GPT, Llama, and Phi-4, have demonstrated remarkable success in natural language generation and have been extended to multimodal tasks (e.g., image, video, and audio) using advanced encoders like VQ-VAE to represent complex modalities as discrete sequences. However, their direct application to scientific domains is challenging due to the high precision requirements and the diverse nature of scientific data. On the other hand, diffusion models excel at generating high-dimensional scientific data, such as protein, molecule, and material structures, with remarkable accuracy. Yet, their inability to effectively model sequences limits their potential as general-purpose multimodal foundation models. To address these challenges, we propose UniGenX, a unified framework that combines autoregressive next-token prediction with conditional diffusion models. This integration leverages the strengths of autoregressive models to ease the training of conditional diffusion models, while diffusion-based generative heads enhance the precision of autoregressive predictions. We validate the effectiveness of UniGenX on material and small molecule generation tasks, achieving a significant leap in state-of-the-art performance for material crystal structure prediction and establishing new state-of-the-art results for small molecule structure prediction, de novo design, and conditional generation. Notably, UniGenX demonstrates significant improvements, especially in handling long sequences for complex structures, showcasing its efficacy as a versatile tool for scientific data generation.
Submitted 9 March, 2025;
originally announced March 2025.
A Foundation Model for the Earth System
Authors: Cristian Bodnar, Wessel P. Bruinsma, Ana Lucic, Megan Stanley, Anna Vaughan, Johannes Brandstetter, Patrick Garvan, Maik Riechert, Jonathan A. Weyn, Haiyu Dong, Jayesh K. Gupta, Kit Thambiratnam, Alexander T. Archibald, Chun-Chieh Wu, Elizabeth Heider, Max Welling, Richard E. Turner, Paris Perdikaris
Abstract:
Reliable forecasts of the Earth system are crucial for human progress and safety from natural disasters. Artificial intelligence offers substantial potential to improve prediction accuracy and computational efficiency in this field; however, this potential remains underexplored in many domains. Here we introduce Aurora, a large-scale foundation model for the Earth system trained on over a million hours of diverse data. Aurora outperforms operational forecasts for air quality, ocean waves, tropical cyclone tracks, and high-resolution weather forecasting at orders of magnitude smaller computational expense than dedicated existing systems. With the ability to fine-tune Aurora to diverse application domains at only modest computational cost, Aurora represents significant progress in making actionable Earth system predictions accessible to anyone.
Submitted 21 November, 2024; v1 submitted 20 May, 2024;
originally announced May 2024.
Open source QGIS toolkit for the Advanced Research WRF modelling system
Authors: D. Meyer, M. Riechert
Abstract:
The Advanced Research WRF (Weather Research and Forecasting) model is a popular atmospheric model used for research and Numerical Weather Prediction (NWP). However, despite its popularity, its set-up and configuration often demand several interdisciplinary skills that go beyond the understanding of physical processes. Pre-processing tasks, such as importing custom high-resolution datasets in the WRF Pre-processing System (WPS), still require a considerable effort from the user. We present GIS4WRF, a free, open-source, and cross-platform QGIS Python plug-in to help scientists and practitioners with their Advanced Research WRF modelling workflows. GIS4WRF incorporates new and existing tools for data-processing, configuration, simulation, and visualization into a single graphical environment, and offers WRF-CMake binary distributions for Windows, macOS, and Linux. We highlight its main features and provide useful insights into several key approaches and techniques used in its development. We end with two example applications highlighting the contributions of GIS4WRF in simplifying several WRF-related tasks.
Submitted 4 December, 2019;
originally announced December 2019.