Reflections from the 2024 Large Language Model (LLM) Hackathon for Applications in Materials Science and Chemistry
Authors:
Yoel Zimmermann,
Adib Bazgir,
Zartashia Afzal,
Fariha Agbere,
Qianxiang Ai,
Nawaf Alampara,
Alexander Al-Feghali,
Mehrad Ansari,
Dmytro Antypov,
Amro Aswad,
Jiaru Bai,
Viktoriia Baibakova,
Devi Dutta Biswajeet,
Erik Bitzek,
Joshua D. Bocarsly,
Anna Borisova,
Andres M Bran,
L. Catherine Brinson,
Marcel Moran Calderon,
Alessandro Canalicchio,
Victor Chen,
Yuan Chiang,
Defne Circi,
Benjamin Charmes,
Vikrant Chaudhary
, et al. (119 additional authors not shown)
Abstract:
Here, we present the outcomes from the second Large Language Model (LLM) Hackathon for Applications in Materials Science and Chemistry, which engaged participants across global hybrid locations, resulting in 34 team submissions. The submissions spanned seven key application areas and demonstrated the diverse utility of LLMs for applications in (1) molecular and material property prediction; (2) mo…
▽ More
Here, we present the outcomes from the second Large Language Model (LLM) Hackathon for Applications in Materials Science and Chemistry, which engaged participants across global hybrid locations, resulting in 34 team submissions. The submissions spanned seven key application areas and demonstrated the diverse utility of LLMs for applications in (1) molecular and material property prediction; (2) molecular and material design; (3) automation and novel interfaces; (4) scientific communication and education; (5) research data management and automation; (6) hypothesis generation and evaluation; and (7) knowledge extraction and reasoning from scientific literature. Each team submission is presented in a summary table with links to the code and as brief papers in the appendix. Beyond team results, we discuss the hackathon event and its hybrid format, which included physical hubs in Toronto, Montreal, San Francisco, Berlin, Lausanne, and Tokyo, alongside a global online hub to enable local and virtual collaboration. Overall, the event highlighted significant improvements in LLM capabilities since the previous year's hackathon, suggesting continued expansion of LLMs for applications in materials science and chemistry research. These outcomes demonstrate the dual utility of LLMs as both multipurpose models for diverse machine learning tasks and platforms for rapid prototyping custom applications in scientific research.
△ Less
Submitted 2 January, 2025; v1 submitted 20 November, 2024;
originally announced November 2024.
Are large language models superhuman chemists?
Authors:
Adrian Mirza,
Nawaf Alampara,
Sreekanth Kunchapu,
Martiño Ríos-García,
Benedict Emoekabu,
Aswanth Krishnan,
Tanya Gupta,
Mara Schilling-Wilhelmi,
Macjonathan Okereke,
Anagha Aneesh,
Amir Mohammad Elahi,
Mehrdad Asgari,
Juliane Eberhardt,
Hani M. Elbeheiry,
María Victoria Gil,
Maximilian Greiner,
Caroline T. Holick,
Christina Glaubitz,
Tim Hoffmann,
Abdelrahman Ibrahim,
Lea C. Klepsch,
Yannik Köster,
Fabian Alexander Kreth,
Jakob Meyer,
Santiago Miret
, et al. (10 additional authors not shown)
Abstract:
Large language models (LLMs) have gained widespread interest due to their ability to process human language and perform tasks on which they have not been explicitly trained.
However, we possess only a limited systematic understanding of the chemical capabilities of LLMs, which would be required to improve models and mitigate potential harm. Here, we introduce "ChemBench," an automated framework…
▽ More
Large language models (LLMs) have gained widespread interest due to their ability to process human language and perform tasks on which they have not been explicitly trained.
However, we possess only a limited systematic understanding of the chemical capabilities of LLMs, which would be required to improve models and mitigate potential harm. Here, we introduce "ChemBench," an automated framework for evaluating the chemical knowledge and reasoning abilities of state-of-the-art LLMs against the expertise of chemists.
We curated more than 2,700 question-answer pairs, evaluated leading open- and closed-source LLMs, and found that the best models outperformed the best human chemists in our study on average. However, the models struggle with some basic tasks and provide overconfident predictions.
These findings reveal LLMs' impressive chemical capabilities while emphasizing the need for further research to improve their safety and usefulness. They also suggest adapting chemistry education and show the value of benchmarking frameworks for evaluating LLMs in specific domains.
△ Less
Submitted 1 November, 2024; v1 submitted 1 April, 2024;
originally announced April 2024.