-
Benchmarking large language models for materials synthesis: the case of atomic layer deposition
Authors:
Angel Yanguas-Gil,
Matthew T. Dearing,
Jeffrey W. Elam,
Jessica C. Jones,
Sungjoon Kim,
Adnan Mohammad,
Chi Thang Nguyen,
Bratin Sengupta
Abstract:
In this work we introduce an open-ended question benchmark, ALDbench, to evaluate the performance of large language models (LLMs) in materials synthesis, and in particular in the field of atomic layer deposition, a thin film growth technique used in energy applications and microelectronics. Our benchmark comprises questions with a level of difficulty ranging from graduate level to domain expert current with the state of the art in the field. Human experts reviewed the questions along the criteria of difficulty and specificity, and the model responses along four different criteria: overall quality, specificity, relevance, and accuracy. We ran this benchmark on an instance of OpenAI's GPT-4o. The responses from the model received a composite quality score of 3.7 on a 1 to 5 scale, consistent with a passing grade. However, 36% of the questions received at least one below average score. An in-depth analysis of the responses identified at least five instances of suspected hallucination. Finally, we observed statistically significant correlations between the difficulty of the question and the quality of the response, the difficulty of the question and the relevance of the response, and the specificity of the question and the accuracy of the response as graded by the human experts. This emphasizes the need to evaluate LLMs across multiple criteria beyond difficulty or accuracy.
Submitted 13 December, 2024;
originally announced December 2024.
-
Opportunities for Retrieval and Tool Augmented Large Language Models in Scientific Facilities
Authors:
Michael H. Prince,
Henry Chan,
Aikaterini Vriza,
Tao Zhou,
Varuni K. Sastry,
Matthew T. Dearing,
Ross J. Harder,
Rama K. Vasudevan,
Mathew J. Cherukara
Abstract:
Upgrades to advanced scientific user facilities such as next-generation x-ray light sources, nanoscience centers, and neutron facilities are revolutionizing our understanding of materials across the spectrum of the physical sciences, from life sciences to microelectronics. However, these facility and instrument upgrades come with a significant increase in complexity. Driven by more exacting scientific needs, instruments and experiments become more intricate each year. This increased operational complexity makes it ever more challenging for domain scientists to design experiments that effectively leverage the capabilities of and operate on these advanced instruments. Large language models (LLMs) can perform complex information retrieval, assist in knowledge-intensive tasks across applications, and provide guidance on tool usage. Using x-ray light sources, leadership computing, and nanoscience centers as representative examples, we describe preliminary experiments with a Context-Aware Language Model for Science (CALMS) to assist scientists with instrument operations and complex experimentation. With the ability to retrieve relevant information from facility documentation, CALMS can answer simple questions on scientific capabilities and other operational procedures. With the ability to interface with software tools and experimental hardware, CALMS can conversationally operate scientific instruments. By making information more accessible and acting on user needs, LLMs could expand and diversify scientific facilities' users and accelerate scientific output.
Submitted 3 December, 2023;
originally announced December 2023.
-
Computer-Generated Holographic Optical Tweezer Arrays
Authors:
Eric R. Dufresne,
Gabriel C. Spalding,
Matthew T. Dearing,
Steven A. Sheets,
David G. Grier
Abstract:
Holographic techniques significantly extend the capabilities of laser tweezing, making possible extended trapping patterns for manipulating large numbers of particles and volumes of soft matter. We describe practical methods for creating arbitrary configurations of optical tweezers using computer-generated diffractive optical elements. While the discussion focuses on ways to create planar arrays of identical tweezers, the approach can be generalized to three-dimensional arrangements of heterogeneous tweezers and extended trapping patterns.
Submitted 28 August, 2000;
originally announced August 2000.