
Showing 1–3 of 3 results for author: Persaud, D

  1. arXiv:2406.06489  [pdf, other]

    cond-mat.mtrl-sci

    Probing out-of-distribution generalization in machine learning for materials

    Authors: Kangming Li, Andre Niyongabo Rubungo, Xiangyun Lei, Daniel Persaud, Kamal Choudhary, Brian DeCost, Adji Bousso Dieng, Jason Hattrick-Simpers

    Abstract: Scientific machine learning (ML) endeavors to develop generalizable models with broad applicability. However, the assessment of generalizability is often based on heuristics. Here, we demonstrate in the materials science setting that heuristics-based evaluations lead to substantially biased conclusions of ML generalizability and benefits of neural scaling. We evaluate generalization performance in…

    Submitted 10 June, 2024; originally announced June 2024.

  2. arXiv:2310.07044  [pdf]

    cond-mat.mtrl-sci

    Reproducibility in Computational Materials Science: Lessons from 'A General-Purpose Machine Learning Framework for Predicting Properties of Inorganic Materials'

    Authors: Daniel Persaud, Logan Ward, Jason Hattrick-Simpers

    Abstract: The integration of machine learning techniques in materials discovery has become prominent in materials science research and has been accompanied by an increasing trend towards open-source data and tools to propel the field. Despite the increasing usefulness and capabilities of these tools, developers' failure to follow reproducible practices creates a significant barrier for researchers looking…

    Submitted 10 October, 2023; originally announced October 2023.

    Comments: Main text: 15 pages, 1 table, 1 figure

  3. On the redundancy in large material datasets: efficient and robust learning with less data

    Authors: Kangming Li, Daniel Persaud, Kamal Choudhary, Brian DeCost, Michael Greenwood, Jason Hattrick-Simpers

    Abstract: Extensive efforts to gather materials data have largely overlooked potential data redundancy. In this study, we present evidence of a significant degree of redundancy across multiple large datasets for various material properties, by revealing that up to 95% of data can be safely removed from machine learning training with little impact on in-distribution prediction performance. The redundant dat…

    Submitted 25 July, 2023; v1 submitted 25 April, 2023; originally announced April 2023.

    Comments: Main text: 9 pages, 2 tables, 6 figures. Supplemental information: 31 pages, 1 table, 24 figures

    Journal ref: Nature Communications 14, 7283 (2023)