-
AI Judges in Design: Statistical Perspectives on Achieving Human Expert Equivalence With Vision-Language Models
Authors:
Kristen M. Edwards,
Farnaz Tehranchi,
Scarlett R. Miller,
Faez Ahmed
Abstract:
The subjective evaluation of early-stage engineering designs, such as conceptual sketches, traditionally relies on human experts. However, expert evaluations are time-consuming, expensive, and sometimes inconsistent. Recent advances in vision-language models (VLMs) offer the potential to automate design assessments, but it is crucial to ensure that these AI "judges" perform on par with human experts. However, no existing framework assesses expert equivalence. This paper introduces a rigorous statistical framework to determine whether an AI judge's ratings match those of human experts. We apply this framework in a case study evaluating four VLM-based judges on key design metrics (uniqueness, creativity, usefulness, and drawing quality). These AI judges employ various in-context learning (ICL) techniques, including uni- vs. multimodal prompts and inference-time reasoning. The same statistical framework is used to assess three trained novices for expert equivalence. Results show that the top-performing AI judge, using text- and image-based ICL with reasoning, achieves expert-level agreement for uniqueness and drawing quality and outperforms or matches trained novices across all metrics. In 6/6 runs for both uniqueness and creativity, and 5/6 runs for both drawing quality and usefulness, its agreement with experts meets or exceeds that of the majority of trained novices. These findings suggest that reasoning-supported VLMs can achieve human-expert equivalence in design evaluation. This has implications for scaling design evaluation in education and practice, and provides a general statistical framework for validating AI judges in other domains requiring subjective content evaluation.
Submitted 1 April, 2025;
originally announced April 2025.
-
Capstone Experiences in Developing Augmented Reality Tables for Community Organizations
Authors:
H. Keith Edwards,
Michael R. Peterson,
Francis Cristobal
Abstract:
This paper examines two senior capstone experiences developed as augmented reality tables over the past two years. Both projects were public-facing efforts that required working implementations. The first project was deployed at an astronomy center and focused on interactions between land use and ecological aspects of Hawaii Island, while the second project focused more on historical sites on the same island. Both projects leveraged brownfield development and existing code bases to allow for student success in spite of the impacts of the COVID-19 pandemic.
Submitted 27 November, 2024;
originally announced November 2024.
-
Sketch2Prototype: Rapid Conceptual Design Exploration and Prototyping with Generative AI
Authors:
Kristen M. Edwards,
Brandon Man,
Faez Ahmed
Abstract:
Sketch2Prototype is an AI-based framework that transforms a hand-drawn sketch into a diverse set of 2D images and 3D prototypes through sketch-to-text, text-to-image, and image-to-3D stages. This framework, shown across various sketches, rapidly generates text, image, and 3D modalities for enhanced early-stage design exploration. We show that using text as an intermediate modality outperforms direct sketch-to-3D baselines for generating diverse and manufacturable 3D models. We find limitations in current image-to-3D techniques, while noting the value of the text modality for user feedback and iterative design augmentation.
Submitted 25 March, 2024;
originally announced May 2024.
-
From Concept to Manufacturing: Evaluating Vision-Language Models for Engineering Design
Authors:
Cyril Picard,
Kristen M. Edwards,
Anna C. Doris,
Brandon Man,
Giorgio Giannone,
Md Ferdous Alam,
Faez Ahmed
Abstract:
Engineering design is undergoing a transformative shift with the advent of AI, marking a new era in how we approach product, system, and service planning. Large language models have demonstrated impressive capabilities in enabling this shift. Yet, with text as their only input modality, they cannot leverage the large body of visual artifacts that engineers have used for centuries and are accustomed to. This gap is addressed with the release of multimodal vision-language models (VLMs), such as GPT-4V, enabling AI to impact many more types of tasks. Our work presents a comprehensive evaluation of VLMs across a spectrum of engineering design tasks, categorized into four main areas: Conceptual Design, System-Level and Detailed Design, Manufacturing and Inspection, and Engineering Education Tasks. Specifically in this paper, we assess the capabilities of two VLMs, GPT-4V and LLaVA 1.6 34B, in design tasks such as sketch similarity analysis, CAD generation, topology optimization, manufacturability assessment, and engineering textbook problems. Through this structured evaluation, we not only explore VLMs' proficiency in handling complex design challenges but also identify their limitations in complex engineering design applications. Our research establishes a foundation for future assessments of vision language models. It also contributes a set of benchmark testing datasets, with more than 1000 queries, for ongoing advancements and applications in this field.
Submitted 9 December, 2024; v1 submitted 21 November, 2023;
originally announced November 2023.
-
ADVISE: AI-accelerated Design of Evidence Synthesis for Global Development
Authors:
Kristen M. Edwards,
Binyang Song,
Jaron Porciello,
Mark Engelbert,
Carolyn Huang,
Faez Ahmed
Abstract:
When designing evidence-based policies and programs, decision-makers must distill key information from a vast and rapidly growing literature base. Identifying relevant literature from raw search results is time- and resource-intensive, and is often done by manual screening. In this study, we develop an AI agent based on a bidirectional encoder representations from transformers (BERT) model and incorporate it into a human team designing an evidence synthesis product for global development. We explore the effectiveness of the human-AI hybrid team in accelerating the evidence synthesis process. To further improve team efficiency, we enhance the human-AI hybrid team through active learning (AL). Specifically, we explore different sampling strategies, including random sampling, least confidence (LC) sampling, and highest priority (HP) sampling, to study their influence on the collaborative screening process. Results show that incorporating the BERT-based AI agent into the human team can reduce the human screening effort by 68.5% compared to the case of no AI assistance and by 16.8% compared to the case of using a support vector machine (SVM)-based AI agent for identifying 80% of all relevant documents. When we apply the HP sampling strategy for AL, the human screening effort can be reduced even more: by 78.3% for identifying 80% of all relevant documents compared to no AI assistance. We apply the AL-enhanced human-AI hybrid teaming workflow in the design process of three evidence gap maps (EGMs) for USAID and find it to be highly effective. These findings demonstrate how AI can accelerate the development of evidence synthesis products and promote timely evidence-based decision making in global development in a human-AI hybrid teaming context.
Submitted 1 May, 2023;
originally announced May 2023.
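Note on the entry above: the abstract names least confidence (LC) sampling without defining it. The sketch below uses the standard textbook definition (uncertainty = 1 minus the highest predicted class probability), which is an assumption about what the paper implements. The function names and the example probabilities are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def least_confidence_scores(probs: Sequence[Sequence[float]]) -> List[float]:
    """Return 1 - max class probability for each document (higher = less certain)."""
    return [1.0 - max(p) for p in probs]


def select_least_confident(probs: Sequence[Sequence[float]], batch_size: int) -> List[int]:
    """Return indices of the batch_size documents the classifier is least sure about."""
    scores = least_confidence_scores(probs)
    # Sort document indices by uncertainty, most uncertain first.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:batch_size]


# Hypothetical classifier outputs: [P(irrelevant), P(relevant)] per document.
example_probs = [[0.95, 0.05], [0.55, 0.45], [0.20, 0.80], [0.51, 0.49]]
picked = select_least_confident(example_probs, 2)
print(picked)  # indices of the documents to send to human screeners next
```

In an active-learning loop, the selected documents would be labeled by the human team and added to the training set before the classifier is refit.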
-
StickyLand: Breaking the Linear Presentation of Computational Notebooks
Authors:
Zijie J. Wang,
Katie Dai,
W. Keith Edwards
Abstract:
How can we better organize code in computational notebooks? Notebooks have become a popular tool among data scientists, as they seamlessly weave text and code together, supporting users to rapidly iterate and document code experiments. However, it is often challenging to organize code in notebooks, partially because there is a mismatch between the linear presentation of code and the non-linear process of exploratory data analysis. We present StickyLand, a notebook extension for empowering users to freely organize their code in non-linear ways. With sticky cells that are always shown on the screen, users can quickly access their notes, instantly observe experiment results, and easily build interactive dashboards that support complex visual analytics. Case studies highlight how our tool can enhance notebook users' productivity and identify opportunities for future notebook designs. StickyLand is available at https://github.com/xiaohk/stickyland.
Submitted 22 February, 2022;
originally announced February 2022.
-
Comparison of Update and Genetic Training Algorithms in a Memristor Crossbar Perceptron
Authors:
Kyle N. Edwards,
Xiao Shen
Abstract:
Memristor-based computer architectures are becoming more attractive as a possible choice of hardware for the implementation of neural networks. However, at present, memristor technologies are susceptible to a variety of failure modes, a serious concern in any application where regular access to the hardware may not be expected or even possible. In this study, we investigate whether certain training algorithms may be more resilient to particular hardware failure modes, and therefore more suitable for use in those applications. We implement two training algorithms -- a local update scheme and a genetic algorithm -- in a simulated memristor crossbar, and compare their ability to train for a simple image classification task as an increasing number of memristors fail to adjust their conductance. We demonstrate that there is a clear distinction between the two algorithms in several measures of the rate of failure to train.
Submitted 18 February, 2022; v1 submitted 10 December, 2020;
originally announced December 2020.
-
Super learning in the SAS system
Authors:
Alexander P. Keil,
Daniel Westreich,
Jessie K Edwards,
Stephen R Cole
Abstract:
Background and objective: Stacking is an ensemble machine learning method that averages predictions from multiple other algorithms, such as generalized linear models and regression trees. An implementation of stacking, called super learning, has been developed as a general approach to supervised learning and has seen frequent usage, in part due to the availability of an R package. We develop super learning in the SAS software system using a new macro, and demonstrate its performance relative to the R package.
Methods: Following previous work using the R SuperLearner package, we assess the performance of super learning in a number of domains. We compare the R package with the new SAS macro in a small set of simulations assessing curve fitting in a predictive model, as well as in a set of 14 publicly available datasets to assess cross-validated accuracy.
Results: Across the simulated data and the publicly available data, the SAS macro performed similarly to the R package, despite a different set of potential algorithms available natively in R and SAS.
Conclusions: Our super learner macro performs as well as the R package at a number of tasks. Further, by extending the macro to include the use of R packages, the macro can leverage both the robust, enterprise-oriented procedures in SAS and the nimble, cutting-edge packages in R. In the spirit of ensemble learning, this macro extends the potential library of algorithms beyond a single software system and provides a simple avenue into machine learning in SAS.
Submitted 31 July, 2019; v1 submitted 21 May, 2018;
originally announced May 2018.
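Note on the entry above: the abstract describes stacking as combining predictions from several algorithms. A minimal sketch of that idea follows, assuming two toy base learners (a mean predictor and a one-variable least-squares line) and a convex weight chosen by grid search on cross-validated squared error. None of this reflects the SAS macro or the R SuperLearner package interface; all names here are hypothetical.

```python
from typing import Callable, List, Sequence

Predictor = Callable[[float], float]
Learner = Callable[[Sequence[float], Sequence[float]], Predictor]


def fit_mean(x: Sequence[float], y: Sequence[float]) -> Predictor:
    """Base learner 1: always predict the training mean of y."""
    m = sum(y) / len(y)
    return lambda _x: m


def fit_line(x: Sequence[float], y: Sequence[float]) -> Predictor:
    """Base learner 2: ordinary least-squares line in one variable."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx if sxx > 0 else 0.0
    return lambda x_new: my + slope * (x_new - mx)


def cv_predictions(learner: Learner, x: Sequence[float], y: Sequence[float],
                   folds: int) -> List[float]:
    """Out-of-fold predictions: each point is predicted by a model that never saw it."""
    preds = [0.0] * len(x)
    for k in range(folds):
        train = [i for i in range(len(x)) if i % folds != k]
        test = [i for i in range(len(x)) if i % folds == k]
        model = learner([x[i] for i in train], [y[i] for i in train])
        for i in test:
            preds[i] = model(x[i])
    return preds


def stacking_weight(x: Sequence[float], y: Sequence[float],
                    folds: int = 3, grid: int = 101) -> float:
    """Weight w on the mean learner (1 - w on the line) minimising cross-validated MSE."""
    p_mean = cv_predictions(fit_mean, x, y, folds)
    p_line = cv_predictions(fit_line, x, y, folds)

    def cv_mse(w: float) -> float:
        return sum((w * a + (1 - w) * b - t) ** 2
                   for a, b, t in zip(p_mean, p_line, y)) / len(y)

    return min((i / (grid - 1) for i in range(grid)), key=cv_mse)


# Toy data that is exactly linear, so the line learner should receive all the weight.
x = [0, 1, 2, 3, 4, 5]
y = [1, 3, 5, 7, 9, 11]
w = stacking_weight(x, y)
print(w)
```

The key design point is that the combination weights are fitted to out-of-fold predictions rather than in-sample fits, so a flexible base learner cannot win simply by overfitting.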
-
Half-integral linkages in highly connected directed graphs
Authors:
Katherine Edwards,
Irene Muzi,
Paul Wollan
Abstract:
We study the half-integral $k$-Directed Disjoint Paths Problem ($\tfrac12$kDDPP) in highly strongly connected digraphs. The integral kDDPP is NP-complete even when restricted to instances where $k=2$ and the input graph is $L$-strongly connected, for any $L\geq 1$. We show that when the integrality condition is relaxed to allow each vertex to be used in two paths, the problem becomes efficiently solvable in highly connected digraphs (even with $k$ as part of the input). Specifically, we show that there is an absolute constant $c$ such that for each $k\geq 2$ there exists $L(k)$ such that $\tfrac12$kDDPP is solvable in time $O(|V(G)|^c)$ for an $L(k)$-strongly connected directed graph $G$. As the function $L(k)$ grows rather quickly, we also show that $\tfrac12$kDDPP is solvable in time $O(|V(G)|^{f(k)})$ in $(36k^3+2k)$-strongly connected directed graphs. We also show that for each $ε<1$, deciding half-integral feasibility of kDDPP instances is NP-complete when $k$ is given as part of the input, even when restricted to graphs with strong connectivity $εk$.
Submitted 3 November, 2016;
originally announced November 2016.
-
Fast approximation algorithms for $p$-centres in large $δ$-hyperbolic graphs
Authors:
Katherine Edwards,
W. Sean Kennedy,
Iraj Saniee
Abstract:
We provide a quasilinear time algorithm for the $p$-center problem with an additive error less than or equal to 3 times the input graph's hyperbolic constant. Specifically, for the graph $G=(V,E)$ with $n$ vertices, $m$ edges and hyperbolic constant $δ$, we construct an algorithm for $p$-centers in time $O(p(δ+1)(n+m)\log(n))$ with radius not exceeding $r_p + δ$ when $p \leq 2$ and $r_p + 3δ$ when $p \geq 3$, where $r_p$ are the optimal radii. Prior work identified $p$-centers with accuracy $r_p+δ$ but with time complexity $O((n^3\log n + n^2m)\log(diam(G)))$ which is impractical for large graphs.
Submitted 25 April, 2016;
originally announced April 2016.
-
Concentration of the number of solutions of random planted CSPs and Goldreich's one-way candidates
Authors:
Emmanuel Abbe,
Katherine Edwards
Abstract:
This paper shows that the logarithm of the number of solutions of a random planted $k$-SAT formula concentrates around a deterministic $n$-independent threshold. Specifically, if $F^*_{k}(α,n)$ is a random $k$-SAT formula on $n$ variables, with clause density $α$ and with a uniformly drawn planted solution, there exists a function $φ_k(\cdot)$ such that, except for $α$ in a set of Lebesgue measure zero, we have $ \frac{1}{n}\log Z(F^*_{k}(α,n)) \to φ_k(α)$ in probability, where $Z(F)$ is the number of solutions of the formula $F$. This settles a problem left open in Abbe-Montanari RANDOM 2013, where the concentration is obtained only for the expected logarithm over the clause distribution. The result is also extended to a more general class of random planted CSPs; in particular, it is shown that the number of pre-images for the Goldreich one-way function model concentrates for some choices of the predicates.
Submitted 30 April, 2015;
originally announced April 2015.
-
Learning from FITS: Limitations in use in modern astronomical research
Authors:
Brian Thomas,
Tim Jenness,
Frossie Economou,
Perry Greenfield,
Paul Hirst,
David S. Berry,
Erik Bray,
Norman Gray,
Demitri Muna,
James Turner,
Miguel de Val-Borro,
Juande Santander-Vela,
David Shupe,
John Good,
G. Bruce Berriman,
Slava Kitaeff,
Jonathan Fay,
Omar Laurino,
Anastasia Alexov,
Walter Landry,
Joe Masters,
Adam Brazier,
Reinhold Schaaf,
Kevin Edwards,
Russell O. Redman
, et al. (13 additional authors not shown)
Abstract:
The Flexible Image Transport System (FITS) standard has been a great boon to astronomy, allowing observatories, scientists and the public to exchange astronomical information easily. The FITS standard, however, is showing its age. When the standard was developed in the late 1970s, its authors made a number of implementation choices that, while common at the time, are now seen to limit its utility with modern data. The authors of the FITS standard could not anticipate the challenges which we are facing today in astronomical computing. Difficulties we now face include, but are not limited to, addressing the need to handle an expanded range of specialized data product types (data models), being more conducive to the networked exchange and storage of data, handling very large datasets, and capturing significantly more complex metadata and data relationships.
There are members of the community today who find some or all of these limitations unworkable, and have decided to move ahead with storing data in other formats. If this fragmentation continues, we risk abandoning the advantages of broad interoperability, and ready archivability, that the FITS format provides for astronomy. In this paper we detail some selected important problems which exist within the FITS standard today. These problems may provide insight into deeper underlying issues which reside in the format and we provide a discussion of some lessons learned. It is not our intention here to prescribe specific remedies to these issues; rather, it is to call the attention of the FITS and greater astronomical computing communities to these problems in the hope that it will spur action to address them.
Submitted 10 February, 2015; v1 submitted 3 February, 2015;
originally announced February 2015.
-
Edge-colouring seven-regular planar graphs
Authors:
Maria Chudnovsky,
Katherine Edwards,
Ken-ichi Kawarabayashi,
Paul Seymour
Abstract:
A conjecture due to the fourth author states that every $d$-regular planar multigraph can be $d$-edge-coloured, provided that for every odd set $X$ of vertices, there are at least $d$ edges between $X$ and its complement. For $d = 3$ this is the four-colour theorem, and the conjecture has been proved for all $d\le 8$, by various authors. In particular, two of us proved it when $d=7$; and then three of us proved it when $d=8$. The methods used for the latter give a proof in the $d=7$ case that is simpler than the original, and we present it here.
Submitted 27 October, 2012;
originally announced October 2012.
-
Edge-colouring eight-regular planar graphs
Authors:
Maria Chudnovsky,
Katherine Edwards,
Paul Seymour
Abstract:
It was conjectured by the third author in about 1973 that every $d$-regular planar graph (possibly with parallel edges) can be $d$-edge-coloured, provided that for every odd set $X$ of vertices, there are at least $d$ edges between $X$ and its complement. For $d = 3$ this is the four-colour theorem, and the conjecture has been proved for all $d\le 7$, by various authors. Here we prove it for $d = 8$.
Submitted 6 September, 2012;
originally announced September 2012.
-
A superlocal version of Reed's Conjecture
Authors:
Katherine Edwards,
Andrew D. King
Abstract:
Reed's well-known $ω$, $Δ$, $χ$ conjecture proposes that every graph satisfies $χ\leq \lceil \frac 12(Δ+1+ω)\rceil$. The second author formulated a {\em local strengthening} of this conjecture that considers a bound supplied by the neighbourhood of a single vertex. Following the idea that the chromatic number cannot be greatly affected by any particular stable set of vertices, we propose a further strengthening that considers a bound supplied by the neighbourhoods of two adjacent vertices. We provide some fundamental evidence in support, namely that the stronger bound holds in the fractional relaxation and holds for both quasi-line graphs and graphs with stability number two. We also conjecture that in the fractional version, we can push the locality even further.
Submitted 14 November, 2014; v1 submitted 26 August, 2012;
originally announced August 2012.
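Note on the entry above: the bound $χ\leq \lceil \frac 12(Δ+1+ω)\rceil$ can be checked by brute force on a small graph. The sketch below does this for the 5-cycle, a graph chosen here purely for illustration. The helper names are hypothetical, and the exhaustive searches are only practical for a handful of vertices.

```python
from itertools import combinations, product
from typing import Dict, Set

Graph = Dict[int, Set[int]]  # adjacency sets, assumed symmetric


def max_degree(adj: Graph) -> int:
    """Δ: the largest number of neighbours of any vertex."""
    return max(len(nbrs) for nbrs in adj.values())


def clique_number(adj: Graph) -> int:
    """ω: size of the largest set of pairwise adjacent vertices (exhaustive search)."""
    verts = list(adj)
    for size in range(len(verts), 0, -1):
        for subset in combinations(verts, size):
            if all(b in adj[a] for a, b in combinations(subset, 2)):
                return size
    return 0


def chromatic_number(adj: Graph) -> int:
    """χ: fewest colours in a proper vertex colouring (exhaustive search)."""
    verts = list(adj)
    for k in range(1, len(verts) + 1):
        for colours in product(range(k), repeat=len(verts)):
            colour = dict(zip(verts, colours))
            if all(colour[u] != colour[v] for u in adj for v in adj[u]):
                return k
    return 0


def reed_bound(omega: int, delta: int) -> int:
    """ceil((Δ + 1 + ω) / 2), computed with integer arithmetic."""
    return (omega + delta + 2) // 2


# The 5-cycle: every vertex is adjacent to its two cyclic neighbours.
c5: Graph = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}
omega, delta, chi = clique_number(c5), max_degree(c5), chromatic_number(c5)
print(omega, delta, chi, reed_bound(omega, delta))
```

For the 5-cycle the bound is met with equality, since $ω=2$, $Δ=2$ and $χ=3$.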
-
Bounding the fractional chromatic number of $K_Δ$-free graphs
Authors:
Katherine Edwards,
Andrew D. King
Abstract:
King, Lu, and Peng recently proved that for $Δ\geq 4$, any $K_Δ$-free graph with maximum degree $Δ$ has fractional chromatic number at most $Δ-\tfrac{2}{67}$ unless it is isomorphic to $C_5\boxtimes K_2$ or $C_8^2$. Using a different approach we give improved bounds for $Δ\geq 6$ and pose several related conjectures. Our proof relies on a weighted local generalization of the fractional relaxation of Reed's $ω$, $Δ$, $χ$ conjecture.
Submitted 30 March, 2013; v1 submitted 11 June, 2012;
originally announced June 2012.
-
A note on hitting maximum and maximal cliques with a stable set
Authors:
Demetres Christofides,
Katherine Edwards,
Andrew D. King
Abstract:
It was recently proved that any graph satisfying $ω> \frac 23(Δ+1)$ contains a stable set hitting every maximum clique. In this note we prove that the same is true for graphs satisfying $ω\geq \frac 23(Δ+1)$ unless the graph is the strong product of $K_{ω/2}$ and an odd hole. We also provide a counterexample to a recent conjecture on the existence of a stable set hitting every sufficiently large maximal clique.
Submitted 28 May, 2012; v1 submitted 14 September, 2011;
originally announced September 2011.