-
Evaluation of Task Specific Productivity Improvements Using a Generative Artificial Intelligence Personal Assistant Tool
Authors:
Brian S. Freeman,
Kendall Arriola,
Dan Cottell,
Emmett Lawlor,
Matt Erdman,
Trevor Sutherland,
Brian Wells
Abstract:
This study evaluates the productivity improvements achieved using a generative artificial intelligence personal assistant tool (PAT) developed by Trane Technologies. The PAT, based on OpenAI's GPT 3.5 model, was deployed on Microsoft Azure to ensure secure access and protection of intellectual property. To assess the tool's productivity effectiveness, an experiment was conducted comparing the comp…
▽ More
This study evaluates the productivity improvements achieved using a generative artificial intelligence personal assistant tool (PAT) developed by Trane Technologies. The PAT, based on OpenAI's GPT 3.5 model, was deployed on Microsoft Azure to ensure secure access and protection of intellectual property. To assess the tool's productivity effectiveness, an experiment was conducted comparing the completion times and content quality of four common office tasks: writing an email, summarizing an article, creating instructions for a simple task, and preparing a presentation outline. Sixty-three (63) participants were randomly divided into a test group using the PAT and a control group performing the tasks manually. Results indicated significant productivity enhancements, particularly for tasks involving summarization and instruction creation, with improvements ranging from 3.3% to 69%. The study further analyzed factors such as the age of users, response word counts, and quality of responses, revealing that the PAT users generated more verbose and higher-quality content. An 'LLM-as-a-judge' method employing GPT-4 was used to grade the quality of responses, which effectively distinguished between high and low-quality outputs. The findings underscore the potential of PATs in enhancing workplace productivity and highlight areas for further research and optimization.
△ Less
Submitted 30 September, 2024; v1 submitted 5 September, 2024;
originally announced September 2024.
-
Anomaly Detection of Particle Orbit in Accelerator using LSTM Deep Learning Technology
Authors:
Zhiyuan Chen,
Wei Lu,
Radhika Bhong,
Yimin Hu,
Brian Freeman,
Adam Carpenter
Abstract:
A stable, reliable, and controllable orbit lock system is crucial to an electron (or ion) accelerator because the beam orbit and beam energy instability strongly affect the quality of the beam delivered to experimental halls. Currently, when the orbit lock system fails operators must manually intervene. This paper develops a Machine Learning based fault detection methodology to identify orbit lock…
▽ More
A stable, reliable, and controllable orbit lock system is crucial to an electron (or ion) accelerator because the beam orbit and beam energy instability strongly affect the quality of the beam delivered to experimental halls. Currently, when the orbit lock system fails operators must manually intervene. This paper develops a Machine Learning based fault detection methodology to identify orbit lock anomalies and notify accelerator operations staff of the off-normal behavior. Our method is unsupervised, so it does not require labeled data. It uses Long-Short Memory Networks (LSTM) Auto Encoder to capture normal patterns and predict future values of monitoring sensors in the orbit lock system. Anomalies are detected when the prediction error exceeds a threshold. We conducted experiments using monitoring data from Jefferson Lab's Continuous Electron Beam Accelerator Facility (CEBAF). The results are promising: the percentage of real anomalies identified by our solution is 68.6%-89.3% using monitoring data of a single component in the orbit lock control system. The accuracy can be as high as 82%.
△ Less
Submitted 27 January, 2024;
originally announced January 2024.
-
MINT: A wrapper to make multi-modal and multi-image AI models interactive
Authors:
Jan Freyberg,
Abhijit Guha Roy,
Terry Spitz,
Beverly Freeman,
Mike Schaekermann,
Patricia Strachan,
Eva Schnider,
Renee Wong,
Dale R Webster,
Alan Karthikesalingam,
Yun Liu,
Krishnamurthy Dvijotham,
Umesh Telang
Abstract:
During the diagnostic process, doctors incorporate multimodal information including imaging and the medical history - and similarly medical AI development has increasingly become multimodal. In this paper we tackle a more subtle challenge: doctors take a targeted medical history to obtain only the most pertinent pieces of information; how do we enable AI to do the same? We develop a wrapper method…
▽ More
During the diagnostic process, doctors incorporate multimodal information including imaging and the medical history - and similarly medical AI development has increasingly become multimodal. In this paper we tackle a more subtle challenge: doctors take a targeted medical history to obtain only the most pertinent pieces of information; how do we enable AI to do the same? We develop a wrapper method named MINT (Make your model INTeractive) that automatically determines what pieces of information are most valuable at each step, and ask for only the most useful information. We demonstrate the efficacy of MINT wrapping a skin disease prediction model, where multiple images and a set of optional answers to $25$ standard metadata questions (i.e., structured medical history) are used by a multi-modal deep network to provide a differential diagnosis. We show that MINT can identify whether metadata inputs are needed and if so, which question to ask next. We also demonstrate that when collecting multiple images, MINT can identify if an additional image would be beneficial, and if so, which type of image to capture. We showed that MINT reduces the number of metadata and image inputs needed by 82% and 36.2% respectively, while maintaining predictive performance. Using real-world AI dermatology system data, we show that needing fewer inputs can retain users that may otherwise fail to complete the system submission and drop off without a diagnosis. Qualitative examples show MINT can closely mimic the step-by-step decision making process of a clinical workflow and how this is different for straight forward cases versus more difficult, ambiguous cases. Finally we demonstrate how MINT is robust to different underlying multi-model classifiers and can be easily adapted to user requirements without significant model re-training.
△ Less
Submitted 22 January, 2024;
originally announced January 2024.
-
Evaluating AI systems under uncertain ground truth: a case study in dermatology
Authors:
David Stutz,
Ali Taylan Cemgil,
Abhijit Guha Roy,
Tatiana Matejovicova,
Melih Barsbey,
Patricia Strachan,
Mike Schaekermann,
Jan Freyberg,
Rajeev Rikhye,
Beverly Freeman,
Javier Perez Matos,
Umesh Telang,
Dale R. Webster,
Yuan Liu,
Greg S. Corrado,
Yossi Matias,
Pushmeet Kohli,
Yun Liu,
Arnaud Doucet,
Alan Karthikesalingam
Abstract:
For safety, medical AI systems undergo thorough evaluations before deployment, validating their predictions against a ground truth which is assumed to be fixed and certain. However, this ground truth is often curated in the form of differential diagnoses. While a single differential diagnosis reflects the uncertainty in one expert assessment, multiple experts introduce another layer of uncertainty…
▽ More
For safety, medical AI systems undergo thorough evaluations before deployment, validating their predictions against a ground truth which is assumed to be fixed and certain. However, this ground truth is often curated in the form of differential diagnoses. While a single differential diagnosis reflects the uncertainty in one expert assessment, multiple experts introduce another layer of uncertainty through disagreement. Both forms of uncertainty are ignored in standard evaluation which aggregates these differential diagnoses to a single label. In this paper, we show that ignoring uncertainty leads to overly optimistic estimates of model performance, therefore underestimating risk associated with particular diagnostic decisions. To this end, we propose a statistical aggregation approach, where we infer a distribution on probabilities of underlying medical condition candidates themselves, based on observed annotations. This formulation naturally accounts for the potential disagreements between different experts, as well as uncertainty stemming from individual differential diagnoses, capturing the entire ground truth uncertainty. Our approach boils down to generating multiple samples of medical condition probabilities, then evaluating and averaging performance metrics based on these sampled probabilities. In skin condition classification, we find that a large portion of the dataset exhibits significant ground truth uncertainty and standard evaluation severely over-estimates performance without providing uncertainty estimates. In contrast, our framework provides uncertainty estimates on common metrics of interest such as top-k accuracy and average overlap, showing that performance can change multiple percentage points. We conclude that, while assuming a crisp ground truth can be acceptable for many AI applications, a more nuanced evaluation protocol should be utilized in medical diagnosis.
△ Less
Submitted 13 April, 2025; v1 submitted 5 July, 2023;
originally announced July 2023.
-
MultiEarth 2023 -- Multimodal Learning for Earth and Environment Workshop and Challenge
Authors:
Miriam Cha,
Gregory Angelides,
Mark Hamilton,
Andy Soszynski,
Brandon Swenson,
Nathaniel Maidel,
Phillip Isola,
Taylor Perron,
Bill Freeman
Abstract:
The Multimodal Learning for Earth and Environment Workshop (MultiEarth 2023) is the second annual CVPR workshop aimed at the monitoring and analysis of the health of Earth ecosystems by leveraging the vast amount of remote sensing data that is continuously being collected. The primary objective of this workshop is to bring together the Earth and environmental science communities as well as the mul…
▽ More
The Multimodal Learning for Earth and Environment Workshop (MultiEarth 2023) is the second annual CVPR workshop aimed at the monitoring and analysis of the health of Earth ecosystems by leveraging the vast amount of remote sensing data that is continuously being collected. The primary objective of this workshop is to bring together the Earth and environmental science communities as well as the multimodal representation learning communities to explore new ways of harnessing technological advancements in support of environmental monitoring. The MultiEarth Workshop also seeks to provide a common benchmark for processing multimodal remote sensing information by organizing public challenges focused on monitoring the Amazon rainforest. These challenges include estimating deforestation, detecting forest fires, translating synthetic aperture radar (SAR) images to the visible domain, and projecting environmental trends. This paper presents the challenge guidelines, datasets, and evaluation metrics. Our challenge website is available at https://sites.google.com/view/rainforest-challenge/multiearth-2023.
△ Less
Submitted 7 June, 2023;
originally announced June 2023.
-
Developing a Series of AI Challenges for the United States Department of the Air Force
Authors:
Vijay Gadepally,
Gregory Angelides,
Andrei Barbu,
Andrew Bowne,
Laura J. Brattain,
Tamara Broderick,
Armando Cabrera,
Glenn Carl,
Ronisha Carter,
Miriam Cha,
Emilie Cowen,
Jesse Cummings,
Bill Freeman,
James Glass,
Sam Goldberg,
Mark Hamilton,
Thomas Heldt,
Kuan Wei Huang,
Phillip Isola,
Boris Katz,
Jamie Koerner,
Yen-Chen Lin,
David Mayo,
Kyle McAlpin,
Taylor Perron
, et al. (17 additional authors not shown)
Abstract:
Through a series of federal initiatives and orders, the U.S. Government has been making a concerted effort to ensure American leadership in AI. These broad strategy documents have influenced organizations such as the United States Department of the Air Force (DAF). The DAF-MIT AI Accelerator is an initiative between the DAF and MIT to bridge the gap between AI researchers and DAF mission requireme…
▽ More
Through a series of federal initiatives and orders, the U.S. Government has been making a concerted effort to ensure American leadership in AI. These broad strategy documents have influenced organizations such as the United States Department of the Air Force (DAF). The DAF-MIT AI Accelerator is an initiative between the DAF and MIT to bridge the gap between AI researchers and DAF mission requirements. Several projects supported by the DAF-MIT AI Accelerator are developing public challenge problems that address numerous Federal AI research priorities. These challenges target priorities by making large, AI-ready datasets publicly available, incentivizing open-source solutions, and creating a demand signal for dual use technologies that can stimulate further research. In this article, we describe these public challenges being developed and how their application contributes to scientific advances.
△ Less
Submitted 14 July, 2022;
originally announced July 2022.
-
MultiEarth 2022 -- Multimodal Learning for Earth and Environment Workshop and Challenge
Authors:
Miriam Cha,
Kuan Wei Huang,
Morgan Schmidt,
Gregory Angelides,
Mark Hamilton,
Sam Goldberg,
Armando Cabrera,
Phillip Isola,
Taylor Perron,
Bill Freeman,
Yen-Chen Lin,
Brandon Swenson,
Jean Piou
Abstract:
The Multimodal Learning for Earth and Environment Challenge (MultiEarth 2022) will be the first competition aimed at the monitoring and analysis of deforestation in the Amazon rainforest at any time and in any weather conditions. The goal of the Challenge is to provide a common benchmark for multimodal information processing and to bring together the earth and environmental science communities as…
▽ More
The Multimodal Learning for Earth and Environment Challenge (MultiEarth 2022) will be the first competition aimed at the monitoring and analysis of deforestation in the Amazon rainforest at any time and in any weather conditions. The goal of the Challenge is to provide a common benchmark for multimodal information processing and to bring together the earth and environmental science communities as well as multimodal representation learning communities to compare the relative merits of the various multimodal learning methods to deforestation estimation under well-defined and strictly comparable conditions. MultiEarth 2022 will have three sub-challenges: 1) matrix completion, 2) deforestation estimation, and 3) image-to-image translation. This paper presents the challenge guidelines, datasets, and evaluation metrics for the three sub-challenges. Our challenge website is available at https://sites.google.com/view/rainforest-challenge.
△ Less
Submitted 31 May, 2022; v1 submitted 15 April, 2022;
originally announced April 2022.
-
Pandemics are catalysts of scientific novelty: Evidence from COVID-19
Authors:
Meijun Liu,
Yi Bu,
Chongyan Chen,
Jian Xu,
Daifeng Li,
Yan Leng,
Richard Barry Freeman,
Eric Meyer,
Wonjin Yoon,
Mujeen Sung,
Minbyul Jeong,
Jinhyuk Lee,
Jaewoo Kang,
Chao Min,
Min Song,
Yujia Zhai,
Ying Ding
Abstract:
Scientific novelty drives the efforts to invent new vaccines and solutions during the pandemic. First-time collaboration and international collaboration are two pivotal channels to expand teams' search activities for a broader scope of resources required to address the global challenge, which might facilitate the generation of novel ideas. Our analysis of 98,981 coronavirus papers suggests that sc…
▽ More
Scientific novelty drives the efforts to invent new vaccines and solutions during the pandemic. First-time collaboration and international collaboration are two pivotal channels to expand teams' search activities for a broader scope of resources required to address the global challenge, which might facilitate the generation of novel ideas. Our analysis of 98,981 coronavirus papers suggests that scientific novelty measured by the BioBERT model that is pre-trained on 29 million PubMed articles, and first-time collaboration increased after the outbreak of COVID-19, and international collaboration witnessed a sudden decrease. During COVID-19, papers with more first-time collaboration were found to be more novel and international collaboration did not hamper novelty as it had done in the normal periods. The findings suggest the necessity of reaching out for distant resources and the importance of maintaining a collaborative scientific community beyond nationalism during a pandemic.
△ Less
Submitted 14 November, 2021; v1 submitted 25 September, 2020;
originally announced September 2020.
-
Weakly Supervised 3D Human Pose and Shape Reconstruction with Normalizing Flows
Authors:
Andrei Zanfir,
Eduard Gabriel Bazavan,
Hongyi Xu,
Bill Freeman,
Rahul Sukthankar,
Cristian Sminchisescu
Abstract:
Monocular 3D human pose and shape estimation is challenging due to the many degrees of freedom of the human body and thedifficulty to acquire training data for large-scale supervised learning in complex visual scenes. In this paper we present practical semi-supervised and self-supervised models that support training and good generalization in real-world images and video. Our formulation is based o…
▽ More
Monocular 3D human pose and shape estimation is challenging due to the many degrees of freedom of the human body and thedifficulty to acquire training data for large-scale supervised learning in complex visual scenes. In this paper we present practical semi-supervised and self-supervised models that support training and good generalization in real-world images and video. Our formulation is based on kinematic latent normalizing flow representations and dynamics, as well as differentiable, semantic body part alignment loss functions that support self-supervised learning. In extensive experiments using 3D motion capture datasets like CMU, Human3.6M, 3DPW, or AMASS, as well as image repositories like COCO, we show that the proposed methods outperform the state of the art, supporting the practical construction of an accurate family of models based on large-scale training with diverse and incompletely labeled image and video data.
△ Less
Submitted 22 August, 2020; v1 submitted 23 March, 2020;
originally announced March 2020.