-
Closing the AI generalization gap by adjusting for dermatology condition distribution differences across clinical settings
Authors:
Rajeev V. Rikhye,
Aaron Loh,
Grace Eunhae Hong,
Preeti Singh,
Margaret Ann Smith,
Vijaytha Muralidharan,
Doris Wong,
Rory Sayres,
Michelle Phung,
Nicolas Betancourt,
Bradley Fong,
Rachna Sahasrabudhe,
Khoban Nasim,
Alec Eschholz,
Basil Mustafa,
Jan Freyberg,
Terry Spitz,
Yossi Matias,
Greg S. Corrado,
Katherine Chou,
Dale R. Webster,
Peggy Bui,
Yuan Liu,
Yun Liu,
Justin Ko, et al. (1 additional author not shown)
Abstract:
Recently, there has been great progress in the ability of artificial intelligence (AI) algorithms to classify dermatological conditions from clinical photographs. However, little is known about the robustness of these algorithms in real-world settings where several factors can lead to a loss of generalizability. Understanding and overcoming these limitations will permit the development of generalizable AI that can aid in the diagnosis of skin conditions across a variety of clinical settings. In this retrospective study, we demonstrate that differences in skin condition distribution, rather than in demographics or image capture mode, are the main source of errors when an AI algorithm is evaluated on data from a previously unseen source. We demonstrate a series of steps to close this generalization gap, requiring progressively more information about the new source, ranging from the condition distribution to training data enriched for data less frequently seen during training. Our results also suggest comparable performance from end-to-end fine-tuning versus fine-tuning solely the classification layer on top of a frozen embedding model. Our approach can inform the adaptation of AI algorithms to new settings, based on the information and resources available.
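The simplest step described in this abstract, adapting a classifier when only the new source's condition distribution is known, can be illustrated with a standard prior-shift correction. The sketch below is an assumption about what such an adjustment might look like, not the authors' confirmed method; the function name `adjust_for_prior_shift` and all numbers are hypothetical.

```python
from typing import List, Sequence


def adjust_for_prior_shift(
    probs: Sequence[float],
    train_prior: Sequence[float],
    target_prior: Sequence[float],
) -> List[float]:
    """Reweight predicted class probabilities for a new class prior.

    Hypothetical helper: applies the textbook label-shift correction
    p_new(y | x) proportional to p_old(y | x) * target_prior[y] / train_prior[y].
    """
    if not (len(probs) == len(train_prior) == len(target_prior)):
        raise ValueError("all inputs must have one entry per class")
    if any(s <= 0 for s in train_prior):
        raise ValueError("training prior must be strictly positive")

    # Scale each class score by how much more (or less) common the class
    # is in the new setting than it was during training.
    unnormalized = [p * t / s for p, s, t in zip(probs, train_prior, target_prior)]
    total = sum(unnormalized)
    if total == 0:
        raise ValueError("adjusted scores sum to zero")

    # Renormalize so the adjusted scores form a probability distribution.
    return [u / total for u in unnormalized]


# Illustrative numbers only: two conditions, balanced in training,
# but the second is four times as common as the first at the new site.
adjusted = adjust_for_prior_shift([0.6, 0.4], [0.5, 0.5], [0.2, 0.8])
```

With these made-up inputs the prediction flips toward the second condition, which is the qualitative effect a distribution adjustment is meant to have when the model's scores are otherwise well calibrated.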
Submitted 23 February, 2024;
originally announced February 2024.
-
Extracting Mathematical Concepts from Text
Authors:
Jacob Collard,
Valeria de Paiva,
Brendan Fong,
Eswaran Subrahmanian
Abstract:
We investigate different systems for extracting mathematical entities from English texts in the mathematical field of category theory as a first step for constructing a mathematical knowledge graph. We consider four different term extractors and compare their results. This small experiment showcases some of the issues with the construction and evaluation of terms extracted from noisy domain text. We also make available two open corpora in research mathematics, in particular in category theory: a small corpus of 755 abstracts from the journal TAC (3188 sentences), and a larger corpus from the nLab community wiki (15,000 sentences).
Submitted 29 August, 2022;
originally announced August 2022.
-
Disparities in Dermatology AI Performance on a Diverse, Curated Clinical Image Set
Authors:
Roxana Daneshjou,
Kailas Vodrahalli,
Roberto A Novoa,
Melissa Jenkins,
Weixin Liang,
Veronica Rotemberg,
Justin Ko,
Susan M Swetter,
Elizabeth E Bailey,
Olivier Gevaert,
Pritam Mukherjee,
Michelle Phung,
Kiana Yekrang,
Bradley Fong,
Rachna Sahasrabudhe,
Johan A. C. Allerup,
Utako Okata-Karigane,
James Zou,
Albert Chiou
Abstract:
Access to dermatological care is a major issue, with an estimated 3 billion people lacking access to care globally. Artificial intelligence (AI) may aid in triaging skin diseases. However, most AI models have not been rigorously assessed on images of diverse skin tones or uncommon diseases. To ascertain potential biases in algorithm performance in this context, we curated the Diverse Dermatology Images (DDI) dataset: the first publicly available, expertly curated, and pathologically confirmed image dataset with diverse skin tones. Using this dataset of 656 images, we show that state-of-the-art dermatology AI models perform substantially worse on DDI, with area under the receiver operating characteristic curve (ROC-AUC) dropping by 27-36 percent compared to the models' original test results. All the models performed worse on dark skin tones and uncommon diseases, which are represented in the DDI dataset. Additionally, we find that dermatologists, who typically provide visual labels for AI training and test datasets, also perform worse on images of dark skin tones and uncommon diseases compared to ground truth biopsy annotations. Finally, fine-tuning AI models on the well-characterized and diverse DDI images closed the performance gap between light and dark skin tones. Moreover, algorithms fine-tuned on diverse skin tones outperformed dermatologists on identifying malignancy on images of dark skin tones. Our findings identify important weaknesses and biases in dermatology AI that need to be addressed to ensure reliable application to diverse patients and diseases.
Submitted 15 March, 2022;
originally announced March 2022.
-
Disparities in Dermatology AI: Assessments Using Diverse Clinical Images
Authors:
Roxana Daneshjou,
Kailas Vodrahalli,
Weixin Liang,
Roberto A Novoa,
Melissa Jenkins,
Veronica Rotemberg,
Justin Ko,
Susan M Swetter,
Elizabeth E Bailey,
Olivier Gevaert,
Pritam Mukherjee,
Michelle Phung,
Kiana Yekrang,
Bradley Fong,
Rachna Sahasrabudhe,
James Zou,
Albert Chiou
Abstract:
More than 3 billion people lack access to care for skin disease. AI diagnostic tools may aid in early skin cancer detection; however, most models have not been assessed on images of diverse skin tones or uncommon diseases. To address this, we curated the Diverse Dermatology Images (DDI) dataset - the first publicly available, pathologically confirmed images featuring diverse skin tones. We show that state-of-the-art dermatology AI models perform substantially worse on DDI, with ROC-AUC dropping 29-40 percent compared to the models' original results. We find that dark skin tones and uncommon diseases, which are well represented in the DDI dataset, lead to performance drop-offs. Additionally, we show that state-of-the-art robust training methods cannot correct for these biases without diverse training data. Our findings identify important weaknesses and biases in dermatology AI that need to be addressed to ensure reliable application to diverse patients and across all diseases.
Submitted 15 November, 2021;
originally announced November 2021.
-
Behavioral Mereology: A Modal Logic for Passing Constraints
Authors:
Brendan Fong,
David Jaz Myers,
David I. Spivak
Abstract:
Mereology is the study of parts and the relationships that hold between them. We introduce a behavioral approach to mereology, in which systems and their parts are known only by the types of behavior they can exhibit. Our discussion is formally topos-theoretic, and agnostic to the topos, providing maximal generality; however, by using only its internal logic we can hide the details and readers may assume a completely elementary set-theoretic discussion. We consider the relationship between various parts of a whole in terms of how behavioral constraints are passed between them, and give an inter-modal logic that generalizes the usual alethic modalities in the setting of symmetric accessibility.
Submitted 25 January, 2021;
originally announced January 2021.
-
String Diagrams for Regular Logic (Extended Abstract)
Authors:
Brendan Fong,
David Spivak
Abstract:
Regular logic can be regarded as the internal language of regular categories, but the logic itself is generally not given a categorical treatment. In this paper, we understand the syntax and proof rules of regular logic in terms of the free regular category FRg(T) on a set T. From this point of view, regular theories are certain monoidal 2-functors from a suitable 2-category of contexts -- the 2-category of relations in FRg(T) -- to that of posets. Such functors assign to each context the set of formulas in that context, ordered by entailment. We refer to such a 2-functor as a regular calculus because it naturally gives rise to a graphical string diagram calculus in the spirit of Joyal and Street. We shall show that every natural category has an associated regular calculus, and conversely from every regular calculus one can construct a regular category.
Submitted 14 September, 2020;
originally announced September 2020.
-
Lenses and Learners
Authors:
Brendan Fong,
Michael Johnson
Abstract:
Lenses are a well-established structure for modelling bidirectional transformations, such as the interactions between a database and a view of it. Lenses may be symmetric or asymmetric, and may be composed, forming the morphisms of a monoidal category. More recently, the notion of a learner has been proposed: these provide a compositional way of modelling supervised learning algorithms, and again form the morphisms of a monoidal category. In this paper, we show that the two concepts are tightly linked. We show both that there is a faithful, identity-on-objects symmetric monoidal functor embedding a category of asymmetric lenses into the category of learners, and furthermore there is such a functor embedding the category of learners into a category of symmetric lenses.
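As a concrete reading of "lenses may be composed", the following minimal sketch models an asymmetric lens as a `get`/`put` pair and composes two of them. The `Lens` class, the `compose` function, and the tuple example are illustrative assumptions and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Lens:
    """Hypothetical asymmetric lens: a view extractor and an updater."""
    get: Callable[[Any], Any]        # source -> view
    put: Callable[[Any, Any], Any]   # (source, new view) -> new source


def compose(outer: Lens, inner: Lens) -> Lens:
    """Compose lenses: `outer` focuses on a part, `inner` on a part of that part."""
    def get(source: Any) -> Any:
        return inner.get(outer.get(source))

    def put(source: Any, view: Any) -> Any:
        # Update the inner part first, then write it back through the outer lens.
        updated_part = inner.put(outer.get(source), view)
        return outer.put(source, updated_part)

    return Lens(get=get, put=put)


# Lens onto the first component of a pair.
first = Lens(get=lambda pair: pair[0], put=lambda pair, v: (v, pair[1]))

# Lens onto the first component of the first component of a nested pair.
first_of_first = compose(first, first)

nested = ((1, 2), 3)
view = first_of_first.get(nested)        # focuses on the innermost first entry
updated = first_of_first.put(nested, 9)  # rewrites only that entry
```

The composite behaves like a database view of a view: reading goes through both `get` functions in order, while writing propagates the change back outward one layer at a time.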
Submitted 1 May, 2019; v1 submitted 5 March, 2019;
originally announced March 2019.
-
Graphical Regular Logic
Authors:
Brendan Fong,
David I Spivak
Abstract:
Regular logic can be regarded as the internal language of regular categories, but the logic itself is generally not given a categorical treatment. In this paper, we understand the syntax and proof rules of regular logic in terms of the free regular category $\mathsf{FRg}(\mathrm{T})$ on a set $\mathrm{T}$. From this point of view, regular theories are certain monoidal 2-functors from a suitable 2-category of contexts---the 2-category of relations in $\mathsf{FRg}(\mathrm{T})$---to the 2-category of posets. Such functors assign to each context the set of formulas in that context, ordered by entailment. We refer to such a 2-functor as a regular calculus because it naturally gives rise to a graphical string diagram calculus in the spirit of Joyal and Street. Our key aim is to prove that the category of regular categories is essentially reflective in that of regular calculi. Along the way, we demonstrate how to use this graphical calculus.
Submitted 20 June, 2019; v1 submitted 13 December, 2018;
originally announced December 2018.
-
Hypergraph Categories
Authors:
Brendan Fong,
David I Spivak
Abstract:
Hypergraph categories have been rediscovered at least five times, under various names, including well-supported compact closed categories, dgs-monoidal categories, and dungeon categories. Perhaps the reason they keep being reinvented is two-fold: there are many applications---including to automata, databases, circuits, linear relations, graph rewriting, and belief propagation---and yet the standard definition is so involved and ornate as to be difficult to find in the literature. Indeed, a hypergraph category is, roughly speaking, a "symmetric monoidal category in which each object is equipped with the structure of a special commutative Frobenius monoid, satisfying certain coherence conditions".
Fortunately, this description can be simplified a great deal: a hypergraph category is simply a "cospan-algebra". The goal of this paper is to remove the scare-quotes and make the previous statement precise. We prove two main theorems. First is a coherence theorem for hypergraph categories, which says that every hypergraph category is equivalent to an objectwise-free hypergraph category. Second, we prove that the category of objectwise-free hypergraph categories is equivalent to the category of cospan-algebras.
Submitted 18 January, 2019; v1 submitted 21 June, 2018;
originally announced June 2018.
-
Backprop as Functor: A compositional perspective on supervised learning
Authors:
Brendan Fong,
David I. Spivak,
Rémy Tuyéras
Abstract:
A supervised learning algorithm searches over a set of functions $A \to B$ parametrised by a space $P$ to find the best approximation to some ideal function $f\colon A \to B$. It does this by taking examples $(a,f(a)) \in A\times B$, and updating the parameter according to some rule. We define a category where these update rules may be composed, and show that gradient descent---with respect to a fixed step size and an error function satisfying a certain property---defines a monoidal functor from a category of parametrised functions to this category of update rules. This provides a structural perspective on backpropagation, as well as a broad generalisation of neural networks.
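To make the idea of composable update rules more tangible, here is a small sketch in which each learner bundles an implementation function, a parameter update, and a "request" that passes a corrected input upstream. The three-function structure, the quadratic error, the scalar model $I(p, a) = p \cdot a$, and every name below are assumptions made for illustration; they are one reading of the abstract rather than the paper's stated construction.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Learner:
    """Hypothetical learner: implementation, update rule, and upstream request."""
    implement: Callable  # (param, a) -> b
    update: Callable     # (param, a, b) -> new param
    request: Callable    # (param, a, b) -> requested input a'


def scalar_learner(step: float) -> Learner:
    """Gradient descent on E = 0.5 * (p * a - b) ** 2 with fixed step size."""
    def implement(p, a):
        return p * a

    def update(p, a, b):
        # dE/dp = (p * a - b) * a
        return p - step * (p * a - b) * a

    def request(p, a, b):
        # dE/da = (p * a - b) * p; for quadratic error the request is a - dE/da.
        return a - (p * a - b) * p

    return Learner(implement, update, request)


def compose(first: Learner, second: Learner) -> Learner:
    """Run `first` then `second`; training signals flow back via `request`."""
    def implement(params, a):
        p, q = params
        return second.implement(q, first.implement(p, a))

    def update(params, a, c):
        p, q = params
        mid = first.implement(p, a)
        # The second learner tells the first what intermediate value it wanted.
        wanted_mid = second.request(q, mid, c)
        return (first.update(p, a, wanted_mid), second.update(q, mid, c))

    def request(params, a, c):
        p, q = params
        mid = first.implement(p, a)
        return first.request(p, a, second.request(q, mid, c))

    return Learner(implement, update, request)


two_layer = compose(scalar_learner(0.1), scalar_learner(0.1))
params = (1.0, 1.0)
new_params = two_layer.update(params, 1.0, 2.0)  # one training example (a=1, target=2)
```

The point of the sketch is that `compose` never inspects how either learner computes its update, so the composite is trained purely through the interfaces of its parts.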
Submitted 1 May, 2019; v1 submitted 28 November, 2017;
originally announced November 2017.
-
Universal Constructions for (Co)Relations: categories, monoidal categories, and props
Authors:
Brendan Fong,
Fabio Zanasi
Abstract:
Calculi of string diagrams are increasingly used to present the syntax and algebraic structure of various families of circuits, including signal flow graphs, electrical circuits and quantum processes. In many such approaches, the semantic interpretation for diagrams is given in terms of relations or corelations (generalised equivalence relations) of some kind. In this paper we show how semantic categories of both relations and corelations can be characterised as colimits of simpler categories. This modular perspective is important as it simplifies the task of giving a complete axiomatisation for semantic equivalence of string diagrams. Moreover, our general result unifies various theorems that are independently found in literature and are relevant for program semantics, quantum computation and control theory.
Submitted 31 August, 2018; v1 submitted 10 October, 2017;
originally announced October 2017.
-
Decorated Corelations
Authors:
Brendan Fong
Abstract:
Let $\mathcal C$ be a category with finite colimits, and let $(\mathcal E,\mathcal M)$ be a factorisation system on $\mathcal C$ with $\mathcal M$ stable under pushouts. Writing $\mathcal C;\mathcal M^{\mathrm{op}}$ for the symmetric monoidal category with morphisms cospans of the form $\stackrel{c}\to \stackrel{m}\leftarrow$, where $c \in \mathcal C$ and $m \in \mathcal M$, we give a method for constructing a category from a symmetric lax monoidal functor $F\colon (\mathcal C; \mathcal M^{\mathrm{op}},+) \to (\mathrm{Set},\times)$. A morphism in this category, termed a \emph{decorated corelation}, comprises (i) a cospan $X \to N \leftarrow Y$ in $\mathcal C$ such that the canonical copairing $X+Y \to N$ lies in $\mathcal E$, together with (ii) an element of $FN$. Functors between decorated corelation categories can be constructed from natural transformations between the decorating functors $F$. This provides a general method for constructing hypergraph categories---symmetric monoidal categories in which each object is a special commutative Frobenius monoid in a coherent way---and their functors. Such categories are useful for modelling network languages, for example circuit diagrams, and such functors their semantics.
Submitted 29 March, 2017;
originally announced March 2017.
-
A Universal Construction for (Co)Relations
Authors:
Brendan Fong,
Fabio Zanasi
Abstract:
Calculi of string diagrams are increasingly used to present the syntax and algebraic structure of various families of circuits, including signal flow graphs, electrical circuits and quantum processes. In many such approaches, the semantic interpretation for diagrams is given in terms of relations or corelations (generalised equivalence relations) of some kind. In this paper we show how semantic categories of both relations and corelations can be characterised as colimits of simpler categories. This modular perspective is important as it simplifies the task of giving a complete axiomatisation for semantic equivalence of string diagrams. Moreover, our general result unifies various theorems that are independently found in literature, including the cases of linear corelations (relevant for the semantics of electrical circuits), of partial equivalence relations and of linear subspaces (semantics of signal flow graphs and of the phase-free ZX calculus).
Submitted 26 May, 2017; v1 submitted 23 March, 2017;
originally announced March 2017.
-
A categorical approach to open and interconnected dynamical systems
Authors:
Brendan Fong,
Paolo Rapisarda,
Paweł Sobociński
Abstract:
We develop a sound and complete graphical theory for discrete linear time-invariant dynamical systems. The graphical syntax, as in previous work, is closely related to the classical notion of signal flow diagrams; differently from previous work, these are understood as multi-input multi-output transducers that process streams with an \emph{infinite past} as well as an infinite future. This extended semantics features non-controllable systems, and we develop a novel, structural characterisation of controllability. Our approach is formalised through the theory of props, extending the work of Bonchi, Zanasi and the third author.
Submitted 2 February, 2016; v1 submitted 17 October, 2015;
originally announced October 2015.
-
Additive monotones for resource theories of parallel-combinable processes with discarding
Authors:
Brendan Fong,
Hugo Nava-Kopp
Abstract:
A partitioned process theory, as defined by Coecke, Fritz, and Spekkens, is a symmetric monoidal category together with an all-object-including symmetric monoidal subcategory. We think of the morphisms of this category as processes, and the morphisms of the subcategory as those processes that are freely executable. Via a construction we refer to as parallel-combinable processes with discarding, we obtain from this data a partially ordered monoid on the set of processes, with f > g if one can use the free processes to construct g from f. The structure of this partial order can then be probed using additive monotones: order-preserving monoid homomorphisms with values in the real numbers under addition. We first characterise these additive monotones in terms of the corresponding partitioned process theory.
Given enough monotones, we might hope to be able to reconstruct the order on the monoid. If so, we say that we have a complete family of monotones. In general, however, when we require our monotones to be additive monotones, such families do not exist or are hard to compute. We show the existence of complete families of additive monotones for various partitioned process theories based on the category of finite sets, in order to shed light on the way such families can be constructed.
Submitted 4 November, 2015; v1 submitted 6 May, 2015;
originally announced May 2015.