MINT: A wrapper to make multi-modal and multi-image AI models interactive

Freyberg, Jan; Roy, Abhijit Guha; Spitz, Terry; Freeman, Beverly; Schaekermann, Mike; Strachan, Patricia; Schnider, Eva; Wong, Renee; Webster, Dale R; Karthikesalingam, Alan; Liu, Yun; Dvijotham, Krishnamurthy; Telang, Umesh

Abstract: During the diagnostic process, doctors incorporate multimodal information, including imaging and the medical history, and medical AI development has similarly become increasingly multimodal. In this paper we tackle a more subtle challenge: doctors take a targeted medical history to obtain only the most pertinent pieces of information; how do we enable AI to do the same? We develop a wrapper method named MINT (Make your model INTeractive) that automatically determines which pieces of information are most valuable at each step and asks for only the most useful information. We demonstrate the efficacy of MINT by wrapping a skin disease prediction model, in which a multi-modal deep network uses multiple images and a set of optional answers to 25 standard metadata questions (i.e., a structured medical history) to provide a differential diagnosis. We show that MINT can identify whether metadata inputs are needed and, if so, which question to ask next. We also demonstrate that when collecting multiple images, MINT can identify whether an additional image would be beneficial and, if so, which type of image to capture. We show that MINT reduces the number of metadata and image inputs needed by 82% and 36.2%, respectively, while maintaining predictive performance. Using real-world data from an AI dermatology system, we show that needing fewer inputs can retain users who might otherwise fail to complete a submission and drop off without a diagnosis. Qualitative examples show that MINT can closely mimic the step-by-step decision-making process of a clinical workflow, and how this process differs for straightforward cases versus more difficult, ambiguous cases. Finally, we demonstrate that MINT is robust to different underlying multi-modal classifiers and can be easily adapted to user requirements without significant model retraining.
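The abstract does not state which criterion MINT uses to decide whether another input is needed and which one to request. As a purely illustrative sketch, the loop below assumes a common active-acquisition heuristic: greedily pick the unanswered question with the largest expected reduction in prediction entropy, and stop when no question is expected to reduce it by more than a threshold. This is not the authors' method; `toy_model`, the question names, the answer distributions, and `min_gain` are all hypothetical.

```python
import math
from typing import Callable, Dict, List, Optional

# Hypothetical types: a "model" maps the answers collected so far to a
# probability distribution over candidate diagnoses.
Answers = Dict[str, str]
Model = Callable[[Answers], List[float]]


def entropy(probs: List[float]) -> float:
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def expected_entropy_after(model: Model, answers: Answers, question: str,
                           answer_dist: Dict[str, float]) -> float:
    """Expected prediction entropy if `question` were answered, weighting each
    possible answer by an assumed (hypothetical) answer distribution."""
    total = 0.0
    for answer, weight in answer_dist.items():
        total += weight * entropy(model({**answers, question: answer}))
    return total


def next_question(model: Model, answers: Answers, candidates: List[str],
                  answer_dists: Dict[str, Dict[str, float]],
                  min_gain: float = 0.05) -> Optional[str]:
    """Return the unanswered question with the largest expected entropy
    reduction, or None if no question is expected to gain more than min_gain."""
    current = entropy(model(answers))
    best_q: Optional[str] = None
    best_gain = min_gain
    for q in candidates:
        if q in answers:
            continue  # never ask the same question twice
        gain = current - expected_entropy_after(model, answers, q, answer_dists[q])
        if gain > best_gain:
            best_q, best_gain = q, gain
    return best_q


def toy_model(answers: Answers) -> List[float]:
    """Hypothetical two-diagnosis model: "itchy" is informative, "color" is not."""
    if answers.get("itchy") == "yes":
        return [0.9, 0.1]
    if answers.get("itchy") == "no":
        return [0.1, 0.9]
    return [0.5, 0.5]


dists = {"itchy": {"yes": 0.5, "no": 0.5}, "color": {"red": 0.5, "brown": 0.5}}
simulated_user = {"itchy": "yes", "color": "red"}  # hypothetical patient answers

answers: Answers = {}
asked: List[str] = []
while (q := next_question(toy_model, answers, ["itchy", "color"], dists)) is not None:
    asked.append(q)
    answers[q] = simulated_user[q]

print(asked)  # ['itchy']
```

Because the toy model ignores `color`, the loop asks only `itchy` and then stops, a miniature version of asking for an input only when it is expected to sharpen the prediction. Under the same assumptions, the candidates could just as well be image types rather than metadata questions.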

Comments:	15 pages, 7 figures
Subjects:	Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2401.12032 [cs.HC]
	(or arXiv:2401.12032v1 [cs.HC] for this version)
	https://doi.org/10.48550/arXiv.2401.12032