
Toward Reasonable Parrots: Why Large Language Models Should Argue with Us by Design

Authors: Elena Musi, Nadin Kokciyan, Khalid Al-Khatib, Davide Ceolin, Emmanuelle Dietz, Klara Gutekunst, Annette Hautli-Janisz, Cristian Manuel Santibañez Yañez, Jodi Schneider, Jonas Scholz, Cor Steging, Jacky Visser, Henning Wachsmuth

Abstract: In this position paper, we advocate for the development of conversational technology that is inherently designed to support and facilitate argumentative processes. We argue that, at present, large language models (LLMs) are inadequate for this purpose, and we propose an ideal technology design aimed at enhancing argumentative skills. This involves re-framing LLMs as tools to exercise our critical thinking skills rather than replacing them. We introduce the concept of "reasonable parrots" that embody the fundamental principles of relevance, responsibility, and freedom, and that interact through argumentative dialogical moves. These principles and moves arise out of millennia of work in argumentation theory and should serve as the starting point for LLM-based technology that incorporates basic principles of argumentation.

Submitted 14 June, 2025; v1 submitted 8 May, 2025; originally announced May 2025.

arXiv:2411.10406 [pdf, other]

How to Build a Quantum Supercomputer: Scaling from Hundreds to Millions of Qubits

Authors: Masoud Mohseni, Artur Scherer, K. Grace Johnson, Oded Wertheim, Matthew Otten, Navid Anjum Aadit, Yuri Alexeev, Kirk M. Bresniker, Kerem Y. Camsari, Barbara Chapman, Soumitra Chatterjee, Gebremedhin A. Dagnew, Aniello Esposito, Farah Fahim, Marco Fiorentino, Archit Gajjar, Abdullah Khalid, Xiangzhou Kong, Bohdan Kulchytskyy, Elica Kyoseva, Ruoyu Li, P. Aaron Lott, Igor L. Markov, Robert F. McDermott, Giacomo Pedretti, et al. (16 additional authors not shown)

Abstract: In the span of four decades, quantum computation has evolved from an intellectual curiosity to a potentially realizable technology. Today, small-scale demonstrations have become possible for quantum algorithmic primitives on hundreds of physical qubits and proof-of-principle error-correction on a single logical qubit. Nevertheless, despite significant progress and excitement, the path toward a full-stack scalable technology is largely unknown. There are significant outstanding quantum hardware, fabrication, software architecture, and algorithmic challenges that are either unresolved or overlooked. These issues could seriously undermine the arrival of utility-scale quantum computers for the foreseeable future. Here, we provide a comprehensive review of these scaling challenges. We show how the road to scaling could be paved by adopting existing semiconductor technology to build much higher-quality qubits, employing system engineering approaches, and performing distributed quantum computation within heterogeneous high-performance computing infrastructures. These opportunities for research and development could unlock certain promising applications, in particular, efficient quantum simulation/learning of quantum data generated by natural or engineered quantum systems.
To estimate the true cost of such promises, we provide a detailed resource and sensitivity analysis for classically hard quantum chemistry calculations on surface-code error-corrected quantum computers given current, target, and desired hardware specifications based on superconducting qubits, accounting for a realistic distribution of errors. Furthermore, we argue that, to tackle industry-scale classical optimization and machine learning problems in a cost-effective manner, heterogeneous quantum-probabilistic computing with custom-designed accelerators should be considered as a complementary path toward scalability.
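To illustrate why surface-code overhead dominates resource estimates like the one described above, here is a minimal sketch based on the widely quoted heuristic p_L ≈ A·(p/p_th)^((d+1)/2), with roughly 2d² − 1 physical qubits per logical qubit in a rotated surface-code patch. The prefactor A = 0.1 and threshold p_th = 1e-2 are illustrative assumptions, not values from the paper, and the function names are hypothetical.

```python
def logical_error_rate(p: float, d: int, p_th: float = 1e-2, prefactor: float = 0.1) -> float:
    """Heuristic surface-code logical error rate per round (assumed scaling model)."""
    return prefactor * (p / p_th) ** ((d + 1) / 2)


def min_code_distance(p: float, target: float, p_th: float = 1e-2,
                      prefactor: float = 0.1, max_distance: int = 101) -> int:
    """Smallest odd code distance whose heuristic logical error rate meets the target."""
    if p >= p_th:
        raise ValueError("physical error rate must be below the threshold")
    for d in range(3, max_distance + 1, 2):
        if logical_error_rate(p, d, p_th, prefactor) <= target:
            return d
    raise ValueError("target not reachable within max_distance")


def physical_qubits_per_logical(d: int) -> int:
    """Data qubits (d*d) plus measurement qubits (d*d - 1) in a rotated surface code."""
    return 2 * d * d - 1


d = min_code_distance(p=1e-3, target=5e-13)
print(d, physical_qubits_per_logical(d))  # 23 1057
```

Because the exponent grows with d, improving the physical error rate p shrinks the required distance, and the qubit count falls quadratically with it. This is the intuition behind the paper's emphasis on higher-quality qubits.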

Submitted 31 January, 2025; v1 submitted 15 November, 2024; originally announced November 2024.

Comments: 76 pages, 46 figures. General revision, added figures, added references, added appendices

arXiv:2409.19139 [pdf]

doi 10.1016/j.chbah.2025.100171

Gaze-informed Signatures of Trust and Collaboration in Human-Autonomy Teams

Authors: Anthony J. Ries, Stéphane Aroca-Ouellette, Alessandro Roncone, Ewart J. de Visser

Abstract: In the evolving landscape of human-autonomy teaming (HAT), fostering effective collaboration and trust between human and autonomous agents is increasingly important. To explore this, we used the game Overcooked AI to create dynamic teaming scenarios featuring varying agent behaviors (clumsy, rigid, adaptive) and environmental complexities (low, medium, high). Our objectives were to assess the performance of adaptive AI agents designed with hierarchical reinforcement learning for better teamwork and measure eye tracking signals related to changes in trust and collaboration. The results indicate that the adaptive agent was more effective in managing teaming and creating an equitable task distribution across environments compared to the other agents. Working with the adaptive agent resulted in better coordination, reduced collisions, more balanced task contributions, and higher trust ratings. Reduced gaze allocation, across all agents, was associated with higher trust levels, while blink count, scan path length, agent revisits and trust were predictive of the humans' contribution to the team. Notably, fixation revisits on the agent increased with environmental complexity and decreased with agent versatility, offering a unique metric for measuring teammate performance monitoring. These findings underscore the importance of designing autonomous teammates that not only excel in task performance but also enhance teamwork by being more predictable and reducing the cognitive load on human team members.
Additionally, this study highlights the potential of eye-tracking as an unobtrusive measure for evaluating and improving human-autonomy teams, suggesting eye gaze could be used by agents to dynamically adapt their behaviors.
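Two of the gaze measures named above, scan path length and agent revisits, can be computed from fixation data in a few lines. The sketch below assumes common eye-tracking conventions that the abstract does not spell out: scan path length as the summed Euclidean distance between consecutive fixations, and revisits as re-entries into an area of interest (AOI) after the first visit. The AOI labels and function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def scan_path_length(fixations: Sequence[Tuple[float, float]]) -> float:
    """Sum of Euclidean distances between consecutive fixation positions."""
    return sum(math.dist(a, b) for a, b in zip(fixations, fixations[1:]))


def aoi_revisits(aoi_labels: Sequence[str], target: str) -> int:
    """Number of times gaze re-enters the target AOI after its first visit."""
    entries = 0
    previous = None
    for label in aoi_labels:
        # A new entry starts whenever gaze lands on the target from somewhere else.
        if label == target and previous != target:
            entries += 1
        previous = label
    return max(entries - 1, 0)


fixations = [(0.0, 0.0), (3.0, 4.0), (3.0, 0.0)]
print(scan_path_length(fixations))  # 9.0
labels = ["agent", "pot", "agent", "agent", "onion", "agent"]
print(aoi_revisits(labels, "agent"))  # 2
```

Consecutive fixations inside the same AOI are deliberately not counted as revisits, so the measure reflects how often attention returns to the agent rather than how long it stays there.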

Submitted 17 June, 2025; v1 submitted 27 September, 2024; originally announced September 2024.

ACM Class: J.4

arXiv:2409.01235 [pdf, other]

MRI-based and metabolomics-based age scores act synergetically for mortality prediction shown by multi-cohort federated learning

Authors: Pedro Mateus, Swier Garst, Jing Yu, Davy Cats, Alexander G. J. Harms, Mahlet Birhanu, Marian Beekman, P. Eline Slagboom, Marcel Reinders, Jeroen van der Grond, Andre Dekker, Jacobus F. A. Jansen, Magdalena Beran, Miranda T. Schram, Pieter Jelle Visser, Justine Moonen, Mohsen Ghanbari, Gennady Roshchupkin, Dina Vojinovic, Inigo Bermejo, Hailiang Mei, Esther E. Bron

Abstract: Biological age scores are an emerging tool to characterize aging by estimating chronological age based on physiological biomarkers. Various scores have shown associations with aging-related outcomes. This study assessed the relation between an age score based on brain MRI images (BrainAge) and an age score based on metabolomic biomarkers (MetaboAge). We trained a federated deep learning model to estimate BrainAge in three cohorts. The federated BrainAge model yielded significantly lower error for age prediction across the cohorts than locally trained models. Harmonizing the age interval between cohorts further improved BrainAge accuracy. Subsequently, we compared BrainAge with MetaboAge using federated association and survival analyses. The results showed a small association between BrainAge and MetaboAge as well as a higher predictive value for the time to mortality of both scores combined than for the individual scores. Hence, our study suggests that both aging scores capture different aspects of the aging process.
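The abstract does not state which aggregation rule the federated BrainAge model uses. Federated averaging (FedAvg), where a coordinator averages each cohort's locally trained parameters weighted by cohort size, is a common choice and is assumed here purely for illustration. Parameters are flattened into plain lists, and the function name is hypothetical.

```python
from typing import List, Sequence


def federated_average(client_weights: Sequence[Sequence[float]],
                      client_sizes: Sequence[int]) -> List[float]:
    """Average flattened parameter vectors, weighting each cohort by its sample count."""
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one weight vector per client")
    total = sum(client_sizes)
    dim = len(client_weights[0])
    averaged = [0.0] * dim
    for weights, size in zip(client_weights, client_sizes):
        if len(weights) != dim:
            raise ValueError("all clients must share the same parameter shape")
        for i, w in enumerate(weights):
            averaged[i] += w * size / total
    return averaged


# Two hypothetical cohorts: only parameters are shared, never the MRI images.
print(federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3]))  # [2.5, 3.5]
```

In a full training loop, each cohort would run a few local epochs starting from the averaged parameters, send back its updated vector, and the coordinator would repeat the averaging step each round.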

Submitted 2 September, 2024; originally announced September 2024.

ACM Class: I.2.1

arXiv:2408.12491 [pdf]

doi 10.1016/j.ebiom.2025.105642

AI in radiological imaging of soft-tissue and bone tumours: a systematic review evaluating against CLAIM and FUTURE-AI guidelines

Authors: Douwe J. Spaanderman, Matthew Marzetti, Xinyi Wan, Andrew F. Scarsbrook, Philip Robinson, Edwin H. G. Oei, Jacob J. Visser, Robert Hemke, Kirsten van Langevelde, David F. Hanff, Geert J. L. H. van Leenders, Cornelis Verhoef, Dirk J. Grünhagen, Wiro J. Niessen, Stefan Klein, Martijn P. A. Starmans

Abstract: Soft-tissue and bone tumours (STBT) are rare, diagnostically challenging lesions with variable clinical behaviours and treatment approaches. This systematic review provides an overview of Artificial Intelligence (AI) methods using radiological imaging for diagnosis and prognosis of these tumours, highlighting challenges in clinical translation, and evaluating study alignment with the Checklist for AI in Medical Imaging (CLAIM) and the FUTURE-AI international consensus guidelines for trustworthy and deployable AI to promote the clinical translation of AI methods. The review covered literature from several bibliographic databases, including papers published before 17/07/2024. Original research in peer-reviewed journals focused on radiology-based AI for diagnosing or prognosing primary STBT was included. Exclusion criteria were animal, cadaveric, or laboratory studies, and non-English papers. Abstracts were screened by two of three independent reviewers for eligibility. Eligible papers were assessed against guidelines by one of three independent reviewers. The search identified 15,015 abstracts, from which 325 articles were included for evaluation. Most studies performed moderately on CLAIM, averaging a score of 28.9 ± 7.5 out of 53, but poorly on FUTURE-AI, averaging 5.1 ± 2.1 out of 30. Imaging-AI tools for STBT remain at the proof-of-concept stage, indicating significant room for improvement. Future efforts by AI developers should focus on design (e.g., defining the unmet clinical need, the intended clinical setting, and how AI would be integrated into the clinical workflow), development (e.g., building on previous work, explainability), evaluation (e.g., evaluating and addressing biases, evaluating AI against best practices), and data reproducibility and availability (making documented code and data publicly available). Following these recommendations could improve clinical translation of AI methods.

Submitted 31 March, 2025; v1 submitted 22 August, 2024; originally announced August 2024.

Comments: 25 pages, 6 figures, 8 supplementary figures

Journal ref: eBioMedicine (2025), Volume 114, 105642

arXiv:2402.07746 [pdf]

doi 10.1007/s00330-024-11167-8

Minimally Interactive Segmentation of Soft-Tissue Tumors on CT and MRI using Deep Learning

Authors: Douwe J. Spaanderman, Martijn P. A. Starmans, Gonnie C. M. van Erp, David F. Hanff, Judith H. Sluijter, Anne-Rose W. Schut, Geert J. L. H. van Leenders, Cornelis Verhoef, Dirk J. Grunhagen, Wiro J. Niessen, Jacob J. Visser, Stefan Klein

Abstract: Segmentations are crucial in medical imaging to obtain morphological, volumetric, and radiomics biomarkers. Manual segmentation is accurate but not feasible in the radiologist's clinical workflow, while automatic segmentation generally obtains sub-par performance. We therefore developed a minimally interactive deep learning-based segmentation method for soft-tissue tumors (STTs) on CT and MRI. The method requires the user to click six points near the tumor's extreme boundaries. These six points are transformed into a distance map and serve, with the image, as input for a Convolutional Neural Network. For training and validation, a multicenter dataset containing 514 patients and nine STT types in seven anatomical locations was used, resulting in a Dice Similarity Coefficient (DSC) of 0.85 ± 0.11 (mean ± standard deviation (SD)) for CT and 0.84 ± 0.12 for T1-weighted MRI, when compared to manual segmentations made by expert radiologists. Next, the method was externally validated on a dataset including five unseen STT phenotypes in extremities, achieving 0.81 ± 0.08 for CT, 0.84 ± 0.09 for T1-weighted MRI, and 0.88 ± 0.08 for previously unseen T2-weighted fat-saturated (FS) MRI. In conclusion, our minimally interactive segmentation method effectively segments different types of STTs on CT and MRI, with robust generalization to previously unseen phenotypes and imaging modalities.
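The two ingredients described above, turning six clicks into a distance map and scoring masks with the Dice Similarity Coefficient, can be sketched as follows. The abstract does not specify the exact distance transform (it could be normalized, Gaussian, or geodesic), so this sketch assumes plain Euclidean distance to the nearest click in voxel units. Function names and click coordinates are hypothetical.

```python
import math
from typing import List, Sequence, Set, Tuple

Voxel = Tuple[int, int, int]


def click_distance_map(shape: Tuple[int, int, int],
                       clicks: Sequence[Voxel]) -> List[List[List[float]]]:
    """Distance from every voxel to its nearest click, indexed [z][y][x].

    Assumed encoding: plain Euclidean distance in voxel units.
    """
    if not clicks:
        raise ValueError("at least one click is required")
    depth, height, width = shape
    return [[[min(math.dist((z, y, x), c) for c in clicks)
              for x in range(width)]
             for y in range(height)]
            for z in range(depth)]


def dice(pred: Set[Voxel], ref: Set[Voxel]) -> float:
    """Dice similarity coefficient 2|A ∩ B| / (|A| + |B|) on voxel sets."""
    if not pred and not ref:
        return 1.0
    return 2 * len(pred & ref) / (len(pred) + len(ref))


# Six hypothetical extreme-point clicks (min/max along z, y, x) in a 5x5x5 volume.
clicks = [(0, 2, 2), (4, 2, 2), (2, 0, 2), (2, 4, 2), (2, 2, 0), (2, 2, 4)]
dist_map = click_distance_map((5, 5, 5), clicks)
print(dist_map[2][2][2])  # 2.0: the centre is two voxels from every click
```

In the described method, such a map would be stacked with the image as an extra input channel to the CNN. The brute-force loop here is only for clarity; real volumes would use a vectorized distance transform.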

Submitted 12 February, 2024; originally announced February 2024.

Journal ref: Eur Radiol (2024)

arXiv:2211.09390 [pdf, other]

doi 10.1109/ICE/ITMC-IAMOT55089.2022.10033313

A Study of Adoption and Effects of DevOps Practices

Authors: Tyron Offerman, Robert Blinde, Christoph Johann Stettina, Joost Visser

Abstract: Many organizations adopt DevOps practices and tools in order to break down silos within the organization, improve software quality and delivery, and increase customer satisfaction. However, the impact of the individual practices on the performance of the organization is not well known. In this paper, we collect evidence on the effects of DevOps practices and tools on organizational performance. In an extensive literature search we identified 14 DevOps practices, consisting of 47 subpractices. Based on these practices, we conducted a global survey to study their effects in practice and to measure DevOps maturity. Across 123 respondents, working in 11 different industries, we found that 13 of the 14 DevOps practices are adopted, where adoption is determined by 50% of the participants indicating that a practice is applied 'always', 'most of the time', or 'about half of the time'. There is a positive correlation between the adoption of all practices and independently measured maturity. Practices concerning sandboxes for minimum deployment, test-driven development, and trunk-based development show the lowest correlations in our data. Effects on software delivery and organizational performance are mainly perceived as positive. Yet some respondents also consider DevOps to have a negative impact, mentioning, for example, that the predictability of product delivery has decreased and that work is less fun.
In conclusion, our detailed overview of DevOps practices allows more targeted application of DevOps practices to obtain their positive effects while minimizing any negative effects.
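The adoption rule above can be stated precisely as a small function. This sketch assumes "50% of the participants" means at least half (a `>=` comparison), which the abstract does not make explicit. The non-qualifying answer options ("never", "rarely") are hypothetical survey choices, and the function name is illustrative.

```python
from typing import Iterable

# Answer options that count towards adoption, as listed in the abstract.
ADOPTED_ANSWERS = {"always", "most of the time", "about half of the time"}


def is_adopted(responses: Iterable[str], threshold: float = 0.5) -> bool:
    """True if the share of qualifying answers reaches the threshold (assumed >=)."""
    answers = list(responses)
    if not answers:
        return False
    qualifying = sum(1 for answer in answers if answer.lower() in ADOPTED_ANSWERS)
    return qualifying / len(answers) >= threshold


survey = ["always", "never", "about half of the time", "rarely"]
print(is_adopted(survey))  # True: 2 of 4 answers qualify
```

Because 'about half of the time' counts as a qualifying answer, this criterion is fairly lenient, which is worth keeping in mind when reading the finding that 13 of 14 practices are adopted.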

Submitted 17 November, 2022; originally announced November 2022.

Comments: To be published in the conference proceedings of the 28th IEEE ICE/ITMC & 31st IAMOT Conference

arXiv:2110.03071 [pdf, other]

Two Many Cooks: Understanding Dynamic Human-Agent Team Communication and Perception Using Overcooked 2

Authors: Andres Rosero, Faustina Dinh, Ewart J. de Visser, Tyler Shaw, Elizabeth Phillips

Abstract: This paper describes a research study that aims to investigate changes in effective communication during human-AI collaboration with special attention to the perception of competence among team members and varying levels of task load placed on the team. We will also investigate differences between human-human teamwork and human-agent teamwork. Our project will measure differences in the communication quality, team perception and performance of a human actor playing a commercial off-the-shelf (COTS) game with either a human teammate or a simulated AI teammate under varying task load. We argue that the increased cognitive workload associated with increased task load will be negatively associated with team performance and have a negative impact on communication quality. In addition, we argue that positive team perceptions will have a positive impact on the communication quality between a user and teammate in both the human and AI teammate conditions. This project will offer more refined insights on human-AI relationship dynamics in collaborative tasks by considering communication quality, team perception, and performance under increasing cognitive workload.

Submitted 6 October, 2021; originally announced October 2021.

Comments: Presented at AI-HRI symposium as part of AAAI-FSS 2021 (arXiv:2109.10836)

Report number: AIHRI/2021/28

arXiv:2108.08618 [pdf, other]

An automated machine learning framework to optimize radiomics model construction validated on twelve clinical applications

Authors: Martijn P. A. Starmans, Sebastian R. van der Voort, Thomas Phil, Milea J. M. Timbergen, Melissa Vos, Guillaume A. Padmos, Wouter Kessels, David Hanff, Dirk J. Grunhagen, Cornelis Verhoef, Stefan Sleijfer, Martin J. van den Bent, Marion Smits, Roy S. Dwarkasing, Christopher J. Els, Federico Fiduzi, Geert J. L. H. van Leenders, Anela Blazevic, Johannes Hofland, Tessa Brabander, Renza A. H. van Gils, Gaston J. H. Franssen, Richard A. Feelders, Wouter W. de Herder, Florian E. Buisman, et al. (21 additional authors not shown)

Abstract: Predicting clinical outcomes from medical images using quantitative features ("radiomics") requires many method design choices. Currently, in new clinical applications, finding the optimal radiomics method out of the wide range of methods relies on a manual, heuristic trial-and-error process. We introduce a novel automated framework that optimizes radiomics workflow construction per application by standardizing the radiomics workflow in modular components, including a large collection of algorithms for each component, and formulating a combined algorithm selection and hyperparameter optimization problem. To solve it, we employ automated machine learning through two strategies (random search and Bayesian optimization) and three ensembling approaches. Results show that a medium-sized random search and straightforward ensembling perform similarly to more advanced methods while being more efficient. Validated across twelve clinical applications, our approach outperforms both a radiomics baseline and human experts. In conclusion, our framework improves and streamlines radiomics research by fully automatically optimizing radiomics workflow construction. To facilitate reproducibility, we publicly release six datasets, software of the method, and code to reproduce this study.
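To make the idea of a combined algorithm selection and hyperparameter optimization problem concrete, here is a minimal random-search sketch. Each sampled workflow jointly picks one algorithm per component and that algorithm's hyperparameters, and the best-scoring workflows are kept as an ensemble. The search space, component names, and the toy scoring function are hypothetical stand-ins, not the paper's framework or its API.

```python
import random
from typing import Callable, Dict, List, Tuple

Workflow = Dict[str, Tuple[str, dict]]
# Each component maps algorithm names to a sampler for that algorithm's hyperparameters.
SearchSpace = Dict[str, Dict[str, Callable[[random.Random], dict]]]


def sample_workflow(space: SearchSpace, rng: random.Random) -> Workflow:
    """Jointly pick one algorithm per component and sample its hyperparameters."""
    workflow = {}
    for component, algorithms in space.items():
        name = rng.choice(sorted(algorithms))
        workflow[component] = (name, algorithms[name](rng))
    return workflow


def random_search(space: SearchSpace, score: Callable[[Workflow], float],
                  n_iter: int, top_k: int, seed: int = 0) -> List[Tuple[float, Workflow]]:
    """Random search over the joint space; returns the top_k workflows for an ensemble."""
    rng = random.Random(seed)
    scored = [(score(wf), wf) for wf in (sample_workflow(space, rng) for _ in range(n_iter))]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_k]


# Hypothetical two-component space; real radiomics workflows have many more components.
space: SearchSpace = {
    "scaling": {"z_score": lambda r: {}, "min_max": lambda r: {}},
    "classifier": {
        "svm": lambda r: {"C": 10 ** r.uniform(-3, 3)},
        "random_forest": lambda r: {"n_trees": r.randint(10, 500)},
    },
}


def toy_score(wf: Workflow) -> float:
    # Stand-in for cross-validated performance; a real score would train and validate.
    return 0.8 if wf["classifier"][0] == "random_forest" else 0.6


best = random_search(space, toy_score, n_iter=50, top_k=5)
for s, wf in best:
    print(s, wf["classifier"][0])
```

A straightforward ensemble would then average the predictions of the `top_k` trained workflows. This matches the abstract's observation that simple search plus simple ensembling can compete with more elaborate optimizers.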

Submitted 10 March, 2025; v1 submitted 19 August, 2021; originally announced August 2021.

Comments: 22 pages, 3 figures, 2 tables, 1 algorithm, 3 supplementary figures, 4 supplementary tables, 1 supplementary algorithm

arXiv:2107.14039 [pdf, other]

A Checklist for Explainable AI in the Insurance Domain

Authors: Olivier Koster, Ruud Kosman, Joost Visser

Abstract: Artificial intelligence (AI) is a powerful tool to accomplish a great many tasks. This exciting branch of technology is increasingly being adopted across various sectors, including the insurance domain. With that power come several complications, one of which is a lack of transparency and explainability of algorithms for experts and non-experts alike. This brings into question both the usefulness and the accuracy of the algorithm, coupled with an added difficulty in assessing potential biases within the data or the model. In this paper, we investigate the current usage of AI algorithms in the Dutch insurance industry and the adoption of explainable artificial intelligence (XAI) techniques. Armed with this knowledge, we design a checklist for insurance companies that should help assure quality standards regarding XAI and a solid foundation for cooperation between organisations. This checklist extends an existing checklist of SIVI, the standardisation institute for digital cooperation and innovation in Dutch insurance.

Submitted 18 July, 2021; originally announced July 2021.

Comments: Preprint of short paper for QUATIC 2021 conference

ACM Class: D.2

arXiv:2105.12427 [pdf, other]

Deep Repulsive Prototypes for Adversarial Robustness

Authors: Alex Serban, Erik Poll, Joost Visser

Abstract: While many defences against adversarial examples have been proposed, finding robust machine learning models is still an open problem. The most compelling defence to date is adversarial training, which consists of complementing the training data set with adversarial examples. Yet adversarial training severely impacts training time and depends on finding representative adversarial samples. In this paper we propose to train models on output spaces with large class separation in order to gain robustness without adversarial training. We introduce a method to partition the output space into class prototypes with large separation and train models to preserve it. Experimental results show that models trained with these prototypes, which we call deep repulsive prototypes, gain robustness competitive with adversarial training, while also preserving more accuracy on natural samples. Moreover, the models are more resilient to large perturbation sizes. For example, without adversarial training we obtained over 50% robustness for CIFAR-10 with 92% accuracy on natural samples, and over 20% robustness for CIFAR-100 with 71% accuracy on natural samples. For both data sets, the models preserved robustness against large perturbations better than adversarially trained models.
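The abstract does not describe how the well-separated class prototypes are constructed. One generic way to obtain them, assumed here for illustration, is to place unit vectors on a hypersphere and let them push each other apart with an inverse-square repulsion (in the spirit of the Thomson problem), then classify an embedding by its nearest prototype. The function names, step size, and iteration count are hypothetical choices. The training step that maps inputs close to their class prototype is omitted.

```python
import math
import random
from typing import List, Sequence


def normalize(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def repulsive_prototypes(num_classes: int, dim: int, steps: int = 500,
                         lr: float = 0.1, seed: int = 0) -> List[List[float]]:
    """Spread unit-norm class prototypes apart with an inverse-square repulsion."""
    rng = random.Random(seed)
    protos = [normalize([rng.gauss(0.0, 1.0) for _ in range(dim)])
              for _ in range(num_classes)]
    for _ in range(steps):
        updated = []
        for i, p in enumerate(protos):
            force = [0.0] * dim
            for j, q in enumerate(protos):
                if i == j:
                    continue
                diff = [a - b for a, b in zip(p, q)]
                dist = max(math.sqrt(sum(d * d for d in diff)), 1e-8)
                for k in range(dim):
                    force[k] += diff[k] / dist ** 3
            # Step along the repulsive force, then project back onto the unit sphere.
            updated.append(normalize([a + lr * f for a, f in zip(p, force)]))
        protos = updated
    return protos


def nearest_prototype(embedding: Sequence[float], protos: Sequence[Sequence[float]]) -> int:
    """Predict the class whose prototype has the highest cosine similarity."""
    e = normalize(embedding)
    scores = [sum(a * b for a, b in zip(e, p)) for p in protos]
    return max(range(len(protos)), key=scores.__getitem__)


protos = repulsive_prototypes(num_classes=2, dim=2)
print(sum(a * b for a, b in zip(protos[0], protos[1])))  # close to -1: antipodal
```

The intuition behind the approach is that a larger margin between prototypes forces an adversary to move an embedding further before it crosses into another class's region.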

Submitted 26 May, 2021; originally announced May 2021.

arXiv:2105.12422 [pdf, other]

Adapting Software Architectures to Machine Learning Challenges

Authors: Alex Serban, Joost Visser

Abstract: Unique developmental and operational characteristics of ML components, as well as their inherent uncertainty, demand that robust engineering principles be used to ensure their quality. We aim to determine how software systems can be (re-) architected to enable robust integration of ML components. Towards this goal, we conducted a mixed-methods empirical study consisting of (i) a systematic literature review to identify the challenges and their solutions in software architecture for ML, (ii) semi-structured interviews with practitioners to qualitatively complement the initial findings, and (iii) a survey to quantitatively validate the challenges and their solutions. We compiled and validated twenty challenges and solutions for (re-) architecting systems with ML components. Our results indicate, for example, that traditional software architecture challenges (e.g., component coupling) also play an important role when using ML components, along with new ML-specific challenges (e.g., the need for continuous retraining). Moreover, the results indicate that ML-heightened decision drivers, such as privacy, play a marginal role compared to traditional decision drivers, such as scalability. Using the survey, we were able to establish a link between architectural solutions and software quality attributes, which enabled us to provide twenty architectural tactics used to satisfy individual quality requirements of systems with ML components.
Altogether, the results of the study can be interpreted as an empirical framework that supports the process of (re-) architecting software systems with ML components.

Submitted 8 January, 2022; v1 submitted 26 May, 2021; originally announced May 2021.

Comments: Published at the 29th IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER 2022)

arXiv:2103.00964 [pdf, other]

Practices for Engineering Trustworthy Machine Learning Applications

Authors: Alex Serban, Koen van der Blom, Holger Hoos, Joost Visser

Abstract: Following the recent surge in adoption of machine learning (ML), the negative impact that improper use of ML can have on users and society is now also widely recognised. To address this issue, policy makers and other stakeholders, such as the European Commission or NIST, have proposed high-level guidelines aiming to promote trustworthy ML (i.e., lawful, ethical and robust). However, these guidelines do not specify actions to be taken by those involved in building ML systems. In this paper, we argue that guidelines related to the development of trustworthy ML can be translated to operational practices, and should become part of the ML development life cycle. Towards this goal, we ran a multi-vocal literature review, and mined operational practices from white and grey literature. Moreover, we launched a global survey to measure practice adoption and the effects of these practices. In total, we identified 14 new practices, and used them to complement an existing catalogue of ML engineering practices. Initial analysis of the survey results reveals that so far, practice adoption for trustworthy ML is relatively low. In particular, practices related to assuring security of ML components have very low adoption. Other practices enjoy slightly larger adoption, such as providing explanations to users. Our extended practice catalogue can be used by ML development teams to bridge the gap between high-level guidelines and actual development of trustworthy ML systems; it is open for review and contribution.

Submitted 1 March, 2021; originally announced March 2021.

Comments: Published at WAIN'21 - 1st Workshop on AI Engineering - Software Engineering for AI

arXiv:2010.06824 [pdf]

doi 10.1007/s10278-022-00590-2

Differential diagnosis and molecular stratification of gastrointestinal stromal tumors on CT images using a radiomics approach

Authors: Martijn P. A. Starmans, Milea J. M. Timbergen, Melissa Vos, Michel Renckens, Dirk J. Grünhagen, Geert J. L. H. van Leenders, Roy S. Dwarkasing, François E. J. A. Willemssen, Wiro J. Niessen, Cornelis Verhoef, Stefan Sleijfer, Jacob J. Visser, Stefan Klein

Abstract: Distinguishing gastrointestinal stromal tumors (GISTs) from other intra-abdominal tumors and performing molecular analysis of GISTs are necessary for treatment planning, but challenging due to their rarity. The aim of this study was to evaluate radiomics for distinguishing GISTs from other intra-abdominal tumors and, in GISTs, for predicting the c-KIT, PDGFRA, and BRAF mutational status and the mitotic index (MI). All 247 included patients (125 GISTs, 122 non-GISTs) underwent a contrast-enhanced venous phase CT. The GIST vs. non-GIST radiomics model, including imaging, age, sex, and location, had a mean area under the curve (AUC) of 0.82. Three radiologists had AUCs of 0.69, 0.76, and 0.84, respectively. The radiomics model had an AUC of 0.52 for c-KIT, 0.56 for c-KIT exon 11, and 0.52 for the MI. Hence, our radiomics model was able to distinguish GISTs from non-GISTs with a performance similar to that of three radiologists, but was not able to predict the c-KIT mutation or MI.
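The AUC values above have a direct probabilistic reading, which explains why 0.52 means "not able to predict". AUC is the probability that a randomly chosen positive case receives a higher model score than a randomly chosen negative case, so 0.5 is chance level. The minimal sketch below computes it this way (the Mann-Whitney formulation, with ties counted as half). The scores are hypothetical and not taken from the study.

```python
from typing import Sequence


def roc_auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win (Mann-Whitney U formulation). O(n*m), fine for small cohorts.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical model scores for two GISTs (positives) and two non-GISTs (negatives).
print(roc_auc([0.9, 0.4], [0.3, 0.6]))  # 0.75: 3 of 4 positive/negative pairs ranked correctly
```
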

Submitted 15 October, 2020; v1 submitted 14 October, 2020; originally announced October 2020.

Comments: Martijn P.A. Starmans and Milea J.M. Timbergen contributed equally

Journal ref: J Digit Imaging (2022)
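The AUC values in the abstract above can be read as the probability that a randomly chosen positive case receives a higher model score than a randomly chosen negative case, so 0.5 is chance level; this is why an AUC of 0.52 for c-KIT means no useful prediction. A minimal sketch computes AUC from this pairwise (Mann-Whitney) definition; the function name `auc` and the score lists are hypothetical illustrations, not the study's code or data.

```python
def auc(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct ranking, matching the Mann-Whitney U statistic.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical model scores (illustration only, not taken from the study).
gist_scores = [0.9, 0.8, 0.4]
non_gist_scores = [0.7, 0.3, 0.2]
print(auc(gist_scores, non_gist_scores))
```

The quadratic pair loop keeps the definition visible; for large cohorts a rank-based computation gives the same value more efficiently.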

arXiv:2010.02654 [pdf, other]

Extracting Implicitly Asserted Propositions in Argumentation

Authors: Yohan Jo, Jacky Visser, Chris Reed, Eduard Hovy

Abstract: Argumentation accommodates various rhetorical devices, such as questions, reported speech, and imperatives. These rhetorical tools usually assert argumentatively relevant propositions rather implicitly, so understanding their true meaning is key to understanding certain arguments properly. However, most argument mining systems and computational linguistics research have paid little attention to implicitly asserted propositions in argumentation. In this paper, we examine a wide range of computational methods for extracting propositions that are implicitly asserted in questions, reported speech, and imperatives in argumentation. By evaluating the models on a corpus of 2016 U.S. presidential debates and online commentary, we demonstrate the effectiveness and limitations of the computational models. Our study may inform future research on argument mining and the semantics of these rhetorical devices in argumentation.

Submitted 6 October, 2020; originally announced October 2020.

Comments: EMNLP 2020

arXiv:2008.05247 [pdf, other]

Learning to Learn from Mistakes: Robust Optimization for Adversarial Noise

Authors: Alex Serban, Erik Poll, Joost Visser

Abstract: Sensitivity to adversarial noise hinders deployment of machine learning algorithms in security-critical applications. Although many adversarial defenses have been proposed, robustness to adversarial noise remains an open problem. The most compelling defense, adversarial training, requires a substantial increase in processing time and it has been shown to overfit on the training data. In this paper, we aim to overcome these limitations by training robust models in low data regimes and transferring adversarial knowledge between different models. We train a meta-optimizer which learns to robustly optimize a model using adversarial examples and is able to transfer the knowledge learned to new models, without the need to generate new adversarial examples. Experimental results show the meta-optimizer is consistent across different architectures and data sets, suggesting it is possible to automatically patch adversarial vulnerabilities.

Submitted 12 August, 2020; originally announced August 2020.

Comments: Published at ICANN 2020

arXiv:2008.04884 [pdf, other]

GraphRepo: Fast Exploration in Software Repository Mining

Authors: Alex Serban, Magiel Bruntink, Joost Visser

Abstract: Mining and storage of data from software repositories is typically done on a per-project basis, where each project uses a unique combination of data schema, extraction tools, and (intermediate) storage infrastructure. We introduce GraphRepo, a tool that enables a unified approach to extract data from Git repositories, store it, and share it across repository mining projects. GraphRepo uses Neo4j, an ACID-compliant graph database management system, and allows modular plug-in of components for repository extraction (drillers), analysis (miners), and export (mappers). The graph enables a natural way to query the data by removing the need for data normalisation. GraphRepo is built in Python and offers multiple ways to interface with the rich Python ecosystem and with big data solutions. The schema of the graph database is generic and extensible. Using GraphRepo for software repository mining offers several advantages versus creating project-specific infrastructure: (i) high performance for short-iteration exploration and scalability to large data sets, (ii) easy distribution of extracted data (e.g., for replication) or sharing of extracted data among projects, and (iii) extensibility and interoperability. A set of benchmarks on four open source projects demonstrates that GraphRepo allows very fast querying of repository data, once extracted and indexed. More information can be found in the project's documentation (available at https://tinyurl.com/grepodoc) and in the project's repository (available at https://tinyurl.com/grrepo).
A video demonstration is also available online (https://tinyurl.com/grrepov).

Submitted 14 August, 2020; v1 submitted 11 August, 2020; originally announced August 2020.

arXiv:2008.04094 [pdf, ps, other]

Adversarial Examples on Object Recognition: A Comprehensive Survey

Authors: Alex Serban, Erik Poll, Joost Visser

Abstract: Deep neural networks are at the forefront of machine learning research. However, despite achieving impressive performance on complex tasks, they can be very sensitive: small perturbations of inputs can be sufficient to induce incorrect behavior. Such perturbations, called adversarial examples, are intentionally designed to test the network's sensitivity to distribution drifts. Given their surprisingly small size, a wide body of literature conjectures on their existence and how this phenomenon can be mitigated. In this article we discuss the impact of adversarial examples on security, safety, and robustness of neural networks. We start by introducing the hypotheses behind their existence, the methods used to construct or protect against them, and the capacity to transfer adversarial examples between different machine learning models. Altogether, the goal is to provide a comprehensive and self-contained survey of this growing field of research.

Submitted 3 September, 2020; v1 submitted 7 August, 2020; originally announced August 2020.

Comments: Published in ACM CSUR. arXiv admin note: text overlap with arXiv:1810.01185
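The survey above covers methods for constructing adversarial examples. One widely known construction, named here as a general illustration rather than as this survey's own method, is the fast gradient sign method (FGSM): it shifts every input feature by plus or minus a budget eps in the direction that increases the model's loss. The sketch below applies FGSM to a hypothetical logistic-regression model in pure Python; the weights, input, and eps are illustrative assumptions chosen so that the small perturbation flips the prediction.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, b, x):
    """Probability of class 1 under a logistic-regression model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def sign(g):
    """Return +1, 0, or -1 according to the sign of g."""
    return (g > 0) - (g < 0)


def fgsm(w, b, x, y, eps):
    """Fast gradient sign method for logistic regression with cross-entropy loss.

    For this model the loss gradient w.r.t. input feature i is (p - y) * w_i,
    so each feature moves by eps in the sign of that gradient.
    """
    p = predict(w, b, x)
    grad = [(p - y) * wi for wi in w]
    return [xi + eps * sign(gi) for xi, gi in zip(x, grad)]


# Hypothetical model and input (illustration only).
w, b = [2.0, -1.0, 0.5], 0.0
x, y = [0.3, 0.2, 0.1], 1
x_adv = fgsm(w, b, x, y, eps=0.2)
print(predict(w, b, x), predict(w, b, x_adv))
```

The clean input is scored above 0.5 (class 1), while the perturbed input, which differs by at most 0.2 per feature, is scored below 0.5.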

arXiv:2008.03046 [pdf, other]

Towards Using Probabilistic Models to Design Software Systems with Inherent Uncertainty

Authors: Alex Serban, Erik Poll, Joost Visser

Abstract: The adoption of machine learning (ML) components in software systems raises new engineering challenges. In particular, the inherent uncertainty regarding functional suitability and the operation environment makes architecture evaluation and trade-off analysis difficult. We propose a software architecture evaluation method called Modeling Uncertainty During Design (MUDD) that explicitly models the uncertainty associated with ML components and evaluates how it propagates through a system. The method supports reasoning over how architectural patterns can mitigate uncertainty and enables comparison of different architectures focused on the interplay between ML and classical software components. While our approach is domain-agnostic and suitable for any system where uncertainty plays a central role, we demonstrate it using a perception system for autonomous driving as an example.

Submitted 7 August, 2020; originally announced August 2020.

Comments: Published at the European Conference on Software Architecture (ECSA)

arXiv:2007.14130 [pdf, other]

doi 10.1145/3382494.3410681

Adoption and Effects of Software Engineering Best Practices in Machine Learning

Authors: Alex Serban, Koen van der Blom, Holger Hoos, Joost Visser

Abstract: The increasing reliance on applications with machine learning (ML) components calls for mature engineering techniques that ensure these are built in a robust and future-proof manner. We aim to empirically determine the state of the art in how teams develop, deploy and maintain software with ML components. We mined both academic and grey literature and identified 29 engineering best practices for ML applications. We conducted a survey among 313 practitioners to determine the degree of adoption for these practices and to validate their perceived effects. Using the survey responses, we quantified practice adoption, differentiated along demographic characteristics, such as geography or team size. We also tested correlations and investigated linear and non-linear relationships between practices and their perceived effect using various statistical models. Our findings indicate, for example, that larger teams tend to adopt more practices, and that traditional software engineering practices tend to have lower adoption than ML specific practices. Also, the statistical models can accurately predict perceived effects such as agility, software quality and traceability, from the degree of adoption for specific sets of practices. Combining practice adoption rates with practice importance, as revealed by statistical models, we identify practices that are important but have low adoption, as well as practices that are widely adopted but are less important for the effects we studied.
Overall, our survey and the analysis of responses received provide a quantitative basis for assessment and step-wise improvement of practice adoption by ML teams.

Submitted 29 July, 2020; v1 submitted 28 July, 2020; originally announced July 2020.

Comments: Accepted and published at the ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM) 2020

arXiv:1902.02693 [pdf, other]

doi 10.1109/ICIP.2019.8803767

StampNet: unsupervised multi-class object discovery

Authors: Joost Visser, Alessandro Corbetta, Vlado Menkovski, Federico Toschi

Abstract: Unsupervised object discovery in images involves uncovering recurring patterns that define objects and discriminate them from the background. This is more challenging than image clustering as the size and the location of the objects are not known: this adds additional degrees of freedom and increases the problem complexity. In this work, we propose StampNet, a novel autoencoding neural network that localizes shapes (objects) over a simple background in images and categorizes them simultaneously. StampNet consists of a discrete latent space that is used to categorize objects and to determine the location of the objects. The object categories are formed during the training, resulting in the discovery of a fixed set of objects. We present a set of experiments that demonstrate that StampNet is able to localize and cluster multiple overlapping shapes with varying complexity including the digits from the MNIST dataset. We also present an application of StampNet in the localization of pedestrians in overhead depth-maps.

Submitted 7 February, 2019; originally announced February 2019.

Journal ref: IEEE International Conference on Image Processing (ICIP 2019), pp. 2951-2955, 2019

arXiv:1810.01185 [pdf, ps, other]

Adversarial Examples - A Complete Characterisation of the Phenomenon

Authors: Alexandru Constantin Serban, Erik Poll, Joost Visser

Abstract: We provide a complete characterisation of the phenomenon of adversarial examples: inputs intentionally crafted to fool machine learning models. We aim to cover all the important concerns in this field of study: (1) the conjectures on the existence of adversarial examples, (2) the security, safety and robustness implications, (3) the methods used to generate and (4) protect against adversarial examples, and (5) the ability of adversarial examples to transfer between different machine learning models. We provide ample background information in an effort to make this document self-contained. Therefore, this document can be used as a survey, a tutorial, or a catalog of attacks and defences using adversarial examples.

Submitted 17 February, 2019; v1 submitted 2 October, 2018; originally announced October 2018.

arXiv:1701.06146 [pdf]

The Influence of Teamwork Quality on Software Team Performance

Authors: Emily Weimar, Ariadi Nugroho, Joost Visser, Aske Plaat, Martijn Goudbeek, Alexander P. Schouten

Abstract: Traditionally, software quality is thought to depend on sound software engineering and development methodologies such as structured programming and agile development. However, high quality software depends just as much on high quality collaboration within the team. Since the success rate of software development projects is low (Wateridge, 1995; The Standish Group, 2009), it is important to understand which characteristics of interactions within software development teams significantly influence performance. Hoegl and Gemuenden (2001) reported empirical evidence for the relation between teamwork quality and software quality, using a six-factor teamwork quality (TWQ) model. This article extends the work of Hoegl and Gemuenden (2001) with the aim of finding additional factors that may influence software team performance. We introduce three new TWQ factors: trust, value sharing, and coordination of expertise. The relationship between TWQ and team performance and the improvement of the model are tested using data from 252 team members and stakeholders. Results show that teamwork quality is significantly related to team performance, as rated by both team members and stakeholders: TWQ explains 81% of the variance of team performance as rated by team members and 61% as rated by stakeholders. This study shows that trust, shared values, and coordination of expertise are important factors for team leaders to consider in order to achieve high quality software team work.

Submitted 22 January, 2017; originally announced January 2017.

arXiv:1211.7100 [pdf]

Governance of Spreadsheets through Spreadsheet Change Reviews

Authors: Miguel A. Ferreira, Joost Visser

Abstract: We present a pragmatic method for managing the risks that arise from spreadsheet use in large organizations. We combine peer-review, tool-assisted evaluation and other pre-existing approaches into a single organization-wide approach that reduces spreadsheet risk without overly restricting spreadsheet use. The method was developed in the course of several spreadsheet evaluation assignments for a corporate customer. Our method addresses a number of issues pertinent to spreadsheet risks that were raised by the Sarbanes-Oxley Act.

Submitted 29 November, 2012; originally announced November 2012.

Comments: 13 Pages, 1 Figure, 1 Table; Proc. European Spreadsheet Risks Int. Grp. (EuSpRIG) 2012, ISBN: 978-0-9569258-6-2

arXiv:cs/0212048 [pdf, ps, other]

Strategic polymorphism requires just two combinators!

Authors: Ralf Laemmel, Joost Visser

Abstract: In previous work, we introduced the notion of functional strategies: first-class generic functions that can traverse terms of any type while mixing uniform and type-specific behaviour. Functional strategies transpose the notion of term rewriting strategies (with coverage of traversal) to the functional programming paradigm. Meanwhile, a number of Haskell-based models and combinator suites were proposed to support generic programming with functional strategies. In the present paper, we provide a compact and matured reconstruction of functional strategies. We capture strategic polymorphism by just two primitive combinators. This is done without commitment to a specific functional language. We analyse the design space for implementational models of functional strategies. For completeness, we also provide an operational reference model for implementing functional strategies (in Haskell). We demonstrate the generality of our approach by reconstructing representative fragments of the Strafunski library for functional strategies.

Submitted 19 December, 2002; originally announced December 2002.

Comments: A preliminary version of this paper was presented at IFL 2002, and included in the informal preproceedings of the workshop

ACM Class: D.1.1; D.3.3; I.1.3
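The abstract above describes functional strategies as generic functions that mix uniform traversal with type-specific behaviour. As a rough illustration of that general idea only, the Python sketch below builds a full traversal from one type-specific update primitive and one one-layer traversal primitive, plus ordinary function composition. All names (`adhoc`, `all_children`, `everywhere_bottom_up`) and the nested list/tuple term representation are hypothetical; they are not the paper's two combinators, its Haskell reference model, or the Strafunski API.

```python
from typing import Any, Callable

Strategy = Callable[[Any], Any]


def identity(term: Any) -> Any:
    return term


def adhoc(default: Strategy, typ: type, specific: Callable[[Any], Any]) -> Strategy:
    """Type-specific update: apply `specific` to terms of exactly `typ`, else `default`."""
    def strategy(term: Any) -> Any:
        # Exact type match, so bools are not treated as ints.
        if type(term) is typ:
            return specific(term)
        return default(term)
    return strategy


def all_children(s: Strategy) -> Strategy:
    """One-layer traversal: apply `s` to each immediate subterm of a list or tuple."""
    def strategy(term: Any) -> Any:
        if isinstance(term, list):
            return [s(t) for t in term]
        if isinstance(term, tuple):
            return tuple(s(t) for t in term)
        return term  # leaves (ints, strings, ...) have no subterms
    return strategy


def sequence(s1: Strategy, s2: Strategy) -> Strategy:
    """Run s1, then s2 on its result."""
    return lambda term: s2(s1(term))


def everywhere_bottom_up(s: Strategy) -> Strategy:
    """Derived traversal scheme: rewrite all subterms first, then the term itself."""
    def strategy(term: Any) -> Any:
        return sequence(all_children(strategy), s)(term)
    return strategy


# Increment every int in a nested term, leaving all other values untouched.
inc_ints = everywhere_bottom_up(adhoc(identity, int, lambda n: n + 1))
print(inc_ints(("add", [1, 2], ("neg", 3), "x")))
```

Only the two primitives inspect term structure or types; the traversal scheme itself is generic, which is the separation the abstract attributes to strategic programming.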

arXiv:cs/0204015 [pdf, ps, other]

Design Patterns for Functional Strategic Programming

Authors: Ralf Laemmel, Joost Visser

Abstract: In previous work, we introduced the fundamentals and a supporting combinator library for strategic programming. This is an idiom for generic programming based on the notion of a functional strategy: a first-class generic function that can not only be applied to terms of any type, but also allows generic traversal into subterms and can be customized with type-specific behaviour. This paper seeks to provide practicing functional programmers with pragmatic guidance in crafting their own strategic programs. We present the fundamentals and the support from a user's perspective, and we initiate a catalogue of strategy design patterns. These design patterns aim at consolidating strategic programming expertise in accessible form.

Submitted 9 April, 2002; originally announced April 2002.

ACM Class: D.1.1; D.2.3; D.2.10

Showing 1–26 of 26 results for author: Visser, J